Runpod • 5mo ago
Ellroy

Can't deploy Qwen/Qwen2.5-14B-Instruct-1M on serverless

Steps to reproduce:
1. Use the Serverless vLLM quick deploy for Qwen/Qwen2.5-14B-Instruct-1M (image attached).
2. Proceed with the default config.
3. Try to send a request.

Error:
2025-06-18T12:58:36.147823280Z INFO 06-18 12:58:36 [model_runner.py:1170] Starting to load model Qwen/Qwen2.5-14B-Instruct-1M...
2025-06-18T12:58:36.449947523Z engine.py:116 2025-06-18 12:58:36,449 Error initializing vLLM engine: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
How do I fix this? I've been trying to troubleshoot this all morning. All help appreciated 🙏
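For reference, this is the minimal local check I'd use to reproduce it outside RunPod (a sketch only: it assumes a GPU machine with vLLM installed, and the small max_model_len is just to keep the test light rather than using the full 1M context):
```python
# Minimal local repro for Qwen/Qwen2.5-14B-Instruct-1M on plain vLLM.
# If this fails with the same "unexpected keyword argument 'layer_idx'"
# error, the installed vLLM is likely too old for this model's attention
# implementation - and the same would apply to the vLLM baked into the
# serverless worker image.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    trust_remote_code=True,   # precaution; the 1M variant ships extra config
    max_model_len=32768,      # small test window, not the full 1M context
)

outputs = llm.generate(
    ["Say hello in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```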
4 Replies
Unknown User • 5mo ago
(Message not public)
Ellroy (OP) • 5mo ago
Yes - there don't appear to be env variables for this on worker-vllm – https://github.com/runpod-workers/worker-vllm
GitHub – runpod-workers/worker-vllm: The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
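If the root cause is the vLLM version pinned in the worker image rather than a missing env variable, a quick way to check from inside the image (or any container built from it) is something like the sketch below; the minimum-version number is only an illustrative assumption, so check vLLM's release notes for when Qwen2.5-1M support actually landed:
```python
# Report the vLLM version in the current environment and compare it against
# an assumed minimum. MIN_VERSION is a placeholder, not a confirmed value.
from importlib.metadata import version
from packaging.version import Version

MIN_VERSION = "0.7.0"  # hypothetical threshold for Qwen2.5-1M support

installed = version("vllm")
print(f"vLLM installed: {installed}")
if Version(installed) < Version(MIN_VERSION):
    print(f"Probably too old for Qwen/Qwen2.5-14B-Instruct-1M (assumed minimum {MIN_VERSION})")
```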
Unknown User • 5mo ago
(Message not public)
Foopop • 5mo ago
For my use case an env variable was missing too, but I opened a PR and they merged it in about two hours. Try that.
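Once the endpoint does start cleanly, a quick end-to-end test is to call the worker's OpenAI-compatible route with the standard openai client; the endpoint ID and API key below are placeholders, and the base-URL format should be double-checked against the worker-vllm README:
```python
# Smoke test against a deployed worker-vllm endpoint via its
# OpenAI-compatible API. <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```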
