Runpod • 5mo ago
Ellroy

Can't deploy Qwen/Qwen2.5-14B-Instruct-1M on serverless

Steps to reproduce:
1. Use the Serverless vLLM quick deploy for Qwen/Qwen2.5-14B-Instruct-1M (image attached).
2. Proceed with the default config.
3. Try to send a request.

Error:
2025-06-18T12:58:36.147823280Z INFO 06-18 12:58:36 [model_runner.py:1170] Starting to load model Qwen/Qwen2.5-14B-Instruct-1M...
2025-06-18T12:58:36.449947523Z engine.py:116 2025-06-18 12:58:36,449 Error initializing vLLM engine: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
How do I fix this? I've been trying to troubleshoot this all morning. All help appreciated 🙏
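For reference, this is the minimal local check I'd use to reproduce it outside RunPod (a sketch only: it assumes a GPU machine with vLLM installed, and the small max_model_len is just to keep the test light rather than using the full 1M context):
```python
# Minimal local repro for Qwen/Qwen2.5-14B-Instruct-1M on plain vLLM.
# If this fails with the same "unexpected keyword argument 'layer_idx'"
# error, the installed vLLM is likely too old for this model's attention
# implementation - and the same would apply to the vLLM baked into the
# serverless worker image.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    trust_remote_code=True,   # precaution; the 1M variant ships extra config
    max_model_len=32768,      # small test window, not the full 1M context
)

outputs = llm.generate(
    ["Say hello in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```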
4 Replies
Unknown User • 5mo ago
(Message not public)
Ellroy (OP) • 5mo ago
Yes - there don't appear to be env variables for this on worker-vllm – https://github.com/runpod-workers/worker-vllm
GitHub – runpod-workers/worker-vllm: The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
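If the root cause is the vLLM version pinned in the worker image rather than a missing env variable, a quick way to check from inside the image (or any container built from it) is something like the sketch below; the minimum-version number is only an illustrative assumption, so check vLLM's release notes for when Qwen2.5-1M support actually landed:
```python
# Report the vLLM version in the current environment and compare it against
# an assumed minimum. MIN_VERSION is a placeholder, not a confirmed value.
from importlib.metadata import version
from packaging.version import Version

MIN_VERSION = "0.7.0"  # hypothetical threshold for Qwen2.5-1M support

installed = version("vllm")
print(f"vLLM installed: {installed}")
if Version(installed) < Version(MIN_VERSION):
    print(f"Probably too old for Qwen/Qwen2.5-14B-Instruct-1M (assumed minimum {MIN_VERSION})")
```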
Unknown User • 5mo ago
(Message not public)
Foopop • 5mo ago
For my use case an env variable was missing too, but I opened a PR and they merged it in about two hours. Try that.
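Once the endpoint does start cleanly, a quick end-to-end test is to call the worker's OpenAI-compatible route with the standard openai client; the endpoint ID and API key below are placeholders, and the base-URL format should be double-checked against the worker-vllm README:
```python
# Smoke test against a deployed worker-vllm endpoint via its
# OpenAI-compatible API. <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```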
