Chat completion (template) not working with vLLM 0.6.3 + Serverless
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deployed it using the latest version of vLLM (0.6.3) provided by RunPod.
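For reference, the settings above map roughly to the following worker configuration (a sketch only; the variable names are my assumption based on the RunPod vLLM worker's environment variables, so check your own endpoint template):

```python
# Sketch of the endpoint settings described above, written as the
# environment variables the RunPod vLLM worker is typically configured
# with (names are assumptions; verify against your worker template).
endpoint_env = {
    "MODEL_NAME": "xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    "MAX_MODEL_LEN": "129024",  # max model context window set in the UI
    "QUANTIZATION": "awq",      # AWQ quantization selected in the UI
}

for key, value in endpoint_env.items():
    print(f"{key}={value}")
```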
I ran into the following errors.
Client-side:
4 Replies
This request runs fine without error:
But this request gives me an error:
Here's a partial error from the server end:
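For anyone trying to reproduce this, a call of the following shape against the endpoint's OpenAI-compatible route exercises the same chat-template code path. This is illustrative only, not the original request (the actual requests and server error were attached as screenshots); the endpoint ID and API key are placeholders, and the base URL assumes the standard RunPod OpenAI-compatible route.

```python
# Minimal sketch of the kind of chat-completion request involved.
# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # RunPod OpenAI-compatible route
    api_key="<RUNPOD_API_KEY>",
)

response = client.chat.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a function that reverses a string."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```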
There isn't any reported error on the Qwen GitHub regarding the chat template (it uses the SAME template as a model that was released months ago), so I suspect this is a RunPod-specific error?
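One way to sanity-check that the template itself is valid is to render it locally with transformers (a sketch; assumes transformers is installed and that the repo ships its chat template in the tokenizer config):

```python
# Sanity check: load the repo's tokenizer and render its chat template
# locally. If this succeeds, the template itself is well-formed and the
# failure is more likely on the serving side.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k"
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
]

# Render the chat template to a plain prompt string without tokenizing.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```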
Facing this same issue. Do we have a solution for this?
"using the latest version of vLLM (0.6.3)"
The latest RunPod vLLM version is 0.9.1.
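If you want to confirm which vLLM build the worker actually bundles (for example by running the worker image locally), a quick check is:

```python
# Print the vLLM version bundled in the environment the worker runs in.
import vllm

print(vllm.__version__)
```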