Chat completion (template) not working with vLLM 0.6.3 + Serverless
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deployed it using the latest version of vLLM (0.6.3) provided by RunPod.
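For reference, the settings above map roughly to the following worker configuration (a sketch only; the variable names are my assumption based on the RunPod vLLM worker's environment variables, so check your own endpoint template):

```python
# Sketch of the endpoint settings described above, written as the
# environment variables the RunPod vLLM worker is typically configured
# with (names are assumptions; verify against your worker template).
endpoint_env = {
    "MODEL_NAME": "xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    "MAX_MODEL_LEN": "129024",  # max model context window set in the UI
    "QUANTIZATION": "awq",      # AWQ quantization selected in the UI
}

for key, value in endpoint_env.items():
    print(f"{key}={value}")
```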
I ran into the following errors.
Client-side:
4 Replies
This request runs fine without error:
But this request gives me an error:
Here's a partial error from the server end:
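For anyone trying to reproduce this, a call of the following shape against the endpoint's OpenAI-compatible route exercises the same chat-template code path. This is illustrative only, not the original request (the actual requests and server error were attached as screenshots); the endpoint ID and API key are placeholders, and the base URL assumes the standard RunPod OpenAI-compatible route.

```python
# Minimal sketch of the kind of chat-completion request involved.
# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # RunPod OpenAI-compatible route
    api_key="<RUNPOD_API_KEY>",
)

response = client.chat.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a function that reverses a string."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```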
There isn't any reported error on the Qwen GitHub regarding the chat template (it uses the SAME template as a model that was released months ago), so I suspect this is a RunPod-specific error?
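One way to sanity-check that the template itself is valid is to render it locally with transformers (a sketch; assumes transformers is installed and that the repo ships its chat template in the tokenizer config):

```python
# Sanity check: load the repo's tokenizer and render its chat template
# locally. If this succeeds, the template itself is well-formed and the
# failure is more likely on the serving side.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k"
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
]

# Render the chat template to a plain prompt string without tokenizing.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```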
Facing this same issue. Do we have a solution for this?
"using the latest version of vLLM (0.6.3)"
The latest RunPod vLLM version is 0.9.1.
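If you want to confirm which vLLM build the worker actually bundles (for example by running the worker image locally), a quick check is:

```python
# Print the vLLM version bundled in the environment the worker runs in.
import vllm

print(vllm.__version__)
```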