vLLM worker OpenAI stream

Hi everyone,

I followed the RunPod documentation to write a simple OpenAI client against a serverless vLLM endpoint running the LLaVA model (llava-hf/llava-1.5-7b-hf). However, I ran into the following error:

ChatCompletion(id=None, choices=None, created=None, model=None, object='error', service_tier=None, system_fingerprint=None, usage=None, code=400, message='As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', param=None, type='BadRequestError')


Has anyone experienced this issue? Any suggestions for resolving it?

Code:

from openai import OpenAI

# Point the OpenAI client at the endpoint's OpenAI-compatible route
client = OpenAI(
    api_key="key",
    base_url="https://api.runpod.ai/v2/123123123/openai/v1",
)

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{"role": "user", "content": "Hello, how can I use RunPod's serverless platform?"}],
    temperature=0.7,
    max_tokens=100,
)

print(response.choices[0].message.content)
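
From the error, it seems the tokenizer for llava-hf/llava-1.5-7b-hf doesn't define its own chat template, and since transformers v4.44 vLLM no longer falls back to a default one. What I was thinking of trying is sending a template with the request itself. This is only a sketch: it assumes the RunPod vLLM worker forwards extra_body fields to vLLM's per-request chat_template parameter, and the template string is just an illustration, not the official LLaVA one.

# Hypothetical fix: pass a chat template with the request via extra_body.
# Assumes the worker forwards this to vLLM's chat_template parameter;
# the template below is an illustrative USER/ASSISTANT format, not the official one.
llava_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}USER: {{ message['content'] }}\n"
    "{% else %}ASSISTANT: {{ message['content'] }}\n{% endif %}"
    "{% endfor %}ASSISTANT:"
)

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{"role": "user", "content": "Hello, how can I use RunPod's serverless platform?"}],
    temperature=0.7,
    max_tokens=100,
    extra_body={"chat_template": llava_template},
)
print(response.choices[0].message.content)

Alternatively, I believe the vLLM worker supports a CUSTOM_CHAT_TEMPLATE environment variable that can be set on the endpoint itself, but I haven't confirmed whether that is the right fix here.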