Serverless not running with NIM - NVIDIA custom models
I'm trying to run a Serverless instance on RunPod. When I launch the instance, it seems to start running. However, when I send a "Hello World" test request using the instance's own test interface on the RunPod screen, the request is placed in a queue and never finishes processing. If I send a second request, it also enters the same queue. I'm having trouble getting the serverless instance to run, even though I followed all the steps in the RunPod documentation.
Thanks!
12 Replies
Thank you for the answer. Now I am using a Hugging Face model, but even with a very simple prompt the answer is not complete. Example:
{
"input": {
"prompt": "Hi, what can you help me with?",
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
Answer:
{
"delayTime": 816,
"executionTime": 1011,
"id": "sync-bbe044ef-6ef2-4606-b178-312a8ec8bfcf-u1",
"output": [
{
"choices": [
{
"tokens": [
" I'm just living in Phoenix, tried to find someone who is on 06/"
]
}
],
"usage": {
"input": 10,
"output": 16
}
}
],
"status": "COMPLETED",
"workerId": "l0ijmr7lxa0x9e"
}
You're trying to use standard text completion with a chat-trained model.
https://platform.openai.com/docs/guides/completions#chat-completions-vs-completions
Thanks, I am using this: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
Yes, you have to use an array of messages, not a simple prompt string. Like this:
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
This is the payload I used in Postman:
{
"input": {
"prompt": "Hi, what can you help me with?",
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
Please, can you give me a little more help: where do I put this part?
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Or must the body contain only this part?
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Thanks!
Ohh yes! I will try. Thank you!
body:
{
"input": {
"messages": [{"role": "user", "content":"What is the France capital ?"}],
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
response:
{
"delayTime": 30491,
"error": "{'object': 'error', 'message': 'Chat template does not exist for this model, you must provide a single string input instead of a list of messages', 'type': 'BadRequestError', 'param': None, 'code': 400}",
"executionTime": 82,
"id": "sync-df6c369e-7f63-4f23-8195-2c9a235c168a-u1",
"status": "FAILED",
"workerId": "jqyqmkdo0u24w6"
}
Serverless + vLLM + Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
I just tried removing the brackets "[" and "]" from messages, so it's a single item instead of a list, but the error is the same.
That's either a bug or a vLLM misconfiguration. That model definitely has chat_template in tokenizer_config.json.
Check the worker docs and ask the vLLM docs AI if you need help with the settings, or share your settings here so we can check.
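You can also verify the chat template locally with a quick sketch like this (assumes the transformers library is installed; add trust_remote_code=True to from_pretrained if the repo requires it):
# Sketch: check whether the model's tokenizer ships a chat template
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
print(tok.chat_template is not None)  # True if tokenizer_config.json defines one

# Render a messages list locally to see the exact prompt string the template produces
print(tok.apply_chat_template(
    [{"role": "user", "content": "What is the France capital ?"}],
    tokenize=False,
    add_generation_prompt=True,
))
If that prints True and renders a prompt, the template exists and the problem is on the worker side.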
ok, thanks
Then try setting it via an env variable.
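For example, a sketch (I'm assuming here that the worker image reads a CUSTOM_CHAT_TEMPLATE env variable holding a Jinja chat template; check the worker-vllm README for the exact variable name and format):
CUSTOM_CHAT_TEMPLATE={% for m in messages %}{{ m['role'] }}: {{ m['content'] }} {% endfor %}assistant:
You would set that in the endpoint template's environment variables on the RunPod console, then send the messages-array body again.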