Serverless not running with NIM - NVIDIA custom models

I'm trying to run a Service Instance in Runpod. When I launch the instance, it seems to start running. However, when I send a "Hello World" test request from the instance's own interface on the Runpod screen, the request is placed in a queue and never finishes processing. If I send a second request, it goes into the same queue. I can't get the serverless instance to process anything, even though I followed all the steps in the Runpod documentation. Thanks!
12 Replies
Unknown User, 4mo ago
Message Not Public
Gestefane Rabbi (OP), 4mo ago
Thank you for the answer. I am now using a Hugging Face model, but even with a very simple prompt the answer is incomplete. Example request:
{
  "input": {
    "prompt": "Hi, what can you help me with?",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "max_new_tokens": 200,
    "repetition_penalty": 1.1,
    "do_sample": true
  }
}
Answer:
{
  "delayTime": 816,
  "executionTime": 1011,
  "id": "sync-bbe044ef-6ef2-4606-b178-312a8ec8bfcf-u1",
  "output": [
    {
      "choices": [
        { "tokens": [ " I'm just living in Phoenix, tried to find someone who is on 06/" ] }
      ],
      "usage": { "input": 10, "output": 16 }
    }
  ],
  "status": "COMPLETED",
  "workerId": "l0ijmr7lxa0x9e"
}
3WaD, 4mo ago
You're trying to use standard text completion with a chat-trained model. https://platform.openai.com/docs/guides/completions#chat-completions-vs-completions
3WaD, 4mo ago
Yes, you have to use an array of messages like this:
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
not a simple prompt string.
Gestefane Rabbi (OP), 4mo ago
This is the payload I used in Postman:
{
  "input": {
    "prompt": "Hi, what can you help me with?",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "max_new_tokens": 200,
    "repetition_penalty": 1.1,
    "do_sample": true
  }
}
Could you give me a little more help? Where do I put this part:
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Or should the body contain only that part? Thanks!
3WaD, 4mo ago
{
"input": {
"messages": [{"role": "user", "content":"Hi, what can you help me with?"}],
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
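For anyone converting an existing request by script rather than by hand, here is a minimal Python sketch of the same change: it swaps the "prompt" string for a one-element "messages" list and leaves the sampling parameters untouched. The helper name to_chat_payload is made up for illustration and is not part of any Runpod or vLLM API.

```python
import json


def to_chat_payload(payload: dict) -> dict:
    """Return a copy of the payload with 'prompt' replaced by a 'messages' list.

    Hypothetical helper for illustration only. Payloads that already
    contain 'messages' are returned as an unchanged copy.
    """
    params = dict(payload["input"])  # shallow copy, the caller's dict is not modified
    if "messages" in params:
        return {"input": params}
    prompt = params.pop("prompt")  # raises KeyError if neither key is present
    # Put 'messages' first, followed by the remaining sampling parameters.
    return {"input": {"messages": [{"role": "user", "content": prompt}], **params}}


old_payload = {
    "input": {
        "prompt": "Hi, what can you help me with?",
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 50,
        "max_new_tokens": 200,
        "repetition_penalty": 1.1,
        "do_sample": True,
    }
}

new_payload = to_chat_payload(old_payload)
print(json.dumps(new_payload, indent=2))
```

The printed JSON can be pasted into Postman as the request body.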
Gestefane Rabbi (OP), 4mo ago
Ohh yes! I will try. Thank you!
Gestefane Rabbi (OP), 4mo ago
Body:
{
  "input": {
    "messages": [{"role": "user", "content":"What is the France capital ?"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "max_new_tokens": 200,
    "repetition_penalty": 1.1,
    "do_sample": true
  }
}
Response:
{
  "delayTime": 30491,
  "error": "{'object': 'error', 'message': 'Chat template does not exist for this model, you must provide a single string input instead of a list of messages', 'type': 'BadRequestError', 'param': None, 'code': 400}",
  "executionTime": 82,
  "id": "sync-df6c369e-7f63-4f23-8195-2c9a235c168a-u1",
  "status": "FAILED",
  "workerId": "jqyqmkdo0u24w6"
}
Setup: Serverless + vLLM + Hugging Face model https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
I also tried removing the brackets "[" and "]" so that "messages" is a single item instead of a list, but the error is the same.
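A side note on reading that response: the "error" field is a string that looks like a printed Python dict (single quotes, None), not JSON, so json.loads would reject it. A small sketch of pulling the message and code out of it, assuming the field keeps this shape; extract_error is a made-up helper name:

```python
import ast


def extract_error(response: dict):
    """Return the decoded error dict of a FAILED response, or None otherwise.

    Hypothetical helper. Assumes 'error' is a Python-literal string as in
    the response above; anything else is wrapped as {'message': <raw value>}.
    """
    if response.get("status") != "FAILED":
        return None
    raw = response.get("error", "")
    try:
        # literal_eval only parses literals, it never executes code.
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return {"message": raw}
    return parsed if isinstance(parsed, dict) else {"message": str(parsed)}


failed = {
    "delayTime": 30491,
    "error": "{'object': 'error', 'message': 'Chat template does not exist for this model, you must provide a single string input instead of a list of messages', 'type': 'BadRequestError', 'param': None, 'code': 400}",
    "executionTime": 82,
    "status": "FAILED",
}

err = extract_error(failed)
print(err["code"], err["type"])
print(err["message"])
```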
3WaD, 4mo ago
That's either a bug or a vLLM misconfiguration. That model definitely has chat_template in tokenizer_config.json. Check the worker docs and ask vLLM AI if you need help with the settings. Or share them here so we can check.
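One way to double-check the chat_template point locally is to look for that key in the parsed tokenizer_config.json. The sketch below works on an already-parsed dict, and the two sample configs are shortened stand-ins, not the real DeepSeek file. The function name has_chat_template is made up for illustration.

```python
import json


def has_chat_template(tokenizer_config: dict) -> bool:
    """Return True if the parsed tokenizer_config.json has a non-empty chat_template.

    Hypothetical helper. Accepts a template stored as a string or as a
    list of named templates.
    """
    template = tokenizer_config.get("chat_template")
    return isinstance(template, (str, list)) and len(template) > 0


# Shortened stand-in configs, for illustration only.
with_template = json.loads(
    '{"chat_template": "{% for m in messages %}{{ m.content }}{% endfor %}"}'
)
without_template = json.loads('{"bos_token": "<s>"}')

print(has_chat_template(with_template))
print(has_chat_template(without_template))
```

To run it against a downloaded file, load it with json.load(open("tokenizer_config.json")) and pass the result in. If the key is present there, the worker's model or tokenizer settings are the more likely culprit. Note that some repositories may keep the template in a separate file, which this check would miss.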
Gestefane Rabbi (OP), 4mo ago
ok, thanks
Madiator2011, 4mo ago
Then try setting the chat template via an environment variable.
