Serverless not running with NIM - NVIDIA custom models
I'm trying to run a Serverless instance on RunPod. When I launch the instance, it seems to start running. However, when I send a "Hello World" test request using the instance's own test interface on the RunPod screen, the request is placed in a queue and never finishes processing. If I send a second request, it also enters the same queue. I'm having trouble getting the serverless instance to run, even though I followed all the steps in the RunPod documentation.
Thanks!
12 Replies
Thank you for the answer. Now I am using a Hugging Face model, but even with a very simple prompt the answer is not complete. Example:
{
"input": {
"prompt": "Hi, what can you help me with?",
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
Answer:
{
"delayTime": 816,
"executionTime": 1011,
"id": "sync-bbe044ef-6ef2-4606-b178-312a8ec8bfcf-u1",
"output": [
{
"choices": [
{
"tokens": [
" I'm just living in Phoenix, tried to find someone who is on 06/"
]
}
],
"usage": {
"input": 10,
"output": 16
}
}
],
"status": "COMPLETED",
"workerId": "l0ijmr7lxa0x9e"
}
You're trying to use standard text completion with a chat-trained model.
https://platform.openai.com/docs/guides/completions#chat-completions-vs-completions
Thanks, I am using this: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
Yes, you have to use an array of messages, not a simple prompt string. Like this:
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
This is the payload I used in Postman:
{
"input": {
"prompt": "Hi, what can you help me with?",
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
Please, can you give me a little more help: where do I put this part?
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Or must the body contain only this part?
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Thanks!
Ohh yes! I will try. Thank you!
body:
{
"input": {
"messages": [{"role": "user", "content":"What is the France capital ?"}],
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 200,
"repetition_penalty": 1.1,
"do_sample": true
}
}
response:
{
"delayTime": 30491,
"error": "{'object': 'error', 'message': 'Chat template does not exist for this model, you must provide a single string input instead of a list of messages', 'type': 'BadRequestError', 'param': None, 'code': 400}",
"executionTime": 82,
"id": "sync-df6c369e-7f63-4f23-8195-2c9a235c168a-u1",
"status": "FAILED",
"workerId": "jqyqmkdo0u24w6"
}
Serverless + vLLM + Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
I just tried removing the brackets "[" and "]" from messages, so it's a single item instead of a list, but the error is the same.
That's either a bug or a vLLM misconfiguration. That model definitely has chat_template in tokenizer_config.json.
Check the worker docs and ask the vLLM docs AI if you need help with the settings, or share your settings here so we can check.
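You can also verify the chat template locally with a quick sketch like this (assumes the transformers library is installed; add trust_remote_code=True to from_pretrained if the repo requires it):
# Sketch: check whether the model's tokenizer ships a chat template
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
print(tok.chat_template is not None)  # True if tokenizer_config.json defines one

# Render a messages list locally to see the exact prompt string the template produces
print(tok.apply_chat_template(
    [{"role": "user", "content": "What is the France capital ?"}],
    tokenize=False,
    add_generation_prompt=True,
))
If that prints True and renders a prompt, the template exists and the problem is on the worker side.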
ok, thanks
Then try setting it via an env variable.
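For example, a sketch (I'm assuming here that the worker image reads a CUSTOM_CHAT_TEMPLATE env variable holding a Jinja chat template; check the worker-vllm README for the exact variable name and format):
CUSTOM_CHAT_TEMPLATE={% for m in messages %}{{ m['role'] }}: {{ m['content'] }} {% endfor %}assistant:
You would set that in the endpoint template's environment variables on the RunPod console, then send the messages-array body again.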