RunPod · 3mo ago
BAS014

Workers keep respawning and requests queue indefinitely

Hi there, I tried asking in the "ask-ai" channel, but I need some more help.

"I've just deployed a serverless endpoint on 3 regions. When a worker gets to about 7 minutes of running, it goes to idle and then spawns a new worker, over and over. Is this normal? It's a small model, and workers have now been running for about 35 minutes. I tried a request, but it just goes into a queue and never completes."

I then deleted my endpoint and recreated it, and it's still the same. I've tried "mistralai/Mistral-Small-24B-Instruct-2501" and then "deepseek-ai/DeepSeek-V3". Maybe the models are the issue? I understand you don't pay while workers are idle? For now I've manually terminated the worker and canceled the request, otherwise it would run until my credit is finished.

Does anyone have any ideas, or am I missing something? Do I have to create a handler? The docs don't say I need one before running the "hello world" test. Thanks
14 Replies
BAS014 (OP) · 3mo ago
Oh, and these are the regions I tried: eu-se-1, eur-is-1, and eur-is-2, in case that could be the issue.
Jason · 3mo ago
vLLM? Not sure if it's supported; if it is, check the logs to see what's going on. And yes, idle workers aren't charged. Maybe it's your execution timeout? Try increasing it a bit, and check the logs.
BAS014 (OP) · 3mo ago
Hi @nerdylive, thanks so much for the response, appreciated. From what I understand, "mistralai/Mistral-Small-24B-Instruct-2501" and "deepseek-ai/DeepSeek-V3" should both be fine with vLLM support. "deepseek-ai/DeepSeek-V3" is one of the suggested models when creating an endpoint, and it does the same thing. I've left the execution timeout at 600 seconds. I can't find anything suspicious in the logs. The setup was pretty much default, except for MAX_MODEL_LEN = 8192 (as suggested in the docs). Maybe I'm misunderstanding something about the models I've tried. Thanks again
Jason · 3mo ago
So the model isn't loaded yet in your logs
BAS014 (OP) · 3mo ago
Ok yeah, how long does that take? It runs for 10 minutes and then I terminate the worker; last night at one point it ran for 35 minutes before I terminated it. Not sure how this works. Thanks for the help
Jason · 3mo ago
I'm not sure, you'll have to test it out; each model is different, and larger ones will take more time. But how do you send a request?
BAS014 (OP) · 3mo ago
Ok perfect, thanks so much. I'll try a smaller model. I've been running the request from my Python app and via the "hello world" request test in the serverless setup. I've tried "meta-llama/Meta-Llama-3-8B-Instruct" and am getting the same result as @Stewette. I'll try a few different configs and see if I find something; I'll keep you updated.
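Since the requests here come from a Python app, a minimal sketch of submitting a job to a RunPod serverless endpoint over its HTTP API may help for comparison. The endpoint ID and API key below are placeholders, and the `prompt`/`max_tokens` input schema is an assumption based on the vLLM worker's prompt-style input; verify it against the worker image you actually deploy.

```python
def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build a request body for a serverless vLLM worker.
    The exact input schema depends on the worker image; the
    'prompt'/'max_tokens' fields here are an assumption."""
    return {"input": {"prompt": prompt, "max_tokens": max_tokens}}


def submit(endpoint_id: str, api_key: str, prompt: str) -> dict:
    """POST the job to the asynchronous /run route and return the JSON
    response, which includes an 'id' that can be polled via /status/{id}."""
    import requests  # third-party: pip install requests

    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


# Usage (placeholders, not real credentials):
# job = submit("your-endpoint-id", "your-api-key", "Hello world")
```

If the job only ever shows up as queued, the problem is on the worker side rather than in the request itself.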
Jason · 3mo ago
Oh, maybe OOM? Try a bigger GPU with more VRAM. The DeepSeek R1 model is big, so it should OOM on most serverless workers unless you ask RunPod for custom specifications with enough VRAM. OOM -> worker dies -> tries to load again in a new worker -> cycle? I guess it's something like that.
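The OOM theory is easy to sanity-check with back-of-the-envelope arithmetic: fp16/bf16 weights alone need roughly 2 bytes per parameter, before any KV cache or runtime overhead. A rough sketch (approximate parameter counts, not exact memory footprints):

```python
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just for model weights (fp16/bf16 = 2 bytes/param).
    Ignores KV cache, activations, and CUDA overhead, which all add more."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# Mistral-Small-24B: ~48 GB of weights alone, so it is tight even on a
# 48 GB card once the KV cache is added.
print(f"24B fp16: ~{weights_vram_gb(24):.0f} GB")

# DeepSeek-V3 has ~671B total parameters: ~1.3 TB in fp16, far beyond a
# single default serverless worker, which alone could explain a
# crash-and-respawn loop.
print(f"671B fp16: ~{weights_vram_gb(671):.0f} GB")
```

By this estimate, only the smallest model tried in this thread (Llama-3-8B, ~16 GB of weights) comfortably fits a single mid-range GPU.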
BAS014 (OP) · 3mo ago
Great, let me try, Thanks @nerdylive!
Stewette · 3mo ago
@BAS014 I tried switching it up by following the official RunPod tutorials and running a 160 GB serverless instance with an attached network volume for a 70B model. It started up the instance without error, downloaded the model to the network volume, loaded it into memory, and then just... nothing. A request stayed pending in the queue even though the worker was up, and it never got processed. I let it hang like that for about 5 minutes before pulling the plug. I've tried other configurations too, and even when I get no error message of any kind and the instance has enough memory, nothing happens. I've been following the official RunPod tutorials the whole time. The service might just be spotty, but I've already been charged a few dollars just trying to follow the official docs, which is concerning.
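One way to tell "the worker never picked the job up" apart from "the worker crashed mid-job" is to poll the job's status route. A minimal polling sketch, assuming RunPod's `/status/{job_id}` route and its IN_QUEUE/IN_PROGRESS/COMPLETED/FAILED lifecycle states (endpoint ID, API key, and job ID are placeholders):

```python
import time


def is_terminal(status: str) -> bool:
    """A job is finished (successfully or not) once it hits one of these
    states; IN_QUEUE and IN_PROGRESS mean it is still pending."""
    return status in {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_status(endpoint_id: str, api_key: str, job_id: str,
                interval_s: float = 5.0, max_polls: int = 120) -> dict:
    """Poll a serverless job until it reaches a terminal state or we give
    up client-side (default: 120 polls x 5 s = 10 minutes)."""
    import requests  # third-party: pip install requests

    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_polls):
        data = requests.get(url, headers=headers, timeout=30).json()
        if is_terminal(data.get("status", "")):
            return data
        time.sleep(interval_s)
    return {"status": "TIMED_OUT_CLIENT_SIDE"}
```

A job stuck in IN_QUEUE while a worker is "up" suggests the worker process never registered as ready; a FAILED status points at the handler or an OOM instead.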
Jason · 3mo ago
How about 10 minutes? Maybe it takes a while. Can you share the logs? What do the vLLM logs look like in your worker?
BAS014 (OP) · 3mo ago
Hi @Stewette, @nerdylive, I wasn't successful, so I sent a support ticket to RunPod. They replied and asked for logs. I'll let you know what happens.
sjt80 · 2mo ago
Hi @BAS014, were they able to help you? I'm having the same problem loading a decent-sized model onto 4x GPUs. I've tried extending the executionTimeout to 30 minutes on both the request and the serverless endpoint configuration, but my worker ignores it. It currently "gives up" on the worker just before the 10-minute mark each time. It's so frustrating, because the logs show the model either partially loading into memory, or loading completely and then moving on to the next worker right before it finishes the job!
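For reference, the per-request timeout can also be set through an execution policy on the request body, and it is specified in milliseconds; passing a seconds value (e.g. 1800 for 30 minutes) would be read as 1.8 seconds, which is one easy way for a timeout to look "ignored". A sketch, assuming the `policy`/`executionTimeout` field names from RunPod's execution-policy request format (verify against the current API docs):

```python
def payload_with_timeout(prompt: str, timeout_minutes: int) -> dict:
    """Request body with a per-job execution policy.
    executionTimeout is in MILLISECONDS (an easy unit to get wrong);
    the 'policy' field name here follows RunPod's execution-policy
    format and should be double-checked against current docs."""
    return {
        "input": {"prompt": prompt},
        "policy": {"executionTimeout": timeout_minutes * 60 * 1000},
    }


# 30 minutes -> 1_800_000 ms
print(payload_with_timeout("hi", 30)["policy"]["executionTimeout"])
```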
Jason · 2mo ago
Hi guys, if your model is open source and works in vLLM, let me know. Maybe I can try a few different setups and tell you which configs work, or you can send your logs.
