RunPod · 3mo ago
BAS014

Workers keep respawning and requests queue indefinitely

Hi there, I tried asking in the "ask-ai" channel, but I need some more help.

"I've just deployed a serverless endpoint on 3 regions. When a worker gets to about 7 minutes of running, it goes to idle and then spawns a new worker, over and over. Is this normal? It's a small model, and workers have now been running for about 35 minutes. I tried a request, but it just goes into a queue and never completes."

I then deleted my endpoint and recreated it, and it's still the same. I've tried "mistralai/Mistral-Small-24B-Instruct-2501" and then "deepseek-ai/DeepSeek-V3". Maybe the models are the issue? I understand you don't pay while workers are idle? For now I've manually terminated the worker and canceled the request, otherwise it would run until my credit is finished.

Does anyone have any ideas, or am I missing something? Do I have to create a handler? The docs don't say I need one before running the "hello world" test. Thanks
14 Replies
BAS014 (OP) · 3mo ago
Oh, and these are the regions I tried: eu-se-1, eur-is-1, and eur-is-2, in case that could be the issue.
Jason · 3mo ago
vLLM? Not sure if it's supported; if it is, check the logs to see what's going on. And yes, idle workers aren't charged. Maybe it's your execution timeout? Try increasing it a bit, and check the logs.
BAS014 (OP) · 3mo ago
Hi @nerdylive, thanks so much for the response, appreciated. From what I understand, "mistralai/Mistral-Small-24B-Instruct-2501" and "deepseek-ai/DeepSeek-V3" should both be fine with vLLM support. "deepseek-ai/DeepSeek-V3" is one of the suggested models when creating an endpoint, and it does the same thing. I've left the execution timeout at 600 seconds. I can't find anything suspicious in the logs. The setup was pretty much default, except for MAX_MODEL_LEN = 8192 (as suggested in the docs). Maybe I'm misunderstanding something about the models I've tried. Thanks again
Jason · 3mo ago
So the model isn't loaded yet in your logs
BAS014 (OP) · 3mo ago
Ok yeah, how long does that take? It runs for 10 minutes and then I terminate the worker; last night at one point it ran for 35 minutes before I terminated it. Not sure how this works. Thanks for the help
Jason · 3mo ago
I'm not sure, you'll have to test it out; each model is different, and larger ones will take more time. But how do you send a request?
BAS014 (OP) · 3mo ago
Ok perfect, thanks so much. I'll try a smaller model. I've been running the request from my Python app and via the "hello world" request test in the serverless setup. I've tried "meta-llama/Meta-Llama-3-8B-Instruct" and am getting the same result as @Stewette. I'll try a few different configs and see if I find something; I'll keep you updated.
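Since the requests here come from a Python app, a minimal sketch of submitting a job to a RunPod serverless endpoint over its HTTP API may help for comparison. The endpoint ID and API key below are placeholders, and the `prompt`/`max_tokens` input schema is an assumption based on the vLLM worker's prompt-style input; verify it against the worker image you actually deploy.

```python
def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build a request body for a serverless vLLM worker.
    The exact input schema depends on the worker image; the
    'prompt'/'max_tokens' fields here are an assumption."""
    return {"input": {"prompt": prompt, "max_tokens": max_tokens}}


def submit(endpoint_id: str, api_key: str, prompt: str) -> dict:
    """POST the job to the asynchronous /run route and return the JSON
    response, which includes an 'id' that can be polled via /status/{id}."""
    import requests  # third-party: pip install requests

    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


# Usage (placeholders, not real credentials):
# job = submit("your-endpoint-id", "your-api-key", "Hello world")
```

If the job only ever shows up as queued, the problem is on the worker side rather than in the request itself.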
Jason · 3mo ago
Oh, maybe OOM? Try a bigger GPU with more VRAM. The DeepSeek R1 model is big, so it should OOM on most serverless workers unless you ask RunPod for custom specifications with enough VRAM. OOM -> worker dies -> tries to load again in a new worker -> cycle? I guess it's something like that.
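The OOM theory is easy to sanity-check with back-of-the-envelope arithmetic: fp16/bf16 weights alone need roughly 2 bytes per parameter, before any KV cache or runtime overhead. A rough sketch (approximate parameter counts, not exact memory footprints):

```python
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just for model weights (fp16/bf16 = 2 bytes/param).
    Ignores KV cache, activations, and CUDA overhead, which all add more."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# Mistral-Small-24B: ~48 GB of weights alone, so it is tight even on a
# 48 GB card once the KV cache is added.
print(f"24B fp16: ~{weights_vram_gb(24):.0f} GB")

# DeepSeek-V3 has ~671B total parameters: ~1.3 TB in fp16, far beyond a
# single default serverless worker, which alone could explain a
# crash-and-respawn loop.
print(f"671B fp16: ~{weights_vram_gb(671):.0f} GB")
```

By this estimate, only the smallest model tried in this thread (Llama-3-8B, ~16 GB of weights) comfortably fits a single mid-range GPU.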
BAS014 (OP) · 3mo ago
Great, let me try, Thanks @nerdylive!
Stewette · 3mo ago
@BAS014 I tried switching it up by following the official RunPod tutorials and running a 160 GB serverless instance with an attached network volume for a 70B model. It started up the instance without error, downloaded the model to the network volume, loaded it into memory, and then just... nothing. A request stayed pending in the queue even though the worker was up, and it never got processed. I let it hang like that for about 5 minutes before pulling the plug. I've tried other configurations too, and even when I get no error message of any kind and the instance has enough memory, nothing happens. I've been following the official RunPod tutorials the whole time. The service might just be spotty, but I've already been charged a few dollars just trying to follow the official docs, which is concerning.
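One way to tell "the worker never picked the job up" apart from "the worker crashed mid-job" is to poll the job's status route. A minimal polling sketch, assuming RunPod's `/status/{job_id}` route and its IN_QUEUE/IN_PROGRESS/COMPLETED/FAILED lifecycle states (endpoint ID, API key, and job ID are placeholders):

```python
import time


def is_terminal(status: str) -> bool:
    """A job is finished (successfully or not) once it hits one of these
    states; IN_QUEUE and IN_PROGRESS mean it is still pending."""
    return status in {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_status(endpoint_id: str, api_key: str, job_id: str,
                interval_s: float = 5.0, max_polls: int = 120) -> dict:
    """Poll a serverless job until it reaches a terminal state or we give
    up client-side (default: 120 polls x 5 s = 10 minutes)."""
    import requests  # third-party: pip install requests

    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_polls):
        data = requests.get(url, headers=headers, timeout=30).json()
        if is_terminal(data.get("status", "")):
            return data
        time.sleep(interval_s)
    return {"status": "TIMED_OUT_CLIENT_SIDE"}
```

A job stuck in IN_QUEUE while a worker is "up" suggests the worker process never registered as ready; a FAILED status points at the handler or an OOM instead.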
Jason · 3mo ago
How about 10 minutes? Maybe it takes a while. Can you share the logs? What do the vLLM logs look like in your worker?
BAS014 (OP) · 3mo ago
Hi @Stewette, @nerdylive, I wasn't successful, so I sent a support ticket to RunPod. They replied and asked for logs. I'll let you know what happens.
sjt80 · 2mo ago
Hi @BAS014, were they able to help you? I'm having the same problem loading a decent-sized model onto 4x GPUs. I've tried extending the executionTimeout to 30 minutes on both the request and the serverless endpoint configuration, but my worker ignores it. It currently "gives up" on the worker just before the 10-minute mark each time. It's so frustrating, because the logs show the model either partially loading into memory, or loading completely and then moving on to the next worker right before it finishes the job!
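For reference, the per-request timeout can also be set through an execution policy on the request body, and it is specified in milliseconds; passing a seconds value (e.g. 1800 for 30 minutes) would be read as 1.8 seconds, which is one easy way for a timeout to look "ignored". A sketch, assuming the `policy`/`executionTimeout` field names from RunPod's execution-policy request format (verify against the current API docs):

```python
def payload_with_timeout(prompt: str, timeout_minutes: int) -> dict:
    """Request body with a per-job execution policy.
    executionTimeout is in MILLISECONDS (an easy unit to get wrong);
    the 'policy' field name here follows RunPod's execution-policy
    format and should be double-checked against current docs."""
    return {
        "input": {"prompt": prompt},
        "policy": {"executionTimeout": timeout_minutes * 60 * 1000},
    }


# 30 minutes -> 1_800_000 ms
print(payload_with_timeout("hi", 30)["policy"]["executionTimeout"])
```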
Jason · 2mo ago
Hi guys, if your model is open source and works in vLLM, let me know. Maybe I can try a few different setups and tell you which configs work, or you can send your logs.
