Runpod•7d ago
mentiro

vLLM jobs not processing: "deferring container creation"

We just noticed there are 2000+ jobs waiting in our queue and no jobs in progress. I'm getting super frustrated with Serverless. In the logs I see this message: "deferring container creation: waiting for models to complete: [meta-llama/llama-3.3-70b-instruct]". I just terminated a few workers hoping they would start back up and work again. Can someone help me figure out how to resolve this? Why are my workers not processing jobs? This has been working mostly fine for a couple of weeks with no changes on our end.
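For anyone debugging the same symptom: the queue backlog and worker state can be confirmed programmatically via the serverless endpoint's health route (`GET https://api.runpod.ai/v2/<endpoint_id>/health`). A minimal sketch, assuming `RUNPOD_API_KEY` and `RUNPOD_ENDPOINT_ID` environment variables; the `summarize_health` helper and its "looks stuck" heuristic are illustrative, not part of the Runpod API:

```python
import json
import os


def summarize_health(payload: dict) -> dict:
    """Flag the 'big queue, nothing running' symptom from a /health response body."""
    jobs = payload.get("jobs", {})
    workers = payload.get("workers", {})
    queued = jobs.get("inQueue", 0)
    in_progress = jobs.get("inProgress", 0)
    running = workers.get("running", 0)
    return {
        "queued": queued,
        "in_progress": in_progress,
        "running_workers": running,
        # Heuristic: jobs are queued but nothing is being worked on.
        "looks_stuck": queued > 0 and in_progress == 0 and running == 0,
    }


# Example payload shaped like the situation described in this thread.
sample = {
    "jobs": {"completed": 0, "failed": 0, "inProgress": 0, "inQueue": 2000, "retried": 0},
    "workers": {"idle": 0, "running": 0},
}
summary = summarize_health(sample)
print(json.dumps(summary))

# Against a live endpoint (requires the `requests` package and real credentials):
API_KEY = os.environ.get("RUNPOD_API_KEY")
ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID")
if API_KEY and ENDPOINT_ID:
    import requests

    resp = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    print(summarize_health(resp.json()))
```

Polling this in a loop is a cheap way to tell "workers are slow" apart from "workers are not picking up jobs at all", which is what the thread below turns out to be about.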
10 Replies
mentiroOP•7d ago
I'm just using the stock vLLM container with cached models and Llama 3.3. You would think that would work flawlessly with Serverless. I was able to tweak our endpoint settings and terminate workers, and one worker started up and processed jobs. However, 5 more are stuck in "Initializing", and for a queue of 2000 you'd expect more workers.

The "Initializing" workers don't show any logs, so I have no idea why they aren't starting. I even terminated 2-3 of them, and the new ones that replaced them also have no logs and seem stuck in "Initializing". The first worker has been up for 15 minutes now processing jobs, and every other worker is just stuck in "Initializing" with no logs. @flash-singh @justin (New) [Staff Not Staff] any help here?
J.•7d ago
Hi @mentiro, can you DM me your Runpod email? I'll look to bring this up to the team.
mentiroOP•7d ago
Any update here @justin (New) [Staff Not Staff] ? I'd like to make changes to our endpoint settings, but I don't want to mess with things before your team can help diagnose the issue.
J.•7d ago
This is currently being looked at by one of our cloud engineers. No specific updates at this moment, just a heads-up. I'll follow up through the Zendesk ticket to avoid splitting the conversation.
mentiroOP•7d ago
Hi @justin (New) [Staff Not Staff] - I've posted a lot of updates, and we're back to an urgent production issue because jobs are backing up and no workers will run to handle them. I haven't heard any updates on my Zendesk ticket in an hour. Any way you can check on it? Thanks!

Zendesk support thinks it's a problem with Hugging Face gated models. They suggested my HF token was expired, but it is not. I tried switching the token anyway, and it did not fix it. They then suggested switching to an ungated version of Llama and creating a new endpoint. That also did not work: the new endpoint is just stuck in "Initializing".

This makes me think the problem is not with gated models, but a deeper problem with Serverless, or maybe with Serverless cached models.
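To rule the gated-model theory in or out, one quick check is to hit the Hugging Face model-info API directly with the same token the endpoint uses. A minimal sketch, assuming an `HF_TOKEN` environment variable and the Llama 3.3 repo name from the logs above; the `classify_access` helper is my own rough shorthand for the HTTP statuses, not an official API:

```python
import os


def classify_access(status_code: int) -> str:
    """Rough interpretation of the status returned by the HF model-info API."""
    if status_code == 200:
        return "token works and has access to the repo"
    if status_code == 401:
        return "token is invalid or expired"
    if status_code == 403:
        return "token is valid but gated-repo access has not been granted"
    if status_code == 404:
        return "repo not found (or hidden because access is missing)"
    return f"unexpected status {status_code}"


MODEL = "meta-llama/Llama-3.3-70B-Instruct"
TOKEN = os.environ.get("HF_TOKEN")
if TOKEN:
    import requests  # requires the `requests` package

    resp = requests.get(
        f"https://huggingface.co/api/models/{MODEL}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    print(classify_access(resp.status_code))
```

If this returns 200 with the endpoint's token, the "expired token" theory is out, which matches what the thread concludes: the gated model wasn't the root cause.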
Dj•7d ago
@justin (New) [Staff Not Staff]
J.•7d ago
Hi, do you think we can call? Might be easier 🙂
mentiroOP•7d ago
Yes, sure.
J.•7d ago
DMed you.
abrarfahim•6d ago
Same here. Did you find a solution?
