Runpod•3mo ago

vLLM jobs not processing: "deferring container creation"

We just noticed there are 2000+ jobs waiting in our queue and no jobs in progress. I'm getting super-frustrated with Serverless.

In the logs I see this message: "deferring container creation: waiting for models to complete: [meta-llama/llama-3.3-70b-instruct]"

I just terminated a few workers hoping that they would start back up and work again, but can someone help me figure out how to resolve this? Why are my workers not processing jobs (which has been working mostly ok for a couple of weeks now with no changes)

mentiroOP•10/14/25, 2:28 PM

I'm just using the stock vLLM container with cached models and llama 3.3. You would think that would work flawlessly with Serverless.

mentiroOP•10/14/25, 2:45 PM

I was able to tweak our endpoint settings and terminate workers, and one worker started up and processed jobs. However, 5 more are stuck in "Initializing" - for a queue of 2000 you'd expect more workers. And the "Initializing" workers don't show any Logs, so I have no idea why they aren't starting. I even terminated 2-3, and when new ones started "Initializing" they also have no logs and seem stuck.

mentiroOP•10/14/25, 2:47 PM

The first server has been started up for 15 minutes now processing jobs, and all other workers are just stuck in "Initializing" with no logs.

mentiroOP•10/14/25, 2:54 PM

@flash-singh @justin (New) [Staff Not Staff] any help here?

J.•10/14/25, 2:57 PM

Hi @mentiro , can you dm me your email runpod email? Ill look to bring this up to the team

mentiroOP•10/14/25, 3:22 PM

Any update here @justin (New) [Staff Not Staff] ? I'd like to make changes to our endpoint settings, but I don't want to mess with things before your team can help diagnose the issue.

Mmentiro Any update here @justin (New) [Staff Not Staff] ? I'd like to make changes to ou...

J.•10/14/25, 3:33 PM

Currently being looked at / worked on by one our cloud engineers. Not specific updates as of this moment.

Mmentiro Any update here @justin (New) [Staff Not Staff] ? I'd like to make changes to ou...

J.•10/14/25, 3:56 PM

Just an updated, will follow up thro zendesk communication to prevent splitting communication.

mentiroOP•10/14/25, 4:53 PM

Hi @justin (New) [Staff Not Staff] - I've posted a lot of updates and we're back to an urgent production issue because we have jobs backing up and no workers will run to handle them. I haven't heard any updates on my ZenDesk ticket in an hour. Any way you can check on it? Thanks!

mentiroOP•10/14/25, 6:24 PM

ZenDesk seems to think it's a problem with Hugging Face gated models. They suggested my HF token was expired, but it is not. I still tried switching the token and it did not fix it. They then suggested switching to an ungated version of Llama and creating a new endpoint. This also did not work - the new endpoint is just stuck in "initializing"

mentiroOP•10/14/25, 6:25 PM

This makes me think that the problem is not with gated models, but a deeper problem with Serverless or maybe Serverless cached models.

Dj•10/14/25, 6:37 PM

@justin (New) [Staff Not Staff]

Mmentiro ZenDesk seems to think it's a problem with Hugging Face gated models. They sugge...

J.•10/14/25, 6:37 PM

Hi do u think we can call

J.•10/14/25, 6:37 PM

Might be easier

mentiroOP•10/14/25, 6:37 PM

Yes, sure.

Mmentiro Yes, sure.

J.•10/14/25, 6:38 PM

dmed

Mmentiro We just noticed there are 2000+ jobs waiting in our queue and no jobs in progres...

abrarfahim•10/15/25, 3:23 PM

same here. did u fined a solution?

mentiroOP•10/21/25, 4:38 PM

@abrarfahim I had been talking with @J. about this. It seems like there were some changes to downloading models, and if a model failed to download, the worker would just be stuck in "initializing" forever.

My account has seen some improvement - I am able to get workers up. However, I still end up with several workers stuck in "initializing" forever.

Justin mentioned that they were going to be rolling out a patch on Monday, but I'm sure the AWS outage overshadowed that. I'm waiting to hear when the update is released to see if it fixes this issue. Any updates?

Jason•10/21/25, 11:26 PM

If you use a hf token does it helps?

Jason•10/21/25, 11:26 PM

For the model download

Mmentiro @abrarfahim I had been talking with @J. about this. It seems like there were so...

J.•10/22/25, 5:43 PM

Patch went out yesterday, but validating that things are stable

marrrcin_•10/27/25, 6:50 PM

This does not seem to be fixed, I've tried running vLLM using latest hub image (v2.9.6) and it does not initialize at all - tried with / without HF_TOKEN, it gets stuck in an endless deferring container creation: waiting for models to complete:deferring container creation: waiting for models to complete: loop.

marrrcin_•10/27/25, 6:50 PM

@justin (New) [Staff Not Staff] can you shed some light on it?

marrrcin_•10/27/25, 7:54 PM

As far as I can see, it's a problem with the vLLM image itself, rolling back to 2.6.0 works just fine

vLLM jobs not processing: "deferring container creation"

Similar Threads

Similar Threads

Similar Threads