Stuck initializing vLLM
I'm using the official runpod vLLM image for a serverless endpoint, using all default settings (besides the model), and my workers are all stuck initializing. I have a job in the queue (just hitting /v1/models) for 20+ minutes now and there are five workers (3 regular + 2 extra) just spinning "initializing". I don't see anything in the worker logs. What am I doing wrong, and more to the point, how do I figure out what I'm doing wrong? Just seeing logs would be nice. (Endpoint id: 99ky9cmdcjj2hm)
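For reference, this is roughly how I'm probing the endpoint — a sketch assuming the standard RunPod serverless URL shape (`api.runpod.ai/v2/<endpoint_id>/...`); `RUNPOD_API_KEY` is a placeholder for your own key, and `/health` is the separate endpoint-status route, not part of the worker's OpenAI-compatible API:

```shell
# Endpoint id from this thread
ENDPOINT_ID="99ky9cmdcjj2hm"

# OpenAI-compatible route served by the vLLM worker (this is the
# request sitting in my queue):
MODELS_URL="https://api.runpod.ai/v2/${ENDPOINT_ID}/openai/v1/models"

# Endpoint health route — reports worker counts/states without
# needing a job to complete:
HEALTH_URL="https://api.runpod.ai/v2/${ENDPOINT_ID}/health"

# Only fire the requests if a key is actually set (placeholder var)
if [ -n "${RUNPOD_API_KEY:-}" ]; then
  curl -s -H "Authorization: Bearer ${RUNPOD_API_KEY}" "$MODELS_URL"
  curl -s -H "Authorization: Bearer ${RUNPOD_API_KEY}" "$HEALTH_URL"
fi
```

While workers are stuck, the `/models` call just queues, but `/health` should at least return something.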
15 Replies
I'm having this problem now - did you figure out any solutions?
Same issue here. Been stuck for 14 hours now

@dpk
Escalated To Zendesk
The thread has been escalated to Zendesk!
I gave up. I've had this issue several times in the past, I assume it's just the way serverless is
Unknown User•6d ago
Message Not Public
@Jason This issue still persists today. I can't launch a serverless endpoint using vllm with default settings.
Unknown User•6d ago
Message Not Public
Yes, I just started one about 10 minutes ago using a Docker image, and it's still initializing

It doesn't roll out without being stuck initializing for a minimum of 12 hours for me

Unknown User•6d ago
Message Not Public
Can't find them, it was 12+ hours ago. Some of the workers: b3b4sx1nxx51wh / aurqsfa3kndbu8
That link does not work, btw
Unknown User•5d ago
Message Not Public
I'm still having this problem. I heard you were going to release an update yesterday that might fix this. I'm sure the AWS outage overshadowed that, but is there any update on fixing this issue?
Unknown User•21h ago
Message Not Public