necreiP · 2mo ago

Serverless vLLM workers crash

Whenever I create a serverless vLLM endpoint (no matter which model I use), the workers all end up crashing with the status "unhealthy". I checked vLLM's supported-models list and only use models that are supported. The last time, I used meta-llama/Llama-3.1-70B with a Hugging Face token that has access to the model. The result of running the default "Hello World" prompt on this endpoint is shown in the attached images: a worker shows the status "running", but when you open its stats they are all at 0% and there are no logs. The worker then becomes "unhealthy" and is moved to the Extra section. In this case the last worker stayed "idle" and never picked up the request. I only let it sit for about 10 minutes, but it never picked up the request and started working.
[two screenshots attached]
4 Replies
Jason · 2mo ago
Can you check the logs?
necreiP (OP) · 4w ago
No, the logs disappear when they crash.
riverfog7 · 4w ago
I mean they will CUDA OOM: a 70B model needs 140+ GB of VRAM in FP16, and you're giving vLLM only 96 GB. Lower the precision to FP8 or INT4; INT4 with a 32K context should work. DM me if it keeps crashing.
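To make the arithmetic above concrete, here is a minimal sketch that estimates the weight footprint at each precision. The 70B parameter count and the 96 GB VRAM budget are taken from this thread; the bytes-per-parameter figures are standard, and the numbers ignore KV cache, activations, and CUDA overhead, which all add more on top.

```python
# Back-of-the-envelope VRAM estimate for model weights alone.
# Ignores KV cache, activations, and CUDA overhead (assumption: weights dominate).

PARAMS = 70e9  # approximate parameter count of meta-llama/Llama-3.1-70B

bytes_per_param = {
    "FP16/BF16": 2.0,
    "FP8": 1.0,
    "INT4 (AWQ/GPTQ)": 0.5,
}

for precision, bpp in bytes_per_param.items():
    gb = PARAMS * bpp / 1e9
    fits = "fits" if gb < 96 else "does NOT fit"
    print(f"{precision:>16}: ~{gb:.0f} GB of weights -> {fits} in 96 GB")
```

This prints roughly 140 GB for FP16, 70 GB for FP8, and 35 GB for INT4, which is why FP16 crashes on 96 GB while FP8 or INT4 leaves headroom for the KV cache at a 32K context.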
Jason · 4w ago
Is it the default template? Oh, an A40? Yes, you might need more VRAM. Select a different GPU or more GPUs. I'd recommend discussing it here instead.
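As a rough sketch of the combined advice (quantize and/or add GPUs), these are the kinds of vLLM engine settings involved. The AWQ checkpoint name below is an example and should be verified on Hugging Face before use; on RunPod's serverless vLLM template the equivalent options are set through the endpoint's environment variables rather than Python code.

```python
# Sketch: vLLM engine settings that keep a 70B model inside ~96 GB of VRAM.
# The quantized checkpoint name is an assumption for illustration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",  # pre-quantized INT4 weights (example repo)
    quantization="awq",           # tell vLLM the checkpoint is AWQ-quantized
    max_model_len=32768,          # 32K context keeps the KV cache within budget
    gpu_memory_utilization=0.95,  # let vLLM use most of the available VRAM
    tensor_parallel_size=2,       # split the model across 2 GPUs if you add one
)

print(llm.generate(["Hello World"], SamplingParams(max_tokens=64)))
```

Quantizing (FP8 or INT4) shrinks the weights so a single large GPU suffices; tensor parallelism instead spreads the weights across multiple GPUs, so either path, or both together, avoids the CUDA OOM described above.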