Runpod · 7mo ago
necreiP

Serverless vLLM workers crash

Whenever I create a serverless vLLM endpoint (it doesn't matter which model I use), the workers all end up crashing with the status "unhealthy". I checked the vLLM supported-models list and only use models that are supported. The last time I ran a serverless vLLM endpoint, I used meta-llama/Llama-3.1-70B with a valid Hugging Face token that has access to the model. The result of running the default "Hello World" prompt on this endpoint is shown in the attached images. A worker has the status "running", but when you open its stats they are all at 0%, and there are no logs. The worker then gets the status "unhealthy" and is moved to the Extra section. In this specific case, the last worker had the status "idle" and never picked up the request. I only let it sit for about 10 minutes, but it never picked up the request and started working.
[Two attached screenshots, no description provided]
4 Replies
Unknown User · 7mo ago
[Message not public]
necreiP (OP) · 7mo ago
No, when they crash, the logs disappear.
riverfog7 · 7mo ago
I mean they will CUDA OOM. A 70B model needs 140+ GB of VRAM in FP16, and you're giving vLLM only 96 GB. Lower the precision to FP8 or INT4; INT4 with a 32K context should work. DM me if it keeps crashing.
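For reference, a minimal sketch of the memory math and of passing those settings to vLLM. `LLM`, `quantization`, and `max_model_len` are standard vLLM engine arguments; the numbers are the rough weights-only estimate from the message above. Runpod's serverless vLLM worker exposes equivalent settings through its endpoint configuration, but the exact variable names aren't shown in this thread.

```python
from vllm import LLM

# Weights-only VRAM estimate: params (billions) * bytes per parameter ~= GB.
# Ignores KV cache and activations, so treat these as lower bounds.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

print(weight_vram_gb(70, 2.0))  # FP16: ~140 GB -> does not fit in 96 GB
print(weight_vram_gb(70, 1.0))  # FP8:  ~70 GB  -> fits
print(weight_vram_gb(70, 0.5))  # INT4: ~35 GB  -> fits with headroom for KV cache

# Hypothetical launch: FP8 weights plus a 32K context cap to bound KV-cache
# memory, so the engine can start instead of OOMing during model load.
llm = LLM(
    model="meta-llama/Llama-3.1-70B",
    quantization="fp8",   # for INT4, point `model` at a pre-quantized checkpoint (e.g. AWQ)
    max_model_len=32768,
)
```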
Unknown User · 7mo ago
[Message not public]
