Serverless vLLM workers crash
Whenever I create a serverless vLLM endpoint (regardless of which model I use), the workers all end up crashing with the status "unhealthy". I checked the vLLM supported-models page and only use models that are supported. The last time I ran a serverless vLLM, I used meta-llama/Llama-3.1-70B with a valid Hugging Face token that has access to the model.

The result of running the default "Hello World" prompt on this endpoint is shown in the attached images. A worker has the status "running", but when you open its stats they are all at 0%, and there are no logs. The worker then becomes "unhealthy" and is moved to the Extra section. In this specific case, the last worker stayed "idle" and never picked up the request. I only let it sit for about 10 minutes, but it never picked up the request and started working.
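For reference, this is roughly how the "Hello World" test goes out (a minimal sketch with placeholder endpoint ID and API key, assuming the standard serverless `runsync` route and a plain prompt input):

```python
# Minimal sketch of the test request. Assumptions: the standard serverless
# runsync route and a plain "prompt" input; the endpoint ID and API key
# below are placeholders.
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder
API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello World"}},
    timeout=120,
)
print(resp.status_code, resp.json())
```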
4 Replies
Can you check the logs?
No, when they crash, the logs disappear.
I mean, they will CUDA OOM.
A 70B model needs 140+ GB of VRAM in FP16, and you're giving vLLM only 96 GB.
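Rough math (weights only; KV cache, activations, and framework overhead come on top of this):

```python
# Back-of-the-envelope VRAM for the weights of a 70B model alone.
# Assumption: ignores KV cache, activations, and framework overhead.
PARAMS = 70e9  # Llama-3.1-70B parameter count

bytes_per_param = {"FP16/BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, nbytes in bytes_per_param.items():
    print(f"{precision:9s} ~{PARAMS * nbytes / 1e9:.0f} GB for weights")

# FP16/BF16 ~140 GB  -> does not fit in 96 GB
# FP8       ~70 GB
# INT4      ~35 GB
```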
Lower the precision to FP8 or INT4.
INT4 with a 32K context should work.
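Roughly the engine settings that implies, sketched with the plain vLLM API (the checkpoint name is illustrative; INT4 needs an AWQ/GPTQ pre-quantized checkpoint, and on the serverless template these map to the equivalent environment variables):

```python
# Sketch of quantized serving settings with the plain vLLM API.
# The checkpoint name is illustrative: INT4 requires a pre-quantized
# (AWQ/GPTQ) checkpoint rather than the FP16 weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/Llama-3.1-70B-Instruct-AWQ-INT4",  # illustrative name
    quantization="awq",           # or "gptq" / "fp8", matching the checkpoint
    max_model_len=32768,          # the 32K context mentioned above
    gpu_memory_utilization=0.95,  # leave a little headroom
)

out = llm.generate(["Hello World"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```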
DM me if it keeps crashing
Is it the default template?
Oh, an A40.
Yes, you might need more VRAM.
Select a different GPU, or more GPUs.
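A quick way to sanity-check how many GPUs a given precision needs (a rough sketch; the ~20% headroom for KV cache and overhead is an assumption, not a vLLM guarantee):

```python
# Rough sketch: GPUs needed per precision, using the weight sizes from the
# earlier estimate plus ~20% headroom for KV cache/overhead (an assumption).
import math

WEIGHTS_GB = {"FP16/BF16": 140, "FP8": 70, "INT4": 35}  # Llama-3.1-70B
HEADROOM = 1.2

def gpus_needed(precision: str, vram_per_gpu_gb: int) -> int:
    return math.ceil(WEIGHTS_GB[precision] * HEADROOM / vram_per_gpu_gb)

for p in WEIGHTS_GB:
    print(f"{p}: {gpus_needed(p, 48)}x 48 GB (A40), {gpus_needed(p, 80)}x 80 GB")
```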
I'd recommend discussing it here instead.