Runpod · 11 months ago
zethos

Need help in fixing long running deployments in serverless vLLM

Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8× 48 GB GPUs.

The total size of the model weights is around 245 GB.

I have tried two approaches. 1st: without any network volume. The first request takes a very long time to serve because the worker has to download the weights first. Then, once the worker goes idle and I send another request, it downloads the weights all over again and takes just as long.

2nd: I tried a 300 GB network volume, but the download usually gets stuck at about half of the model weights, and then the worker gets killed.
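For reference, one workaround I have been considering is pre-downloading the weights onto the network volume from a one-off pod before pointing the serverless vLLM endpoint at the local path. This is only a sketch: it assumes the volume is mounted at `/runpod-volume` (the default serverless mount path, as I understand it) and uses `huggingface_hub.snapshot_download`, which resumes partially-downloaded files, so a killed worker would not restart the download from zero. The `weights_dir` helper and `models/` subdirectory layout are my own convention, not anything Runpod- or vLLM-specific.

```python
MODEL_ID = "migtissera/Tess-3-Mistral-Large-2-123B"
VOLUME_MOUNT = "/runpod-volume"  # assumption: default serverless network-volume mount


def weights_dir(model_id: str, mount: str = VOLUME_MOUNT) -> str:
    """Return the on-volume directory where the weights should live.

    Flattens 'org/name' into 'org--name' so it is a single directory.
    """
    return f"{mount}/models/{model_id.replace('/', '--')}"


def prefetch(model_id: str = MODEL_ID) -> str:
    """Download (or resume downloading) the weights onto the volume.

    Run this from a pod attached to the same network volume;
    snapshot_download skips files that are already complete.
    """
    # Imported here so the path helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=model_id, local_dir=weights_dir(model_id))


if __name__ == "__main__":
    print(f"Weights available at: {prefetch()}")
```

The serverless vLLM worker would then be pointed at the resulting directory (e.g. via its model-path setting) instead of the Hugging Face repo ID, so cold starts only read from the volume rather than hitting the network.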

I am losing money fast because of this. Please help.
I have attached all the screenshots.
image.png
image.png
image.png