Need help fixing long-running deployments with serverless vLLM
Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8× 48 GB GPUs. The model weights total around 245 GB.
I have tried two approaches. 1st: without any network volume. The first request takes a very long time to serve because the worker has to download the weights first. Worse, once the worker goes idle, the next request triggers the full weight download all over again.
2nd: with a 300 GB network volume. Here the download usually stalls about halfway through the weights, and then the worker gets killed.
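For reference, the environment variables on my endpoint template look roughly like this (a sketch from memory, not an exact copy; `/runpod-volume` is where Runpod mounts network volumes, and `HF_HOME` is the standard Hugging Face cache-location variable, so weights downloaded once should persist across workers):

```shell
# Point the Hugging Face cache at the network volume so model weights
# survive worker restarts instead of being re-downloaded each cold start.
export HF_HOME=/runpod-volume/huggingface

# The model being served.
export MODEL_NAME=migtissera/Tess-3-Mistral-Large-2-123B

# Shard the ~245 GB of weights across all 8 GPUs.
export TENSOR_PARALLEL_SIZE=8
```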
I am losing money fast because of this. Please help. I have attached all the screenshots.