Runpod · 9mo ago
zethos

Need help fixing long-running deployments in serverless vLLM

Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8x 48 GB GPUs. The model weights total around 245 GB. I have tried two approaches. 1) Without a network volume: the first request takes a very long time to serve because the worker has to download the weights, and once the worker goes idle and I send another request, it downloads the weights all over again. 2) With a 300 GB network volume: the download usually gets stuck about halfway through the weights and then the worker gets killed. I am losing money fast because of this. Please help. I have attached all the screenshots.
10 Replies
zethos (OP) · 9mo ago
@nerdylive Could you please help? I tried loading the model onto the network volume using a pod and then attaching the volume to the serverless instance. Still, it's taking a long time to load.
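For reference, a minimal sketch of that pre-download step, run from inside a pod with the network volume attached (assuming the volume is mounted at the usual /workspace path; serverless workers see the same volume at /runpod-volume):

```python
# Run inside a RunPod pod with the network volume attached.
# Pods mount the volume at /workspace; serverless workers mount
# the same volume at /runpod-volume.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="migtissera/Tess-3-Mistral-Large-2-123B",
    local_dir="/workspace/models/Tess-3-Mistral-Large-2-123B",
)
```

The serverless endpoint then needs its model path pointed at /runpod-volume/models/Tess-3-Mistral-Large-2-123B rather than the Hugging Face repo ID; check the worker-vllm docs for the exact setting name.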
Unknown User · 9mo ago
(Message not public)
zethos (OP) · 9mo ago
Yes. I increased the execution timeout and it works for a while. But then when the worker goes idle, it has to load the model again, which means another 15-20 min wait.
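A cold start that long will also blow past most synchronous request timeouts, so it can help to submit the job asynchronously and poll. A minimal sketch with the runpod Python SDK (the API key and endpoint ID are placeholders, and the payload shape is assumed to match the vLLM worker):

```python
import time

import runpod

runpod.api_key = "YOUR_API_KEY"            # placeholder
endpoint = runpod.Endpoint("ENDPOINT_ID")  # placeholder

# Submit asynchronously so a long model load doesn't time out the client.
job = endpoint.run({"input": {"prompt": "Hello, world!"}})

# Poll until the worker has loaded the weights and finished the request.
while job.status() not in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
    time.sleep(10)

print(job.output())
```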
Unknown User · 9mo ago
(Message not public)
zethos (OP) · 9mo ago
Yes, this is what I am going to do. The model has around 51 files with a total size of around 240 GB. I am thinking of building the Docker image with the whole 245 GB of files inside, using Option 2 mentioned here: https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#option-2-build-docker-image-with-model-inside Do you think it will be too much? 245 GB of weights plus around 20 GB for Ubuntu and the CUDA drivers.
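For reference, Option 2 in that README bakes the weights in at build time via build args, along the lines of the sketch below (the image tag is a placeholder, and the arg names should be double-checked against the current README):

```bash
docker build -t yourname/tess-vllm-worker:v1 \
  --build-arg MODEL_NAME="migtissera/Tess-3-Mistral-Large-2-123B" \
  --build-arg BASE_PATH="/models" \
  .
```

Note the resulting image would be on the order of 265 GB, which is where the registry size limits mentioned below come in.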
Unknown User · 9mo ago
(Message not public)
zethos (OP) · 9mo ago
Yeah, previously I tried many Whisper and BERT-based models in a single Docker image and it worked, maybe because the image size was small, hardly 20 GB max. Thanks. Can you tell me the exact template? Or were you referring to the vLLM template?
Unknown User · 9mo ago
(Message not public)
zethos (OP) · 9mo ago
Yeah, and Docker Hub has a size limit of 100 GB, so I cannot put the model files inside the Docker image and upload it to Docker Hub.
Unknown User · 9mo ago
(Message not public)
