RunPod•4mo ago
Superintendent

Deepseek coder on serverless

Hello, new serverless user here. I'd be using the vLLM worker, so whenever it gets spun up from a cold start, does it have to download the model every time? I'd be running it in fp16, which means it would be about 14 GB of data to download.
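As a sanity check on that figure: fp16 stores two bytes per parameter, so a rough size estimate (assuming the ~6.7B-parameter deepseek-coder-6.7b variant) works out like this:

```python
def fp16_model_size_gb(num_params: float) -> float:
    """Rough download size for fp16 weights: 2 bytes per parameter,
    reported in decimal GB as model cards usually do."""
    return num_params * 2 / 1e9

# Assumption: deepseek-coder-6.7b has ~6.7 billion parameters.
print(f"{fp16_model_size_gb(6.7e9):.1f} GB")  # prints 13.4 GB
```

13.4 GB matches the "about 14 GB" estimate above, before tokenizer files and any download overhead.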
6 Replies
justin
justin•4mo ago
If your script says so, then yes. You can either bake the model into your Docker image or use network storage to persist it between runs. Network storage has some impact on speed since it's essentially an external drive, but it can still be decent. To make it easy on yourself: if you have a Dockerfile, write a simple bash script that triggers a tiny Python script to run a "hello world" vLLM job, and it will "automatically" download the models into the Docker image during build time 🙂 or, again, use a network volume.
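A minimal sketch of that bake-in trick. Everything here is illustrative: the script name, the model id, and the cache path are assumptions, and instead of running a full vLLM job it just pre-fetches the weights with `huggingface_hub.snapshot_download` (which avoids needing a GPU at build time):

```python
# download_model.py -- hypothetical helper, run once during `docker build`
# (e.g. `RUN python download_model.py`) so the weights are baked into the
# image and cold starts skip the ~14 GB download.
import os

def build_download_kwargs(model_id: str, cache_dir: str) -> dict:
    """Collect the snapshot_download arguments in one place."""
    return {
        "repo_id": model_id,
        "cache_dir": cache_dir,
        # Skip formats vLLM doesn't load, to keep the image smaller
        # (assumption: the repo ships safetensors or .bin weights).
        "ignore_patterns": ["*.gguf", "*.msgpack", "*.h5"],
    }

if __name__ == "__main__":
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Illustrative model id; point cache_dir wherever your worker expects models.
    kwargs = build_download_kwargs(
        "deepseek-ai/deepseek-coder-6.7b-instruct",
        os.environ.get("HF_HOME", "/root/.cache/huggingface"),
    )
    snapshot_download(**kwargs)
```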
Superintendent
Superintendent•4mo ago
oh really neat thanks
justin
justin•4mo ago
**I could be wrong on this for vLLM actually, lol. I wonder if vLLM will crash because there's no GPU at build time; I remember it has done that. There might be other ways to do it. You could probably just download the model yourself to wherever vLLM expects it, but I don't know how vLLM downloads/prepares models, whether it's an HF download, curl, or whatever.
JJonahJ
JJonahJ•4mo ago
I’ve been following the instructions for ‘option 2’ on this page: https://github.com/runpod-workers/worker-vllm
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
JJonahJ
JJonahJ•4mo ago
It's like: open a folder, git clone the repo (e.g. with Git Bash), open the command line, and put in that one line:
sudo docker build -t username/image:tag --build-arg MODEL_NAME="TheBloke/deepseek-coder-33B-instruct-AWQ" --build-arg BASE_PATH="/models" .
Windows doesn't need sudo. The model name is copied using the Hugging Face copy button. username/image:tag needs to be your username and chosen image name/tag (I'm sure you know this already), all in lower-case, and RunPod requires a tag (I've mostly been using 0.1 so far). It's been working. Edit: I put in the name for the AWQ-quantized Deepseek Coder; I haven't tried that one personally. Note that GGUF quants won't work with vLLM, AFAIK.
Alpay Ariyak
Alpay Ariyak•4mo ago
If you attach a network volume to the endpoint, the model will only be downloaded once, as long as you're using our vLLM worker.