Setting up MODEL_BASE_PATH when building the worker-vllm image
I'm a little confused about this parameter when setting up worker-vllm. It seems to default to /runpod-volume, which to me implies the model lives on a network volume instead of getting baked into the image, but I'm not sure. A few questions:
1) If set to "/runpod-volume", does this mean that the model will be downloaded to that path automatically, and therefore won't be a part of the image (resulting in a much smaller image)?
2) Will I therefore need to set up a network volume when creating the endpoint?
3) Does the model get downloaded every time workers are created from a cold start? If not, then will I need to "run" a worker for a given amount of time at first to download the model?
Solution
Hey, if you download the model at build time, the build creates a local folder inside the image at whatever the model base path is and stores the model there.
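For reference, a rough sketch of what that build-time bake-in could look like. The build-arg names here are assumptions based on the worker-vllm README, so double-check them against the docs for your version; the Mixtral repo id is just an example:

```sh
# Minimal sketch: bake the model into the image at build time.
# MODEL_NAME / MODEL_BASE_PATH are assumed build-arg names from the
# worker-vllm README -- verify against your version before relying on them.
sudo docker build -t xyz:123 \
  --build-arg MODEL_NAME="mistralai/Mixtral-8x7B-Instruct-v0.1" \
  --build-arg MODEL_BASE_PATH="/models" \
  .
```

With this approach the weights ship inside the image, so no network volume is needed, at the cost of a much larger image.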
If you want to download onto the network volume instead, you can do the first option, or build the image without the model-related arguments and specify the environment variables mentioned in the docs.
For example:
1. Build the image: sudo docker build -t xyz:123 . (add the CUDA version arg if you need it)
2. Create an endpoint with 2x 80GB GPUs (you might have to request it from our team) and attach a network volume
3. Specify the model name as Mixtral in the environment variables (see the sketch after this list)
When you send the first request to the endpoint, the worker will download the model to the network volume
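As a sketch, the endpoint environment variables for step 3 might look something like this. The variable names are assumptions taken from the worker-vllm docs, and the Mixtral repo id is an example:

```sh
# Assumed env variable names -- check the worker-vllm README for your version.
MODEL_NAME=mistralai/Mixtral-8x7B-Instruct-v0.1
MODEL_BASE_PATH=/runpod-volume   # the default; resolves to the attached network volume
```

Because the network volume persists across workers, later cold starts should find the weights already downloaded rather than fetching them again.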
