Hello, new serverless user here. I'll be using the vLLM worker. Whenever it gets spun up from a cold start, does it have to download the model every time? I'd be running it in FP16, which means about 14 GB of data to download.