For the Runpod Serverless vLLM worker: are Docker images cached on the GPU hosts, or is the ~12 GB vLLM image pulled from Docker Hub every time a new worker spins up? In other words, do serverless workers reuse cached images across runs, or does each cold start trigger a full image pull?