Clarification on Serverless vLLM image caching
For the Runpod Serverless vLLM worker: are Docker images cached on the GPU hosts, or is the ~12 GB vLLM image pulled from Docker Hub every time a new worker spins up?
In other words, do serverless workers reuse cached images across runs, or does each cold start trigger a full image pull?
