For the Runpod Serverless vLLM worker: are Docker images cached on the GPU hosts, or is the ~12 GB vLLM image pulled from Docker Hub every time a new worker spins up? In other words, do serverless workers reuse cached images across runs, or does each cold start trigger a full image pull?