Worker Throttling Issues After Attaching Network Volume to Serverless Endpoint
I'm running a queue-based serverless endpoint for LLM workloads. To cache model weights across workers, I attached a network volume to the endpoint. Since then, worker availability has dropped sharply: most workers show as throttled for the majority of the day.

Is there a recommended workaround, or an alternative way to cache model weights that avoids this throttling? Any guidance on best practices for shared storage in serverless setups would be appreciated. For context, a minimal sketch of the handler pattern I mean is below (the model name is a placeholder, and I'm assuming the network volume mounts at `/runpod-volume` as described in the Runpod docs).
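```python
# Sketch of the setup in question: cache model weights on the attached network
# volume so every worker reuses one copy instead of re-downloading them.
# Assumptions: the volume is mounted at /runpod-volume (Runpod serverless
# convention), and MODEL_ID is a placeholder repo name.
import os

import runpod
from huggingface_hub import snapshot_download

VOLUME_CACHE = "/runpod-volume/models"
MODEL_ID = "example-org/example-llm-7b"  # placeholder model


def get_model_path() -> str:
    """Download weights to the shared volume only if they aren't already there."""
    target = os.path.join(VOLUME_CACHE, MODEL_ID.replace("/", "--"))
    if not os.path.isdir(target):
        snapshot_download(repo_id=MODEL_ID, local_dir=target)
    return target


def handler(job):
    model_path = get_model_path()
    prompt = job["input"]["prompt"]
    # ... load the model from model_path and run inference here ...
    return {"model_path": model_path, "prompt": prompt}


runpod.serverless.start({"handler": handler})
```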