I’m using RunPod Serverless, and even though I set maxWorkers = 5 on the endpoint, almost every request ends up throttled. A pod does start, but most calls go straight into a throttled state instead of being assigned a worker. Is this normal for this GPU type, or am I missing something in my setup? Any insights would be appreciated.
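For reference, here is roughly how I’ve been checking the endpoint state, by polling the serverless `/health` route. This is a minimal sketch using only the standard library; the endpoint ID and API key are placeholders, and the exact response field names (`idle`, `running`, `throttled`, `inQueue`, `inProgress`) are my assumptions about the payload shape, so they may differ from what your account returns:

```python
import json
import urllib.request

RUNPOD_API_KEY = "YOUR_API_KEY"   # placeholder, not a real key
ENDPOINT_ID = "your-endpoint-id"  # placeholder endpoint ID

def summarize_health(payload: dict) -> str:
    """Condense a /health response dict into a one-line worker/job summary.

    Field names here are assumptions about the response shape; missing
    keys default to 0 so partial payloads still summarize cleanly.
    """
    workers = payload.get("workers", {})
    jobs = payload.get("jobs", {})
    return (
        f"workers: idle={workers.get('idle', 0)} "
        f"running={workers.get('running', 0)} "
        f"throttled={workers.get('throttled', 0)} | "
        f"jobs: inQueue={jobs.get('inQueue', 0)} "
        f"inProgress={jobs.get('inProgress', 0)}"
    )

def check_endpoint_health() -> str:
    """Fetch the endpoint's health payload and return a summary line."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return summarize_health(json.load(resp))

# Sample payload illustrating what I typically see: workers throttled,
# jobs queueing, almost nothing actually running.
sample = {
    "workers": {"idle": 0, "running": 1, "throttled": 4},
    "jobs": {"inQueue": 3, "inProgress": 1},
}
print(summarize_health(sample))
```

With a payload like the sample above, the summary makes the symptom obvious: most of the worker budget sits in `throttled` while jobs pile up in the queue.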