Serverless endpoint keeps getting throttled

I’m using RunPod Serverless, and even though I set maxWorkers = 5 for an endpoint, almost all requests end up getting throttled.
The pod starts, but most calls go straight into a throttled state instead of getting a worker assigned.
Is this normal for this GPU type, or am I missing something in my setup?
Any insights would be appreciated.