Regular "throttled" status

Hi, I've configured a serverless endpoint with the max_workers setting explicitly set to 1. I've observed that the single worker for this endpoint frequently enters and stays in the "Throttled" state. This seems to be causing significant delays in request processing, making them take much longer than the actual inference time. Notably, this endpoint performed perfectly throughout the previous week. I only started noticing this frequent 'Throttled' status and the associated delays this week, starting around April 28th. Could you provide some insights into potential factors that might be causing this frequent throttling?
3 Replies
DIRECTcut ▲ · 3w ago
Noticed this too, but only for endpoints with max_workers set to 1. It also seems to depend on the datacenter. I just bumped workers to 2-3 and it goes away. As a small hack to speed up redeploys while keeping multiple workers: scale max_workers down to 0 and then back up to your value, which immediately starts the redeploy process.
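The scale-down-then-up hack above could be scripted. This is only a sketch: it assumes RunPod's GraphQL API with a `saveEndpoint` mutation taking a `workersMax` field, and the endpoint ID and field names are illustrative, not verified against current docs.

```python
# Hypothetical sketch of the redeploy hack: set max workers to 0, then back up.
# Assumes RunPod's GraphQL API and a `saveEndpoint` mutation (unverified).
import json

API_URL = "https://api.runpod.io/graphql"  # assumed endpoint


def build_scale_mutation(endpoint_id: str, workers_max: int) -> str:
    """Build a GraphQL request body that sets workersMax on an endpoint."""
    query = (
        "mutation { saveEndpoint(input: { "
        f'id: "{endpoint_id}", workersMax: {workers_max} '
        "}) { id workersMax } }"
    )
    return json.dumps({"query": query})


def redeploy_payloads(endpoint_id: str, target_workers: int) -> list[str]:
    """Two requests in order: scale to 0, then back to the desired value."""
    return [
        build_scale_mutation(endpoint_id, 0),
        build_scale_mutation(endpoint_id, target_workers),
    ]


if __name__ == "__main__":
    # "my-endpoint-id" is a placeholder; send each payload in order, e.g. with
    # requests.post(API_URL, data=payload,
    #               headers={"Content-Type": "application/json",
    #                        "Authorization": "Bearer <api key>"})
    for payload in redeploy_payloads("my-endpoint-id", 3):
        print(payload)
```

The two mutations are kept as separate requests so the scale-to-0 is applied (triggering the redeploy) before workers are scaled back up.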
Jason · 3w ago
More demand on the datacenter, probably. A shortage of GPUs.
Solution
yhlong00000 · 3w ago
When you set max workers to 1, your worker is only deployed to a single machine. When you're not using it, we give that machine to other people, and when the machine is fully used, your worker will be throttled. I'd highly suggest avoiding a max workers setting of 1.
