Regular "throttled" status
Hi,
I've configured a serverless endpoint with the max_workers setting explicitly set to 1.
I've observed that the single worker for this endpoint frequently enters and stays in the "Throttled" state. This seems to be causing significant delays in request processing, making them take much longer than the actual inference time.
Notably, this endpoint performed perfectly throughout the previous week. I only started noticing this frequent 'Throttled' status and the associated delays this week, starting around April 28th.
Could you provide some insights into potential factors that might be causing this frequent throttling?
Solution
When you set max_workers to 1, your worker is deployed to a single machine. While you're not using it, we give that machine to other people, and when the machine is fully utilized, your worker is throttled. We highly suggest avoiding a max_workers setting of 1.
3 Replies
Noticed this too, but only for endpoints with max_workers set to 1. It also seems to depend on the datacenter. I just bumped max_workers to 2-3 and the issue goes away.
As a small hack to speed up redeploys while keeping multiple workers: you can scale max_workers down to 0 and then back up to your target value, which immediately starts the redeploy process.
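The scale-down/scale-up trick above can be scripted against the platform's API. The sketch below is a minimal illustration, assuming a GraphQL-style endpoint with a saveEndpoint mutation and a workersMax field; the URL, mutation name, and field names are assumptions for illustration, so verify them against the current API documentation before using this.

```python
import json
import urllib.request

# Assumed API endpoint; verify against the current API docs.
API_URL = "https://api.runpod.io/graphql"


def build_set_max_workers_payload(endpoint_id: str, workers_max: int) -> dict:
    """Build a GraphQL request payload that sets max workers for an endpoint.

    NOTE: the `saveEndpoint` mutation and `workersMax` field names are
    assumptions for illustration, not a verified API spec.
    """
    query = (
        "mutation SetWorkers($input: EndpointInput!) {"
        "  saveEndpoint(input: $input) { id workersMax }"
        "}"
    )
    return {
        "query": query,
        "variables": {"input": {"id": endpoint_id, "workersMax": workers_max}},
    }


def force_redeploy(endpoint_id: str, workers_max: int, api_key: str) -> None:
    """Scale max workers to 0, then back to the target value, to trigger an
    immediate redeploy (the hack described above)."""
    for n in (0, workers_max):
        payload = build_set_max_workers_payload(endpoint_id, n)
        req = urllib.request.Request(
            API_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        # urlopen raises HTTPError on non-2xx responses.
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()


# Usage (hypothetical endpoint id; api key from your account settings):
# force_redeploy("my-endpoint-id", 3, api_key="...")
```

This only changes the max-worker count twice in sequence; it doesn't wait for the redeploy to finish, so poll the endpoint's status afterwards if you need confirmation.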
More demand on the datacenter probably
Shortage of GPUs