Frequent "Throttled" status
Hi,
I've configured a serverless endpoint with the max_workers setting explicitly set to 1. I've observed that the single worker for this endpoint frequently enters and stays in the "Throttled" state. This seems to be causing significant delays in request processing, making requests take much longer than the actual inference time.
Notably, this endpoint performed perfectly throughout the previous week. I only started noticing this frequent "Throttled" status and the associated delays this week, starting around April 28th.
Could you provide some insights into potential factors that might be causing this frequent throttling?

Solution
When you set max_workers to 1, your worker is deployed to only a single machine. While you are not using it, that machine can be given to other users, and when the machine is fully used, your worker will be throttled. We strongly suggest not setting max_workers to 1.
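To illustrate why more than one worker helps, here is a deliberately simplified model. It is not the platform's actual scheduler, which the answer above does not describe. Assume each worker sits on its own machine, and each machine is independently "fully used" by other users with some probability `p_busy`. The endpoint is throttled only when every one of its workers' machines is busy, so under these assumptions the throttling probability drops from `p_busy` with one worker to `p_busy ** n` with `n` workers. The helpers `throttle_probability` and `simulate` are hypothetical names used only for this sketch.

```python
import random


def throttle_probability(max_workers: int, p_busy: float) -> float:
    """Analytic probability that all machines hosting the endpoint's workers are busy.

    Hypothetical model: one worker per machine, and each machine is fully used
    by other users independently with probability p_busy.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    if not 0.0 <= p_busy <= 1.0:
        raise ValueError("p_busy must be in [0, 1]")
    return p_busy ** max_workers


def simulate(max_workers: int, p_busy: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same assumptions."""
    rng = random.Random(seed)
    throttled = 0
    for _ in range(trials):
        # The endpoint is throttled only if every worker's machine is busy.
        if all(rng.random() < p_busy for _ in range(max_workers)):
            throttled += 1
    return throttled / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(
            f"max_workers={n}: analytic={throttle_probability(n, 0.3):.4f}, "
            f"simulated={simulate(n, 0.3):.4f}"
        )
```

In real deployments machine availability is unlikely to be fully independent, so treat the numbers only as a rough intuition for why a single-worker endpoint is the most exposed to throttling.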