"The Request Count scaling strategy sets the number of workers from the total number of requests in the queue and in progress. It adds workers automatically as the request count rises, so tasks are handled efficiently. Total Workers Formula: Math.ceil((requestsInQueue + requestsInProgress) / 4)
Use this when you have many requests and workers won't have a chance to idle (e.g., with vLLM). It allows your app to scale down when traffic drops. With queue delay, once a worker scales up it is always busy, which makes scaling down harder."
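
As a rough illustration of the Total Workers Formula, the sketch below evaluates it for a few request counts. The function name `totalWorkers` and the `requestsPerWorker` parameter are hypothetical names chosen for this example, not part of any platform API. The divisor of 4 is taken directly from the formula.

```typescript
// Hypothetical helper: the names here are illustrative, not a platform API.
function totalWorkers(
  requestsInQueue: number,
  requestsInProgress: number,
  requestsPerWorker: number = 4, // divisor taken from the formula in the text
): number {
  // Round up so that any leftover requests still get a worker.
  return Math.ceil((requestsInQueue + requestsInProgress) / requestsPerWorker);
}

console.log(totalWorkers(0, 0));  // 0: no requests, so no workers
console.log(totalWorkers(1, 0));  // 1: a single request still needs one worker
console.log(totalWorkers(6, 3));  // 3: 9 / 4 = 2.25, rounded up
console.log(totalWorkers(10, 6)); // 4: 16 / 4 = 4 exactly
```

Because the result depends only on the current request total, it falls back toward zero as soon as traffic drops, which is the scale-down behavior described above.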