Load balancing + scaling

Hello,

Does anyone know how requests are balanced across workers?
This matters for autoscaling, especially when scaling is driven by queue delay and idle timeout.

I expected that a job would go to an already-active worker first, and that a newly scaled-up worker would take over only if no active worker is available.
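
To make that expectation concrete, here is a rough sketch of the behaviour I have in mind. This is only an illustration of my assumption, not how the platform actually works: `Worker`, `pick_worker`, and `dispatch` are hypothetical names I made up, and the real scheduler may also take queue delay and idle timeout into account.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    active: bool  # already running, as opposed to needing a scale-up
    busy: bool    # currently processing a job


def pick_worker(workers: List[Worker]) -> Optional[Worker]:
    """Return the first active worker that is not busy, or None if there is none."""
    for w in workers:
        if w.active and not w.busy:
            return w
    return None


def dispatch(workers: List[Worker]) -> str:
    """Assign one job and return the name of the worker that received it."""
    worker = pick_worker(workers)
    if worker is None:
        # No active worker is free: only now does a scaled-up worker take over.
        worker = Worker(name=f"scaled-{len(workers)}", active=True, busy=False)
        workers.append(worker)
    worker.busy = True
    return worker.name
```

With one busy and one free active worker, the first job should go to the free one, and only the second job should trigger a scale-up.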