Load balancing + scaling
Hello,
Does anyone know how requests are balanced across workers?
This matters for autoscaling, especially when scaling is based on queue delay and idle timeout.
I expected a job to be taken by an already-active worker first, with a scaled-up worker taking over only if no active worker is available.
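
To make the expectation concrete, here is a minimal sketch of the dispatch order I had in mind. It is only an illustration of my assumption, not the platform's actual scheduler; `Worker` and `pick_worker` are hypothetical names I made up for this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    active: bool        # True: already running; False: would have to be scaled up
    busy: bool = False  # True while the worker is processing a job


def pick_worker(workers: List[Worker]) -> Optional[Worker]:
    """Pick a free active worker if one exists, otherwise a free scaled-up worker.

    Returns None when every worker is busy, i.e. the job stays in the queue.
    """
    free = [w for w in workers if not w.busy]
    # First choice: a worker that is already active.
    for w in free:
        if w.active:
            return w
    # Fallback: any remaining free worker, which here means a scaled-up one.
    return free[0] if free else None
```

With one active and one scaled-up worker, I would expect the active one to keep receiving jobs until it is busy, and the scaled-up one to be used only after that. What I actually want to know is whether the real balancing follows this order or something else, such as round-robin.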