Runpod•5mo ago

Load balancing + scaling

Hello, does anyone know how requests are balanced across workers? This matters for autoscaling, especially when scaling is based on queue delay and idle timeout. I expected that a job would be taken by an active worker first, and that a scaled-up worker would take over only if no active workers are available.

10 Replies

Unknown User•5mo ago

Message Not Public

Eugene_SwanleyOP•5mo ago

If I have two warm workers with num active = 1, how are jobs balanced between them?

Unknown User•5mo ago

Message Not Public

Eugene_SwanleyOP•5mo ago

It’s about autoscaling. If I have even a single request that triggers scaling, then a new worker ends up running almost all the time — unless I set a very short idle timeout. But setting it too low could lead to long cold starts. I was hoping that this scaled worker wouldn’t receive any requests as long as there are “active” (non-scaled) workers available.

Unknown User•5mo ago

Message Not Public

Eugene_SwanleyOP•5mo ago

Scaling up works perfectly, but scaling down doesn’t — at least in my case. Imagine there’s a spike in workload, and a couple of additional workers are spun up. After the spike ends, those workers can’t scale down because they’re still getting requests — so they never go idle. That’s exactly what’s happening on my side. As a result, I end up paying extra for workers I no longer need to handle the current workload.
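The scale-down behaviour described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not Runpod's actual balancer: the function `simulate`, the two policy names, and all numbers (three warm workers, one request every 2 seconds, a 10-second idle timeout) are hypothetical. It compares a balancer that spreads requests round-robin with one that always prefers the always-on worker.

```python
def simulate(policy, horizon=60, interval=2, n_workers=3, idle_timeout=10):
    """Toy model (hypothetical, not Runpod's implementation).

    Worker 0 is the always-on "active" worker; workers 1..n-1 were added
    during a spike and are warm at t=0. One request arrives every
    `interval` seconds and is served instantly. Returns the time at which
    each worker was scaled down (None if it never was).
    """
    last_job = [0] * n_workers        # last time each worker received a job
    alive = [True] * n_workers
    stopped_at = [None] * n_workers
    rr = 0                            # round-robin counter

    for t in range(1, horizon + 1):
        # Scale down any scaled worker that has been idle long enough.
        for w in range(1, n_workers):
            if alive[w] and t - last_job[w] >= idle_timeout:
                alive[w] = False
                stopped_at[w] = t

        if t % interval == 0:         # a request arrives
            live = [w for w in range(n_workers) if alive[w]]
            if policy == "round_robin":
                w = live[rr % len(live)]
                rr += 1
            else:                     # "prefer_active": lowest index first
                w = live[0]
            last_job[w] = t

    return stopped_at


print(simulate("round_robin"))    # scaled workers keep receiving jobs
print(simulate("prefer_active"))  # scaled workers reach the idle timeout
```

In this model, round-robin gives every worker a job every 6 seconds, which is below the 10-second idle timeout, so the extra workers never shut down even though a single worker could carry the load.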

Unknown User•5mo ago

Message Not Public

Poddy•5mo ago

@Eugene_Swanley

Escalated To Zendesk

The thread has been escalated to Zendesk!

Ticket ID: #18610

Unknown User•5mo ago

Message Not Public

Eugene_SwanleyOP•5mo ago

A worker may not be idle and still be able to handle additional requests (I use concurrent workers), and I'm not sure the balancer takes that into account.
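As an illustration of the behaviour being asked for, here is a minimal sketch of a concurrency-aware dispatch rule. Everything in it is an assumption made for the example: the `Worker` fields, the `pick_worker` function, and the rule itself are hypothetical and do not describe how Runpod routes requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    in_flight: int          # requests currently being processed
    max_concurrency: int    # how many requests the worker can run at once
    scaled: bool            # True if the worker was added by autoscaling

    def has_capacity(self) -> bool:
        return self.in_flight < self.max_concurrency


def pick_worker(workers: List[Worker]) -> Optional[Worker]:
    """Hypothetical rule: prefer always-on workers with spare capacity,
    then scaled workers; among equals, pick the least loaded.
    Returns None when every worker is full (the job would stay queued)."""
    candidates = [w for w in workers if w.has_capacity()]
    if not candidates:
        return None
    # False sorts before True, so non-scaled workers come first.
    return min(candidates, key=lambda w: (w.scaled, w.in_flight))


pool = [
    Worker("active-0", in_flight=1, max_concurrency=4, scaled=False),
    Worker("scaled-0", in_flight=0, max_concurrency=4, scaled=True),
]
print(pick_worker(pool).name)
```

Under this rule a busy but not full active worker still wins over an empty scaled worker, which is what would let the scaled worker reach its idle timeout.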
