morrow · 5mo ago

New load balancer serverless endpoint type questions

Hey team!

In the past, I've tried to use Runpod's queue-based serverless for my voice AI project, but the added job queue latency made it impossible. Voice AI requires sub-200 ms inference latency, and the queueing overhead was both large and unpredictable. That's fine for long-running jobs, but not for high-frequency / low-latency workloads.

This new load balancer serverless endpoint type looks amazing and seems to solve a real feature gap among GPU providers.

However, I'm missing some information:
  • Scaling algorithm: how does the autoscaler decide it's time to boot up a new pod? In my case I'd like it to scale on either the number of sessions per worker or the average time to first token.
  • How does the load balancer actually balance? Is there any way to implement sticky sessions, for instance? Especially in the vLLM example, it's better if the same conversation stays on the same worker 🙏
None of this appears to be documented, and I think these are pretty important parameters for a load balancer (a rough sketch of the behavior I'm after is below).
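To make that concrete, here's a minimal sketch of the routing and scaling behavior I have in mind. Everything in it is hypothetical: the worker names, the `conversation_id`, and the session ceiling are my own placeholders, not anything from Runpod's API.

```python
import hashlib

# Hypothetical worker pool; in practice this would be the endpoint's live workers.
WORKERS = ["worker-a", "worker-b", "worker-c"]

def pick_worker(conversation_id: str, workers: list[str]) -> str:
    """Sticky routing: the same conversation_id always maps to the same worker
    (while the worker set is unchanged), so a vLLM worker keeps that
    conversation's KV cache warm."""
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    return workers[int(digest, 16) % len(workers)]

def should_scale_up(sessions_per_worker: dict[str, int], max_sessions: int = 4) -> bool:
    """Session-count-based autoscaling trigger: boot a new pod once every
    worker is at or above the session ceiling."""
    return all(count >= max_sessions for count in sessions_per_worker.values())

if __name__ == "__main__":
    print(pick_worker("conv-1234", WORKERS))  # always the same worker for conv-1234
    print(should_scale_up({"worker-a": 4, "worker-b": 5, "worker-c": 4}))  # True -> scale up
```

The gist: route by hashing a conversation ID so the same worker handles the whole conversation, and scale up once every worker hits a session ceiling (or, alternatively, when average time to first token degrades).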

Waiting for some guidance on this, as it's the only thing preventing us from migrating our infra to it 🙂