How to configure auto scaling for load balancing endpoints?
From the documentation: "The method used to scale up workers on the created Serverless endpoint. If QUEUE_DELAY, workers are scaled based on a periodic check to see if any requests have been in queue for too long. If REQUEST_COUNT, the desired number of workers is periodically calculated based on the number of requests in the endpoint's queue. Use QUEUE_DELAY if you need to ensure requests take no longer than a maximum latency, and use REQUEST_COUNT if you need to scale based on the number of requests."
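To check my reading of that description, here is a rough sketch of how I understand the two modes. The function names, the `max_delay_s` threshold, and the `requests_per_worker` divisor are my own assumptions for illustration, not the platform's actual API or formula.

```python
import math

def queue_delay_wants_scale_up(wait_times_s, max_delay_s):
    """QUEUE_DELAY as I read it: scale up when any queued request
    has waited longer than the allowed delay (assumed threshold)."""
    return any(wait > max_delay_s for wait in wait_times_s)

def request_count_desired_workers(queued_requests, requests_per_worker):
    """REQUEST_COUNT as I read it: derive the desired worker count
    from the queue length (ceiling division is my assumption)."""
    if requests_per_worker <= 0:
        raise ValueError("requests_per_worker must be positive")
    return math.ceil(queued_requests / requests_per_worker)
```

Both functions take queue state as input, which is why I don't see how either mode would apply to an endpoint that has no queue.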
From what I understand, load balancing endpoints don't have a queue. If that's right, how do I configure auto scaling to work with them?