Runpod · 2y ago
pxmwxd

Serverless doesn't scale

Endpoint id: cilhdgrs7rbzya
I have some requests that require workers with 4 RTX 4090s each. The endpoint's “max worker” setting is 150, and “Request Count” under Scale type is 1.

When I sent 78 requests concurrently, only ~20% of them started within 10s. At P80, requests had to wait ~600s before starting.
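
For anyone who wants to reproduce these numbers, here is a minimal sketch of how the "started within 10s" fraction and the P80 wait can be computed from per-request start delays. The helper names and the sample delays are hypothetical and only for illustration; they are not the actual measurements from this endpoint, and the percentile uses the simple nearest-rank definition.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def fraction_within(values, threshold_s):
    """Fraction of requests whose start delay is at most threshold_s seconds."""
    return sum(1 for v in values if v <= threshold_s) / len(values)

# Hypothetical start delays in seconds (time from submit to worker pickup).
# These are illustrative values, NOT data from the endpoint above.
delays = [5, 8, 120, 300, 450, 580, 600, 610, 640, 700]

p80 = percentile(delays, 80)
fast = fraction_within(delays, 10)
print(f"P80 wait: {p80}s, started within 10s: {fast:.0%}")
```

In practice the delays would come from recording the submit time and the time each request actually started executing.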

Is this because there are not enough GPUs? When the stock status shows “availability: high”, how many workers can I expect to scale up in the meantime?