Delay Time spike via public API; same worker next job is ~2s
Hey folks! I’m seeing intermittent high Delay Time on a serverless endpoint and would love a sanity check.
Setup: A40 (other GPU types also enabled), concurrency = 1, Auto Scaling = Request Count (4 requests/worker), Max Workers = 9, Min Workers sometimes > 0.
Symptom: via the public API, Delay Time intermittently jumps to 1–2 min, yet the same worker handles the next request with only ~2 s delay. Execution Time also drops from ~1 m 8 s (first request) to ~27 s (the next one). A rough client-side timing sketch is below.
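For reference, here's roughly how I'm pulling these numbers. This is a minimal sketch, assuming a RunPod-style v2 serverless API (POST /run, GET /status/{id}) whose completed-status payload reports delayTime and executionTime in milliseconds; ENDPOINT_ID, RUNPOD_API_KEY, and the {"prompt": ...} payload are placeholders for my actual handler input.

```python
import os
import time
import requests

# Placeholders (assumptions): point these at your own endpoint and key.
ENDPOINT_ID = os.environ["ENDPOINT_ID"]
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_and_time(payload):
    """Submit one job, poll until it finishes, and compare wall-clock
    queue wait against the delayTime reported by the API."""
    submitted = time.monotonic()
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]

    wall_clock_wait = None
    while True:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        state = status.get("status")
        if state == "IN_PROGRESS" and wall_clock_wait is None:
            # Rough queue wait as seen from the client (1 s polling granularity).
            wall_clock_wait = time.monotonic() - submitted
        if state in ("COMPLETED", "FAILED"):
            return {
                "wall_clock_wait_s": wall_clock_wait,  # None if it finished between polls
                "reported_delay_s": status.get("delayTime", 0) / 1000,     # ms -> s
                "reported_exec_s": status.get("executionTime", 0) / 1000,  # ms -> s
            }
        time.sleep(1)

if __name__ == "__main__":
    # Two back-to-back requests: the first often shows the 1-2 min delay,
    # the second (same, now-warm worker) ~2 s.
    for i in range(2):
        print(f"request {i + 1}: {run_and_time({'prompt': 'warm-up check'})}")
```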
Logs during the slow runs look like a cold start.
Questions:
1) Does autoscaling count “starting/not-READY” workers as available, making requests wait and inflating Delay Time?
2) Does Delay Time include warm-up time until the worker becomes READY?
3) Any recent changes that could increase cold-start time on A40 pools?
Thanks!