Delay Time spike via public API; same worker next job is ~2s
Hey folks! I'm seeing intermittent high Delay Time on a serverless endpoint and would love a sanity check.

Setup: A40 (other GPU types enabled), concurrency=1, Auto Scaling = Request Count (4 requests/worker), up to 9 workers, Min Workers sometimes >0.

Symptom: Via the public API, Delay Time jumps to 1–2 min, yet the same worker then handles the next request with only ~2s delay. Execution Time drops from ~1m8s (first request) to ~27s (next). Logs during the slow runs look like a cold start.

Questions:
1) Does autoscaling count "starting/not-READY" workers as available, making requests wait on them and inflating Delay Time?
2) Does Delay Time include warm-up time until the worker becomes READY?
3) Any recent changes that could increase cold-start time on A40 pools?

Thanks!
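In case it helps anyone reproduce this, here is a rough sketch of how I've been bucketing jobs into "cold-start" vs "warm" by their Delay Time. The `delayTime`/`executionTime` field names and millisecond units are assumptions based on what the dashboard shows; the sample records below are illustrative, not real data:

```python
# Hypothetical helper: split job records into cold-start vs warm
# buckets based on Delay Time. Assumes delayTime/executionTime are
# reported in milliseconds, mirroring the dashboard's numbers.
def split_cold_warm(jobs, cold_threshold_ms=10_000):
    """Partition jobs by whether delayTime meets the cold-start threshold."""
    cold = [j for j in jobs if j["delayTime"] >= cold_threshold_ms]
    warm = [j for j in jobs if j["delayTime"] < cold_threshold_ms]
    return cold, warm

# Illustrative records shaped like the numbers above (not real data).
jobs = [
    {"id": "a", "delayTime": 95_000, "executionTime": 68_000},  # ~1.5 min delay, ~1m8s exec
    {"id": "b", "delayTime": 2_000,  "executionTime": 27_000},  # ~2s delay, ~27s exec
]
cold, warm = split_cold_warm(jobs)
print(f"cold-start jobs: {len(cold)}, warm jobs: {len(warm)}")
```

If Delay Time does include time waiting on a not-READY worker (question 1/2), the cold bucket should line up with worker startup events in the logs.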