georg
2mo ago

Serverless Image generation times

Hey Runpod team,
we've been experiencing noticeable instability on Runpod lately, primarily related to how requests are assigned to workers. Queued requests sometimes appear to be routed to workers that are still pulling their container image instead of to workers that are already ready, which leads to significant and inconsistent response times across our workloads. Here's a recent snapshot from our monitoring for a standard img2img workflow:

{
"total_requests": 27,
"success_count": 27,
"fail_count": 0,
"avg_runtime_s": 45.41,
"min_runtime_s": 19.04,
"max_runtime_s": 123.57,
"median_runtime_s": 33.38,
"p90_runtime_s": 76.83,
"p95_runtime_s": 89.63
}

Historically, our median runtime has been around 25 seconds for this workflow, so the current instability represents a substantial deviation.
We don't believe this is related to CUDA (a previous issue that Runpod already fixed for us). The main problem appears to be that queued requests are assigned to non-ready workers that are still (re-)pulling their container image, which introduces unnecessary delays and wastes resources. This behavior is directly impacting our customers through increased response times and occasional service interruptions.
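For reference, here is a simplified version of the per-request instrumentation we use to separate queue/cold-start delay from actual execution time. It assumes the standard serverless /run and /status endpoints and that the terminal status payload includes delayTime and executionTime in milliseconds; error handling and retries are omitted:

```python
import os
import time
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # our img2img endpoint
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

def timed_request(payload: dict) -> dict:
    """Submit one job and report where the time went (queue/cold start vs. execution)."""
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]

    # Poll until the job reaches a terminal state.
    while True:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            break
        time.sleep(1)

    return {
        "status": status["status"],
        # delayTime covers queueing + cold start, executionTime covers the handler itself (ms).
        "delay_s": status.get("delayTime", 0) / 1000,
        "execution_s": status.get("executionTime", 0) / 1000,
    }
```

If the delay portion dominates while the execution portion stays near our usual handler time, the slowdown is happening before our code even starts, which would be consistent with requests landing on workers that are still pulling the image.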
We are burning through money and customer trust, so we would really appreciate some help. We have already written to the Runpod team in our dedicated Slack channel, but haven't received an answer yet.
Could you please look into this and advise if there’s a way to prevent queued requests from being routed to workers that are still pulling containers? Any guidance on stabilizing this behavior would be greatly appreciated.
Thanks a lot for your support!