Runpod, 7mo ago
3WaD

Queue Delay Time

What is a normal delay time currently? I remember that delay times used to be in the milliseconds, and container startups were near instant even on the coldest of cold starts. But lately I have been observing queue delays I don't recognize, to the point where my vLLM image can fully initialize the engine on a cold start and compute a full response in almost the same time RunPod takes just to start the container. The same goes for SDXL, although it is a bit better there. Does container image size affect it? But then why would it also happen on warm requests?

This makes it unusable for me. Creating a new endpoint or downgrading the RunPod SDK version didn't help, and I couldn't find anything else that would let me influence this. Plus, as I said, it's not happening only on cold starts. The delay on a warm worker is even more extreme, as it's sometimes longer than the execution time itself. (See screenshots)
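To put a number on "delay longer than execution", here is a minimal sketch that compares the two timings from a job status payload. It assumes the status response exposes `delayTime` and `executionTime` in milliseconds. The helper name `summarize_job` and the sample values are made up for illustration and are not measurements from my endpoint.

```python
from typing import Any, Dict


def summarize_job(status: Dict[str, Any]) -> Dict[str, float]:
    """Return queue delay, execution time, and the share of total time spent queued."""
    # Assumed field names; missing fields are treated as 0 ms.
    delay_ms = float(status.get("delayTime", 0))
    exec_ms = float(status.get("executionTime", 0))
    total_ms = delay_ms + exec_ms
    return {
        "delay_ms": delay_ms,
        "execution_ms": exec_ms,
        # Fraction of the total that was spent waiting before the handler ran.
        "delay_share": delay_ms / total_ms if total_ms else 0.0,
    }


# Illustrative payload only (invented numbers, not real data).
sample = {
    "id": "example-job",
    "status": "COMPLETED",
    "delayTime": 4200,
    "executionTime": 3100,
}

if __name__ == "__main__":
    print(summarize_job(sample))
```

A `delay_share` above 0.5 on a warm worker is the situation described here: the request waits longer in the queue than it takes to execute.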

*Please note that the delay time in this case is truly only the queue delay. I had to move the initialization (loading models, etc.) into the handler, where it's counted as execution time, because I wanted to let users change the vLLM configuration per cold start via the request payload.
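For context, the pattern looks roughly like the sketch below. This is a simplified illustration, not my actual worker: `build_engine` is a hypothetical stand-in for the expensive vLLM engine construction, and the `engine_config` and `prompt` input keys are assumed names.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_engine(config: Dict[str, Any]) -> Callable[[str], str]:
    """Hypothetical stand-in for slow engine setup (model loading etc.)."""
    time.sleep(0.01)  # simulate the expensive initialization
    model = config.get("model", "default")
    return lambda prompt: f"[{model}] {prompt}"


# Cached for the lifetime of the worker process; None means "still cold".
_engine: Optional[Callable[[str], str]] = None


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    global _engine
    job_input = job["input"]
    cold = _engine is None
    if cold:
        # First request on this worker: the payload decides the configuration,
        # so the initialization cost lands in execution time, not queue delay.
        _engine = build_engine(job_input.get("engine_config", {}))
    return {"cold_start": cold, "text": _engine(job_input["prompt"])}


# In a real worker the handler would be registered with the SDK, e.g.
# runpod.serverless.start({"handler": handler})
```

With this layout only the first request on a worker pays for initialization, and later requests reuse the cached engine and ignore any new `engine_config`. So whatever shows up as delay time is spent before the handler is ever called.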

Any help would be appreciated since this is a major deal breaker.