What do delay time and execution time mean on the Requests page?

Hey all, I'm not sure what the delay time means on the Requests page. Is it about the cold start? Could someone help me understand it? Also, the execution time seems to be way larger than the duration I've logged. Does the execution time mean the execution time of the handler function? Thanks!
Solution:
Yes, execution time is the execution time of the handler function passed to runpod.serverless.start(). Delay time is not only cold start time; it also includes the time your request sits in the queue before a worker picks it up. Delay time can be dramatically impacted if all of your workers are throttled.
Jump to solution
23 Replies
Chanchana
Chanchana6mo ago
I am also curious about it. I think it'd be better if there were a page detailing how each duration is measured, i.e. when exactly the timer starts and when exactly it stops.
Solution
ashleyk
ashleyk6mo ago
Yes, execution time is the execution time of the handler function passed to runpod.serverless.start(). Delay time is not only cold start time; it also includes the time your request sits in the queue before a worker picks it up. Delay time can be dramatically impacted if all of your workers are throttled.
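For illustration, a minimal sketch of that split using the standard runpod Python SDK handler pattern (the input handling here is made up):

import runpod  # RunPod serverless SDK

def handler(job):
    # Everything inside this function body is what gets counted as execution time.
    prompt = job["input"].get("prompt", "")
    return {"output": prompt.upper()}

# Queue wait, cold start, and any module-level setup above fall under delay time.
runpod.serverless.start({"handler": handler})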
ssssteven
ssssteven6mo ago
Thanks @ashleyk. It seems like the execution time is much longer than what I log in the prediction handler. I simply wrapped the prediction method between two timestamps and printed the difference at the end. The reported execution time seems to be about 40% longer than it should be. How should I debug this?
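(For reference, a rough sketch of that kind of timing wrapper; predict() is a hypothetical prediction method, not part of the SDK:)

import time

def handler(job):
    start = time.perf_counter()
    result = predict(job["input"])  # hypothetical prediction method
    # Compare this logged duration against the execution time shown on the Requests page.
    print(f"prediction took {time.perf_counter() - start:.3f}s")
    return result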
ashleyk
ashleyk6mo ago
Not sure, maybe @Justin can provide some insight
Justin Merrell
Justin Merrell6mo ago
What runpod SDK version are you using?
ssssteven
ssssteven6mo ago
runpod==1.5.0, thank you for helping.
Justin Merrell
Justin Merrell6mo ago
Do you have an endpoint ID and/or job IDs, along with what you are expecting the times to be?
ssssteven
ssssteven6mo ago
I just did another check, and it seems better now. There might be a bug in my code, but I will continue monitoring it. Thanks for all your help. This platform is awesome. @Justin Could you help me check this request ID? 4c2a6cff-9576-423d-8c5e-a3f98d0ba9af-u1
ashleyk
ashleyk6mo ago
He is in the US, so you will probably have to wait a few hours for him to come online, and the request ID may have expired by then.
ssssteven
ssssteven6mo ago
No rush. Actually, I think I know what's going on, so no need to take a look. Thank you all.
Justin Merrell
Justin Merrell6mo ago
I see the request; however, the endpoint does not appear to send out any pings. Which version of the Python SDK are you using? And what is your active worker count set to?
ssssteven
ssssteven6mo ago
Don't worry about it, it turns out my AWS Lambda function had some issues. Sorry for wasting your time.
ribbit
ribbit4mo ago
@ashleyk hi, sorry to bring this up again. So does that mean execution time only measures what's happening inside the handler function? For example, at:
runpod.serverless.start(
    {
        "handler": handler
    }
)
So any other process outside of the handler function will be recorded in delay time?
ashleyk
ashleyk4mo ago
Yes, that's correct.
ribbit
ribbit4mo ago
thanks!
justin
justin4mo ago
Delay time will be the time in the queue + cold start (the time it takes for a worker to spin up if one isn't already active) + anything outside the scope of the handler, like imports or variables you load outside of it.
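A rough sketch of that split (load_model() is a hypothetical loader used for illustration):

import runpod

# Module-level work (imports, loading weights into variables) happens outside the
# handler, so per the explanation above it shows up in delay time, not execution time.
model = load_model("weights.bin")  # hypothetical loader

def handler(job):
    # Only this body is measured as execution time.
    return {"output": model.run(job["input"])}

runpod.serverless.start({"handler": handler})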
ribbit
ribbit4mo ago
ahh ok thanks!
teddycatsdomino
teddycatsdomino4mo ago
I'm also trying to understand delay time and what we can do to improve it. I'm getting delay times in excess of 10 minutes with only a single queued request. I have five workers (all GPU throttled).
ashleyk
ashleyk4mo ago
If all your workers are throttled, they will cause a massive increase in delay time. Delay time is time your request is in queue + cold start time.
teddycatsdomino
teddycatsdomino4mo ago
Is there anything we can do about throttled workers? I'm seeing a few parallel conversations about this in different Discord threads. The advice in other threads was to try other regions.
ashleyk
ashleyk4mo ago
No. You can either try changing to a different GPU tier with higher availability, or, if you're using network storage, create a new network storage volume in a different region with higher availability and create a new endpoint there.
justin
justin4mo ago
I think the new changes should hopefully prevent the worst-case scenario where everything is throttled and the service is down, but you'll still have a few throttled workers. If you're tied to a network storage region, it can be that the region is unavailable, which is why, if your model is small, you should look into deploying it all in one image. I find anything under about 35GB good for a Docker container, and I try to stay below 30GB personally.
teddycatsdomino
teddycatsdomino4mo ago
You've just answered a question I posted in another thread. Where does that 35GB number come from? I think we could probably bundle everything into a container around that size.