What do delay time and execution time mean on the Requests page?

Hey all, I'm not sure what the delay time means on the Requests page. Is it about the cold start? Could someone help me understand it? Also, the execution time seems to be way larger than the duration I've logged. Does the execution time mean the execution time of the handler function? Thanks!
Solution:
Yes, execution time is the execution time of the handler function passed to runpod.serverless.start(). Delay time is not only cold start time; it also includes the time your request sits in the queue before a worker picks it up. Delay time can be dramatically impacted if all of your workers are throttled.
Jump to solution
23 Replies
Chanchana
Chanchana6mo ago
I am also curious about it. I think it'd be better if there were a page detailing how each duration is measured, i.e. when exactly the timer starts and when exactly it stops.
Solution
ashleyk
ashleyk6mo ago
Yes, execution time is the execution time of the handler function passed to runpod.serverless.start(). Delay time is not only cold start time; it also includes the time your request sits in the queue before a worker picks it up. Delay time can be dramatically impacted if all of your workers are throttled.
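For illustration, a minimal sketch of that split using the standard runpod Python SDK handler pattern (the input handling here is made up):

import runpod  # RunPod serverless SDK

def handler(job):
    # Everything inside this function body is what gets counted as execution time.
    prompt = job["input"].get("prompt", "")
    return {"output": prompt.upper()}

# Queue wait, cold start, and any module-level setup above fall under delay time.
runpod.serverless.start({"handler": handler})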
ssssteven
ssssteven6mo ago
Thanks @ashleyk. It seems like the execution time is much longer than what I log in the prediction handler. I simply wrapped the prediction method between two timestamps and printed the difference at the end. The reported execution time seems to be about 40% longer than it should be. How should I debug this?
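(For reference, a rough sketch of that kind of timing wrapper; predict() is a hypothetical prediction method, not part of the SDK:)

import time

def handler(job):
    start = time.perf_counter()
    result = predict(job["input"])  # hypothetical prediction method
    # Compare this logged duration against the execution time shown on the Requests page.
    print(f"prediction took {time.perf_counter() - start:.3f}s")
    return result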
ashleyk
ashleyk6mo ago
Not sure, maybe @Justin can provide some insight
Justin Merrell
Justin Merrell6mo ago
What runpod SDK version are you using?
ssssteven
ssssteven6mo ago
runpod==1.5.0, thank you for helping.
Justin Merrell
Justin Merrell6mo ago
Do you have an endpoint ID and/or job IDs, along with what you are expecting the times to be?
ssssteven
ssssteven6mo ago
I just did another check, and it seems better now. There might be a bug in my code, but I will continue monitoring it. Thanks for all your help. This platform is awesome. @Justin Could you help me check this request ID? 4c2a6cff-9576-423d-8c5e-a3f98d0ba9af-u1
ashleyk
ashleyk6mo ago
He is in the US, so you will probably have to wait a few hours for him to come online, and the request ID may have expired by then.
ssssteven
ssssteven6mo ago
No rush. Actually, I think I know what's going on, so no need to take a look. Thank you all.
Justin Merrell
Justin Merrell6mo ago
I see the request; however, the endpoint does not appear to send out any pings. Which version of the Python SDK are you using? And what is your active worker count set to?
ssssteven
ssssteven6mo ago
Don't worry about it, it turns out my AWS Lambda function had some issues. Sorry for wasting your time.
ribbit
ribbit4mo ago
@ashleyk hi, sorry to bring this up again. So does that mean execution time only measures what's happening inside the handler function? For example, at:
runpod.serverless.start(
    {
        "handler": handler
    }
)
So any other process outside of the handler function will be recorded in delay time?
ashleyk
ashleyk4mo ago
Yes, that's correct.
ribbit
ribbit4mo ago
thanks!
justin
justin4mo ago
Delay time will be the time in the queue + cold start (the time it takes for a worker to spin up if one isn't already active) + anything outside the scope of the handler, like imports or variables you load outside of it.
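A rough sketch of that split (load_model() is a hypothetical loader used for illustration):

import runpod

# Module-level work (imports, loading weights into variables) happens outside the
# handler, so per the explanation above it shows up in delay time, not execution time.
model = load_model("weights.bin")  # hypothetical loader

def handler(job):
    # Only this body is measured as execution time.
    return {"output": model.run(job["input"])}

runpod.serverless.start({"handler": handler})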
ribbit
ribbit4mo ago
ahh ok thanks!
teddycatsdomino
teddycatsdomino4mo ago
I'm also trying to understand delay time and what we can do to improve it. I'm getting delay times in excess of 10 minutes with only a single queued request. I have five workers (all GPU throttled).
ashleyk
ashleyk4mo ago
If all your workers are throttled, they will cause a massive increase in delay time. Delay time is time your request is in queue + cold start time.
teddycatsdomino
teddycatsdomino4mo ago
Is there anything we can do about throttled workers? I'm seeing a few parallel conversations about this in different Discord threads. The advice in other threads was to try other regions.
ashleyk
ashleyk4mo ago
No. You can either try changing to a different GPU tier with higher availability, or, if you're using network storage, create a new network storage volume in a different region with higher availability and create a new endpoint there.
justin
justin4mo ago
I think the new changes should hopefully prevent the worst-case scenario where everything is throttled and the service is down, but you'll still have a few throttled workers. If you're tied to a network storage region, it can be that the region is unavailable, which is why, if your model is small, you should look into deploying it all in one image. I find anything under about 35GB good for a Docker container, and I try to stay below 30GB personally.
teddycatsdomino
teddycatsdomino4mo ago
You've just answered a question I posted in another thread. Where does that 35GB number come from? I think we could probably bundle everything into a container around that size.