Runpod7mo ago
lars

DelayTime being really high

I am running a serverless worker with CPU only and have a really high delayTime. First boot takes ~8 seconds; after that I have around 1 second of delay time for each request. My executionTime is only 0.1 seconds, so my delayTime is 10x my executionTime. When I had a serverless GPU worker my delayTime was way lower than this. Is there a fix for that? Thanks in advance
21 Replies
riverfog7
riverfog77mo ago
if you are using a custom image then pulling the image can take time
lars
larsOP7mo ago
The initial loading does take longer, approximately 8 seconds when the worker is turned on. However, the delay time remains around 1 second for subsequent requests while the worker stays on. If I send 100 requests the delayTime remains high, so I guess it is not a boot/loading issue. As mentioned, it's CPU-only, so there is no GPU booting or loading involved. I am using a custom image, but the initial loading time is not my concern. Rather, I am concerned about the continuously high delay time after the first run.
lars
larsOP6mo ago
I am running an EfficientNet on the pod. The RunPod API gives me two times; the response looks like this: for the first run: {'delayTime': 9169, 'executionTime': 173, ..., 'output': ...}, for subsequent runs: {'delayTime': 850, 'executionTime': 160, ..., 'output': ...}. I timed the execution time of my code, which is a little below the stated executionTime, about what I expect. I restricted the CPU usage on my machine and it was about the same as the executionTime reported by RunPod. I thought that the delayTime is something in the background of RunPod that I personally can't influence. But almost a second is way too high, especially since the documentation says it should be very small.
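The overhead these numbers describe can be made concrete with a short sketch. The response dicts below are the example values quoted above; in a real client they would come back from the endpoint:

```python
# Comparing delayTime vs executionTime (both reported in milliseconds).

def overhead_ratio(response: dict) -> float:
    """Return delayTime as a multiple of executionTime."""
    return response["delayTime"] / response["executionTime"]

# Example values quoted above.
first_run = {"delayTime": 9169, "executionTime": 173}
subsequent_run = {"delayTime": 850, "executionTime": 160}

print(overhead_ratio(first_run))                 # 53.0 (cold start)
print(round(overhead_ratio(subsequent_run), 1))  # 5.3 (warm worker)
```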
lars
larsOP6mo ago
I timed my code from the start of the handler until just before the handler returns its result. From the logs:
[worker_id] [info] RUNPOD_HANDLER.py :56 2025-05-06 10:03:59,969 Threshold: 0.04
[worker_id] [info] Started.
[worker_id] [info] Jobs in progress: 1
[worker_id] [info] Jobs in queue: 1
[worker_id] [info] Finished.
[worker_id] [info] RUNPOD_HANDLER.py :99 2025-05-06 10:04:00,466 Time for everything: 0.5s
[worker_id] [info] RUNPOD_HANDLER.py :56 2025-05-06 10:04:01,694 Threshold: 0.04
[worker_id] [info] Started.
[worker_id] [info] Jobs in progress: 1
[worker_id] [info] Jobs in queue: 1
[worker_id] [info] Finished.
The "Time for everything" value matches my own timing, so that part is fine. What I don't understand is why it takes about a second from the end of job 1 to the start of job 2 (from 10:04:00,466 to 10:04:01,694). This does not make sense to me. I want to speed up this part, but I think this comes from RunPod? Is there something I can do to speed this up?
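For reference, a minimal sketch of a handler instrumented this way. It assumes the RunPod Python SDK entry point (`runpod.serverless.start`); `run_inference` is a hypothetical placeholder for the actual model call:

```python
import time

def run_inference(payload):
    # Hypothetical placeholder for the actual EfficientNet calls.
    return {"prediction": "ok"}

def handler(job):
    start = time.perf_counter()              # first line of the handler
    result = run_inference(job["input"])
    # Record elapsed time just before returning, mirroring the
    # "Time for everything" log line above.
    result["handler_seconds"] = round(time.perf_counter() - start, 3)
    return result

if __name__ == "__main__":
    import runpod  # RunPod serverless SDK, assumed available on the worker
    runpod.serverless.start({"handler": handler})
```

Anything the platform does outside this window (queueing, result delivery) lands in delayTime, not executionTime.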
riverfog7
riverfog76mo ago
try timing the efficientnet inference part first, then you can know if it's your code taking long or runpod's side taking long
lars
larsOP6mo ago
The inference time of my 2 EfficientNets in this code is 0.22 and 0.25 seconds. In the example above my code started at 10:03:59,969, so almost exactly 0.5 seconds for everything. I still get a delayTime of about 1 second between each of my runs.
riverfog7
riverfog76mo ago
maybe runpod takes time to register that the requests are finished. is the batch size per worker set to 1? i mean concurrent requests that the worker can handle at a time
lars
larsOP6mo ago
yes
riverfog7
riverfog76mo ago
what happens if it is set higher than 1?
lars
larsOP6mo ago
I think the slow part is somewhere between the return and the start of the next job. I don't get the exact times of e.g. "Finished" and "Started", so I can only check the times directly after the job starts and directly before the result gets returned. I tested with sleep times between runs. If I send the next request directly after I receive the answer, I have a delay time of 500-1300 ms. If I wait 1 second between requests (via sleep), my delay time is only 100-150 ms. So I guess their handling does something there that uses a lot of time. My idle timeout is way higher than that, so that can't be the answer. If I use run instead of runsync I get the same large delay times, whether I send 2 or 15 requests.
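The pacing experiment described here can be sketched as follows. `send` is an injectable callable so the request mechanics stay out of the measurement loop; `ENDPOINT_ID` and `API_KEY` in the comment are placeholders:

```python
import time

def measure_delays(send, n=5, pause=0.0):
    """Call `send` n times, sleeping `pause` seconds between calls,
    and collect the delayTime (in ms) each response reports."""
    delays = []
    for _ in range(n):
        delays.append(send()["delayTime"])
        time.sleep(pause)
    return delays

# In a real client, `send` would POST to the endpoint, e.g.:
#   requests.post(f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
#                 json={"input": {...}},
#                 headers={"Authorization": f"Bearer {API_KEY}"}).json()
# ENDPOINT_ID and API_KEY are placeholders.
```

Running it once with `pause=0.0` and once with `pause=1.0` reproduces the comparison above.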
riverfog7
riverfog76mo ago
@lars do your workers serve requests one by one, or do they support some sort of batching or concurrency, i.e. processing multiple requests at a given moment?
riverfog7
riverfog76mo ago
batching should speed up things a lot
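For reference, per-worker concurrency in the RunPod Python SDK can be raised by passing a `concurrency_modifier` to `runpod.serverless.start`. A minimal sketch, assuming that SDK; the cap of 4 and the async placeholder handler are hypothetical:

```python
import asyncio

def concurrency_modifier(current_concurrency: int) -> int:
    # Allow up to 4 jobs in flight per worker; 4 is a hypothetical cap.
    MAX_CONCURRENCY = 4
    return min(current_concurrency + 1, MAX_CONCURRENCY)

async def handler(job):
    # Async placeholder so jobs can overlap instead of queueing.
    await asyncio.sleep(0)
    return {"ok": True}

if __name__ == "__main__":
    import runpod  # RunPod serverless SDK, assumed available on the worker
    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": concurrency_modifier,
    })
```

With concurrency above 1, a new request no longer has to wait for the previous job's completion to be registered before it is handed to the worker.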
riverfog7
riverfog76mo ago
Not that one. It should be faster in total
lars
larsOP6mo ago
That does speed up the process if there are multiple concurrent requests pending, thanks. I still don't know why the delayTime is so high if I send one request right after the last one finished, but it helps, thank you.
