Runpod7mo ago
lars

DelayTime being really high

I am running a serverless worker with CPU only and have a really high delayTime. First boot takes ~8 seconds; after that I have around 1 second of delay time for each request. My executionTime is only 0.1 seconds, so my delayTime is 10x my executionTime. When I had a serverless GPU worker my delayTime was way lower than this. Is there a fix for that? Thanks in advance
21 Replies
riverfog7
riverfog77mo ago
if you are using a custom image then pulling the image can take time
lars
larsOP7mo ago
The initial loading does take longer, approximately 8 seconds when the worker is turned on. However, the delay time remains around 1 second for subsequent requests while the worker stays on. If I send 100 requests the delayTime remains high, so I guess it is not a boot/loading issue. As mentioned, it's CPU-only, so there is no GPU booting or loading involved. I am using a custom image, but the initial loading time is not my concern. Rather, I am concerned about the continuously high delay time after the first run.
lars
larsOP6mo ago
I am running an EfficientNet on the pod. The RunPod API gives me two times; the response looks like this: for the first run: {'delayTime': 9169, 'executionTime': 173, ..., 'output': ...}, for subsequent runs: {'delayTime': 850, 'executionTime': 160, ..., 'output': ...}. I timed the execution time of my code, which is a little below the stated executionTime, about what I expect. I restricted the CPU usage on my machine and it was about the same as the executionTime reported by RunPod. I thought that the delayTime is something in the background of RunPod that I personally can't influence. But almost a second is way too high, especially since the documentation says it should be very small.
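The overhead these numbers describe can be made concrete with a short sketch. The response dicts below are the example values quoted above; in a real client they would come back from the endpoint:

```python
# Comparing delayTime vs executionTime (both reported in milliseconds).

def overhead_ratio(response: dict) -> float:
    """Return delayTime as a multiple of executionTime."""
    return response["delayTime"] / response["executionTime"]

# Example values quoted above.
first_run = {"delayTime": 9169, "executionTime": 173}
subsequent_run = {"delayTime": 850, "executionTime": 160}

print(overhead_ratio(first_run))                 # 53.0 (cold start)
print(round(overhead_ratio(subsequent_run), 1))  # 5.3 (warm worker)
```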
lars
larsOP6mo ago
I timed my code from the start of the handler until just before the handler returns its result. From the logs:
[worker_id] [info] RUNPOD_HANDLER.py :56 2025-05-06 10:03:59,969 Threshold: 0.04
[worker_id] [info] Started.
[worker_id] [info] Jobs in progress: 1
[worker_id] [info] Jobs in queue: 1
[worker_id] [info] Finished.
[worker_id] [info] RUNPOD_HANDLER.py :99 2025-05-06 10:04:00,466 Time for everything: 0.5s
[worker_id] [info] RUNPOD_HANDLER.py :56 2025-05-06 10:04:01,694 Threshold: 0.04
[worker_id] [info] Started.
[worker_id] [info] Jobs in progress: 1
[worker_id] [info] Jobs in queue: 1
[worker_id] [info] Finished.
The "Time for everything" value matches my own timing, so that part is fine. What I don't understand is why it takes about a second from the end of job 1 to the start of job 2 (from 10:04:00,466 to 10:04:01,694). This does not make sense to me. I want to speed up this part, but I think this comes from RunPod? Is there something I can do to speed this up?
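For reference, a minimal sketch of a handler instrumented this way. It assumes the RunPod Python SDK entry point (`runpod.serverless.start`); `run_inference` is a hypothetical placeholder for the actual model call:

```python
import time

def run_inference(payload):
    # Hypothetical placeholder for the actual EfficientNet calls.
    return {"prediction": "ok"}

def handler(job):
    start = time.perf_counter()              # first line of the handler
    result = run_inference(job["input"])
    # Record elapsed time just before returning, mirroring the
    # "Time for everything" log line above.
    result["handler_seconds"] = round(time.perf_counter() - start, 3)
    return result

if __name__ == "__main__":
    import runpod  # RunPod serverless SDK, assumed available on the worker
    runpod.serverless.start({"handler": handler})
```

Anything the platform does outside this window (queueing, result delivery) lands in delayTime, not executionTime.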
riverfog7
riverfog76mo ago
try timing the efficientnet inference part first, then you can know if it's your code taking long or runpod's side taking long
lars
larsOP6mo ago
The inference time of my 2 EfficientNets in this code is 0.22 and 0.25 seconds. In the example above my code started at 10:03:59,969, so almost exactly 0.5 seconds for everything. I still get a delayTime of about 1 second between each of my runs.
riverfog7
riverfog76mo ago
maybe runpod takes time to register that the requests are finished. is the batch size per worker set to 1? i mean concurrent requests that the worker can handle at a time
lars
larsOP6mo ago
yes
riverfog7
riverfog76mo ago
what happens if it is set higher than 1?
lars
larsOP6mo ago
I think the slow part is somewhere between the return and the start of the next job. I don't get the exact times of e.g. "Finished" and "Started", so I can only check the times directly after the job starts and directly before the result gets returned. I tested with sleep times between runs. If I send the next request directly after I receive the answer, I have a delay time of 500-1300 ms. If I wait 1 second between requests (via sleep), my delay time is only 100-150 ms. So I guess their handling does something there that uses a lot of time. My idle timeout is way higher than that, so that can't be the answer. If I use run instead of runsync I get the same large delay times, whether I send 2 or 15 requests.
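The pacing experiment described here can be sketched as follows. `send` is an injectable callable so the request mechanics stay out of the measurement loop; `ENDPOINT_ID` and `API_KEY` in the comment are placeholders:

```python
import time

def measure_delays(send, n=5, pause=0.0):
    """Call `send` n times, sleeping `pause` seconds between calls,
    and collect the delayTime (in ms) each response reports."""
    delays = []
    for _ in range(n):
        delays.append(send()["delayTime"])
        time.sleep(pause)
    return delays

# In a real client, `send` would POST to the endpoint, e.g.:
#   requests.post(f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
#                 json={"input": {...}},
#                 headers={"Authorization": f"Bearer {API_KEY}"}).json()
# ENDPOINT_ID and API_KEY are placeholders.
```

Running it once with `pause=0.0` and once with `pause=1.0` reproduces the comparison above.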
riverfog7
riverfog76mo ago
@lars do your workers serve requests one by one, or do they support some sort of batching or concurrency, i.e. processing multiple requests at a given moment?
riverfog7
riverfog76mo ago
batching should speed up things a lot
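For reference, per-worker concurrency in the RunPod Python SDK can be raised by passing a `concurrency_modifier` to `runpod.serverless.start`. A minimal sketch, assuming that SDK; the cap of 4 and the async placeholder handler are hypothetical:

```python
import asyncio

def concurrency_modifier(current_concurrency: int) -> int:
    # Allow up to 4 jobs in flight per worker; 4 is a hypothetical cap.
    MAX_CONCURRENCY = 4
    return min(current_concurrency + 1, MAX_CONCURRENCY)

async def handler(job):
    # Async placeholder so jobs can overlap instead of queueing.
    await asyncio.sleep(0)
    return {"ok": True}

if __name__ == "__main__":
    import runpod  # RunPod serverless SDK, assumed available on the worker
    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": concurrency_modifier,
    })
```

With concurrency above 1, a new request no longer has to wait for the previous job's completion to be registered before it is handed to the worker.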
riverfog7
riverfog76mo ago
Not that one. It should be faster in total
lars
larsOP6mo ago
That does speed up the process if there are multiple concurrent requests pending, thanks. I still don't know why the delayTime is so high if I send one request right after the last one finished, but it helps, thank you.
