DelayTime being really high
I am running a serverless worker (CPU only) and have a really high delayTime. The first boot takes ~8 seconds; after that I have around 1 second of delay time for each request. My executionTime is only 0.1 seconds, so my delayTime is 10x my executionTime. When I had a serverless GPU worker my delayTime was much lower than this. Is there a fix for that?
Thanks in advance
21 Replies
Unknown User•7mo ago
Message Not Public
if you are using a custom image then pulling the image can take time
The initial loading does take longer, approximately 8 seconds when the worker is turned on. However, the delay time remains around 1 second for subsequent requests while the worker stays on. If I send 100 requests the delayTime stays high, so I guess it is not a boot/loading issue. As mentioned, it's CPU-only, so there is no GPU booting or loading involved. I am using a custom image, but the initial loading time is not my concern; rather, I am concerned about the continuously high delay time after the first run.
Unknown User•7mo ago
Message Not Public
I am running an EfficientNet on the pod.
The RunPod API gives me two times; the response looks like this:
for the first run: {'delayTime': 9169, 'executionTime': 173, .... "output": ....}
for subsequent runs: {'delayTime': 850, 'executionTime': 160, .... "output": ....}
I timed the execution time of my code, which is a little below the stated executionTime, about what I expect. I restricted the CPU usage on my machine and it was about the same as the executionTime from RunPod. I thought the delayTime was something in the background on RunPod's side that I personally can't influence, but almost a second is way too high, especially since the documentation says it should be very small.
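For reference, a minimal sketch of how I read these two fields. The dicts below just reuse the illustrative values from the responses above; `total_ms` is a helper I made up, not part of the RunPod API:

```python
# Sketch: interpreting the two timing fields RunPod returns per job.
# Values are the illustrative ones from the responses above.
first_run = {"delayTime": 9169, "executionTime": 173}  # cold start
warm_run = {"delayTime": 850, "executionTime": 160}    # worker already up

def total_ms(resp):
    """Total server-side time for a job: queue/overhead plus handler time."""
    return resp["delayTime"] + resp["executionTime"]

for name, resp in [("cold", first_run), ("warm", warm_run)]:
    ratio = resp["delayTime"] / resp["executionTime"]
    print(f"{name}: total {total_ms(resp)} ms, delay is {ratio:.1f}x execution")
```

Even on the warm path, the delay is several times the actual execution, which is the part that seems off.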
Unknown User•6mo ago
Message Not Public
I timed my code from directly at the start of the handler until just before the handler returns its result.
From the logs:
The "Time for everything" matches my own logging pretty accurately, so that part is fine.
What I don't understand is why it takes about a second from the end of job 1 to the start of job 2 (from 10:04:00,466 to 10:04:01,694). This does not make sense to me. I want to speed up this part, but I think this is on RunPod's side? Is there something I can do to speed this up?
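One way to separate handler time from platform overhead is to timestamp inside the handler itself, as early and as late as possible. A minimal sketch of that pattern, where `run_inference` is a placeholder stand-in, not the actual code from this thread:

```python
import time

def run_inference(data):
    # Stand-in for the EfficientNet forward pass.
    return {"label": "example", "score": 0.99}

def handler(job):
    # Timestamp at the very start of the handler and again just before
    # returning, so the measured duration covers only our own code.
    start = time.perf_counter()
    result = run_inference(job["input"])
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"handler took {elapsed_ms:.0f} ms")
    return {"output": result, "handler_ms": elapsed_ms}
```

Anything between one handler's return and the next handler's start is then, by construction, outside your code.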
try timing the EfficientNet inference part first
then you can know if it's your code taking long or RunPod's side taking long
The inference time of my 2 EfficientNets in this code is 0.22 and 0.25 seconds. In the example above my code started at 10:03:59,969, so almost exactly 0.5 seconds for everything. I still get a delayTime of about 1 second between each of my runs.
maybe runpod takes time to register that the requests are finished
is the batch size per worker set to 1?
I mean concurrent requests that the worker can handle at a time
yes
what happens if it is set to higher than 1
Unknown User•6mo ago
Message Not Public
I think the slow part is somewhere between the return and the start of the next job. I don't get the exact times of e.g. "Finished" and "Started", so I can only check the times directly after the job started and directly before the result gets returned.
I tested with sleep times between runs. If I send the next request directly after I receive the answer, I have a delay time of 500-1300 ms.
If I wait 1 second between requests (via sleep), my delay time is only 100-150 ms. So I guess their handling does something there that uses a lot of time.
My idle timeout is way higher than that, so that can't be the answer.
If I use run instead of runsync I get the same large delay times, whether I send 2 or 15 requests.
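The sleep experiment above can be scripted like this. It's a sketch: `send` is an injected callable so the loop works with any client; with a real endpoint it would wrap a POST to the endpoint's `/runsync` URL (the endpoint ID and API key in the comment are placeholders):

```python
import time

def measure_delays(send, n_requests, gap_seconds=0.0):
    """Send n_requests sequentially and collect the delayTime each
    response reports, optionally sleeping between requests."""
    delays = []
    for _ in range(n_requests):
        resp = send({"input": {}})   # e.g. a POST to the /runsync endpoint
        delays.append(resp["delayTime"])
        time.sleep(gap_seconds)      # 0 = back-to-back, 1.0 = spaced out
    return delays

# With a real endpoint, `send` would wrap something like:
#   requests.post(f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
#                 json=payload,
#                 headers={"Authorization": f"Bearer {API_KEY}"}).json()
```

Comparing `measure_delays(send, 15, 0.0)` against `measure_delays(send, 15, 1.0)` reproduces the back-to-back vs. spaced-out difference described above.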
@lars does your worker serve requests one by one
or does it support some sort of batching
or some concurrency
like processing multiple requests at a given moment
Unknown User•6mo ago
Message Not Public
batching should speed up things a lot
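As an illustration of why letting the worker handle several pending requests at once helps (this is a generic asyncio sketch of concurrent vs. serial handling, not RunPod's actual scheduler; I believe the RunPod Python SDK exposes this via a concurrency setting on the handler, but check their docs):

```python
import asyncio
import time

async def handle(job_id):
    # Simulate a ~50 ms handler (e.g. a small CPU inference).
    await asyncio.sleep(0.05)
    return job_id

async def serial(jobs):
    # One request at a time: total time grows linearly with queue depth.
    return [await handle(j) for j in jobs]

async def concurrent(jobs):
    # All pending requests at once: total time is roughly one handler's worth.
    return await asyncio.gather(*(handle(j) for j in jobs))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With four queued jobs, `timed(serial([1, 2, 3, 4]))` takes roughly four times as long as `timed(concurrent([1, 2, 3, 4]))`, which is the speedup batching/concurrency buys when requests pile up.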
Unknown User•6mo ago
Message Not Public
Not that one
It should be faster in total
Unknown User•6mo ago
Message Not Public
That does speed up the process if there are multiple concurrent requests pending, thanks. I still don't know why the delayTime is so high if I send one request right after the last one finished, but it helps, thank you.