RunPod 4w ago
lars

delayTime being really high

I am running a CPU-only serverless worker and see a really high delayTime. The first boot takes ~8 seconds; after that, the delay time is around 1 second for each request. My executionTime is only 0.1 seconds, so my delayTime is 10x my executionTime. When I had a serverless GPU worker, my delayTime was much lower than this. Is there a fix for that? Thanks in advance.
6 Replies
Jason 4w ago
What does your delay time consist of? Is it the loading or the queue wait? If it's loading: on GPU serverless it may have been fast because there is fast boot (your model may already be on the GPU), but CPU serverless has no such thing, so it takes longer to boot up and load the model every time a worker is turned off and on again.
riverfog7 4w ago
If you are using a custom image, then pulling the image can take time.
lars (OP) 2w ago
The initial loading does take longer, approximately 8 seconds when the worker is turned on. However, the delay time remains around 1 second for subsequent requests while the worker stays on. If I send 100 requests, the delayTime remains high, so I guess it is not a boot/loading issue. As mentioned, it's CPU-only, so there is no GPU booting or loading involved. I am using a custom image, but the initial loading time is not my concern. Rather, I am concerned about the continuously high delay time after the first run.
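A rough way to check the "not a boot/loading issue" reasoning is to look at how delayTime behaves across a batch of requests. The sketch below is illustrative only: `summarize_delays` is a hypothetical helper, the sample values are round numbers rather than measurements, and each response is assumed to be a dict with a `delayTime` field in milliseconds.

```python
from statistics import median


def summarize_delays(responses, warmup=1):
    """Summarize delayTime (ms), skipping the first `warmup` cold responses."""
    delays = [r["delayTime"] for r in responses]
    warm = delays[warmup:]
    if not warm:
        raise ValueError("need at least one response after warm-up")
    half = len(warm) // 2
    return {
        "cold_ms": delays[:warmup],
        "warm_median_ms": median(warm),
        "warm_max_ms": max(warm),
        # A delay that rises over the batch hints at queue wait building up;
        # a flat one hints at a fixed per-request overhead.
        "drift_ms": median(warm[half:]) - median(warm[:half]) if half else 0,
    }


# Illustrative round numbers only (not measurements):
# one cold start followed by nine warm requests.
sample = [{"delayTime": 8000}] + [{"delayTime": 1000} for _ in range(9)]
summary = summarize_delays(sample)
print(summary)
```

If the warm delay stays flat across the batch, as in the sample, it is more likely a fixed per-request overhead than queue wait, though that is only a heuristic.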
Jason 2w ago
I'm not sure what you're running, so which process in your worker does the "high delay time after the first run" consist of (inference, loading models, processing)? You should debug: time each process in your worker and see which takes the most time, so you can take action from there. Or maybe your workers are busy with that many requests; try increasing max workers and adjusting the scaling type/scaling factor in your endpoint.
"As mentioned, it's CPU-only, so there is no GPU booting or loading involved"
Correct.
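The per-stage timing suggested above could look roughly like the following. This is a minimal sketch, not RunPod tooling: `timed`, `load_input`, `run_model` and the stage names are hypothetical placeholders, and in a real worker the handler would be registered with `runpod.serverless.start({"handler": handler})` instead of being called directly.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, stage):
    """Record the wall-clock duration of a stage, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0


def load_input(job_input):
    # Hypothetical placeholder for decoding/preprocessing the request.
    return job_input.get("values", [])


def run_model(values):
    # Hypothetical placeholder for the actual model forward pass.
    return sum(values)


def handler(job):
    """Handler that takes a job dict and returns the output plus timings."""
    timings = {}
    with timed(timings, "preprocess"):
        values = load_input(job["input"])
    with timed(timings, "inference"):
        result = run_model(values)
    # Returning the timings with the output makes them visible per request.
    return {"result": result, "timings_ms": timings}


if __name__ == "__main__":
    # Local check without RunPod installed.
    print(handler({"input": {"values": [1, 2, 3]}}))
```

Comparing the sum of these stage timings with the reported executionTime shows whether the handler itself accounts for the time or whether the remainder sits outside it.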
lars (OP) 5d ago
I am running an EfficientNet on the pod. The RunPod API gives me two times; the response looks like this:
for the first run: {'delayTime': 9169, 'executionTime': 173, .... "output": ....}
for subsequent runs: {'delayTime': 850, 'executionTime': 160, .... "output": ....}
I timed the execution of my code, and it is a little below the stated executionTime, about what I expect. I restricted the CPU usage on my machine and it was about the same as RunPod's executionTime. I thought that the delayTime is something in the background of RunPod that I personally can't influence. But almost a second is way too high, especially since the documentation says it should be very small.
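To put the two responses side by side, here is a small sketch of the arithmetic. It assumes both fields are in milliseconds, which matches the ~0.1 s execution time mentioned earlier; `breakdown` is a hypothetical helper, and attributing the first-run difference to boot and loading is an assumption.

```python
# Values copied from the two responses in this thread (assumed milliseconds).
first_run = {"delayTime": 9169, "executionTime": 173}
warm_run = {"delayTime": 850, "executionTime": 160}


def breakdown(response):
    """Return total reported time and how much of it is delayTime."""
    total = response["delayTime"] + response["executionTime"]
    return {
        "total_ms": total,
        "delay_share": response["delayTime"] / total,
        "delay_to_exec": response["delayTime"] / response["executionTime"],
    }


warm = breakdown(warm_run)
# Extra delay seen only on the first request (presumably boot and loading).
cold_start_extra_ms = first_run["delayTime"] - warm_run["delayTime"]

print(f"warm delay/exec ratio: {warm['delay_to_exec']:.1f}x")
print(f"warm delay share of total: {warm['delay_share']:.0%}")
print(f"cold-start extra: {cold_start_extra_ms} ms")
```

With these numbers, the warm delayTime is about 5.3x the executionTime and about 84% of the reported total, and the first run carries roughly 8.3 s of extra delay.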
Jason 5d ago
I see, it looks kind of normal to me? I believe it's just the queue processing or other code within RunPod, or your startup. Is it continuous? How did you time your code, BTW? Execution time gives the time spent running the job after serverless.start().
