Runpod 4w ago
kaz133

Delay time is too high. How can I reduce it?

I need each request to complete in 1 second or less, but requests currently take about 1.5 seconds. I have optimized the Docker image heavily; it is now 300 MB. The cold start time (200 ms) is fine for me, but the delay time is not. How can it be optimized? I am currently in demo mode, trying to find out whether the platform will let me get a response within 1 second.
7 Replies
yhlong00000 4w ago
The very first request will probably take a bit longer due to the cold start, but subsequent requests should be pretty fast. If you have constant traffic, the delay should be minimal. Also, have you included your model in the Docker image? Avoid downloading anything when the container starts, since that can slow things down.
kaz133 (OP) 4w ago
300 MB is the entire size of the container. Nothing is downloaded on top of it, and it is ready to execute the command immediately. My cold start time is 250-350 ms, which is fine. The execution time is, for example, 2 seconds; I will work on that, so it is fine too. But the delay time is ~1200 ms, and I would like to get rid of it. Are there any options to reduce it, or is there no other way on your infrastructure?
yhlong00000 3w ago
Can you share the endpoint ID? I can take a look.
kaz133 (OP) 3w ago
[3 image attachments, no description]
kaz133 (OP) 3w ago
I'm not strong in Python. Could it be the start time of the HTTP server inside the container? Does the time Python itself takes to start affect any of the metrics? I tried packaging all of the Python code with PyInstaller. It reduced the size of my Docker image even further, but it didn't seem to have any effect on the delay time.
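One way to check how much the interpreter itself costs is to time it locally. The sketch below is only an illustration: `interpreter_startup_ms` is a hypothetical helper name, it measures bare Python startup on whatever machine runs it, and it says nothing about what the platform counts as delay time.

```python
import statistics
import subprocess
import sys
import time


def interpreter_startup_ms(code="pass", runs=5):
    """Median wall-clock time (ms) to start Python, run `code`, and exit."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Launch a fresh interpreter so import caches of this process are not reused.
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Comparing `interpreter_startup_ms("pass")` with `interpreter_startup_ms("import json")` (or whichever modules the handler imports) gives a rough idea of whether startup and imports are even in the range of the delay being discussed.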
yhlong00000 3w ago
After you send a request, it goes into a queue, and then the worker takes a bit of time to wake up and pick it up. From what I can see, most requests are picked up from the queue within 300 ms to 1 second. If you set the active workers to 1 and run a few tests, it would be good to see how it performs in that setup.
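A few test runs like the ones suggested above could be collected and summarized with a short script. This is a minimal sketch under stated assumptions, not an official client: the `runsync` URL shape, the Bearer authorization header, and the `delayTime` / `executionTime` response fields (in milliseconds) are assumed here and should be checked against the platform documentation. `run_once` and `summarize` are hypothetical helper names.

```python
import json
import statistics
import urllib.request

# Assumption: the endpoint's synchronous run URL has this shape.
RUNSYNC_URL = "https://api.runpod.ai/v2/{endpoint_id}/runsync"


def run_once(endpoint_id, api_key, payload):
    """Send one synchronous request and return the decoded JSON response."""
    req = urllib.request.Request(
        RUNSYNC_URL.format(endpoint_id=endpoint_id),
        data=json.dumps({"input": payload}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(responses, budget_ms=1000):
    """Summarize queue delay and count runs that fit the latency budget.

    Assumption: each response carries `delayTime` and `executionTime` in ms.
    """
    delays = [r["delayTime"] for r in responses]
    totals = [r["delayTime"] + r["executionTime"] for r in responses]
    return {
        "delay_min_ms": min(delays),
        "delay_median_ms": statistics.median(delays),
        "delay_max_ms": max(delays),
        "within_budget": sum(t <= budget_ms for t in totals),
        "count": len(responses),
    }
```

Calling `run_once` a handful of times with one active worker, then again with zero, and passing each batch to `summarize` would show whether the queue pickup time alone already exceeds the 1 second budget.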
