Delay time is too high. How can I reduce it?
It is important for me that my request executes in 1 second or less, but right now it takes about 1.5 seconds.
I have optimized the Docker image a lot; it is now 300 MB. The cold start time is fine for me (200 ms), but the delay time is not :(
How can it be optimized?
Right now I am in demo mode, trying to understand whether the platform will let me get a response in 1 second.
7 Replies
The very first request will probably take a bit longer due to the cold start, but subsequent requests should be pretty fast. If you have constant traffic, the delay should be minimal.
Also, have you included your model in the Docker image? Avoid downloading anything when the container starts, since that can slow things down.
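To illustrate the point about not loading anything at request time: do expensive setup at module import (paid once, during cold start) rather than inside the handler. This is a minimal sketch, assuming a generic serverless-style handler function; `load_model` is a hypothetical stand-in for your real model load.

```python
import time

def load_model():
    # Hypothetical stand-in for an expensive load (reading weights
    # baked into the image, warming caches, etc.).
    time.sleep(0.05)
    return object()

# Loaded once at import time, so the cost lands in cold start,
# not in every request's delay time.
MODEL = load_model()

def handler(event):
    # The handler itself only does inference; nothing is loaded here.
    start = time.perf_counter()
    result = {"output": "ok"}  # placeholder for real inference with MODEL
    result["latency_ms"] = (time.perf_counter() - start) * 1000
    return result
```

If the load were inside `handler`, every request would pay those 50 ms again.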
300 MB is the entire size of the container; nothing is loaded on top of it, and it is immediately ready to execute the command.
My cold start time is 250-350 ms, which is fine...
The execution time, for example, is 2 seconds. I will work on that; it's fine too...
But the "delay time" is ~1200 ms, and that is what I would like to get rid of. Are there any options to reduce it, or is there no way around it on your infrastructure?
Can you share the endpoint ID? I can take a look.
I'm not strong in Python. Could it be the startup time of the HTTP server inside the container? Does the time Python itself takes to start affect any of these metrics?
I tried packing everything with PyInstaller. It helped reduce my Docker image size even further, but it didn't seem to have any effect on the delay time.
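Interpreter startup can be measured directly to rule it in or out. A rough sketch: time a bare interpreter launch versus one that performs your server's imports (here `json` and `http.server` are placeholder imports; substitute whatever your server actually loads). On most machines both are tens of milliseconds, so this rarely explains a ~1200 ms delay by itself.

```python
import subprocess
import sys
import time

def startup_ms(code: str) -> float:
    # Launch a fresh interpreter running `code` and time it end to end.
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return (time.perf_counter() - start) * 1000

bare_ms = startup_ms("pass")
import_ms = startup_ms("import json, http.server")
print(f"bare interpreter: {bare_ms:.0f} ms, with imports: {import_ms:.0f} ms")
```

If `import_ms` is far above `bare_ms`, heavy imports are worth deferring or trimming; if both are small, the delay is elsewhere (e.g. the queue).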
After you send a request, it goes into a queue, and the worker takes a bit of time to wake up and pick it up. From what I can see, most of your requests are picked up from the queue within 300 ms to 1 second.
If you set active workers to 1 and run a few tests, it would be good to see how it performs in that setup.
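To compare the two setups objectively, it helps to measure a batch of requests and look at median and tail latency rather than a single run. A minimal sketch; the lambda stub below stands in for a real call to your endpoint (the URL, payload, and HTTP client are yours to fill in).

```python
import statistics
import time

def measure_latency(call, n=10):
    # Invoke `call` n times and report median and ~p95 latency in ms.
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    ranked = sorted(samples)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": ranked[max(0, int(n * 0.95) - 1)],
    }

# Replace the stub with a real request to your endpoint,
# e.g. an HTTP POST with your payload.
stats = measure_latency(lambda: time.sleep(0.01))
print(stats)
```

Running this once with no active workers and once with active workers set to 1 should show directly how much of the ~1200 ms is queue wake-up time.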