[RUNPOD] Minimize Worker Load Time (Serverless)

Hey fellow developers,

I'm currently facing a challenge with worker load time in my setup. I'm using a network volume for models, which is working well. However, I'm struggling with my Dockerfile re-installing Python dependencies, which takes around 70 seconds.

API request handling is smooth, clocking in at 15 seconds, but if the worker goes inactive, the 70-second wait for the next request is a bottleneck. Any suggestions on optimizing this process? Can I use a network volume for Python dependencies like I do for models, or are there any creative solutions out there? Sadly, no budget for an active worker.

Thanks for your insights!
Solution
Initializing models over a network volume can be inherently slow because you're loading from a different hard drive. If you can, it's easier to bake the model into the Docker image, as ashelyk said.
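If you bake the model into the image but still want the network volume as a fallback, a rough sketch of the lookup could be something like this. Both paths and the `resolve_model_dir` helper are made up for illustration, so swap in whatever your image and volume layout actually use.

```python
from pathlib import Path

# Hypothetical locations; adjust to your own image and volume layout.
BAKED_MODEL_DIR = Path("/app/models")             # assumed: copied in at image build time
VOLUME_MODEL_DIR = Path("/runpod-volume/models")  # assumed: network volume mount point


def resolve_model_dir(candidates=(BAKED_MODEL_DIR, VOLUME_MODEL_DIR)):
    """Return the first candidate directory that exists, or None if none do."""
    for path in candidates:
        path = Path(path)
        if path.is_dir():
            return path
    return None
```

Listing the baked-in directory first means the worker only reads from the slower network volume when the image doesn't already contain the model.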

Your other option is to increase the idle timeout after a worker becomes active. That way, the first request initializes the model into VRAM, and subsequent requests are easy for the worker to pick up.
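To get the benefit of a longer idle timeout, the handler has to keep the model around between requests instead of reloading it every time. Here's a minimal sketch of that pattern, assuming a placeholder `load_model` (hypothetical; replace it with your real loader) and a job payload shaped like `{"input": {...}}`:

```python
import time

_MODEL = None  # module-level cache: lives as long as the worker process stays warm


def load_model():
    """Placeholder for the expensive load (hypothetical); replace with your real loader."""
    time.sleep(0.1)  # stands in for reading weights and moving them into VRAM
    return {"name": "dummy-model"}


def get_model():
    """Load the model on first use, then reuse the cached instance."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model()
    return _MODEL


def handler(job):
    """Handle one request; only the first call on a fresh worker pays the load cost."""
    model = get_model()
    prompt = job["input"].get("prompt", "")
    return {"model": model["name"], "echo": prompt}


# With the RunPod Python SDK, the handler would typically be registered like this:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

While the worker stays within its idle timeout, `_MODEL` is already populated, so follow-up requests skip the load entirely. Once the worker scales down, the next request is a cold start again.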