RunPod•5mo ago
foxhound

[RUNPOD] Minimize Worker Load Time (Serverless)

Hey fellow developers, I'm currently facing a challenge with worker load time in my setup. I'm using a network volume for models, which is working well. However, I'm struggling with my Dockerfile reinstalling Python dependencies, which takes around 70 seconds. API request handling is smooth, clocking in at 15 seconds, but if the worker goes inactive, the 70-second wait before the next request is a bottleneck. Any suggestions for optimizing this? Can I use a network volume for Python dependencies like I do for models, or are there any creative solutions out there? Sadly, no budget for an active worker. Thanks for your insights!
25 Replies
ashleyk
ashleyk•5mo ago
Your Dockerfile installs the Python dependencies directly into the Docker image/container, so there is no need to install them on the network volume, and they are most certainly not the bottleneck. Dockerfile dependency installation only happens when you build the image; it does not run when your worker loads. You probably want to bake your model into the Docker image instead, since loading models from network storage is extremely slow, especially for large models.
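As a rough sketch of what that looks like (the base image tag, file names, and model URL below are placeholders, not anything from this thread):
```dockerfile
# Dependencies AND the model are baked in at BUILD time, so the worker
# never reinstalls or redownloads anything on a cold start.
# Base image tag and model URL are placeholder assumptions.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Installed once during `docker build`, not on worker boot.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the model into the image instead of reading it off a network volume.
RUN mkdir -p /app/models && \
    wget -q -O /app/models/model.safetensors \
        https://example.com/your-model.safetensors

COPY handler.py .
CMD ["python", "-u", "handler.py"]
```
The trade-off is a bigger image, but you pay the download cost once at build time instead of on every cold start.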
foxhound
foxhound•5mo ago
You're right. Then I guess the issue lies in the initial loading of models into VRAM before preprocessing. Disabling model offloading helps when it's enabled; otherwise, everything gets reinitialized.
Solution
justin
justin•5mo ago
Initializing models over a network volume is inherently slow because you're loading from a different hard drive. If you can, it's easier to bake the model into the Docker image, as ashleyk said. Your other option is to increase the idle timeout after a worker is active; that way your first request initializes the model into VRAM, and subsequent requests are easy for the worker to pick up.
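A sketch of that pattern with the runpod Python SDK (the model loader here is a dummy stand-in for whatever framework you actually use):
```python
# handler.py -- load the model ONCE at worker start, outside the handler,
# so a worker that stays alive through its idle timeout keeps the model
# in VRAM and later requests skip initialization entirely.
import runpod

def load_my_model():
    # Placeholder for your real loader (torch, diffusers, etc.).
    class DummyModel:
        def generate(self, prompt: str) -> str:
            return f"generated for: {prompt}"
    return DummyModel()

MODEL = load_my_model()  # runs once per worker, not once per request

def handler(job):
    # Per-request work only; the heavy load already happened above.
    prompt = job["input"].get("prompt", "")
    return {"output": MODEL.generate(prompt)}

runpod.serverless.start({"handler": handler})
```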
Jack
Jack•5mo ago
I'm facing a similar issue running A1111 on Serverless Endpoints; it takes about 60-70 seconds to start up to perform a 3-second generation task. Is it possible to bake a customized A1111 instance into a Docker image and have the serverless endpoint load that Docker image directly, skipping the step where the endpoint loads from a Network Volume containing the A1111 instance?
justin
justin•5mo ago
Yes. Your Docker image is a self-contained snapshot, so you can bake whatever models you want into it.
justin
justin•5mo ago
https://github.com/justinwlin/AudioCraft-Runpod-Serverless-and-GPU-Pod-NOT-A-RUNNING-EXAMPLE
justin
justin•5mo ago
An example of me prebaking an AudioCraft ML model into my Docker image: the Dockerfile runs a preload script, which essentially executes a function to go and fetch the model at build time.
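The preload idea in a nutshell (sketch only; the repo id below is an illustrative assumption, not the one from the repos above):
```python
# preload.py -- executed at BUILD time via `RUN python preload.py`
# in the Dockerfile, so the weights land inside the image itself.
# The model repo id is an illustrative assumption.
from huggingface_hub import snapshot_download

# Downloads into the image's Hugging Face cache during `docker build`;
# at runtime the worker reads from local disk, not the network.
snapshot_download("openai/whisper-large-v2")
```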
justin
justin•5mo ago
https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile A WhisperX model, where I call a preload function you can actually see in the repo.
ashleyk
ashleyk•5mo ago
There is no way around this. A1111 takes very long to start up; you will have to use diffusers instead of the bloated monstrosity that is A1111. @Jack also ensure that you have FlashBoot enabled.
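For reference, a minimal diffusers sketch of the kind of swap ashleyk means (the model id and settings are common SDXL defaults, shown here as assumptions):
```python
# Minimal diffusers sketch, the lighter alternative to A1111.
# Model id and dtype are common defaults, treated as assumptions here.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a fox in a forest, photorealistic").images[0]
image.save("out.png")
```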
Jack
Jack•5mo ago
Wait, so is it possible to load A1111 onto a Docker image and skip using a Network Volume like @justin mentioned? I'm not too familiar with Docker, but it seems like there are some GitHub repos offering a Docker container for A1111, like this one: https://github.com/AbdBarho/stable-diffusion-webui-docker
ashleyk
ashleyk•5mo ago
@Jack if you want to do this you are on your own, or you can pay someone a consulting fee to do it for you. RunPod simply provides the infrastructure; they can't hold your hand every step of the way and do everything for you.
justin
justin•5mo ago
@Jack The short answer is yes 🙂 However you loaded A1111 onto a Network Volume, you can do the same thing in Docker. All a Dockerfile is, is a text file telling Docker how to build a snapshot, so you can say: take this model and put it into the snapshot under this folder. I wouldn't recommend building on stable-diffusion-webui-docker, because you really want the image to be based on the RunPod template; that will save you a lot of trouble, like my templates / answers above. I recommend starting with the RunPod PyTorch template on a GPU Pod and seeing if you can go through the manual installation steps yourself. I definitely wasn't familiar with Docker when I started either, but asking ChatGPT and Phind is great: https://phind.com/ I would maybe start off looking at my AudioCraft repository: start from a PyTorch base and just practice COPYing a test.py file into a folder, then pushing the image to Docker Hub and using it. https://github.com/justinwlin/FooocusRunpod This is an example of me doing that for Fooocus.
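That practice run can literally be two Dockerfile lines plus a build and push (all names and tags below are placeholders):
```dockerfile
# Smallest possible practice loop: base off the RunPod PyTorch template,
# copy one file in, build, push. Image names/tags here are placeholders.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
COPY test.py /app/test.py

# Then, from your terminal:
#   docker build -t yourname/test-worker:0.1 .
#   docker push yourname/test-worker:0.1
```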
justin
justin•5mo ago
That way you keep iteration time low, and then you can increase the complexity.
Jack
Jack•5mo ago
Super helpful comment, thanks justin. You're right about not using the pre-made stable-diffusion-webui-docker; I was running into some trouble with it. I'm going to try the approach you mentioned, start with a basic PyTorch template, and go from there, see if I can build this Dockerfile one line at a time.
justin
justin•5mo ago
https://discord.com/channels/912829806415085598/1194695853026328626/1194695853026328626 Here is a resource I just wrote on getting started, aggregating the different topics on it.
Jack
Jack•5mo ago
Thanks @justin for that resource. Especially thanks for mentioning Depot; it's honestly a lifesaver for me, as I'm using a MacBook for development as well, and dealing with Docker locally is a nightmare. I found an official A1111 serverless worker by RunPod, but it's not actively maintained and has open issues. Either way, it's a great starting point for using A1111 on a worker without needing Network Volumes: https://github.com/runpod-workers/worker-a1111
justin
justin•5mo ago
@Jack I have never used it myself, but another community member wrote a pretty extensive one here: https://github.com/ashleykleynhans/runpod-worker-a1111
justin
justin•5mo ago
He even documented things much better than A1111 (whose docs I personally found none of lol). Glad you enjoy Depot haha, I do too. I'm on a Mac, so local Docker is also a nightmare, and sometimes sooooooo slow 😦
justin
justin•5mo ago
Generative Labs is good too: https://www.youtube.com/@generativelabs
justin
justin•5mo ago
I guess the issue with both approaches is that they load from a network volume, which can be slow, but maybe that's just necessary for A1111 given the number and size of the models.
justin
justin•5mo ago
Or, if what you're doing is nothing too crazy: https://docs.runpod.io/reference/stable-diffusion-xl RunPod runs its own endpoints you can use instead of launching your own.
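Calling a hosted endpoint is just an HTTP request; a sketch (the endpoint path and input schema follow the docs link above, so double-check them there before relying on this):
```python
# Sketch of calling RunPod's hosted SDXL endpoint instead of running
# your own worker. Endpoint path and input fields are assumptions
# based on the docs linked above; verify there.
import os
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/sdxl/runsync",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "a fox in a forest, photorealistic"}},
    timeout=120,
)
print(resp.json())
```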
Jack
Jack•5mo ago
Yeah, 8GB of RAM on a Mac with Docker is 💀 💀 And yeah, using a Network Volume just isn't the way to go for A1111. A 60-70 second cold start for a 3-second photo generation is too long. Loading everything onto the Docker image is 100% the way to go.
justin
justin•5mo ago
(I don't use A1111, btw, lol; I'm a Fooocus person xD, so I have zero clue about it. And I don't believe Fooocus has an API, ooof.)
Jack
Jack•5mo ago
I saw that, but they use diffusers. I need to load some extensions for A1111 for my app, so I need to use A1111 as the base for my app.
justin
justin•5mo ago
Ah, makes sense. Hmm, good luck, yeah. If you do end up staying bottlenecked on the RunPod worker thing, maybe check out ashleyk's repo, or, as the message above said, start with the RunPod PyTorch template, see if they did anything special for A1111, and just go line by line. A lot of my first attempts were just asking ChatGPT to combine two Docker images xD (with varying success loll). GL!