RunPod•5mo ago
foxhound

[RUNPOD] Minimize Worker Load Time (Serverless)

Hey fellow developers, I'm currently facing a challenge with worker load time in my setup. I'm using a network volume for models, which is working well. However, I'm struggling with my Dockerfile reinstalling Python dependencies, which takes around 70 seconds. API request handling is smooth, clocking in at 15 seconds, but if the worker goes inactive, the 70-second wait before the next request is a bottleneck. Any suggestions for optimizing this? Can I use a network volume for Python dependencies like I do for models, or are there any creative solutions out there? Sadly, no budget for an active worker. Thanks for your insights!
25 Replies
ashleyk
ashleyk•5mo ago
Your Dockerfile installs the Python dependencies directly into the Docker image/container, so there is no need to install them on the network volume, and they are most certainly not the bottleneck. Dockerfile dependency installation only happens when you build the image; it does not run when your worker loads. You probably want to bake your model into the Docker image instead, since loading models from network storage is extremely slow, especially for large models.
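As a rough sketch of what that looks like (the base image tag, file names, and model URL below are placeholders, not anything from this thread):
```dockerfile
# Dependencies AND the model are baked in at BUILD time, so the worker
# never reinstalls or redownloads anything on a cold start.
# Base image tag and model URL are placeholder assumptions.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Installed once during `docker build`, not on worker boot.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the model into the image instead of reading it off a network volume.
RUN mkdir -p /app/models && \
    wget -q -O /app/models/model.safetensors \
        https://example.com/your-model.safetensors

COPY handler.py .
CMD ["python", "-u", "handler.py"]
```
The trade-off is a bigger image, but you pay the download cost once at build time instead of on every cold start.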
foxhound
foxhound•5mo ago
You're right. Then I guess the issue lies in the initial loading of models into VRAM before preprocessing. Disabling model offloading helps when it's enabled; otherwise, everything gets reinitialized.
Solution
justin
justin•5mo ago
Initializing models over a network volume is inherently slow because you're loading from a different hard drive. If you can, it's easier to bake the model into the Docker image, as ashleyk said. Your other option is to increase the idle timeout after a worker is active; that way your first request initializes the model into VRAM, and subsequent requests are easy for the worker to pick up.
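A sketch of that pattern with the runpod Python SDK (the model loader here is a dummy stand-in for whatever framework you actually use):
```python
# handler.py -- load the model ONCE at worker start, outside the handler,
# so a worker that stays alive through its idle timeout keeps the model
# in VRAM and later requests skip initialization entirely.
import runpod

def load_my_model():
    # Placeholder for your real loader (torch, diffusers, etc.).
    class DummyModel:
        def generate(self, prompt: str) -> str:
            return f"generated for: {prompt}"
    return DummyModel()

MODEL = load_my_model()  # runs once per worker, not once per request

def handler(job):
    # Per-request work only; the heavy load already happened above.
    prompt = job["input"].get("prompt", "")
    return {"output": MODEL.generate(prompt)}

runpod.serverless.start({"handler": handler})
```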
Jack
Jack•5mo ago
I'm facing a similar issue running A1111 on Serverless Endpoints; it takes about 60-70 seconds to start up to perform a 3-second generation task. Is it possible to bake a customized A1111 instance into a Docker image and have the serverless endpoint load that Docker image directly, skipping the step where the endpoint loads from a Network Volume containing the A1111 instance?
justin
justin•5mo ago
Yes. Your Docker image is a self-contained snapshot, so you can bake whatever models you want into it.
justin
justin•5mo ago
https://github.com/justinwlin/AudioCraft-Runpod-Serverless-and-GPU-Pod-NOT-A-RUNNING-EXAMPLE
justin
justin•5mo ago
An example of me prebaking an AudioCraft ML model into my Docker image: the Dockerfile runs a preload script, which essentially executes a function to go and fetch the model at build time.
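The preload idea in a nutshell (sketch only; the repo id below is an illustrative assumption, not the one from the repos above):
```python
# preload.py -- executed at BUILD time via `RUN python preload.py`
# in the Dockerfile, so the weights land inside the image itself.
# The model repo id is an illustrative assumption.
from huggingface_hub import snapshot_download

# Downloads into the image's Hugging Face cache during `docker build`;
# at runtime the worker reads from local disk, not the network.
snapshot_download("openai/whisper-large-v2")
```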
justin
justin•5mo ago
https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile A WhisperX model, where I call a preload function you can actually see in the repo.
ashleyk
ashleyk•5mo ago
There is no way around this. A1111 takes very long to start up; you will have to use diffusers instead of the bloated monstrosity that is A1111. @Jack also ensure that you have FlashBoot enabled.
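For reference, a minimal diffusers sketch of the kind of swap ashleyk means (the model id and settings are common SDXL defaults, shown here as assumptions):
```python
# Minimal diffusers sketch, the lighter alternative to A1111.
# Model id and dtype are common defaults, treated as assumptions here.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a fox in a forest, photorealistic").images[0]
image.save("out.png")
```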
Jack
Jack•5mo ago
Wait, so is it possible to load A1111 onto a Docker image and skip using a Network Volume like @justin mentioned? I'm not too familiar with Docker, but it seems like there are some GitHub repos offering a Docker container for A1111, like this one: https://github.com/AbdBarho/stable-diffusion-webui-docker
ashleyk
ashleyk•5mo ago
@Jack if you want to do this you are on your own, or you can pay someone a consulting fee to do it for you. RunPod simply provides the infrastructure; they can't hold your hand every step of the way and do everything for you.
justin
justin•5mo ago
@Jack The short answer is yes 🙂 However you loaded A1111 onto a Network Volume, you can do the same thing in Docker. All a Dockerfile is, is a text file telling Docker how to build a snapshot, so you can say: take this model and put it into the snapshot under this folder. I wouldn't recommend building on stable-diffusion-webui-docker, because you really want the image to be based on the RunPod template; that will save you a lot of trouble, like my templates / answers above. I recommend starting with the RunPod PyTorch template on a GPU Pod and seeing if you can go through the manual installation steps yourself. I definitely wasn't familiar with Docker when I started either, but asking ChatGPT and Phind is great: https://phind.com/ I would maybe start off looking at my AudioCraft repository: start from a PyTorch base and just practice COPYing a test.py file into a folder, then pushing the image to Docker Hub and using it. https://github.com/justinwlin/FooocusRunpod This is an example of me doing that for Fooocus.
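That practice run can literally be two Dockerfile lines plus a build and push (all names and tags below are placeholders):
```dockerfile
# Smallest possible practice loop: base off the RunPod PyTorch template,
# copy one file in, build, push. Image names/tags here are placeholders.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
COPY test.py /app/test.py

# Then, from your terminal:
#   docker build -t yourname/test-worker:0.1 .
#   docker push yourname/test-worker:0.1
```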
justin
justin•5mo ago
That way you keep iteration time low, and then you can increase the complexity.
Jack
Jack•5mo ago
Super helpful comment, thanks justin. You're right about not using the pre-made stable-diffusion-webui-docker; I was running into some trouble with it. I'm going to try the approach you mentioned, start with a basic PyTorch template, and go from there, see if I can build this Dockerfile one line at a time.
justin
justin•5mo ago
https://discord.com/channels/912829806415085598/1194695853026328626/1194695853026328626 Here is a resource I just wrote on getting started, aggregating the different topics on it.
Jack
Jack•5mo ago
Thanks @justin for that resource. Especially thanks for mentioning Depot; it's honestly a lifesaver for me, as I'm using a MacBook for development as well, and dealing with Docker locally is a nightmare. I found an official A1111 serverless worker by RunPod, but it's not actively maintained and has open issues. Either way, it's a great starting point for using A1111 on a worker without needing Network Volumes: https://github.com/runpod-workers/worker-a1111
justin
justin•5mo ago
@Jack I have never used it myself, but another community member wrote a pretty extensive one here: https://github.com/ashleykleynhans/runpod-worker-a1111
justin
justin•5mo ago
He even documented things much better than A1111 (whose docs I personally found none of lol). Glad you enjoy Depot haha, I do too. I'm on a Mac, so local Docker is also a nightmare, and sometimes sooooooo slow 😦
justin
justin•5mo ago
Generative Labs is good too: https://www.youtube.com/@generativelabs
justin
justin•5mo ago
I guess the issue with both approaches is that they load from a network volume, which can be slow, but maybe that's just necessary for A1111 given the number and size of the models.
justin
justin•5mo ago
Or, if what you're doing is nothing too crazy: https://docs.runpod.io/reference/stable-diffusion-xl RunPod runs its own endpoints you can use instead of launching your own.
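Calling a hosted endpoint is just an HTTP request; a sketch (the endpoint path and input schema follow the docs link above, so double-check them there before relying on this):
```python
# Sketch of calling RunPod's hosted SDXL endpoint instead of running
# your own worker. Endpoint path and input fields are assumptions
# based on the docs linked above; verify there.
import os
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/sdxl/runsync",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "a fox in a forest, photorealistic"}},
    timeout=120,
)
print(resp.json())
```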
Jack
Jack•5mo ago
Yeah, 8GB of RAM on a Mac with Docker is 💀 💀 And yeah, using a Network Volume just isn't the way to go for A1111. A 60-70 second cold start for a 3-second photo generation is too long. Loading everything onto the Docker image is 100% the way to go.
justin
justin•5mo ago
(I don't use A1111, btw, lol; I'm a Fooocus person xD, so I have zero clue about it. And I don't believe Fooocus has an API, ooof.)
Jack
Jack•5mo ago
I saw that, but they use diffusers. I need to load some extensions for A1111 for my app, so I need to use A1111 as the base for my app.
justin
justin•5mo ago
Ah, makes sense. Hmm, good luck, yeah. If you do end up staying bottlenecked on the RunPod worker thing, maybe check out ashleyk's repo, or, as the message above said, start with the RunPod PyTorch template, see if they did anything special for A1111, and just go line by line. A lot of my first attempts were just asking ChatGPT to combine two Docker images xD (with varying success loll). GL!