Unreasonably high start times on serverless workers

I'm trying to deploy a serverless endpoint for A1111 instances using a preconfigured network volume. I've followed the steps shown in this tutorial: https://www.youtube.com/watch?v=gv6F9Vnd6io
But my workers seem to keep running for multiple minutes, with the container logs filled with the same message: "Service not ready yet. Retrying..." Am I missing something here?
(Link preview: Generative Labs on YouTube, "Setting Up a Stable Diffusion API with ControlNet using RunPod Serverless")
17 Replies
justin (6mo ago)
Do you have a picture of your template? Just wondering. There are actually many reasons a network volume can be slow, but "Service not ready yet. Retrying..." suggests to me something else that isn't related to the network volume yet. Also, share your logs.
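For context, that retry message in A1111 serverless workers typically comes from a startup loop that keeps polling the local A1111 web API until it answers. Here is a rough sketch of that pattern; the port and endpoint below are assumptions for illustration, not the actual worker code:

```python
import time
import requests

A1111_URL = "http://127.0.0.1:3000"  # assumed local port; check your worker's start script

def wait_for_service(url: str, delay: float = 0.5) -> None:
    """Block until the local A1111 API responds. Loops like this are what
    print the repeated 'Service not ready yet. Retrying...' message."""
    while True:
        try:
            requests.get(f"{url}/sdapi/v1/sd-models", timeout=5)
            return
        except requests.exceptions.RequestException:
            print("Service not ready yet. Retrying...")
        time.sleep(delay)
```

If that loop never exits, A1111 itself never came up (for example, the network volume isn't mounted where the start script expects it, or webui crashed on launch), so the worker just keeps retrying.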
Shaggbagg (6mo ago)
Putting in the image is all I've done for setting up the template.
(screenshot attached)
Shaggbagg (6mo ago)
(screenshot attached)
Shaggbagg (6mo ago)
Here's what the logs look like
justin (6mo ago)
Hmm, that is very weird. I think for now just kill the request if you haven't already. Definitely seems hard to debug; maybe staff will know.
Shaggbagg (6mo ago)
Also ran A1111 inside a pod to make sure that's not the problem.
(screenshot attached)
justin (6mo ago)
Maybe you can try https://github.com/ashleykleynhans/runpod-worker-a1111. I know this one is pretty well documented, though I haven't tried it myself.
(Link preview: GitHub - ashleykleynhans/runpod-worker-a1111: RunPod Serverless Worker for the Automatic1111 Stable Diffusion API)
justin (6mo ago)
But either way this is weird; staff will probably have a better idea.
Shaggbagg (6mo ago)
I actually tried that one first and had the same problem, with high initialization times of around 90s.
justin (6mo ago)
I see. Did it work before, though? Not getting stuck?
Shaggbagg (6mo ago)
It did work
justin (6mo ago)
I see. I think your high initialization times, both with ashleyk's worker and (pending whatever the unknown issue is) the Generative Labs one, are because of the network volume. The main thing is that a network volume is a separate hard drive, so loading big models off a different hard drive can take a long time.

So, potentially, to get faster speed: build a custom Dockerfile modifying ashleyk's to have a folder laid out however it expects, and download your models into there :). You can use a platform like Depot to speed up the build: https://discord.com/channels/912829806415085598/1194693049897463848

The way I build the Dockerfile is to ask ChatGPT how to add it, by telling it the steps I took manually in a Jupyter notebook / terminal.
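A minimal sketch of what that could look like, assuming you base it on ashleyk's worker image. The base image tag, model path, and model URL below are placeholders and guesses you would need to check against the repo, not the actual values:

```dockerfile
# Sketch only: the base image name/tag is a guess; use whatever the
# runpod-worker-a1111 README actually tells you to build from.
FROM ashleykza/runpod-worker-a1111:latest

# Bake the model into the image at build time instead of reading it from
# a network volume at cold start. The destination path is a standard A1111
# layout guess and must match wherever the worker expects models.
# (Assumes wget exists in the base image; swap for curl if not.)
RUN wget -q -O /stable-diffusion-webui/models/Stable-diffusion/model.safetensors \
    "https://example.com/your-model.safetensors"
```

The trade-off is a much larger image, but the model then sits on the worker's local disk rather than a network volume, which is the slow part justin is describing.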
Shaggbagg (6mo ago)
If I'm not using a network volume with a preinstalled A1111 on it, won't my image have to install A1111 and download every needed model on every worker before servicing a request? I was planning on using the --skip-install command-line argument on a preinstalled A1111 to reduce load times for generations.
Jack (6mo ago)
Hey @Shaggbagg, I am working on the exact same problem as you. I started off with installing A1111 on a network volume and noticed the cold start times are extremely high, between 60-100 secs. Then @justin recommended installing everything directly in the Docker container and skipping the network volume altogether. I'm currently working on doing that right now, but running into some issues. I sent a friend request; maybe we can help each other since we're working on the same thing.
Shaggbagg (6mo ago)
Can you help me figure out what the cooldown period actually refers to? I assumed it was the time between finishing one request (while having none pending) and a new one coming in. Looking at these requests and workers, even with a 60s cooldown time the worker seems to die before handling a request I send within 5 secs of getting the previous response, which leads me to believe they may be calculating the cooldown differently than I expect.
(screenshot attached)
Shaggbagg (6mo ago)
@justin
justin (6mo ago)
Are you talking about delay time? What is this "cooldown period"? Delay time is all the time before execution, meaning the time the request sat in the queue before it got picked up by a worker; execution time is when the worker is actually working on it. You aren't being charged for delay time. You're being charged for the time the worker is running, plus the time the worker is active but maybe not doing anything (which is configurable in the advanced settings), plus cold start time on the worker.

Something I do, for example, is every time I get a request I let the worker stay active for another 2 minutes, so it can immediately pick up another request and avoid a cold start.

@Jack / @Shaggbagg Hmmm, I'm playing around with it too. I'm in the process of trying to see if this Dockerfile builds, and then I'm going to load it up on a GPU Pod and play around with it for debugging's sake.
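To make those pieces concrete, here is a minimal sketch of a RunPod serverless handler, assuming the standard runpod Python SDK; the A1111 call is omitted and the input field name is made up for illustration:

```python
import runpod  # RunPod serverless SDK (pip install runpod)

def handler(job):
    # Time the job spent waiting in the queue before a worker picked it up
    # is "delay time" (not billed). Everything inside this function is
    # "execution time", which is billed along with cold start and any
    # configured idle time.
    prompt = job["input"].get("prompt", "")  # field name is illustrative
    # ... call the local A1111 API here and return its output ...
    return {"echo": prompt}

# Keeping the worker warm after a job ("stay active for another 2 mins")
# is the endpoint's idle timeout setting in the RunPod console, not
# something set here in code.
runpod.serverless.start({"handler": handler})
```

The longer you let the worker stay active, the fewer cold starts you hit, but you pay for those idle seconds, which is the trade-off justin describes above.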