RunPod • 6mo ago
Blanchon.jl

Best practices

Hey πŸ‘‹πŸ» I have a few questions regarding Runpod serverless, specifically related to image generation tasks (like Stable Diffusion). 1. Storage - NVME vs Network Volume: I've read in some posts that storing models directly in the docker container is more cost and speed-efficient compared to using a network volume. For tasks involving various Stable Diffusion templates, does this mean all templates must be stored within the Docker container? 2. Instance Warm-Up: Is there a way to pre-warm an instance based on a custom event from my side? For instance, if a user logs into my platform, and there's a high likelihood they'll initiate a computation, can I pre-start a worker for them? What would be the best approach to do this? Maybe a dummy call to the handler to activate a worker without triggering any calculations? 3. Intermediate/Real-Time Results: What's the best method for sending interim results to the client in image generation tasks? For language models, this is typically done using a 'yield' generator, but I'm unsure if this applies to image tasks like real-time Stable Diffusion. Is there something like runpod.serverless.progress_update for this purpose? How would this function on the client side? 4. Worker Consistency: I plan to use a single template for about 20 different jobs that share a common base but load different models into VRAM. If a client uses a real-time Stable Diffusion task and a worker loads the necessary models into VRAM for that task, can the subsequent use of this task be linked to the same worker? For instance, if a client makes a request, the worker processes it and then a second request follows shortly after. Is there a way to ensure this second request goes to the same worker to avoid reloading models into VRAM? Thanks a lot for your help. Any level of detail in your response is appreciated ❀️
13 Replies
Blanchon.jl • 6mo ago
@flash-singh and @ashleyk: I'm pinging you because I've seen many answers from you on this subject 😉 ashleykleynhans, in your great ashleykleynhans/runpod-worker-inswapper repo you provide both a network volume and a standalone setup. Which one is better in your experience?
flash-singh • 6mo ago
1. If you can store data in the Docker image rather than a network volume, it is more efficient.
2. You can make a no-op event request that completes immediately, then use the idle timeout to keep the worker warm for x seconds; we may introduce a feature that lets you do that without any additional code.
3. Yes, progress update is better; it's a webhook you can trigger with a function call in your handler. Yield is used for streaming.
4. This is a bit complicated and we are looking at ways to solve it. Right now, a worker launches, takes a request, loads model x into VRAM, and finishes the request; when it picks up another request, if the model is x it can avoid loading into VRAM, but if the model is y it has to load the new model. This isn't the most efficient and we haven't quite found a good way to handle that case. In the future we are thinking about a way for a worker to communicate "I can only handle x workload, try to give me that"; worst case it waits idle and spins down, best case it avoids reloading a model into VRAM.
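To make points 2-4 above concrete, here is a minimal handler sketch. It assumes diffusers is installed in the image; the "warmup", "model", and "prompt" input fields are illustrative names, not part of RunPod's API, while runpod.serverless.progress_update is the SDK call flash-singh refers to for interim status.

```python
import base64
import io

import runpod
from diffusers import StableDiffusionPipeline

# Cache for the currently loaded pipeline, so a follow-up request that needs
# the same model on this worker skips the VRAM reload (point 4).
_current_model = None
_pipeline = None


def _get_pipeline(model_name: str):
    """Load `model_name` into VRAM only if it isn't already loaded."""
    global _current_model, _pipeline
    if model_name != _current_model:
        _pipeline = StableDiffusionPipeline.from_pretrained(model_name).to("cuda")
        _current_model = model_name
    return _pipeline


def handler(job):
    job_input = job["input"]

    # Point 2: a no-op "warm-up" request returns immediately; the worker then
    # stays warm for the endpoint's configured idle timeout.
    if job_input.get("warmup"):
        return {"status": "warm"}

    pipeline = _get_pipeline(job_input["model"])

    # Point 3: push an interim status; clients see it while polling the
    # endpoint's /status/{job_id} route.
    runpod.serverless.progress_update(job, "model loaded, generating image")

    image = pipeline(job_input["prompt"]).images[0]

    # Return the image as base64 so it survives the JSON response.
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return {"image_base64": base64.b64encode(buffer.getvalue()).decode("utf-8")}


runpod.serverless.start({"handler": handler})
```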
ashleyk • 6mo ago
Standalone is better, you don't need a network volume for it.
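For reference, "baking the weights into the image" usually just means downloading them at build time. A minimal sketch, assuming the models are hosted on Hugging Face and that the Dockerfile runs this script in a RUN step; the model IDs and the /models path are placeholders:

```python
# download_models.py - run once at image build time
# (e.g. "RUN python download_models.py" in the Dockerfile) so the weights
# ship inside the container instead of living on a network volume.
from huggingface_hub import snapshot_download

# Placeholder list; swap in the checkpoints your templates actually use.
MODELS = [
    "stabilityai/sdxl-turbo",
    "runwayml/stable-diffusion-v1-5",
]

for repo_id in MODELS:
    # Store each model under /models/<name> inside the image.
    snapshot_download(repo_id=repo_id, local_dir=f"/models/{repo_id.split('/')[-1]}")
```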
Blanchon.jl • 6mo ago
Yes, 4 may become a pretty common workflow in the future, and since we (users) don't have any control over the routing, this could be quite blocking. Thank you very much 🙌🏻 4 is even more important when you have something like a chain of calls with the same models (like an SDXL Turbo drawing app) and you want to be as close to real-time as possible. Maybe in-house session management inside the handler script could solve part of the problem, but that looks like overkill.
justin • 6mo ago
For #4, I think, as you said, the in-house management is probably the easiest, tbh. I'd just spin up a proxy server on fly.io or somewhere as an endpoint, and have it hit dedicated endpoints for each model that is already built into the Docker container. The problem with RunPod, I think, is that unless you work with RunPod directly, you can't exceed 30 workers. The only thing I can think of is to make extra accounts. Baking into the Docker image is always going to be faster than reading off another drive, so I find staying away from network volumes is the best bet.
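A rough sketch of the proxy idea: keep one RunPod endpoint per model (each image with its model baked in) and route every request to the matching endpoint. The ENDPOINTS mapping and input fields below are hypothetical; the https://api.runpod.ai/v2/{endpoint_id}/run call with Bearer auth follows RunPod's serverless API.

```python
import os

import requests

# Hypothetical mapping from model name to a dedicated RunPod endpoint whose
# Docker image already has that model baked in.
ENDPOINTS = {
    "sdxl-turbo": "YOUR_SDXL_TURBO_ENDPOINT_ID",
    "sd15": "YOUR_SD15_ENDPOINT_ID",
}

API_KEY = os.environ["RUNPOD_API_KEY"]


def submit_job(model: str, prompt: str) -> dict:
    """Route the request to the endpoint dedicated to `model`."""
    endpoint_id = ENDPOINTS[model]
    response = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/run",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"model": model, "prompt": prompt}},
        timeout=30,
    )
    response.raise_for_status()
    # The response contains the job id, which you can poll via /status/{id}.
    return response.json()
```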
Blanchon.jl • 6mo ago
Wow, I wasn't aware of the RunPod worker limitation; I might exceed it very quickly. I hope it's easy to "work with RunPod" anyway.
justin • 6mo ago
As long as you have okay usage, I've seen RunPod increase the limit for a lot of people on this server; just let a staff member know.
Blanchon.jl • 6mo ago
Yes, I think I will go with baking the model into the image
justin • 6mo ago
Usually you have to load your account with 300 dollars to get the serverless worker limit increased to 30 (200 dollars for 20). Past that, it's just a matter of reaching out to staff, but I've seen a lot of people screenshot like 50 workers lol on a single endpoint.
justin • 6mo ago
[screenshot attachment]
justin • 6mo ago
It's this invisible button with an up icon and not a secondary button color >.<, but if your use case has higher traffic it's probably better to DM a staff member / post a request.
Blanchon.jl • 6mo ago
Yep, I'm making an image generation SaaS; if it goes well (let's hope it will go well ^^), I might exceed that. Thank you very much, I appreciate getting some user feedback.
ashleyk • 5mo ago
Enterprise customers who spend a large amount on RunPod can get more than 30 workers. It's not a hard limit, just the maximum you can increase it to yourself; RunPod raises it further for enterprise customers.