Recommended DC and Container Size Limits/Costs
Hello, I’m new to deploying web apps and currently using a persistent network drive along with serverless containers to generate images. My app requires at least 24GB of RAM, and I’ve encountered some challenges in my current region (EU-RO-1): there aren’t many A100 or H100 GPUs available, and most of the 4090 GPUs are throttled.
Recommended Data Centers: Are there specific geographic data centers you’d recommend for better GPU availability and performance?
Performance and Costs: Since my usage isn’t constant, the containers often ‘wake up’ from idle or after being used by someone else. When this happens, the models (ComfyUI) have to load, leading to generation times ranging from 20 seconds to 3-4 minutes. I assume this delay occurs because the models are loading from a network-mounted drive rather than locally.
If I preload the models onto the containers to avoid this transfer, will it increase my container costs?
Where can I find information about container size limits and associated pricing?
Additional Resources: Could you recommend sources to learn more about best practices, cost optimization, and efficient use of serverless containers for workloads like mine?
When you create a network storage volume, you can check which GPUs are available in which DC.
Yeah, it seems like H100s generally aren't available; they come and go. I will stick with EU-RO-1.
I appreciate you taking the time to respond! 🔥🤘🏆
It may or may not increase your container cost; if it's running (even just for moving files) it will be charged. But it's worth trying if your worker is active, because if your worker isn't always running, when it goes back to idle (off) the files will be lost and you'll have to move them again.
What container size limits? I don't think there are any, but if you have a larger Docker image (I don't know how big before it gets slow), it will surely take longer to load.
Oh, or do you mean having the model inside the image? Yeah, sure, people say it's faster as long as your models aren't too many or too huge.
Yeah, I'm thinking it's worth uploading my models (Flux1 and Shuttle3.1) once into the image and letting them load when the containers initialize. That way, when they get a job, loading is quick. I can see the large 2-minute delay happens when ComfyUI loads the model; it's currently on the mounted drive, and I think even though it's in the same DC, moving 23GB over the network is just too slow.
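For anyone curious, baking the models into the image could look roughly like this. This is a hedged sketch: the base image name, file names, and paths are placeholders, not the actual Flux/Shuttle locations or an official RunPod base image.

```dockerfile
# Hypothetical sketch: bake checkpoints into the image at build time
# so workers never pull 23GB from the network volume per cold start.
# Base image name and paths are placeholders -- adjust to your setup.
FROM runpod/worker-comfyui:latest

# Copy models from the build context into ComfyUI's checkpoint folder.
# (Alternatively, RUN wget/curl at build time to fetch them from storage.)
COPY models/flux1.safetensors      /comfyui/models/checkpoints/flux1.safetensors
COPY models/shuttle3.1.safetensors /comfyui/models/checkpoints/shuttle3.1.safetensors
```

The trade-off: a much larger image to build and push once, in exchange for skipping the per-worker network transfer.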
Yeah, so that is exactly what I am not sure about: is there a limit, or a cost tied to the size of the containers? I could not find anywhere to read about that.
You got it exactly; that's what I meant 👍
Any experience with that?
i don't mind having 100GB image, i only upload it once and let them deploy it.
I don't know if Runpod has a guide for best practices, but try checking the blog. Maybe use depot.dev (optional) to optimize your image, watch some YouTube videos on container building and some blogs on optimizing Docker images, and finally check the Runpod docs on loading models properly if you control the code (in this case, you want to launch ComfyUI before you call the serverless.start() function in your handler Python file).
Yeah, try it; if it's too slow, then maybe don't use that, or try splitting it into different images (maybe a bit difficult).
I do have some experience with putting a model inside the image, but mine wasn't that big. It works well.
Sure, I will try and report back 🙂 and it will be the first time I am useful to others on a discord channel haha 🙂
yeah, people are already complaining that sometimes it's 20 seconds and sometimes 4 minutes 🙂
Yup I'd like to hear that
So I need to ensure it is faster. Also, do you know what the 'always active' option on serverless is? Is that like a pod: always on, always charging?
seems good in terms of performance, but might not be smart to do at the start since my demand is still very low
not many people on the app yet.
I actually struggled with this last time too: long loading times with ComfyUI.
It was not only the model but the extensions too, but I halted my development, so I haven't looked any further yet.
Yes, it's always running, so the model you currently have loaded will still be there.
Yup agreed
I think since the data center is most likely an outsourced one, we can't trust that it's a real LAN with 1Gb speeds etc. So even though it's local, there's a 2-3 minute transfer each time, which seems like overkill.
Yeah, so this will be great for later; this way they'll be reserved for me too, and I can grab all the H100s I need over time.
I'd suggest, if you have multiple models, making sure they aren't unloaded, to keep it fast; you can use a higher-VRAM GPU so it accommodates them all.
I've never tried this, but it seems like a great idea to not unload the model every time.
FlashBoot on serverless actually seems to reduce cold starts by keeping the model warm on the GPU so it loads faster; that's why something like this can be used.
What do you mean?
I only use one model at a time, and only have two in total. The issue is that even if I don't unload it, when the container moves to idle the model is still there, but after 2 minutes or so the container refreshes or something. Not sure exactly how it works, but I noticed that if I queue lots of jobs it works faster since it doesn't unload, whereas if I wait 5 minutes between requests the container worker 'forgets' and needs to reload.
am i missing something?
Network storage on a different data center you're implying?
well, logically it is, but it could be a VLAN; you never know with these things. I'm originally a TCP networking and routing engineer, been around many data centers.
Yeah sometimes if you have a lot of requests the "flashboot" keeps your model loaded
exactly. I wonder if the FlashBoot reset time can be adjusted; I recall seeing something about it.
So when it "refreshes" it unloads.
You cannot, hahah, but the more requests you have (I'm guessing), the longer it'll keep it.
Or the more time your worker is active.
yes, can't set that.. unless I hack into their core system and find that parameter hahah
That execution timeout limits how long a job can stay "running".
yeah, that's good, since a generation can get stuck and charge me an arm and a leg :)
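As an illustration of why that timeout matters (this is not the RunPod setting itself, just a generic belt-and-braces pattern; `generate()` is a placeholder for the real ComfyUI call):

```python
# Generic safety-timeout pattern: cap how long a single generation may run,
# so a stuck job fails fast instead of billing indefinitely.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def generate(prompt):
    # Placeholder for the real image-generation call.
    return f"image for {prompt}"

def generate_with_timeout(prompt, timeout_s=120):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(generate, prompt)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            # Caveat: the worker thread itself keeps running to completion;
            # this only stops the caller from waiting on it forever.
            return None
```

The platform-level execution timeout plays the same role one layer up: it hard-stops the billed "running" time for a job.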
I am still new to this, and I aim to learn every tiny bit of it; it's the only way to get the most out of it. Serverless is brilliant! It allows a business to grow 🙂 A normal 4090 in the cloud is like 2k monthly.
Oh, that's cool. Yeah, I've never looked into their core systems, but I believe it's in the same data center, just with low throughput so it can handle the massive demand.
Hope you like runpod hahah
I LOVE runpod.
And I am actually doing a project with one of their competitors, but they don't have serverless, and Runpod's SDK is easy and simple; they are ahead of the game.
Tested it with a heavier container with all models inside it. Such a huge difference, it loads super quick!!!
Nicee
So how big did it end up
90GB
Not the end of the world
Generating 4x1024 images on the Flux model in seconds! Using H100s/A100s or even 4090s.
Sent it to you in private to check out (don't want everyone here burning through my credit hahah)
Out of curiosity, is downloading the models part of the Dockerfile configuration? Didn't building and pushing the image take forever?
Not really, if you have a great internet connection; it also depends on how big your model is.
I don't think my internet connection is what matters. I'm concerned about the internet connection of the workers.
If every time a worker starts, it has to download a 90GB image, wouldn't that take a long time?
Let's test it first then; if it takes too long, come back here again.