Disk Volume Pricing for Serverless

I'm looking for clarification on disk pricing for serverless workers. The pricing page lists a Container Disk price of $0.10/GB/month for running pods (and $0.20/GB/month for idle pods). How does this translate to serverless workers? When I create a template for my endpoint I specify a Volume Disk size (e.g. 20 GB); how am I charged for this? 20 * $0.20 * number of workers per month (assuming the workers are idle)? In that case, does it make more sense to use a network volume shared between the workers of the endpoint? Side question: is there a way to programmatically push data to a network volume? Thanks for your help!
Dj · 2w ago
My understanding is you're only billed for the existence of the network storage disk, not for how you use it. Pods are priced differently; storage is priced on its own:
For disks <= 1 TB: $0.07/GB/month
For disks > 1 TB: $0.05/GB/month
The Create Network Volume page has a nice little calculator; the number there is all you should expect to be charged. https://www.runpod.io/console/user/storage/create
PS: Not yet, but keep an eye out for an announcement here about using S3 APIs to access Network Volumes in a few weeks.
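A quick sketch of that math, using the rates quoted above (treating the whole disk as billed at one rate and 1 TB as 1024 GB are my assumptions; the calculator on the Create Network Volume page is the authoritative number):

```python
# Estimate monthly network volume cost at the quoted rates.
def monthly_storage_cost(size_gb: float) -> float:
    # Assumption: the whole disk is billed at a single rate,
    # with the tier boundary at 1 TB = 1024 GB.
    rate = 0.07 if size_gb <= 1024 else 0.05  # $/GB/month
    return size_gb * rate

print(monthly_storage_cost(20))    # 20 GB -> 1.40
print(monthly_storage_cost(2048))  # 2 TB  -> 102.40
```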
neural-soupe (OP) · 2w ago
Thank you for clarifying. So I'm charged for the disk space specified in my template, regardless of how many workers I have? So a Volume Disk of 20 GB in my template comes down to 20 * $0.07 = $1.40/month, regardless of how many workers my endpoint has?
DIRECTcut ▲ · 2w ago
Disclaimer: I don't work at RunPod, just a random techie, so take this with a grain of salt. When you create a volume, you get a chunk of disk space in one of the regions. You pay for the space allocated, regardless of how much data is on it (it might be 100% full or completely empty; it doesn't matter). It also doesn't matter how many workers are using your disk, since the disk is mounted into each worker, not duplicated per worker. There might be a dip in speed when multiple workers try to read/write the same file.
neural-soupe (OP) · 2w ago
That's the Network Volume, right? There is also the Disk Volume (https://docs.runpod.io/pods/storage/types#disk-volume), which is specified when creating a template for a serverless endpoint. I think the Disk Volume corresponds to the volumeInGb parameter in the Create Template API endpoint (https://rest.runpod.io/v1/docs#tag/templates/POST/templates), while the Network Volume would be specified in the networkVolumeId parameter of the Create Endpoint API endpoint (https://rest.runpod.io/v1/docs#tag/endpoints/POST/endpoints).
As I understand it, the Disk Volume is different from the Network Volume, and I believe the Disk Volume is not shared between workers, so I am wondering how I am charged for disk space: once, or per worker (since every worker will have its own separate disk space)?
The main reason I am asking is that I currently download models onto my serverless workers when the handler receives its first request, which means that every time a worker gets its first request I have to wait for the models to download. Working with a Network Volume sounds like a better alternative, but I would need to be able to load the models onto it programmatically. I also wanted to understand the pricing of the two storage options.
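For reference, a rough sketch of where those two parameters would go when calling the REST API (only volumeInGb and networkVolumeId are taken from the docs links above; every other field name and value here is a placeholder assumption, so check the linked schema before using):

```python
# Sketch only: illustrates where volumeInGb and networkVolumeId fit.
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"  # placeholder
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Create a template with a 20 GB Volume Disk (per-worker container disk).
template = requests.post(
    "https://rest.runpod.io/v1/templates",
    headers=headers,
    json={
        "name": "my-serverless-template",      # assumed field name
        "imageName": "myrepo/myimage:latest",  # assumed field name
        "volumeInGb": 20,                      # the Disk Volume discussed above
    },
).json()

# Create an endpoint that mounts a shared network volume instead.
endpoint = requests.post(
    "https://rest.runpod.io/v1/endpoints",
    headers=headers,
    json={
        "name": "my-endpoint",            # assumed field name
        "templateId": template["id"],     # assumed response field
        "networkVolumeId": "vol_abc123",  # ID of an existing network volume (placeholder)
    },
).json()
print(endpoint)
```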
DIRECTcut ▲ · 2w ago
I'm using network volumes to store models. One piece of feedback: loading models from there seems about 3x slower than on my local machine (I am using ComfyUI); not sure if it would be different for a disk volume as opposed to a network volume. Anyway, using the disk volume to store models would increase cold starts, since you'd have to pull, say, 20 GB from the container cache instead of 10 GB if the models live on the network volume. Your containers will also take much longer to deploy, for the same reason.
Jason · 2w ago
It's charged at per-minute/per-GB granularity, if I'm not wrong. Container disk is charged like running pods, i.e. while your worker is running.
neural-soupe (OP) · 7d ago
@DIRECTcut ▲ I got it to work with a network volume, but like you said it takes much longer to load the models into VRAM. Have you figured out a way to reduce it? I'm going to test baking the models into the Docker image, but I'm worried about how that will perform when I need 40+ GB of models.
riverfog7 · 7d ago
The physical speed of the network volume is the bottleneck, so you can't reduce it. Baking the model in will make the image start time bad while the image is not cached, so cold start times will be bad for a while and then improve gradually as more hosts get the image cached. The best way to know for sure is to test it.
neural-soupe (OP) · 7d ago
Just tested with a baked model; not great either. A dirty hack I'm doing is keeping the models in a network volume, then copying them to the container disk outside the handler; this adds about 5-10 s when the worker is fresh. It's not super elegant, but the model loading times are much faster (13 s vs 60 s) and my image is not too big 🤷‍♂️ Looking forward to having a cleaner solution than this in the future.
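In case it helps anyone later, a minimal sketch of that copy-at-startup trick, assuming the network volume is mounted at /runpod-volume (the usual serverless mount path) and the models live under a models/ folder there; the local path and the model_file input field are placeholders:

```python
# Sketch of the "copy models from network volume at worker start" workaround.
import shutil
from pathlib import Path

import runpod  # RunPod Python SDK: pip install runpod

NETWORK_MODELS = Path("/runpod-volume/models")  # assumed mount path and layout
LOCAL_MODELS = Path("/models")                  # container disk, placeholder path

# Module-level code runs once when a fresh worker starts, before any request,
# so the copy cost (~5-10 s here) is paid only on cold workers.
if NETWORK_MODELS.exists() and not LOCAL_MODELS.exists():
    shutil.copytree(NETWORK_MODELS, LOCAL_MODELS)

def handler(job):
    # Load from the fast local copy instead of reading the network volume directly.
    model_path = LOCAL_MODELS / job["input"]["model_file"]
    # ... load model_path into VRAM and run inference here ...
    return {"loaded_from": str(model_path)}

runpod.serverless.start({"handler": handler})
```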
riverfog7 · 7d ago
That's strange. It should take a lot longer than 10 seconds, assuming the model size is 40 GB.
neural-soupe (OP) · 7d ago
No, the models are around 26 GB, but I was planning on working with larger models down the line (40 GB).
riverfog7 · 7d ago
How does the baked model perform? My guess is it takes forever to pull the image and loads the model fast.
neural-soupe (OP) · 7d ago
True, it seems faster; I will run some more tests later. Baked models definitely perform better, and I'm getting more consistent response times. But I'm not loving the 30+ GB Docker images.
