Hi Runpod! We've been using serverless pods for quite a while now. Most of our customer-facing work ran in the background, on demand, which meant we could tolerate the long warm-up times.
However, to meet our customers' demands, we have made several key improvements to our generation times.
That being said, our main bottleneck today is the infrastructure itself.
We use quite a few models to do the work for our customers, and have tried 3 different paths:
1. Baking the models into images in a private registry - intolerable. The images kept re-downloading layers that had not changed, which made this unsustainable during development unless we separated the models into their own layers. And even then, whenever we needed to add a new LoRA etc., it caused a lot of issues.
2. Downloading the models at boot time - too slow and unreliable.
3. Using a network volume - this is our current setup. We use a network volume to persist all our models, but it comes with a caveat: loading models into the GPU from a network volume is at least 10x slower than from a container volume - we are talking long minutes instead of short seconds.
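For context, the workaround we've been experimenting with on top of option 3 is staging weights from the network volume onto container-local disk once per cold start, so the actual GPU load reads from fast local storage. A rough sketch of that staging step is below - the mount paths are placeholders for illustration, not our real config:

```python
import shutil
import time
from pathlib import Path

# Placeholder paths - adjust to the actual network-volume mount point
# and a container-local scratch directory in your worker.
NETWORK_VOLUME = Path("/runpod-volume/models")
LOCAL_CACHE = Path("/models")


def stage_models(src: Path, dst: Path) -> float:
    """Copy model weights from the network volume to local disk once per
    cold start, so GPU loading reads from local storage instead of the
    network mount. Returns elapsed seconds."""
    start = time.monotonic()
    dst.mkdir(parents=True, exist_ok=True)
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():  # skip files already staged on a warm start
                shutil.copy2(f, target)
    return time.monotonic() - start


# Example usage (paths are placeholders):
# elapsed = stage_models(NETWORK_VOLUME, LOCAL_CACHE)
# print(f"staged models in {elapsed:.1f}s")
```

This helps on warm starts, but on a cold start it just moves the slow network read earlier rather than eliminating it, which is why we're asking about best practice.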
We need a better solution for this, and want to understand the current best practice for reducing these model load times.