Runpod3w ago
Omer

What are the best practices when working with network volumes and large models

Hi Runpod! We've been using serverless pods for quite a while now. Most of our customer serving ran in the background, on demand, which meant we could tolerate the long warm-up times. However, to meet our customers' demands we have made several key improvements to our generation times, and as a result our main bottleneck today is the infrastructure itself. We use quite a few models to do the work for our customers, and have tried three different approaches:

1. Baking the models into images in a private registry - intolerable. The images kept re-downloading layers that had not changed, which made it unsustainable during development unless we separated out the models, and even then, adding a new LoRA or similar caused a lot of issues.
2. Downloading the models at boot time - too slow and unreliable.
3. Using a network volume - our current setup. We persist all our models on a network volume, but this comes with a caveat: loading models onto the GPU from a network volume is at least 10x slower than from a container volume. We are talking long minutes instead of short seconds.

We need a better solution for this and want to understand the current best practice for reducing these load times.
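Since the thread never received a public answer, here is one common mitigation pattern for option 3, sketched under stated assumptions rather than as an official Runpod recommendation: treat the network volume as cold storage and do a single sequential bulk copy to container-local disk the first time a worker needs a model, so only the first request on each worker pays the slow network-volume read. The paths `/runpod-volume/models` and `/models-cache` below are hypothetical placeholders, not documented mount points.

```python
import shutil
from pathlib import Path

# Assumed paths (hypothetical): network volume mount and a
# container-local cache directory on the worker's own disk.
NETWORK_VOLUME = Path("/runpod-volume/models")
LOCAL_CACHE = Path("/models-cache")

def ensure_local_copy(model_name: str,
                      src_root: Path = NETWORK_VOLUME,
                      dst_root: Path = LOCAL_CACHE) -> Path:
    """Copy a model directory from the network volume to local disk once.

    Subsequent calls on the same warm worker find the local copy and
    skip the network read entirely.
    """
    src = src_root / model_name
    dst = dst_root / model_name
    if not dst.exists():
        dst_root.mkdir(parents=True, exist_ok=True)
        # One sequential bulk copy over a network volume is typically much
        # faster than the many small random reads a model loader issues
        # when mapping weights directly from the network mount.
        shutil.copytree(src, dst)
    return dst
```

A handler would then load weights from the returned local path instead of the network mount; the copy cost is paid once per cold start rather than once per request. This trades container-disk space for load latency and does not help the very first request on a cold worker.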
4 Replies
salahzsh
salahzsh3w ago
Hi! Bumping this since I'm also interested in the answer, ty in advance
Usman Yasin
Usman Yasin3w ago
Same issue for us, the network volumes are simply unusable for large models.
max4c
max4c3w ago
Hey @J., we were talking about this. Any chance you can weigh in? And then could you communicate to Brendan and Mo how we could update the docs and potentially make a YouTube video on best practices?