Runpod · 15mo ago
Hello

Offloading multiple models

Hi guys, does anyone have experience with an inference pipeline that uses multiple models? I'm wondering how best to manage loading models whose combined size exceeds a worker's VRAM if everything is kept on the GPU. Any best practices / examples for keeping model load time as low as possible? Thanks!
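One common pattern for this is to treat VRAM as a budget and evict the least-recently-used model when a new one won't fit. Below is a minimal, hypothetical sketch of that idea: `ModelManager`, the budget numbers, and the `load_fn`/`unload_fn` hooks are all illustrative, not a real API. In a real PyTorch pipeline those hooks would typically wrap `model.to("cuda")` and `model.to("cpu")` so evicted models are offloaded to host RAM instead of reloaded from disk.

```python
from collections import OrderedDict

class ModelManager:
    """Keeps models resident in VRAM under a fixed budget, evicting LRU-first.

    load_fn(name) is called when a model must enter VRAM;
    unload_fn(name) is called when a model is evicted (e.g. moved to CPU).
    """

    def __init__(self, vram_budget_gb, load_fn, unload_fn):
        self.budget = vram_budget_gb
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.resident = OrderedDict()  # name -> size_gb, in LRU order

    def get(self, name, size_gb):
        """Ensure `name` is resident, evicting least-recently-used models as needed."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return
        # Evict LRU models until the new one fits in the budget.
        while self.resident and sum(self.resident.values()) + size_gb > self.budget:
            evicted, _ = self.resident.popitem(last=False)
            self.unload_fn(evicted)
        self.load_fn(name)
        self.resident[name] = size_gb
```

The win is that a hot model (say, the main LLM) stays in VRAM across requests, and only the cold ones pay the transfer cost; moving weights GPU↔CPU is much cheaper than re-reading them from disk or a network volume.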
2 Replies
Unknown User · 15mo ago
Message Not Public
yhlong00000 · 15mo ago
btw, you can also select multiple GPUs per worker if you need to load large models. Some tips to reduce start time:
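With multiple GPUs per worker, one simple option is to spread the pipeline's models across devices so no single GPU has to hold everything. Here's a rough sketch of a greedy placement, largest model first onto whichever GPU has the most free VRAM; all names and sizes are made up for illustration, and in PyTorch you'd follow up with `model.to(f"cuda:{gpu}")` for each assignment.

```python
def place_models(model_sizes_gb, num_gpus, gpu_vram_gb):
    """Greedily assign each model to the GPU with the most free VRAM.

    Returns {model_name: gpu_index}. Raises if a model fits on no GPU.
    """
    free = [gpu_vram_gb] * num_gpus
    placement = {}
    # Place the largest models first to reduce fragmentation.
    for name, size in sorted(model_sizes_gb.items(), key=lambda kv: -kv[1]):
        gpu = max(range(num_gpus), key=lambda i: free[i])
        if size > free[gpu]:
            raise RuntimeError(f"{name} ({size} GB) does not fit on any GPU")
        placement[name] = gpu
        free[gpu] -= size
    return placement
```

This avoids offloading entirely when the models collectively fit across devices; the trade-off is paying for more GPUs per worker, so it mainly makes sense when all models are hot on most requests.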
