Offloading multiple models
Hi guys, does anyone have experience with an inference pipeline that uses multiple models? I'm wondering how best to manage loading models whose combined size exceeds a worker's VRAM if everything is kept on the GPU. Any best practices / examples for keeping model load time as short as possible?
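To make it concrete, here's roughly the kind of swap-to-CPU pattern I have in mind (just a minimal sketch with tiny placeholder models in PyTorch, not the actual pipeline):

```python
# Minimal sketch: keep every model in CPU RAM and move only the one
# that's currently needed onto the GPU, so VRAM never holds more than
# one model at a time. Models here are tiny placeholders.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

models = {
    "stage_a": nn.Sequential(nn.Linear(512, 512), nn.ReLU()),
    "stage_b": nn.Sequential(nn.Linear(512, 512), nn.ReLU()),
}

def run_stage(name, x):
    model = models[name].to(device)      # load onto GPU just-in-time
    with torch.no_grad():
        out = model(x.to(device))
    model.to("cpu")                      # offload back to CPU RAM
    if device == "cuda":
        torch.cuda.empty_cache()         # release the cached VRAM blocks
    return out

x = torch.randn(1, 512)
x = run_stage("stage_a", x)
x = run_stage("stage_b", x)
print(x.shape)
```

The idea is to only pay the host-to-device copy instead of a full reload from disk each time, but I'm not sure this is the best approach.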
Thanks!
Unknown User•15mo ago
btw, you can also select multiple GPUs per worker if you need to load large models.
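e.g. with two GPUs attached to the worker you could pin each model to its own device and just pass the activations between them, so nothing ever has to be swapped or reloaded (rough sketch with placeholder models, assuming a 2-GPU worker):

```python
# Rough sketch: on a worker with 2 GPUs, keep each model resident on
# its own device and only move the intermediate tensors between them.
import torch
import torch.nn as nn

# Placeholder models standing in for the real (much larger) ones.
model_a = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to("cuda:0")
model_b = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to("cuda:1")

@torch.no_grad()
def run_pipeline(x):
    h = model_a(x.to("cuda:0"))
    return model_b(h.to("cuda:1"))   # move activations, not weights

out = run_pipeline(torch.randn(1, 512))
print(out.shape, out.device)
```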
Some tips to reduce start time: