morrow · 12mo ago

Best way to cache models with serverless?

Hello,

I'm using a serverless endpoint to do image generation with flux dev. The model is 22 GB, which takes a long time to download, especially since some workers seem to be much slower than others.

I've been using a network volume as a cache, which greatly improves startup time. However, doing this locks me into a particular region, which I believe makes some GPUs, like the A100, very rarely available.
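For reference, here is a minimal sketch of how the network-volume cache can be wired into a serverless handler, assuming the volume is mounted at /runpod-volume and the model ID is black-forest-labs/FLUX.1-dev (both are assumptions, adjust to your setup):

```python
import os

# Point the Hugging Face cache at the network volume BEFORE importing
# diffusers/huggingface_hub, so the 22 GB of weights are downloaded once
# and reused by every worker in that region.
# /runpod-volume is an assumed mount path -- adjust to your endpoint.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"

import torch
from diffusers import FluxPipeline
import runpod

pipe = None

def _load_pipeline():
    global pipe
    if pipe is None:
        # After the first run, weights come from the shared cache on the
        # volume instead of being re-downloaded on every cold start.
        pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev",  # assumed model ID
            torch_dtype=torch.bfloat16,
        ).to("cuda")
    return pipe

def handler(job):
    prompt = job["input"].get("prompt", "a photo of a cat")
    image = _load_pipeline()(prompt, num_inference_steps=28).images[0]
    out_path = "/tmp/output.png"
    image.save(out_path)
    return {"image_path": out_path}

runpod.serverless.start({"handler": handler})
```

This keeps the download cost to the first worker only, but the volume (and therefore the endpoint) stays pinned to one region.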

Is there a way to have a global Hugging Face cache with serverless endpoints? (like with pods)

Thanks