Runpod•13mo ago

How to cache model download from HuggingFace - Tips?

Using Serverless (48GB Pro) with FlashBoot. Want to optimize for fast cold starts.

is there a guide somewhere?

it does not seem to be caching the download - it's always re-downloading the model entirely (and slowly)

should i ssh into some persistent storage & download the model there? then reference that local path in the HF model load?

Jason•11/29/24, 8:21 AM

FlashBoot isn't free storage like an SSD. Use network storage: it's mounted at /runpod-volume in serverless, or at /workspace in pods.
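One way to act on this is to point the Hugging Face cache at the mounted volume before any model code runs. The sketch below is only an illustration: the helper name `configure_hf_cache`, the `huggingface` subfolder, and the `/tmp/hf-cache` fallback are assumptions, not something from the thread. It relies on `HF_HOME`, the environment variable that `huggingface_hub` reads for its cache root.

```python
import os
from pathlib import Path


def configure_hf_cache(volume_root="/runpod-volume", fallback="/tmp/hf-cache"):
    """Point HF_HOME at the network volume if it is present, else at a fallback."""
    root = Path(volume_root)
    # Off Runpod (e.g. local testing) the mount is absent, so use the fallback.
    base = root if root.is_dir() else Path(fallback)
    hf_home = base / "huggingface"  # hypothetical subfolder name
    hf_home.mkdir(parents=True, exist_ok=True)
    # huggingface_hub reads HF_HOME at import time, so set it before importing
    # transformers or huggingface_hub anywhere in the process.
    os.environ["HF_HOME"] = str(hf_home)
    return hf_home
```

Calling `configure_hf_cache()` at the very top of the handler script, before importing `transformers`, should make later downloads land under the volume rather than in the container's own filesystem.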

BlakeOP•11/30/24, 4:09 PM

@nerdylive would u recommend doing this (pic) ? (seems all workers in my endpoint will pull from this same /runpod-volume)

-

btw: perhaps /runpod-volume is only available/mounted when using a runpod docker base image? e.g. an ubuntu image doesn't seem to have it mounted (pic)

BlakeOP•11/30/24, 5:25 PM

also: it seems like when you change GPU type, the /runpod-volume is deleted/non accessible - is this correct?

Jason•12/1/24, 12:28 AM

No, it's mounted when you run the worker on Runpod's servers

Jason•12/1/24, 12:29 AM

No, if you attach network storage it'll be persistent

Jason•12/1/24, 12:29 AM

As long as you keep it and keep it attached to the endpoint you use

BlakeOP•12/1/24, 12:51 AM

okay thanks. do you recommend creating a new network volume? & persisting HF weights in that?

perhaps that's more stable/clear for me to follow than using the default /runpod-volume (which i assume is attached by default?) but seems to be giving me unexpected behaviour

------

i seem to be triggering new HF downloads even when this image has run & downloaded & persisted the weights to /runpod-volume/.cache/huggingface/hub/.. in previous runs

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]Request 48c80db3-d744-4f39-8af2-929133a77895: HEAD https://huggingface.co/LanguageBind/Video-LLaVA-7B

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]Request 48c80db3-d744-4f39-8af2-929133a77895: HEAD https://huggingface.co/LanguageBind/Video-LLaVA-7B

if u happen to know / have a code example that shows a reliable way to persist HF in the most straightforward way lmk!
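Before changing the loading code, it may help to confirm that the weights really are on the volume. A minimal sketch, assuming the standard hub cache layout (`models--<org>--<name>/snapshots/<revision>/`); the function name `find_cached_snapshots` is hypothetical:

```python
from pathlib import Path


def find_cached_snapshots(hub_cache, repo_id):
    """Return snapshot directories for repo_id under a Hugging Face hub cache."""
    # The hub cache stores each model repo as
    # models--<org>--<name>/snapshots/<revision>/
    repo_dir = Path(hub_cache) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())
```

For the case above, `find_cached_snapshots("/runpod-volume/.cache/huggingface/hub", "LanguageBind/Video-LLaVA-7B")` returning an empty list on a fresh worker would suggest the earlier run wrote its cache somewhere other than the volume.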

Jason•12/1/24, 12:54 AM

Just write and read to that path

Jason•12/1/24, 12:54 AM

You can imagine it like a folder that is always there

BlakeOP•12/1/24, 5:44 PM

when writing to /runpod-volume i'm still seeing the container do full model downloads when i kill the worker

so i:

created a new network storage ( /modelstorage ) & am reading/writing to this
attached this volume to my endpoint (didn't deploy the volume)

but when i kill the worker it re-downloads from hf??

BlakeOP•12/1/24, 5:44 PM

anything i'm missing!? any code examples of ensuring it loads from the network volume & NOT hf?

Jason•12/1/24, 9:11 PM

No, it acts as a drive. It doesn't re-download from the network volume

Jason•12/1/24, 9:11 PM

You just use the model from /runpod-volume

Jason•12/1/24, 9:11 PM

Maybe your path/method is wrong, you need to cache your model there somehow

Jason•12/1/24, 9:12 PM

Snapshot the model there, or set the model path to that location
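The "snapshot" suggestion could look roughly like the sketch below, which tries the on-volume cache first and only downloads on a miss. It uses `huggingface_hub.snapshot_download` with its `cache_dir` and `local_files_only` parameters; the wrapper name `ensure_snapshot` and the injectable `download` argument are hypothetical additions so the logic can be exercised without network access.

```python
def ensure_snapshot(repo_id, cache_dir, download=None):
    """Return a local snapshot path, touching the network only on a cache miss."""
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download as download
    try:
        # local_files_only=True resolves from cache_dir without contacting the Hub.
        return download(repo_id=repo_id, cache_dir=cache_dir, local_files_only=True)
    except FileNotFoundError:
        # Assumption: a cache miss surfaces as a FileNotFoundError subclass
        # (LocalEntryNotFoundError in huggingface_hub).
        return download(repo_id=repo_id, cache_dir=cache_dir)
```

The returned directory can then be passed to `from_pretrained(...)` as a local path, for example `ensure_snapshot("LanguageBind/Video-LLaVA-7B", "/runpod-volume/hf-cache")`, where the `hf-cache` folder name is just an example. The first worker pays for the download, and later workers attached to the same volume should read from disk.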
