Runpod · 12mo ago
JCtheMC

Issue with Huggingface dataset not being cached to storage volume

I want to use https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu for a project. I'm trying to download this dataset through the Python datasets package, and I want the download to be stored on my storage volume. As per the documentation here: https://huggingface.co/docs/datasets/v3.2.0/en/cache#cache-directory , the package offers the option to either set an environment variable or use a function argument to specify the download directory. I've tried both approaches, but whatever I do, the cached files keep ending up on the Container instead of my storage Volume. Edit: it may very well be that I'm not defining the path correctly - I have limited Linux experience. Please help.
Solution
If that's the right variable, use the export command in Linux to set the environment variable in your shell, instead of setting it in Runpod.
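For anyone who prefers to do this from Python rather than the shell, here is a minimal sketch of the same idea. It assumes the storage volume is mounted at /workspace (check your own mount path), and the helper names `configure_hf_cache` and `load_fineweb_edu` are made up for illustration. `HF_HOME` and `HF_DATASETS_CACHE` are the cache environment variables read by the Hugging Face libraries, and `cache_dir` is the `load_dataset` argument described in the docs linked in the question. The variables should be set before `datasets` is imported, since the library reads them at import time.

```python
import os
from pathlib import Path

# Assumption: the storage volume is mounted at /workspace. Adjust if yours differs.
VOLUME_ROOT = "/workspace"


def configure_hf_cache(volume_root: str = VOLUME_ROOT) -> str:
    """Point the Hugging Face caches at a folder on the storage volume.

    Call this BEFORE importing `datasets`, because the library reads
    these environment variables when it is first imported.
    """
    cache_dir = Path(volume_root) / "hf_cache"  # hypothetical folder name
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(cache_dir)
    os.environ["HF_DATASETS_CACHE"] = str(cache_dir / "datasets")
    return str(cache_dir)


def load_fineweb_edu(volume_root: str = VOLUME_ROOT):
    """Download the dataset with the cache on the volume (large download)."""
    cache_dir = configure_hf_cache(volume_root)
    # Imported here so the environment variables above are already set.
    from datasets import load_dataset

    # cache_dir is passed as well, so both approaches from the docs are covered.
    return load_dataset(
        "HuggingFaceFW/fineweb-edu",
        split="train",
        cache_dir=os.path.join(cache_dir, "datasets"),
    )
```

After calling `load_fineweb_edu()`, listing the `hf_cache` folder on the volume should show whether the files landed there instead of on the container disk.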