Issue with Hugging Face dataset not being cached to storage volume
I want to use https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu for a project. I'm trying to download this dataset through the Python datasets package, and I want the download to be stored on my storage volume. According to the documentation at https://huggingface.co/docs/datasets/v3.2.0/en/cache#cache-directory, the package lets you either set an environment variable or pass a function argument to specify the download directory. I've tried both approaches, but whatever I do, the cached files keep ending up on the container disk instead of my storage volume. Edit: it may very well be that I'm not defining the path correctly - I have limited Linux experience. Please help.
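For anyone landing here with the same problem, here is a minimal sketch of the two approaches from the linked docs, combined. It assumes the storage volume is mounted at `/workspace` (adjust to wherever yours is mounted), and the `hf_cache` folder name and both function names are made up for illustration. The `sample-10BT` config is used only to avoid pulling the full dataset. The point that is easy to miss is that the environment variables have to be set before `datasets` is imported, since the library reads them at import time.

```python
import os
from pathlib import Path


def configure_hf_cache(volume_root: str) -> Path:
    """Point the Hugging Face caches at a folder on the storage volume.

    `volume_root` is wherever the volume is mounted (assumed, e.g. /workspace).
    """
    cache_root = Path(volume_root) / "hf_cache"  # hypothetical folder name
    cache_root.mkdir(parents=True, exist_ok=True)
    # Set these BEFORE importing `datasets`; they are read when it is imported.
    os.environ["HF_HOME"] = str(cache_root)
    os.environ["HF_DATASETS_CACHE"] = str(cache_root / "datasets")
    return cache_root


def load_fineweb_edu_sample(volume_root: str = "/workspace"):
    """Download a sample of fineweb-edu with the cache on the volume."""
    cache_root = configure_hf_cache(volume_root)
    # Imported here on purpose, after the environment variables are in place.
    from datasets import load_dataset

    return load_dataset(
        "HuggingFaceFW/fineweb-edu",
        name="sample-10BT",  # smaller sample config instead of the full dataset
        split="train",
        cache_dir=str(cache_root / "datasets"),  # the function-argument route
    )
```

Calling `load_fineweb_edu_sample()` should then leave the downloaded files under `/workspace/hf_cache` rather than in the home directory on the container disk.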
29 Replies
yep - i'm doing it, and it's not working for some reason.

through a python script
files are still ending up in root

yes i ran it manually
oh ok! let me give that a try
oooh ok i completely do not understand how this works apparently hahaha
now it seems to be saving to my workspace correctly
aaah ok - that i didn't realise.
and this is some bash script that i would have to write myself?
yes please
i'm trying to learn and for me examples generally work best
so if i run this when the pod is set up, whenever i open a terminal, it'll have the environment variables i defined through the web interface?
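Whether a given terminal sees variables that were set somewhere else depends on how that shell was started, so it is worth checking rather than assuming. A small sketch for doing that from Python; the helper names are made up for illustration, and the variable names you pass in are whichever ones you expect to be set (for example `HF_HOME`).

```python
import os


def visible_env(names):
    """Return {name: value or None} as seen by the current process."""
    return {name: os.environ.get(name) for name in names}


def missing_env(names):
    """List the variables that are unset or empty in the current process."""
    return [name for name, value in visible_env(names).items() if not value]
```

Running `missing_env(["HF_HOME", "HF_DATASETS_CACHE"])` in a fresh terminal returns an empty list only if both variables made it into that session.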
Thank you for the help - appreciate it. I really should take some time to go over the tutorials in more depth.

i honestly need to work on a bunch of stuff - limited Linux experience, barely any Docker experience.
for now though, this works. Which means i can do stuff. Thank you for the help, i appreciate it.