Runpod, 10mo ago
JCtheMC

Issue with Huggingface dataset not being cached to storage volume

I want to use https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu for a project. I'm trying to download this dataset through the Python datasets package, and I want the download to be stored on my storage volume. As per the documentation here: https://huggingface.co/docs/datasets/v3.2.0/en/cache#cache-directory , the package offers the option to either set an environment variable or use a function argument to specify the download directory. I've tried both approaches, but whatever I do, the cached files keep ending up on the Container instead of my storage Volume. Edit: it may very well be that I'm not defining the path correctly, as I have limited Linux experience. Please help.
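For readers following along, here is a minimal sketch of the two options the linked documentation describes (environment variable and `cache_dir` argument), combined in one script. Several things here are assumptions and not taken from the thread: that the storage volume is mounted at `/workspace` (check your pod's volume mount path), the folder name `hf_cache`, and the helper names `configure_hf_cache` and `load_to_volume`, which are made up for illustration. `HF_HOME` and `HF_DATASETS_CACHE` are real Hugging Face environment variables, and `load_dataset` does accept `cache_dir`. One likely pitfall, though not confirmed as the cause here: the variables need to be set before `datasets` is imported, and a path without a leading `/` is resolved relative to the current directory.

```python
import os
from pathlib import Path


def configure_hf_cache(volume_root: str = "/workspace") -> Path:
    """Point the Hugging Face caches at a folder on the storage volume.

    Call this BEFORE `import datasets`: the library reads these
    variables when it is first imported.
    """
    root = Path(volume_root)
    # "workspace/hf" (no leading slash) would resolve against the current
    # directory, which may sit on the container disk, so require absolute.
    if not root.is_absolute():
        raise ValueError(f"expected an absolute path, got {volume_root!r}")
    hf_home = root / "hf_cache"  # hypothetical folder name
    hf_home.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(hf_home)  # base dir for all HF caches
    os.environ["HF_DATASETS_CACHE"] = str(hf_home / "datasets")
    return hf_home


def load_to_volume(dataset_id: str, hf_home: Path, **kwargs):
    """Load a dataset while also passing cache_dir explicitly."""
    # Imported late on purpose, after the environment variables are set.
    from datasets import load_dataset

    return load_dataset(dataset_id, cache_dir=str(hf_home / "datasets"), **kwargs)
```

In a script, `hf_home = configure_hf_cache()` would go at the very top, followed by something like `load_to_volume("HuggingFaceFW/fineweb-edu", hf_home, ...)` with whatever configuration and split arguments the dataset card lists. Setting `HF_HOME` as well as `HF_DATASETS_CACHE` is a belt-and-braces choice, since files fetched from the Hub may be cached under the Hub cache directory and not only under the datasets cache.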
29 Replies
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
yep - I'm doing it, and it's not working for some reason.
Unknown User, 10mo ago
[Message not public]
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
through a Python script
JCtheMC (OP), 10mo ago
files are still ending up in root
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
yes, I ran it manually
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
oh ok! let me give that a try
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
oooh ok, I completely do not understand how this works apparently hahaha
JCtheMC (OP), 10mo ago
now it seems to be saving to my workspace correctly
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
aaah ok - that I didn't realise.
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
and this is some bash script that I would have to write myself?
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
yes please, I'm trying to learn and for me examples generally work best
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
so if I run this when the pod is set up, whenever I open a terminal, it'll have the environment variables I defined through the web interface?
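The script being discussed is in a reply that is not visible here, so the following is only a guess at the general idea, written in Python to match the rest of the thread: a process that can see the variables writes them to a file as `export` lines, and each new terminal sources that file. It assumes the variables set through the web interface are visible to the process that runs this at pod start. The function name `write_env_file`, the `HF_` prefix filter, and the file location are all hypothetical.

```python
import os
import shlex
from pathlib import Path


def write_env_file(target: str, prefixes: tuple = ("HF_",)) -> int:
    """Write matching environment variables as shell `export` lines.

    Returns the number of variables written.
    """
    lines = [
        # shlex.quote keeps values with spaces or quotes safe for the shell.
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(os.environ.items())
        if name.startswith(prefixes) and name.isidentifier()
    ]
    Path(target).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Run once at startup with a path of your choosing, for example `write_env_file("/workspace/pod_env.sh")`, then add a line such as `source /workspace/pod_env.sh` to `~/.bashrc` so that new terminals pick the values up.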
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
Thank you for the help - appreciate it. I really should take some time to go over the tutorials in more depth.
Unknown User, 10mo ago
[Message not public]
JCtheMC (OP), 10mo ago
I honestly need to work on a bunch of stuff - limited Linux experience, barely any Docker experience. For now though, this works, which means I can do stuff. Thank you for the help, I appreciate it.
Unknown User, 10mo ago
[Message not public]