Runpod · 3mo ago
notrius

My POD is not accessible + lost access to my data

PLEASE CHECK THE ATTACHED TXT FILE FOR FULL INFO.

Two days ago, a notice appeared next to my pod: "We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime." I guess this is a hardware error.

Since then, trying to boot with the GPU gives me this error in the log:

error creating container: nvidia-smi: parsing output of line 6: failed to parse ([GPU requires reset]) into int: strconv.Atoi: parsing "": invalid syntax

and the pod won't boot up at all. If I try to boot in CPU mode, the server seems to go online, but with 512 MB of RAM (immediately 100% utilized) and 0.5 vCPU. The web terminal fails to launch, and when I try to connect from my OSX terminal, I get through the authentication but end up with ...
FULL INFO IN THE ATTACHMENT.

I am really desperate. I've been using RunPod for over a month and thought it was a great service. I built and configured a perfect pod for my workflow, and I was in the middle of a big job for a client (which I have now lost for not delivering on time). Despite the notice quoted above, nobody is proactively looking into the issue and there are no updates. I contacted RunPod's customer service, created a ticket, and read the whole documentation. The support was completely useless: they replied with a template answer telling me to create a network storage volume and migrate my data there, pointing me to two knowledge base articles. But... FULL INFO IN THE ATTACHMENT.

This is a very unfortunate situation for me and a terrible customer experience. I thought "This is it" when I first discovered RunPod, but if this is how they care about their customers and the level of SLA they provide... Any ideas? Please help.
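The log line suggests that whatever creates the container reads `nvidia-smi` output and converts a field to an integer with Go's `strconv.Atoi`. On this machine `nvidia-smi` printed the status text `[GPU requires reset]` where a number was expected, so the parse failed and the container was never created. The runtime's real code is not shown in the thread, so the following is only a minimal Python sketch of how a check could surface that state as a readable error instead of crashing. The query fields (`index,memory.used`) and the healthy/broken split are assumptions chosen for illustration.

```python
import csv
import io


def parse_gpu_rows(csv_text):
    """Split nvidia-smi CSV rows into numeric rows and rows with error fields.

    Assumes output shaped like
    `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`.
    """
    healthy, broken = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        fields = [field.strip() for field in row]
        try:
            healthy.append(tuple(int(field) for field in fields))
        except ValueError:
            # A status string such as "[GPU requires reset]" is not an integer;
            # record the row instead of failing the whole check.
            broken.append(fields)
    return healthy, broken
```

For example, `parse_gpu_rows("0, 1024\n1, [GPU requires reset]\n")` reports GPU 0 as healthy and GPU 1 as broken, rather than aborting on the first non-numeric field the way a bare integer parse does.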
17 Replies
Madiator2011 · 3mo ago
looks like community cloud
notrius (OP) · 3mo ago
It is not. I'm in the secure cloud. I would expect a reasonable SLA for a service costing me around 450 USD/month. Not being able to access my data for 48+ hours while still having to pay for it is ridiculous, to say the least. RunPod offered compensation of 35 USD, what a (bad) joke. And I still don't have an ETA for a resolution.
Dj · 3mo ago
Message me your ticket number, but publicly I'll recommend that you manually expose TCP port 22 so your SSH connection doesn't go through the proxy. Please be advised that I'm providing support outside of our standard support hours, so I may not be able to get back to you immediately.
notrius (OP) · 3mo ago
Thanks Dj, port 22 was exposed and the result is unfortunately the same. I'd send you my ticket number, but I can't message you directly unless you accept my friend request. Should I share the ticket number here? Also, getting into my pod over SSH to retrieve my data (after 48+ hours) is one thing; the other is that I would like to hear that someone is actually working on the case and get an ETA :(
Dj · 3mo ago
When you connect, you need to use the second option in the Connect menu so you don't go through the proxy. If you don't see this second option, you may not have port 22 exposed or OpenSSH running. I accepted your friend request.
notrius (OP) · 3mo ago
I've tried all three options. In CPU-only mode I actually get connected, but my shell won't start and the terminal freezes. As stated in the description above, in CPU mode the pod boots with 512 MB of RAM, which is instantly 100% utilized, and 0.5 vCPU (0.5?? wtf :)
Madiator2011 · 3mo ago
@notrius have you tried running the pod with the start command bash -c 'sleep infinity'? That should let you access the pod while keeping any apps from starting.
notrius (OP) · 3mo ago
I did not, let me try that, thanks. That is actually the first advice that worked: I finally got in and can back up my data to a network storage volume. What is my best option, considering FUSE is not available on that pod? Also, my shell scripts do not work, any idea why?
Madiator2011 · 3mo ago
FUSE won't work, as it requires a privileged container. About your bash scripts, I don't know.
notrius (OP) · 3mo ago
So rsync and s3fs are ruled out, and I can't SSH to that pod from my other pod (why?). What do you recommend then? I also get disconnected from the pod every few minutes with "Connection closed".
Madiator2011 · 3mo ago
Which are you using: web terminal, basic SSH, or SSH over exposed TCP? rsync should work.
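When the connection keeps dropping every few minutes, one common pattern is to rerun rsync until it succeeds: on each rerun rsync skips files whose size and modification time already match, and `--partial` keeps half-copied files so they can be reused. The Python sketch below is hypothetical and assumes the pod's SSH daemon is reachable on an exposed TCP port, that rsync is installed on both ends, that you log in as `root`, and that the data lives under `/workspace/`. The helper names, host, and port are placeholders, not anything RunPod provides.

```python
import subprocess
import time


def build_rsync_cmd(host, port, remote_dir, local_dir, user="root"):
    """Build an rsync-over-SSH command that pulls remote_dir into local_dir."""
    # Keepalives make SSH notice a dead connection instead of hanging forever.
    ssh = f"ssh -p {port} -o ServerAliveInterval=30 -o ServerAliveCountMax=3"
    # -a: archive mode (recursive, keeps permissions and times)
    # -z: compress data in transit
    # --partial: keep partially transferred files so a rerun can reuse them
    return ["rsync", "-az", "--partial", "-e", ssh,
            f"{user}@{host}:{remote_dir}", local_dir]


def run_with_retries(cmd, attempts=10, delay_s=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Rerun cmd until it exits 0; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        # Nonzero exit (e.g. 255 when the SSH link drops): wait, then retry.
        if attempt < attempts:
            sleep(delay_s)
    raise RuntimeError(f"rsync still failing after {attempts} attempts")
```

Usage would look like `run_with_retries(build_rsync_cmd("203.0.113.10", 22078, "/workspace/", "./backup/"))`, with the placeholder host and port replaced by the pod's exposed TCP SSH address. The `runner` and `sleep` parameters exist only so the loop can be tested without a network.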
notrius (OP) · 3mo ago
Web terminal and TCP SSH:

ssh: connect to host 69.30.85.238 port 22078: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(232) [Receiver=3.2.7]

OK, croc seems to be the only way to get the job done... actually, not. It worked for a small set of files, but now I get:

peer error: refusing files

Getting desperate.
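When a transfer tool copies some files and then refuses the rest, it helps to know exactly which files made it across intact. A minimal sketch, assuming Python 3 is available on both the pod and the destination: hash every file under a directory into a manifest on each side, then compare. The function names are hypothetical, and SHA-256 is simply one reasonable choice of checksum.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large files don't need to fit in RAM.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def missing_or_changed(source, copy):
    """List paths from the source manifest that are absent or differ in the copy."""
    return sorted(p for p, h in source.items() if copy.get(p) != h)
```

The manifest is a plain dict, so it can be saved with `json.dump` on the pod, moved alongside the data (it is small), and loaded on the other side; `missing_or_changed` then gives the list of files that still need to be sent.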
Madiator2011 · 3mo ago
@notrius would you give SFTP a try, with something like FileZilla?
notrius (OP) · 3mo ago
Sure, anything to get my data. (It does not currently work over exposed TCP.)
Madiator2011 · 3mo ago
Try opening the web terminal and typing pip install OhMyRunPod, then run OhMyRunPod, select File Transfer and then SFTP. It will give you the details for FileZilla.
notrius (OP) · 3mo ago
Will do.
Madiator2011 · 3mo ago
@notrius Just checking in. Were you able to move your data with my guide?
