We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime.
I guess this is a hardware error. Since then, trying to boot with a GPU gives me this error in the log:
error creating container: nvidia-smi: parsing output of line 6: failed to parse ([GPU requires reset]) into int: strconv.Atoi: parsing "": invalid syntax
And it won't boot up at all. If I try to boot in CPU mode, the server seems to come online, but with only 512 MB of RAM (which is immediately 100% utilized) and 0.5 vCPU. The web terminal fails to launch, and when I try to connect from my OS X terminal, I get through authentication but end up with ...
FULL INFO IN THE ATTACHMENT
I am really desperate. I've been using RunPod for over a month and thought what a great service it was. I had built and configured a perfect pod for my workflow, and was running a big job for a client (whom I have now lost for not delivering on time). Despite the notice (quoted above), nobody is proactively looking into the issue, and there are no updates. I have contacted RunPod's customer service and created a ticket (and have read the whole documentation). The support was completely useless: they replied with a template answer telling me to create a network storage and migrate my data there, pointing me to two knowledge base articles. But..
This is a very unfortunate situation for me and a terrible customer experience. I thought "This is it" when I first discovered RunPod, but if this is how they care about their customers and the level of SLA they provide ..