H100 servers having issues?

Hey RunPod folks, is something going on with the H100 Secure Cloud machines? I first ran into a number of weird issues on an 8xH100 (SXM) server (cross-GPU links going down randomly? It's hard to say exactly what is going on, but I get random timeouts in multi-GPU comms after days of work).

I tried spinning up new machines (IDs: nyotnwudbsq0mu, 23xahufe1yk33g), but they are stuck loading the Docker images from our private Docker registry (which works great and which I can access from other RunPod machines).

Can someone please have a look?