Runpod · 3mo ago
ET3D

Lost GPUs mid-run

I was running on a 5090 pod with 3 GPUs (that's what was available to it). Mid-run my software complained that there were no CUDA GPUs. After stopping the app I ran nvidia-smi and got "Failed to initialize NVML: Unknown Error". That's never happened before.
Solution
Restarting the pod helped. So just FYI. I also opened a ticket about this with the pod ID.
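
In case it helps anyone catch this earlier, here is a rough sketch of a check a long-running job could call between steps. It assumes that `nvidia-smi -L` exits with a non-zero status when NVML fails to initialize and otherwise prints one "GPU N: ..." line per device. The function name `gpus_visible` and the timeout value are made up for illustration and are not part of any Runpod tooling.

```python
import shutil
import subprocess


def gpus_visible(cmd=("nvidia-smi", "-L"), timeout=10):
    """Return True if the command succeeds and lists at least one GPU.

    Assumption: a healthy `nvidia-smi -L` prints lines starting with
    "GPU 0: ...", and a broken NVML state yields a non-zero exit code.
    """
    # If the binary is not on PATH at all, treat the GPUs as unavailable.
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # A hung or unlaunchable nvidia-smi is also treated as a failure.
        return False
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    if not gpus_visible():
        print("GPUs not visible; consider checkpointing and restarting the pod.")
```

Calling this every so often and checkpointing when it returns False would at least avoid losing the whole run, though it does not fix the underlying problem.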