Error when using 'nvidia-smi' on pod, 'Failed to initialize NVML: Unknown Error'

Hello. I am training an ASR model on a GPU pod, and as the title says, this error started occurring.

I know this is a well-known issue and I know it needs to be fixed on the host server.

The pod on which the issue occurred is as follows.

Region: US-GA-2
Pod ID: mvu7urvdtnjdi4

I have not experienced any inconvenience from this issue, and I am writing this post only to let you know, so I don't think it is necessary to open a ticket.

I hope that appropriate action will be taken someday.

Best regards, michigety.