Error when using 'nvidia-smi' on pod, 'Failed to initialize NVML: Unknown Error'
Hello. I am training an ASR model on a GPU pod, and as the title says, this error started occurring.
I know this is a well-known issue and I know it needs to be fixed on the host server.
The details of the pod on which the issue occurred are as follows.
Region: US-GA-2
Pod ID: mvu7urvdtnjdi4
I have not experienced any inconvenience from this issue and wrote this post only to let you know, so I don't think it is necessary to open a ticket.
I hope that appropriate action will be taken someday.
Best regards, michigety.


4 Replies
I'm sorry for the late response to your message.
I have 3 pods now, all in the same region, US-GA-2.
Excluding the above pod, their pod IDs are as follows:
yfcgqj7cf5llks
v0xrqmh4mzqowc
These pods each have a single L4 GPU and have not thrown the error so far; only the pod above does.
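In case it helps with comparing the pods, here is a minimal sketch of how the check could be scripted inside each pod. It assumes Python 3 is available in the container; the function names `classify` and `check_gpu` are just illustrative, and the only string it looks for is the error message from the title.

```python
import shutil
import subprocess

# Error text from the title of this thread.
NVML_ERROR = "Failed to initialize NVML"


def classify(returncode: int, output: str) -> str:
    """Map an nvidia-smi exit code and its output to a short status string."""
    if NVML_ERROR in output:
        return "nvml-error"
    if returncode != 0:
        return "other-error"
    return "ok"


def check_gpu() -> str:
    """Run nvidia-smi once and classify the result."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi-not-found"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # Check both streams, since the message may appear on either one.
    return classify(result.returncode, result.stdout + result.stderr)


if __name__ == "__main__":
    print(check_gpu())
```

Running this in each pod should make it easy to see which one reports the NVML error and which ones do not.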
@michigety
Escalated To Zendesk
The thread has been escalated to Zendesk!