Error when using 'nvidia-smi' on pod, 'Failed to initialize NVML: Unknown Error'

Hello. I am training an ASR model on a GPU pod and, as the title says, this error started occurring. I know this is a well-known issue and that it needs to be fixed on the host server. The pod on which the issue occurred is:

Region: US-GA-2
Pod ID: mvu7urvdtnjdi4

The issue has not caused me any inconvenience. I am posting only to report it, so I don't think it is necessary to open a ticket. I hope appropriate action will be taken at some point.

Best regards,
michigety
4 Replies
Unknown User, 4mo ago
[Message not public]
michigety (OP), 4mo ago
I'm sorry for the late response to your message. I have 3 pods now, all in the same region, US-GA-2. Excluding the pod above, their pod IDs are:

yfcgqj7cf5llks
v0xrqmh4mzqowc

These pods each have a single L4 GPU and have not thrown the error so far. Only the pod above does.
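For anyone who wants to compare pods the same way, here is a minimal sketch of a check that could be run inside each pod. It assumes `nvidia-smi` is on the PATH and that the failure shows up as the text "Failed to initialize NVML" in its output, as in the title. The function names `classify` and `check_gpu` and the status strings are made up for illustration; they are not part of any RunPod or NVIDIA tooling.

```python
import subprocess

# Text from the error in the thread title; matched as a substring.
NVML_ERROR = "Failed to initialize NVML"


def classify(returncode: int, output: str) -> str:
    """Map an nvidia-smi exit code and combined output to a short status."""
    if NVML_ERROR in output:
        return "nvml-error"
    if returncode != 0:
        return "other-error"
    return "ok"


def check_gpu(timeout: float = 30.0) -> str:
    """Run nvidia-smi once and report a status string (hypothetical helper)."""
    try:
        proc = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        # nvidia-smi is not installed or not on PATH in this container.
        return "nvidia-smi-missing"
    except subprocess.TimeoutExpired:
        return "timeout"
    # Check both streams, since the message may land on either one.
    return classify(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(check_gpu())
```

Running this in each of the three pods would make it easy to confirm that only the one pod reports the NVML failure.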
Poddy, 4mo ago
@michigety The thread has been escalated to Zendesk!
Unknown User, 4mo ago
[Message not public]
