MI300X NCCL Issue
I’m experiencing an issue with the MI300X pod. Two GPUs are configured, but I’m unable to run the basic all_reduce_perf test on the pod.
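For anyone trying to reproduce this, here is a minimal sketch of how the test could be launched with debug logging turned on, so the failure output can be attached to the ticket. It assumes the rccl-tests binaries were built under `./build/` (a hypothetical path, adjust it to the pod) and that two GPUs are visible. The helper names `build_command` and `build_env` are made up for illustration. The `-b`, `-e`, `-f`, and `-g` flags and the `NCCL_DEBUG` variable are the standard nccl-tests/rccl-tests options.

```python
import os
import shlex
import subprocess

# Hypothetical path: where rccl-tests was built on the pod; adjust as needed.
ALL_REDUCE_PERF = "./build/all_reduce_perf"


def build_command(num_gpus=2, binary=ALL_REDUCE_PERF):
    """Return the argv for an all-reduce sweep from 8 B to 128 MB on num_gpus GPUs."""
    return [binary, "-b", "8", "-e", "128M", "-f", "2", "-g", str(num_gpus)]


def build_env(base=None):
    """Copy the environment and enable NCCL/RCCL debug logging."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"
    return env


if __name__ == "__main__":
    cmd = build_command()
    print("Running:", shlex.join(cmd))
    try:
        result = subprocess.run(cmd, env=build_env(), capture_output=True, text=True)
    except FileNotFoundError:
        # The binary is not at the assumed path; nothing was executed.
        print("all_reduce_perf not found at", cmd[0])
    else:
        print(result.stdout)
        print(result.stderr)
        print("exit code:", result.returncode)
```

The captured stdout, stderr, and exit code are usually the first things support asks for, so it may help to paste them into the thread along with the template name.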
5 Replies
@GK
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #11,678
I’m new to RunPod and have been using the RunPod PyTorch 2.4.0 ROCm 6.1 template. It worked properly until last Friday, but the same template no longer works. I haven’t installed any additional drivers beyond what comes with the template.
okay