P2P is disabled between NVLINK connected GPUs 1 and 0

Hey team! Could you fix the NVLink issue on H100 SXM Community pods? I hit this error frequently. Affected pod ID: 4a5acwxj2kene6

Error message:
"P2P is disabled between NVLINK connected GPUs 1 and 0. This should not be the case given their connectivity, and is probably due to a hardware issue. If you still want to proceed, you can set NCCL_IGNORE_DISABLED_P2P=1."

I can proceed with NCCL_IGNORE_DISABLED_P2P=1, but this drops performance by about 10%.
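To see which GPU pairs are affected before launching a job, something like the following sketch may help. It assumes PyTorch with CUDA support is installed on the pod and uses `torch.cuda.can_device_access_peer`, which may not match NCCL's own check exactly. The helper name `find_disabled_p2p_pairs` is made up for illustration.

```python
from typing import Callable, List, Tuple


def find_disabled_p2p_pairs(
    num_gpus: int, can_access: Callable[[int, int], bool]
) -> List[Tuple[int, int]]:
    """Return every ordered (device, peer) pair where peer access is unavailable."""
    return [
        (i, j)
        for i in range(num_gpus)
        for j in range(num_gpus)
        if i != j and not can_access(i, j)
    ]


def main() -> None:
    try:
        import torch  # assumption: PyTorch with CUDA support is installed
    except ImportError:
        print("PyTorch is not installed; cannot probe peer access.")
        return
    if not torch.cuda.is_available():
        print("No CUDA devices visible.")
        return
    pairs = find_disabled_p2p_pairs(
        torch.cuda.device_count(), torch.cuda.can_device_access_peer
    )
    if pairs:
        for device, peer in pairs:
            print(f"P2P unavailable: GPU {device} -> GPU {peer}")
    else:
        print("All GPU pairs report peer access.")


if __name__ == "__main__":
    main()
```

Running `nvidia-smi topo -m` on the same pod shows how the GPUs are connected, which is useful to include when reporting the pod ID.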
3 Replies
Madiator2011, 3mo ago
Forwarded it to the team.
Solution
Madiator2011, 3mo ago
@storuky2306 we got a response, and apparently GPU 5 does not support P2P. What we can advise for now is to pick a different machine.
storuky2306, 3mo ago
@Papa Madiator ok, thanks
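For anyone landing here with the same error, the two options discussed in this thread (pick a different machine, or accept the slowdown with NCCL_IGNORE_DISABLED_P2P=1) could be wired into a startup check roughly like this. It is only a sketch: `apply_p2p_policy` and its parameters are hypothetical names, and the list of disabled pairs is assumed to come from whatever probe you run on the pod. The variable has to be set before NCCL is initialized in the process.

```python
import os
from typing import MutableMapping, Optional, Sequence, Tuple


def apply_p2p_policy(
    disabled_pairs: Sequence[Tuple[int, int]],
    allow_fallback: bool = False,
    env: Optional[MutableMapping[str, str]] = None,
) -> bool:
    """Decide whether to proceed on this machine.

    Returns True if training may start, False if the pod should be replaced.
    """
    if env is None:
        env = os.environ
    if not disabled_pairs:
        return True  # P2P is available everywhere; nothing to change
    if allow_fallback:
        # Workaround named in the NCCL error message; the original poster
        # reported roughly a 10% performance drop when using it.
        env["NCCL_IGNORE_DISABLED_P2P"] = "1"
        return True
    return False  # fail fast so a different machine can be picked


if __name__ == "__main__":
    # Example input matching the error in this thread: GPUs 1 and 0.
    ok = apply_p2p_policy([(1, 0)], allow_fallback=False, env={})
    print("proceed" if ok else "pick a different machine")
```

With `allow_fallback=False` the check follows the advice above and refuses to start on a pod with disabled P2P; with `allow_fallback=True` it trades some performance for being able to run at all.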