P2P is disabled between NVLINK connected GPUs 1 and 0

Hey team! Could you fix the NVLink issue on H100 SXM Community pods? I run into this error frequently. Corrupted pod ID: 4a5acwxj2kene6
P2P is disabled between NVLINK connected GPUs 1 and 0. This should not be the case given their connectivity, and is probably due to a hardware issue. If you still want to proceed, you can set NCCL_IGNORE_DISABLED_P2P=1.

I can proceed by setting NCCL_IGNORE_DISABLED_P2P=1, but this drops performance by about 10%.
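For anyone else hitting this, here is a minimal sketch of applying that workaround from Python. The variable name comes straight from the NCCL error message; the helper name `with_p2p_workaround` is made up for illustration. It only builds an environment with the flag set, so it has to be passed to (or set before) whatever process initializes NCCL.

```python
import os


def with_p2p_workaround(base_env=None):
    """Return a copy of the environment with the NCCL workaround flag set.

    `with_p2p_workaround` is a hypothetical helper, not part of NCCL or PyTorch.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Flag named in the NCCL error message: proceed even though P2P is
    # disabled between NVLink-connected GPUs.
    env["NCCL_IGNORE_DISABLED_P2P"] = "1"
    return env


if __name__ == "__main__":
    env = with_p2p_workaround()
    print("NCCL_IGNORE_DISABLED_P2P =", env["NCCL_IGNORE_DISABLED_P2P"])
```

The returned dict can be handed to `subprocess.run(..., env=env)` when launching a training script. This only hides the error; the underlying P2P problem on the pod is still there.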
Solution
@storuky2306 we got a response, and apparently gpu5 does not support P2P.
For now, our advice is to pick a different machine.
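When switching machines, a quick check right after the pod starts can show whether the new one has the same problem. Below is a rough sketch, assuming PyTorch is installed on the pod. `disabled_p2p_pairs` is a hypothetical helper; `torch.cuda.can_device_access_peer` is the real PyTorch call. It reports CUDA peer access, which may not match NCCL's own check exactly, so treat the result as a hint rather than a guarantee.

```python
from itertools import permutations


def disabled_p2p_pairs(num_gpus, can_access):
    """List ordered GPU pairs (a, b) for which can_access(a, b) is False.

    `can_access` is any callable taking two device indices, so the logic
    can be exercised without GPUs.
    """
    return [
        (a, b)
        for a, b in permutations(range(num_gpus), 2)
        if not can_access(a, b)
    ]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        n = torch.cuda.device_count()
        bad = disabled_p2p_pairs(n, torch.cuda.can_device_access_peer)
        print("GPU pairs without P2P:", bad if bad else "none")
    else:
        print("PyTorch with CUDA is not available; nothing to check.")
```

If any pairs are listed, it is probably quicker to terminate the pod and pick another machine before starting a long run.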