No NVLink in A100 SXM pod from US-WA-1
I believe these A100 SXM GPUs should support NVLink, but when I run nvidia-smi topo -m, it doesn't show any NVLink connections. The links between GPUs appear as NODE or SYS, when I was expecting something like NV12 or NV4. A workload that requires GPU-to-GPU communication over NVLink also fails in this pod. Could there be an issue causing this?
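Here is a quick check I can run in the pod (a minimal sketch, assuming the nvidia-ml-py / pynvml package is installed, which is not something the pod ships by default). It counts the active NVLink links per GPU; on a correctly wired A100 SXM node each GPU should report a non-zero count, while zero active links would match the NODE/SYS topology I'm seeing:

```python
# Minimal sketch: report how many NVLink links each GPU has active.
# Assumes the nvidia-ml-py (pynvml) package is installed: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        active = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active += 1
            except pynvml.NVMLError:
                # This link index is not present on this GPU/driver; stop probing.
                break
        print(f"GPU{i} ({name}): {active} active NVLink link(s)")
finally:
    pynvml.nvmlShutdown()
```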

9 Replies
@Jihyun
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #17630
Try to open a support ticket to confirm
what happens if you rent 8xA100
I'm quite sure these GPUs were previously connected via NVLink. We were using storage in the US-WA-1 data center, and NVLink connections were visible then. When I rent A100s from another data center, like US-KS-2, nvidia-smi topo -m does show NVLink connections (e.g., NV12 between GPU0 and GPU1 in the screenshot). I also think the A100 SXM is more expensive than the A100 PCIe on RunPod partly because of NVLink support.
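For comparison between the two regions, a peer-to-peer check can confirm whether CUDA P2P access is even reported between GPU pairs (a minimal sketch, assuming PyTorch is installed in the pod; the calls below are standard torch.cuda functions, not something from this thread):

```python
# Minimal sketch: check whether CUDA peer-to-peer access is reported between
# each pair of GPUs. On an NVLink-connected A100 SXM node this is expected to
# be True for every pair; False means GPU-to-GPU traffic goes through the host.
import torch

def check_p2p():
    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU{i} -> GPU{j}: peer access {'available' if ok else 'NOT available'}")

if __name__ == "__main__":
    check_p2p()
```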

Nvm, yeah, you gotta rent 8x
The maximum available was 7 GPUs, so this is the screenshot


but as I said, try to open a support ticket to ask for an official response
or perhaps it's only in one region?
I don't think so, because no matter how many GPUs we rent, it should still show the NVLink topology between any 2 GPUs, like in US-KS-2, and ALL A100 SXM GPUs should have NVLink support. But thanks for the suggestion, I will ask for an official response.
yeah, maybe only US-KS-2 is connected via NVLink