RunPod · 3w ago
Jihyun

No NVLink in A100 SXM pod from US-WA-1

I believe these A100 SXM GPUs should support NVLink, but when I run nvidia-smi topo -m, it doesn't show NVLink connections. The links between GPUs appear as NODE or SYS, when I was expecting something like NV12 or NV4. Also, a workload that requires GPU-to-GPU communication via NVLink fails in this pod. Could there be an issue causing this?
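A quick sanity check from inside the pod is to test GPU peer-to-peer access directly. This is a minimal sketch, assuming PyTorch is installed in the container; note that peer access can also be reported over PCIe, so it complements nvidia-smi topo -m rather than proving NVLink on its own:
```python
# Minimal sketch: report peer-to-peer (P2P) access between every GPU pair.
# Assumes PyTorch is installed in the pod (not RunPod-specific).
import torch

n = torch.cuda.device_count()
print(f"Visible GPUs: {n}")
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU{i} -> GPU{j}: peer access {'yes' if ok else 'NO'}")
```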
9 Replies
Poddy · 3w ago
@Jihyun
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #17630
Jason · 3w ago
Try to open a support ticket to confirm
riverfog7 · 3w ago
What happens if you rent 8x A100?
Jihyun (OP) · 3w ago
I'm quite sure these GPUs were previously connected via NVLink; we were using storage in the US-WA-1 data center, and NVLink connections were visible then. When I rent A100s from another data center, like US-KS-2, the nvidia-smi topo -m command does show NVLink connections (e.g., NV12 between GPU0 and GPU1 in this figure). I think the A100 SXM is more expensive than the A100 PCIe on RunPod partly because of NVLink support.
[attached: nvidia-smi topo -m output from US-KS-2 showing NV12 between GPU0 and GPU1]
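For comparison across data centers, the NVLink link state can also be queried directly through NVML. This is a minimal sketch, assuming the pynvml (nvidia-ml-py) package is installed; on a pod with working NVLink each GPU should report several active links, while a pod without NVLink should report zero:
```python
# Minimal sketch: count active NVLink links per GPU via NVML.
# Assumes pynvml is installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        active = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active += 1
            except pynvml.NVMLError:
                # Link index not present or NVLink not supported on this GPU/driver.
                break
        print(f"GPU{i}: {active} active NVLink link(s)")
finally:
    pynvml.nvmlShutdown()
```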
Jason · 3w ago
Nvm, yeah, you gotta rent 8x
Jihyun (OP) · 3w ago
The maximum number of GPUs available is 7, so this is the screenshot.
[attached: screenshots]
Jason · 3w ago
But as I said, try opening a support ticket to ask for an official response. Or perhaps NVLink is only available in one region?
Jihyun (OP) · 3w ago
I don't think so. No matter how many GPUs we rent, it should still show the NVLink topology between two GPUs, like in US-KS-2, and ALL A100 SXM GPUs should have NVLink support. But thanks for the suggestion, I will ask for an official response.
Jason · 3w ago
Yeah, maybe only US-KS-2 is connected by NVLink.
