GPUs are unavailable on pod

Hi guys, I've set up a 4x H100 instance (mostly the default settings). When the pod is instantiated, the GPUs are not available inside it. (I have a script that checks this.)

The pod ID is c6ghnnsno6fkvu. I'll keep it running for a day so you can inspect it directly.

Here's my script output:

[sanitycheck] VISIBLE=all | WORLD_SIZE=1 | NPROC=1 | NGPUS=0 | NAMES=[]


Usually it looks like this:

[sanitycheck] VISIBLE=all | WORLD_SIZE=1 | NPROC=1 | NGPUS=4 | NAMES=['NVIDIA H100 PCIe', 'NVIDIA H100 PCIe', 'NVIDIA H100 PCIe', 'NVIDIA H100 PCIe']
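The original script isn't shown, so here is a minimal sketch of this kind of check, not the actual script. It prints a line in the same format and assumes PyTorch is installed in the pod image. Which environment variables feed VISIBLE and NPROC is a guess: `NVIDIA_VISIBLE_DEVICES` (set by the NVIDIA container runtime) and `LOCAL_WORLD_SIZE` (set by `torchrun`). `format_sanitycheck` and `gpu_names` are hypothetical helper names.

```python
import os


def gpu_names():
    """Return the names of the CUDA devices PyTorch can see (empty list if none)."""
    try:
        import torch  # assumption: PyTorch is installed in the pod image
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


def format_sanitycheck(visible, world_size, nproc, names):
    """Build a one-line report shaped like the [sanitycheck] output above."""
    return (
        f"[sanitycheck] VISIBLE={visible} | WORLD_SIZE={world_size} | "
        f"NPROC={nproc} | NGPUS={len(names)} | NAMES={names}"
    )


def main():
    # Assumption: VISIBLE comes from NVIDIA_VISIBLE_DEVICES ("all" = every host GPU requested).
    visible = os.environ.get("NVIDIA_VISIBLE_DEVICES", "unset")
    # WORLD_SIZE / LOCAL_WORLD_SIZE are set by torchrun; default to 1 when run directly.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    nproc = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
    line = format_sanitycheck(visible, world_size, nproc, gpu_names())
    print(line)
    return line


if __name__ == "__main__":
    main()
```

Running `nvidia-smi` in the same pod is a quick way to see whether the driver reports the devices at all, independently of PyTorch.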


This is the second time this has happened to me. Last time I had the exact same issue with A100 PCIe GPUs.

Recreating the pod helps, but not always (in the A100 case, the same resources were evidently reallocated several times in a row).