Some CUDA devices not working in a multi-GPU setting
Hi, I started one pod with 8x H100 PCIe and another with 4x H100 to see if this bug is reproducible.
On the 8x H100 pod, assigning a tensor to device 0 fails immediately.
On the 4x H100 pod, devices 0 and 1 work, but it then fails at device 2. Do you have any idea why this is happening?
I keep wasting my credits because of this bug.
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> for i in range(4): print(torch.randn(10).cuda(i))
...
tensor([ 0.2891, -1.5423, 0.9641, -0.9828, -0.2903, -0.1162, -0.3382, -0.4224,
-1.0990, 0.0097], device='cuda:0')
tensor([-0.0208, -0.8867, -0.9426, -0.0929, -0.2264, -0.2705, 0.0863, -0.0632,
-0.3770, -1.2062], device='cuda:1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
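
In case it helps with triage: a small probe like the sketch below (assuming stock PyTorch on the pod and no CUDA_VISIBLE_DEVICES filtering) can show whether the failure always follows the same physical GPUs. Checking nvidia-smi for ECC errors or leftover processes on the failing devices would also be worth doing.

import torch

print("torch", torch.__version__, "cuda", torch.version.cuda,
      "devices", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    try:
        x = torch.randn(10, device=f"cuda:{i}")  # same allocation that fails in the traceback above
        print(f"cuda:{i} OK      {torch.cuda.get_device_name(i)}  sum={x.sum().item():.4f}")
    except Exception as e:  # catch broadly; the exception class differs across PyTorch versions
        print(f"cuda:{i} FAILED  {type(e).__name__}: {e}")

Running this right after the pod starts, and again after the first failure, would show whether the same device indices are consistently busy or unavailable.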
