>>> import torch
>>> for i in range(4): print(torch.randn(10).cuda(i))
...
tensor([ 0.2891, -1.5423, 0.9641, -0.9828, -0.2903, -0.1162, -0.3382, -0.4224,
-1.0990, 0.0097], device='cuda:0')
tensor([-0.0208, -0.8867, -0.9426, -0.0929, -0.2264, -0.2705, 0.0863, -0.0632,
-0.3770, -1.2062], device='cuda:1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
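The loop above fails on `cuda:2` because the device is busy or not visible to the process. One way to avoid the crash is to check how many devices are actually usable before indexing them, and to catch the error (torch.AcceleratorError subclasses RuntimeError) as a fallback. A minimal sketch; the helper name `safe_randn` is hypothetical:

```python
import torch

def safe_randn(n, device_index):
    """Create a random tensor on the given CUDA device,
    falling back to CPU when the device is unavailable."""
    # Only index devices that exist and are visible to this process.
    if torch.cuda.is_available() and device_index < torch.cuda.device_count():
        try:
            return torch.randn(n, device=f"cuda:{device_index}")
        except RuntimeError:
            # Device exists but is busy/unavailable,
            # e.g. held by another process in exclusive-process compute mode.
            pass
    return torch.randn(n, device="cpu")

for i in range(4):
    t = safe_randn(10, i)
    print(i, t.device)
```

Note that `CUDA_VISIBLE_DEVICES` also affects what `torch.cuda.device_count()` reports, so the indices seen by the process may not match the physical GPU numbering from `nvidia-smi`.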