Runpod · 2y ago
12 replies
codeRetarded

Serverless multi gpu

I have a model deployed on two 48 GB GPUs with 1 worker. It ran correctly the first time with CUDA distributed, but then it fails with this error: "error_message": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)",\n "error_traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\

What can be the issue here?
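
For context, `wrapper_CUDA_cat` in the message points at a `torch.cat` call that received tensors living on different GPUs. The handler code is not shown here, so the following is only a minimal sketch under the assumption that tensors produced on `cuda:0` and `cuda:1` are being concatenated directly. The helper name `to_common_device` and the `demo` function are hypothetical and not part of Runpod or PyTorch.

```python
def to_common_device(tensors, device=None):
    """Return the tensors, all moved to one device.

    Works with any objects exposing `.device` and `.to(device)`,
    which torch.Tensor does. Defaults to the first tensor's device.
    """
    if not tensors:
        return []
    target = tensors[0].device if device is None else device
    return [t.to(target) for t in tensors]


def demo():
    # Imported here so the helper above has no hard dependency on torch.
    import torch

    # Use two GPUs if present; otherwise fall back to CPU so this still runs.
    if torch.cuda.device_count() >= 2:
        dev_a, dev_b = torch.device("cuda:0"), torch.device("cuda:1")
    else:
        dev_a = dev_b = torch.device("cpu")

    a = torch.ones(2, 3, device=dev_a)
    b = torch.zeros(2, 3, device=dev_b)

    # With two GPUs, torch.cat([a, b]) would raise the "Expected all tensors
    # to be on the same device" RuntimeError. Moving them first avoids it.
    merged = torch.cat(to_common_device([a, b]), dim=0)
    return merged.shape, merged.device
```

Calling `demo()` on a machine with PyTorch installed exercises the fix. Whether this matches the actual failure depends on where the handler builds the tensors it concatenates, which would need the full traceback to confirm.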