levyeci

CUDA Out of Memory error when the 2nd GPU is not utilized

I have a pod with 2 x 80 GB PCIe GPUs and I am trying to load and run the Smaug-72B-v0.1 LLM. I can download it, but when I try to load it, it gives me a CUDA Out of Memory exception while the 2nd GPU's memory is empty. I was expecting that when I choose 2 x GPU I can use the combined capacity. If you check the screenshot, the 2nd GPU's memory is not used at all when the exception is fired. Also, there are no single-GPU instances with that much VRAM, so I have to choose 2x or 3x. How can I fix it? Thanks.

The exception is:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.11 GiB of which 168.50 MiB is free. Process 3311833 has 78.93 GiB memory in use. Of the allocated memory 78.31 GiB is allocated by PyTorch, and 189.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
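[Editor's note: the `max_split_size_mb` hint in the traceback only mitigates fragmentation; it cannot help when the weights genuinely exceed one card. A 72B model is roughly 144 GB in fp16, so it must be sharded across both GPUs. A minimal sketch of one way to do that, assuming the Hugging Face `transformers` and `accelerate` libraries are in use (the thread doesn't say which loader is behind the UI, and the repo id below is the usual Hugging Face id for this model, not confirmed here):]

```python
# Sketch: shard a 72B model across both 80 GB GPUs instead of loading
# everything onto GPU 0. Assumes `transformers` and `accelerate` are
# installed (pip install transformers accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Smaug-72B-v0.1"  # assumed repo id for the model in question

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~144 GB of weights in fp16: too big for one 80 GB card
    device_map="auto",          # lets accelerate split the layers over GPU 0 and GPU 1
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Without `device_map="auto"` (or an equivalent setting in the UI), the whole model is placed on GPU 0, which produces exactly the OOM shown above.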
Madiator2011 (41d ago)
Could you run my test script #RunPod GPU Tester (recommended for H100 users)? Also, I'm not an expert with the text UI, so maybe there is a setting to use multiple GPUs. @levyeci, if you run the script, send the debug output so we can see whether the serving GPU has any issues. H100s are known to be problematic.
kopyl (41d ago)
Your script has to explicitly support multi-GPU inference. If it does not, then it's not RunPod's fault.
ashleyk (41d ago)
You also need to go to settings in oobabooga and set it to use both GPUs. I had to specify to use both A100s with Mixtral 8x7B.
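[Editor's note: in text-generation-webui this is the per-GPU memory setting (the `--gpu-memory` launch flag / the sliders in the Model tab). In plain `transformers` the equivalent is the `max_memory` map that `device_map="auto"` respects; a hedged sketch, where the GiB caps are illustrative assumptions, not values from the thread:]

```python
# Sketch: cap what device_map="auto" may place on each card, leaving
# headroom for activations and the KV cache.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-72B-v0.1",       # assumed repo id, as above
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", "cpu": "60GiB"},  # spill to CPU RAM if needed
)
```

Leaving a few GiB free per GPU matters here: the traceback shows GPU 0 filled to 78.93 of 79.11 GiB, so even a small allocation for generation tips it over.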