levyeci

CUDA Out of Memory error when the 2nd GPU is not utilized

I have a pod with 2 x 80 GB PCIe GPUs and I am trying to load and run the Smaug-72B-v0.1 LLM. I can download it, but when I try to load it, it gives me a CUDA Out of Memory exception while the 2nd GPU's memory is empty. I was expecting that when I choose 2 x GPU I can use the combined capacity. If you check the screenshot, the 2nd GPU's memory is not used at all when the exception is fired. Also, there are no single-GPU instances with that much VRAM, so I have to choose 2x or 3x. How can I fix it? Thanks.

The exception is:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.11 GiB of which 168.50 MiB is free. Process 3311833 has 78.93 GiB memory in use. Of the allocated memory 78.31 GiB is allocated by PyTorch, and 189.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
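[Editor's note: the `max_split_size_mb` hint in the traceback only mitigates fragmentation; it cannot help when the weights genuinely exceed one card. A 72B model is roughly 144 GB in fp16, so it must be sharded across both GPUs. A minimal sketch of one way to do that, assuming the Hugging Face `transformers` and `accelerate` libraries are in use (the thread doesn't say which loader is behind the UI, and the repo id below is the usual Hugging Face id for this model, not confirmed here):]

```python
# Sketch: shard a 72B model across both 80 GB GPUs instead of loading
# everything onto GPU 0. Assumes `transformers` and `accelerate` are
# installed (pip install transformers accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Smaug-72B-v0.1"  # assumed repo id for the model in question

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~144 GB of weights in fp16: too big for one 80 GB card
    device_map="auto",          # lets accelerate split the layers over GPU 0 and GPU 1
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Without `device_map="auto"` (or an equivalent setting in the UI), the whole model is placed on GPU 0, which produces exactly the OOM shown above.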
Madiator2011 (41d ago)
Could you run my test script #RunPod GPU Tester (recommended for H100 users)? Also, I'm not an expert with the text UI, so maybe there is a setting to use multiple GPUs. @levyeci, if you run the script, send the debug output so we can see whether the serving GPU has any issues. H100s are known to be problematic.
kopyl (41d ago)
Your script has to explicitly support multi-GPU inference. If it does not, then it's not RunPod's fault.
ashleyk (41d ago)
You also need to go to settings in oobabooga and set it to use both GPUs. I had to specify to use both A100s with Mixtral 8x7B.
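[Editor's note: in text-generation-webui this is the per-GPU memory setting (the `--gpu-memory` launch flag / the sliders in the Model tab). In plain `transformers` the equivalent is the `max_memory` map that `device_map="auto"` respects; a hedged sketch, where the GiB caps are illustrative assumptions, not values from the thread:]

```python
# Sketch: cap what device_map="auto" may place on each card, leaving
# headroom for activations and the KV cache.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-72B-v0.1",       # assumed repo id, as above
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", "cpu": "60GiB"},  # spill to CPU RAM if needed
)
```

Leaving a few GiB free per GPU matters here: the traceback shows GPU 0 filled to 78.93 of 79.11 GiB, so even a small allocation for generation tips it over.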