Runpod · 10mo ago
stevex

I'm seeing 93% GPU Memory Used even in a freshly restarted pod.

Not sure what to do about this. nvidia-smi shows there are no processes running, but when I try to run a job it shows "Process 1726743 has 42.25 GiB memory in use". How do I find and kill that?
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 44.52 GiB of which 18.44 MiB is free. Process 1726743 has 42.25 GiB memory in use. Process 3814980 has 2.23 GiB memory in use. Of the allocated memory 1.77 GiB is allocated by PyTorch, and 53.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
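
(For anyone hitting the same wall, a minimal set of checks, assuming nvidia-smi and fuser are available in the pod image; on a shared GPU the PID the driver reports may belong to another container's namespace, so it won't show up locally:)

nvidia-smi --query-compute-apps=pid,used_memory --format=csv   # PIDs the driver says are holding GPU memory
fuser -v /dev/nvidia* 2>/dev/null                              # local processes that have the GPU device open
ps -ef | grep 1726743 | grep -v grep                           # does the reported PID exist inside this container?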
3 Replies
Unknown User · 10mo ago
[Message not public; sign in and join the server to view.]
stevex (OP) · 10mo ago
I tried most of that... the process ID it quoted doesn't show up in ps -ef (and the number is a bit unusual). If there were a process holding onto memory, restarting the pod should have cleared it anyway.
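
(An illustrative sketch of that check, assuming a standard /proc mount inside the pod; a very large PID with no local /proc entry usually means the process lives in another namespace, i.e. outside this container:)

ps -ef | grep 1726743 | grep -v grep || echo "not in this container's process table"
test -d /proc/1726743 && echo "/proc entry exists" || echo "no /proc/1726743 here"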
Unknown User · 10mo ago
[Message not public; sign in and join the server to view.]