OutOfMemoryError: CUDA out of memory
I keep getting this error when trying to run various models (e.g., gpt-oss-20b, llama-3.3-70b) on pods. Even when running GPUs with way more than the required VRAM (e.g., a 141 GB H200 for gpt-oss-20b) I still get this error. I have tried setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but that didn't fix it.
[info] Pipeline stopped due to error: CUDA out of memory. Tried to allocate 42.49 GiB. GPU 0 has a total capacity of 139.72 GiB of which 38.96 GiB is free. Process 584678 has 100.75 GiB memory in use. Of the allocated memory 99.91 GiB is allocated by PyTorch, and 181.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
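One thing worth noting: as far as I know, PYTORCH_CUDA_ALLOC_CONF is only read when PyTorch's CUDA caching allocator initializes, so it has to be in the environment before the first CUDA allocation (export it before launching the process, or set it at the very top of the script before importing torch). A minimal sketch of that, assuming a plain Transformers load of gpt-oss-20b; the model id, dtype, and generation call are illustrative assumptions, not taken from the actual pipeline:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator initializes,
# so set it before torch makes its first CUDA allocation (safest: before importing torch).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in bf16 rather than the fp32 default so a ~20B-parameter model
# needs roughly 40 GB of weight memory instead of ~80 GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```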
Smaller models like Qwen3-14B do work. But it's strange that a 20B-parameter model like gpt-oss would need more than an H200 to run.
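A bit of back-of-the-envelope arithmetic (the byte counts are assumptions and ignore activations and KV cache): 20B parameters are roughly 37 GiB of weights in bf16 and 75 GiB in fp32, so if the weights end up in full precision, or a serving framework pre-reserves most of the card for KV cache, the ~100 GiB already in use in the error above is plausible even on a 141 GB H200. Quick sketch:

```python
# Back-of-the-envelope weight memory for dense models (ignores activations and KV cache).
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

for name, params in [("Qwen3-14B", 14), ("gpt-oss-20b", 20), ("llama-3.3-70b", 70)]:
    print(
        f"{name}: fp32 ~{weight_gib(params, 4):.0f} GiB, "
        f"bf16 ~{weight_gib(params, 2):.0f} GiB, "
        f"4-bit ~{weight_gib(params, 0.5):.0f} GiB"
    )
```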