OutOfMemoryError: CUDA out of memory
I keep getting this error when trying to run various models (e.g., gpt-oss-20b, llama-3.3-70b) on pods. Even when running GPUs with way more than the required VRAM (e.g., a 141 GB H200 for gpt-oss-20b) I still get this error. I have tried setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but that didn't fix it.
[info] Pipeline stopped due to error: CUDA out of memory. Tried to allocate 42.49 GiB. GPU 0 has a total capacity of 139.72 GiB of which 38.96 GiB is free. Process 584678 has 100.75 GiB memory in use. Of the allocated memory 99.91 GiB is allocated by PyTorch, and 181.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
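One thing worth noting: as far as I know, PYTORCH_CUDA_ALLOC_CONF is only read when PyTorch's CUDA caching allocator initializes, so it has to be in the environment before the first CUDA allocation (export it before launching the process, or set it at the very top of the script before importing torch). A minimal sketch of that, assuming a plain Transformers load of gpt-oss-20b; the model id, dtype, and generation call are illustrative assumptions, not taken from the actual pipeline:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator initializes,
# so set it before torch makes its first CUDA allocation (safest: before importing torch).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in bf16 rather than the fp32 default so a ~20B-parameter model
# needs roughly 40 GB of weight memory instead of ~80 GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```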
Smaller models like Qwen3-14B do work. But it's strange that a 20B-parameter model like gpt-oss would need more than an H200 to run.
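A bit of back-of-the-envelope arithmetic (the byte counts are assumptions and ignore activations and KV cache): 20B parameters are roughly 37 GiB of weights in bf16 and 75 GiB in fp32, so if the weights end up in full precision, or a serving framework pre-reserves most of the card for KV cache, the ~100 GiB already in use in the error above is plausible even on a 141 GB H200. Quick sketch:

```python
# Back-of-the-envelope weight memory for dense models (ignores activations and KV cache).
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

for name, params in [("Qwen3-14B", 14), ("gpt-oss-20b", 20), ("llama-3.3-70b", 70)]:
    print(
        f"{name}: fp32 ~{weight_gib(params, 4):.0f} GiB, "
        f"bf16 ~{weight_gib(params, 2):.0f} GiB, "
        f"4-bit ~{weight_gib(params, 0.5):.0f} GiB"
    )
```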