RunPod · 2mo ago
Riham

I can't train: I got an out of memory error

Hello there, I have rented a pod with GPU: H100 SXM, RAM: 251 GB. I tried to train my model on images and their masks, but unfortunately it returns an out of memory error. Please help, I am very confused.
3 Replies
Jason · 2mo ago
Maybe it's your training code / app? Or it just requires more GPU VRAM, I'm guessing. Any logs / errors you can share?
Riham (OP) · 2mo ago
Epoch 1/30
2025-03-21 06:26:13.742233: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 256.00MiB (rounded to 268435456) requested by op UNetPP/X01/dropout/GreaterEqual
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2025-03-21 06:26:13.742334: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2025-03-21 06:26:13.742380: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): Total Chunks: 183, Chunks in use: 179. 45.8KiB allocated for chunks. 44.8KiB in use in bin. 16.4KiB client-requested in use in bin.
2025-03-21 06:26:13.742396: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): Total Chunks: 26, Chunks in use: 25. 13.0KiB allocated for chunks. 12.5KiB in use in bin. 12.5KiB client-requested in use in bin.
2025-03-21 06:26:13.742409: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1024): Total Chunks: 16, Chunks in use: 15. 18.5KiB allocated for chunks. 17.2KiB in use in bin. 15.1KiB client-requested in use in bin.
2025-03-21 06:26:13.742422: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2048): Total Chunks: 2, Chunks in use: 2. 4.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2025-03-21 06:26:13.742437: I tensorflow/tsl/framework/bfc_allocator.cc:1046]
I don't know
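The first warning line is the key one: the BFC allocator could not find a free 256 MiB block, and the message itself suggests trying TF_GPU_ALLOCATOR=cuda_malloc_async if fragmentation is the cause. A minimal sketch of applying that flag plus TensorFlow's on-demand memory growth (the actual training code isn't shown in this thread, so only the setup is sketched here):

import os

# The allocator's own suggestion: switch to the CUDA async allocator to
# reduce fragmentation. Must be set before TensorFlow initializes the GPU.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front,
# which makes actual usage (and fragmentation) easier to diagnose.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)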
Jason · 2mo ago
I don't know either without additional information, but you can try debugging to figure that out, or just move to a GPU with more VRAM. Yeah, it says it in the first few lines: your GPU ran out of VRAM.
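If the allocator flag alone doesn't help, the usual levers for a training OOM are a smaller batch size, a lower input resolution, and mixed precision. A rough sketch under those assumptions (build_unetpp, train_images, and train_masks are hypothetical placeholders, not names from this thread):

import tensorflow as tf

# Mixed precision keeps activations in float16, roughly halving their
# memory footprint; H100 tensor cores handle float16 natively.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

BATCH_SIZE = 4            # halve this first whenever an OOM appears
IMAGE_SIZE = (256, 256)   # memory scales roughly quadratically with resolution

# Hypothetical wiring; substitute the actual UNet++ model and dataset:
# model = build_unetpp(input_shape=IMAGE_SIZE + (3,))
# train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_masks))
# model.fit(train_ds.batch(BATCH_SIZE), epochs=30)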
