Riham
RunPod
Created by Riham on 3/20/2025 in #⛅|pods-clusters
I cannot do training, I got an out-of-memory error
I don't know
Epoch 1/30
2025-03-21 06:26:13.742233: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 256.00MiB (rounded to 268435456)requested by op UNetPP/X01/dropout/GreaterEqual
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2025-03-21 06:26:13.742334: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2025-03-21 06:26:13.742380: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): Total Chunks: 183, Chunks in use: 179. 45.8KiB allocated for chunks. 44.8KiB in use in bin. 16.4KiB client-requested in use in bin.
2025-03-21 06:26:13.742396: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): Total Chunks: 26, Chunks in use: 25. 13.0KiB allocated for chunks. 12.5KiB in use in bin. 12.5KiB client-requested in use in bin.
2025-03-21 06:26:13.742409: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1024): Total Chunks: 16, Chunks in use: 15. 18.5KiB allocated for chunks. 17.2KiB in use in bin. 15.1KiB client-requested in use in bin.
2025-03-21 06:26:13.742422: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2048): Total Chunks: 2, Chunks in use: 2. 4.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2025-03-21 06:26:13.742437: I tensorflow/tsl/framework/bfc_allocator.cc:1046]
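The warning itself suggests two things to try: set the TF_GPU_ALLOCATOR=cuda_malloc_async environment variable in case fragmentation is the cause, and reduce how much GPU memory the run asks for at once (enable memory growth, lower the batch size). Below is a minimal sketch of both, assuming a Keras-style model.fit training loop like the UNetPP run in the log; build_unetpp() and train_ds are placeholders, not names from the original post.
```python
import os

# Suggested by the warning itself: switch to the asynchronous CUDA allocator
# to reduce fragmentation. Must be set before TensorFlow touches the GPU.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf

# Let TensorFlow claim GPU memory incrementally instead of reserving it all
# up front; this often avoids spurious OOMs when other processes share the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Hypothetical training call: build_unetpp() and train_ds stand in for the
# poster's actual UNet++ model and dataset. If the OOM persists, halving the
# batch size is usually the most reliable fix.
# model = build_unetpp()
# model.fit(train_ds.batch(4), epochs=30)  # e.g. batch 4 instead of 8
```
If none of that helps, the model simply does not fit on the pod's GPU at that input resolution, and a pod with more VRAM (or smaller input tiles) is the remaining option.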
8 replies