Rank_3_T5_XXL_23500MB_11_35_Second_IT.json with V9 version on rtx3090, 8.94s/it

2024-10-25 20:14:29 INFO prepare CLIP-L for fp8: set to torch.float8_e4m3fn, set embeddings to torch.bfloat16  flux_train_network.py:509
                    INFO prepare T5XXL for fp8: set to torch.float8_e4m3fn, set embeddings to torch.bfloat16, add hooks  flux_train_network.py:538
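The two INFO lines above describe kohya's fp8 base-model trick: weights are stored as float8_e4m3fn to save VRAM, embedding tables stay in bfloat16 (fp8 has too little precision for lookup tables), and T5XXL additionally gets hooks that upcast weights on the fly for ops without fp8 kernels. A minimal sketch of the casting idea, assuming a generic PyTorch text encoder module (not kohya's actual code; requires PyTorch 2.1+ for float8 dtypes):

```python
# Illustrative sketch only -- flux_train_network.py does something similar
# but with its own hook machinery for the fp8 compute path.
import torch
import torch.nn as nn

def prepare_text_encoder_for_fp8(te: nn.Module) -> nn.Module:
    te.to(torch.float8_e4m3fn)            # store all weights in fp8
    for m in te.modules():
        if isinstance(m, nn.Embedding):   # keep embedding tables in bf16
            m.to(torch.bfloat16)
    # Real training code must also upcast fp8 weights for ops that lack
    # fp8 kernels -- that is what the "add hooks" step in the log refers to.
    return te
```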
running training
  num train images * repeats: 50
  num reg images: 0
  num batches per epoch: 50
  num epochs: 200
  batch size per device: 1
  gradient accumulation steps = 1
  total optimization steps: 10000
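The totals in this block are internally consistent; a quick arithmetic check in plain Python, with the numbers taken from the log above:

```python
images_times_repeats = 50    # num train images * repeats
batch_size = 1               # batch size per device
grad_accum_steps = 1         # gradient accumulation steps
num_epochs = 200

batches_per_epoch = images_times_repeats // batch_size           # 50
total_steps = batches_per_epoch // grad_accum_steps * num_epochs
print(batches_per_epoch, total_steps)                            # 50 10000
```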
steps:   0%| | 0/10000 [00:00<?, ?it/s]
2024-10-25 20:14:47 INFO unet dtype: torch.float8_e4m3fn, device: cuda:0  train_network.py:1084
                    INFO text_encoder [0] dtype: torch.float8_e4m3fn, device: cuda:0  train_network.py:1090
                    INFO text_encoder [1] dtype: torch.float8_e4m3fn, device: cuda:0  train_network.py:1090

epoch 1/200
INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:715
D:\IA\kohya\kohya_ss\venv\lib\site-packages\torch\autograd\graph.py:825: UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides(), attempting to materialize a grad_output with matching strides... (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cudnn\MHA.cpp:676.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
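This UserWarning comes from PyTorch's cuDNN scaled-dot-product-attention backend and is harmless: PyTorch just materializes a contiguous grad_output each backward pass, costing a little extra time and memory. If you want to silence it, one general PyTorch workaround (a standard PyTorch knob, not a kohya option, and the speed impact depends on your GPU and PyTorch build) is to steer SDPA away from the cuDNN backend:

```python
import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

# Global switch (PyTorch 2.4+): disable the cuDNN SDPA backend entirely.
torch.backends.cuda.enable_cudnn_sdp(False)

# Or scope the restriction to the training step only:
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]):
    ...  # forward + backward pass here
```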
steps:   0%| | 34/10000 [05:04<24:45:40, 8.94s/it, avr_loss=0.417]
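The ETA in the progress bar lines up with the reported per-iteration time; a quick check:

```python
remaining = 10000 - 34
eta_seconds = remaining * 8.94                  # ~89,096 s
h, rem = divmod(int(eta_seconds), 3600)
print(f"{h}:{rem // 60:02d}:{rem % 60:02d}")    # 24:44:56
# Close to the 24:45:40 shown; tqdm uses a smoothed rate, hence the small gap.
```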