Rank_3_T5_XXL_23500MB_11_35_Second_IT.json with the V9 version on an RTX 3090: 8.94 s/it
2024-10-25 20:14:29 INFO prepare CLIP-L for fp8: set to torch.float8_e4m3fn, set embeddings to torch.bfloat16            flux_train_network.py:509
                    INFO prepare T5XXL for fp8: set to torch.float8_e4m3fn, set embeddings to torch.bfloat16, add hooks  flux_train_network.py:538
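Those two lines describe how the text encoders are handled under fp8: linear weights are stored as torch.float8_e4m3fn to save VRAM, the embedding tables stay in bfloat16, and hooks make sure the actual computation never runs on raw fp8 tensors. Below is a minimal sketch of that idea, assuming a per-layer upcast at compute time; the helper name prepare_te_for_fp8 and the forward patching are illustrative, not kohya's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def prepare_te_for_fp8(text_encoder: nn.Module) -> nn.Module:
    """Hypothetical sketch (not sd-scripts' real code): keep Linear weights in fp8,
    keep Embedding tables in bfloat16, and patch each Linear so the matmul itself
    runs in bfloat16."""
    for module in text_encoder.modules():
        if isinstance(module, nn.Embedding):
            module.to(torch.bfloat16)           # embeddings stay in bfloat16
        elif isinstance(module, nn.Linear):
            module.to(torch.float8_e4m3fn)      # weights stored as fp8 (e4m3)

            def fp8_forward(x, m=module):
                # upcast the stored fp8 weight only for this computation
                w = m.weight.to(torch.bfloat16)
                b = m.bias.to(torch.bfloat16) if m.bias is not None else None
                return F.linear(x.to(torch.bfloat16), w, b)

            module.forward = fp8_forward        # per-instance override
    return text_encoder
```

The trade-off is the usual one for fp8 storage: roughly half the weight memory of bf16, at the cost of a per-layer cast on every forward pass.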
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 50
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 50
num epochs / epoch数: 200
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 10000
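The step count follows directly from the dataset and epoch settings: 50 images × repeats at batch size 1 gives 50 batches per epoch, and 200 epochs gives 10,000 optimizer steps. A quick sanity check of those numbers, including the wall time implied by 8.94 s/it, could be done with plain arithmetic (this is not part of the training scripts):

```python
# sanity check of the step count and run time reported in the log above
num_images_times_repeats = 50
batch_size = 1
grad_accum_steps = 1
num_epochs = 200
secs_per_it = 8.94          # from the progress bar

batches_per_epoch = num_images_times_repeats // batch_size        # 50
total_steps = batches_per_epoch * num_epochs // grad_accum_steps  # 10000
eta_hours = total_steps * secs_per_it / 3600                      # ~24.8 h

print(batches_per_epoch, total_steps, f"{eta_hours:.1f} h")
```

The ~24.8 h estimate matches the 24:45:40 remaining shown by the progress bar at the bottom of the log.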
steps:   0%|          | 0/10000 [00:00<?, ?it/s]
2024-10-25 20:14:47 INFO unet dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1084
INFO text_encoder [0] dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1090
INFO text_encoder [1] dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1090
epoch 1/200
INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:715
D:\IA\kohya\kohya_ss\venv\lib\site-packages\torch\autograd\graph.py:825: UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides(), attempting to materialize a grad_output with matching strides... (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cudnn\MHA.cpp:676.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
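The UserWarning above comes from PyTorch's cuDNN scaled-dot-product-attention backward; it only means PyTorch materializes a strided copy of grad_output, so training results are unaffected, but it repeats every step. If you want to quiet it, or steer SDPA away from the cuDNN backend, something like the following should work on recent PyTorch builds; both the message filter and the backend selection are workarounds on my side, not kohya options.

```python
import warnings
import torch
import torch.nn.functional as F

# Option 1: hide the warning (training behavior is unchanged)
warnings.filterwarnings(
    "ignore",
    message=r".*cuDNN SDPA backward got grad_output\.strides\(\) != output\.strides\(\).*",
)

# Option 2 (PyTorch 2.3+): prefer non-cuDNN SDPA backends for attention calls
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```

To affect training, the sdpa_kernel context manager would have to wrap the forward/backward of the training loop itself; the standalone call here is only a demonstration.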
steps:   0%|▏         | 34/10000 [05:04<24:45:40, 8.94s/it, avr_loss=0.417]