Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a YouTube channel dedicated to the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
@Furkan Gözükara SECourses Hello! I'm using an RTX 4090 (24 GB). For full FLUX fine-tuning I used your config "Rank_1_15500MB_39_Second_IT.json", and it took me 16.5 hours to train on a 15-image dataset. Then I tried the config "Quality_1_23100MB_14_12_Second_IT.json", and it took me 23.5 hours to train on the same dataset. Why does training go slower with the second config even though it consumes more VRAM?
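The difference comes down to per-iteration speed rather than VRAM usage. Assuming the config filenames encode seconds per iteration (reading "39_Second_IT" as roughly 3.9 s/it and "14_12_Second_IT" as 14.12 s/it is my interpretation, not confirmed by the source), a quick sketch shows how wall-clock time scales with s/it for the same number of steps:

```python
# Rough wall-clock estimate from seconds-per-iteration and step count.
# The s/it values below are a hypothetical reading of the config filenames.

def training_hours(total_steps: int, sec_per_it: float) -> float:
    """Estimated wall-clock hours for a run at a fixed seconds-per-iteration."""
    return total_steps * sec_per_it / 3600

steps = 6000  # example step count, same for both configs
print(f"Rank config    (~3.9  s/it): {training_hours(steps, 3.9):.1f} h")
print(f"Quality config (~14.12 s/it): {training_hours(steps, 14.12):.1f} h")
```

A higher-VRAM config can still be slower per iteration if it trains at a higher rank or precision; VRAM consumed and iteration speed are independent axes.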
Hi @Furkan Gözükara SECourses, is there a way to save the training process to a local computer and then resume it later on mass compute, RunPod, or something else? Since training times are very long, is there any way to save the ongoing process and resume it later in case of a server restart or crash?
I kept my VRAM right at 23.9 GB using the "Quality_1_23100MB_14_12_Second_IT.json" config (on a 24 GB card). I had to try a few tricks to reduce VRAM usage, such as switching off hardware-accelerated GPU scheduling and selecting 'Adjust for best performance' under Visual Effects in Advanced System Settings in the Control Panel. This helped a little, though I still had 400 MB shared into RAM.
I had tried a 4e-06 learning rate on 26 input images, but all of my epoch results looked similar (I was saving checkpoints every tenth epoch), so I tried again with a 5e-06 LR and saw some improvement, though very gradual. I also noticed, right from my earliest checkpoint save, that sample photographic images of a 'woman' looked identical to those for 'ohxw woman', whereas with SDXL DreamBooth the two images only converged once the model was being over-trained on the new subject.
Can you describe in more detail which steps to take? Where can I load the latest checkpoint, and how do I calculate the remaining steps from the latest checkpoint to continue?
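For the remaining-steps arithmetic, here is a minimal sketch assuming the usual convention that steps per epoch = (images × repeats) ÷ batch size, and that checkpoints are saved at epoch boundaries (the function names and the example numbers are hypothetical, not from the source):

```python
# Hypothetical helpers for working out how many steps remain after
# resuming from a checkpoint saved at an epoch boundary.

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Steps for a full run: (images x repeats) / batch size, per epoch."""
    steps_per_epoch = (num_images * repeats) // batch_size
    return steps_per_epoch * epochs

def remaining_steps(num_images: int, repeats: int, epochs: int,
                    completed_epochs: int, batch_size: int = 1) -> int:
    """Steps left when resuming from a checkpoint saved after completed_epochs."""
    return total_steps(num_images, repeats, epochs - completed_epochs, batch_size)

# Example: 15 images, 1 repeat, 200 total epochs, resumed after epoch 120
print(remaining_steps(15, 1, 200, 120))  # 15 x 80 = 1200
```

Note that if the trainer saves full optimizer state (Kohya's scripts expose `--save_state` and `--resume` for this), resuming continues the original step counter automatically and you would not need to recompute anything by hand.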
I've seen FLUX not really converge on realism when the LR is too low, no matter how long you run it. With a low LR it changes facial features but stays in its plastic AI style.