Software Engineering Courses (SECourses)•15mo ago

and that was even on 64GB ram

ZetOP•10/26/24, 2:48 AM

so I know that was not the issue, lol

In reply to Zet: and that was even on 64GB ram

theJindesan•10/26/24, 2:49 AM

So for you it just started working? Mine was working, then I upgraded and can't get it working again

ZetOP•10/26/24, 2:49 AM

right, for me, no update in between. just a day or two, different instance, no changes

ZetOP•10/26/24, 2:49 AM

in my behavior, except checking that save state

ZetOP•10/26/24, 2:49 AM

(at least I'd like to think that was it)

In reply to Zet: (at least I'd like to think that was it)

theJindesan•10/26/24, 2:50 AM

I have been using it on my other machine with 3090 but will for sure keep it off while i try to get this resolved on my older GPU

ZetOP•10/26/24, 2:50 AM

haha, yeah, I moved training to Massed Compute

ZetOP•10/26/24, 2:51 AM

because couldn't stand the thought of blocking my GPU

ZetOP•10/26/24, 2:51 AM

followed one comment: do 140 epochs, then take that and do 60 more

ZetOP•10/26/24, 2:51 AM

I tricked my ex-wife

ZetOP•10/26/24, 2:51 AM

(I trained model on one of my kids)

In reply to Zet: because couldn't stand the thought of blocking my GPU

theJindesan•10/26/24, 2:51 AM

yeah i had a 3090 for gaming, and now that I'm into AI, it's cut into my gaming hobby.

ZetOP•10/26/24, 2:52 AM

yeah, used to be 80-150h/week in TFD or D2, now, nothing for the past two weeks

ZetOP•10/26/24, 2:52 AM

haha

ZetOP•10/26/24, 2:52 AM

maybe every two weeks

ZetOP•10/26/24, 2:52 AM

(these numbers seem high, lol)

theJindesan•10/26/24, 2:52 AM

that's why I'm trying to get my 2nd GPU running.. just let that run all the time and free up primary PC.. plus i want to actually generate images and use all the cool AI stuff, but you also can't do that while training.

ZetOP•10/26/24, 2:53 AM

essentially, fully train on 1 GPU on MC is 24-28h

ZetOP•10/26/24, 2:53 AM

so $10 a pop

ZetOP•10/26/24, 2:53 AM

worth for me

ZetOP•10/26/24, 2:54 AM

instead of 5 days on home gpu

ZetOP•10/26/24, 2:54 AM

But I get it
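A rough sketch of the arithmetic behind those figures, assuming "$10 a pop" means about $10 per full 24-28 h run and "5 days" means 120 h of wall-clock time on the home GPU. The function names are illustrative and the hourly rate is only what those two numbers imply, not a quoted price.

```python
# Figures quoted in the chat; everything derived from them is an estimate.
CLOUD_HOURS = (24, 28)      # "fully train on 1 GPU on MC is 24-28h"
COST_PER_RUN_USD = 10.0     # "so $10 a pop"
HOME_DAYS = 5               # "instead of 5 days on home gpu"


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Hourly rate implied by a flat per-run cost."""
    return cost_usd / hours


def speedup(home_hours: float, cloud_hours: float) -> float:
    """How many times faster the rented run is than the home run."""
    return home_hours / cloud_hours


home_hours = HOME_DAYS * 24

for hours in CLOUD_HOURS:
    rate = implied_hourly_rate(COST_PER_RUN_USD, hours)
    factor = speedup(home_hours, hours)
    print(f"{hours} h run: ~${rate:.2f}/h implied, ~{factor:.1f}x faster than home")
```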

theJindesan•10/26/24, 2:55 AM

it's taken me 6 versions to get my own likeness perfect.

In reply to theJindesan: @Leolis78 are you on 3090 or 3090 ti? are you overclocking? I onl...

Leolis78•10/26/24, 2:55 AM

I have a Gigabyte Eagle RTX 3090, not overclocked. Are you running Rank_3_T5_XXL_23500MB_11_35_Second_IT.json to train LoRA? I had posted 8.94 s/it with Doc's V9, but unfortunately after some time the training process failed due to lack of VRAM. Now I'm testing with the original Kohya version and it reached 10.5 s/it. But it is more stable in VRAM usage.

C:\IA\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py:295: FutureWarning: torch.cpu.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead.
with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
steps: 0%|▎ | 15/3000 [02:36<8:38:56, 10.43s/it, avr_loss=0.416]
epoch 2/200
2024-10-25 23:28:24 INFO epoch is incremented. current_epoch: 1, epoch: 2 train_util.py:715
steps: 1%|▌ | 30/3000 [05:11<8:34:22, 10.39s/it, avr_loss=0.406]
epoch 3/200
2024-10-25 23:31:00 INFO epoch is incremented. current_epoch: 2, epoch: 3 train_util.py:715
steps: 2%|▊ | 45/3000 [07:49<8:33:58, 10.44s/it, avr_loss=0.374]
epoch 4/200
2024-10-25 23:33:38 INFO epoch is incremented. current_epoch: 3, epoch: 4 train_util.py:715
steps: 2%|█ | 60/3000 [10:28<8:32:55, 10.47s/it, avr_loss=0.409]
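The numbers in that log can be cross-checked with a little arithmetic. This is a minimal sketch, assuming 15 steps per epoch (the log increments the epoch at steps 15, 30 and 45) and a constant s/it; the progress bar smooths its rate, so its own ETA will differ by a few seconds. The helper names are made up for illustration.

```python
STEPS_PER_EPOCH = 15   # inferred from the log: epoch increments at steps 15, 30, 45
EPOCHS = 200           # "epoch 2/200"


def remaining_hours(total_steps: int, done_steps: int, sec_per_it: float) -> float:
    """Remaining wall-clock time in hours at a constant seconds-per-iteration."""
    return (total_steps - done_steps) * sec_per_it / 3600


def as_hms(hours: float) -> str:
    """Format a duration in hours as H:MM:SS."""
    total_seconds = round(hours * 3600)
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_steps = STEPS_PER_EPOCH * EPOCHS  # matches the 3000 shown in the log

# ETA at step 15 with the logged 10.43 s/it (log shows 8:38:56)
print(as_hms(remaining_hours(total_steps, 15, 10.43)))

# Full-run comparison of the two speeds mentioned: 8.94 s/it vs 10.5 s/it
for rate in (8.94, 10.5):
    print(f"{rate} s/it -> {remaining_hours(total_steps, 0, rate):.2f} h for {total_steps} steps")
```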

In reply to theJindesan: it's taken me 6 versions to get my own likeness perfect.

ZetOP•10/26/24, 2:56 AM

You should try the suggested approach. 140 Epochs (1 iteration per image) on dev model

ZetOP•10/26/24, 2:56 AM

then take the 140th epoch

ZetOP•10/26/24, 2:56 AM

and take that as input model

ZetOP•10/26/24, 2:56 AM

and train 60 more epochs with saving every 10th

ZetOP•10/26/24, 2:56 AM

it is incredible
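The two-stage schedule described above (140 epochs, then 60 more starting from the 140th-epoch checkpoint, saving every 10th) can be sketched as plain bookkeeping. This is only an illustration: the function and parameter names are hypothetical, it does not call any trainer, and the "cumulative epoch" labels are an assumption for tracking purposes, since the second run would count its own epochs from 1.

```python
def stage2_checkpoints(stage1_epochs: int = 140,
                       stage2_epochs: int = 60,
                       save_every: int = 10) -> list[int]:
    """Cumulative epoch numbers at which the second run saves a checkpoint.

    Stage 1 trains for `stage1_epochs` on the base model; stage 2 uses the
    final stage-1 checkpoint as its input model and trains `stage2_epochs`
    more on the same dataset, saving every `save_every` epochs.
    """
    return [stage1_epochs + e
            for e in range(save_every, stage2_epochs + 1, save_every)]


checkpoints = stage2_checkpoints()
print(checkpoints)  # candidates to compare side by side after the second run
```

Saving every 10th epoch in the second run leaves six checkpoints to compare, rather than a single final model.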

theJindesan•10/26/24, 2:56 AM

@Leolis78 ah, i thought you were doing the dreambooth method.. I do get faster lora training/it.

ZetOP•10/26/24, 2:57 AM

I was

In reply to Zet: then take the 140th epoch

theJindesan•10/26/24, 2:57 AM

yeah i generally find in a 200 epoch run, the best is somewhere between 120-160, but never the 200's or less than 100

ZetOP•10/26/24, 2:57 AM

Dreambooth vs Lora, no question

In reply to theJindesan: yeah i generally find in a 200 epoch run, the best is somewhere between 120-160,...

ZetOP•10/26/24, 2:58 AM

Right, when you take that sweet spot at 140, and use that as your base model for training for an additional 60 epochs

ZetOP•10/26/24, 2:58 AM

with same dataset

theJindesan•10/26/24, 2:59 AM

@Zet do you happen to know what it means to be overtrained/overcooked? I see this term all the time in reference to too many epochs/repeats, but what does it actually mean? is the 200th epoch usually overtrained?

ZetOP•10/26/24, 3:00 AM

overtrained means that you see it converge towards likeness, then go away from it

ZetOP•10/26/24, 3:00 AM

like the model is trying too hard

ZetOP•10/26/24, 3:00 AM

and introduces hallucinations of you, i.e. it starts not looking like you at all

ZetOP•10/26/24, 3:01 AM

I only have family members trained, so cannot share

ZetOP•10/26/24, 3:01 AM

but you will see it

ZetOP•10/26/24, 3:01 AM

(and only you can)

ZetOP•10/26/24, 3:01 AM

haha

ZetOP•10/26/24, 3:02 AM

It's like you crossed the uncanny valley, you have that sweet spot of confusion, and then you're right back in it

ZetOP•10/26/24, 3:02 AM

that sweet spot is where you want to be

ZetOP•10/26/24, 3:02 AM

too far, you overtrained
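One way to make that "converge, then drift away" pattern concrete is to rate sample images from each saved checkpoint by eye and look for the peak. The sketch below assumes such subjective ratings exist; the scores are made up purely for illustration and the function name is hypothetical.

```python
def pick_sweet_spot(ratings: dict[int, float]) -> tuple[int, list[int]]:
    """Return the best-rated epoch and the later epochs that score lower.

    `ratings` maps a checkpoint's epoch number to a subjective likeness
    score (higher is better). Later checkpoints that score below the peak
    are treated as overtrained.
    """
    best_epoch = max(ratings, key=ratings.get)
    overtrained = sorted(e for e, score in ratings.items()
                         if e > best_epoch and score < ratings[best_epoch])
    return best_epoch, overtrained


# Made-up ratings on a 1-10 scale, for illustration only.
example_ratings = {100: 6, 120: 8, 140: 9, 160: 8, 180: 6, 200: 5}

best, too_far = pick_sweet_spot(example_ratings)
print(f"sweet spot: epoch {best}; past it: {too_far}")
```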

In reply to Zet: and introduces hallucinations of you, i.e. it starts not looking like you at all

theJindesan•10/26/24, 3:02 AM

i see, so it starts to head more like the early epochs. What i noticed is that with dreambooth, it starts with some "person" in memory that is similar to some of your training photos, then it fine-tunes that person into you epoch over epoch until it's pretty much you at around 120

ZetOP•10/26/24, 3:02 AM

Right

In reply to theJindesan: @Leolis78 ah, i thought you were doing the dreambooth method.. I do...

Leolis78•10/26/24, 3:03 AM

To train Dreambooth, 24GB_GPU_23150MB_10.2_second_it_Tier_1.json with Doc V9 achieves 9.47s/it. It runs stable and smoothly.
