Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
Yes, I'm running a 4090 at the moment. I'm just wondering which version of Dreambooth I should use out of the ones listed here https://github.com/d8ahazard/sd_dreambooth_extension/tags, because it seems like the version I got from the "extensions" tab is just not working properly, or I have something set very wrong and nothing is fixing it.

I mentioned before that when I first watched your original tutorial on training LoRA through Dreambooth, my Dreambooth looked the same as yours and everything was decent enough. But then my SD broke when I was first trying to get TensorRT working, so I had to reinstall it from scratch, and since then my Dreambooth looks very different from the one in the tutorial. I'm not sure if I'm missing a setting, have something wrong, or what the issue is.

I've tried training the same face on as many as 60 and as few as 25 images, with quite a few different settings ranges (I've literally tried to train this face 10 times now, lol), and on a given checkpoint the result keeps coming out barely any different than if I put the person's name in the prompt with no LoRA loaded.
Depending on how hot you're getting and what GPU we're talking about, it should be fine. I cranked my 4090 at one point when rendering batches of 4x4, and it was actually getting warm enough to thermally throttle the clocks slightly, but that was with roughly a +150 core offset, the voltage maxed, and the power slider maxed, on an air-cooled card in a well-ventilated case. The GPU hotspot in HWiNFO was showing 87.6C, and it was kinda warm in here even though it's winter, but yeah: if you're not overclocking and you have a well-ventilated case, I would say don't worry about it.

You can also try undervolting, because rendering in SD stresses the VRAM more than the core, so undervolting/downclocking your core a bit will actually save a lot of heat by lowering your power draw. I've been running my 4090 at 2600 MHz core with 900 mV, and the difference in render time between that and stock is less than 3 seconds on a 16-image batch - but it runs literally 10C cooler on the core and draws ~120W less power while rendering.

(Quoted the wrong post, but that was in reference to you asking about liquid-cooling your GPU.)
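For anyone on Linux without Afterburner, a rough equivalent of the clock-capping part of the advice above can be done with `nvidia-smi`. This is a sketch, not a tuned profile: the 2600 MHz cap mirrors the number mentioned above, and the 350 W power limit is an illustrative value - the valid range depends on your specific card (check `nvidia-smi -q -d POWER` first).

```shell
# Cap the graphics clock between 210 and 2600 MHz (requires root)
sudo nvidia-smi -lgc 210,2600

# Lower the board power limit in watts (valid range is card-specific)
sudo nvidia-smi -pl 350

# Revert the clock cap back to driver defaults
sudo nvidia-smi -rgc
```

Note this caps clocks rather than lowering voltage directly, so the savings won't be identical to a true curve undervolt, but the effect on power draw and temperature is similar in practice.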
For the time being, I will train a new image batch with LoRA. Last time there were too many repeating images, which made the LoRA inherit certain aspects of the dress/color and produce outputs with repetitive features from the images.
When caching takes over 30 minutes, it times out. I wonder if this is Kohya related. The message is from the Kaggle notebook caching latents: checking cache validity... 100%|████████████████████████████...
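For context on what that "checking cache validity" step is doing: latent caching encodes each training image through the VAE once and saves the result to disk, so later epochs can skip the encode. A minimal sketch of the skip-if-cached pattern is below - Kohya's actual implementation differs, and `encode_to_latent` here is a hypothetical stand-in for the real VAE encoder:

```python
import numpy as np
from pathlib import Path

def encode_to_latent(image_path: Path) -> np.ndarray:
    """Hypothetical stand-in: a real trainer would run the image
    through the Stable Diffusion VAE encoder here."""
    return np.zeros((4, 64, 64), dtype=np.float32)

def cache_latents(image_dir: Path) -> int:
    """Encode each image once, saving the latent as an .npz next to it.
    Returns how many images actually had to be (re-)encoded."""
    encoded = 0
    for img in sorted(image_dir.glob("*.png")):
        cache_file = img.with_suffix(".npz")
        # Cache-validity check: skip images whose latent file already
        # exists and is at least as new as the image itself.
        if cache_file.exists() and cache_file.stat().st_mtime >= img.stat().st_mtime:
            continue
        np.savez(cache_file, latents=encode_to_latent(img))
        encoded += 1
    return encoded
```

Because the second pass skips everything already cached, a timeout mid-cache usually just means the run has to finish the remaining images on restart rather than redo all of them.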
My LoRA training just started... roughly 4.5 hours. @Dr. Furkan Gözükara The old guide seems to be limited by RAM availability, but I think it can be slightly updated since Kaggle has increased the RAM. I didn't add the --lowram argument.
Also, since Kaggle gives 20 GB, for LoRA training do we need to stop and delete the regularization folder once it gets copied into results? Or can this step be skipped, so that we don't mess with the LoRA training settings?
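If deleting the copied regularization folder turns out to be necessary, it can be done from a notebook cell without touching the training settings. A minimal sketch, where the path and the 5 GB threshold are placeholders for your own layout and limits:

```python
import shutil
from pathlib import Path

def free_reg_folder(reg_dir: Path, min_free_gb: float = 5.0) -> bool:
    """Delete the (already-copied) regularization-image folder, but only
    when free disk space has dropped below min_free_gb.
    Returns True if the folder was removed."""
    # Free space on the filesystem that holds reg_dir, in GiB
    free_gb = shutil.disk_usage(reg_dir.anchor).free / 1024**3
    if free_gb < min_free_gb and reg_dir.exists():
        shutil.rmtree(reg_dir)
        return True
    return False
```

Example call on Kaggle might be `free_reg_folder(Path("/kaggle/working/reg"))` (the path is an assumption); gating the delete on actual free space means the step is a no-op when the 20 GB quota isn't under pressure.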
2023-11-26 10:40:23.363084: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-26 10:40:23.363138: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-26 10:40:23.363201: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-26 10:40:23.366439: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-26 10:40:23.366486: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-26 10:40:23.366538: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
usage: sdxl_train_network.py [-h] [--v2] [--v_parameterization] [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
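For what it's worth, the factory-registration errors in that log are a known harmless TensorFlow duplicate-registration noise, and the SciPy warning just says the installed NumPy (1.24.3) is newer than that SciPy build supports; pinning NumPy (e.g. `pip install "numpy>=1.16.5,<1.23"`) or upgrading SciPy are the usual fixes. The range check the warning performs can be sketched like this (a simplified version, not SciPy's actual code):

```python
def version_tuple(v: str) -> tuple:
    """Parse '1.24.3' into (1, 24, 3) for lexicographic comparison."""
    return tuple(int(part) for part in v.split(".")[:3])

def numpy_ok_for_scipy(np_version: str,
                       lo: str = "1.16.5", hi: str = "1.23.0") -> bool:
    """True when np_version falls in the half-open range [lo, hi)
    that the warning above reports for this SciPy build."""
    return version_tuple(lo) <= version_tuple(np_version) < version_tuple(hi)
```

With the detected 1.24.3 this returns False, which is exactly why the UserWarning fires; it's a warning rather than an error, so training usually still runs.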
OK, hitting "Start training" within the Kohya GUI seems to work. Copying the code, stopping the code, removing the folder, pasting the code, and starting the code from Kaggle doesn't seem to work.
Hello guys, I'm looking for someone who is experienced in .NET (US/UK preferred, fluent English, European or American). Please DM me if you are interested.
Got stuck at a point and had to restart. Do I have to reinstall fully? Is there a way to retain all the files/setups? Before I even get to hit the train button, one hour of GPU time is lost.
Now I think I see why the Dreambooth training could have failed. I think hitting Train within the Kohya GUI works better on Kaggle than executing the same code from the notebook. After the LoRA gets created, tomorrow I will try once more to train with Dreambooth.