Software Engineering Courses (SECourses)•3y ago

I stopped using the kohya GUI because I couldn't see how to enable multi-gpu, so I've just been using a command line argument. This is the argument:

accelerate launch --num_cpu_threads_per_process=16 --num_processes=6 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3,4,5 "./sdxl_train.py" --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" --train_data_dir="training/img" --reg_data_dir="training/reg" --resolution="1024,1024" --output_dir="training/model" --logging_dir="training/log" --save_model_as=safetensors --full_bf16 --vae="stabilityai/sdxl-vae" --output_name="TESTsuperxl" --lr_scheduler_num_cycles="8" --max_data_loader_n_workers="0" --learning_rate_te1="3e-06" --learning_rate_te2="0.0" --learning_rate="1e-05" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="9600" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --gradient_checkpointing --bucket_no_upscale --noise_offset=0.0 --max_grad_norm=0.0 --no_half_vae --train_text_encoder

I've tried this with a batch size of 1 and 2. 2 is actually slower than 1, and anything higher than 2 gives me an out of memory error.

I've gone ahead and tried doing a repeat of 40 on the original images, and now 7 (close enough to 40 / 6).

Changing that didn't change the total number of optimization steps, which was always 9600, or the time to complete training. I think it just wanted to do more epochs.

After changing the repeats to 7 and removing the --max_train_steps="9600" flag, it's now trying to do 46 epochs for a total of 1600 optimization steps. It still says it's going to take over 4 hours.
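
The 46-epoch figure is consistent with how a step budget appears to be turned into epochs. A minimal sketch, assuming sd-scripts falls back to a default of 1600 steps when --max_train_steps is omitted, that each epoch's batches are split across the accelerate processes, and that regularization images double the per-epoch sample count. The 15-image dataset size is hypothetical (the thread never states it) and is chosen only because it reproduces the reported numbers; the function names are illustrative.

```python
import math


def steps_per_epoch(num_images, repeats, num_processes,
                    batch_size=1, grad_accum=1, use_reg_images=True):
    """Optimizer steps needed to pass over the dataset once."""
    # Assumption: regularization images double the samples per epoch.
    samples = num_images * repeats * (2 if use_reg_images else 1)
    batches = math.ceil(samples / batch_size)
    # Assumption: batches are shared out across the GPU processes.
    return math.ceil(batches / num_processes / grad_accum)


def epochs_for(max_train_steps, per_epoch):
    """Epochs needed to use up a fixed step budget."""
    return math.ceil(max_train_steps / per_epoch)


# Hypothetical 15 images, 7 repeats, 6 GPUs, assumed default of 1600 steps.
six_gpu_steps_per_epoch = steps_per_epoch(15, 7, num_processes=6)
six_gpu_epochs = epochs_for(1600, six_gpu_steps_per_epoch)

# Same hypothetical dataset on one GPU with 40 repeats and 9600 steps.
single_gpu_epochs = epochs_for(9600, steps_per_epoch(15, 40, num_processes=1))

print(six_gpu_steps_per_epoch, six_gpu_epochs, single_gpu_epochs)  # 35 46 8
```

Under these assumptions the epoch count is just a by-product of the step budget, which would explain why changing the repeats altered the number of epochs but not the total steps.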

I tried using the --ddp_gradient_as_bucket_view flag as specified in the updated sd-scripts repo, but that made it 5-6x slower.

Vexmachina (OP)•12/23/23, 1:39 AM

After all this, using just a single 4090 with your default settings gives me a training time of 3 hours @ 8 epochs and 9600 steps versus 4+h on 6 4090s.

Sorry for being a mess! I just don't understand what I'm doing wrong to mess up multi-gpu training so badly. Thank you for all that you're doing to help us though!
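
One possible reading of these numbers: with data-parallel training, every optimizer step consumes one batch on each GPU, so keeping --max_train_steps at 9600 on six GPUs asks for roughly six times as many training samples as the single-GPU run, rather than the same work in less time. A rough sketch under that assumption (the helper names are hypothetical, and communication overhead between GPUs is ignored):

```python
import math


def samples_seen(optimizer_steps, num_processes, batch_size=1, grad_accum=1):
    """Total samples processed, assuming one batch per process per step."""
    return optimizer_steps * num_processes * batch_size * grad_accum


def steps_to_match(target_samples, num_processes, batch_size=1, grad_accum=1):
    """Optimizer steps needed to process a given number of samples."""
    per_step = num_processes * batch_size * grad_accum
    return math.ceil(target_samples / per_step)


single_gpu_samples = samples_seen(9600, num_processes=1)
six_gpu_samples = samples_seen(9600, num_processes=6)
matching_steps = steps_to_match(single_gpu_samples, num_processes=6)

print(single_gpu_samples, six_gpu_samples, matching_steps)  # 9600 57600 1600
```

If that reading is right, a like-for-like comparison would give the six-GPU run about 1600 steps instead of 9600.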

In reply to Vexmachina: I stopped using the kohya GUI because I couldn't see how to enable multi-gpu, so...

Furkan Gözükara SECourses•12/23/23, 1:44 AM

kohya gui only prepares this command

Furkan Gözükara SECourses•12/23/23, 1:44 AM

accelerate is the one that does distributed training

Furkan Gözükara SECourses•12/23/23, 1:44 AM

i can give you private consultation if you need

Max•12/23/23, 2:18 AM

Anyway, I'm following the tutorial. It's amazing!

AIGambino•12/23/23, 3:25 AM

How come img2img doesn't work?

AIGambino•12/23/23, 3:25 AM

it just barely changes the color and makes my character ruffled up

AIGambino•12/23/23, 3:25 AM

txt2img works fine for a realistic mouse

shortdwarf•12/23/23, 4:53 AM

@Dr. Furkan Gözükara sorry for the delay. Here are screenshots of my adetailer settings. I have also used the "photo of ohwx man" prompt. I could also DM photos if you want to see examples with/without adetailer.

In reply to shortdwarf: Anyone know why I always get bad results from Adetailer? Does it only work well ...

shortdwarf•12/23/23, 4:54 AM

in response to this

In reply to Furkan Gözükara SECourses: i can give you private consultation if you need

Vexmachina (OP)•12/23/23, 5:13 AM

Thank you for the offer! I was able to discover that 4090 GPUs don't have NVLink, whoops, haha

Max•12/23/23, 6:03 AM

Guys just a question, once I've configured everything and run kaggle, is there a way to save the settings after running a session?

In reply to Max: Guys just a question, once I've configured everything and run kaggle, is there a...

shortdwarf•12/23/23, 6:09 AM

if you're referring to training, download the .json file and next time you need to use it, upload it. if you're referring to generating images in kaggle, you can upload a png (such as the last image you generated) with the desired settings to "png info" and then send those settings to txt2img
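
For the image-generation case, "png info" can recover settings because the web UI writes them into the PNG file itself. A minimal sketch using only the standard library, assuming the settings sit in a tEXt chunk under the key "parameters" (which is how the AUTOMATIC1111 web UI normally stores them, though metadata can also live in other chunk types that this sketch skips). The sample PNG built at the bottom is a stand-in without pixel data, and its parameter text is made up for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return the key/value pairs stored in the tEXt chunks of PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # A tEXt chunk is "keyword NUL text", both Latin-1 encoded.
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return texts


def make_chunk(chunk_type, body):
    """Serialize one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Stand-in file: signature, one text chunk, end marker (no image data).
sample_png = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"parameters\x00photo of ohwx man\nSteps: 20, CFG scale: 7")
    + make_chunk(b"IEND", b"")
)

settings = read_png_text_chunks(sample_png).get("parameters")
print(settings)
```

With a real image, the same function could be called on the bytes read from the downloaded PNG file.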

Max•12/23/23, 10:40 AM

Good morning! I just wanted to ask for information. To generate images with your own face, this video uses the checkpoint (https://www.youtube.com/watch?v=16-b1AjvyBE&t=1883s), while this other video uses the LoRA (https://youtube.com/watch?v=JF2P7BIUpIU&t=1260s).

YouTube (SECourses): How To Do Stable Diffusion XL (SDXL) DreamBooth Training For Free -...

In reply to Max: Good morning! I just wanted to ask for information. To generate images with your...

Benjamin•12/23/23, 10:57 AM

these are different training methods

In reply to Furkan Gözükara SECourses: i am yet to investigate them further

B0nd•12/23/23, 12:33 PM

I've tested a few trainings with your SDXL DreamBooth-then-LoRA-extraction method, with no captions and using reg images, but when I generate images using the LoRA, it doesn't resemble the subject's face.

100Sachen80•12/23/23, 12:34 PM

Hi there, I get an error in DreamBooth when saving the models. It shows a massive error list that starts with:
Exception training model: ' Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'encoder.conv_in.weight', 'decoder.up_blocks.1.upsamplers.0.conv.weight', 'decoder.up_blocks.3.resnets.0.conv_shortcut.weight', 'decoder.up_blocks.3.resnets.0.norm1.weight', 'encoder.mid_block.resnets.1.norm1.weight',

and ends with:

'decoder.up_blocks.0.resnets.0.conv1.weight', 'decoder.conv_out.weight'}]. A potential way to correctly save your model is to use save_model. More information at https://huggingface.co/docs/safetensors/torch_shared_tensors '.

I used the fixed GitHub commits that are shown in this video, and I used the 2 bat files that are linked on the Patreon page.
https://www.youtube.com/watch?v=g0wXIcRhkJk

Attachment: message.txt (11.13 KB)

YouTube (SECourses): The END of Photography - Use AI to Make Your Own Studio Photos, FRE...

100Sachen80•12/23/23, 12:42 PM

Just did the patreon btw

100Sachen80•12/23/23, 12:46 PM

Oh... i think i found the source of the error. I had the preview images every N epochs activated

In reply to 100Sachen80: Oh... i think i found the source of the error. I had the preview images every N ...

tourist07•12/23/23, 2:05 PM

The same happened for me. Turn the sample creation down to 0 to switch it off. I also turned off xtensors (back to default) and turned off cache latents. DreamBooth runs pretty fast on my 4070 laptop: around 1.35 it/s and 8.7 GB VRAM with 512x512 images. Is that OK? Does the default scheduler affect quality? With xtensors on, it was very slow with much higher VRAM usage.

In reply to Furkan Gözükara SECourses: this is really hard

Antek•12/23/23, 2:08 PM

I was thinking about training a model on people wearing my t-shirt design and then using inpainting to put the t-shirts on other people. Would that work?

DeadMan1999•12/23/23, 2:27 PM

@Dr. Furkan Gözükara You should make a quick video on this:
Nvidia has released a fix for dealing with Stable Diffusion slowdowns occurring due to Shared Memory feature they introduced some while ago...
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/11062
https://nvidia.custhelp.com/app/answers/detail/a_id/5490
This was a while back, but I found out today. I believe a lot of people are still on older drivers due to this issue...

GitHub

[Resolved] NVIDIA driver performance issues · AUTOMATIC1111 stable-...

Update (2023-10-31) This issue should now be entirely resolved. NVIDIA has made a help article to disable the system memory fallback behavior. Please upgrade to the latest driver (546.01) and follo...

B0nd•12/23/23, 3:18 PM

under 3D application settings (with the new driver) find your python.exe and apply the following setting

veldonlives•12/23/23, 3:22 PM

2 hardware performance questions:

1. I recently upgraded from a 2060 to a 3090 (24GB VRAM) - is there anything I’m supposed to do to make sure A1111 and ComfyUI go as fast as possible? (Maybe I need to reinstall if they do some kind of optimizations during install.)
2. I have 16GB RAM - will this be a bottleneck of some kind?

In reply to veldonlives: 2 hardware performance questions: 1. I recently upgraded from a 2060 to a 3090...

Xylber•12/23/23, 3:33 PM

Disable the "driver fallback" option in the NVIDIA control panel (if you get a CUDA out-of-memory error, enable it again). Also get rid of the "--lowvram" argument.

Vexmachina (OP)•12/23/23, 4:31 PM

When using accelerate and sdxl_train.py from the commandline, instead of using the Kohya GUI, how do you specify the "epoch" or "max train epoch" parameters? I don't see any commandline argument being provided that seems related to these, and I can't seem to find any reference to this by googling

Screenshot_2023-12-23_at_11.30.24_AM.png

Max•12/23/23, 4:34 PM

Hello lads! It's always me, an error generator. I was following this video (https://www.youtube.com/watch?v=16-b1AjvyBE&t=1883s), and everything was fine until I got to the calculation of the steps. Everything matches except the last number to be multiplied. For him it's 1300 / 1 / 1 * 1 * 2, while for me, instead of 2, there's always 1. Solutions?

geno1237341•12/23/23, 4:34 PM

same issues

In reply to Max: Hello lads! It's always me, an error generator. I was following this video https...

shortdwarf•12/23/23, 4:37 PM

The 2 for him is because he is using regularization images. If you are not, it’s okay to not get the x2.
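
The arithmetic behind that step count can be sketched as follows. This assumes the five numbers in the displayed formula are, in order, the images-times-repeats total, the batch size, the gradient accumulation steps, the epochs, and a regularization factor (2 with regularization images, 1 without). The function and parameter names are illustrative rather than taken from the GUI.

```python
import math


def max_train_steps(total_image_steps, batch_size=1, grad_accum=1,
                    epochs=1, use_reg_images=False):
    """Evaluate total / batch / accumulation * epochs * reg_factor."""
    # Assumption: regularization images double the step count.
    reg_factor = 2 if use_reg_images else 1
    return math.ceil(
        total_image_steps / batch_size / grad_accum * epochs * reg_factor
    )


with_reg = max_train_steps(1300, use_reg_images=True)   # 1300 / 1 / 1 * 1 * 2
without_reg = max_train_steps(1300)                     # 1300 / 1 / 1 * 1 * 1

print(with_reg, without_reg)  # 2600 1300
```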

In reply to shortdwarf: The 2 for him is because he is using regularization images. If you are not it’s ...

Max•12/23/23, 4:41 PM

Okay, thank you!

In reply to Vexmachina: When using accelerate and sdxl_train.py from the commandline, instead of using t...

shortdwarf•12/23/23, 4:45 PM

Epoch is listed as --lr_scheduler_num_cycles="x"

shortdwarf•12/23/23, 4:46 PM

Not sure about max, but you can find it by setting it to a number that doesn’t occur elsewhere in the command line, then printing the command and looking for that number.
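
As far as I know, sd-scripts also has a dedicated --max_train_epochs argument that overrides --max_train_steps when set, while --lr_scheduler_num_cycles only sets the number of restarts for the cosine-with-restarts scheduler. A sketch of how an epoch count could translate into a step count under that assumption; the batch and process counts are hypothetical example values, and the helper name is made up.

```python
import math


def steps_for_epochs(max_train_epochs, batches_per_epoch,
                     num_processes=1, grad_accum=1):
    """Step budget implied by an epoch count (assumed conversion)."""
    # Assumption: each epoch's batches are split across the GPU processes.
    steps_per_epoch = math.ceil(batches_per_epoch / num_processes / grad_accum)
    return max_train_epochs * steps_per_epoch


# Hypothetical: 1200 batches per epoch, 8 epochs.
single_gpu_steps = steps_for_epochs(8, 1200)
six_gpu_steps = steps_for_epochs(8, 1200, num_processes=6)

print(single_gpu_steps, six_gpu_steps)  # 9600 1600
```

On the command line this would mean passing something like --max_train_epochs=8 in place of --max_train_steps, if the installed sd-scripts version supports it.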

vp•12/23/23, 4:55 PM

anyone know the outcome of triton on windows? I know it released recently and was undergoing testing and implementation.

In reply to shortdwarf: Epoch is listed as --lr_scheduler_num_cycles="x"

Vexmachina (OP)•12/23/23, 5:01 PM

Awesome, thank you!

Max•12/23/23, 6:03 PM

Sorry guys, I was trying to train a checkpoint using a custom model (GHArt/Realistic_Stock_Photo_V1.0_xl_fp16), but it gives me the following error. Can you help me, please? Thank you!

sainthell•12/23/23, 6:39 PM

Hello... Might not be the right place to post this, but if I have 160 images of a character, what epoch and repeat counts are a good place to start for a LoRA? (using kohya)

EyeSpyBekoAI•12/23/23, 6:39 PM

Hey there community.
A quick random question for you. Besides perhaps training SDXL via DreamBooth, have you ever seen a need for a GPU with more than 24 GB of VRAM in your projects or adventures? I’m trying to cut down cloud computing to a minimum, so I'm working out what’s logical to buy and what isn’t.

100Sachen80•12/23/23, 8:04 PM

Boy... Training on DB with a 3060 12GB VRAM works with your "best" settings, but it takes 8 hours for 10 instance pics and 2000 ref pics. Holy sh...... But i tried with your "12GB VRAM" suggestions that take only about 1 hour, and the results are....meehhhhhh.

100Sachen80•12/23/23, 8:28 PM

Looking forward to the results after the 8 hours