Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
Maybe that was the difference then, as I trained the LoRAs with AdamW but the dreambooth fine-tune with Adafactor (I could not fit AdamW on my GPU, or even on a rented 48GB one). Do you think this could be the difference? I still believe fine-tuning should be better, so I'm just confused about why LoRA yields better results for me.
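For reference, the usual Adafactor setup in kohya sd-scripts disables the relative step so the explicit learning rate is actually used; otherwise Adafactor can behave very differently from AdamW. A minimal launch sketch, assuming kohya-style flags (the script name and LR value are placeholders, check your version's --help):

```python
# Minimal sketch of an Adafactor fine-tune launch for kohya sd-scripts.
# sdxl_train.py is a stand-in for whichever fine-tune script you use;
# the learning rate is a hypothetical value, not a recommendation.
import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train.py",
    "--pretrained_model_name_or_path", "model.safetensors",
    "--optimizer_type", "Adafactor",
    # Disable relative_step so the explicit --learning_rate below is used.
    "--optimizer_args", "scale_parameter=False", "relative_step=False", "warmup_init=False",
    "--learning_rate", "4e-7",   # hypothetical
    "--lr_scheduler", "constant",
]
subprocess.run(cmd, check=True)
```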
Hey there! Has anyone here deployed Flux LoRA training as an API? Would be happy to chat, or I'd be grateful if you can share anything useful on how to deploy to RunPod or other compute providers. P.S. If you can share some useful presets for training a person LoRA - beer on me haha!
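Not a production recipe, but the basic RunPod serverless pattern is just a Python handler wrapped around your training command. A rough sketch, where runpod.serverless.start is the real SDK entry point and everything else (job schema, script, paths) is a placeholder:

```python
# Sketch of wrapping LoRA training as a RunPod serverless endpoint.
# runpod.serverless.start is the real SDK entry point; the job schema,
# paths, and the training command below are all hypothetical.
import subprocess
import runpod

def handler(job):
    params = job["input"]  # hypothetical fields: dataset_config, steps
    subprocess.run(
        ["accelerate", "launch", "flux_train_network.py",
         "--dataset_config", params["dataset_config"],
         "--max_train_steps", str(params.get("steps", 6000))],
        check=True,
    )
    return {"status": "done", "lora": "/workspace/output/lora.safetensors"}

runpod.serverless.start({"handler": handler})
```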
Does training a LoRA for NoobAI v-pred require different settings? I'm using the kohya_ss GUI (dev branch that is up to date with the latest scripts). Someone somewhere said that it needs the --zero_terminal_snr parameter. Anything else?
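For what it's worth, v-pred training in kohya sd-scripts usually needs more than zero terminal SNR alone, most importantly telling the trainer the model is v-prediction. A hedged sketch of the flags people commonly combine (verify each against your branch's --help):

```python
# Sketch of flags commonly combined for v-pred training in kohya sd-scripts;
# verify each exists in your branch before relying on it.
import subprocess

vpred_flags = [
    "--v_parameterization",                 # model predicts v, not noise
    "--zero_terminal_snr",                  # rescale schedule to zero terminal SNR
    "--scale_v_pred_loss_like_noise_pred",  # optional loss rebalancing
]
subprocess.run(
    ["accelerate", "launch", "sdxl_train_network.py",
     "--pretrained_model_name_or_path", "noobai-vpred.safetensors",  # hypothetical file
     *vpred_flags],
    check=True,
)
```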
Recently, when people train a LoRA with kohya and also train the CLIP text encoder together with it, they claim that a LoRA trained together with the CLIP text encoder gives much better results. I wonder, does your latest LoRA training JSON config file have the CLIP text encoder trained together with the LoRA or not?
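For context, in kohya's train_network scripts the text encoder LoRA is trained alongside the U-Net by default; you opt out with --network_train_unet_only, and you can give the text encoder its own (usually lower) learning rate. A sketch with hypothetical values:

```python
# Sketch: training LoRA on both U-Net and text encoder in kohya sd-scripts.
# Omitting --network_train_unet_only means the text encoder is trained too.
import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train_network.py",
    "--network_module", "networks.lora",
    "--unet_lr", "1e-4",          # hypothetical
    "--text_encoder_lr", "5e-5",  # hypothetical: half the U-Net LR
    # note: no --network_train_unet_only, so the text encoder LoRA trains as well
]
subprocess.run(cmd, check=True)
```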
I'll keep testing and let you know. In general the dreambooth fine-tune does preserve small details better (logos, brake calipers, etc.), but the LoRA follows the car's shape a lot better. I need to test it further with a ControlNet and a depth map (taken from the 3D model) to see if this car body issue can be resolved that way. And maybe as a final test I can rent an 80GB+ GPU to do an AdamW dreambooth training run.
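If you end up scripting that ControlNet test, a minimal diffusers sketch of LoRA + depth ControlNet looks roughly like this, assuming an SDXL-style pipeline; the repo IDs and file names are placeholders for whatever you actually trained on:

```python
# Sketch: testing a trained LoRA together with a depth ControlNet in diffusers.
# Model repo IDs and file names below are assumptions; swap in your own.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("car_lora.safetensors")  # hypothetical trained LoRA

depth = load_image("depth_from_3d_model.png")   # depth render of the 3D model
image = pipe("a photo of the car", image=depth).images[0]
image.save("test.png")
```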
After a test I can say that normal SDXL LoRA training settings + --zero_terminal_snr on NoobAI v-pred don't work. The LoRA outputs just grey images that sometimes have a very blurry subject.
Are you sure the fine-tuned checkpoint is not just undertrained? How many images do you have in the training dataset, and how many training steps did you do for the LoRA and the fine-tuning?
The thing is - LoRA training is just much faster in terms of steps to resemblance compared to a full checkpoint, but a checkpoint should yield better generalization
This chat is full of friendly people sharing their knowledge of training Flux. For starters, just follow Dr. Furkan's basic tutorials on the subject you are interested in; they cover 99% of your questions
Good point, thanks for your suggestion. The dataset has 45 images. LoRA produces good results at around 6,000 steps. For dreambooth I am trying around 9-10K steps. Do you think I should train even longer?
Yes, continue training if you don't see model sanity degradation. You can get better results compared to a LoRA at 18-20k steps or more, if your use case requires perfect resemblance at the cost of lower model sanity
Great, thanks, I will give that a try. Costs and speed are not a priority; obtaining the best product accuracy/quality is. So I will give this a go by training a lot longer and comparing the epochs in a grid. Cheers
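One way to set up that grid: have the trainer save (and optionally sample) checkpoints at a fixed step interval, then compare them side by side afterwards. A sketch using kohya's periodic-save flags (names per sd-scripts; the prompt file is hypothetical):

```python
# Sketch: save intermediate checkpoints so different step counts can be
# compared in an XY grid afterwards. Flag names follow kohya sd-scripts.
import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train.py",
    "--max_train_steps", "20000",
    "--save_every_n_steps", "2000",    # 10 checkpoints to compare
    "--sample_every_n_steps", "2000",  # optional in-training samples
    "--sample_prompts", "prompts.txt", # hypothetical prompt file
]
subprocess.run(cmd, check=True)
```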
Would it help to also train the text encoder (at 50% of the UNet LR, for example) and use the T5 attention mask for dreambooth training, just as we do with LoRA training? Has anyone seen any benefits?
What do you mean? Both parameters can be applied in the kohya UI (dreambooth tab) and the training runs without issues. Do you mean it will cause problems with the final trained models?
@Furkan Gözükara SECourses Hi sir, we trained a LoRA model using the quality Tier 1 config file with 48GB of VRAM. We generated an AI character with a photo grid, then cropped, flipped, and upscaled the images, creating a dataset of about 8 images to train the LoRA. When generating images from the LoRA, close-ups have a really good face, but in wide shots the face becomes pixelated. Additionally, the face does not closely resemble the training dataset. I'm attaching the images as well: the brown-background image is generated; the white-background image is from the dataset.