Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
I tried different dataset styles, always with perfect quality and varied clothing and backgrounds.
I found out that it is not the best idea to include varied facial expressions in the dataset: when using a prompt like "surprised", if the model draws on an already-surprised expression from the dataset, the result is SUPER exaggerated and very weird, even with a moderating word like "slightly".
I suppose it would not be an issue if every picture in the dataset were captioned, but I haven't found a good tutorial on dataset captioning or which method to use.
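For anyone wanting to try captioning, here is a minimal sketch of auto-captioning with BLIP and writing one .txt sidecar per image, which is the caption convention OneTrainer/kohya-style trainers read. The `Salesforce/blip-image-captioning-base` checkpoint and the `dataset` folder are just example choices; verify the sidecar convention against your trainer's docs, and expect to hand-edit the captions (e.g. to add your trained token).

```python
# Minimal auto-captioning sketch: writes one .txt sidecar per image.
# Assumes: pip install transformers pillow torch
from pathlib import Path

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

DATASET_DIR = Path("dataset")  # hypothetical folder of training images

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for img_path in sorted(DATASET_DIR.glob("*.png")):
    image = Image.open(img_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # hand-edit afterwards, e.g. to prepend your token or tag expressions
    img_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
    print(img_path.name, "->", caption)
```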
Also, I found that not being able to modify the face is pretty sad: adding a cyborg eye, face paint, makeup, blood, dirt, whatever. I tried a lot of prompts, even in the ADetailer face parameters.
I also found out that if your dataset always shows a sharply focused subject against a blurry background, then whatever prompt you use, Stable Diffusion will reproduce the same look: a sharply focused person with a blurry background (I may be doing something wrong). I feel the "from single text file" method is fine for something quick, but if you want to achieve a god level of training, you need the super annoying long version of the dataset preparation method.
Just follow the OneTrainer tutorial from our lord Dr.; SD 1.5 and SDXL work perfectly for portraits or close-ups, even mid-range shots.
The most important thing is the dataset: 20-25 high-quality pictures with everything in focus, above all the eyes/iris (a quick sharpness-check sketch follows below).
Don't use crazy angles, but don't use only front-facing pictures either; add some slightly left/right/panned/tilted angles so you can use the full potential of Stable Diffusion generation.
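A quick dataset QC sketch for the sharpness point above: flag soft images via the variance of the Laplacian (low variance means few sharp edges). The threshold of 100.0 is an arbitrary assumption; tune it against a few known-sharp and known-blurry photos from your own camera.

```python
# Flag potentially blurry dataset images by Laplacian variance.
# Assumes: pip install opencv-python
from pathlib import Path

import cv2

BLUR_THRESHOLD = 100.0  # assumption; adjust per camera/resolution

for img_path in sorted(Path("dataset").glob("*.png")):
    gray = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue  # skip unreadable files
    score = cv2.Laplacian(gray, cv2.CV_64F).var()
    status = "OK    " if score >= BLUR_THRESHOLD else "BLURRY"
    print(f"{status} {score:8.1f} {img_path.name}")
```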
Use the RealVisXL V4.0 SDXL model first, so you learn on a model that doesn't need a lot of tweaks to work great. If you want to do NSFW I cannot really help you, since I do not produce it, but I assume that using reg images will break a lot of a model's capacity for such generation.
You are also going to need ComfyUI and to learn how to use it, since you can build different layers of generation to add texture to the face easily (makeup, modifications and such). I suppose there is a method to do it without ComfyUI; I haven't found it yet.
If you follow Dr.'s tutorial, copy EVERYTHING, and use a GOOD dataset, you will be able to produce even higher quality than he does (since he is using a medium-quality dataset).
@Furkan Gözükara SECourses upgraded to the new xformers 0.0.27 and torch; getting way better and smoother results with xformers, and training now on Kohya.
Thanks a lot for this! Really interesting, I appreciate it.
I've done an SDXL training with a pretty good dataset, I think, on RealVisXL 4, with no captions. But the quality wasn't that great, and the face resemblance was even worse.
Don't forget that you need to use the right VAE if one isn't baked in, Hires fix at 1.5/1.7, and ADetailer capped at 1024x1024 resolution with a denoising strength between 0.35 and 0.60 (a hedged API example of these settings follows below).
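For people driving the webui from scripts, here is a sketch of those settings sent to the A1111 `/sdapi/v1/txt2img` endpoint. The ADetailer arg keys (`ad_model`, `ad_denoising_strength`, etc.) are written from memory of the extension's API docs and may differ by version; the prompt token is hypothetical. Double-check against your installed ADetailer before relying on it.

```python
# Hedged sketch: hires 1.5 + ADetailer face pass capped at 1024x1024
# with denoise in the 0.35-0.60 range, via the A1111 webui API.
# Assumes: webui launched with --api, pip install requests
import requests

payload = {
    "prompt": "photo of ohwx man, upper body",  # hypothetical trained token
    "steps": 30,
    "width": 1024,
    "height": 1024,
    "enable_hr": True,
    "hr_scale": 1.5,            # the 1.5/1.7 hires range mentioned above
    "denoising_strength": 0.4,  # hires-pass denoise
    "alwayson_scripts": {
        "ADetailer": {
            "args": [{
                "ad_model": "face_yolov8n.pt",
                "ad_denoising_strength": 0.45,  # within 0.35-0.60
                "ad_use_inpaint_width_height": True,
                "ad_inpaint_width": 1024,       # cap face pass at 1024x1024
                "ad_inpaint_height": 1024,
            }]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
print("images returned:", len(r.json()["images"]))
```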
Also, I did an update on my Stable Diffusion install and could no longer use my trained face. Everything else worked, but my trained model suddenly turned bad for no reason. I had to reinstall a fresh SD; dunno why, but some people had the same problem as me, so maybe you have an ADetailer version or something else that went wrong.
If I already have a trained model but I want to add more poses/expressions, would it be correct to take that already-trained checkpoint and continue training on only 3 or 4 photos with those expressions? What is your opinion? Doing it all over again costs me more money, and this way I would save time.
I'm prepping a dataset of a human subject right now, and most of the quality photos I have are group shots. Is there a way I can modify/Photoshop the group shots to work for training purposes?
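One common approach is to crop your subject out of each group shot. Here is a sketch using OpenCV's bundled Haar cascade to detect faces and save each one with a generous margin so shoulders/clothing survive; the folder names and the one-face-width margin are arbitrary choices, and you still have to hand-pick the crops that actually show your subject.

```python
# Mine single-subject crops out of group shots via face detection.
# Assumes: pip install opencv-python
from pathlib import Path

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
out_dir = Path("crops")
out_dir.mkdir(exist_ok=True)

for img_path in sorted(Path("group_shots").glob("*.jpg")):
    img = cv2.imread(str(img_path))
    if img is None:
        continue  # skip unreadable files
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for i, (x, y, w, h) in enumerate(faces):
        m = w  # margin: one face-width on every side (arbitrary choice)
        x0, y0 = max(x - m, 0), max(y - m, 0)
        x1, y1 = min(x + w + m, img.shape[1]), min(y + h + m, img.shape[0])
        cv2.imwrite(str(out_dir / f"{img_path.stem}_face{i}.png"), img[y0:y1, x0:x1])
```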
Question: I have had very good results with OneTrainer SD1.5 finetuning and LoRA extraction. Now I am trying it with SDXL models; finetuning works very well, but when I extract a LoRA, the likeness of the person is much weaker. This is not the case with the SD1.5 extracted LoRAs. Do I just have to train longer here?
At least with my datasets, I get the best results with full checkpoint training + extraction. How big is the difference? I can see it: if extraction is a 10, then I would give direct LoRA training a 6 or 7 (see the toy SVD sketch below for why the extraction dim matters).
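On the weaker SDXL likeness: a LoRA extracted from a finetuned checkpoint is essentially a low-rank (truncated SVD) approximation of the weight delta, so a small dim throws detail away, and SDXL has far more weights than SD1.5. Before training longer, it may be cheaper to retry extraction with a higher dim. This toy sketch uses one random matrix, not a real checkpoint, just to show how rank affects how much of the delta survives.

```python
# Toy demo: truncated-SVD approximation error of a weight delta
# shrinks as the retained rank (the LoRA "dim") grows.
import torch

torch.manual_seed(0)
w_org = torch.randn(1024, 1024)                   # stand-in base weight
w_tuned = w_org + 0.05 * torch.randn(1024, 1024)  # stand-in finetuned weight
delta = w_tuned - w_org

u, s, vh = torch.linalg.svd(delta)
for rank in (32, 128, 512):
    approx = u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]
    err = torch.norm(delta - approx) / torch.norm(delta)
    print(f"rank {rank:4d}: relative error {err:.3f}")
```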
Hello! I would like to ask for some advice: how can I achieve the best settings for training an SDXL LoRA in OneTrainer, and what should I pay attention to? The training set consists of approximately 80-150 images of an artist's works with inconsistent aspect ratios. Thank you!
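On the inconsistent aspect ratios: trainers such as OneTrainer and kohya handle this with aspect-ratio bucketing, grouping each image into the roughly-1024x1024-pixel-area resolution whose aspect ratio is closest instead of distorting everything to a square, so enabling bucketing is usually the first thing to check. Here is a minimal sketch of the idea; the bucket list is illustrative, not your trainer's exact set.

```python
# Minimal aspect-ratio bucketing sketch: assign each image to the
# nearest ~1024^2-pixel bucket by aspect ratio.
# Assumes: pip install pillow
from pathlib import Path

from PIL import Image

BUCKETS = [(1024, 1024), (896, 1152), (1152, 896), (832, 1216), (1216, 832)]

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

for img_path in sorted(Path("artist_dataset").glob("*")):
    try:
        with Image.open(img_path) as im:
            w, h = im.size
    except OSError:
        continue  # skip non-image files
    print(img_path.name, (w, h), "->", nearest_bucket(w, h))
```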