Hello everyone. I am Dr. Furkan Gözükara, a PhD Computer Engineer. SECourses is a YouTube channel dedicated to the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion.
I am happy now. I looked at several LoRA profiles that used poses, concepts, and clothing features, and based on those I put together the file below. I also noticed two tricks: they set 10 repeats regardless of the number of images, and they usually train for 10 epochs, so the total step count works out to roughly the number of images x 100 in most cases. This time I used WD14 tagging on SD 1.5. I removed the tags that described my concept itself, plus a few that could bake unwanted traits into the final generated image (e.g. blonde hair, brown eyes), and for duplicate tags like phone and cellphone I kept only one. And indeed, the 10th-epoch model turned out absolutely flexible; it even works without a trigger keyword. I love it, and it shows how important tagging, image repeats, and step count are.
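To make the tag clean-up concrete, here is a minimal sketch of how such WD14 caption filtering could be scripted. The folder name, the blacklist, and the synonym mapping are my own illustrative assumptions, not the exact lists used above.

```python
# Hypothetical sketch: clean WD14 caption .txt files before LoRA training.
# Assumed layout: one caption file per image, tags separated by commas.
from pathlib import Path

CAPTION_DIR = Path("dataset/10_myconcept")   # assumed folder name (10x repeat prefix)
BLACKLIST = {"blonde hair", "brown eyes"}    # tags that would bake traits into the model
SYNONYMS = {"cellphone": "phone"}            # keep only one of each duplicate pair

for caption_file in CAPTION_DIR.glob("*.txt"):
    tags = [t.strip() for t in caption_file.read_text(encoding="utf-8").split(",") if t.strip()]
    cleaned, seen = [], set()
    for tag in tags:
        tag = SYNONYMS.get(tag, tag)         # map duplicate tags onto one canonical form
        if tag in BLACKLIST or tag in seen:
            continue
        seen.add(tag)
        cleaned.append(tag)
    caption_file.write_text(", ".join(cleaned), encoding="utf-8")
```

With 10 repeats and 10 epochs, each image is seen about 100 times, which matches the "number of images x 100" rule of thumb above.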
Not just fine-tuning, but also a new keyword. I experimented with an Onoff model, where the model has to learn:
- the split screen,
- that the person on the left and the person on the right are the same,
- that the person on the left has clothes on and the person on the right has clothes off, with the same body shape,
- ideally the same pose on both sides,
- while staying flexible on face, clothing, and background (the same background on both sides is fine).
A rough captioning sketch for this kind of dataset follows below.
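Here is a minimal sketch of how captions for such split-screen pairs could be assembled. The trigger word "onoff", the function name, and the template fields are my own illustrative assumptions, not the actual setup described above.

```python
# Hypothetical caption template for split-screen "Onoff"-style training images.
# The trigger word and the metadata fields are illustrative assumptions.

def build_caption(pose: str, background: str, extra_tags: list[str]) -> str:
    parts = [
        "onoff",                              # assumed trigger keyword
        "split screen",
        "same person on left and right",      # both halves show the same person
        "left side clothed",
        "right side unclothed, same body shape",
        f"same pose on both sides, {pose}",
        f"{background} background on both sides",
    ]
    parts.extend(extra_tags)                  # keep variable traits (face, outfit) taggable
    return ", ".join(parts)

print(build_caption("standing", "plain grey", ["long dress on left"]))
```

Keeping the variable traits (face, outfit, background) as ordinary tags is what should let the model stay flexible on them while it locks onto the split-screen concept itself.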
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a...
Could you tell us which models you used to train your last set for the customer you mentioned earlier? When I tried a few models from there, it wasn't very accurate. Also, as discussed here before, why use captions when you use regularization images?
I show how to install the Automatic1111 Web UI and the ControlNet extension from scratch in this video. Moreover, I show how to make amazing QR codes and how to use ControlNet inpainting and outpainting, which are very similar to Photoshop Generative Fill and Midjourney zoom out. Furthermore, I explain and show what are Canny, Depth, Normal, OpenPose...
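For reference, here is a minimal sketch of the kind of from-scratch install the video walks through, written as a Python script. The target directory is an assumption, and the video itself may follow different steps.

```python
# Rough sketch: clone the Automatic1111 Web UI and the ControlNet extension.
# The install directory is an assumption; the video may use different steps.
import subprocess
from pathlib import Path

WEBUI_DIR = Path("stable-diffusion-webui")  # assumed target directory

# Clone the Web UI itself (skipped if it already exists).
if not WEBUI_DIR.exists():
    subprocess.run(
        ["git", "clone", "https://github.com/AUTOMATIC1111/stable-diffusion-webui.git", str(WEBUI_DIR)],
        check=True,
    )

# Clone the ControlNet extension into the extensions folder.
ext_dir = WEBUI_DIR / "extensions" / "sd-webui-controlnet"
if not ext_dir.exists():
    subprocess.run(
        ["git", "clone", "https://github.com/Mikubill/sd-webui-controlnet.git", str(ext_dir)],
        check=True,
    )

# After this, webui-user.bat (Windows) or webui.sh (Linux) starts the UI and
# installs the remaining Python dependencies on first launch.
```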
I've actually tested so many models but forgot to document them (or rather, I was in a hurry from excitement). Is there a way to take a checkpoint and somehow check it to see if I used captions to train it?