Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a YouTube channel dedicated to the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion.
Replace SD3Tokenizer with the original CLIP-L/G/T5 tokenizers. Extend the max token length to 256 for T5XXL. Refactor caching for latents. Refactor caching for Text Encoder outputs. Extract arch...
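As a rough illustration of what the extended limit means in practice, here is a minimal sketch using Hugging Face `transformers` tokenizers; the model IDs and surrounding code are illustrative assumptions, and only the 256-token limit for T5-XXL (versus CLIP's native 77) comes from the change described above. CLIP-G's tokenizer would be handled the same way as CLIP-L here.

```python
# Minimal sketch, not the sd-scripts implementation. The model IDs below are
# common tokenizer sources and are assumptions; only the 256-token limit for
# T5-XXL is taken from the change described above.
from transformers import CLIPTokenizer, T5TokenizerFast

clip_l = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
t5_xxl = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

prompt = "a photo of an astronaut riding a horse"

# The CLIP tokenizer keeps its native 77-token window.
clip_ids = clip_l(
    prompt, padding="max_length", max_length=77, truncation=True, return_tensors="pt"
).input_ids

# T5-XXL gets the extended 256-token window.
t5_ids = t5_xxl(
    prompt, padding="max_length", max_length=256, truncation=True, return_tensors="pt"
).input_ids

print(clip_ids.shape, t5_ids.shape)  # torch.Size([1, 77]) torch.Size([1, 256])
```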
The base model will have no problem differentiating a woman from Spider-Man; even a woman and a man is easy. But with two women or two men, you get this bleeding effect.
He is right, it is very similar. You could achieve exactly the same effect by pregenerating, and thereby make this possible for a full finetune with no additional VRAM, too. You would have to synchronize timesteps and seed(!) between the pregenerated data and the training, though, which is the difference from the current "Prior Preservation" feature. I did implement this for full finetune as well, but without pregeneration, so it needs a lot of VRAM for two full models, the student and the trainer model. Feel free to forward my contact.
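To make the timestep/seed synchronization concrete, a minimal sketch (illustrative only, assuming a PyTorch-style loop; the function names and calls are hypothetical) could look like this:

```python
# Illustrative sketch of deriving timestep and noise deterministically from a
# per-sample seed, so a pregeneration pass and a later training pass see the
# exact same (timestep, noise) pair. Not the actual implementation.
import torch

def timestep_and_noise_for(sample_seed: int, latent_shape, num_train_timesteps: int = 1000):
    # Seed a dedicated generator per sample so both passes draw identical values.
    g = torch.Generator().manual_seed(sample_seed)
    timestep = torch.randint(0, num_train_timesteps, (1,), generator=g)
    noise = torch.randn(latent_shape, generator=g)
    return timestep, noise

# Pregeneration: run the frozen model once per sample and cache its output.
#   t, n = timestep_and_noise_for(seed, latents.shape)
#   cached_target = frozen_model(add_noise(latents, n, t), t)   # hypothetical calls
#
# Training: regenerate the identical (t, n) from the same seed and use the
# cached output as the regularization target, so the second full model never
# has to be kept in VRAM.
```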
On the second comment, though, it's not only activation. In this sample (DreamBooth training), Clinton would look like the left person without regularization.