Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a YouTube channel dedicated to the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
@Furkan Gözükara SECourses Hello! I'm using an RTX 4090 (24 GB). For full FLUX fine-tuning I used your config "Rank_1_15500MB_39_Second_IT.json", and it took me 16.5 hours to train on a 15-image dataset. Then I tried the config "Quality_1_23100MB_14_12_Second_IT.json", and it took me 23.5 hours to train on the same dataset. Why does training go slower with the second config even though it consumes more VRAM?
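The difference comes down to per-iteration speed rather than VRAM usage. Assuming the config filenames encode seconds per iteration (reading "39_Second_IT" as roughly 3.9 s/it and "14_12_Second_IT" as 14.12 s/it is my interpretation, not confirmed by the source), a quick sketch shows how wall-clock time scales with s/it for the same number of steps:

```python
# Rough wall-clock estimate from seconds-per-iteration and step count.
# The s/it values below are a hypothetical reading of the config filenames.

def training_hours(total_steps: int, sec_per_it: float) -> float:
    """Estimated wall-clock hours for a run at a fixed seconds-per-iteration."""
    return total_steps * sec_per_it / 3600

steps = 6000  # example step count, same for both configs
print(f"Rank config    (~3.9  s/it): {training_hours(steps, 3.9):.1f} h")
print(f"Quality config (~14.12 s/it): {training_hours(steps, 14.12):.1f} h")
```

A higher-VRAM config can still be slower per iteration if it trains at a higher rank or precision; VRAM consumed and iteration speed are independent axes.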
Hi @Furkan Gözükara SECourses, is there a way to save the training process to a local computer and then resume it later on mass compute, RunPod, or something else? Since training times are very long, is there any way to save the ongoing process and resume it later in case of a server restart or crash?
I kept my VRAM right at 23.9 GB using the "Quality_1_23100MB_14_12_Second_IT.json" config (on a 24 GB card). I had to try a few tricks to reduce VRAM usage, such as switching off hardware-accelerated GPU scheduling and selecting 'Adjust for best performance' under Visual Effects in Advanced System Settings in the Control Panel. This helped a little, though I still had 400 MB shared into RAM.
I had tried a 4e-06 learning rate on 26 input images, but all of my epoch results looked similar (I was saving checkpoints every tenth epoch), so I tried again with a 5e-06 LR and saw some improvement, though very gradual. I also noticed, right from my earliest checkpoint save, that sample photographic images of a 'woman' looked identical to those for 'ohxw woman', whereas with SDXL DreamBooth the two images only converged once the model was being over-trained on the new subject.
Can you describe in more detail which steps to take? Where can I load the latest checkpoint, and how do I calculate the remaining steps from the latest checkpoint to continue?
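For the remaining-steps arithmetic, here is a minimal sketch assuming the usual convention that steps per epoch = (images × repeats) ÷ batch size, and that checkpoints are saved at epoch boundaries (the function names and the example numbers are hypothetical, not from the source):

```python
# Hypothetical helpers for working out how many steps remain after
# resuming from a checkpoint saved at an epoch boundary.

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Steps for a full run: (images x repeats) / batch size, per epoch."""
    steps_per_epoch = (num_images * repeats) // batch_size
    return steps_per_epoch * epochs

def remaining_steps(num_images: int, repeats: int, epochs: int,
                    completed_epochs: int, batch_size: int = 1) -> int:
    """Steps left when resuming from a checkpoint saved after completed_epochs."""
    return total_steps(num_images, repeats, epochs - completed_epochs, batch_size)

# Example: 15 images, 1 repeat, 200 total epochs, resumed after epoch 120
print(remaining_steps(15, 1, 200, 120))  # 15 x 80 = 1200
```

Note that if the trainer saves full optimizer state (Kohya's scripts expose `--save_state` and `--resume` for this), resuming continues the original step counter automatically and you would not need to recompute anything by hand.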
I've seen FLUX not really converge on realism when the LR is too low, no matter how long you run it. With a low LR it changes facial features but stays in its plastic AI style.