Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
Hi @Furkan Gözükara SECourses, is there any way to convert an fp8 model back into fp16 and get fp16's better quality, or at least make the model work like fp16 while keeping quality at the fp8 level?
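A minimal sketch of what a plain dtype upcast would look like, assuming a safetensors checkpoint (file names here are hypothetical). Keep in mind this only changes how the numbers are stored; the precision that was discarded when the model was quantized to fp8 is gone and cannot be recovered by casting back:

```python
# Minimal sketch (hypothetical file names): cast an fp8 checkpoint's tensors
# to fp16. This changes storage dtype only; it cannot restore precision that
# was lost during the original fp8 quantization.
import torch
from safetensors.torch import load_file, save_file

sd = load_file("flux1-dev-fp8.safetensors")  # hypothetical input path
sd = {k: (v.to(torch.float16) if v.is_floating_point() else v) for k, v in sd.items()}
save_file(sd, "flux1-dev-fp16-upcast.safetensors")
```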
I've been using config files from KOhya_FLUX_DreamBooth_v17 after installing the latest version of kohya_ss on the sd3_flux branch. My Flux DreamBooth sessions on an RTX 4090 used to sit at about 23.9 GB of VRAM and ran fine, but they no longer fit within the 24 GB. I've been fiddling with the block_swap size, but 30.7 GB seems to be the best memory usage I can get while still having some iterations happen. Is this a known issue? It could be that I've set some parameter somewhere while setting up kohya_ss again.
I even tried rerunning a previous .toml script for DreamBooth on the command line, but it didn't repeat the previous experience of using under 24 GB of VRAM.
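One way to compare runs like this apples-to-apples (a generic sketch, not something from the v17 configs): log PyTorch's own peak allocation counter around a few training steps. It only tracks tensors PyTorch allocates, so it will read lower than nvidia-smi, but it makes before/after comparisons consistent:

```python
# Generic sketch for comparing VRAM between runs: reset and read PyTorch's
# peak-allocation counter. This excludes memory held by other processes,
# so it won't exactly match what nvidia-smi reports.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a few training iterations here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```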
Hi everyone! @Furkan Gözükara SECourses, can you give me a hint? I get this error: RuntimeError: CUDA error: out of memory. "CUDA kernel errors may be reported asynchronously by some other API call, so the stack trace below may not be correct. For debugging, consider passing CUDA_LAUNCH_BLOCKING=1". I tried the 8 GB config and the 3080 10 GB config. I have 48 GB of RAM and 10 GB of VRAM. What can help me?
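Not a fix for the OOM itself, but here is a small sketch of the debugging step the error message itself suggests, plus PyTorch's expandable-segments allocator option, which sometimes helps with fragmentation-related OOMs. Both need to be set before CUDA is initialized:

```python
# Sketch of the settings above; these are read when CUDA starts, so set them
# before importing/initializing torch (or export them in the shell before
# launching the training script).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"                            # accurate stack traces
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # can ease fragmentation

import torch  # import after the env vars are set
```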
Would it be possible for you to share any results? I am also looking into simpletuner and would love to see whether someone has achieved some reasonable results. Would really appreciate it; I won't share ^^
I have not tested it yet, but I plan to do so next month. I have heard many people speak well of it. Here is a video that goes deep into it and compares it with ai toolkit - https://youtu.be/se3qpLkJnrk?si=7MhyT8i3DALGK9L5
This is a FLUX.1 [dev] LoRA training log for art styles and concepts. I document my thought process, experiments, mistakes, and analysis of quantitative and qualitative results. Hopefully, this can be a good starting guide for people looking to train LoRA...
Can someone explain how to reduce FLUX LoRA size with (almost) no quality loss? I've made a bunch of them, and 1 GB LoRAs are eating up my storage. Or maybe there is a video with an explanation?
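For what it's worth, LoRA file size is driven mostly by the network rank (network_dim) and the saved dtype, so training with a smaller rank is the cleanest reduction. If an existing file was saved in fp32, re-saving it in fp16 roughly halves it with negligible quality loss. A minimal sketch, with hypothetical file names, that also preserves the training metadata stored in the file:

```python
# Minimal sketch (hypothetical file names): re-save an fp32 LoRA in fp16.
# This roughly halves the file size without changing the rank; the quality
# impact of fp32 -> fp16 on LoRA weights is negligible in practice.
import torch
from safetensors import safe_open
from safetensors.torch import save_file

src, dst = "my_flux_lora.safetensors", "my_flux_lora_fp16.safetensors"
with safe_open(src, framework="pt") as f:
    metadata = f.metadata()                      # keep kohya's ss_* training metadata
    sd = {k: f.get_tensor(k) for k in f.keys()}
sd = {k: (v.to(torch.float16) if v.is_floating_point() else v) for k, v in sd.items()}
save_file(sd, dst, metadata=metadata)
```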