Hello everyone. I am Dr. Furkan Gözükara, a PhD Computer Engineer. SECourses is a YouTube channel dedicated to the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
Yeah, something is different between 30xx/50xx. My 3090 shows 24.0 GB in Task Manager, the 5090 shows 31.5 GB, and I can't load the full FP16 WAN 2.1 model in 32 GB because it's 500 MB short!! It spills into shared GPU memory, which slows it down a lot.
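For context, a rough back-of-the-envelope check (illustrative numbers, not exact Wan 2.1 figures) shows why a 14B FP16 checkpoint can barely miss fitting on a 32 GB card:

```python
import torch

# Rough sketch: 14B parameters at FP16 (2 bytes each) is ~26 GiB of weights
# alone, before activations, the VAE, and the text encoder. These numbers
# are illustrative, not exact for Wan 2.1.
params = 14e9
weights_gib = params * 2 / 2**30
print(f"weights alone: {weights_gib:.1f} GiB")  # ~26.1 GiB

# Compare against what the driver actually reports as free VRAM.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"free {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```

Once an allocation exceeds free VRAM, Windows falls back to shared (system) memory over PCIe, which is the slowdown described above.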
@Furkan Gözükara SECourses have you found perfect TeaCache settings for Wan2.1? With no visible quality loss to me: TeaCache at 0.250 on the T2V 14B model, with Sage and torch acceleration, I got a 49-frame Wan generation down to 101 seconds. I can get it faster, but I start to notice degradation.
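For anyone wondering what that 0.250 knob controls, here is a minimal sketch of TeaCache-style step skipping, written from the general idea rather than Kijai's actual node code (class and variable names are made up):

```python
import torch

class TeaCacheLike:
    """Skip diffusion steps whose inputs barely changed, reusing a cached
    residual. rel_l1_thresh is the 0.250 knob discussed above; start is the
    fraction of steps that must always run before skipping is allowed."""

    def __init__(self, rel_l1_thresh=0.250, start=0.10):
        self.thresh = rel_l1_thresh
        self.start = start
        self.prev_inp = None
        self.residual = None
        self.accum = 0.0

    def step(self, model, inp, step_idx, total_steps):
        skip = False
        if self.prev_inp is not None and step_idx >= self.start * total_steps:
            # Relative L1 change of the model input vs. the previous step.
            rel_l1 = ((inp - self.prev_inp).abs().mean()
                      / (self.prev_inp.abs().mean() + 1e-8)).item()
            self.accum += rel_l1
            skip = self.accum < self.thresh
        if skip:
            out = inp + self.residual   # reuse cached residual, no model call
        else:
            out = model(inp)            # full transformer pass
            self.residual = out - inp
            self.accum = 0.0
        self.prev_inp = inp
        return out
```

A higher threshold lets more steps be skipped (faster, but grainier), which matches the benchmark numbers further down.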
Will you be doing Wan/Hunyuan LoRA training tests? I got Hunyuan down well, but Wan LoRAs are not coming out well for me with musubi-tuner, and I can't tell if it's a 5090 problem. There is so much instability with CUDA 12.8, Sage, and Triton 3.3.
Will you use musubi-tuner? I just can't get good LoRAs for Wan with the same dataset I used for good LoRAs on Hunyuan. I actually think captions might be required this time for more flexibility during prompting, but I'm not sure yet. I didn't caption for Hunyuan and those came out good, sometimes better than FLUX with kohya.
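If captions do turn out to be required, the sidecar convention that kohya-style trainers (musubi-tuner included, as far as I know) pick up is one .txt file per media file. A trivial sketch with a made-up dataset path and placeholder caption:

```python
from pathlib import Path

# Write a caption next to each clip that lacks one; the trainer then reads
# dataset/clip001.txt as the caption for dataset/clip001.mp4.
dataset = Path("dataset")  # hypothetical folder
for media in sorted(dataset.glob("*.mp4")):
    caption_file = media.with_suffix(".txt")
    if not caption_file.exists():
        caption_file.write_text("a video of sks person doing ...")  # placeholder
```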
I just modified TTplanet's one; it's a good starting point. Let me know how your Wan LoRA success goes. I'm going to keep experimenting and will let you know what I discover too. I think there is a reason character LoRAs aren't being released as fast as "motion/animation" LoRAs.
Also, TorchCompileModelWan (for Wan 2.1) breaks LoRAs on 50xx but not on 30xx. It must be Triton 3.3 or the torch nightlies. I just tested: the same workflow works fine on 30xx but not on 50xx, with torch.compile running on both.
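A quick way to confirm it's the compile path rather than the LoRA itself is an A/B run with the same seed, compiled vs. eager. Generic sketch only; `model` and `latents` stand in for the actual Wan pipeline objects:

```python
import torch

def denoise(model, latents, use_compile: bool):
    # Same seed both ways, so any output difference comes from compilation.
    torch.manual_seed(0)
    m = torch.compile(model, backend="inductor") if use_compile else model
    with torch.inference_mode():
        return m(latents)

# If the eager output respects the LoRA and the compiled output does not,
# the torch.compile / Triton stack on the 50xx is the suspect.
```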
Not my research, direct from Kijai: benchmarks. Everything here is Wan2.1 720 @ 49 frames, SageAttention, CUDA 12.6, on a 3080 Ti, listing threshold, start, encoding time, and quality observations:
rel_l1_thresh 0.300, start 0.00 -> 13:08, a lot more subtle grainy noise than 0.250
rel_l1_thresh 0.2875, start 0.00 -> 13:39, more grainy noise
rel_l1_thresh 0.280, start 0.00 -> 13:18, 92%, starts noisy
rel_l1_thresh 0.275, start 0.00 -> 14:55, near indistinguishable from 0.250
rel_l1_thresh 0.270, start 0.00 -> 14:41, 96%, nearly perfect
rel_l1_thresh 0.265, start 0.00 -> 16:09, 96%, near perfect
rel_l1_thresh 0.250, start 0.10 -> 16:30, looks lossless
2x run: best so far, but slower: rel_l1_thresh 0.250, start 0.10 -> 16:20
Near-lossless, if not lossless, at 0.250, at the cost of about 3 minutes compared to 0.300.
In summary:
0.300 ~ 13:00, I found it a bit too grainy for my liking, though the speed is nice
0.275 ~ 14:55, a great balance between speed and quality
0.250 ~ 16:30, basically lossless to me
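Putting those timings in relative terms, simple arithmetic on the numbers above:

```python
def seconds(mmss: str) -> int:
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

baseline = seconds("16:30")  # the basically-lossless 0.250 run
for thresh, t in [("0.300", "13:00"), ("0.275", "14:55"), ("0.250", "16:30")]:
    print(f"{thresh}: {seconds(t)}s, {baseline / seconds(t):.2f}x vs 0.250")
# 0.300 comes out ~1.27x faster than 0.250, and 0.275 ~1.11x.
```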
Introducing Stable Virtual Camera, currently in research preview. This multi-view diffusion model transforms 2D images into immersive 3D videos with realistic depth and perspective—without complex reconstruction or scene-specific optimization.
@Furkan Gözükara SECourses have you seen an xformers fix for Blackwell yet? I searched but haven't seen progress. I saw someone compiled wheels for Python 3.12 but not 3.10, so I was thinking of upgrading the kohya venv to 3.12. Also, for Wan 2.1 LoRA training, you must use captions, or the LoRA model can't do anything; there's no flexibility beyond similarity to the source images. I tried at least 4 tests using musubi. I'm also still struggling to get good likeness compared to Hunyuan.
I have compiled xFormers: xformers-0.0.30+c5841688.d20250306, with torch==2.7.0.dev20250228+cu128 and triton-3.2.0+git8f9b005b. The compile worked and I am able to install it. Python 3.10.11, Windows 11. However I...
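After a build like that, a quick sanity check that the wheel actually targets Blackwell (sm_120 compute capability on the RTX 50xx series) before blaming anything else:

```python
import torch

print(torch.__version__, torch.version.cuda)  # e.g. 2.7.0.dev... / 12.8
print(torch.cuda.get_device_capability(0))    # expect (12, 0) on a 5090
print(torch.cuda.get_arch_list())             # built arch list should include sm_120

import xformers
import xformers.ops  # the import fails loudly if kernels don't match the GPU
print(xformers.__version__)
```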