So I've barely scratched the surface, but on diffusion-pipe I've tried simple videos at 16 fps. Following the num_frames from Kijai's empty embeds node, I used 16n+1 frame counts (17, 33, 49, 65, 81, 129). To be safe I added all of these to the frame_buckets part of dataset.toml, so it looks like this: frame_buckets = [1, 17, 33, 49, 65, 81, 129] (the 1 is for images, I guess).
Basically everything got processed except the 129-frame clip; I suppose it's too long.
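For reference, here's a minimal sketch of the dataset.toml I'm describing. Key names are from memory of diffusion-pipe's example configs and the path is a placeholder, so double-check against the repo:

```toml
# dataset.toml (sketch, from memory of diffusion-pipe's examples)

# Side length of the square training resolution; I trained at 704.
resolutions = [704]

# Aspect ratio bucketing (these values are the repo defaults as I recall them).
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7

# 1 covers images; the rest follow the 16n+1 rule for video clips.
frame_buckets = [1, 17, 33, 49, 65, 81, 129]

[[directory]]
# Placeholder path to the clips and their caption .txt files.
path = '/path/to/training/data'
num_repeats = 1
```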
Running Linux on a 4090, I can train the LoRA at rank 128, resolution 704, and it uses around 19 GB of VRAM, all training on the Wan 1.3B model. I've tried training at 1024 but it OOMs instantly. (Given my system was already taking ~400 MB, I don't think training at 1024 is possible on a 4090 at all.)
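And here's a rough sketch of the matching main training config with the rank 128 LoRA on Wan 1.3B. Again, key names are from memory of diffusion-pipe's Wan example, and the paths and most hyperparameters are placeholders rather than my exact settings, so verify against the repo's examples:

```toml
# config.toml (rough sketch of the main training config, from memory
# of diffusion-pipe's Wan example; paths and most values are placeholders)

output_dir = '/path/to/output'
dataset = 'dataset.toml'

epochs = 100
micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 1
gradient_clipping = 1.0
warmup_steps = 100

[model]
type = 'wan'
# Placeholder path to the original Wan 2.1 1.3B checkpoint directory.
ckpt_path = '/path/to/Wan2.1-T2V-1.3B'
dtype = 'bfloat16'

[adapter]
type = 'lora'
rank = 128   # what I trained at; ~19 GB VRAM at resolution 704
dtype = 'bfloat16'

[optimizer]
type = 'adamw_optimi'
lr = 2e-5
betas = [0.9, 0.99]
weight_decay = 0.01
```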


