24 GB VRAM is not enough for simple kohya_ss LoRA training.

How come 24 GB of VRAM is not enough for training a simple LoRA in kohya_ss? I've tried running it with the simplest configuration: 32 pictures, fp16, AdamW8bit, no batching or other demanding features, and CUDA constantly runs out of memory. I've launched it 4-5 times, clearing the cache, setting the memory limit in PyTorch, and making sure nothing else is using VRAM, but it still runs out of memory every time. The funniest part is that I've successfully run the same configuration on my old PC with a GTX 960 4GB; it is slow, but it does not run out of VRAM. Why can't the pods here handle it? I ended up running it on a 48 GB VRAM instance, and it uses around 33 GB of that. Why can my 4 GB card run it with pretty much the same config? Is it possible to achieve the same result here?
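A minimal sketch of what "clearing the cache" and "setting the limit in PyTorch" can look like in practice (the allocator setting and memory fraction below are illustrative assumptions, not the poster's actual values):

```python
import os
import torch

# Configure the CUDA caching allocator before the first CUDA allocation
# (illustrative setting; the original post does not say which value was used).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# Cap this process at a fraction of the card's VRAM (assumed value).
torch.cuda.set_per_process_memory_fraction(0.95, device=0)

# Release cached allocator blocks between attempts.
torch.cuda.empty_cache()

# Check how much VRAM is actually free before launching training,
# e.g. to confirm nothing else is still holding memory.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
```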
7 Replies
ashleyk (6mo ago)
Log an issue in the Kohya_ss repo; this is not a RunPod issue.
Andrew_Rocket (6mo ago)
Do you know where I can learn more about this problem?
ashleyk (6mo ago)
Which template are you using?
Andrew_Rocket (6mo ago)
Oh, you mean an issue in the template?
Andrew_Rocket (6mo ago)
I tried using both this and this. They have the same author, so they might have the same problems.
(two template screenshots attached, no descriptions)
ashleyk (6mo ago)
No, I didn't say that; I just asked which template you are using. For the Ultimate one, you have to connect to port 8000 and stop A1111 before training with Kohya_ss. Also, if you are training SDXL, people use Adafactor, not 8-bit Adam.
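As a rough back-of-envelope sketch of why the optimizer choice matters for VRAM: AdamW keeps two full fp32 moment tensors per trainable parameter, 8-bit Adam quantizes those states, and Adafactor stores factored second moments that are much smaller for large weight matrices. The parameter counts below are placeholders, not measurements from this thread:

```python
# Approximate optimizer-state memory per trainable parameter:
#   AdamW (fp32 states)  ~8 bytes/param (two full moment tensors)
#   AdamW8bit            ~2 bytes/param (both states quantized to 8 bits)
#   Adafactor            close to 0 for large matrices (factored second moments)
def optimizer_state_gb(trainable_params: int, bytes_per_param: float) -> float:
    return trainable_params * bytes_per_param / 1e9

lora_params = 25_000_000           # placeholder: a LoRA adapter is typically tens of millions
full_model_params = 2_600_000_000  # placeholder: rough SDXL UNet scale

for name, n in [("LoRA", lora_params), ("full fine-tune", full_model_params)]:
    print(name,
          f"AdamW: {optimizer_state_gb(n, 8):.2f} GB,",
          f"AdamW8bit: {optimizer_state_gb(n, 2):.2f} GB")
```

For a LoRA run only the adapter weights are trainable, so the optimizer state is small either way; the sketch just shows how the choice scales once many more parameters are trainable.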
Andrew_Rocket (6mo ago)
Yeah, I'd read about that, and I did stop it, but that didn't change anything. I wasn't training SDXL though; at that point I was just trying to run the simplest config I could to understand why 24 GB of VRAM was running out.