Running out of memory

Hi, the OG kohya template from runpod was taken down and not replaced, so now I'm using the InvokeAI template. I can't complete any training because every run crashes with an out-of-memory error. I've never had this happen before:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacty of 23.54 GiB of which 303.12 MiB is free. Process 2505369 has 384.00 MiB memory in use. Process 2505421 has 7.50 GiB memory in use. Process 2521763 has 15.35 GiB memory in use. Of the allocated memory 14.30 GiB is allocated by PyTorch, and 581.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
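For anyone reading the error above: the message lists how much VRAM each process on GPU 0 holds, and those figures can be added up to see where the 23.54 GiB went. Below is a minimal sketch that parses the per-process lines from the quoted message using only the Python standard library. The function name `per_process_gib` and the regex are illustrative assumptions, not part of PyTorch, and the message text is copied from this thread rather than produced by a live run.

```python
import re

# Per-process portion of the error message quoted in this thread (copied verbatim,
# including PyTorch's own spelling of "capacty").
ERROR_TEXT = (
    "CUDA out of memory. Tried to allocate 512.00 MiB. "
    "GPU 0 has a total capacty of 23.54 GiB of which 303.12 MiB is free. "
    "Process 2505369 has 384.00 MiB memory in use. "
    "Process 2505421 has 7.50 GiB memory in use. "
    "Process 2521763 has 15.35 GiB memory in use."
)

# Conversion factors to GiB for the two units that appear in the message.
_UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def per_process_gib(message: str) -> dict[int, float]:
    """Hypothetical helper: map each PID in the message to its usage in GiB."""
    pattern = r"Process (\d+) has ([\d.]+) (MiB|GiB) memory in use"
    return {
        int(pid): float(amount) * _UNIT_TO_GIB[unit]
        for pid, amount, unit in re.findall(pattern, message)
    }


usage = per_process_gib(ERROR_TEXT)
# List the processes from largest to smallest VRAM holder.
for pid, gib in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"PID {pid}: {gib:.2f} GiB")
print(f"Total held by processes: {sum(usage.values()):.2f} GiB")
```

Read this way, the message shows three separate processes sharing the card, and roughly 7.9 GiB is held by processes other than the largest one. Which of them is the training job and which belong to other services in the pod is not stated in the message, so that would need to be checked on the pod itself (for example with `nvidia-smi`).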
5 Replies
Dj, 3w ago
Moving from https://discord.com/channels/912829806415085598/948767517332107274/1371966448301510666
The image you were using was removed by the person who maintains it. We cannot control this, we can only offer alternative options.
That being said, the image you are using now is a larger image with more things inside of it. It will naturally have different VRAM requirements.
The best I can do for you is say you will now need to use a different GPU or use the template directly. It's maintained in this repo: https://github.com/ashleykleynhans/kohya-docker
peanut_ (OP), 3w ago
So runpod will not be adding a new one to replace it?
Dj, 3w ago
The best we could do is publish the same image from that repo under our namespace. The InvokeAI one you're using is also technically his image. I can ask if we're able to do that, but if I did it, it would look like an image posted by any other user.
peanut_ (OP), 3w ago
If I change pods, it means I'd be paying more just to run the exact same program because you guys don't feel like posting a proper template.
Dj, 3w ago
We offer that service as a courtesy, as the readme on the InvokeAI template you're using now says.
