CUDA out of memory

Hello, I am using the Runpod PyTorch 2.1 template. I am trying to train a small model (Phi, about 1.5 GB), and whatever I do, I keep getting a CUDA out-of-memory error from a process I can't identify. I am using a 3090 GPU, so I don't understand where the problem is.
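A minimal first diagnostic, as a sketch (assuming a standard PyTorch setup like the Runpod template): list which processes nvidia-smi sees holding the card, and print PyTorch's own view of free versus total memory before the training script touches the GPU.

```python
import subprocess

import torch

# Show every process currently holding GPU memory; on a fresh pod the
# process table should be empty, so any entry is the "unknown" process.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# PyTorch's own view of the device, before any model is loaded.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"free {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
    print(torch.cuda.memory_summary(abbreviated=True))
```

One caveat: inside a container, nvidia-smi can hide process IDs from other namespaces, so a high "used" figure with an empty process table is not by itself proof of a rogue process.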
5 Replies
flash-singh · 6mo ago
You likely need more VRAM; whatever model you're running takes up too much VRAM.
RounMicLess · 6mo ago
But this Phi model is 1.5 GB, and I just tried an A40 and got the same problem. Moreover, I don't see any fluctuation in GPU utilization on the website. Yeah, I also tried it on an A100.
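For context, a checkpoint's file size understates the training footprint: with plain fp32 Adam you hold weights, gradients, and two optimizer states per parameter before any activations. A back-of-the-envelope sketch, assuming roughly 1.3B parameters for a Phi-class model (an assumed figure; the thread only gives the file size):

```python
# Rough fp32 Adam training-memory estimate; parameter count is assumed.
params = 1.3e9  # Phi-1.5 class models are around 1.3B parameters

bytes_per_param = {
    "weights (fp32)": 4,
    "gradients (fp32)": 4,
    "Adam exp_avg (fp32)": 4,
    "Adam exp_avg_sq (fp32)": 4,
}

for name, b in bytes_per_param.items():
    print(f"{name:26s}{params * b / 1e9:5.1f} GB")
total = params * sum(bytes_per_param.values())
print(f"{'total before activations':26s}{total / 1e9:5.1f} GB")
```

That comes to roughly 21 GB before a single activation is stored, which alone could explain an OOM on a 24 GB 3090, though it would not by itself account for the same failure on an A40; mixed precision or an 8-bit optimizer shrinks the figure considerably.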
flash-singh · 6mo ago
Something is wrong with the code then.
RounMicLess · 6mo ago
AutoTrain? Should I open an issue? It's fine either way; I just want to be sure it isn't related to Runpod or the container, since I am able to run it locally on my computer.
justin · 6mo ago
I think, considering that people like kopylk are able to run very large training sets, I'd be surprised if there was an issue with Runpod. Maybe something in the code keeps pushing to VRAM without releasing it. I've used up a large amount of memory before for image generation and LLMs, where I've definitely run out, but after bumping up to something like an A100 I haven't had an issue.
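One common way code keeps pushing to VRAM without releasing it is accumulating a loss tensor that still carries its autograd history, so graph references pile up across the loop; this is the canonical example in PyTorch's out-of-memory FAQ. A sketch of the pattern and its fix, with illustrative names not taken from the thread:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.Adam(model.parameters())
criterion = torch.nn.MSELoss()

running_loss = 0.0
for step in range(100):
    x = torch.randn(64, 512, device="cuda")
    loss = criterion(model(x), x)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Leak: `running_loss += loss` would keep each step's autograd
    # history referenced for the whole loop. Converting to a Python
    # float first lets that memory be freed.
    running_loss += loss.item()

print(f"mean loss: {running_loss / 100:.4f}")
```

The same applies to evaluation: forgetting torch.no_grad() around validation passes builds graphs that are never used, which also grows VRAM steadily.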