JohnTheNerd
RunPod
•Created by JohnTheNerd on 4/7/2025 in #⛅|pods-clusters
Pod ran out of CPU RAM
I somehow managed to run out of RAM (not VRAM, system RAM)... right after a very compute-heavy operation (calculating quantized KV-Cache scales)... while running `model.save_pretrained`... while the weights are still in VRAM... The pod is still running, but completely unresponsive.
Now that you're done laughing at my misfortune, is there anything at all I can do to save those weights? Even enabling some swap would be completely fine... I just want the weights to save to the networked drive...
Pod ID: tybrzp4aphrz3d548 replies