JohnTheNerd
RunPod
Created by JohnTheNerd on 4/7/2025 in #⛅|pods-clusters
Pod ran out of CPU RAM
I somehow managed to run out of RAM (not VRAM, system RAM)... right after a very compute-heavy operation (calculating quantized KV-Cache scales)... while running model.save_pretrained... while the weights are still in VRAM... The pod is still running, but completely unresponsive. Now that you're done laughing at my misfortune, is there anything at all I can do to save those weights? Even enabling some swap would be completely fine... I just want the weights to save to the networked drive... Pod ID: tybrzp4aphrz3d
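For anyone hitting the same failure, a pre-flight check of system RAM before calling `model.save_pretrained` might catch it early. The following is only a minimal sketch, not anything from the thread: the function names, the `headroom_gib` margin, and the idea of estimating `needed_gib` yourself are all illustrative assumptions. It parses the `MemAvailable` field of Linux's `/proc/meminfo`. Inside a container the effective cgroup memory limit may be lower than what `/proc/meminfo` reports, so treat a passing check as a hint rather than a guarantee.

```python
import re


def parse_mem_available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB.

    Hypothetical helper for illustration; /proc/meminfo reports values in kB
    (which the kernel counts as KiB).
    """
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1)) / (1024 ** 2)


def enough_host_ram(meminfo_text: str, needed_gib: float, headroom_gib: float = 2.0) -> bool:
    """True if available system RAM covers the estimate plus a safety margin.

    needed_gib is the caller's own estimate of peak host RAM during the save;
    headroom_gib is an arbitrary illustrative margin, not a recommended value.
    """
    return parse_mem_available_gib(meminfo_text) >= needed_gib + headroom_gib
```

On a Linux pod this could be used by reading the file first, e.g. `enough_host_ram(open("/proc/meminfo").read(), needed_gib=...)`, and skipping or deferring the save when it returns `False`.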
548 replies