I somehow managed to run out of RAM (not VRAM, system RAM)... right after a very compute-heavy operation (calculating quantized KV-Cache scales)... while running
model.save_pretrained
... while the weights are still in VRAM... The pod is still running, but completely unresponsive.
Now that you're done laughing at my misfortune, is there anything at all I can do to save those weights? Even enabling some swap would be completely fine... I just want the weights to save to the networked drive...
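In case it helps frame what I'd retry if the pod ever comes back: below is roughly the kind of call I'd attempt, just a sketch, assuming `model` is still alive in the Python process and `/mnt/networked_drive/checkpoint` stands in for my actual mount point. My understanding is that sharding the checkpoint may keep the peak system-RAM needed during serialization lower, since shards are written out one at a time rather than the whole state dict at once.

```python
# Hypothetical retry, assuming the interpreter and the `model` object survive.
# The path below is a placeholder for the networked drive mount.
model.save_pretrained(
    "/mnt/networked_drive/checkpoint",
    max_shard_size="2GB",       # smaller shards may reduce peak host-RAM usage while writing
    safe_serialization=True,    # write .safetensors instead of a pickle-based checkpoint
)
```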