Downloading models causes the pod to freeze
Hey, not sure if I'm missing something obvious here.
I'm noticing two problems (might have the same cause):
- I'm trying to download phi-4 14b from HuggingFace. The download is going to the volume disk. Somewhere in the middle of downloading the 2nd of 6 .safetensors files, the pod freezes. I lose the SSH connection, and the RunPod dashboard shows 100% memory usage. I can only restart the pod at this point.
- When I try rsyncing 6 GB of files from my local machine to a pod (e.g. Llama3.2-3B-Instruct), it uses up the RAM and freezes more often than not. Sometimes restarting the pod helps, but sometimes the only way is to download the weights instead of rsyncing them up.
I'm using:
- 1xA40, 50GB RAM
- runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 image
- 20GB container, 40GB volume (with enough free space before starting the download)
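In case it helps narrow things down, here is a rough sketch of how the memory usage could be logged from a second SSH session while the download or rsync runs. It splits the container's memory into process memory and page cache (file data the kernel is buffering). That the freeze is related to page cache is only a guess, not something I have confirmed. The two `memory.stat` paths are the standard cgroup v2 and v1 locations; which one exists inside a RunPod container is an assumption, and the function names are made up for illustration.

```python
from pathlib import Path

# Candidate locations for the container's memory accounting:
# cgroup v2 first, then cgroup v1. Which one is present inside
# the pod is an assumption and needs to be checked.
STAT_PATHS = (
    Path("/sys/fs/cgroup/memory.stat"),
    Path("/sys/fs/cgroup/memory/memory.stat"),
)


def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def split_usage(stats):
    """Return (process_bytes, page_cache_bytes).

    cgroup v2 names these fields 'anon' and 'file';
    cgroup v1 names them 'rss' and 'cache'.
    """
    process = stats.get("anon", stats.get("rss", 0))
    cache = stats.get("file", stats.get("cache", 0))
    return process, cache


def log_once():
    """Print one sample; return False if no memory.stat file was found."""
    gib = 1024 ** 3
    for path in STAT_PATHS:
        if path.exists():
            process, cache = split_usage(parse_memory_stat(path.read_text()))
            print(f"process: {process / gib:.2f} GiB, page cache: {cache / gib:.2f} GiB")
            return True
    return False
```

Calling `log_once()` every couple of seconds during the transfer would show whether the 100% reading comes mostly from page cache or from a process actually holding the memory.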