Runpod · 3mo ago
chess

CUDA NO WORKY?

I'm unable to SSH into pods started from a clean CUDA Docker image. Although the pods report that they're ready and give me an SSH command (and charge me $$$), they all return the same error:

Error response from daemon: container a94707bd5f391d6a3f25d13f3ba02a425757bdbecfcb7de3b1169ddda866d434 is not running


You can try one here. https://console.runpod.io/pods?id=mlbfg4iutwm19c

The only reason I'm using a clean CUDA image without PyTorch is that the official PyTorch CUDA environments are apparently misconfigured. By misconfigured I mean that, no matter what I try, I can't make CUDA visible to Python or get any CUDA devices to show up as available.

cd /workspace
rm -rf venv
python3 -m venv venv && source venv/bin/activate

# Install ONLY pip first
pip install --upgrade pip

# Install PyTorch with EXACT CUDA version matching your driver
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# TEST IMMEDIATELY before installing anything else
python -c "import torch; print(torch.cuda.is_available())"  # Always False, or cuda undefined
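
A slightly fuller check than the one-liner above, as a rough sketch: it reports which CUDA version the installed wheel was built against, whether CUDA_VISIBLE_DEVICES is set, and whether nvidia-smi can see a GPU at all. The function name cuda_report is made up for illustration, and the script assumes only the standard library; a missing torch is reported rather than treated as an error.

```python
import os
import shutil
import subprocess


def cuda_report():
    """Collect basic facts about CUDA visibility (hypothetical helper)."""
    report = {
        # An empty string or "-1" here hides every GPU from CUDA applications.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "driver_gpus": None,
        "torch_version": None,
        "torch_built_for_cuda": None,
        "torch_cuda_available": None,
        "torch_device_count": None,
    }

    # Ask the driver directly which GPUs it can see, independent of Python.
    if report["nvidia_smi_path"]:
        try:
            result = subprocess.run(
                ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
            )
            report["driver_gpus"] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report["driver_gpus"] = f"nvidia-smi failed: {exc}"

    # torch is optional here so the driver-level facts are still reported.
    try:
        import torch
    except ImportError:
        return report

    report["torch_version"] = torch.__version__
    report["torch_built_for_cuda"] = torch.version.cuda  # None on CPU-only builds
    report["torch_cuda_available"] = torch.cuda.is_available()
    report["torch_device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If torch_built_for_cuda comes back as None, pip installed a CPU-only wheel. If nvidia-smi is missing or lists no GPU, the problem sits below Python, in the container or driver setup.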


No matter how many times or on how many pods I try this, I never get CUDA defined!!