Runpod•4w ago
chess

CUDA NO WORKY?

I can't SSH into pods created from a clean CUDA Docker image. Although the pods report as ready and give me an SSH command (and charge me $$$), they all return the same error:
Error response from daemon: container a94707bd5f391d6a3f25d13f3ba02a425757bdbecfcb7de3b1169ddda866d434 is not running
You can try one here: https://console.runpod.io/pods?id=mlbfg4iutwm19c
The only reason I'm using a clean CUDA image without PyTorch is that the official PyTorch CUDA envs appear to be misconfigured. By misconfigured I mean that, no matter what I try, I can't get CUDA visible to Python, or get any CUDA_DEVICES_AVAILABLE.
cd /workspace
rm -rf venv
python3 -m venv venv && source venv/bin/activate

# Install ONLY pip first
pip install --upgrade pip

# Install PyTorch with EXACT CUDA version matching your driver
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# TEST IMMEDIATELY before installing anything else
python -c "import torch; print(torch.cuda.is_available())"  # Always False, or cuda undefined
No matter how many times or pods I try this on, I never get cuda defined!!
18 Replies
chess
chessOP•4w ago
At this point I'd like to request a refund. I'm at wit's end. Even the LLMs are telling me runpod's cuda envs must be misconfigured
J.
J.•4w ago
@chess Do you have an image to share? I tried to look at your link and it did not lead anywhere.
J.
J.•4w ago
I tried the official template, and was able to get it?
chess
chessOP•4w ago
https://console.runpod.io/pods?id=mlbfg4iutwm19c this one shows up in my console, can you see it?
J.
J.•4w ago
What template is that using? You said a clean CUDA image? I can't see pods on your account, but if you're trying to get SSH set up you can maybe try... hold on.
chess
chessOP•4w ago
nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
it was a custom template with that docker image
J.
J.•4w ago
wget https://raw.githubusercontent.com/justinwlin/Runpod-SSH-Password/main/passwordrunpod.sh && chmod +x passwordrunpod.sh && ./passwordrunpod.sh
Got it
chess
chessOP•4w ago
and ssh would just kick me out every time
J.
J.•4w ago
Let me take a look. I'm not familiar with this template, but:
1. I think Runpod is working.
2. If you want to try it, I have an SSH script that does its best to set up password-based SSH and tells you how to SSH in when it's done.
3. Let me give it a try.
J.
J.•4w ago
GitHub
GitHub - justinwlin/Runpod-SSH-Password: Help ppl do pod ssh throug...
Help ppl do pod ssh through password. Contribute to justinwlin/Runpod-SSH-Password development by creating an account on GitHub.
J.
J.•4w ago
This is the repo, FYI, if you're curious. Actually, the hard thing with this image is that it might be too minimal. I wonder if it even has basic terminal access or an OpenSSH server installed. Do you have a link to your custom Docker image? I'm guessing it doesn't have OpenSSH installed, or something like that. You'll need: openssh-server
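A minimal sketch of what checking for (and adding) openssh-server could look like on an Ubuntu-based image such as nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04. The function names `has_cmd` and `ensure_sshd` are made up for illustration, and this assumes you are root, the container stays running, and apt is available. It does not configure keys or passwords.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: install and start an SSH server if none is present.
# Assumes a Debian/Ubuntu base image and root privileges.
ensure_sshd() {
  if has_cmd sshd; then
    echo "sshd is already installed"
    return 0
  fi
  apt-get update && apt-get install -y openssh-server || return 1
  mkdir -p /run/sshd   # sshd refuses to start without this directory
  service ssh start
}

# ensure_sshd   # uncomment to run inside the pod (needs root)
```

Run it from the web terminal, since SSH itself is the thing that's missing. Authentication (an authorized_keys entry or a password) still has to be set up separately.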
J.
J.•4w ago
FYI, this is the template where I was able to see torch pick up CUDA (screenshot not preserved):
J.
J.•4w ago
You can run my script to do password ssh with a runpod official template through web terminal / jupyter labs, and should work 🙂 or you can set up ssh key properly
J.
J.•4w ago
once you use a runpod official template, which has more than the bare minimum setup for you, you can just run my script in the web console or in the jupyter labs:
wget https://raw.githubusercontent.com/justinwlin/Runpod-SSH-Password/main/passwordrunpod.sh && chmod +x passwordrunpod.sh && ./passwordrunpod.sh
That should get you SSH. I also tried again on a fresh pod and still got it working. I did not reinstall torch / torchvision, though; I just used the pod as-is.
J.
J.•4w ago
Summary:
1. Try starting from a Runpod official template.
2. You can run my SSH script, or set up SSH keys (see the docs) so you get automatic SSH on future pods. Our templates are set up with an OpenSSH server.
3. You can then run your check: python -c "import torch; print(torch.cuda.is_available())", which, as my screenshots show in two instances, does pick up CUDA.
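If the check still prints False, a rough way to narrow it down is to look at each layer separately: whether the driver is visible in the container, whether device visibility is restricted, and which CUDA build of torch is installed. The sketch below is only that, a sketch. `cuda_wheel_tag` and `gpu_report` are hypothetical names, the script assumes `python` points at the venv where torch is installed, and the tag mapping assumes the version is given as major.minor (a matching wheel index is not guaranteed to exist for every CUDA version).

```shell
# Hypothetical helper: map a CUDA version like "12.4" or "12.4.1"
# to a PyTorch wheel tag like "cu124".
cuda_wheel_tag() {
  cwt_major=${1%%.*}        # text before the first dot
  cwt_rest=${1#*.}          # text after the first dot
  cwt_minor=${cwt_rest%%.*} # drop any patch component
  printf 'cu%s%s\n' "$cwt_major" "$cwt_minor"
}

# Hypothetical helper: print what each layer reports about the GPU.
gpu_report() {
  # 1. Is the NVIDIA driver visible inside the container?
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "nvidia-smi not found: the container may not have GPU access"
  fi
  # 2. Is device visibility restricted by the environment?
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-<unset>}"
  # 3. Which torch build is installed, and does it see a device?
  python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
}
```

Calling `gpu_report` inside the pod prints the three layers in order, and `cuda_wheel_tag 12.4.1` prints `cu124`, which is the tag used in the install command from the original post.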
chess
chessOP•4w ago
Got it, thanks for the assist. Unfortunately I'd still like a refund, because in the interim I've switched over to Lambda Labs.
J.
J.•4w ago
You can try to submit a ticket with runpod support if it was a hardware issue on runpod side
