Runpod · 8mo ago
feesta

CUDA not connecting to image provisioned for GPU

Started a community pod with 1 GPU (4090) using the Runpod pytorch image/template (runpod/pytorch:2.4.0-py3.11-cuda12.4). Immediately after starting the pod, the GPU is unavailable to PyTorch even though nvidia-smi sees it. This happens about 20% of the time I start pods with this official container. No errors are thrown in the system or container logs.

root@5c367a0d4ea2:/# python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

root@5c367a0d4ea2:/# nvidia-smi
Mon Mar 24 15:59:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0 Off |                  Off |
|  0%   26C    P8             11W /  450W |       2MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
(abridged due to message length)
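For anyone hitting the same symptom, the mismatch described above (nvidia-smi lists the GPU while PyTorch reports CUDA as unavailable) can be checked in one step right after a pod starts. The following is a minimal sketch, not an official Runpod tool: the function names and the "ok" / "mismatch" / "no-gpu" labels are made up for illustration, and it assumes `nvidia-smi -L` prints one "GPU N: ..." line per device.

```python
import shutil
import subprocess
import warnings


def driver_sees_gpu() -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    lines = result.stdout.splitlines()
    return result.returncode == 0 and any(l.startswith("GPU ") for l in lines)


def torch_cuda_status() -> tuple:
    """Return (is_available, warning_text) as reported by PyTorch."""
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"
    # Capture the UserWarning that PyTorch emits when CUDA init fails.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ok = torch.cuda.is_available()
    return ok, "; ".join(str(w.message) for w in caught)


def classify(smi_ok: bool, cuda_ok: bool) -> str:
    """Hypothetical labels: 'ok', 'mismatch' (the case in this thread), 'no-gpu'."""
    if cuda_ok:
        return "ok"
    return "mismatch" if smi_ok else "no-gpu"


if __name__ == "__main__":
    smi_ok = driver_sees_gpu()
    cuda_ok, detail = torch_cuda_status()
    print(f"nvidia-smi sees GPU: {smi_ok}")
    print(f"torch.cuda.is_available(): {cuda_ok}")
    if detail:
        print(f"torch warning: {detail}")
    print(f"verdict: {classify(smi_ok, cuda_ok)}")
```

A "mismatch" verdict corresponds to the state shown in this post. Running the check at pod start makes it possible to notice a bad pod before launching a job on it.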
2 Replies
feesta (OP) · 7mo ago
Another pod. Immediately after starting the pod, the GPU is not available even though the pod is configured with 1 4090 GPU.

ssh 93qymj5jda8e60-6441171e@ssh.runpod.io -i ~/.ssh/id_ed25519
-- RUNPOD.IO --
Enjoy your Pod #93qymj5jda8e60 ^^
For detailed documentation and guides, please visit: https://docs.runpod.io/ and https://blog.runpod.io/

root@773fb48759c7:/# python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
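Since this second warning mentions CUDA_VISIBLE_DEVICES, it may help to record what the container's GPU-related environment looks like on a failing pod versus a working one. The sketch below is only a diagnostic aid under stated assumptions: `gpu_env_snapshot` is a hypothetical helper, it assumes the NVIDIA device nodes appear under /dev/nvidia* inside the container, and nothing in this thread confirms that either variable or a missing device node is the cause.

```python
import glob
import os


def gpu_env_snapshot(environ=None, dev_glob="/dev/nvidia*") -> dict:
    """Collect GPU-related environment variables and visible device nodes.

    `environ` and `dev_glob` are parameters only so the helper can be
    exercised without a GPU; by default it reads the real environment.
    """
    env = os.environ if environ is None else environ
    return {
        # None means the variable is not set at all.
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": env.get("NVIDIA_VISIBLE_DEVICES"),
        # Device nodes the container can see, e.g. /dev/nvidia0.
        "device_nodes": sorted(glob.glob(dev_glob)),
    }


if __name__ == "__main__":
    for key, value in gpu_env_snapshot().items():
        print(f"{key}: {value}")
```

Comparing the output from a good pod and a bad pod started from the same template would show whether the two environments differ at all, which is useful detail to include in a support report.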