RunPod•4mo ago
annah_do

Pod is unable to find/use GPU in python

Hi, I'm trying to connect to this pod:

RunPod Pytorch 2.2.10
ID: zgel6p985mjmmn
1 x A30, 8 vCPU, 31 GB RAM
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Community Cloud, Running
40 GB Disk, 20 GB Pod Volume, Volume Path: /workspace

I can see that it has a GPU with nvidia-smi, and the CUDA and PyTorch versions seem correct, but I cannot use the GPU with torch... Can anyone help? Best

```
root@54be7382bee1:~# python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> torch.__version__
'2.2.0+cu121'
>>> exit()
root@54be7382bee1:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
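The session above shows torch built for CUDA 12.1 failing to initialize against the host driver. One thing worth checking in cases like this is the basic compatibility rule: the driver's reported CUDA version must be at least the CUDA runtime torch was compiled against. A minimal sketch of that check (`cuda_compatible` is a hypothetical helper, not part of torch or RunPod):

```python
def cuda_compatible(driver_cuda: str, torch_cuda: str) -> bool:
    """Return True if the driver's max supported CUDA version is >= the
    CUDA runtime version the torch wheel was built against."""
    def parse(v: str) -> tuple:
        major, minor = v.split(".")[:2]
        return (int(major), int(minor))
    return parse(driver_cuda) >= parse(torch_cuda)

# In this thread: nvidia-smi reports CUDA 12.3, torch is '2.2.0+cu121' (CUDA 12.1)
print(cuda_compatible("12.3", "12.1"))  # True
```

Since 12.3 >= 12.1, a plain version mismatch is not the root cause here, which points toward the driver itself (the not-production-ready 12.3 build discussed below) rather than the image.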
Solution:
@Dhruv Mullick I don't think it has to do with the image. When you select a pod on the RunPod website, there is a filter button at the top with a drop-down menu where you can set 12.2 as the "Allowed CUDA Versions". As @ashleyk pointed out earlier, the machine is running CUDA 12.3, which is not production ready. If I select 12.2, it works.
17 Replies
annah_do
annah_do•4mo ago
```
root@54be7382bee1:~# nvidia-smi
Fri Feb 23 11:56:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     On  | 00000000:00:06.0 Off |                   On |
| N/A   45C    P0              31W / 165W |      0MiB / 24576MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
```
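For scripting this kind of check, the driver and CUDA versions in the nvidia-smi banner can be pulled out with a small parser. A sketch, assuming the standard banner layout (`parse_smi_versions` is a hypothetical helper; in practice you would feed it the output of `subprocess.run(["nvidia-smi"], ...)`):

```python
import re

# Sample header line as printed by nvidia-smi in this thread
SMI_HEADER = "| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3 |"

def parse_smi_versions(text: str) -> dict:
    """Extract the driver version and the max-supported CUDA version
    from nvidia-smi's header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text).group(1)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text).group(1)
    return {"driver": driver, "cuda": cuda}

print(parse_smi_versions(SMI_HEADER))  # {'driver': '545.23.08', 'cuda': '12.3'}
```

Here the driver reports CUDA 12.3 even though the container image was built for 12.1, which is what the rest of the thread digs into.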
ashleyk
ashleyk•4mo ago
Maybe because the machine is running CUDA 12.3 which is not production ready.
annah_do
annah_do•4mo ago
most machines use CUDA 12.3 and with the 48GB GPU it works
ashleyk
ashleyk•4mo ago
@JM said they should all be on 12.2 because 12.3 is not production ready. I haven't seen any machines on 12.3 personally.
annah_do
annah_do•4mo ago
hm, just double checked and you are right. My 48GB GPU is actually on 12.2... will keep an eye open for this in the future...
Dhruv Mullick
Dhruv Mullick•4mo ago
@ashleyk how do we use 12.2? I spawned an H100 SXM5 pod with the image runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04, but nvidia-smi still shows CUDA 12.3. ID: axwx9s1edwts9x. Facing the same issue as @annah_do. This happens even if I change my template to runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04.
Solution
annah_do
annah_do•4mo ago
@Dhruv Mullick I don't think it has to do with the image. When you select a pod on the RunPod website, there is a filter button at the top with a drop-down menu where you can set 12.2 as the "Allowed CUDA Versions". As @ashleyk pointed out earlier, the machine is running CUDA 12.3, which is not production ready. If I select 12.2, it works.
annah_do
annah_do•4mo ago
(screenshot attached)
Dhruv Mullick
Dhruv Mullick•4mo ago
Awesome, thank you @annah_do ! I thought it was the image that was controlling this.
Dhruv Mullick
Dhruv Mullick•4mo ago
Even with CUDA 12.2 I'm seeing the same error now
(screenshot attached)
ashleyk
ashleyk•4mo ago
How did you install torch? Probably conda breaking stuff, conda sucks
Dhruv Mullick
Dhruv Mullick•4mo ago
I just used the torch from the latest torch + CUDA template (I think it was runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04, but I've now deleted the pod)
ashleyk
ashleyk•4mo ago
RunPod templates don't use conda though, as far as I'm aware. Your application probably installed it.
Dhruv Mullick
Dhruv Mullick•4mo ago
This is a clean VM, with no other commands executed but the ones shown above 😅
ashleyk
ashleyk•4mo ago
That's not true; a clean pod does not say (torch_env) in front of the prompt like yours does.
(screenshot attached)
ashleyk
ashleyk•4mo ago
That only happens when that crap conda gets installed. And it shows that CUDA is available on A100.
```
>>> torch.cuda.is_available()
True
```
So I don't know what you are doing, but you are clearly doing something wrong.
JM
JM•4mo ago
Hey guys! Yep, thanks @ashleyk. Indeed, it's possible that some machines slipped through with 12.3, but the bulk is on 12.2. As already mentioned, 12.3 is beta and we recommend production-ready drivers 🙂