RunPod•13mo ago

Runpod's GPU power

Are RunPod's GPUs shared? I need a GPU with 100% of its power for training.

43 Replies

Solution

digigoblin•13mo ago

The GPU is dedicated to you; it is not shared.

SummertimeOP•13mo ago

On Vast, the GPU is shared; some machines show TFLOPS below 82.2 or only half the GPU bandwidth. So I hope RunPod gives me 100% of the GPU's power.

digigoblin•13mo ago

Internet speed is shared, not the GPU.

SummertimeOP•13mo ago

Great! Thanks for your help. @flash-singh I got this error when trying to install the environment:

environment: line 75: sudo: command not found
environment: line 76: sudo: command not found
"" command failed with exit code 127.

This does not happen when I install on other cloud services.

digigoblin•13mo ago

You don't need sudo; you are already root in almost all RunPod templates. Also, there is no need to tag RunPod devs unless you have a RunPod-specific hardware issue or similar. The community can help you with things like this.
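As a minimal sketch of how an install script could cope with both situations, the helper below adds a `sudo` prefix only when the process is not already root and `sudo` is actually installed. The name `with_privileges` and its parameters are hypothetical and are not part of any RunPod tooling; it assumes a Linux environment.

```python
import os
import shutil


def with_privileges(command, is_root=None, has_sudo=None):
    """Return the argument list, prefixed with sudo only when that is needed.

    is_root and has_sudo can be passed explicitly (useful for testing);
    when left as None they are detected from the running environment.
    """
    if is_root is None:
        is_root = os.geteuid() == 0  # effective UID 0 means root
    if has_sudo is None:
        has_sudo = shutil.which("sudo") is not None

    if is_root:
        # Already root (the usual case described above): run the command as-is.
        return list(command)
    if has_sudo:
        return ["sudo", *command]
    raise PermissionError("not running as root and sudo is not installed")
```

For example, `with_privileges(["apt-get", "update"])` returns the command unchanged inside a container that runs as root, so the same script works on hosts with and without `sudo`.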

SummertimeOP•13mo ago

I got this error when running the project. I don't know what happened with CUDA.

digigoblin•13mo ago

Did you use the CUDA filter at the top of the page to select only CUDA versions 12.1, 12.2 and 12.3 before deploying your pod? You can run nvidia-smi to check which version of CUDA the host machine has.
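The `nvidia-smi` check can also be scripted. The sketch below reads the `CUDA Version: X.Y` field from the `nvidia-smi` header, which reports the highest CUDA version the host driver supports. The function names are hypothetical, and the code returns `None` rather than failing when `nvidia-smi` is not on the PATH.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_cuda_version():
    """Run nvidia-smi if it is available and return (major, minor), else None."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)
```

Calling `host_cuda_version()` inside the pod and comparing the result against the CUDA version your torch build expects is a quick way to confirm the filter did what you wanted.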

SummertimeOP•13mo ago

Yep, I want to use the latest version of CUDA.

digigoblin•13mo ago

Looks fine, not sure why you have errors, I suggest logging a GitHub issue for the application you're using.

SummertimeOP•13mo ago

How can I test before running a project?

digigoblin•13mo ago

Not sure what you mean. You test by running it. It looks like an issue with the application you're using, because the pod's CUDA version is correct for the PyTorch template you're running.

SummertimeOP•13mo ago

Or maybe the problem is that my GPU is not visible?

digigoblin•13mo ago

Best to contact the developer of the application if it's not working. You can open the Python CLI, import torch and check, but it seems to be fine according to nvidia-smi.
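A minimal sketch of that torch check, assuming PyTorch is installed in the pod's Python environment. The function name `torch_gpu_report` is hypothetical; the torch calls it uses (`torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`, `torch.version.cuda`) are standard PyTorch APIs.

```python
def torch_gpu_report():
    """Return a dict describing what PyTorch can see; does not raise if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against (None for CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        # False when no usable GPU is found or CUDA initialization fails
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` while `nvidia-smi` still lists the GPU, the mismatch is probably between the torch build and the host driver rather than in the hardware.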

SummertimeOP•13mo ago

My project works well on Vast.AI.

digigoblin•13mo ago

Maybe you're using an unsupported torch version on RunPod. torch 2.2.0 is pretty new, and a lot of older projects still rely on version 2.1.2 etc.

SummertimeOP•13mo ago

There is no 2.1.2. Do you mean 2.1.1?

SummertimeOP•13mo ago

https://stackoverflow.com/questions/66371130/cuda-initialization-unexpected-error-from-cudagetdevicecount

Stack Overflow

CUDA initialization: Unexpected error from cudaGetDeviceCount()

I was running a deep learning program on my Linux server and I suddenly got this error. UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions

SummertimeOP•13mo ago

I've searched on Google and the suggestion is to reboot. How can I reboot without losing installed data?

digigoblin•13mo ago

Looks like RunPod has all of these versions available:
1.13.0
2.0.1
2.1.0
2.1.1
2.2.0

I would try 2.1.1 or 2.1.0 and then, if you still have the same issue, try 2.0.1. Do you remember which torch version you were using on Vast?

It is safe to reboot your pod if all of your data is installed on persistent storage or a network volume. However, you will lose your data on reboot if you installed everything into the container disk.
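Before rebooting, it may help to check where your files actually live. The sketch below tests whether a path sits under the volume mount. It assumes the volume is mounted at `/workspace`, so adjust `volume_root` if your pod uses a different mount path. The function name `survives_reboot` is hypothetical.

```python
from pathlib import Path


def survives_reboot(path, volume_root="/workspace"):
    """Return True if `path` is the volume root or lies somewhere beneath it.

    volume_root is an assumption about where the persistent volume is mounted;
    anything outside it is treated as living on the container disk.
    """
    resolved = Path(path).resolve()  # follows symlinks, so linked dirs count correctly
    root = Path(volume_root).resolve()
    return resolved == root or root in resolved.parents
```

For instance, a virtual environment created under the volume root would pass this check, while packages installed into the system Python under `/usr` would not and would need to be reinstalled after a reboot.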

SummertimeOP•13mo ago

Here is Vast's template.