RunPod•2mo ago
Summertime

RunPod's GPU power

Does RunPod share GPUs? I need a GPU with 100% of its power for training.
Solution:
The GPU is dedicated to you; it is not shared.
43 Replies
Solution
digigoblin•2mo ago
The GPU is dedicated to you; it is not shared.
Summertime•2mo ago
On Vast, GPUs are shared; some show TFLOPS under 82.2 or only half the GPU bandwidth. So I hope RunPod gives me 100% of the GPU's power.
[image attached]
digigoblin•2mo ago
Internet speed is shared, not the GPU.
Summertime•2mo ago
Great! Thanks for your help.
@flash-singh I got this error when trying to install the environment:
environment: line 75: sudo: command not found
environment: line 76: sudo: command not found
"" command failed with exit code 127.
This doesn't happen when I install on other cloud services.
digigoblin•2mo ago
You don't need sudo; you are already root in almost all RunPod templates. Also, there's no need to tag RunPod devs unless you have a RunPod-specific hardware issue or similar. The community can help you with things like this.
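For install scripts that hard-code sudo, one workaround is to drop the prefix when the process already runs as root (exit code 127 just means the shell could not find the sudo binary). A minimal sketch, assuming the commands are built as argument lists; `strip_sudo` is a hypothetical helper name, not part of any RunPod tooling:

```python
import os
import shutil
from typing import List, Optional


def strip_sudo(command: List[str],
               is_root: Optional[bool] = None,
               sudo_available: Optional[bool] = None) -> List[str]:
    """Return the command without a leading 'sudo' when sudo is unnecessary or missing."""
    if is_root is None:
        # os.geteuid exists on Unix only; treat other platforms as non-root.
        is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    if sudo_available is None:
        sudo_available = shutil.which("sudo") is not None
    if command and command[0] == "sudo" and (is_root or not sudo_available):
        return command[1:]
    return command


# Example: a container that runs as root and has no sudo binary installed.
print(strip_sudo(["sudo", "apt-get", "install", "-y", "git"],
                 is_root=True, sudo_available=False))
```

If the process is not root and sudo is also missing, the stripped command will most likely still fail on permissions, so this only helps in the already-root case described above.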
Summertime•2mo ago
I got this error when running the project. I don't know what happened with CUDA.
[image attached]
Summertime•2mo ago
[image attached]
digigoblin•2mo ago
Did you use the CUDA filter at the top of the page to select only CUDA versions 12.1, 12.2 and 12.3 before deploying your pod? You can run nvidia-smi to check which version of CUDA the host machine has.
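The nvidia-smi check can also be scripted. A rough sketch, assuming the usual header line that contains "CUDA Version: X.Y" (this is the highest CUDA version the host driver supports, not necessarily the toolkit inside the container). The function names and the sample line are illustrative and not taken from this thread:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi if it is on PATH; return None when it is missing or unparsable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
    return parse_cuda_version(result.stdout)


# Illustrative header line (made-up driver numbers).
sample = "| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2 |"
version = parse_cuda_version(sample)
print(version, version is not None and version >= (12, 1))
```

On a pod, `host_cuda_version()` would be the call to use; the comparison against `(12, 1)` mirrors the minimum version mentioned in the filter advice above.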
Summertime•2mo ago
Yep, I want to use the latest version of CUDA.
Summertime•2mo ago
[image attached]
digigoblin•2mo ago
Looks fine, not sure why you have errors. I suggest logging a GitHub issue for the application you're using.
Summertime•2mo ago
How do I test before running a project?
digigoblin•2mo ago
Not sure what you mean. You test by running it. It looks like some issue with the application you're using, because the pod's CUDA version is correct for the PyTorch template you're running.
Summertime•2mo ago
Or maybe the problem is that my GPU is not visible?
digigoblin•2mo ago
Best to contact the developer of the application if it's not working. Use the Python CLI, import torch and check, but it seems to be fine according to nvidia-smi.
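The "import torch and check" step could look roughly like this. It is only a sketch: `cuda_report` is a made-up helper name, and it uses nothing beyond the standard `torch.cuda` queries, so it says whether PyTorch can see the GPU but not why an application fails:

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

If `cuda_available` comes back False while nvidia-smi still lists the GPU, the mismatch is between PyTorch and the driver or container rather than missing hardware.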
Summertime•2mo ago
My project works well on Vast.AI.
digigoblin•2mo ago
Maybe you're using an unsupported torch version on RunPod. torch 2.2.0 is pretty new; a lot of older projects still rely on version 2.1.2 etc.
Summertime•2mo ago
There is no 2.1.2. Do you mean 2.1.1?
Summertime•2mo ago
Stack Overflow
CUDA initialization: Unexpected error from cudaGetDeviceCount()
I was running a deep learning program on my Linux server and I suddenly got this error. UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions
Summertime•2mo ago
I've searched on Google and the suggestion is to reboot. How do I reboot without losing installed data?
digigoblin•2mo ago
Looks like RunPod has all of these versions available:
1.13.0
2.0.1
2.1.0
2.1.1
2.2.0
I would try 2.1.1 or 2.1.0 and then, if you still have the same issue, try 2.0.1.
Do you remember which torch version you were using on Vast?
It is safe to reboot your pod if all of your data is installed on persistent storage or a network volume. However, you will lose your data on reboot if you installed everything into the container disk.
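Before rebooting, it can help to check where things were installed. A small sketch, assuming the persistent or network volume is mounted at /workspace (a common default in RunPod templates, but treat it as an assumption and adjust `volume_root` to your pod's mount path). `on_persistent_volume` is a hypothetical helper, and it compares path components only, so symlinks are not resolved:

```python
from pathlib import PurePosixPath


def on_persistent_volume(path: str, volume_root: str = "/workspace") -> bool:
    """Return True when `path` lies at or below the assumed volume mount point."""
    try:
        # relative_to raises ValueError when path is not inside volume_root.
        PurePosixPath(path).relative_to(volume_root)
    except ValueError:
        return False
    return True


# Hypothetical install locations, for illustration only.
for candidate in ["/workspace/venv", "/root/.cache/pip", "/workspace2/data"]:
    print(candidate, on_persistent_volume(candidate))
```

Anything reported as False here would sit on the container disk under this assumption and would be lost on reboot, as described above.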
Summertime•2mo ago
Here is Vast's template:
[2 images attached]
digigoblin•2mo ago
Wow, CUDA 12.4 already 😲. Does that install torch or did you install it yourself?
Summertime•2mo ago
Could be Vast's template. I never have to install PyTorch manually.
digigoblin•2mo ago
Do you have a link to the GitHub project for your application?
Summertime•2mo ago
GitHub
GitHub - nimaaghli/NASChain at abb8d2309a769cae54be7190fb2d01f4a66c...
Neural Architecture Search Powered by Bittensor. Contribute to nimaaghli/NASChain development by creating an account on GitHub.
digigoblin•2mo ago
GitHub
NASChain/requirements.txt at abb8d2309a769cae54be7190fb2d01f4a66c7e...
Neural Architecture Search Powered by Bittensor. Contribute to nimaaghli/NASChain development by creating an account on GitHub.
Summertime•2mo ago
Everything seems harder now 😄
digigoblin•2mo ago
So in theory, the RunPod template you're using should be fine. I don't know why you are getting errors.
Summertime•2mo ago
I want to try rebooting, but it doesn't seem like a good idea 🙂
digigoblin•2mo ago
I doubt rebooting will fix it. When they mention rebooting, they probably mean the machine with the GPU, i.e. the host machine and not your pod.
Summertime•2mo ago
Out of ideas now.
Summertime•2mo ago
Then I got the same error log.
[image attached]
digigoblin•2mo ago
Seems to be some issue with your pod then.
Summertime•2mo ago
Do I have to create a new one?
digigoblin•2mo ago
Yeah, put the pod ID here so RunPod staff can check it out, then I suggest terminating it and creating a new one.
Summertime•2mo ago
ID: c7cg4v0dgysoj5
Thanks man, really appreciate it!
Wolfsauge•5w ago
CUDA 12.4 just got rolled out on RunPod and it works fine. Thanks!
Summertime•4w ago
In the template popup, I only see CUDA 12.1.1 as the newest version.
Summertime•4w ago
[2 images attached]
Wolfsauge•4w ago
[image attached]
Wolfsauge•4w ago
[image attached]