Runpod
•
2y ago
Ercan
[Urgent] One GPU suddenly went away
Hi, we have a prod issue right now: one of the GPUs from our pod suddenly disappeared
S
Superintendent
•
1/13/24, 12:33 AM
fall off the bus?
E
Ercan
OP
•
1/13/24, 12:34 AM
@Justin Can someone help and check our pod?
S
Superintendent
•
1/13/24, 12:35 AM
lspci | grep VGA
S
Superintendent
•
1/13/24, 12:35 AM
should spit out something about the gpu
S
Superintendent
•
1/13/24, 12:37 AM
also can u run nvidia-smi and show it
E
Ercan
OP
•
1/13/24, 12:39 AM
yes, here the first gpu went missing
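Editor's note: the `nvidia-smi` check above can be scripted so a missing GPU is caught without reading the table by eye. This is a minimal sketch, not something posted in the thread. `count_gpus` and `EXPECTED_GPUS` are hypothetical names, and the default of 2 GPUs is an assumption about this pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPU rows in nvidia-smi CSV output read from stdin.
# Only rows that start with a numeric index are counted, so error text is ignored.
count_gpus() {
  grep -Ec '^[0-9]+,' || true
}

# Assumption: the pod was rented with 2 GPUs; override via the environment.
EXPECTED_GPUS="${EXPECTED_GPUS:-2}"

if command -v nvidia-smi >/dev/null 2>&1; then
  visible="$(nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader | count_gpus)"
  if [ "$visible" -lt "$EXPECTED_GPUS" ]; then
    echo "only $visible of $EXPECTED_GPUS GPUs visible"
  else
    echo "all $EXPECTED_GPUS GPUs visible"
  fi
else
  echo "nvidia-smi not found in this container"
fi
```

Running it from a cron job or a health check endpoint would flag the drop as soon as the driver stops listing a card.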
S
Superintendent
•
1/13/24, 12:39 AM
so what does `lspci | grep VGA` spit out?
E
Ercan
OP
•
1/13/24, 12:42 AM
can't run it, command not found
E
Ercan
OP
•
1/13/24, 12:42 AM
not sure what to install
E
Ercan
OP
•
1/13/24, 12:42 AM
what to install?
S
Superintendent
•
1/13/24, 12:42 AM
wdym
S
Superintendent
•
1/13/24, 12:42 AM
lspci isn't there?
E
Ercan
OP
•
1/13/24, 12:42 AM
No
S
Superintendent
•
1/13/24, 12:42 AM
run lspci
S
Superintendent
•
1/13/24, 12:42 AM
without grep
E
Ercan
OP
•
1/13/24, 12:43 AM
S
Superintendent
•
1/13/24, 12:43 AM
lspci
S
Superintendent
•
1/13/24, 12:43 AM
you're missing an i
S
Superintendent
•
1/13/24, 12:43 AM
lspci
E
Ercan
OP
•
1/13/24, 12:43 AM
S
Superintendent
•
1/13/24, 12:43 AM
whar.
E
Ercan
OP
•
1/13/24, 12:43 AM
isn't this for AMD?
S
Superintendent
•
1/13/24, 12:43 AM
wdym
S
Superintendent
•
1/13/24, 12:43 AM
you're on nvidia gpus
S
Superintendent
•
1/13/24, 12:43 AM
so yea it should work
E
Ercan
OP
•
1/13/24, 12:44 AM
yea, that is what I am double checking, whether this should work with nvidia
S
Superintendent
•
1/13/24, 12:44 AM
sudo apt-get update
sudo apt-get install pciutils
S
Superintendent
•
1/13/24, 12:44 AM
try that
S
Superintendent
•
1/13/24, 12:44 AM
i think it's something to do with pciutils
E
Ercan
OP
•
1/13/24, 12:44 AM
E
Ercan
OP
•
1/13/24, 12:44 AM
worked now
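Editor's note: once `pciutils` is installed, the `lspci | grep VGA` check can be widened a little. Consumer cards such as the 4090 usually show up as "VGA compatible controller", while data-center NVIDIA cards often show up as "3D controller", so grepping only for VGA can miss them. The sketch below is an illustration under that assumption, and `nvidia_pci_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only lspci lines that describe NVIDIA display devices.
# Audio functions on the same card and non-NVIDIA adapters are filtered out.
nvidia_pci_lines() {
  grep -i 'nvidia' | grep -Ei 'VGA compatible controller|3D controller' || true
}

if command -v lspci >/dev/null 2>&1; then
  lspci | nvidia_pci_lines
else
  echo "lspci not found; on Debian/Ubuntu it ships in the pciutils package"
fi
```

If a card that `nvidia-smi` no longer lists still appears here, the device is present on the PCI bus and the problem is more likely on the driver side. `lspci -d 10de:` is another way to filter by NVIDIA's PCI vendor ID.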
S
Superintendent
•
1/13/24, 12:45 AM
what does `dmesg` spit out
E
Ercan
OP
•
1/13/24, 12:45 AM
dmesg: read kernel buffer failed: Operation not permitted
S
Superintendent
•
1/13/24, 12:46 AM
sudo !!
S
Superintendent
•
1/13/24, 12:46 AM
(Wtf?)
E
Ercan
OP
•
1/13/24, 12:46 AM
can't run sudo on Pods
S
Superintendent
•
1/13/24, 12:46 AM
oh right
S
Superintendent
•
1/13/24, 12:46 AM
docker container
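Editor's note: the `dmesg` failure above is expected inside an unprivileged container, because the kernel log belongs to the host. Where the log is readable, the NVIDIA driver reports GPU faults as `NVRM: Xid` lines, and Xid 79 is the "GPU has fallen off the bus" case asked about at the start of the thread. A minimal sketch that tries `dmesg` and falls back gracefully follows. `xid_lines` is a hypothetical helper name, and nothing here was run in the original conversation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick NVIDIA Xid lines out of kernel log text on stdin.
xid_lines() {
  grep -E 'NVRM: Xid' || true
}

# dmesg usually fails inside an unprivileged container, so treat that as expected.
if log="$(dmesg 2>/dev/null)"; then
  printf '%s\n' "$log" | xid_lines
else
  echo "kernel log not readable here; ask the host operator to check dmesg for Xid errors"
fi
```

On a pod without host access, the practical use is knowing what to ask support to look for.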
S
Superintendent
•
1/13/24, 12:46 AM
uhh.
E
Ercan
OP
•
1/13/24, 12:47 AM
yea
S
Superintendent
•
1/13/24, 12:47 AM
i have no friggin clue, can u try to restart it?
E
Ercan
OP
•
1/13/24, 12:47 AM
will try, but I moved all production processes to the running gpu
E
Ercan
OP
•
1/13/24, 12:48 AM
so if I restart, I need to run a bunch of things again
E
Ercan
OP
•
1/13/24, 12:48 AM
I am just waiting, maybe it comes back
S
Superintendent
•
1/13/24, 12:48 AM
oof, is the 4090 ok with the load?
E
Ercan
OP
•
1/13/24, 12:49 AM
we have an internal queue, set it to 1 right now
E
Ercan
OP
•
1/13/24, 12:49 AM
also just realized this is community cloud
S
Superintendent
•
1/13/24, 12:49 AM
yes
E
Ercan
OP
•
1/13/24, 12:49 AM
thought this was on secure cloud