RunPod
fireice (2mo ago)

Why are there no GPUs in the Canada data center today?

My network volume is in ca-mtl-1, and there are no GPUs available there right now.
digigoblin (2mo ago)
Read #🚨|incidents. It's scheduled for maintenance, that's why.
moez4921 (2mo ago)
@haris @Finley It's been more than 4 hours since the outage started. Aren't you going to declare an incident and give some updates? Judging by the green status on https://uptime.runpod.io, I suspect your monitoring hasn't caught this issue.
nerdylive (2mo ago)
@Madiator2011 (Work) any idea about this?
digigoblin (2mo ago)
@nerdylive It's because RunPod disables the DC before maintenance begins, probably because people don't read the announcements and then file unnecessary support tickets.
nerdylive (2mo ago)
Oh, how long before? And where did you read that?
digigoblin (2mo ago)
I already mentioned this elsewhere, but @fireice being an idiot and giving me a thumbs down already proves my point.
nerdylive (2mo ago)
Oof, where's that info from, btw?
digigoblin (2mo ago)
I know this from previous experience; Zeen or someone like that mentioned it.
nerdylive (2mo ago)
Ooh, like days before?
digigoblin (2mo ago)
Yes
nerdylive (2mo ago)
I see, I see. Yeah, that's probably it.
digigoblin (2mo ago)
There's no point in letting someone create a pod and start training that runs for days, only to have it interrupted.
nerdylive (2mo ago)
Yeah, correct, haha. But next time it should also be posted in #🚨|incidents when pod creation is going to be disabled.
digigoblin (2mo ago)
Yeah, agreed. RunPod communication is ALWAYS appalling; it's about 1% better but still has a LONG way to go.
moez4921 (2mo ago)
Just got an email response from them confirming what @digigoblin said: they are disabling new machine creation. The moral: since there is no way to clone network volumes (correct me if I'm wrong), you'd better make continuous backups using https://syncthing.net/ or something similar.
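To make the backup advice concrete, here is a minimal sketch of a one-way, incremental copy in plain Python that you could run periodically from a pod. It is not Syncthing and not RunPod tooling, just a simpler stand-in. The `mirror` helper is hypothetical, and the example paths assume the network volume is mounted at `/workspace` and that `/mnt/backup` is some other destination you control. It copies only files that are new or differ in size or modification time, and it never deletes anything at the destination.

```python
import os
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing or differs from src in size or modification time."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns


def mirror(src_root: str, dst_root: str) -> list[str]:
    """Hypothetical helper: one-way, incremental copy of src_root into dst_root.

    Copies new or changed files only and never deletes anything in dst_root.
    Returns the sorted relative paths (POSIX style) that were copied.
    """
    src_base, dst_base = Path(src_root), Path(dst_root)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_base):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(src_base)
            dst = dst_base / rel
            if needs_copy(src, dst):
                dst.parent.mkdir(parents=True, exist_ok=True)
                # copy2 also copies the modification time, so an unchanged
                # file is skipped on the next run.
                shutil.copy2(src, dst)
                copied.append(rel.as_posix())
    return sorted(copied)


# Example (hypothetical paths, assumed mount point):
# mirror("/workspace", "/mnt/backup/workspace")
```

Because unchanged files are skipped, running it again after a training step only copies new checkpoints, so it can be scheduled (for example with cron) without re-copying the whole volume. Unlike Syncthing, it does not sync in both directions or propagate deletions.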
digigoblin (2mo ago)
Yep, I guess this is the real issue with the lack of communication. People need to know a few days in advance when a DC is going to be taken offline for maintenance, so that they can start migrating their data to a different DC. When #🚨|incidents says it's only going to be offline for maintenance on Monday, but no new pods can be created 4 days ahead of time, that's a problem, because people can't access their data to make alternate arrangements. @haris
Madiator2011 (Work)
I mean, you should always make backups when you upload data, as the cloud is basically someone else's computer.
haris (2mo ago), accepted solution
Hey y'all, we disable the creation of new pods four days before a maintenance to prevent further issues (I wasn't personally aware of this until now; otherwise it would have been posted in #🚨|incidents). However, I talked with the team and you should be able to create new pods again. Let me know if you run into any issues.
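To see what the four-day rule means in practice, here is a small Python sketch of the date arithmetic. It assumes the freeze starts exactly four calendar days before the maintenance day and lasts until maintenance; the helper names and the example date are hypothetical. As this reply notes, the team lifted the freeze this time, so treat it as a rule of thumb, not a guarantee.

```python
from datetime import date, timedelta

# Per the staff reply in this thread: new pod creation is disabled
# four days before a maintenance (assumed to mean four calendar days).
CREATION_FREEZE = timedelta(days=4)


def creation_cutoff(maintenance_day: date) -> date:
    """Hypothetical helper: first day on which new pods can no longer be created."""
    return maintenance_day - CREATION_FREEZE


def can_create_pod(today: date, maintenance_day: date) -> bool:
    """True while you can still start a pod to reach a network volume before the freeze."""
    return today < creation_cutoff(maintenance_day)
```

For a hypothetical maintenance on Monday, June 10, 2024, `creation_cutoff` returns Thursday, June 6, which matches the "maintenance on Monday, no new pods 4 days ahead" situation described earlier in the thread: any data migration would have to start by Wednesday.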
nerdylive (2mo ago)
But will the maintenance still be carried out on the same schedule?
haris (2mo ago)
Yep, as far as I know, but I will double-check.