Consistent "Hardware Error"s after upgrading kernel
Hello all, first time I've been on the receiving end of this channel heh
My primary system is currently not booting up. The issues first happened in September of last year, where I shared a picture of it not booting up here: https://discord.com/channels/812703221789097985/812703222232776726/1150589387101446174 (I'll share a clearer error message in a moment)
My "solution" back then was to use linux-lts instead of linux, which worked... until now (I did my usual round of system updates, and now it seems whatever change broke linux also arrived in linux-lts)
Now, googling the error, I understand that it's supposed to be a hardware error, but I've been using the old kernel with zero issues, even when leaving the system on for days at a time (whereas the newer kernel crashes right at boot). I thus have a hard time believing it's actually a HW issue
I have a USB drive available, so I can install any alternative kernel you may want to try
As for system specifications, this is a Ryzen 7 3700X, 32GBs of RAM (G-Skill, 3200MHz), RTX 3070, all on an ASUS ROG Strix X570-E Gaming MoBo, and I'm running Arch Linux
For now, I'll try two things: Updating the BIOS, and seeing if even booting into an Arch ISO works
I've now added a picture of the error.
The CPU index thingy always changes, but it's always in that "firewire_ohci" function (?), and it of course always kernel panics like this
conscious-sapphire•2y ago
hubby's system is actually really similar, less ram and a 900 series nvidia GPU, but otherwise same mobo and cpu.. he's been on linux the whole time so potentially maybe it is a hardware issue?
Have you reduced the complexity of the system? Removing hardware that isn't required to boot?
Well, I don't really have any hardware that isn't required. Could go with less RAM, but other than that...
Anyways, BIOS update's doing its thing for now. I was on a version from 2020, to be fair (going from 2606 -> 5003)
conscious-sapphire•2y ago
🤔 yeah that COULD help i also feel like it might not change much x_x Out of curiosity do you have another GPU? Extra ssd/hdds?
my go to is usually swap out hardware until i can isolate the trouble part
if its hardware at all
I guess I could steal, I mean borrow, a roommate's GPU
Yes I have two extra hard drives, could take those out
conscious-sapphire•2y ago
if the bios update doesnt do it, i'd remove extra drives.. if those are failing they can do some really weird things ime haha
but generally that would be instability across the board, so its interesting that they wouldnt trigger it with the old lts kernel
Note to self: BIOS updated, leaving RAM at 2133MHz for now to test
Yup that didn't do it
conscious-sapphire•2y ago
x_x
Yeah that's the thing that gets me as well, like, how can it be a hardware issue if it was working fine for months?
Anyways, taking out stuff now
Haaang on
I forgot about this guy (card in lowest PCIe slot), and you got 3 guesses on what it does

Well I'm impatient so here's the reveal

conscious-sapphire•2y ago
hah firewire
oh actually
there's a bug with amd hardware and firewire on latest kernels i think?
(sorry im in a meeting)
Oh right, timezones. No worries
Mystery solved by the way, it's now working perfectly again!
(clarification, after removing the card it's now working. My discovery alone was not enough to scare it into working again)
conscious-sapphire•2y ago
hahaha
nice glad that fixed everything 😄
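For anyone who lands here and would rather keep the card installed: a minimal sketch, not something that was tried in this thread, assuming that keeping the firewire_ohci module from loading is enough to avoid the panic. It checks /proc/modules for the module and prints a standard modprobe.d blacklist line. The helper names and the target file name are made up for illustration.

```python
from pathlib import Path

MODULE = "firewire_ohci"  # module name taken from the panic trace in this thread


def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text.

    Each line starts with the module name, followed by size, refcount, etc.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklist_conf(module):
    """Build modprobe.d content that keeps the module from being auto-loaded."""
    return f"# Keep {module} from auto-loading\nblacklist {module}\n"


if __name__ == "__main__":
    proc = Path("/proc/modules")
    text = proc.read_text() if proc.exists() else ""
    state = "loaded" if MODULE in loaded_modules(text) else "not loaded"
    print(f"{MODULE} is currently {state}")
    # Assumption: this would be saved as root under /etc/modprobe.d/,
    # e.g. in a hypothetical file named no-firewire.conf.
    print(blacklist_conf(MODULE), end="")
```

If the module gets pulled in during early boot, the initramfs would probably need regenerating after adding the file. Physically removing the card, as done above, is the only fix actually confirmed here.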