Misc Z13/395+ CPU problems.

So I'm finally getting around to playing a game on this thing (woot! Satisfactory factory goodness). Alas, after about 30 minutes of play (Steam + Proton), the Wayland user session suddenly locked up. Fortunately the key combo to enter a virtual console still worked (alas, I don't have another machine here with me that I could have sshed in from). Here are the dmesg logs I could snarf. Nasty amdgpu errors:
geeksville
geeksville OP • 2mo ago
This was with the 6.13.5-102.bazzite.fc41.x86_64 (64-bit) kernel. Just looking at this log and the 6.13.6 changelog, I bet it is fixed in: "amdgpu/pm/legacy: fix suspend/resume issues"
antheas
antheas • 2mo ago
.6 is in testing with Mesa 25
geeksville
geeksville OP • 2mo ago
thanks! Just snarfed it. So far I think 6.13.6 probably fixes it! @antheas Though 6.13.6-101 does have a regression compared to 6.13.5: pressing the sleep button in the GUI no longer fully enters sleep (the screen goes black and wake works, but I can still hear the fans spinning). A new error message appears in the dmesg output:
geeksville
geeksville OP • 2mo ago
[ 137.130226] amd_pmc AMDI000B:00: Last suspend didn't reach deepest state
The failure happens after the virtual console is shut down. I turned on no_console_suspend after seeing this problem, and wherever the new sleep problem is, it comes after "suspend devices" completes (because the screen was dark but the CPU was clearly still kicking).
antheas
antheas • 2mo ago
I can't test it rn. Heading to bed. Tomorrow. Fucking kernel, I swear to god. Just looking at the changelog, that thing probably broke it.
geeksville
geeksville OP • 2mo ago
heh!
CheckYourFax
CheckYourFax • 2mo ago
Antheas to the gpu kernel peeps
antheas
antheas • 2mo ago
built the revert
antheas
antheas • 2mo ago
GitHub
Release 6.13.6-102: Revert AMD Sleep patch · hhd-dev/kernel-bazzite
Commit a355d0d24d00d19fa70d6408fc1be34fe8ac79e5 is suspected to be causing sleep issues on the Z13. Revert it. Full Changelog: 6.13.6-101...6.13.6-102
antheas
antheas • 2mo ago
Worked on my side, but it was a dirty build. So tomorrow I'll test this one and hopefully Kyle will drop it into testing. There is a chance it was something else: I only partially built the kernel, so maybe a missing module.
geeksville
geeksville OP • 2mo ago
btw - alas, after 1 hr of play Satisfactory still crashed using this bazzite:testing branch (kernel 6.13.6-101.bazzite.fc41.x86_64). Wayland locked up. Relevant dmesg attached:
geeksville
geeksville OP • 2mo ago
btw I just took a look at the first exception in this latest newcrash file. I bet the root cause is somewhere in panel-self-refresh. The relevant code, amdgpu_dm_enable_self_refresh(), is young (Nov 2024ish: https://lore.kernel.org/all/[email protected]/T/#m650152eb173c3a0b299c39dd843e92d0903b8b49 ). I'm going to dig around and see if I can find a runtime flag to turn off this feature and see if the problem goes away.
ok, I dug around in the relevant kernel srcs and that exception. I think there is a very high likelihood the problem is in the new panel-replay optimization feature. I'm currently doing a test with "rpm-ostree kargs --append=amdgpu.dcdebugmask=0x400" to mask out just that feature. I also wouldn't be surprised (based on the code comments about what that feature does) if this also fixes the occasional draw artifacts. Also, the power savings provided by this feature are probably small.
CheckYourFax
CheckYourFax • 2mo ago
Because the new kernel hasn't landed in testing yet. You can test if this is the problem by adding amdgpu.dcdebugmask=0x10 to your kargs; this disables panel self refresh: rpm-ostree kargs --append-if-missing=amdgpu.dcdebugmask=0x10. It's a power usage optimization feature.
geeksville
geeksville OP • 2mo ago
right, I was just mentioning that 6.13.6-101 didn't even fix the original thing I thought it fixed 😉 IMO no need to turn off all of PSR; from looking at the code, the error is in the self-refresh path only, so 0x400 is probably better.
CheckYourFax
CheckYourFax • 2mo ago
Yes, but to test whether this is actually the problem it's a good idea to disable it. If your issue is fixed, you know the problem lies there 🙂
geeksville
geeksville OP • 2mo ago
0x400 turns off a subset of what 0x10 turns off 😉
CheckYourFax
CheckYourFax • 2mo ago
Sure. If that doesn't work it might still be worthwhile to turn the whole feature off. My way of testing things is usually to turn the whole shit off, see if it works, and then re-enable things one by one 😛
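For reference, the masks traded above can be decoded with a few lines of Python. This is only a sketch: the bit names are as I recall them from the DC_DEBUG_MASK enum in drivers/gpu/drm/amd/include/amd_shared.h, and the helper names (decode_dcdebugmask, karg) are made up for illustration, so check the values against the source of the kernel you are actually running. If the names are right, 0x10 (PSR) and 0x400 (Panel Replay) are independent bits, and setting both gives 0x410.

```python
# Bit names recalled from the kernel's DC_DEBUG_MASK enum; treat them as
# assumptions and verify them against your own kernel tree.
DC_DEBUG_BITS = {
    0x10: "DC_DISABLE_PSR",
    0x200: "DC_DISABLE_PSR_SU",
    0x400: "DC_DISABLE_REPLAY",
}


def decode_dcdebugmask(mask: int) -> list[str]:
    """Return the names of the known bits set in mask, lowest bit first."""
    names = [name for bit, name in sorted(DC_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DC_DEBUG_BITS)  # bits this small table does not cover
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names


def karg(mask: int) -> str:
    """Format the mask as a kernel argument (hypothetical helper)."""
    return f"amdgpu.dcdebugmask={mask:#x}"


if __name__ == "__main__":
    for mask in (0x10, 0x400, 0x410):
        print(karg(mask), decode_dcdebugmask(mask))
```

The string returned by karg() is what would be passed to rpm-ostree kargs --append-if-missing=... on an image-based system such as bazzite.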
geeksville
geeksville OP • 2mo ago
sure, I'm testing 0x400 now; if that doesn't work I'll go to a bigger hammer (with higher costs). yeah - but I've looked at the code and the error path is definitely in the section guarded by 0x400. Testing now though.
Alas, 0x400 was not sufficient: the exception eventually occurred and made it a bit deeper into amdgpu_dm_enable_self_refresh() but failed later in the function. So I switched to 0x10 (to turn off all of the PSR code). It's been running now for 40 min and I think it will be golden, because the occasional draw artifacts that everyone has seen no longer occur.
I bet this problem could occur on any eDP panel that supports PSR. Y'all could turn on dcdebugmask=0x10 for everyone via kargs and I think the cost would be zero for any unit that doesn't have a PSR-capable display. From browsing 6.14 commits it looks like AMD geeks are still futzing with this feature, so such a hack will probably be needed only for a little while, i.e. tasty-sounding commits like this: "drm/amd/display: Disable PSR-SU on eDP panels"
@antheas the good news: turning off PSR definitely works around the original exception in this report, and it also fixes the occasional brief draw artifacts we've seen. See the comment above about tasty-sounding 6.14 commits for the root cause.
the bad news: I just installed the new testing build. Whichever change you backed out to make the 6.13.6-102 kernel wasn't the cause of 6.13.6 failing to fully enter sleep. Sleep still doesn't fully enter when on the testing branch. Relevant dmesgs are unchanged:
[ 49.815569] PM: suspend entry (s2idle)
[ 49.841997] Filesystems sync: 0.025 seconds
[ 49.894384] Freezing user space processes
[ 50.888266] Freezing user space processes completed (elapsed 0.993 seconds)
[ 50.888285] OOM killer disabled.
[ 50.888290] Freezing remaining freezable tasks
[ 50.889316] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 50.904563] queueing ieee80211 work while going to suspend
[ 50.911953] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Asserting Reset
[ 51.048414] usb 3-2: reset high-speed USB device number 2 using xhci_hcd
[ 51.185959] PM: suspend devices took 0.296 seconds
[ 51.187718] ACPI: EC: interrupt blocked
[ 70.547009] amd_pmc AMDI000B:00: Last suspend didn't reach deepest state
[ 70.547561] ACPI: EC: interrupt unblocked
[ 70.746632] [drm] PCIE GART of 512M enabled (table at 0x00000083FFB00000).
[ 70.746668] amdgpu 0000:c4:00.0: amdgpu: SMU is resuming...
antheas
antheas • 2mo ago
Ok so I have to do a full kernel rebuild and try it today
antheas
antheas • 2mo ago
Turns out it was always broken and 13.6 is fine. You probably plugged in a dock or something.
geeksville
geeksville OP • 2mo ago
hmm - even with no USB accessories attached the behavior is the same on my Flow. 13.6 gives that error message wrt suspend (and the fans keep spinning while sleeping). 13.5 is fine. Do you get that "amd_pmc AMDI000B:00: Last suspend didn't reach deepest state" message even on 13.5?
antheas
antheas • 2mo ago
yes
geeksville
geeksville OP • 2mo ago
so sleep doesn't fully enter for you on 13.5? (fans stay spinning etc)
antheas
antheas • 2mo ago
fans stay off but the message says what I said. Fan stays on on .6 again, you jinxed it. I'll do more testing tomorrow.
geeksville
geeksville OP • 2mo ago
weird. I just tried a bunch of sleep cycles in 13.5 and didn't have that message.
[ 357.699955] PM: suspend entry (s2idle)
[ 357.711232] Filesystems sync: 0.010 seconds
[ 357.738037] Freezing user space processes
[ 359.640330] Freezing user space processes completed (elapsed 1.901 seconds)
[ 359.641204] OOM killer disabled.
[ 359.641763] Freezing remaining freezable tasks
[ 359.644107] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 359.667529] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Asserting Reset
[ 359.673237] queueing ieee80211 work while going to suspend
[ 359.674305] queueing ieee80211 work while going to suspend
[ 359.814613] usb 3-2: reset high-speed USB device number 2 using xhci_hcd
[ 359.952181] PM: suspend devices took 0.307 seconds
[ 359.953778] ACPI: EC: interrupt blocked
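Comparing suspend logs by eye gets tedious over many sleep cycles. Below is a rough Python sketch that counts suspend cycles in a dmesg dump and flags the ones where amd_pmc complained. It assumes one log message per line (as dmesg prints them) and keys only on the two message strings quoted in this thread; the function name shallow_suspends is made up for illustration.

```python
import re

# The two messages quoted in this thread; nothing else in the log is interpreted.
ENTRY = re.compile(r"PM: suspend entry \((\w+)\)")
SHALLOW = "Last suspend didn't reach deepest state"


def shallow_suspends(dmesg_text: str) -> list[bool]:
    """Return one bool per suspend cycle: True if amd_pmc reported that the
    cycle did not reach the deepest state."""
    cycles: list[bool] = []
    for line in dmesg_text.splitlines():
        if ENTRY.search(line):
            cycles.append(False)   # a new suspend cycle starts here
        elif SHALLOW in line and cycles:
            cycles[-1] = True      # the warning belongs to the latest cycle
    return cycles


if __name__ == "__main__":
    sample = "\n".join([
        "[ 49.815569] PM: suspend entry (s2idle)",
        "[ 70.547009] amd_pmc AMDI000B:00: Last suspend didn't reach deepest state",
        "[ 357.699955] PM: suspend entry (s2idle)",
        "[ 359.953778] ACPI: EC: interrupt blocked",
    ])
    print(shallow_suspends(sample))
```

Feeding it the output of dmesg after a batch of sleep/wake cycles gives a quick per-kernel failure count, which is easier to compare between 6.13.5 and 6.13.6 than scrolling logs.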
antheas
antheas • 2mo ago
@Kyle Gospo push the .5-103 to stable, .6 is cooked. I set it as latest in case akmods need a rebuild. When I compile it locally it works. Fml
geeksville
geeksville OP • 2mo ago
ooh! interesting!!!
CheckYourFax
CheckYourFax • 2mo ago
antheas is following a classic heisenbug: a bug that disappears once you try to debug it :clueless: Maybe it's some compiler optimization causing issues; that's classic heisenbug.
antheas
antheas • 2mo ago
@geeksville new kernel is building, seems like amdxdna needed some fixes. Hopefully in a few hours you can test
geeksville
geeksville OP • 2mo ago
cool beans! I'll try it today!
6.13.6-103 (via bazzite:testing) works well! It fixes the new sleep problem.
CheckYourFax
CheckYourFax • 2mo ago
https://bodhi.fedoraproject.org/updates/FEDORA-2025-346cf69656 6.13.7 is also out already, so it might be worth it to rebase. 😛 This also finally includes the unicode fix for the anaconda installer.
geeksville
geeksville OP • 2mo ago
(after the GPU reset all was fine again)
antheas
antheas • 2mo ago
I'll queue a .7 in a few hours. This really needs a rename.
antheas
antheas • 2mo ago
https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.13.7-104 something for you to play with in a few hours
GitHub
Release 6.13.7-104: Z13 keyboard goodies · bazzite-org/kernel-bazzite
For the Asus ROG Z13: Fixes the touchpad acting like a mouse during boot. Fixes the keyboard and lightbar light brightness levels and syncs them with the keyboard backlight setting in KDE/GNOME. Fix...
geeksville
geeksville OP • 2mo ago
alas, this kernel isn't yet in bazzite-testing but I can check again tomorrow morning.
antheas
antheas • 2mo ago
My Z13 wakes up at night on its own and crashes. Same time, 2:38.
geeksville
geeksville OP • 2mo ago
interesting! I haven't seen that on mine (6.13.6-103.bazzite.fc41.x86_64). I put it to sleep at night and when I wake it in the morning by pressing a key it looks fine.
antheas
antheas • 2mo ago
I think it's .7. My .7. Probably Mario's display patches are undercooked and I should nix them, although I can't see anything wrong with them.
geeksville
geeksville OP • 2mo ago
I'm busy with other stuff for a few days so I haven't tried to figure out how the bazzite/rpm/fedora build system layers patches on top of the regular kernel tree. But just from scrolling through GitHub, is this okay? i.e. this function fails to release a lock through one of the two possible exit paths.
geeksville
geeksville OP • 2mo ago
also that caused me to search for brt_lock (admittedly only in the patch file view on GitHub, so imperfect). Here, is it possibly calling unlock on a mutex we have already released?
geeksville
geeksville OP • 2mo ago
btw - for lulz I tried running the latest ollama (in podman and with the GPU exposed into the container). It worked well! It happily uses the GPU and runs fast (haven't benchmarked yet).
antheas
antheas • 2mo ago
I fixed the issues with the locks. I don't think that's it. Yeah, I fixed that. And that too: the mutex lock happens after the unregister check, so that when we unregister it does not lock twice. What happens in the log I have is that the GPU explodes: a rail does not come back, and then it starts accessing invalid memory and it dies.
geeksville
geeksville OP • 2mo ago
heh - for lulz I tried using ollama via the very fresh ROCm Halo support. It mostly worked well but I did just see a GPU reset (which everything except ollama recovered from)
geeksville
geeksville OP • 2mo ago
fyi
antheas
antheas • 2mo ago
GitHub
Release 6.13.7-107: Asus Z13 RGB Support · bazzite-org/kernel-bazzite
Adds RGB support to Asus Z13 + stability fixes related to backlight. Full Changelog: 6.13.7-106...6.13.7-107
geeksville
geeksville OP • 2mo ago
the new testing build works well (at least as well as the one that had the prior kernel). The keyboard/clamshell light control also works.
antheas
antheas • 2mo ago
RGB should work too on this one
geeksville
geeksville OP • 2mo ago
ooh - is there a helper app I should try to test that?
antheas
antheas • 2mo ago
KDE accent
geeksville
geeksville OP • 2mo ago
hmm - cool. It seems like it kinda works? I found the UI in KDE; it now has the option to have the keyboard color follow the accent color. Initially the color for my theme was blue and the keyboard light was blue (yay!). But if I change the accent color in the KDE theme to red, the KDE button in the UI for keyboard color changes to red (yay) but the actual LED lights on the keyboard stay blue. So possibly something wonky there.
antheas
antheas • 2mo ago
hopefully nothing crashed. yeah the accent thing is kinda trash
konros
konros • 2mo ago
Are you guys also experiencing slower Wi-Fi performance on your z13? Seems capped at around 200 Mbps for me and the Mediatek module is soldered on
geeksville
geeksville OP • 2mo ago
yes - the upload speed in particular is really slow on the mt7925. For the time being I'm using a USB wifi dongle. After 6.14 is out (and in fedora/bazzite), if it is not fixed then (and no one else is working on it first), I'm planning on spending some serious effort on debugging it (hopefully the flaw is not in the opaque on-device firmware blob) 😉 btw this person did some interesting crude bisect testing. If their test is correct there was a regression somewhere between 6.13.0 and 6.13.3
geeksville
geeksville OP • 2mo ago
though I'm a little skeptical because I don't see anything substantial in "git diff v6.12 v6.13-rc3 -- drivers/net/wireless/mediatek". I didn't check linux-firmware though. 6.14, on the other hand, has lots of new relevant commits: git diff v6.13-rc7 master drivers/net/wireless/mediatek
CheckYourFax
CheckYourFax • 2mo ago
sad they chose to solder instead of using an M.2 key, and saved 13 cents 😦
geeksville
geeksville OP • 2mo ago
also I presume solder saves money on shock & vibe and related warranty costs (i.e. users mucking around inside the device and fucking up without getting caught), so probably more like $2. There's a reason they made a door for the SSD and it wasn't just to be nice 😉 And the % of users who would even bother to swap wifi (even with an M.2 connector) is tiny. But yeah - it would fucking rock for me!
konros
konros • 2mo ago
Nice... I'll keep an eye on 6.14... I installed Arch originally... but I went back to Windows when I couldn't improve the wifi performance... I'm planning on picking up a 2TB SSD so I'll give it another shot then
geeksville
geeksville OP • 2mo ago
btw @antheas I think I've got ryzenadj updated and I'm correctly writing undervolt values for CPU and GPU. I'll test more tomorrow after building up a little test harness. It is a bit of a race against time because we're going away for about a week on a bike trip - so if I don't have it tested by tomorrow evening I won't be back at it until sometime late next week.
CheckYourFax
CheckYourFax • 2mo ago
I'm not sure if still relevant, but 6.13.8 came out with a suspend fix on eDP: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.13.8
CheckYourFax
CheckYourFax • 2mo ago
This should finally fix lingering wake-suspend issues on eDP displays
geeksville
geeksville OP • 2mo ago
also Mesa 25.0.2 should be out soon with some important fixes. btw the CPU undervolting works. I've also updated the ryzenadj tool so it can read per-core CPU power and voltage. My system works robustly with a -40mV CPU core voltage offset. I'm going to check this in and start working on the (probably similar) changes needed to undervolt the GPU die.
yay! -40 (about 30mV actual at idle - a bit more under load) of CPU undervolting works:
sudo ./ryzenadj --dump-table --set-coall=0x0fffd8 >coal40
core voltage change:
| 0x0BD0 | 0x3F343865 | 0.704 | 0.675 |
| 0x0BD4 | 0x3F33FD8A | 0.703 | 0.675 |
| 0x0BD8 | 0x3F34373A | 0.704 | 0.675 |
| 0x0BDC | 0x3F35B1FB | 0.710 | 0.681 |
| 0x0BE0 | 0x3F351F74 | 0.708 | 0.679 |
| 0x0BE4 | 0x3F34F129 | 0.707 | 0.678 |
| 0x0BE8 | 0x3F34C85D | 0.706 | 0.678 |
| 0x0BEC | 0x3F350183 | 0.707 | 0.679 |
| 0x0BF0 | 0x3F327A87 | 0.697 | 0.668 |
| 0x0BF4 | 0x3F341BA7 | 0.704 | 0.675 |
| 0x0BF8 | 0x3F32F8A9 | 0.699 | 0.671 |
| 0x0BFC | 0x3F33A0E0 | 0.702 | 0.673 |
| 0x0C00 | 0x3F32ED00 | 0.699 | 0.670 |
| 0x0C04 | 0x3F337071 | 0.701 | 0.672 |
| 0x0C08 | 0x3F3376AF | 0.701 | 0.672 |
| 0x0C0C | 0x3F326486 | 0.697 | 0.668 |
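A note on reading that table: the second column looks like the raw IEEE 754 single-precision bits of the third column, which can be checked in a couple of lines of Python. The sketch below also averages the drop over the first four rows, assuming the third column is the voltage before the offset and the fourth is the voltage after. Both column meanings are my reading of the paste, not something the dump labels, and the helper names are made up.

```python
import struct


def bits_to_float(raw: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


# (raw bits, before, after) for the first four rows of the table above.
# Column meanings are an assumption; the dump itself has no header row.
ROWS = [
    (0x3F343865, 0.704, 0.675),
    (0x3F33FD8A, 0.703, 0.675),
    (0x3F34373A, 0.704, 0.675),
    (0x3F35B1FB, 0.710, 0.681),
]


def mean_drop_mv(rows) -> float:
    """Average (before - after) across rows, in millivolts."""
    return sum((before - after) * 1000 for _, before, after in rows) / len(rows)


if __name__ == "__main__":
    for raw, before, _ in ROWS:
        print(f"{raw:#010x} -> {bits_to_float(raw):.3f} (table says {before:.3f})")
    print(f"mean drop: {mean_drop_mv(ROWS):.2f} mV")
```

The average comes out a little under 30 mV, which lines up with the "30mV actual at idle" remark.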
somewhere at about -80 my CPU (mprime) stress test starts to fail... but pretty happy with -40 at least
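The 0x0fffd8 in the --set-coall command is just -40 written as a 20-bit two's-complement number (0x100000 - 40 = 0xFFFD8). Here is a small Python sketch for producing the argument for other offsets. The encoding is inferred from that one value in this thread, and coall_arg is a made-up helper name, so check it against the ryzenadj documentation before relying on it.

```python
def coall_arg(offset: int) -> str:
    """Encode a signed all-core curve-optimizer offset as a --set-coall
    argument, wrapping negative values around 0x100000 (20-bit two's
    complement). The encoding is inferred from the 0x0fffd8 used for -40."""
    if not -0x80000 <= offset < 0x80000:
        raise ValueError(f"offset {offset} does not fit in 20 bits")
    return f"--set-coall={offset & 0xFFFFF:#08x}"


if __name__ == "__main__":
    for offset in (-30, -40, -50):
        print(offset, coall_arg(offset))
```

Note that the offset is a curve-optimizer count rather than millivolts; in the table above, -40 worked out to roughly 30 mV at idle.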
geeksville
geeksville OP • 2mo ago
GitHub
Add support for Strix Halo CPUs by geeksville · Pull Request #334 ·...
Hi @FlyGoat, It was good chatting with you via email. I've made some progress on adding Strix Halo CPU support. I'm attaching this PR but I'll keep adding to it in my wor...
geeksville
geeksville OP • 2mo ago
good news: I ran a long stress test and -40 of CPU undervolt was fine on my machine for a test of a few hours (and it makes things run a lot cooler under load). -50 is too much and my CPU stress test fails.
UPDATE: Alas, -40 failed after about 4 hrs of stress testing. I'm going to leave -30 running overnight and stay with that if it does okay.
FINAL-UPDATE: -30 ran solid for 8 hrs so I'm calling it good (for my particular laptop).
I just added a commit to improve the GPU support for Halos in ryzenadj. Alas, the cogfx mailbox is at a different message code on this new arch (different from Hawk Point/Van Gogh), so no setting GPU undervolt until that new mailbox code is found. I'm going away now on my bike trek for about a week and a half; hopefully the Windows ghelper folks will figure that out and I can crib from what they find 😉
btw: I think the SMU gets a bit clobbered on wake from sleep. If that's true, it will be necessary to rerun the set coall command after wake. When I come back I'll check that and, if necessary, make some sort of systemctlish thing to do the proper whacking.
konros
konros • 3w ago
In case anyone finds this useful... I haven't tested the patch yet... It might be in 14 already, not sure... https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/112 Seems a fix for the regression is already being worked on: https://lore.kernel.org/linux-wireless/[email protected]/#r
GitLab
MT7925 wifi throughput halved with 6.13.2. Works fine with 6.13.1 (...
Description: The MT7925 wifi chip that came with my Gigabyte X870E Aorus Elite is showing massive wifi throughput regressions with kernel...
geeksville
geeksville OP • 3w ago
this fix went into the mainline kernel as:
commit 766ea2cf5a398c7eed519b12c6c6cf1631143ea2
Author: Sean Wang <[email protected]>
Date: Tue Mar 4 16:08:46 2025 -0800
6.13.9 was a couple of weeks after that, so I assume the fix is included in 6.13.9. I just tried removing my USB wifi dongle and it seems to me that the mt7925e on the Flow is now fixed on current bazzite! It is now at least as fast as the (slowish) wifi provided by my rental here in Taipei (300mbit down - only 50mbit up - but I bet that's a limit from my rental).
konros
konros • 3w ago
Damn… I tried it in arch 14.2 but it's still not fixed… I'm getting around 200-220/80… when I should be getting 400 at least… I'll have to do some digging
geeksville
geeksville OP • 2w ago
btw @antheas, new kernel panic in 6.14.2 for the Flow Z13 on wake from sleep (doesn't really bother me though, considering how generally busted in other critical ways F42 is for me on bazzite). Posting here in case useful:
antheas
antheas • 2w ago
looks like the 6.13 ones
geeksville
geeksville OP • 2w ago
I haven't had a panic like that in 6.13.9 at all
antheas
antheas • 2w ago
I've had lots. dc_update_planes_and_stream with that
geeksville
geeksville OP • 2w ago
hmm interesting
hi @antheas, in case others report it or you see it: Flow 2025 has a new problem that appears in the 6.14.2 kernel. Apparently it was fixed in some 6.13 releases but wasn't in 6.14 (again) until 6.14.6. A smallish percent of the time the USB host controller in the APU locks up (and this breaks many things downstream of course). I don't know if you want to bother with backporting the fix or just wait until fedora or bazzite bumps up to a more recent kernel. The relevant commit is:

commit c7c1f3b05c67173f462d73d301d572b3f9e57e3b
Author: Michal Pecio <[email protected]>
Date: Tue Mar 4 13:31:47 2025 +0200

usb: xhci: Fix host controllers "dying" after suspend and resume

A recent cleanup went a bit too far and dropped clearing the cycle bit of link TRBs, so it stays different from the rest of the ring half of the time. Then a race occurs: if the xHC reaches such link TRB before more commands are queued, the link's cycle bit unintentionally matches the xHC's cycle so it follows the link and waits for further commands. If more commands are queued before the xHC gets there, inc_enq() flips the bit so the xHC later sees a mismatch and stops executing commands.

This function is called before suspend and 50% of times after resuming the xHC is doomed to get stuck sooner or later. Then some Stop Endpoint command fails to complete in 5 seconds and this shows up

xhci_hcd 0000:00:10.0: xHCI host not responding to stop endpoint command
xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:00:10.0: HC died; cleaning up

followed by loss of all USB devices on the affected bus. That's if you are lucky, because if Set Deq gets stuck instead, the failure is silent.

Likely responsible for kernel bug 219824. I found this while searching for possible causes of that regression and reproduced it locally before hearing back from the reporter. To repro, simply wait for link cycle to become set (debugfs), then suspend, resume and wait. To accelerate the failure I used a script which repeatedly starts and stops a UVC camera.

Some HCs get fully reinitialized on resume and they are not affected.
antheas
antheas • 2w ago
Is it fixed in .3?
geeksville
geeksville OP • 2w ago
no. fixed in 6.14-rc6
antheas
antheas • 2w ago
In testing we are on 6.14.2
geeksville
geeksville OP • 2w ago
yeah - that won't have this patch
antheas
antheas • 2w ago
6.14.2 comes after rc6. Good, I did a revert that nooped. I'll drop the revert.
geeksville
geeksville OP • 2w ago
oh! I didn't realize how fedora numbers kernels! I thought 6.14.2 mapped to the kernel 6.14-rc2.
antheas
antheas • 2w ago
It's not fedora. A new kernel has around 8 rcs. It begins at rc1. After that, the .0 kernel is made. Kernel 6.14 was released 3 weeks ago or so.
geeksville
geeksville OP • 2w ago
oh interesting. When I pointed my git at the kernel.org tree I only saw tags like this and assumed:
❯ git tag | grep 6.14
17: v2.6.14
18: v2.6.14-rc1
19: v2.6.14-rc2
20: v2.6.14-rc3
21: v2.6.14-rc4
22: v2.6.14-rc5
805: v6.14
806: v6.14-rc1
807: v6.14-rc2
808: v6.14-rc3
809: v6.14-rc4
810: v6.14-rc5
811: v6.14-rc6
812: v6.14-rc7
CheckYourFax
CheckYourFax • 2w ago
2.6 is their old versioning
geeksville
geeksville OP • 2w ago
right
CheckYourFax
CheckYourFax • 2w ago
and very very old
geeksville
geeksville OP • 2w ago
but see lines 806-812
CheckYourFax
CheckYourFax • 2w ago
that's 200X old
geeksville
geeksville OP • 2w ago
I see v6.14-rc2 but no 6.14.2. So 6.14.2 doesn't get a tag, they just make a branch for it?
antheas
antheas • 2w ago
v6.14.2 is tagged. It's not part of Torvalds' tree. You need the stable tree for that.
geeksville
geeksville OP • 2w ago
oh! thanks - will look. I'm used to the android kernel trees.
antheas
antheas • 2w ago
Fedora kernels get mapped to kernel-version from the ark tree
CheckYourFax
CheckYourFax • 2w ago
Torvalds only does rc's and main releases. All dot releases are done by maintainers. Torvalds is mainline.
geeksville
geeksville OP • 2w ago
ok - I'll go find the stable git tree and refer to that instead
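Since the rc-versus-point-release mixup is easy to make, here is a small Python sketch of how the upstream tags order: release candidates (v6.14-rc6) come before the final release (v6.14), and stable point releases (v6.14.2) come after it. The function name kernel_sort_key is made up for illustration and it only understands the tag shapes that appear in this thread.

```python
import re

TAG = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")


def kernel_sort_key(version: str) -> tuple[int, int, int, int, int]:
    """Sort key for tags like 'v6.14-rc6', 'v6.14' and 'v6.14.2'.

    Release candidates sort before the final release they lead up to;
    stable point releases sort after it.
    """
    match = TAG.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    major, minor, patch, rc = match.groups()
    is_final = 0 if rc else 1  # an rc precedes the release it leads up to
    return (int(major), int(minor), int(patch or 0), is_final, int(rc or 0))


if __name__ == "__main__":
    tags = ["v6.14.2", "v6.14-rc6", "v6.14", "v6.14-rc2", "v6.13.9"]
    print(sorted(tags, key=kernel_sort_key))
```

So a fix that landed in v6.14-rc6 is already contained in v6.14 and in every later 6.14.x stable release.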
antheas
antheas • 6d ago
Are you sure the fix is included?
geeksville
geeksville OP • 5d ago
yep - the fix is in there! Though strangely I still see this exact failure (the one this fix was supposed to address) occur (rarely)
CheckYourFax
CheckYourFax • 5d ago
not uncommon that a fix doesn't work 100%
geeksville
geeksville OP • 5d ago
😉
CheckYourFax
CheckYourFax • 5d ago
might have to wait longer for a fix that covers all triggers 😂 6.14.4 just came out so maybe twiddle your thumbs
geeksville
geeksville OP • 5d ago
btw 6.14.4 might remove the need for the 0x410 debug flags that I needed:
commit 56daad28d6c6e1157ad9021d3dab50e733f22f58
Author: Tom Chung <[email protected]>
Date: Wed Mar 19 16:31:31 2025 +0800
drm/amd/display: Do not enable Replay and PSR while VRR is on in amdgpu_dm_commit_planes()
commit 69a46ce1f15b4391c128d581f6936750f9bfa052 upstream.
[Why] Replay and PSR will cause some video corruption while VRR is enabled.
antheas
antheas • 5d ago
Btw that caused crashes due to a valve patch I removed
geeksville
geeksville OP • 4d ago
fwiw - I just tried 6.14.4 and alas the 0x410 is still needed for dcdebugmask, otherwise there are many draw artifacts on the Flow 2025. alas - someday they will fix it 😉
antheas
antheas • 4d ago
do you use yours docked?
geeksville
geeksville OP • 3d ago
hi hi - I'm not quite sure what you mean by docked? I do use a USB4 hub with some misc hw widgets (ttgo lora boards) and a DisplayPort-over-USB external monitor. Does that help?
antheas
antheas • 3d ago
Yes. Does yours crash?
geeksville
geeksville OP • 3d ago
Only the rare USB host controller going away bug I alluded to previously. 1 in 20 wakes? Though I haven't tried 6.14.4 yet for that. But yes on 6.14.3.
antheas
antheas • 3d ago
Yes. Well, mine dies.
geeksville
geeksville OP • 3d ago
interesting! I only have that problem where xhci blows chunks and stops talking (and I have to reboot because then all USB devices are inaccessible) - but it is rare for me
antheas
antheas • 3d ago
The CPU locks up on mine. Nothing works.
geeksville
geeksville OP • 3d ago
oh! I'm currently using aurora-dx (and longer term I'll probably stay with that) - kernel 6.14.3. If you want I can try a quick reboot into bazzite-testing to see if I see the same thing. Does your CPU lock up on wake from sleep?
antheas
antheas • 3d ago
Yes. After 20-30 times. Stock kernel too. 6.12 as well.
