fTPM - stable but abysmal performance
I've been struggling with an unstable system since building it (specs below).
I get kernel panics at least once a day.
I noticed CPU/RAM errors only when changing my screen settings (refresh rate, resolution) during a stress test. Benchmark values were always at the low end of average.
So far I was only able to reproduce this reliably on Bazzite. No issues like that on Windows.
Yesterday I enabled AMD fTPM again (originally, I was recommended to switch it off for Linux).
The system has been stable during tests since and there were no kernel panics so far.
However, multi-core performance dropped to abysmal levels: 5-30% of expected averages.
On Windows it looks relatively normal.
specs:

74 Replies
ujust get-logs
however occt should detect hardware errors during a stress test
if it detects it, then the problem is probably hardware related
Yea hardware, bios or driver I think.
Since this problem is only on bazzite, not on windows, it's probably a driver issue
so performance is normal on windows's occt?
with no errors?
Yes performance benchmarks are at the average if I recall correctly. I'll test it again just to confirm
no errors with fTPM enabled at least. I can switch fTPM off and test for errors again on windows.
I read somewhere this might be a sign of a RNG problem with the BIOS and might need a patch from the manufacturer
no, not really
whether you have ftpm enabled or not does not affect error detection
it's strange, I have 0 errors with fTPM, but 100% reproducible without fTPM. I have no explanation for this so far
reproducible as in?
logs will be very useful if you provide them
basically, I can produce errors if I disable fTPM, run an OCCT stress test and change my display settings. It will show core errors or memory errors, depending on the test.
https://paste.centos.org/view/c873d0fb
last boot.
I always suspected it's related to low power states somehow. It never crashed during gameplay, only occasionally when a game loaded and the fullscreen resolution changed.
it most reliably kernel panics when I just boot bazzite and wait for 1-2 hours. But I can game 5 hours straight, never crashes.
nevermind, I just selected a resolution my monitor doesn't support while running OCCT and I got CPU errors. so fTPM only makes it less reproducible, but it still happens (on bazzite). will test again on windows in a moment.
you can try and use wine virtual desktop or gamescope(via scopebuddy) as a workaround
logs from just now when the errors happened (12:37) https://paste.centos.org/view/04eb3978
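For anyone who wants to skim a log dump like this without reading every line, here is a minimal sketch that filters a saved kernel log for lines that usually point at hardware trouble. The pattern list is an assumption (a few strings the kernel commonly prints for machine check errors and panics), not an exhaustive filter, and `find_suspect_lines` is a made-up helper name.

```python
import re
import sys

# Assumed patterns: strings the kernel commonly prints for machine check
# events and panics. Extend the list as needed; it is not exhaustive.
SUSPECT = re.compile(r"mce:|machine check|hardware error|kernel panic", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first argument is the path to a saved log file.
    with open(sys.argv[1], errors="replace") as log_file:
        for number, line in find_suspect_lines(log_file):
            print(f"{number}: {line}")
```

Possible usage, assuming the script is saved as `scan_journal.py` (hypothetical name): dump the previous boot's kernel messages with `journalctl -k -b -1 > last-boot.log`, then run `python3 scan_journal.py last-boot.log`. An empty result does not prove the hardware is fine, it only means none of the assumed patterns appeared.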
ok if you have pbo turned on, turn it off first
and xmp
ok will do 1sec
Also it very often hangs during shutdown, making me force power off
Hm can't find a BIOS setting called pbo or xmp
amd overclocking -> precision boost overdrive -> disabled
and something like D.O.C.P. -> disable
also if you don't use wifi, try and find an option to disable the wifi card
I guess gaming mode off and zen5 opt off?

Found precision boost and disabled it
there should be an option in advanced -> amd overclocking
but if you found it it should be fine too
Yep found the precision boost there. Still looking for docp

for xmp/docp it is in overclocking -> dram profile configuration
Disable avx512 as well? I think it's an Intel thing right?
nah that is fine
Ok
This also ok?

yep
you can test it in windows and/or bazzite to see if the issue is hardware related
wait a minute
you said your cpu is a 9950x3d?
Yea
disable gaming mode here
Done.
From auto to first value I guess?

test with ddr5 5600 first
Ok will boot back to bazzite now
huh. I wonder why it says 8 cores. Afaik this processor should have 16 cores and 32 threads
damn. CPU errors again when changing screen resolution.
and the errors will continue after that. as if it permanently breaks something after changing screen settings
but stopping+starting the test again, no errors.
I don't get it. or is it maybe an issue with OCCT itself, not actual hardware errors?
journalctl from then:
something with your motherboard's "gaming mode" stuff
try and disable "zen5 gaming optimization" as well
ok. left it running after restarting, 20 minutes no errors, but also screen settings untouched. I have no idea how that is even connected, makes no sense to me 🥲
probably unrelated to screen settings tbh
once everything is stable you can re-enable pbo and xmp
set pbo to enabled and xmp to xmp2-8000
you can try xmp1-8200, but i heard it may or may not be doable
I think my ram might draw too much voltage at 8000
nah thats fine
your ram is rated for it

otherwise it wouldn't be advertised as ddr5-8200
yeah thats fine
Yea I heard there was an issue with that model but forgot what it was. Ok
Originally left this on because it looked specifically made for my CPU series. Disabling now

it is probably the motherboard trying to do some weirdness with ccds.
since you got a 9950x3d, you should be aware that it has 2 ccds, and cross-ccd latency is relatively high. On top of that, only one ccd has the extra X3D cache, so ideally you want games to only run on that ccd with the extra cache
nowadays the linux kernel cpu scheduler is aware of that and will keep games on that ccd
cpu schedulers (both windows and linux) used to not do that, and a lot of motherboard manufacturers hacked up their own methods to keep games on the x3d ccd
including turning off the other ccd outright, effectively turning your 9950x3d into a 9800x3d
Ok, so the issue might be related to processes swapping between the ccds?
probably
nowadays just let the linux cpu scheduler deal with it
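If you want to see how Linux currently views the CCD layout (and whether one CCD has been switched off by a board setting), a rough sketch like the one below groups the online logical CPUs by the L3 cache they share. It assumes the standard sysfs cache entries under `/sys/devices/system/cpu` and that each CCD has its own L3; `group_by_l3` is just an illustrative name.

```python
from collections import defaultdict
from pathlib import Path


def group_by_l3(sysfs_root="/sys/devices/system/cpu"):
    """Group online logical CPUs by the L3 cache they share.

    Returns a dict mapping the kernel's shared_cpu_list string
    (for example "0-7,16-23") to a sorted list of CPU numbers.
    """
    groups = defaultdict(list)
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        cpu_number = int(cpu_dir.name[3:])
        # Offline CPUs have no cache directory, so they are skipped here.
        for index_dir in (cpu_dir / "cache").glob("index[0-9]*"):
            level_file = index_dir / "level"
            shared_file = index_dir / "shared_cpu_list"
            if not (level_file.is_file() and shared_file.is_file()):
                continue
            if level_file.read_text().strip() == "3":
                groups[shared_file.read_text().strip()].append(cpu_number)
    return {shared: sorted(cpus) for shared, cpus in groups.items()}


if __name__ == "__main__":
    for shared, cpus in sorted(group_by_l3().items(), key=lambda item: item[1]):
        print(f"L3 shared by CPUs {shared}: {len(cpus)} logical CPUs online")
```

On a dual-ccd chip with both ccds active you would expect two groups. A single group of 16 logical CPUs would suggest that only one ccd is visible to the OS, which would also fit a tool reporting 8 cores.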
Ok let's see what happens 😂

Ok doesn't even pre-boot anymore, CMOS reset time 😭
technically NVRAM but same thing basically
actually wait for a few minutes before resetting
it may be memory training
Crap. Ok will do the settings and try once more just in case
you can try the other lower option too
difference is CMOS always had a battery
NVRAM doesn't need one, it's non-volatile RAM
thus CMOS was always resettable at least by removing the battery if nothing else
with NVRAM there sometimes isn't a way to hardware reset it
on business laptops & stuff
Pbo enable or auto? auto is default
enable should be fine
Ok. Will try it with this first

Ok bazzite booting
so far so good 😄
ah forgot to disable wifi card, will do that next time
oh, this time even without doing something to my screen 😄

🫠
I know right 😄
try ram only?
so far so good

weird
how about just cpu?
ok. before i stop I'll do screen stuff once more
after changing screen resolution a few times

I dont get it 😄
makes 0 sense to me
makes 0 sense to me too
i guess disable xmp and try again?
set it to 5600 in the profile
I think I tested it at this stage before, but can try.
changed resolution during gpu test, nothing. this would have made more sense to me.
only thing I can assume is power-profile related. or c-state settings
will let CPU stress run for a few minutes first
stable so far
changed screen resolution -instantly

yea I'd be with GPT here, software bugs with OCCT on resolution change lol. on the other hand... this wouldn't explain the kernel panics when my system does little/nothing.

tried changing power profile to performance, still the same.
nah llm usually spits out bullshit
you can retry with pbo off
if it doesn't error out with pbo off then it's probably a hardware problem
yea true. sometimes it scraped over good stuff though
saw some parallels to this https://www.elevenforum.com/t/random-freezes-when-system-is-idle.29998/
Windows 11 Forum
Random freezes when system is idle
Hi, I'm using Windows 11 23H2 (OS Build 22631.4391) but the problem happened before the last update too. The PC components were bought and assembled about a year ago.
As it's infrequent, I don't know when exactly but at least 6 months ago. It occurs when the system is idle or doing common tasks...