Nvidia DRM errors, can't boot into new deployment since around 4/21.

Hi, I've been unable to boot into a new deployment since late April. I've been using a pinned deployment since then, and none of the newer deployments has resolved the problem. I see nvidia drm issues and a gdm core dump in the logs. Here is the rpm-ostree status output showing the currently booted, working deployment and the staged deployment that does not boot. I have unlayered all packages as a troubleshooting step. Thank you for the help.
Deployments:
ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-gnome-nvidia-open:stable
Digest: sha256:df9cd3a68dda2c88e53534c5ce07485dce458b4b7fac141eeb14b52fe02b19b3
Version: 42.20250513 (2025-05-13T16:18:35Z)
Diff: 303 upgraded, 27 removed, 12 added

● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-gnome-nvidia-open:stable
Digest: sha256:e0d3c81542c86a3ad306cdf6ab32f9294121aa89b95b05ee3f37b20dc202757a
Version: 42.20250421 (2025-04-21T05:32:18Z)
LayeredPackages: earlyoom ulauncher uxplay virt-manager
Pinned: yes
17 Replies
Wheeler
WheelerOP5mo ago
I'm no expert whatsoever, but the notable entries in the logs that stood out to me and aren't present in the logs for the working pinned deployment are:
May 13 15:58:37 kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
May 13 15:58:39 kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NvKmsKapiDevice
May 13 15:58:39 kernel: [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to register device
May 13 15:58:39 kernel: nvidia 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e address=0xfed01000 flags=0x0000]
May 13 22:58:46 /usr/bin/nvidia-powerd[1904]: Failed to attach all GPUs.
May 13 22:58:46 /usr/bin/nvidia-powerd[1904]: Failed to initialize RM Client
May 13 22:59:03 unix_chkpwd[3200]: could not obtain user info (gdm)
May 13 22:59:03 unix_chkpwd[3201]: could not obtain user info (gdm)
May 13 22:59:03 gdm[2960]: Gdm: GdmSession: no session desktop files installed, aborting...
May 13 22:59:03 systemd-coredump[3204]: Process 2960 (gdm) of user 0 dumped core.
If it helps, this is a desktop machine with an Nvidia 5080 and no integrated GPU, just the discrete 5080. I never really had any issues. I replaced a 3090 with the 5080 some months back; the non-"open" image didn't work, so I was advised to move to -open:stable, and everything was fine until late April, when new deployments stopped booting. I get the 3 red dots, the display manager fails to load, and the rest is in the logs. I'm not sure if it's helpful, but I'm also including the output from rpm-ostree db diff --changelogs in case anything can be gleaned from the version differences.
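For anyone comparing a failing boot against a working one, the NVRM line quoted above can be pulled out of a saved journal automatically. This is only a minimal sketch: it assumes the kernel log has been saved as text (for example from `journalctl -b -1 -k`), and the function name `find_invalid_bars` and the regular expression are illustrative, not part of any NVIDIA or Bazzite tooling.

```python
import re

# Matches the NVRM line format seen in this thread, e.g.
# "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)".
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_invalid_bars(journal_text):
    """Return (bar_index, pci_address) for each BAR reported with size 0 or address 0."""
    hits = []
    for match in BAR_LINE.finditer(journal_text):
        size_mb = int(match["size"])
        addr = int(match["addr"], 16)
        if size_mb == 0 or addr == 0:
            hits.append((int(match["bar"]), match["pci"]))
    return hits


# Sample text copied from the log excerpt in this thread.
sample = """\
May 13 15:58:37 kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
"""

print(find_invalid_bars(sample))
```

Running the same check on the journal of the pinned deployment would show whether the invalid BAR only appears on the newer images.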
Wheeler
WheelerOP5mo ago
I think I made a mistake adding this many responses. It looks like this is an active chat and not just me. Is it possible that I should be using Aurora/Bluefin rather than Bazzite?
sebmaster
sebmaster5mo ago
I've had this recently. Check your BIOS if you can enable 4G Decoding support + BAR Resize (might be called something else in your BIOS)
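One way to check whether the firmware actually assigned address space to the card after changing these settings is to read the device's `resource` file in sysfs, which lists one region per line as three hex values (start, end, flags). The sketch below is a rough illustration, not an official tool: `parse_pci_resources` is a made-up helper name, and the sample values are invented for the example rather than taken from the machine in this thread.

```python
def parse_pci_resources(resource_text):
    """Return (index, start, size_in_bytes) for every assigned region.

    Each line of a sysfs PCI `resource` file holds three hex numbers:
    start address, end address, and flags. Unassigned regions are all zeros.
    """
    regions = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region not assigned by firmware or kernel
        regions.append((index, start, end - start + 1))
    return regions


# Invented sample values, for illustration only.
sample_resource = """\
0x00000000fb000000 0x00000000fbffffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000e0000000 0x00000000efffffff 0x000000000014220c
"""

for index, start, size in parse_pci_resources(sample_resource):
    print(f"region {index}: start={start:#x} size={size // (1024 * 1024)} MiB")
```

On the affected machine the text could be read from `/sys/bus/pci/devices/0000:0b:00.0/resource`, using the PCI address from the log earlier in the thread. A region that is missing from the result, or much smaller than expected, would be consistent with the `BAR1 is 0M @ 0x0` message.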
Wheeler
WheelerOP5mo ago
Oh, I appreciate the suggestion! I will give that a try!
sebmaster
sebmaster5mo ago
I created this issue back then. On second read the output does read differently, but maybe it's just crashing somewhere differently than it did for me.
Wheeler
WheelerOP5mo ago
Darn, it looks like that didn't quite do it. I had 4G Decoding Enabled, but BAR Resize was set to Disable. I changed it to Auto (Enable wasn't an option directly.)
Wheeler
WheelerOP5mo ago
But it still doesn't boot. I get the 3 dots. On one boot all 3 were red, on the next boot only two of them were red.
Wheeler
WheelerOP4mo ago
Still back on the 4/21 deployment. But I really appreciate the possible angle! The nvidia drm issue looks similar, but maybe in my case the PCI device is entirely unavailable for an unrelated reason, and nvidia drm hits the same kind of error either way. Bazzite is letting me know I haven't updated in over a month. ☹️ I don't know what to do anymore. I narrowed it down to changes made between bazzite-gnome-nvidia-open:stable-42.20250417 and bazzite-gnome-nvidia-open:stable-42.20250425. This is the list of packages that were updated. Is it reasonable to assume something in these changes is what's causing the issues now?
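With hundreds of upgraded packages between two images, it can help to filter the diff down to the components that show up in the failing logs (kernel, nvidia, gdm). The following is a minimal sketch under an assumption: it expects upgrade lines shaped like `name old-version -> new-version`, which is how `rpm-ostree db diff` output is assumed to look here. The helper name `upgrades_matching`, the sample package list, and its version strings are placeholders, not the real diff from this thread.

```python
def upgrades_matching(diff_text, keywords):
    """Return (name, old, new) for upgrade lines whose package name contains a keyword."""
    matches = []
    for line in diff_text.splitlines():
        parts = line.split()
        # Assumed line shape: name old_version "->" new_version
        if len(parts) == 4 and parts[2] == "->":
            name, old, _arrow, new = parts
            if any(keyword in name for keyword in keywords):
                matches.append((name, old, new))
    return matches


# Placeholder data: package names are real, version strings are invented.
sample_diff = """\
Upgraded:
  bash 1.0-1 -> 1.1-1
  gdm 1.0-1 -> 1.1-1
  kernel 1.0-1 -> 1.1-1
"""

for name, old, new in upgrades_matching(sample_diff, ["kernel", "nvidia", "gdm"]):
    print(f"{name}: {old} -> {new}")
```

A short list like this does not prove which package is at fault, but it gives a smaller set of changelogs to read first.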
Wheeler
WheelerOP4mo ago
It seems like gdm can't find a display? Success after a month and a half. Searching for "gdm", I found @CheckYourFax proposing a solution to a GitHub user who had a similar issue 3 days ago. (Thank you, CheckYourFax!)
Wheeler
WheelerOP4mo ago
GitHub
Black screen after 42.20250603.1 · Issue #2731 · ublue-os/bazzite
Describe the bug After the update to 42.2025063.1, I cannot boot into the system. I will have the Bazzite logo at the bottom. But after that it is black screen. What did you expect to happen? I hav...
Wheeler
WheelerOP4mo ago
Discord discussion about potential solution: https://discord.com/channels/1072614816579063828/1237926270151688192/1380507626311258133 This worked for me and I'm back in business. This is a huge relief cause nothing else seemed to make sense and I didn't feel capable of figuring the problem out.
