Too many open files crashing kwin_wayland
I have seen multiple crashes in the last 1-2 days, all mentioning too many open files.
These crashes kill kwin_wayland with a segfault, and the stack trace always mentions libspa-videoconvert.so, though that might be a red herring.
In particular, I notice that ulimit -Sn and ulimit -Hn show soft and hard open file limits of 1024 and 1048576, respectively.
I want to track my actual open file (descriptor?) count so I can identify which process(es) or driver(s) may be leaking descriptors and causing these crashes.
Currently, the only data I have is lsof | wc -l which shows, at the moment, 503755
For anyone who sees this, what do you currently see from lsof | wc -l?
I'm curious if that is typical, or way higher than normal.
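For reference, lsof | wc -l counts one line per open-file entry across every process, including memory-mapped files and cwd/txt entries, so it is not directly comparable to the ulimit -n values, which are per-process limits. A minimal sketch of a per-process count read from /proc follows. The helper names fd_count and soft_limit are made up for illustration, and it defaults to inspecting the current shell rather than kwin_wayland.

```shell
# Count the descriptors one process actually holds, using /proc (Linux only).
# fd_count and soft_limit are hypothetical helper names, not existing tools.

# Print the number of open descriptors for PID $1.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit that applies to PID $1.
soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

pid=$$   # the current shell; substitute e.g. "$(pidof -s kwin_wayland)"
echo "PID $pid: $(fd_count "$pid") of $(soft_limit "$pid") descriptors in use"
```

Comparing the two numbers for a single process shows how close that process is to its own "Too many open files" threshold.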
I have Steam, Firefox with several windows, several terminals, a Wine application, and Discord running
I'm showing 745,489 right now
wow, so I guess that's normal
That seems uncomfortably close to the 1048576 hard limit, given that it varies substantially depending on what's going on
(right now Vulkan shaders are compiling)
yeah I'm not even running anything intensive, just my standard work from home setup of like.. discord, spotify, teams, vs code, terminal, openoffice, bunch of browser tabs
@CaramelCorn before anyone else asks, nVidia or AMD? (I have nvidia)
AMD
I've filtered it by process. The big ones are
roughly 40k from steam
roughly 40k from plasmashell/krunner
over 240k from Firefox
35k from Discord
and about 80k from "Isolated Web Container"
not sure if that's a Firefox thing or a Bazzite thing
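A similar per-process breakdown can be sketched straight from /proc, which counts real descriptors only (no memory maps). This is an illustrative sketch, not the filtering used above: top_fd_procs is a hypothetical helper name, and without root it silently skips processes owned by other users.

```shell
# top_fd_procs [N]: print "count pid name" for the N processes holding the
# most open descriptors. Hypothetical helper; Linux /proc only.
top_fd_procs() {
    local n="${1:-10}" dir pid count name
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        # Unreadable or vanished processes yield a count of 0 and are skipped.
        count=$(ls "$dir/fd" 2>/dev/null | wc -l)
        [ "$count" -gt 0 ] || continue
        name=$(cat "$dir/comm" 2>/dev/null)
        printf '%s %s %s\n' "$count" "$pid" "$name"
    done | sort -rn | head -n "$n"
}

top_fd_procs 10
```

Each output line can be compared against the 1024 soft limit, since that limit applies per process rather than to the system-wide total.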
A couple hours later, seeing about 520k-550k
Ugh, it's getting more frequent
I had just checked the open file count and it was around 520k, then within a couple minutes I clicked a taskbar icon
System freeze, then kwin_wayland crash, too many open files coredump
After the KDE restart, now it's showing around 260k open file descriptors, random note
is it clicking the Taskbar icon every time that does it?
POSSIBLY, that's when I usually notice it
I go to click a (stacked?) icon to expand it, then the system freezes for a few seconds while still showing windows, then kwin_wayland dies and kills all applications
I see 2-3 coredump processes running after the fact, so it may also be that there are multiple coredumps.
Is it safe to kill those processes? It takes forever because my hard drive is slow, so I'd prefer to just skip them
It sucks that the Vulkan shader compilation completely got its progress reset due to the kwin_wayland crash. And it takes hours on my system.
It looks like clicking "Cancel" also resets Vulkan compilation. Only "Skip" might preserve existing progress?
Next time it crashes, I'll try to do a more thorough log scan to see if there are 2 or 3 segfaults
3 hours later, I'm back up at almost 500k open files. I do have Vulkan shaders compiling yet again. Just recording data points.
shot in the dark, but try turning off file indexing
I didn't know Bazzite did that, I'll search how to do so
it's in kde settings, Search -> File Search
or just type index into the search box
Thanks, I see
baloo_file stopped running, but baloorunner is still active
see if it persists after a restart
I'll let this sit for a while, currently at 400k open files while compiling Vulkan
ooooooof even clicking "Skip" resets progress, so it seems the only way to finish compiling Vulkan shaders is to let it run 100% to completion without a KDE crash in between
New day, Steam downloaded an update (to ProtonGE 10-17? to their own shaders? no idea) so I'm attempting to make it through Vulkan again without a crash.
333k open files as of this message, will monitor through the day
7 hours later, 450k open files, no crashes today
I've started using konsole instead of ptyxis because I heard konsole can survive a KDE compositor crash
This happened again today, too many open files
running a few terminals, Firefox, System Monitor
Steam with shader compilation
kwin_wayland crash, lost all apps including konsole terminals that I hoped would survive compositor handoff
Bizarrely, a segfault shows up in libFLAC when I wasn't listening to any music, not sure what's up with that
And here's konsole dying. I wonder if it being a flatpak instead of a normal installed application prevents the compositor handoff?
need the stuff further down
below which section? That's the top of the crash I think
I need a stack trace
it doesn't have debug symbols but it's fine
This is also weird, it's showing an nvidia driver error with invalid sync file descriptor, maybe because of too many open files?
And also video, maybe that's due to Discord? I didn't see any video playing anywhere
also make a new user account and see if you can reproduce the issue there?
Nov 01 15:09:25 kernel: [drm:nv_drm_semsurf_fence_wait_ioctl [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100]
I thought you were on AMD
no, nvidia
I'm aware it has issues, just trying to identify if it's causing this out of files error specifically
I meant to say I have nvidia
delete the last 2, useless
let me try to get the entire stack
the only other crash dump I see is Firefox, I assume crashing because everything else died
A bunch of stuff in the middle of that stack trace references plasma, cursor, decorations, Qt etc. so I'm wondering if it's related to me clicking the taskbar icon stack twice and triggering something
no idea
do you have a reliable way to reproduce the issue
"Too many open files" happens just BEFORE the crash and mentions wp_linux_drm_syncobj_manager_v1.import_timeline
Sadly no, I think every time this specific crash has happened, with "Too many open files", it's been right after I click on the taskbar to change between applications.
And that's likely always a stacked set of identical windows, e.g. a set of terminals.
it would be nice if you make a new user account and try to reproduce the issue
third party plasmoids?
I could do that but I'd lose all customization -- may I respectfully ask what might be linked to my account which could affect things?
No plasma/KDE customization that I'm aware of
I've removed widgets like the media player which had crash reports, disabled the desktop background, minimized as much as I could
ruling out configuration, makes reproduction 100 times easier
The default configuration crashed repeatedly for me, sadly
then it's just bazzite defaults
By default, the taskbar has a media player widget, I have crash logs elsewhere I think
I can go dig it up
rpm-ostree status
It seemed to be a known issue that is supposed to be fixed in a later Plasma version, with the workaround being to remove the systray media player widget
I'm base Bazzite 42, holding off upgrading until 44 or later
update your system
Due to known issues and me needing the system for work, that may not be feasible until Bazzite 44, but if I do I'll see if I notice the same crash
respectfully, I don't think you know what that means
Can you clarify which part
I have decades of Linux experience, just new to Bazzite, so apologies if I use the wrong terms
Which part you feel I misunderstand, that is.
the fedora version bazzite is based on shouldn't matter, F44 comes out in 6 months
there was a media player widget crash caused by chromium
I meant Bazzite, not Fedora
that was resolved in kde and you get the fix by upgrading
Point is, I will likely wait a week or two to upgrade for the next Bazzite push, not sure yet
Links for my own / others' reference:
https://github.com/NixOS/nixpkgs/issues/442630
https://bugreports.qt.io/browse/QTBUG-140018
https://bugs.kde.org/show_bug.cgi?id=509192
I can confirm Bazzite 42 has Plasma 6.4.5, Qt 6.9.2.
The bugs.kde.org thread says fixed in Plasma 6.4.6 and later, Qt 6.10.1 and later.
I'm not sure which versions Bazzite 43 uses
I'll have to dig up the change log
have you actually upgraded your system and looked if it worked
rpm -qi plasma-desktop qt6-qtbase
should be 6.4.5 and 6.9.3
or are you purely speculating on this
you can rollback if it's broken
I have not tried Bazzite 43 on my system yet, but due to schedule and work needs I wanted to avoid experimenting for a few days unless crashes are too frequent. I'm aware rollbacks exist, yes.
6.4.5 and 6.9.2 in Bazzite 42, also shown in the info screen
https://discord.com/channels/1072614816579063828/1087140957096517672/1434266666325512193
looks like 6.4.5 and 6.9.2 are also the versions in Bazzite 43, per that user's screenshot
So I'll watch for Plasma 6.4.6 and Qt 6.10.1 becoming available in a Bazzite release, then give those a shot.
yes, I'm not sure if fedora backported the patch
it will report the same version
touche
I'm not aware either, but thanks for your patience. I have to step away.
the next bazzite release will have 6.5 and Qt 6.10
it hit fedora repos today
awesome
NVIDIA Developer Forums
Fd leak with explicit sync and kde plasma
I can trivially reproduce this as well, by sending notifications with notify-send. I am on 570.86.16 and KDE Plasma 6.3. The Upstream KDE issue here: 497424 - fd leak with explicit sync (nvidia), which was linked in the OP, claims this is an Nvidia driver bug. Please let me know if any other information would be useful
ah heck, this sounds eerily familiar
Yeah, I just tried clicking on and off a stacked Firefox taskbar icon a bunch of times, and promptly saw the output of lsof -p $(pidof plasmashell) | grep sync_file | wc -l jump by 100
this seems bad
Looks like nVidia opened an internal bug and suggested that egl-wayland2 fixed things, but someone reported it didn't (for them).
So if anyone wants to test this, you can run lsof -p $(pidof plasmashell) | grep sync_file | wc -l after mousing over a taskbar icon
It seems to add 3 new handles for me, every time I move my mouse over a taskbar icon.
Update: It doesn't seem exact, I see 2-5 new handles each time I mouse over a taskbar icon.
Important thing, you have to wait until the tooltip appears.
I don't have graphical previews enabled, but it still seems to leak descriptors if the tooltip appears.
Alt-tab window previews don't affect the file descriptor count, only the taskbar tooltips
I'm attempting ujust update today due to the new Plasma version. I'll test the taskbar behavior if it succeeds.
nVidia driver version is still 580.95.05
file descriptor leak still exists, I immediately saw 3-5 new descriptors created every time I hover over a taskbar icon
I would still love if anyone with an nVidia card (or even AMD) wants to test the above. The process is still:
1. Run the command.
2. Hover your mouse over a few taskbar icons, waiting each time for a tooltip to appear.
3. Run the command again. Did the file descriptor count increase?
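The before/after steps above can be sketched as a small script. This is an illustration rather than the exact command from the thread: it reads /proc instead of calling lsof, and it assumes the leaked descriptors show up as links whose target contains sync_file (as the grep above implies). count_sync_fds and sync_leak_delta are hypothetical helper names.

```shell
# count_sync_fds PID: count descriptors of PID whose link target mentions
# sync_file. Assumption: the kernel exposes them as "anon_inode:sync_file".
count_sync_fds() {
    local fd n=0
    for fd in "/proc/$1/fd"/*; do
        case "$(readlink "$fd" 2>/dev/null)" in
            *sync_file*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# sync_leak_delta PID [SECONDS]: sample, wait, sample again, print the change.
sync_leak_delta() {
    local pid="$1" wait_s="${2:-15}" before after
    before=$(count_sync_fds "$pid")
    sleep "$wait_s"   # hover over taskbar icons until tooltips appear
    after=$(count_sync_fds "$pid")
    echo "before=$before after=$after delta=$((after - before))"
}

# Usage on a Plasma session (left commented so nothing runs by accident):
# sync_leak_delta "$(pidof -s plasmashell)" 15
```

A positive delta after hovering over a few icons would match the leak described here; a delta of 0 would match the reports from unaffected systems.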
Update: I confirmed that the leaking file descriptors do indeed correspond to sync_file (see https://www.kernel.org/doc/html/v6.18-rc3/driver-api/sync_file.html). This is a kernel interface so it would indeed be possible for the nVidia driver to be leaking these.
What's curious to me is that they only seem to leak in the very specific situation of a taskbar tooltip popup (or maybe more generally, a plasmoid opening, but I haven't messed around with that much). I wonder what API calls the taskbar popups use that other logic (like alt-tab) doesn't.
Update: Also from the nVidia dev forum thread, I noticed that the leak is more general and happens ANY time a popup is produced from KDE, e.g. by using the command notify-send test
So anyone who often switches applications using the taskbar in KDE with nVidia, or often sees notifications, is presumably going to crash eventually.
I'm surprised this isn't a more common problem, unless people just don't click stacked taskbar icons or get notifications much.
mine just stays at 0

nvidia or AMD?
(thanks so much for testing, by the way)
Nvidia 1660TI on Nvidia-open, 43.20251102
Some nvidia rep posted to ask if egl-wayland2 fixes it, but a couple replies said it didn't, so I didn't bother trying.
you're KDE, I assume?
yup

I'm trying to find out how to get the 43.20251102 value you did
i suppose by rebooting or updating manually
I notice one (key?) difference, you have an integrated graphics card. I wonder if KDE is somehow using that for plasmoid stuff.
Since we have the same Bazzite version I assume we have the same nvidia driver version, I didn't layer anything
wait didn't you just say that your image is slightly out of date?
rpm-ostree status works
I upgraded to the latest today because of the new Plasma version.
so yeah I'm now 43.20251102
yea me too lol
Let me pin an update in case anyone else helpful comes along
seems like it's only running steam and distroshelf on the Nvidia card :ThonkFedora:

This is the original test I'm chasing, that results in a KDE crash over time, just from using the GUI.
https://discord.com/channels/1072614816579063828/1432840141101338654/1434931367069749312
I only have 1 GPU (GTX 1650), and I see the handles constantly increase after a notification.
Another user has an integrated Intel and dedicated nVidia, and they don't see any handles at all.
oh I can't pin lol
nice
I don't know how to check if you're using explicit sync, but I assume Bazzite always does by default
Working theory seems to be, this happens due to an nVidia driver bug involving explicit sync. I just had futile hopes I could get some workaround from the KDE side.
Silly workaround for now: Once I notice the open file descriptor count getting dangerously high, like over 600, I can just run plasmashell --replace & or whatever to start a new instance
Almost all the kernel sync_file descriptors get dropped at that point, now I'm back to 17 open descriptors
This should at least extend my uptime.
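The manual workaround could in principle be automated. The following is only a sketch under assumptions: the 600 threshold is the figure mentioned above, the 60-second polling interval is arbitrary, and needs_restart and watch_plasmashell are hypothetical helper names. It has not been tested against the actual leak.

```shell
THRESHOLD=600   # the "dangerously high" figure from the workaround above

# needs_restart COUNT [THRESHOLD]: succeed when COUNT exceeds the threshold.
needs_restart() {
    [ "$1" -gt "${2:-$THRESHOLD}" ]
}

# Poll plasmashell once a minute and replace it when it holds too many
# sync_file descriptors. Interval and threshold are assumptions.
watch_plasmashell() {
    local pid count
    while sleep 60; do
        pid=$(pidof -s plasmashell) || continue
        count=$(ls -l "/proc/$pid/fd" 2>/dev/null | grep -c sync_file)
        if needs_restart "$count"; then
            echo "plasmashell holds $count sync_file descriptors, restarting"
            plasmashell --replace >/dev/null 2>&1 &
        fi
    done
}

# watch_plasmashell   # uncomment to start the loop in a Plasma session
```

This only restarts plasmashell, as in the workaround; it would not release descriptors held by kwin_wayland itself.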
uptime is 4 days so far, with this workaround
I'm still hoping the nVidia rep eventually answers on the forum
I actually got a brief reply from nVidia. Apparently engineering is still working on this bug.