2 days in a row, akmods NVIDIA broken, out of space

2 days in a row, akmods has been having issues running out of space

16 Replies

so... there's not an easy (or any) way to free space on the runner, because we run in a container, thus we can't run anything directly on the host

I am starting to look at what actually could have changed to trigger this problem

the issue, running out of space, occurs:
1) only on nvidia akmods builds
2) seemingly for all our Fedora kernels (bazzite, main, longterm, coreos) tho CentOS kernels seem fine
3) during the "Test Image" step

@M2 may be able to confirm but I believe the "Test Image" (which runs just test in devcontainer) does the bulk of the work in the build...
it runs build-prep and test-prep shell scripts
it actually builds the akmods kmod binaries
makes sure they are signed, and then tests the install of those packages and that the signatures are good

really, just test will duplicate a lot of the same effort as just build, but it uses layer caching, so if just test runs first, then just build can run very quickly and the only extra behavior there is to tag the localhost/akmods-image-namehere

M2•4w ago

Yepp exactly

bshermanOP•4w ago

so, where we are running out of space is AFTER the test-prep.sh script runs, and a layer is being preserved/copied back to the container store

but i'm not sure why this started 2 days ago
there've been no code changes
and this is happening both on F41 and F2

so the common elements seem to be:
1) the github runners themselves
2) changes to nvidia packages on negativo17

M2•4w ago

Or could be something with failing to cache hit somehow.

bshermanOP•4w ago

sure, but it's very specific to nvidia kmods

success and fail did not change based on nvidia driver version... the last good build of CoreOS-stable kernel's nvidia was the same version as the broken ones: nvidia-driver 580.82.09-1.fc42

i see no changes in negativo17's nvidia or multimedia repos which could be related along this timeline

honestly, i think the github runners just have less space than before
though, i'm shocked this image takes 22GB

Problem occurs AFTER this RUN completes and the layer is being copied: https://github.com/ublue-os/akmods/blob/main/Containerfile.in#L208
BEFORE COPYing the check-signatures.sh
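To confirm whether the runner really has less room than before, free space could be logged right before the step that fails. A minimal sketch in Python, assuming the build needs roughly the 22GB mentioned above (treated here as GiB); the function names and the threshold are illustrative and not part of the akmods repo:

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path: str, needed_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `needed_gib` free."""
    return free_gib(path) >= needed_gib


def require_space(path: str, needed_gib: float) -> str:
    """Return a log line with the free space, or raise if there is too little.

    `needed_gib` is an assumed budget supplied by the caller, not a
    measured requirement of the build.
    """
    free = free_gib(path)
    if free < needed_gib:
        raise RuntimeError(
            f"only {free:.1f} GiB free on {path}, need about {needed_gib:.1f} GiB"
        )
    return f"{free:.1f} GiB free on {path}, need about {needed_gib:.1f} GiB"
```

Calling require_space("/", 22.0) at the start of the "Test Image" step would fail fast with a readable message instead of dying while the layer is copied. The path to check is an assumption as well; the container store may live on a different filesystem than "/".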

antheas•4w ago

instead of using a devcontainer, you could have used the test image instead
instead of having two containers
the pr broke and is not running the checks anymore
so, to use fancy just features instead of a simple bash script, you introduce a 3-4gb container image and block the maximize space script
and it's rust too, so we can't pull it from github because it will take an eternity to compile

ledif•4w ago

Are you talking about this PR https://github.com/ublue-os/akmods/pull/407 ? It seems to contain no changes, but it also looks like you're removing stuff to test

GitHub

fix: add maximize build space for nvidia by antheas · Pull Request...

antheas•4w ago

i switched to a branch and i'm doing manual runs
i think i figured out how to buy us some time at least

antheas•4w ago

https://github.com/ublue-os/akmods/pull/408

GitHub

fix: create some empty space by removing weak depts by antheas · P...

antheas•4w ago

this fixes it
i ran the bazzite nvidia kmod and it worked

ledif•4w ago

Ah, I wonder what weakdeps it was pulling in to push it past the limit

antheas•4w ago

i don't know what happened but it was borderline
some firmware and stuff got removed
and composefs, pipewire and mesa too
do some acks and let's get this over with
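For context, weak dependencies are packages that dnf installs by default because another package recommends them, not because it requires them. One common way to skip them is the install_weak_deps=False setting. A minimal sketch that only builds the command line; the helper name and the package name are hypothetical, and this is not necessarily how PR #408 does it:

```python
def dnf_install_cmd(packages, weak_deps=False):
    """Build a dnf install command; weak deps are skipped unless requested."""
    cmd = ["dnf", "install", "-y"]
    if not weak_deps:
        # install_weak_deps is a dnf configuration option; setting it to
        # False skips packages that are only recommended, not required.
        cmd.append("--setopt=install_weak_deps=False")
    cmd.extend(packages)
    return cmd


# "example-package" is a placeholder, not a package named in the thread.
print(" ".join(dnf_install_cmd(["example-package"])))
# prints: dnf install -y --setopt=install_weak_deps=False example-package
```

Skipping weak deps trades convenience for size: anything that was only pulled in as a recommendation (firmware, multimedia libraries and the like) has to be installed explicitly if the build turns out to need it.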

ledif•4w ago

LGTM. Just needs another ack

antheas•4w ago

@Kyle Gospo ack it or I'll yolo it

bshermanOP•3w ago

Thank you for diving in. I had overlooked the weak deps. That’s an important change in this context regardless of any other long-term questions. A lot of real life hit yesterday and I wasn’t able to get back to this, so thanks again.

antheas•3w ago
