We just found an installation corruption error. Background: we base our images on ublue-os images and deploy hardened_malloc. One variant (cosmic) has a persistent corruption issue: ostree is unable to do any image layering operations, and the ostree filesystem check (ostree fsck) also reports errors. We currently believe that the bug is in ostree or rechunks. Our hardened_malloc seems to catch it, so it is possible that more images are affected without showing symptoms. I have not been able to pinpoint the cause. The error message is: "ostree: fatal allocator error: detected write after free"
Describe the bug On the cosmic-main-hardened and cosmic-main-userns-hardened images (and probably nvidia images too), rpm-ostree can't write new deployments. Attempting to install or uninstall ...
The command "sudo ostree fsck" finds issues in layers, and "sudo ostree --delete" is unable to repair them. This may have nothing to do with the main issue and could just be a coincidence.
Please explain. I am unsure what exactly I should do. (FYI: I need to reinstall before I can debug this further. I tried to repair the system and may have caused more issues in ostree, so I need a clean install to make sure I am not chasing problems that I caused myself.)
Yes, that is what I also believe. My working theory is that the combination of ostree and rechunks has a bug, and our malloc detects it. Nobody else sees it because everyone else uses the glibc malloc instead of hardened_malloc. We are probably affected because our malloc does not tolerate memory corruption, so it makes rpm-ostree crash. But that would also mean that ostree or rechunks has a bug that nobody else notices.
I do not believe that hardened_malloc is the issue, because our non-rechunked images work fine and hardened_malloc just enforces certain malloc-related rules.
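To illustrate the theory, here is a toy Python model of why a checking allocator reports a bug that a permissive one silently tolerates. This is only a sketch under assumptions: ToyHeap, check_on_alloc and buggy_program are made-up names, and the logic (zero a slot on free, verify it is still zero when it is handed out again) is a simplified stand-in for this kind of check, not hardened_malloc's or glibc's actual code.

```python
class ToyHeap:
    """Toy fixed-size slot allocator (hypothetical, for illustration only)."""

    def __init__(self, slots=4, slot_size=8, check_on_alloc=True):
        self.slot_size = slot_size
        self.memory = bytearray(slots * slot_size)  # starts zero-filled
        self.free_slots = list(range(slots))
        self.check_on_alloc = check_on_alloc

    def _span(self, slot):
        start = slot * self.slot_size
        return start, start + self.slot_size

    def malloc(self):
        slot = self.free_slots.pop()  # reuse the most recently freed slot
        start, end = self._span(slot)
        # Checking mode: a free slot must still be all zeros.
        if self.check_on_alloc and any(self.memory[start:end]):
            raise MemoryError("fatal allocator error: detected write after free")
        return slot

    def free(self, slot):
        start, end = self._span(slot)
        if self.check_on_alloc:
            # Zero the slot so any later write to it becomes visible.
            self.memory[start:end] = bytes(self.slot_size)
        self.free_slots.append(slot)

    def write(self, slot, data):
        # No ownership check here: this models a raw pointer write.
        start, _ = self._span(slot)
        self.memory[start:start + len(data)] = data


def buggy_program(heap):
    """Same buggy caller for both allocators: it writes to a freed slot."""
    slot = heap.malloc()
    heap.write(slot, b"data")
    heap.free(slot)
    heap.write(slot, b"oops")  # the bug: write after free
    try:
        heap.malloc()  # the checking allocator notices here
    except MemoryError as err:
        return str(err)
    return "no error reported"


print(buggy_program(ToyHeap(check_on_alloc=True)))
print(buggy_program(ToyHeap(check_on_alloc=False)))
```

The caller is equally buggy in both runs; only the checking allocator turns the bug into a visible crash, which matches the idea that the bug is in the caller and not in the allocator.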
You may need to wait until a new image is built to see the issue for yourself. If you rebase now, there is no newer version to upgrade to yet. The issue only occurs in rpm-ostree operations with hardened_malloc, so you may not see anything until all the conditions are met. Conditions: rechunks + hardened_malloc + a system upgrade is available