Immich•3mo ago
Somedumbwanker

Progress on Arc Battlemage?

I'm using Immich in Docker on Arch with an Intel Arc B580. With the OpenVINO variant of the immich_machine_learning container, attempting to execute any ML function yields:
[08/19/25 09:51:17] INFO Loading visual model
'ViT-H-14-378-quickgelu__dfn5b' to memory
Abort was called at 1350 line in file:
../../neo/shared/source/os_interface/linux/drm_neo.cpp
[08/19/25 09:51:19] ERROR Worker (pid:9) was sent code 134!
I have tried different models with no improvement. I asked about this on the Immich subreddit and was told that an upstream update was required for this to work, but no further information was provided. That was approx 4 months ago. What upstream is broken? What is required for the upstream to fix this? Has there been any movement on this at all? Compose data to follow in comments.
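For readers decoding the log above: a code of 134 is consistent with the process being killed by SIGABRT (signal 6), which matches the "Abort was called" line printed by the driver stack. The minimal sketch below assumes the common shell convention of reporting 128 + signal number; the helper name `describe_exit_code` is hypothetical and not part of Immich or gunicorn.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code (128 + N) to a signal name, if there is one.

    Assumption: the code follows the 128 + signal-number convention.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit code {code}"


if __name__ == "__main__":
    # 134 - 128 = 6, which is SIGABRT on Linux
    print(describe_exit_code(134))
```

This only names the signal; it says nothing about why the driver aborted.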
Immich
Immich•3mo ago
:wave: Hey @Somedumbwanker,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.
Somedumbwanker
SomedumbwankerOP•3mo ago
Compose data and hwaccel.ml.yml files
Immich
Immich•3mo ago
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Savely Krasovsky
Savely Krasovsky•2w ago
@Somedumbwanker I started working on it a few days ago: https://github.com/immich-app/immich/pull/23458 Would be nice if you could test it!
Immich
Immich•2w ago
[Pull Request] feat(ml): update ONNX Runtime, OpenVINO and ROCm stack (immich-app/immich#23458)
Somedumbwanker
SomedumbwankerOP•2w ago
I would be more than happy to give it a shot. I'm currently running immich-ml as a docker container from ghcr.io/immich-app/immich-machine-learning:release-openvino. Is it fair to assume that running this from your repo would require building a new container? I'm not familiar with building containers from git etc., would you mind giving me a quick rundown? My knowledge of docker is largely in spinning up and updating containers that others have built using docker compose. Do I just check out the repo, docker build in the machine-learning subdir, and docker run using the same data as my typical docker-compose?
Savely Krasovsky
Savely Krasovsky•2w ago
No, it already builds an image just like the main repo:
docker pull ghcr.io/savely-krasovsky/immich-machine-learning:main-openvino
Somedumbwanker
SomedumbwankerOP•2w ago
Oh, awesome. Thanks, pulled the new image. I can confirm I'm seeing high GPU and GPU mem utilization in nvtop, one of the processes being python -m gunicorn immich_ml.main:app .... Though, oddly, it seems like something in the container is falling back to CPU, unless I'm reading something wrong.
2025-11-05 21:58:41.217754298 [W:onnxruntime:Default, openvino_provider_factory.cc:260 operator()] Unsupported device key: GPU.0. Skipping entry.
Is this expected? Full log attached
Savely Krasovsky
Savely Krasovsky•2w ago
Yes, it's fine, I will fix it soon. It's not critical.
Somedumbwanker
SomedumbwankerOP•2w ago
Great. It's massive progress regardless, and I'm glad to see it moving.
Savely Krasovsky
Savely Krasovsky•2w ago
In our tests it seems that the xe driver has a memory leak right now. Would be nice if you could confirm it (or not). It should be noticeable after around 5k assets have been processed (using the OCR model).
Somedumbwanker
SomedumbwankerOP•2w ago
Ok, I'm at ~3k of OCR at the moment. Am I just looking for a spike in GPU mem consumption, or ... ?
Savely Krasovsky
Savely Krasovsky•2w ago
We tested with Arc Pro B50 and Intel Core 155H. Yes, VRAM usage will gradually go up and then so-called GPU hangs could start. You would see that something strange is going on in the output of sudo dmesg.
Somedumbwanker
SomedumbwankerOP•2w ago
Should have an answer in the next 10m.
Savely Krasovsky
Savely Krasovsky•2w ago
You can check VRAM usage using the new gputop utility. It should work for xe. gputop should be part of the igt-gpu-tools system package.
Somedumbwanker
SomedumbwankerOP•2w ago
Been using nvtop which is reporting mem fine for me, I'll pull down gputop regardless. Ok, I can see preliminary indication of a mem leak, I've gone from ~42% to 51% gradually with no sign of dropping. Nothing in dmesg.
Savely Krasovsky
Savely Krasovsky•2w ago
nvtop should also work, yeah.
Somedumbwanker
SomedumbwankerOP•2w ago
Definitely seeing the memory leak. Seems to jump in 200-300 MB increments periodically. My total mem usage is at 75% at the moment, 64% in immich_ml. I might not have enough assets for it to eat enough to cause a crash. ~6k to go.
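The pattern described here (usage that climbs in steps and never drops back) can be checked roughly from a handful of readings noted down from nvtop or gputop. The sketch below is only a heuristic under stated assumptions: the function name `looks_like_leak`, both thresholds, and the sample readings are made up for illustration and are not measurements from this thread.

```python
from typing import Sequence


def looks_like_leak(
    samples_mb: Sequence[float],
    min_growth_mb: float = 200.0,  # assumed threshold, tune to taste
    max_drop_mb: float = 50.0,     # assumed tolerance for small dips
) -> bool:
    """Return True if usage grew overall and never dropped back noticeably."""
    if len(samples_mb) < 3:
        return False  # too few readings to say anything
    total_growth = samples_mb[-1] - samples_mb[0]
    # Largest decrease between two consecutive readings (0 if none).
    biggest_drop = max(
        (a - b for a, b in zip(samples_mb, samples_mb[1:])), default=0.0
    )
    return total_growth >= min_growth_mb and biggest_drop <= max_drop_mb


if __name__ == "__main__":
    # Hypothetical VRAM readings in MB, taken a few minutes apart.
    print(looks_like_leak([5000, 5250, 5250, 5500, 5800]))
    print(looks_like_leak([5000, 5900, 5200, 5300]))
```

A steady workload that legitimately caches more data over time would also trip this check, so it is a hint to look closer, not proof of a leak.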
Savely Krasovsky
Savely Krasovsky•2w ago
It's probably an xe or compute-runtime problem. I am working on reporting it properly to the upstream devs, but I hope my PR will end up merged anyway.
Somedumbwanker
SomedumbwankerOP•2w ago
Ok, I did max out mem and die. Got this in dmesg:
[190488.299530] xe 0000:2f:00.0: [drm] GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=24
[190488.351408] xe 0000:2f:00.0: [drm] Xe device coredump has been created
[190488.351413] xe 0000:2f:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
And lots of attached in Docker.
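For anyone wanting to spot the same symptom in their own kernel log, the engine reset line above can be picked out with a small script. This is a minimal sketch: the regular expression is written against the exact lines quoted in this thread and may need adjusting for other kernel versions, and `find_engine_resets` is a hypothetical helper name.

```python
import re

# Pattern modelled on the xe "Engine reset" line quoted in this thread.
ENGINE_RESET = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+xe\s+(?P<pci>[0-9a-f:.]+):\s+\[drm\]\s+"
    r"GT(?P<gt>\d+):\s+Engine reset:\s+engine_class=(?P<engine>\w+),"
    r".*guc_id=(?P<guc_id>\d+)"
)


def find_engine_resets(dmesg_text: str) -> list[dict[str, str]]:
    """Return one dict per xe engine reset line found in the given text."""
    matches = (ENGINE_RESET.search(line) for line in dmesg_text.splitlines())
    return [m.groupdict() for m in matches if m]


SAMPLE = """\
[190488.299530] xe 0000:2f:00.0: [drm] GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=24
[190488.351408] xe 0000:2f:00.0: [drm] Xe device coredump has been created
[190488.351413] xe 0000:2f:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
"""

if __name__ == "__main__":
    for reset in find_engine_resets(SAMPLE):
        print(reset["ts"], reset["pci"], reset["engine"], reset["guc_id"])
```

To run it against a live system, the text could come from the saved output of sudo dmesg instead of SAMPLE.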
Savely Krasovsky
Savely Krasovsky•2w ago
Yes, as expected unfortunately! Thanks for confirming!
Somedumbwanker
SomedumbwankerOP•2w ago
No worries. Also, the devcoredump/data file is empty.
Grif
Grif•2w ago
Were the tests you did, @Somedumbwanker, with a B580? It seems you're doing some work to make it work with the Battlemage GPUs?
Somedumbwanker
SomedumbwankerOP•2w ago
I'm not doing the work, just following somebody else's work. 🙂 but yes - B580. @Savely Krasovsky is the man with a plan here.
