Progress on Arc Battlemage?
I'm using Immich in Docker on Arch with an Intel Arc B580.
When I use the OpenVINO variant of the immich_machine_learning container, executing any ML function yields:
I have tried different models with no improvement.
I asked about this on the Immich subreddit and was told that an upstream update was required for this to work, but no further information was provided.
That was approximately 4 months ago.
What upstream is broken?
What is required for the upstream to fix this?
Has there been any movement on this at all?
Compose data to follow in comments.
Compose data and hwaccel.ml.yml files
@Somedumbwanker I started working on it a few days ago: https://github.com/immich-app/immich/pull/23458
It would be nice if you could test it!
I would be more than happy to give it a shot.
I'm currently running immich-ml as a docker container from ghcr.io/immich-app/immich-machine-learning:release-openvino
Is it fair to assume running this from your repo would probably require building a new container?
I'm not familiar with building containers from git etc, would you mind giving me a quick rundown?
My knowledge of docker is largely in spinning up and updating containers that others have built using docker compose.
Do I just check out the repo, and docker build in the machine-learning subdir, and docker run using the same data as my typical docker-compose?
No, it already builds an image just like the main repo:
Oh, awesome. Thanks, pulled new image.
I can confirm I'm seeing high GPU and GPU mem utilization in nvtop, one of the processes being python -m gunicorn immich_ml.main:app ....
Though, oddly, it seems like something in the container is falling back to CPU, unless I'm reading something wrong.
Is this expected?
Full log attached
Yes, it's fine, I will fix it soon. It's not critical.
Great. It's massive progress regardless, and I'm glad to see it moving.
In our tests it seems that the xe driver has a memory leak right now.
It would be nice if you could confirm it (or not).
It should be noticeable after around 5k assets have been processed (using the OCR model).
Ok, I'm at ~3k of OCR at the moment. Am I just looking for a spike in GPU mem consumption, or ... ?
We tested with Arc Pro B50 and Intel Core 155H.
Yes, VRAM usage will gradually go up and then so-called GPU hangs could start. You would see that something strange is going on in sudo dmesg.
Should have an answer in the next 10m.
You can check VRAM usage using the new gputop utility. It should work for xe.
gputop should be part of the igt-gpu-tools system package.
Been using nvtop which is reporting mem fine for me, I'll pull down gputop regardless.
Ok, I can see preliminary indication of a mem leak, I've gone from ~42% to 51% gradually with no sign of dropping.
Nothing in dmesg.
nvtop should also work, yeah.
Definitely seeing the memory leak. Seems to jump in 200-300 MB increments periodically.
My total mem usage is at 75% at the moment, 64% in immich_ml.
I might not have enough assets for it to eat enough to cause a crash.. ~6k to go.
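For anyone else trying to confirm the same thing: a rough way to tell a leak from normal fluctuation is to note down VRAM readings from nvtop or gputop every so often and check that usage only ever steps upward. The sketch below is just that heuristic in Python. The function names, the thresholds and the sample readings are all made up for illustration; none of it comes from Immich, nvtop or gputop.

```python
from typing import List


def growth_steps(samples_mib: List[float]) -> List[float]:
    """Return the change between each pair of consecutive readings."""
    return [b - a for a, b in zip(samples_mib, samples_mib[1:])]


def looks_like_leak(
    samples_mib: List[float],
    min_total_growth_mib: float = 500.0,  # assumed threshold, tune to taste
    max_drop_mib: float = 50.0,           # assumed tolerance for small dips
) -> bool:
    """Heuristic: usage grows overall and is never meaningfully released."""
    if len(samples_mib) < 3:
        return False  # too few readings to say anything
    steps = growth_steps(samples_mib)
    never_released = all(step >= -max_drop_mib for step in steps)
    total_growth = samples_mib[-1] - samples_mib[0]
    return never_released and total_growth >= min_total_growth_mib


if __name__ == "__main__":
    # Hypothetical readings in MiB, typed in by hand from a monitoring tool.
    leaking = [5000, 5000, 5250, 5250, 5500, 5800, 6100]
    steady = [5000, 5300, 5100, 5400, 5050, 5200, 5000]
    print(looks_like_leak(leaking))  # True
    print(looks_like_leak(steady))   # False
```

A staircase pattern like the first list (flat, then a jump, never a drop) is what a leak tends to look like, whereas a healthy workload usually gives memory back between batches.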
It's probably an xe or compute-runtime problem. I am working on reporting it properly to the upstream devs, but I hope my PR will end up merged anyway.
Ok, I did max out mem and die.
Got this in dmesg:
And lots of attached in Docker.
Yes, as expected unfortunately! Thanks for confirming!
No worries.
Also, the devcoredump/data file is empty.
The tests you did @Somedumbwanker were with a B580? You seem to be doing some work to make it work with the Battlemage GPUs?
I'm not doing the work, just following somebody else's work. 🙂 But yes, B580. @Savely Krasovsky is the man with a plan here.