Can't run the machine learning Docker image.
I run Immich in an Arch VM on Proxmox with a 1050 Ti passed through, but I tried running the machine learning container on my main desktop (Arch with a 2060) to speed up face recognition. Every time I run the machine learning job, the container starts throwing errors and doesn't process anything. I run the latest version of Immich on my server and the latest version of the machine learning container on my desktop with the default .env.
I attached my docker compose file for the machine learning container only, which I use on my desktop. The weird thing is that if I use the same docker compose file directly on the Arch VM with the rest of Immich, it works fine.
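For reference, a standalone remote ML compose file looks roughly like the one in the Immich remote machine learning guide; the image tag, container name, and volume name below are assumptions, not necessarily what is in the attached file:

name: immich_remote_ml
services:
  immich-machine-learning:
    container_name: immich_machine_learning
    # CUDA build of the ML image; the plain release tag works for CPU-only
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    volumes:
      - model-cache:/cache   # keeps downloaded models between restarts
    restart: always
    ports:
      - "3003:3003"          # the ML service listens on port 3003 inside the container
    # plus a GPU device reservation, which is where the hwaccel files discussed below come in
volumes:
  model-cache: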
:wave: Hey @Chicco,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
docker ps
My GPU is an RTX 2060 12 GB with driver version 565.77
fdisk -l
(only for the disk in use, not for all of them)
All the connections to the frontend and the ML container are made locally without any reverse proxy.
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
How much memory does the container get?
I didn't set any memory limits for either RAM or storage; I have 32 GB of RAM and 50 GB of disk free.
Do you happen to have a CPU with an iGPU?
Also, what's up with:
?
Last one is totally allowed, just curious
I literally just joined right now and I've had the same issues
But I got it to work
Ah, I'm blind. I misread. Our setup similarity ends at the 1050ti.
Yeah, but it's disabled in the BIOS
When I start it, this pops up in the log:
immich_machine_learning-cuda | [02/02/25 17:41:44] INFO Listening at: http://[::]:3003 (9)
and if I don't change the port I can't access it
curious, do you have anything else running at 3003?
no
but that's the port inside the container, so the ports on the host OS don't matter
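For what it's worth, the compose port mapping is host:container, so only the left side needs to change if 3003 is already taken on the host; a sketch, assuming the default listener on 3003:

ports:
  - "8003:3003"   # reach the ML container via host port 8003; inside, it still listens on 3003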
Yeah I know
anyway
what does
ls -la /dev/dri/
show on the remote ML?
ls -la /dev/dri/
total 0
drwxr-xr-x 3 root root 100 Feb 2 15:44 ./
drwxr-xr-x 23 root root 4.6K Feb 2 18:20 ../
drwxr-xr-x 2 root root 80 Feb 2 15:44 by-path/
crw-rw----+ 1 root video 226, 1 Feb 2 15:44 card1
crw-rw-rw- 1 root render 226, 128 Feb 2 15:44 renderD128
ok, one card, so definitely the right one
(brb)
CUDA is definitely set up correctly on my machine since I run other ML tasks (LLMs, Flux, and Transformers models in Python)
going to leave this thread after this question
But is this related to Immich's remote ML?
Yeah, for me, if I run the ML container on the same host as Immich it works fine, but remotely on my desktop it doesn't
I'll give this a try myself. I have a 4060 on my desktop that I can use
If you are using WSL there could be some issues, from what I know (never used it)
If you manage to find a fix please post it here, since I'm trying to avoid doing face detection on 300 GB of photos with a 1050
I'm running Linux on bare metal so it should be no issue
Why did you add
to the compose @Chicco ?
Yeah I've got nothing. It's working perfectly on my end.
It was in the hwaccel.transcoding.yaml file on GitHub
I'll try to remove it
Are you also trying to do the transcoding on the 2060?
Really? How many versions ago?
I don't know, I've been using Immich for 2 months
If possible yes, but it's not the main thing
Strange. I'm looking at the docs but I see nothing about transcoding
If it's easier, try tearing down that stuff and just focus on the doc process, maybe?
Ah right transcoding and remote ml is definitely not the same thing
Very confusing with how it just works on my machine straight away. I also have an iGPU that I haven't disabled
Yeah. Try just sticking to ML.
A 1050ti can probably transcode just fine.
No I mean if he took the transcode HWA file then yes video and render need to be there
But it's not needed for remote ML
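For comparison, the NVENC section of the transcoding hwaccel file reserves the GPU with extra driver capabilities, roughly like this (a sketch from memory, not necessarily the exact upstream file):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities:
            - gpu
            - compute   # extra capabilities exposed for video encode/decode work
            - video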
I think that it also breaks things, maybe
I literally just installed Immich today, so I'm likely totally wrong
I had to wrestle with getting ML/Transcode to work. All of it was just a docker issue.
Removing the compute and video capabilities fixed the issue
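So the working reservation for the remote ML container ends up matching what the cuda section of hwaccel.ml.yml asks for, roughly (again a sketch, assuming the current upstream layout):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities:
            - gpu   # no extra compute/video capabilities needed for remote ML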

I think a comment about this issue could be added to the hwaccel file to warn people
Yeah, I think it would be useful.
Well
To be fair
You just used the wrong file
Is it so wrong for a man to want transcoding too?
Why is the facial recognition workflow limited to 1 job at a time, and why can't it be increased?
Because it groups
if you do one at a time then it will do a terrible job
Now it basically queues em all and groups them all at once
Yeah, but this being limited to only one at a time is weird, since with enough hardware it would be way faster

No that's not how it works
Why not split the workflow over multiple cores or GPUs? It could be faster
Because that's not what is happening here
it's basically waiting for all images to be done with face detection
and then it will grab all unrecognized faces and try to group them into people
It's not processing 1 face at a time
Yeah, but face detection is done
Since that is not limited to one at a time, I put 60 at a time and it has finished already
There's a good reason it only does 1 at a time