Immich•3mo ago
Chicco

Can't run machine learning docker image.

I run Immich in an Arch VM on Proxmox with a 1050 Ti passed through, but I tried to run the machine learning container on my main desktop (Arch with a 2060) to speed up face recognition. Every time I run the machine learning job, the container starts giving out errors and doesn't process anything. I run the latest version of Immich on my server and the latest version of the machine learning container on my desktop with the default .env. I attached the docker compose file I use on my desktop, which covers only the machine learning container. The weird thing is that if I use the same docker compose file directly on the Arch VM with the rest of Immich, it works fine.
44 Replies
Immich
Immich•3mo ago
:wave: Hey @Chicco, Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.
Chicco
ChiccoOP•3mo ago
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5acd59f6fb89 ghcr.io/immich-app/immich-machine-learning:release-cuda "tini -- ./start.sh" 15 minutes ago Up 7 seconds (health: starting) 0.0.0.0:2284->3003/tcp, [::]:2284->3003/tcp immich_machine_learning-cuda
My GPU is an RTX 2060 12 GB with driver version 565.77.
fdisk -l (only for the disk in use, not for all of them):
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: CT2000P3PSSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DD8DDF2C-B818-4189-8FC7-1400A1EA24DD

Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 2099199 2097152 1G EFI System
/dev/nvme0n1p2 2099200 3907027119 3904927920 1.8T Linux root (x86-64)
All connections to the frontend and the ML container are made locally, without any reverse proxy.
Immich
Immich•3mo ago
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Mraedis
Mraedis•3mo ago
How much memory does the container get?
Chicco
ChiccoOP•3mo ago
I didn't set any limits on either RAM or storage, and I have 32 GB and 50 gigs free
Mraedis
Mraedis•3mo ago
Do you happen to have a CPU with an iGPU? Also, what's up with:
ports:
- 2284:3003
? Last one is totally allowed, just curious
fry
fry•3mo ago
I literally just joined right now and I've had the same issues. But I got it to work.
Ah, I'm blind. I misread. Our setup similarity ends at the 1050ti.
Chicco
ChiccoOP•3mo ago
yea but it's disabled in the BIOS
when I start it, this pops up in the log:
immich_machine_learning-cuda | [02/02/25 17:41:44] INFO Listening at: http://[::]:3003 (9)
and if I don't change the port I can't access it
Mraedis
Mraedis•3mo ago
curious, do you have anything else running at 3003?
Chicco
ChiccoOP•3mo ago
no, but it's the port inside the container, so the ports on the OS don't matter
Mraedis
Mraedis•3mo ago
Yeah I know. Anyway, what does ls -la /dev/dri/ show on the remote ML?
Chicco
ChiccoOP•3mo ago
ls -la /dev/dri/
total 0
drwxr-xr-x 3 root root 100 Feb 2 15:44 ./
drwxr-xr-x 23 root root 4.6K Feb 2 18:20 ../
drwxr-xr-x 2 root root 80 Feb 2 15:44 by-path/
crw-rw----+ 1 root video 226, 1 Feb 2 15:44 card1
crw-rw-rw- 1 root render 226, 128 Feb 2 15:44 renderD128
Mraedis
Mraedis•3mo ago
ok one card so definitely the right one 🙃 (brb)
Chicco
ChiccoOP•3mo ago
CUDA is definitely set up correctly on my machine, since I run other ML tasks (LLMs, Flux and transformers models in Python)
fry
fry•3mo ago
going to leave this thread after this question. But is this related to Immich's remote ML?
Chicco
ChiccoOP•3mo ago
yea, for me if I run the ML container on the same host as Immich it works fine, but remotely on my desktop it doesn't work
fry
fry•3mo ago
I'll give this a try myself. I have a 4060 on my desktop that I can use
Chicco
ChiccoOP•3mo ago
if you are using WSL there could be some issues, from what I know (never used it). If you manage to find a fix please post it here, since I'm trying to avoid doing face detection for 300 GB of photos on a 1050
fry
fry•3mo ago
I'm running Linux on bare metal so it should be no issue
Mraedis
Mraedis•3mo ago
Why did you add
- compute
- video
to the compose @Chicco ?
fry
fry•3mo ago
Yeah I've got nothing. It's working perfectly on my end.
Chicco
ChiccoOP•3mo ago
it was in the hwaccel.transcoding.yaml file on git. I'll try removing it
fry
fry•3mo ago
Are you also trying to do the transcoding on the 2060?
Mraedis
Mraedis•3mo ago
Really? How many versions ago 😅
Chicco
ChiccoOP•3mo ago
I don't know, I've been using Immich for 2 months
if possible yes, but it's not the main thing
fry
fry•3mo ago
Strange. I'm looking at the docs but I see nothing about transcoding.
If it's easier, try tearing down that stuff and just focus on the doc process, maybe?
Mraedis
Mraedis•3mo ago
Ah right, transcoding and remote ML are definitely not the same thing
fry
fry•3mo ago
Very confusing with how it just works on my machine straight away. I also have an iGPU that I haven't disabled.
Yeah. Try just sticking to ML. A 1050ti can probably transcode just fine.
Mraedis
Mraedis•3mo ago
No, I mean if he took the transcode HWA file then yes, video and render need to be there. But it's not needed for remote ML
fry
fry•3mo ago
I think that it also breaks things, maybe.
I literally just installed Immich today, so I'm likely totally wrong.
I had to wrestle with getting ML/Transcode to work. All of it was just a docker issue.
Chicco
ChiccoOP•3mo ago
removing the compute and video fixed the issue
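For reference, a minimal sketch of what a remote machine learning compose file might look like after that fix. The container name, image tag and port mapping are the ones shown in the docker ps output earlier in the thread; the attached compose file itself is not reproduced here, so the model-cache volume, the restart policy and the exact layout of the GPU reservation are assumptions based on a typical Compose setup for an NVIDIA GPU, not the file that was actually used.

```yaml
# Sketch only: the attached compose file is not shown in the thread.
services:
  immich-machine-learning:
    container_name: immich_machine_learning-cuda
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    ports:
      - 2284:3003        # host port 2284 -> container port 3003
    volumes:
      - model-cache:/cache   # assumed: cache volume for downloaded models
    restart: always          # assumed
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                # "compute" and "video" were listed here before;
                # removing them is what fixed the errors in this thread.

volumes:
  model-cache:
```

With this mapping, the Immich server would reach the remote container on port 2284 of the desktop, while the service inside the container keeps listening on 3003.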
Chicco
ChiccoOP•3mo ago
I think a comment could be added to the hwaccel file to warn people about this issue
fry
fry•3mo ago
Yeah, I think it would be useful.
Mraedis
Mraedis•3mo ago
Well, to be fair, you just used the wrong file 😛
fry
fry•3mo ago
Is it so wrong for man to want transcoding too 😔
Chicco
ChiccoOP•3mo ago
why is the facial recognition workflow limited to 1 job at a time, and why can't it be increased?
Mraedis
Mraedis•3mo ago
Because it groups. If you do one at a time then it will do a terrible job.
Now it basically queues 'em all and groups them all at once
Chicco
ChiccoOP•3mo ago
yea, but being limited to only one at a time is weird, since if you have enough hardware it would be way faster
Mraedis
Mraedis•3mo ago
No that's not how it works
Chicco
ChiccoOP•3mo ago
why not? splitting the workflow over multiple cores or GPUs could be faster
Mraedis
Mraedis•3mo ago
Because that's not what is happening here. It's basically waiting for all images to be done with face detection, and then it will grab all unrecognized faces and try to group them into people.
It's not processing 1 face at a time
Chicco
ChiccoOP•3mo ago
yea, but face detection is done, since that is not limited to one at a time. I put 60 at a time and it finished already
Mraedis
Mraedis•3mo ago
There's a good reason it only does 1 at a time
