Can't run the machine learning Docker image.
I run Immich in an Arch VM on Proxmox with a 1050 Ti passed through, but I tried running the machine learning container on my main desktop (Arch with a 2060) to speed up face recognition. Every time I run the machine learning job, the container starts throwing errors and doesn't process anything. I run the latest version of Immich on my server and the latest version of the machine learning container on my desktop with the default .env.
I attached my docker compose file for the machine learning container only, which I use on my desktop. The weird thing is that if I use the same docker compose file directly on the Arch VM with the rest of Immich, it works fine.
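For reference, a standalone remote ML compose file looks roughly like the one in the Immich remote machine learning guide; the image tag, container name, and volume name below are assumptions, not necessarily what is in the attached file:

name: immich_remote_ml
services:
  immich-machine-learning:
    container_name: immich_machine_learning
    # CUDA build of the ML image; the plain release tag works for CPU-only
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    volumes:
      - model-cache:/cache   # keeps downloaded models between restarts
    restart: always
    ports:
      - "3003:3003"          # the ML service listens on port 3003 inside the container
    # plus a GPU device reservation, which is where the hwaccel files discussed below come in
volumes:
  model-cache: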
:wave: Hey @Chicco,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
docker ps
My GPU is an RTX 2060 12 GB with driver version 565.77
fdisk -l
(only for the disk in use, not for all of them)
All the connections to the frontend and the ML container are made locally without any reverse proxy.
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
How much memory does the container get?
I didn't set any memory limits for either RAM or storage; I have 32 GB of RAM and 50 GB of disk free.
Do you happen to have a CPU with an iGPU?
Also, what's up with:
?
Last one is totally allowed, just curious
I literally just joined right now and I've had the same issues
But I got it to work
Ah, I'm blind. I misread. Our setup similarity ends at the 1050ti.
Yeah, but it's disabled in the BIOS
When I start it, this pops up in the log:
immich_machine_learning-cuda | [02/02/25 17:41:44] INFO Listening at: http://[::]:3003 (9)
and if I don't change the port I can't access it
curious, do you have anything else running at 3003?
no
but that's the port inside the container, so the ports on the host OS don't matter
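For what it's worth, the compose port mapping is host:container, so only the left side needs to change if 3003 is already taken on the host; a sketch, assuming the default listener on 3003:

ports:
  - "8003:3003"   # reach the ML container via host port 8003; inside, it still listens on 3003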
Yeah I know
anyway
what does
ls -la /dev/dri/
show on the remote ML?
ls -la /dev/dri/
total 0
drwxr-xr-x 3 root root 100 Feb 2 15:44 ./
drwxr-xr-x 23 root root 4.6K Feb 2 18:20 ../
drwxr-xr-x 2 root root 80 Feb 2 15:44 by-path/
crw-rw----+ 1 root video 226, 1 Feb 2 15:44 card1
crw-rw-rw- 1 root render 226, 128 Feb 2 15:44 renderD128
ok, one card, so definitely the right one
(brb)
CUDA is definitely set up correctly on my machine since I run other ML tasks (LLMs, Flux, and Transformers models in Python)
going to leave this thread after this question
But is this related to Immich's remote ML?
Yeah, for me, if I run the ML container on the same host as Immich it works fine, but remotely on my desktop it doesn't
I'll give this a try myself. I have a 4060 on my desktop that I can use
If you are using WSL there could be some issues, from what I know (never used it)
If you manage to find a fix please post it here, since I'm trying to avoid doing face detection on 300 GB of photos with a 1050
I'm running Linux on bare metal so it should be no issue
Why did you add
to the compose @Chicco ?
Yeah I've got nothing. It's working perfectly on my end.
It was in the hwaccel.transcoding.yaml file on GitHub
I'll try to remove it
Are you also trying to do the transcoding on the 2060?
Really? How many versions ago?
I don't know, I've been using Immich for 2 months
If possible yes, but it's not the main thing
Strange. I'm looking at the docs but I see nothing about transcoding
If it's easier, try tearing down that stuff and just focus on the doc process, maybe?
Ah right transcoding and remote ml is definitely not the same thing
Very confusing with how it just works on my machine straight away. I also have an iGPU that I haven't disabled
Yeah. Try just sticking to ML.
A 1050ti can probably transcode just fine.
No I mean if he took the transcode HWA file then yes video and render need to be there
But it's not needed for remote ML
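For comparison, the NVENC section of the transcoding hwaccel file reserves the GPU with extra driver capabilities, roughly like this (a sketch from memory, not necessarily the exact upstream file):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities:
            - gpu
            - compute   # extra capabilities exposed for video encode/decode work
            - video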
I think that it also breaks things, maybe
I literally just installed Immich today, so I'm likely totally wrong
I had to wrestle with getting ML/Transcode to work. All of it was just a docker issue.
Removing the compute and video capabilities fixed the issue
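So the working reservation for the remote ML container ends up matching what the cuda section of hwaccel.ml.yml asks for, roughly (again a sketch, assuming the current upstream layout):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities:
            - gpu   # no extra compute/video capabilities needed for remote ML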

I think a comment about this issue could be added to the hwaccel file to warn people
Yeah, I think it would be useful.
Well
To be fair
You just used the wrong file
Is it so wrong for a man to want transcoding too?
Why is the facial recognition workflow limited to 1 job at a time, and why can't it be increased?
Because it groups
if you do one at a time then it will do a terrible job
Now it basically queues em all and groups them all at once
Yeah, but this being limited to only one at a time is weird, since with enough hardware it would be way faster

No that's not how it works
Why not split the workflow over multiple cores or GPUs? It could be faster
Because that's not what is happening here
it's basically waiting for all images to be done with face detection
and then it will grab all unrecognized faces and try to group them into people
It's not processing 1 face at a time
Yeah, but face detection is done
Since that is not limited to one at a time, I put 60 at a time and it has finished already
There's a good reason it only does 1 at a time