Immich•4mo ago

No Smart Search

- OS: Rocky9
- Deployment: Docker Compose
- Immich Version: v1.121.0 & 1.134.0
- HW:
  - AMD 3600
  - ASUS B550 mobo
  - Gigabyte 1080ti
  - Several TB free storage
- Reverse Proxy: SWAG

I was previously running Immich v1.121.0 when I noticed that my smart search was no longer working. I decided I likely just needed an update, so I updated to v1.134.0 and checked for updates to the docker-compose.yml & supporting files. I can't remember if I checked the logs before updating to the newer version, but I'm currently seeing the errors below.

From what I can gather, I think the models are self-contained in the ML image, which ruled out my first thought that I may be blocking the location hosting the models. I've also tried opening port 3003 on the ML container, and that didn't help. Any ideas?

immich | [Nest] 17  - 06/13/2025, 12:40:14 AM    WARN [Api:MachineLearningRepository~0lkhpjdx] Machine learning request to "http://immich-machine-learning:3003" failed: fetch failed
immich | [Nest] 17  - 06/13/2025, 12:40:14 AM   ERROR [Api:ErrorInterceptor~0lkhpjdx] Unknown error: Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-32__openai","options":{"language":"en-US"}}}}' failed for all URLs

14 Replies

Immich•4mo ago

:wave: Hey @brconn,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

brconnOP•4mo ago

IMMICH_VERSION=v1.134.0

  immich-server:
    container_name: immich
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION}
    deploy:
      replicas: 1
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - compute
                - video
    env_file:
      - .env
    ports:
      - 2283:2283
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    depends_on:
      - redis
      - database

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION}-cuda
    deploy:
      replicas: 1
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    env_file:
      - .env
    volumes:
      - model-cache:/cache
    restart: unless-stopped

  redis:
    container_name: immich_redis
    image: docker.io/valkey/valkey:8-bookworm@sha256:a19bebed6a91bd5e6e2106fef015f9602a3392deeb7c9ed47548378dcee3dfc2
    healthcheck:
      test: redis-cli ping || exit 1
    deploy:
      replicas: 1
    restart: unless-stopped

  database:
    container_name: immich_postgres
    image: ghcr.io/immich-app/postgres:14-vectorchord0.3.0-pgvectors0.3.0
    deploy:
      replicas: 1
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
      DB_STORAGE_TYPE: 'HDD'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    restart: unless-stopped

This has occurred with the existing model-cache as well as a new & fresh model-cache

brconnOP•4mo ago

immich-ml.logs

immich-server.logs

brconnOP•4mo ago

FWIW, when I drop -cuda from the ML image tag, it works great

bo0tzz•4mo ago

ERROR Worker (pid:9) was sent code 139!

It's segfaulting. What driver version are you running?
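For readers wondering how "code 139" maps to a segfault: on Linux, a status above 128 conventionally encodes 128 plus the number of the signal that killed the process. The sketch below decodes that with Python's standard `signal` module. The helper name `describe_exit_code` is hypothetical and only illustrates the convention; it is not part of Immich.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit status to a signal name when it encodes one."""
    if code > 128:
        try:
            # Statuses above 128 conventionally mean "killed by signal (code - 128)".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


# 139 - 128 = 11, which is SIGSEGV on Linux.
print(describe_exit_code(139))
```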

brconnOP•4mo ago

NVIDIA-SMI 550.100 Driver Version: 575.57.08 CUDA Version: 12.4
Running on a 1080ti, which should have compute capability 6.1
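One detail in that banner may be worth a second look: the NVIDIA-SMI field (550.100) and the Driver Version field (575.57.08) differ. The sketch below pulls the three fields out of the banner text and compares the first two. It rests on an assumption that nobody in the thread confirms, namely that the two fields normally match because nvidia-smi is installed together with the driver. The function names are hypothetical.

```python
import re


def parse_smi_header(text: str) -> dict:
    """Extract the three version fields from an nvidia-smi banner line."""
    patterns = {
        "nvidia_smi": r"NVIDIA-SMI\s+([\d.]+)",
        "driver": r"Driver Version:\s*([\d.]+)",
        "cuda": r"CUDA Version:\s*([\d.]+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        found[key] = match.group(1) if match else None
    return found


def versions_agree(info: dict) -> bool:
    """True when the nvidia-smi tool and the driver report the same version."""
    return info["nvidia_smi"] is not None and info["nvidia_smi"] == info["driver"]


# Banner text as posted in the thread.
header = "NVIDIA-SMI 550.100 Driver Version: 575.57.08 CUDA Version: 12.4"
info = parse_smi_header(header)
print(info)
print("match" if versions_agree(info) else "mismatch")
```

If the assumption holds, a mismatch would point toward mixed driver packages on the host, but that is a hypothesis to check rather than a diagnosis.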

bo0tzz•4mo ago

And do you meet the other requirements from https://immich.app/docs/features/ml-hardware-acceleration#cuda ?

brconnOP•4mo ago

I believe so

dnf list installed | grep nvidia-container
libnvidia-container-tools.x86_64               1.17.8-1                            @cuda-rhel9-x86_64                             
libnvidia-container1.x86_64                    1.17.8-1                            @cuda-rhel9-x86_64                             
nvidia-container-toolkit.x86_64                1.17.8-1                            @cuda-rhel9-x86_64                             
nvidia-container-toolkit-base.x86_64           1.17.8-1                            @cuda-rhel9-x86_64

bo0tzz•4mo ago

@sogan any idea?

brconnOP•4mo ago

dnf list installed | grep nvidia
kmod-nvidia-latest-dkms.x86_64                 3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
libnvidia-container-tools.x86_64               1.17.8-1                            @cuda-rhel9-x86_64                             
libnvidia-container1.x86_64                    1.17.8-1                            @cuda-rhel9-x86_64                             
libnvidia-gpucomp.x86_64                       3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
libnvidia-ml.x86_64                            3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
nvidia-container-toolkit.x86_64                1.17.8-1                            @cuda-rhel9-x86_64                             
nvidia-container-toolkit-base.x86_64           1.17.8-1                            @cuda-rhel9-x86_64                             
nvidia-driver.x86_64                           3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
nvidia-driver-libs.x86_64                      3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
nvidia-kmod-common.noarch                      3:575.57.08-1.el9                   @cuda-rhel9-x86_64                             
nvidia-modprobe.x86_64                         3:575.57.08-1.el9                   @cuda-rhel9-x86_64

mertalev•4mo ago

Segfaults are generally driver-related. I'm not sure why that specific driver would be problematic though

brconnOP•4mo ago

Looks like that's the latest available nvidia driver on Rocky9 atm. I could try and replace it with the dkms version maybe? Actually I'm already on dkms

mertalev•4mo ago

Do you have the latest nvidia-container-toolkit installed?

brconnOP•9h ago

Yup, 1.17.8 is the latest per their GitHub. I did have a kernel update to do, so I did that and let DKMS rebuild, but that didn't help.

@mertalev If I set LD_LIBRARY_PATH for the ML container to include the path to CUDA, I'm able to get farther:

immich_machine_learning  | 2025-10-02 22:04:45.616101011 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 803: system has unsupported display driver / cuda driver combination ; GPU=-1 ; hostname=1aeb1e8be6a3 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices); 
immich_machine_learning  | *************** EP Error ***************
immich_machine_learning  | EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc:59 static onnxruntime::CUDAExecutionProviderInfo onnxruntime::CUDAExecutionProviderInfo::FromProviderOptions(const onnxruntime::ProviderOptions&) [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:151 Parse Failed to parse provider option "device_id": CUDA failure 803: system has unsupported display driver / cuda driver combination ; GPU=-1 ; hostname=1aeb1e8be6a3 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices); 
immich_machine_learning  |  when using ['CUDAExecutionProvider', 'CPUExecutionProvider']
immich_machine_learning  | Falling back to ['CPUExecutionProvider'] and retrying.
immich_machine_learning  | ****************************************
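The log above carries two pieces of information: the CUDA error code with its message, and the fact that ONNX Runtime fell back to the CPU provider. A minimal sketch for pulling both out of saved ML container logs follows. The regular expressions are written against the wording of this particular log and are an assumption about its format, not a documented Immich or ONNX Runtime interface. The `summarize` helper is hypothetical, and the sample text is abbreviated from the log above.

```python
import re

# Patterns written against the log wording seen in this thread (an assumption).
CUDA_FAILURE = re.compile(r"CUDA failure (\d+): (.+?) ;")
FALLBACK = re.compile(r"Falling back to \[(.*?)\]")


def summarize(log_text: str) -> dict:
    """Collect distinct CUDA error codes/messages and any fallback providers."""
    failures = CUDA_FAILURE.findall(log_text)
    codes = sorted({int(code) for code, _ in failures})
    messages = sorted({message.strip() for _, message in failures})
    fallback = FALLBACK.search(log_text)
    providers = re.findall(r"'([^']+)'", fallback.group(1)) if fallback else []
    return {
        "cuda_error_codes": codes,
        "messages": messages,
        "fallback_providers": providers,
    }


# Abbreviated from the log lines above.
sample = (
    "immich_machine_learning  | CUDA failure 803: system has unsupported display "
    "driver / cuda driver combination ; GPU=-1 ; hostname=1aeb1e8be6a3 ;\n"
    "immich_machine_learning  | Falling back to ['CPUExecutionProvider'] and retrying.\n"
)
summary = summarize(sample)
print(summary)
```

A non-empty `fallback_providers` list means the container kept running on the CPU provider, so the GPU was not being used at that point.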
