CUDA stopped working after upgrading to newer driver version

Hi all, Running immich in a docker container in a Proxmox LXC. GPU (RTX 2060) is exposed to the LXC, driver version 570.144, CUDA 12.8. I used to use the 535 drivers from the Debian repo, 570 drivers are now installed using NVidia .run file. Other containers, such as Frigate, Beszel (monitoring) can see and use the GPU. However, with Immich I am getting the following error:
[E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=f77b4a10892a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
immich_machine_learning | *************** EP Error ***************
immich_machine_learning | EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc:59 static onnxruntime::CUDAExecutionProviderInfo onnxruntime::CUDAExecutionProviderInfo::FromProviderOptions(const onnxruntime::ProviderOptions&) [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:151 Parse Failed to parse provider option "device_id": CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=f77b4a10892a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
[E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=f77b4a10892a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
immich_machine_learning | *************** EP Error ***************
immich_machine_learning | EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc:59 static onnxruntime::CUDAExecutionProviderInfo onnxruntime::CUDAExecutionProviderInfo::FromProviderOptions(const onnxruntime::ProviderOptions&) [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:151 Parse Failed to parse provider option "device_id": CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=f77b4a10892a ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
Some outputs:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.17.6
commit: e627eb2e21e167988e04c0579a1c941c1e263ff6
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.17.6
commit: e627eb2e21e167988e04c0579a1c941c1e263ff6
Docker compose:
immich-machine-learning:
container_name: immich_machine_learning
image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
extends:
file: hwaccel.ml.yml
service: cuda
immich-machine-learning:
container_name: immich_machine_learning
image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
extends:
file: hwaccel.ml.yml
service: cuda
Any help would be greatly appreciated.
2 Replies
Immich
Immich22h ago
:wave: Hey @Bittabola, Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:. References - Container Logs: docker compose logs docs - Container Status: docker ps -a docs - Reverse Proxy: https://immich.app/docs/administration/reverse-proxy - Code Formatting https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA Checklist I have... 1. :ballot_box_with_check: verified I'm on the latest release(note that mobile app releases may take some time). 2. :ballot_box_with_check: read applicable release notes. 3. :ballot_box_with_check: reviewed the FAQs for known issues. 4. :ballot_box_with_check: reviewed Github for known issues. 5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy). 6. :ballot_box_with_check: uploaded the relevant information (see below). 7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable (an item can be marked as "complete" by reacting with the appropriate number) Information In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider: - Your docker-compose.yml and .env files. - Logs from all the containers and their status (see above). - All the troubleshooting steps you've tried so far. - Any recent changes you've made to Immich or your system. - Details about your system (both software/OS and hardware). - Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h). - The version of the Immich server, mobile app, and other relevant pieces. - Any other information that you think might be relevant. Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;) If this ticket can be closed you can use the /close command, and re-open it later if needed. Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Bittabola
BittabolaOP21h ago
Immich v1.132.3
docker exec -it immich_machine_learning python -c "import onnxruntime as ort;print(ort.get_device())"
GPU

docker exec -it immich_machine_learning nvidia-smi
Failed to initialize NVML: Unknown Error

nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.17.6
commit: e627eb2e21e167988e04c0579a1c941c1e263ff6
docker exec -it immich_machine_learning python -c "import onnxruntime as ort;print(ort.get_device())"
GPU

docker exec -it immich_machine_learning nvidia-smi
Failed to initialize NVML: Unknown Error

nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.17.6
commit: e627eb2e21e167988e04c0579a1c941c1e263ff6
Set no-cgroups = false in /etc/nvidia-container-runtime/config.toml which fixed the issue.

Did you find this page helpful?