ImmichI
Immich2mo ago
Medo

Remote Machine Learning full of errors when processing starts

I want to use my GPU to process my media. I started on my Windows 11 the machine learning container with CUDA image. Then I changed the settings in Immich to use this container as a machine learning server.
When I started queued jobs I saw that it could successfully connect to the remote machine learning server, but a lot of errors were logged.
I attached a screenshot when I run docker exec -it immich_machine_learning nvidia-smi.

Somebody knows what the problem is?

docker-compose.yml:
name: immich_remote_ml services: immich-machine-learning: container_name: immich_machine_learning image: altran1502/immich-machine-learning:v2.1.0-cuda deploy: resources: reservations: devices: - driver: nvidia count: 1 capabilities: - gpu volumes: - model-cache:/cache restart: always ports: - 3003:3003 volumes: model-cache:

Logs:
[10/21/25 12:21:37] INFO Starting gunicorn 23.0.0 [10/21/25 12:21:37] INFO Listening at: http://[::]:3003 (8) 2025-10-21 12:30:27.143106979 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=b9f607173de1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data); 2025-10-21 12:30:27.143238491 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/visual/conv1/Conv' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=b9f607173de1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data); 2025-10-21 12:30:27.145386284 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=b9f607173de1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data); 2025-10-21 12:30:27.145452290 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/visual/conv1/Conv' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=b9f607173de1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data); [10/21/25 12:30:27] ERROR Exception in ASGI application
image.png
Was this page helpful?