GPU is detected and working but ML doesn't use it at all for any of its related tasks.
I have attached the docker compose and hwaccel along with the logs from the ml_container stating that it falls back to CPU even though it is trying for CUDA.
:wave: Hey @The12th Unique,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
the .env file

Setup is - Proxmox as host, Immich is set up and installed in an LXC container. (i5-6500, 16GB RAM, Quadro P1000)
I can confirm through nvidia-smi that the GPU (a Quadro P1000) gets detected in all 3 places mentioned above.
The mounted storage is a Backblaze B2 bucket.
The container has 4 vCPUs, 4GB RAM and 90GB of storage allotted to it.
The image is the nvidia-smi output
I have played around as much as I can with the docker files, and hwaccel.
Finally found the Discord and saw someone suggest pointing to the immich_machine_learning container using Docker's internal IP in the web interface.
I have spent more time on this than I'd like to admit. Any help is appreciated.

Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Is a P1000 even compatible?
Ah yep, compute 6.1 is >= 5.2
Not sure what the NVIDIA env vars are supposed to achieve?
We don't have those
By installed in an LXC, do you mean directly, or is docker running in your LXC @The12th Unique ?
Yes there's docker installed and it all runs off of that in the container
Could try setting:
In hwaccel instead @The12th Unique ?
Ok will try and update here
Didn't work


dang
maybe
Where xy zw are the gid numbers for video and render
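For reference (the suggested snippet itself isn't shown above), those gid numbers can be looked up on the host like this:
```bash
# Print the gid entries for the video and render groups on the host
getent group video render
```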
But I'm just guessing at this point
I assume that's where you wanted me to add it

indeed
What I can't get out of my mind is that the error log still mentions "device_id" even though I got rid of that setting

It mentioned that before you switched to it
my guess is there is a permissions issue preventing the GPU from being properly accessed
resulting in GPU=-1
hmm, the error log I shared at the top doesn't have the error that's showing now
I do agree it might just be too many layers of virtualisation happening, Proxmox > LXC > Docker
causing permission issues
Installed immich on a vm first, worked great, then realised maybe adding a GPU for the ML loads would be a good idea considering the CPU is probably not the best for this stuff.
That was 3 days ago when I dropped $100 on the Quadro, first my wallet hurt, now my brain hurts.
The joke is your iGPU probably would've worked fine
So to give you the real picture here
CPUs have their own accelerators
OpenVINO for instance
And outside of the initial processing, you don't really need anything but RAM
usually loading the model is what takes the longest
using an external ML container for initial processing is very popular
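For anyone following along, a minimal sketch of that external-ML setup, loosely following the remote machine learning guide in the Immich docs (image tag and port are the documented defaults; adjust the tag to your server version):
```yaml
# On the machine with the GPU: run only the machine-learning service
services:
  immich-machine-learning:
    # append -cuda to the tag if that machine should use an NVIDIA GPU
    image: ghcr.io/immich-app/immich-machine-learning:release
    restart: always
    ports:
      - 3003:3003
    volumes:
      - model-cache:/cache

volumes:
  model-cache:
```
The main server is then pointed at http://that-machine:3003 via the machine learning URL in the admin settings.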
Yeah I have been thinking of 2 options -
1. Use a spare disk and test run a desktop environment - probably mint. Try running immich on there and if it works well then maybe I move my entire stack there - HAos, SMB, Minecraft Servers.
2. My main machine is an RTX 3060 laptop running Windows. Run the ML container on there? The web UI allows me to control when the jobs are running and I can control when the container is running on the laptop.
Also to note - I tried sending through 7000 files and that started crashing the container (LXC); seems I loaded the CPU too much, and I had all the ML stuff enabled because I thought the GPU was working
But on the VM, the CPU seemed to handle it beautifully, didn't crash even though the CPU was going ham the same way - this was before the GPU install.
I ran my ML on windows as well, took a good few hours but not days
Yeah, I did look into that; Nvidia and Microsoft have done some cool stuff to bring proper GPU access to WSL.
There's just so much conflicting and unclear information on the docs
Our docs?
oh no developer.nvidia.com
don't think I found anything about that on the immich docs
How are you passing through your GPU to the LXC container?
Well technically I don't think an LXC container passes it through or gets direct access to the GPU.
It lets the proxmox host have control of it and shares resources from there
So with a VM, one VM will have access to the GPU.
With LXC Containers, host keeps the GPU but multiple containers can use it
I first tried a VM but I couldn't even get the GPU to be detected properly; with LXC it gets detected, it just doesn't get used.
With LXC you still need to write rules in the LXC config file and you'll likely have to mount the GPU
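Roughly what those rules tend to look like in the Proxmox LXC config (a sketch, not this user's actual file; the cgroup device major numbers vary per system, check ls -l /dev/nvidia* on the host):
```
# /etc/pve/lxc/<vmid>.conf - typical NVIDIA device entries
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```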
If you do
ls /dev/nvidia* in the container does anything show up? @Paulie
mb, don't know how I missed this message, but my config for the lxc has all these things in it

Please provide me with an
ls /dev/nvidia* from both the Proxmox host and from inside the container
This is the Immich Container

That's Proxmox

Just to test the passthrough, if you download ffmpeg within the container and try to transcode a video, does it do that properly?
Iβll try that now
(Make sure to use the nvenc encoder to trigger GPU usage)
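i.e. something along these lines (input/output names are just placeholders):
```bash
# Decode on the GPU and, importantly, encode with NVENC so the GPU is actually exercised
ffmpeg -hwaccel cuda -i input.mov -c:v h264_nvenc output.mp4
```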
Yeah did that and seems it failed on the container

ffmpeg -hwaccel cuda -i P1066118.MOV test.mp4 - that was my command
I just noticed you're passing through driver libraries from your host; is there any reason for this?
That's just what I found when searching online
The way I set it up was -
1. Got the driver file from Nvidia.com. Ran it on the proxmox host.
2. Ran the same file in the container but with the parameter to skip building the kernel module, since it was erroring out when I tried to install the same file normally because the LXC was not detecting the driver at all.
Also my reason to use the file from Nvidia.com was the version discrepancy between what the PVE host was installing from apt and what the LXC was installing.
Plus this way I assume less chance of breaking when updates are run
The driver in the Debian repository is a bit older, it's from February 2024, but for something like Immich I wouldn't be chasing the newest version (especially with a Debian-based OS) but I'd want stability instead
If I were you I'd install the Nvidia driver from the apt repository and then try this all again and avoid mounting driver directories; just install nvidia-smi inside the LXC container and it will pull along any necessary libraries it needs
Let me know if you need help
If you install the Nvidia drivers through apt if I remember correctly it will undo the .run installation itself
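For reference, the repository route looks roughly like this; package names are from the Debian non-free repo and can differ per release:
```bash
# On the Proxmox host (non-free / non-free-firmware enabled in sources.list)
# the kernel module build typically also needs the matching pve-headers package
apt install nvidia-driver

# Inside the LXC container - only the userspace tools/libraries, no kernel module
apt install nvidia-smi
```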
I get that - PVE was installing 570.20 and immich was installing 570.06 something.
In my mind it made sense to have the same driver versions so I ran the .run
But you mean to say - let the host have the drivers or not? Because isn't the container dependent on the host for the drivers?
Immich comes with its own driver blobs?
wait a min, you mean to say the docker container should have the drivers in there?
FYI new drivers often break ML, don't try to be clever here and stick with the "old" ones
Break ML or break Debian; there's no winning here
The issue is that installing Nvidia drivers from anything that is not the repository is a hole deeper than hell
I assume you mean old drivers break ML?
Or am I reading it wrong and staying with old is good?
I'd think see if repo driver works and if so, don't change, right?
I specifically said new
Well, I have cleared out the driver install. Will reboot and try what's suggested here once the badblock test on one of my disks ends.
Make sure to get rid of the library directory mounts; that should not be necessary if you have nvidia-smi installed within the container.
I am so effin happy

What I did was nothing short of a full system reset -
Proxmox 9 was recently released, and I still use Proxmox 8. Decided imma install Proxmox 9 on a spare disk that will have none of my stuff on it besides the default and try passing through the GPU to a Windows VM. WORKED RIGHT AWAY
So I tried to recreate it on my actual boot disk but failed, spent the day backing up and then restoring my containers and vms to the new install and finally voila!!!
Don't know what was wrong in my old install that was messing the passthrough up so bad, but having attempted this so many times now over the past 10 days, I have nailed down the process and know exactly what is needed.
Should it have taken this long? - NO.
Does this remind me of the covid lockdown days, where teenage me spent a lot of time understanding and figuring out how networking, reverse tunnels and Java work, so I could play Minecraft with friends? - Absolutely.
Seems my video transcoding is failing now

You're missing libcuda
Is this still an LXC container?
Nope it's an Ubuntu server VM now.
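If libcuda is the missing piece inside the VM, the usual Ubuntu route is the ubuntu-drivers tool rather than the .run installer, roughly:
```bash
# Inside the Ubuntu VM: install the recommended proprietary driver (pulls in libcuda)
sudo ubuntu-drivers autoinstall
# then verify the card is visible
nvidia-smi
```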
Make sure the GPU is blacklisted on the Proxmox host
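Blacklisting here meaning the host drivers are kept off the card so vfio-pci can claim it for the VM; on the Proxmox side that is typically something like:
```bash
# /etc/modprobe.d/blacklist-nvidia.conf on the Proxmox host
blacklist nouveau
blacklist nvidia

# apply and reboot
update-initramfs -u
```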
Oh hmm could that also be why I had to restart docker earlier.
It had stopped detecting a cuda device according to the errors
lspci -nn on the host and paste the output
Sorry, -nnk instead of -nn

Looks good, it's attached to vfio
And the GPU is properly visible in the Windows VM? In Device Manager?
Well I have since gotten rid of the Windows VM, but it shows up nicely in my Ubuntu Server VM now.
It's chugging through the files

Nice!
Just not sure why ffmpeg is misbehaving, especially considering what I have found.
Immich uses the Jellyfin fork of ffmpeg and I was gonna set up Jellyfin next.
Well, it says right in the output
The extends portion was commented out, let's see if it works now
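For completeness, the section being referred to is presumably the hwaccel extends block from the Immich docs; for NVENC transcoding it looks roughly like this under the immich-server service:
```yaml
  immich-server:
    # ...
    extends:
      file: hwaccel.transcoding.yml
      service: nvenc
```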

Alright that was it, guess I misplaced the file where I had that set already and used the wrong one
