Immich • 5mo ago
The12th Unique

GPU is detected and working, but ML doesn't use it at all for any of its related tasks.

I have attached the docker compose and hwaccel files, along with the logs from the ML container showing that it falls back to CPU even though it tries CUDA.
70 Replies
Immich • 5mo ago
:wave: Hey @The12th Unique,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
The12th Unique (OP) • 5mo ago
the .env file
[image attachment]
The12th Unique (OP) • 5mo ago
Setup: Proxmox as the host, with Immich set up and installed in an LXC container (i5-6500, 16GB RAM, Quadro P1000). I can confirm through nvidia-smi that the GPU, a Quadro P1000, is detected in all 3 places mentioned above. The mount is a Backblaze B2 bucket. The container has 4 vCPUs, 4GB RAM and 90GB of storage allotted to it.
The12th Unique (OP) • 5mo ago
The image is the nvidia-smi output
[image attachment]
The12th Unique (OP) • 5mo ago
I have played around as much as I can with the docker files and hwaccel. Finally found the Discord and saw someone pointing the web interface at the immich_machine_learning container using Docker's internal IP. I have spent more time on this than I'd like to admit. Any help is appreciated.
[image attachment]
Immich • 5mo ago
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Mraedis • 5mo ago
Is a P1000 even compatible? Ah yep, compute 6.1 is >= 5.2.
Not sure what the NVIDIA env vars are supposed to achieve? We don't have those.
By installed in an LXC, do you mean directly, or is docker running in your LXC @The12th Unique ?
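The "6.1 is >= 5.2" check above compares (major, minor) pairs rather than decimals or strings, which matters once a major version reaches two digits. A minimal sketch, assuming 5.2 is the minimum as stated in the message above; the function name is made up for illustration.

```python
def meets_minimum(capability: str, minimum: str = "5.2") -> bool:
    """Compare CUDA compute capabilities as (major, minor) integer pairs."""

    def parse(version: str) -> tuple[int, int]:
        major, minor = version.split(".")
        return int(major), int(minor)

    return parse(capability) >= parse(minimum)


if __name__ == "__main__":
    # The Quadro P1000 is compute capability 6.1, per the message above.
    print(meets_minimum("6.1"))
```

Comparing as tuples avoids the trap where the string "10.0" sorts before "5.2".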
The12th Unique (OP) • 5mo ago
Yes, there's docker installed and it all runs off of that in the container
Mraedis • 5mo ago
Could try setting:
cuda:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities:
              - gpu
In hwaccel instead @The12th Unique ?
The12th Unique (OP) • 5mo ago
Ok will try and update here
The12th Unique (OP) • 5mo ago
Didn't work
[image attachment]
The12th Unique (OP) • 5mo ago
[image attachment]
Mraedis • 5mo ago
dang maybe
group_add:
  - "xy"
  - "zw"
Where xy and zw are the GID numbers for the video and render groups 👀 But I'm just guessing at this point
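To find the numbers to put in place of "xy" and "zw", the group database can be queried on the machine where Docker runs. A minimal sketch using Python's standard grp module (Unix only); the helper name is made up, and it assumes the groups are literally named video and render, which may differ per distribution.

```python
import grp


def group_ids(names=("video", "render")):
    """Return {group name: gid or None} from the local group database."""
    result = {}
    for name in names:
        try:
            result[name] = grp.getgrnam(name).gr_gid
        except KeyError:
            # The group does not exist on this system.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, gid in group_ids().items():
        print(f"{name}: {gid if gid is not None else 'not found'}")
```

The printed numbers are what would go into group_add as quoted strings.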
The12th Unique (OP) • 5mo ago
I assume that's where you wanted me to add it
[image attachment]
Mraedis • 5mo ago
indeed
The12th Unique (OP) • 5mo ago
What I can't get out of my mind is that the error log still mentions "device_id". I got rid of that, but the error log still mentions it.
The12th Unique (OP) • 5mo ago
[image attachment]
Mraedis • 5mo ago
It mentioned that before you switched to it. My guess is there is a permissions issue preventing the GPU from being properly accessed, resulting in GPU=-1
The12th Unique (OP) • 5mo ago
Hmm, the error log at the top that I shared doesn't have the white error from now. I do agree it might just be that there are too many layers of virtualisation (Proxmox > LXC > Docker) causing permission issues.
I installed Immich on a VM first and it worked great, then realised adding a GPU for the ML loads might be a good idea, since the CPU is probably not the best for this stuff. That was 3 days ago, when I dropped $100 on the Quadro. First my wallet hurt, now my brain hurts.
Mraedis • 5mo ago
The joke is your iGPU probably would've worked fine 👀
Mraedis • 5mo ago
So to give you the real picture here: CPUs have their own accelerators, OpenVINO for instance. And outside of the initial processing, you don't really need anything but RAM, usually. Loading the model is what takes the longest. Using an external ML container for initial processing is very popular.
The12th Unique (OP) • 5mo ago
Yeah, I have been thinking of 2 options:
1. Use a spare disk and test run a desktop environment, probably Mint. Try running Immich on there, and if it works well then maybe I move my entire stack there: HAOS, SMB, Minecraft servers.
2. My main machine is an RTX 3060 laptop running Windows. Run the ML container on there? The web UI allows me to control when the jobs are running, and I can control when the container is running on the laptop.
Also to note: I tried sending through 7000 files and that started crashing the container (LXC). It seems I loaded the CPU too much, with all the ML stuff enabled, because I thought the GPU was working. But on the VM, the CPU seemed to handle it beautifully and didn't crash, even though the CPU was going ham the same way. This was before the GPU install.
Mraedis • 5mo ago
I ran my ML on Windows as well, took a good few hours but not days
The12th Unique (OP) • 5mo ago
Yeah, I did look into it. Nvidia and Microsoft have done some cool stuff to bring proper GPU access to WSL. There's just so much conflicting and unclear information in the docs.
Mraedis • 5mo ago
Our docs?
The12th Unique (OP) • 5mo ago
Oh no, developer.nvidia.com. I don't think I found anything about that in the Immich docs.
Paulie π“…‚β˜Ύπ“†
How are you passing through your GPU to the LXC container?
The12th Unique (OP) • 5mo ago
Well, technically I don't think an LXC container passes it through or gets direct access to the GPU. It lets the Proxmox host keep control of it and shares resources from there. So with a VM, one VM will have access to the GPU. With LXC containers, the host keeps the GPU but multiple containers can use it. I first tried a VM, but there I couldn't even get the GPU to be detected properly; with LXC it is detected, it just doesn't get used.
Paulie π“…‚β˜Ύπ“†
With LXC you still need to write rules in the LXC config file, and you'll likely have to mount the GPU. If you do ls /dev/nvidia* in the container, does anything show up?
The12th Unique (OP) • 4mo ago
@Paulie π“…‚β˜Ύπ“† my bad, I don't know how I missed this message, but my config for the LXC has all these things in it
The12th Unique (OP) • 4mo ago
[image attachment]
Paulie π“…‚β˜Ύπ“†
Please provide me with an ls /dev/nvidia* from both the Proxmox host and from inside the container
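Once both listings are in hand, the useful question is which device nodes the host has that the container lacks. A small sketch of that comparison, assuming the two ls outputs have been pasted in as text; the function names and the sample listings are hypothetical, not the OP's actual output.

```python
def device_paths(listing: str) -> set[str]:
    """Extract /dev/... paths from pasted `ls /dev/nvidia*` output."""
    return {
        token.rstrip(":")  # directory headers are printed as "/dev/name:"
        for token in listing.split()
        if token.startswith("/dev/")
    }


def missing_in_container(host_listing: str, container_listing: str) -> list[str]:
    """Return device paths the host shows that the container does not."""
    return sorted(device_paths(host_listing) - device_paths(container_listing))


if __name__ == "__main__":
    # Hypothetical listings for illustration only.
    host = "/dev/nvidia0  /dev/nvidiactl  /dev/nvidia-uvm  /dev/nvidia-uvm-tools"
    container = "/dev/nvidia0  /dev/nvidiactl"
    print(missing_in_container(host, container))
```

An empty result only means the same nodes are present on both sides; it says nothing about permissions on those nodes.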
The12th Unique (OP) • 4mo ago
This is the Immich Container
[image attachment]
The12th Unique (OP) • 4mo ago
That's Proxmox
[image attachment]
Paulie π“…‚β˜Ύπ“†
Just to test the passthrough, if you download ffmpeg within the container and try to transcode a video, does it do that properly?
The12th Unique (OP) • 4mo ago
I'll try that now
Paulie π“…‚β˜Ύπ“†
(Make sure to use the nvenc encoder to trigger GPU usage)
The12th Unique (OP) • 4mo ago
Yeah, I did that and it seems it failed in the container
[image attachment]
The12th Unique (OP) • 4mo ago
ffmpeg -hwaccel cuda -i P1066118.MOV test.mp4
That was my command
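The command above only requests CUDA-accelerated decoding. With no -c:v option, ffmpeg picks its default encoder for the output format, so nothing forces NVENC. A hedged sketch of a command line that names the encoder explicitly, assuming the installed ffmpeg build includes h264_nvenc; the helper function is made up for illustration.

```python
import shlex


def nvenc_transcode_command(src: str, dst: str, encoder: str = "h264_nvenc") -> list[str]:
    """Build an ffmpeg argv that decodes with CUDA and encodes with NVENC."""
    return [
        "ffmpeg",
        "-hwaccel", "cuda",  # hardware-accelerated decoding, as in the original command
        "-i", src,
        "-c:v", encoder,     # explicitly select the NVENC video encoder
        dst,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command rather than running it.
    print(shlex.join(nvenc_transcode_command("P1066118.MOV", "test.mp4")))
```

If the encoder cannot initialise, ffmpeg's own error output is usually the quickest pointer to what is missing.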
Paulie π“…‚β˜Ύπ“†
I just noticed you're passing through driver libraries from your host; is there any reason for this?
The12th Unique (OP) • 4mo ago
That's just what I found when searching online. The way I set it up was:
1. Got the driver file from Nvidia.com and ran it on the Proxmox host.
2. Ran the same file in the container, but with the parameter to exclude building the kernel modules, since it was erroring out when I tried to install the same file because the LXC was not detecting the driver at all.
Also, my reason for using the file from Nvidia.com was the version discrepancy between what the PVE host and the LXC were installing from apt. Plus, this way I assume there's less chance of breakage when updates are run.
Paulie π“…‚β˜Ύπ“†
The driver in the Debian repository is a bit older, it's from February 2024, but for something like Immich I wouldn't be chasing the newest version (especially with a Debian-based OS); I'd want stability instead. If I were you, I'd install the Nvidia driver from the apt repository, try this all again, and avoid mounting driver directories; just install nvidia-smi inside the LXC container and it will pull along any libraries it needs. Let me know if you need help. If I remember correctly, installing the Nvidia drivers through apt will undo the .run installation itself.
The12th Unique (OP) • 4mo ago
I get that. PVE was installing 570.20 and Immich was installing 570.06 something. In my mind it made sense to have the same driver versions, so I ran the .run. But do you mean the host should have the drivers or not? Isn't the container dependent on the host for the drivers?
Paulie π“…‚β˜Ύπ“†
Immich comes with its own driver blobs?
The12th Unique (OP) • 4mo ago
Wait a min, you mean to say the docker container should have the drivers in there?
Mraedis • 4mo ago
FYI new drivers often break ML, don't try to be clever here and stick with the "old" ones
Paulie π“…‚β˜Ύπ“†
Break ML or break Debian; there's no winning here. The issue is that installing Nvidia drivers from anything that is not the repository is a hole deeper than hell. I assume you mean old drivers break ML? Or am I reading it wrong and staying with old is good? I'd think: see if the repo driver works and, if so, don't change, right?
Mraedis • 4mo ago
I specifically said new 👀
The12th Unique (OP) • 4mo ago
Well, I have cleared out the driver install. Will reboot and try what's suggested here once the badblock test on one of my disks ends.
Paulie π“…‚β˜Ύπ“†
Make sure to get rid of the library directory mounts; that should not be necessary if you have nvidia-smi installed within the container.
The12th Unique (OP) • 4mo ago
I am so effin happy
[image attachment]
The12th Unique (OP) • 4mo ago
What I did was nothing short of a full system reset. Proxmox 9 was recently released, and I was still using Proxmox 8. I decided to install Proxmox 9 on a spare disk with none of my stuff on it besides the defaults, and try passing the GPU through to a Windows VM. WORKED RIGHT AWAY.
So I tried to recreate it on my actual boot disk but failed. I spent the day backing up and then restoring my containers and VMs to the new install, and finally, voila!!!
I don't know what was wrong in my old install that was messing the passthrough up so badly, but having attempted this so many times over the past 10 days, I have nailed down the process and know exactly what is needed.
Should it have taken this long? NO. Does this remind me of the covid lockdown days, when teenage me spent a lot of time understanding and figuring out how networking, reverse tunnels and Java work, so I could play Minecraft with friends? Absolutely.
The12th Unique (OP) • 4mo ago
Seems my video transcoding is failing now
[image attachment]
Paulie π“…‚β˜Ύπ“†
You're missing libcuda. Is this still an LXC container?
The12th Unique (OP) • 4mo ago
Nope, it's an Ubuntu Server VM now.
Paulie π“…‚β˜Ύπ“†
Make sure the GPU is blacklisted on the Proxmox host
The12th Unique (OP) • 4mo ago
Oh hmm, could that also be why I had to restart docker earlier? It had stopped detecting a CUDA device, according to the errors.
Paulie π“…‚β˜Ύπ“†
lspci -nn on the host and paste the output
The12th Unique (OP) • 4mo ago
[image attachment]
Paulie π“…‚β˜Ύπ“†
Sorry, -nnk instead of -nn
The12th Unique (OP) • 4mo ago
[image attachment]
Paulie π“…‚β˜Ύπ“†
Looks good, it's attached to vfio. And the GPU is properly visible in the Windows VM? In Device Manager?
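"Attached to vfio" refers to the "Kernel driver in use" line that lspci -nnk prints under each device. A rough sketch of reading that from pasted output, assuming the usual layout of an unindented device line followed by indented detail lines; the sample text and function names are illustrative, not the OP's actual output.

```python
import re


def kernel_drivers(lspci_output: str) -> dict:
    """Map each device header line to its 'Kernel driver in use' value (or None)."""
    drivers = {}
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Unindented line: start of a new PCI device entry.
            current = line.strip()
            drivers[current] = None
        elif current is not None:
            match = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
            if match:
                drivers[current] = match.group(1)
    return drivers


def nvidia_bound_to_vfio(lspci_output: str) -> bool:
    """True if at least one NVIDIA device is listed and all of them use vfio-pci."""
    nvidia = {dev: drv for dev, drv in kernel_drivers(lspci_output).items() if "NVIDIA" in dev}
    return bool(nvidia) and all(drv == "vfio-pci" for drv in nvidia.values())


# Illustrative sample only; real output has more fields and devices.
SAMPLE = """01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P1000]
\tKernel driver in use: vfio-pci
\tKernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller
\tKernel driver in use: vfio-pci
\tKernel modules: snd_hda_intel
"""

if __name__ == "__main__":
    print(nvidia_bound_to_vfio(SAMPLE))
```

The check covers the GPU's audio function too, since both functions typically need to be bound to vfio-pci for passthrough to a VM.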
The12th Unique (OP) • 4mo ago
Well, I have since gotten rid of the Windows VM, but it shows up nicely in my Ubuntu Server VM now. It's chugging through the files
[image attachment]
The12th Unique (OP) • 4mo ago
Just not sure why ffmpeg is misbehaving, especially considering what I have found: Immich uses the Jellyfin fork of ffmpeg, and I was gonna set up Jellyfin next.
Paulie π“…‚β˜Ύπ“†
Well, it says right in the output
The12th Unique (OP) • 4mo ago
The extends portion was commented out, let's see if it works now
[image attachment]
The12th Unique (OP) • 4mo ago
Alright, that was it. I guess I misplaced the file where I had that set already and used the wrong one
[image attachment]
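A commented-out extends block is easy to miss when several copies of the compose file are lying around. A minimal sketch that flags such lines in a compose file's text, using plain string matching rather than a YAML parser; the function name and the sample snippet are illustrative assumptions, not the OP's actual file.

```python
def commented_extends(compose_text: str) -> list[int]:
    """Return 1-based line numbers where an `extends:` key is commented out."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        stripped = line.strip()
        # Match lines such as "# extends:" or "#extends:".
        if stripped.startswith("#") and stripped.lstrip("# ").startswith("extends:"):
            hits.append(number)
    return hits


# Illustrative snippet only.
SAMPLE = """\
services:
  immich-server:
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: nvenc
"""

if __name__ == "__main__":
    print(commented_extends(SAMPLE))
```

Running it against each candidate copy of the file shows which one still has the block disabled.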
