ROCm on TrueNAS Scale

Hey everyone, I'm trying to get GPU acceleration on my TrueNAS Scale instance with a Ryzen 5 5600G. The machine learning container gives the following error: https://gist.github.com/Anulo2/89d08a19fb0094f4dcc4cf866cafe884
The current app setting uses the ROCm machine learning image. I've tried both HSA_OVERRIDE_GFX_VERSION 9.0.0 and 10.3.0; maybe some other version works? I don't know exactly how to find the version needed. I've also tried setting HSA_USE_SVM to 0. Both settings have been applied via additional environment variables.
Other than this the app runs fine; it just doesn't do machine learning on the GPU because of the error. I'm on Immich v1.132.3 (the latest version available on TrueNAS Scale).
31 Replies
_Zaizen_ (OP), 2w ago
Also, I want to add: I'm open to trying things out. I've got some experience with Docker and can provide logs if needed. I know the ROCm support is very new and expected to be buggy, so if I can help in any way to improve the experience for future users, that would be awesome.
Nicholas Flamy
I'll see if I can help. We might need to try a different override from this list: https://llvm.org/docs/AMDGPUUsage.html#processors
I would guess this is the same issue someone else has. It's probably caused by a bug in the TrueNAS App catalog scripts: https://github.com/truenas/apps/issues/2145#issuecomment-2830241891
That comment has a command you can run on the TrueNAS host. You'll either want to change the CPU and memory values in the command or just change them in the WebUI after running the command.
_Zaizen_ (OP), 2w ago
Uh, seems promising. I'll try to understand more of this later today. Meanwhile, thanks!
Alright, I've run:
midclt call -job app.update immich-2 '{"values": {"resources": {"limits": {"cpus": 2,"memory": 4096},"gpus": {"use_all_gpus": true, "nvidia_gpu_selection": {}, "kfd_device_exists": true}}}}'
but I get:
Total Progress: [__] 0.00%
Status: (none)
Total Progress: [__] 0.00%
[EINVAL] values.resources.gpus.kfd_device_exists: Field was not expected
Any suggestions?
_Zaizen_ (OP), 2w ago
As I understand it from the LLVM docs, I have to look at this:
[Attached image, no description]
So I should override with 9.0.c?
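For readers wondering how a target name from the LLVM processor table relates to the override value, here is a minimal sketch. It assumes the target name has the form gfx<major><minor><stepping>, where the last two characters are single hexadecimal digits, and that HSA_OVERRIDE_GFX_VERSION expects those three fields as decimal numbers separated by dots. The function name gfx_to_override is hypothetical and not part of ROCm or Immich.

```shell
#!/bin/sh
# Hypothetical helper: convert an LLVM target name (e.g. gfx90c) into a
# major.minor.stepping string. Assumption: the last two characters of the
# name are hex digits for minor and stepping, the rest is the major version.
gfx_to_override() {
    target="${1#gfx}"            # strip the "gfx" prefix, e.g. "90c"
    len=${#target}
    if [ "$len" -lt 3 ]; then
        echo "unexpected target name: $1" >&2
        return 1
    fi
    stepping_hex=$(printf '%s' "$target" | cut -c"$len")
    minor_hex=$(printf '%s' "$target" | cut -c"$((len - 1))")
    major=$(printf '%s' "$target" | cut -c"1-$((len - 2))")
    # printf converts the 0x-prefixed hex digits to decimal.
    printf '%s.%d.%d\n' "$major" "0x$minor_hex" "0x$stepping_hex"
}

# Example calls:
#   gfx_to_override gfx90c    prints 9.0.12
#   gfx_to_override gfx1030   prints 10.3.0
```

Under these assumptions a target like gfx90c would be written 9.0.12, not 9.0.c. Whether the ROCm build inside the image actually ships kernels for that exact target is a separate question, so a nearby value such as 9.0.0 may still be the one that works.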
Nicholas Flamy
What version of TrueNAS are you on? TBH I'm not sure but I'm pretty sure you can stick with 9.0.0
_Zaizen_ (OP), 2w ago
Latest stable available
Nicholas Flamy
I need the version number.
Nicholas Flamy
Let me know what version it says here.
[Attached image, no description]
_Zaizen_ (OP), 2w ago
ElectricEel-24.10.2.1
I updated a few days ago, but Immich has never actually worked with the GPU.
Nicholas Flamy
Send screenshots of going into the immich container shell and running ls -la /dev/dri and ls -la /dev/kfd
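The same check can be scripted instead of read off a screenshot. This is a rough sketch, assuming that ROCm inside the container needs both /dev/kfd and at least one render node under /dev/dri. The function name check_rocm_devices and its optional directory argument are hypothetical, added only to make the check easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: report whether the device nodes ROCm is assumed to
# need are visible. Pass a different root directory to test it elsewhere.
check_rocm_devices() {
    dev_root="${1:-/dev}"
    status=0

    # /dev/kfd is the compute interface exposed by the amdgpu kernel driver.
    if [ -e "$dev_root/kfd" ]; then
        echo "ok: $dev_root/kfd is present"
    else
        echo "missing: $dev_root/kfd"
        status=1
    fi

    # Render nodes are named renderD128, renderD129, and so on.
    found_render=0
    for node in "$dev_root"/dri/renderD*; do
        if [ -e "$node" ]; then
            echo "ok: $node is present"
            found_render=1
        fi
    done
    if [ "$found_render" -eq 0 ]; then
        echo "missing: no render node under $dev_root/dri"
        status=1
    fi

    return "$status"
}
```

Running check_rocm_devices with no argument in the machine learning container shell prints one line per device and returns a non-zero status if anything is missing.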
_Zaizen_ (OP), 2w ago
[Attached image, no description]
Nicholas Flamy
So it's passing the GPU but not the Kernel Fusion Driver device (/dev/kfd) needed for ROCm. Is your Immich app up to date? As in, do you see an update button in the TrueNAS apps interface? I ask because I'm surprised it says the field was not expected.
_Zaizen_ (OP), 2w ago
I'm on:
App Version: v1.132.3
Version: 1.7.42
But there's an update from today, I'm updating now.
Nicholas Flamy
Okay, that's probably not the issue. I'll try something on my system. I don't have an AMD GPU currently installed in my TrueNAS server but I'll try something.
_Zaizen_ (OP), 2w ago
If you want, I can also do voice + screen share if that helps you debug faster.
Nicholas Flamy
What happens if you run this on the host:
midclt call app.config immich | jq .resources
Can't.
_Zaizen_ (OP), 2w ago
{
  "gpus": {
    "nvidia_gpu_selection": {},
    "use_all_gpus": true
  },
  "limits": {
    "cpus": 2,
    "memory": 4096
  }
}
Nicholas Flamy
I got basically the same thing. I'll have to double check but it's possible the only solution is to update TrueNAS.
_Zaizen_ (OP), 2w ago
Wait, but I have already updated my TrueNAS. Do you mean going on the beta channel, to 25.04?
Nicholas Flamy
It's stable.
_Zaizen_ (OP), 2w ago
uuuh time to read the changes and update then!
Nicholas Flamy
I'm avoiding it because I'll have to reconfigure my VM. Otherwise I would upgrade.
_Zaizen_ (OP), 2w ago
I think I shouldn't have too many breaking changes, since I've already ditched the Kubernetes stuff.
Nicholas Flamy
Then you'll still have to run that command that sets the resources and it should work. Do you have any VMs?
_Zaizen_ (OP), 2w ago
Alright, thanks. I'll work on updating TrueNAS and let you know.
The VMs can all be discarded. Nothing important, just some testing VMs with unused stuff inside that I can reinstall quickly.
Nicholas Flamy
Oh okay.
_Zaizen_ (OP), 2w ago
Lucky for me, other than Immich, Jellyfin, Nextcloud and Pi-hole I don't run anything else. Just a bunch of storage (18TB * 3, RAIDZ1) :D
WORKS! Thank you so much!
Nicholas Flamy
Woohoo! You're welcome.
_Zaizen_ (OP), 2w ago
Now I just have to wait for my extremely slow connection to download the models haha
antelope done, now vit h