ROCm on TrueNAS Scale
Hey everyone, I'm trying to get GPU acceleration working on my TrueNAS Scale instance with a Ryzen 5 5600G.
The machine learning container gives the following error:
https://gist.github.com/Anulo2/89d08a19fb0094f4dcc4cf866cafe884
The current app settings use the ROCm machine learning image. I've tried both HSA_OVERRIDE_GFX_VERSION 9.0.0 and 10.3.0; maybe some other version works? I don't know how to find the exact version needed. I've also tried setting HSA_USE_SVM to 0. Both settings were applied via additional environment variables.
Other than this, the app runs fine. It just doesn't do machine learning on the GPU because of the error.
I'm on Immich v1.132.3 (the latest version available on TrueNAS Scale).
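A minimal sketch for confirming that the overrides actually reached the machine learning container, assuming you can open a shell inside it. The helper name show_hsa_env is made up for illustration; it only lists whatever HSA_* variables are present in the current environment.

```shell
# Hypothetical helper: list every HSA_* variable visible to this shell.
# Prints a fallback message when none are set.
show_hsa_env() {
  env | grep '^HSA_' || echo "no HSA_* variables set"
}
```

Run show_hsa_env inside the container shell; if the two variables from the post do not show up, the additional environment variables were not applied to that container.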
:wave: Hey @Zaizen,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Also I want to add: I'm open to trying things out. I've got some experience with Docker and can provide logs if needed. I know the ROCm support is very new and expected to be buggy, so if I can help in any way to improve the experience for future users, that would be awesome.
I'll see if I can help. We might need to try a different override from this list: https://llvm.org/docs/AMDGPUUsage.html#processors
I would guess this is the same issue someone else has.
It's probably caused by a bug in the TrueNAS app catalog scripts.
https://github.com/truenas/apps/issues/2145#issuecomment-2830241891
That comment has a command you can run on the TrueNAS host.
You'll either want to change the CPU and Memory values in the command or just change them in the WebUI after running the command.
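For anyone adapting that command, here is a rough sketch that builds the same payload quoted later in this thread with your own limits, and prints the command for review rather than running it. The function names are hypothetical; the payload fields and the app name are copied from the thread, not from TrueNAS documentation, and the memory value is assumed to use the same unit as the 4096 seen there.

```shell
# Build the JSON payload for `midclt call -job app.update`.
# $1 = CPU limit, $2 = memory limit (same unit as the value used in the thread).
build_gpu_payload() {
  printf '{"values": {"resources": {"limits": {"cpus": %s,"memory": %s},"gpus": {"use_all_gpus": true, "nvidia_gpu_selection": {}, "kfd_device_exists": true}}}}\n' "$1" "$2"
}

# Print the full command instead of executing it.
# $1 = app name as shown in the TrueNAS apps list, $2 = CPUs, $3 = memory.
print_update_command() {
  printf "midclt call -job app.update %s '%s'\n" "$1" "$(build_gpu_payload "$2" "$3")"
}
```

For example, print_update_command immich-2 2 4096 reproduces the command used in this thread; swap in your own app name and limits, check the output, then paste it on the TrueNAS host.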
uh seems promising, I'll try to understand more of this later today, meanwhile, thanks!
Alright, I've run
midclt call -job app.update immich-2 '{"values": {"resources": {"limits": {"cpus": 2,"memory": 4096},"gpus": {"use_all_gpus": true, "nvidia_gpu_selection": {}, "kfd_device_exists": true}}}}'
but:
Total Progress: [__] 0.00%Status: (none)
Total Progress: [__] 0.00%
[EINVAL] values.resources.gpus.kfd_device_exists: Field was not expected
Any suggestion?
As I understand it from LLVM I have to look at this:
So I should override with 9.0.c?

What version of TrueNAS are you on?
TBH I'm not certain, but I'm fairly sure you can stick with 9.0.0.
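On the 9.0.c question, a small sketch of how an LLVM target name splits into the three numbers the override takes. This assumes the last two characters of the name are hex digits for minor and stepping, and that the variable wants decimal components; neither is confirmed in this thread, and the advice here was simply to stay on 9.0.0. gfx_to_version is a hypothetical helper.

```shell
# Hypothetical helper: split a target name like gfx1030 into major.minor.stepping.
# Assumption: the last two characters are hex digits (minor, stepping) and
# whatever precedes them is the decimal major version.
gfx_to_version() (
  name="${1#gfx}"           # strip the "gfx" prefix, e.g. "1030"
  major="${name%??}"        # everything except the last two characters
  rest="${name#"$major"}"   # the last two characters
  minor="${rest%?}"         # first of the two
  step="${rest#?}"          # second of the two
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$step"
)

gfx_to_version gfx900    # prints 9.0.0
gfx_to_version gfx1030   # prints 10.3.0
gfx_to_version gfx90c    # prints 9.0.12
```

The point of the override is to pose as a target the runtime already supports, so a nearby supported value such as 9.0.0 can be a better choice than the literal decomposition of your own chip's name.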
Latest stable available
I need the version number.
Let me know what version it says here.

ElectricEel-24.10.2.1
I updated a few days ago, but Immich has never actually worked with the GPU.
Send screenshots of going into the Immich container shell and running ls -la /dev/dri and ls -la /dev/kfd

So it's passing the GPU (/dev/dri) but not the Kernel Fusion Driver device (/dev/kfd) needed for ROCm.
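The same check can be scripted. A minimal sketch, meant to be run inside the container shell: check_rocm_devices is a hypothetical helper, and the optional root argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: report whether /dev/dri and /dev/kfd exist.
# $1 (optional) = alternate root directory, for trying it out safely.
# Exit status is 0 only when both nodes are present.
check_rocm_devices() (
  root="${1:-}"
  status=0
  for node in dri kfd; do
    if [ -e "$root/dev/$node" ]; then
      echo "present: /dev/$node"
    else
      echo "missing: /dev/$node"
      status=1
    fi
  done
  [ "$status" -eq 0 ]
)
```

In the situation described here it would report /dev/dri as present and /dev/kfd as missing.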
Is your immich app up-to-date?
As in do you see an update button in the TrueNAS apps interface?
Because I'm surprised it says field was not expected.
I'm on:
App Version:
v1.132.3
Version:
1.7.42
but there's an update from today, I'm updating now
Okay, that's probably not the issue.
I'll try something on my system. I don't have an AMD GPU currently installed in my TrueNAS server but I'll try something.
If you want, I can also do voice + screen share if that helps you debug faster.
What happens if you run this on the host:
Can't.
{
"gpus": {
"nvidia_gpu_selection": {},
"use_all_gpus": true
},
"limits": {
"cpus": 2,
"memory": 4096
}
}
I got basically the same thing. I'll have to double check but it's possible the only solution is to update TrueNAS.
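A rough way to check output like the above, assuming the resources JSON is piped in on stdin (the command that produced it is not shown in the thread). has_kfd_flag is a hypothetical helper and simply greps for the field.

```shell
# Hypothetical helper: succeed if the JSON on stdin sets kfd_device_exists to true.
has_kfd_flag() {
  grep -Eq '"kfd_device_exists"[[:space:]]*:[[:space:]]*true'
}

# Example with the gpus block as pasted in the thread: the flag is absent.
printf '%s\n' '{"gpus": {"nvidia_gpu_selection": {}, "use_all_gpus": true}}' \
  | has_kfd_flag && echo "kfd flag set" || echo "kfd flag not set"
```

A plain grep is enough for a yes/no check on a single field; it does not validate the JSON.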
Wait, but I have already updated my TrueNAS. Do you mean going on the beta channel, to 25.04?

It's stable.
uuuh
time to read the changes and update then!
I'm avoiding it because I'll have to reconfigure my VM. Otherwise I would upgrade.
I think I shouldn't have too many breaking changes since I've already ditched the Kubernetes stuff.
Then you'll still have to run that command that sets the resources and it should work.
Do you have any VMs?
alright thanks, will work on updating truenas and let you know
they can all be discarded, nothing important
just some testing vms with unused stuff inside that I can reinstall quickly
Oh okay.
Lucky for me other than immich, jellyfin, nextcloud and pihole I don't run anything else
Just a bunch of storage (18TB * 3, raidz 1) :D
WORKS!
Thank you so much!
Woohoo!
You're welcome.
Now I just have to wait for my extremely slow connection to download the models haha
antelope done, now vit h