Face recognition issue
Face recognition has been running fine (if you believe the interface), but the People menu doesn't appear in the explore menu.
The ip:2283/api/person endpoint only returns [], and no thumbnails are shown.
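(For reference, you can query the same endpoint from the command line; the x-api-key header is an assumption about how your instance is authenticated:)
curl -H "x-api-key: YOUR_API_KEY" http://ip:2283/api/person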
Errors in the log of the machine learning container:
So this is an issue with the container failing to load the model
Can you perform these exact steps (commands sketched below the list)?
1. clear the queue
2. docker-compose down
3. docker-compose up
4. run the job
5. check /api/person
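For steps 2 and 3, run from the directory containing your Immich docker-compose.yml (a sketch; -d keeps the stack detached):
docker-compose down
docker-compose up -d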
What is your system spec?
OS? Architecture?
Running in a Proxmox CT, Ubuntu 22.04, 3 GB RAM, 2 CPUs
Not relevant to this, but you will need at least 5 GB of RAM for Immich for when all the ML jobs run
I'm running into that too.
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /cache/models/buffalo_l/1k3d68.onnx failed:Protobuf parsing failed.
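One way to check whether the cached model file is truncated (a sketch; immich_machine_learning is the default container name from Immich's docker-compose and may differ in your setup):
docker exec immich_machine_learning ls -lh /cache/models/buffalo_l/
A partially downloaded file will typically look suspiciously small.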

Try changing the CPU type to host
Still [] in the API. I can increase resources, even though that didn't look necessary

Maybe relevant: the pictures are in a folder mounted from a Synology NAS
That shouldn't be the issue; the error message shows that machine learning cannot load the facial recognition model
What does the processor configuration look like on your VM?

Sorry, I don't have that host option, where do you find it?

Clicking advanced
It's a container, not a VM
Ah
If it is a container then it should already be using the host processor
So that might not be relevant

Might be related to a resource issue
Will try again with 6 GB of RAM for the sake of completeness
Try increasing the RAM to 4 GB just to test it
even better
Cleared the queue;
Upgraded resources;
Docker compose down, docker compose up;
Waited a bit;
Restarted face recognition, but the API page is still blank 😦
Resources look fine

Ok try this
bring down all the containers
then
docker volume rm docker_model-cache
and bring up the stack
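Something like this as a full sequence (a sketch; the volume prefix depends on the directory your compose file lives in, so yours may not be docker_):
docker compose down
docker volume rm docker_model-cache
docker compose up -d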
it will trigger a re-download of the model
The rm was after a docker compose down, right?
I get:
Error response from daemon: get docker_model-cache: no such volume
Correct
try
docker volume ls
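For example, to find the exact name:
docker volume ls | grep model-cache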
I had to remove the memory limit and now it's happy.
Good to know, thank you!
can you rm the
immich-docker_model-cache
and immich_model-cache
volume?
You are running on k8s right?
From my testing, the machine learning container will need at least 5 GB of RAM to work correctly
Correct, and limits of neither 1 GB nor 2 GB of RAM were enough for ML.
So, I did:
docker volume rm immich-docker_model-cache
docker volume rm immich_model-cache
docker compose up -d
Restarted the job, but still an empty API page
What does the machine learning log say?
hold on, something is coming up
API starts to display data
Ok, now you should see something in the explore page
yes, I do indeed!
So in your case I suspect the model got corrupted while it was downloading
and CPU is working much harder now 😉

yeah it should be 😄
Anyway, thanks for the support! (and the great app by the way)
No problem! Enjoy!
Hi @Alex, so are you saying that Immich would need 5GB+ RAM?
Yes
The machine learning will use almost 4 GB of RAM when loading all the models into memory
That's good to know. I think the docs mention 2 GB and I was just about to get a 4 GB VPS or low-end dedicated server.

Yes, that was outdated; I forgot to update it
Does ML keep the models in memory all the time? Would I get away with 4 GB and some HDD swapping?
It does right now, since we haven't figured out the best way to unload them from RAM yet
What's the command to see that?
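(For reference, docker stats shows live per-container CPU and memory usage, which is one way to watch this:)
docker stats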
My RAM usage has been pretty stable around 3 GB all night.
Still 50% of my 45,000 assets to be recognized, but it's moving along (it takes much more time when it actually does something :-))

Yeah, 3 CPUs will take a while
I only have 4 and don't want to strangle the other apps on my machine. No problem, it's only the first catch-up that will be painful; daily uploads will go unnoticed, I'm sure.
First results seem promising, but I'll let it finish its job before playing around
Correct
I had a similar issue to the OP's when I first upgraded and ran Facial Recognition on all photos. I don't know if it's related or not, but I wanted to share what I saw. Since I was running the full job, there were 3 tasks going on in parallel, which meant the model was being downloaded from GitHub in each of those processes, and then once it was downloaded I assume each process attempted to use it.
My question is, would downloading and loading the model in parallel like that cause issues and contention? I feel like that could cause collisions with the loaded model state.
In my case, as soon as it downloaded the model and, I assume, tried to load and use it, it started throwing many errors. I immediately paused the job from the admin page and then cleared the jobs. I then restarted the Machine Learning service and tried running the job for all items again. This time, since the model was already downloaded, I didn't receive any errors and it completed successfully.