Machine learning container start.sh failed
Hey everyone, I’ve encountered an issue with Immich and could use some help. The smart search feature is not working, and when I checked the logs for the machine learning container, I found the following errors:
[FATAL tini (7)] exec ./start.sh failed: No such file or directory
[FATAL tini (7)] exec ./start.sh failed: No such file or directory
[FATAL tini (6)] exec ./start.sh failed: No such file or directory
[FATAL tini (8)] exec ./start.sh failed: No such file or directory
I decided to repull the machine learning container (because I update the Immich server this way), but I hadn’t updated any of the non-server containers since the initial setup in January 2025. After repulling the ML container, it stopped starting altogether, and I’m still seeing the same error messages in the logs.
I’m not super experienced with containers, so I’m a bit lost here. ChatGPT suggested recreating the ML container, but I’m afraid I might break something even more. Does anyone have advice on what I should do next? Thanks in advance!
:wave: Hey @plasma,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs
- Container Status: docker ps -a
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :blue_square: uploaded the relevant information (see below).
7. :blue_square: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
You missed a breaking change from a while ago
Please make sure you have read and followed the release notes: https://github.com/immich-app/immich/discussions?discussions_q=label%3Achangelog%3Abreaking-change+sort%3Adate_created
How come? I started with v1.124
I don't see any breaking changes afterwards.
Hmm
Can you post the info requested by the bot?
Where did you get your compose file @plasma ?
I guess from the instructions that I followed (mariushosting.com)
Marius likes to do things differently for whatever reason, best post your compose here and we'll see what the way forward is
Actually looks fine to me, pulling all containers should get it back up
I recreated all of the four containers with the option "re-pull image".
But the ML container still refuses to start, with the same log lines as above.
Alright, compose down, stop and delete the ML container, delete any cache volume it has (it's the docker volume). Delete the ML container image too.
Then re-pull the ML image
Be sure that you don't delete anything related to UPLOAD_LOCATION or DB_DATA_LOCATION
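For anyone following along, the reset described above can be written out as a reviewable plan before anything is run. This is only a rough sketch: the container, volume and image names are assumptions (check yours with docker ps -a, docker volume ls and docker image ls), and the final "up -d" step is an addition that is not part of the advice above. Nothing is executed; the commands are only printed.

```python
import shlex


def ml_cleanup_plan(container, cache_volume, image):
    """Return the docker commands for the ML reset, in order.

    Nothing is executed here; the caller reviews the plan first.
    """
    return [
        ["docker", "compose", "down"],
        # May report "no such container" if `compose down` already removed it.
        ["docker", "rm", container],
        # Model cache only. Never UPLOAD_LOCATION or DB_DATA_LOCATION.
        ["docker", "volume", "rm", cache_volume],
        ["docker", "image", "rm", image],
        ["docker", "compose", "pull"],
        # Assumption: recreate the stack afterwards.
        ["docker", "compose", "up", "-d"],
    ]


# Hypothetical names, replace with the ones from your own setup.
plan = ml_cleanup_plan(
    container="immich_machine_learning",
    cache_volume="immich_model-cache",
    image="ghcr.io/immich-app/immich-machine-learning:release",
)
for cmd in plan:
    print(shlex.join(cmd))
```

Printing the plan first makes it easy to confirm that no line touches the upload or database folders before running anything by hand.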
I did it like that. Removed the ML container, deleted the image and cache, repulled the image and created the container new.
The container now starts but is 'unhealthy'. The Immich-SERVER doesn't seem to reach it (fetch failed).
unhealthy is a known issue, it's actually the healthcheck that is missing
But not reaching it isn't the known issue 🙃
Alright. That's good and not good 🫣
In the ML container the log tells me the following:
[03/31/25 22:09:02] INFO Starting gunicorn 23.0.0
[03/31/25 22:09:02] INFO Listening at: http://[::]:3003 (8)
[03/31/25 22:09:02] INFO Using worker: immich_ml.config.CustomUvicornWorker
[03/31/25 22:09:02] INFO Booting worker with pid: 9
[03/31/25 22:09:06] INFO Started server process [9]
[03/31/25 22:09:06] INFO Waiting for application startup.
[03/31/25 22:09:06] INFO Created in-memory cache with unloading after 300s of inactivity.
[03/31/25 22:09:06] INFO Initialized request thread pool with 4 threads.
[03/31/25 22:09:06] INFO Application startup complete.
What's the ML URL in your admin settings?
and the fetch error is from the immich-server container I assume?
The full lines:
[Nest] 8 - 04/01/2025, 12:12:50 AM WARN [Microservices:MachineLearningRepository] Machine learning request to "http://immich-machine-learning:3003" failed: fetch failed
[Nest] 8 - 04/01/2025, 12:12:50 AM ERROR [Microservices:{"id":"241293c4-0dc0-48d4-8681-522d67143b06"}] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request '{"clip":{"visual":{"modelName":"ViT-B-32__openai"}}}' failed for all URLs
I have to leave now but do you mind telling about your platform a bit? Like VM, host stuff like that
@Mraedis @bo0tzz I have the same issue and get the same errors since today as @plasma and yeah, I think I definitely missed a breaking change.
I get my Compose from Cosmos' Market (Cosmos is an All-in-One self hosted service) https://azukaar.github.io/cosmos-servapps-official/servapps/Immich/cosmos-compose.json
Here is the Compose for Immich from Cosmos
Is there a way for me to re-do the breaking change (could you maybe tell me which one, if you know) or do I have to completely reinstall Immich?
Below the top {/if} there is a ] missing a , after it, so turn it into:
@Noah151
This has nothing to do with any of this topic though.
@Mraedis Thanks for the quick reply!
Oh uhm, may I ask why? Because I have the exact same problem as OP
My machine learning container keeps restarting with [FATAL tini (7)] exec ./start.sh failed: No such file or directory since today
There is no breaking change this version 👀
And your error message was
SyntaxError: JSON.parse: expected ',' or ']' after array element at line 16 column 7 of the JSON data
Sorry for not being clear, the syntax error is unrelated to my issue
I just pasted it here because you asked @plasma for the source of his Immich compose too
this is my issue
Ok but other than parsing the syntax I/we don't know anything about Cosmos
Okay, yeah I see
There are issues with nvidia hardware acceleration in the latest release, but I haven't seen the start.sh error anywhere else
But if we ignore Cosmos for a second:
Do you know whether it is possible to implement the breaking change which, according to @bo0tzz, leads to the error now, or do I have to completely reinstall my instance?
ah okay, now I am confused haha
bo0tzz meant a fairly old change, if you went through cosmos you definitely don't have it (and I can see from the compose that you don't)
But I think your issue might be the container image pulled badly?
I have been using Immich for almost 2 years, so I think I could definitely have missed it
Ah wait, my bad. I didn't post my actual Cosmos compose (Cosmos' own implementation of Docker Compose) but I posted the Compose you would get when pulling Immich from the Cosmos market right now
One sec, I'll paste my compose which is actually currently in use
Here
"entrypoint": "tini -- /bin/bash",But what's this?
"org.opencontainers.image.version": "v1.106.4"
Oh that's just a label, nevermind
This entire JSON reads like it needs an update badly
haha oh
Did it build from source? So confused
no, I don't think it did
I thought the line "command": "start.sh", is the culprit
Cosmos does a lot of stuff in the background for the user, it's just "find app in Cosmos' Market", "click install"
I'll take a look, thanks :))
It's not much, but that's the change bo0tzz meant
Looks like Cosmos fixed that in release 1.106.0 ...
https://github.com/azukaar/cosmos-servapps-official/commit/e3c304201e7b018f5c34f0b6107096a2b81cbda8
👀

Ah okay, thanks for finding it
I guess my Cosmos Compose didn't update for some reason
So do you think just making those changes to my Compose manually should fix my issue?
Were you actually running 1.130 ?
or 1.129 or any recent change 😛
yeah

If the database user/pass/location and upload location are the same, everything should ™️ work
Man, I am confused haha
About what I need to do now
Let me rephrase, immich doesn't care what happened to/in the containers, you could wipe it all, and just point a freshly installed operating system/docker+immich to the old folders and it should work (it probably won't because postgres won't like all those changes, but docker will be fine)
I really have to wonder whether you're not actually somehow running 2 installs though :
I hope I am not 😬
Okay, thx
First things first, do backups work for you
Because they were somewhat recently introduced, but with your compose weirdness... 👀
Should be over at UPLOAD_LOCATION/backups
Yeah, I think they do
Maybe I should just completely reinstall Immich and just restore one of my backups?
Ok just to be clear here, the backups are database dumps only, no images 😛
don't go wiping the image folder
Does cosmos allow you to stop the individual containers? You'll need only the DB running for a proper restore
yes, it does!
Figured I'd post here for anyone else looking but it seems to be an issue exclusive to cosmos and I'll post here if I find a solution :p
Ahh okay, thanks a lot for the info, I'd be really glad for an update if you can figure out a fix 🙂
Hopefully I can figure something out tonight :p
aight everyone on cosmos having issues go to your compose in the machine learning container and change the line with "command": to this:
@Noah151 @plasma @Gu11master ^
https://discord.com/channels/1083875833824944188/1083875835741741096/1356760927873138799
i think he said he had to remove some other stuff that had to do with start.sh but i only had to change the command line from start.sh to this python command. im on the cuda image so idk if theres much of a difference
Eyy, thanks so much, y'all!!
That fixed it! Just had to change the command line in the compose for the ML container to
"command": "python -m immich_ml",
like @Edge said :))
glad to hear its working now :3
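In case it helps anyone else, here is a rough sketch of that one-line change applied to a compose file loaded as JSON. The compose fragment, the "services" layout and the service name are assumptions for illustration and are not taken from a real Cosmos export, so adapt them to what your own file looks like.

```python
import json

NEW_COMMAND = "python -m immich_ml"


def patch_ml_command(compose, service_name):
    """Switch the named service away from start.sh; return True if changed."""
    service = compose["services"][service_name]
    if service.get("command") in ("start.sh", "./start.sh"):
        service["command"] = NEW_COMMAND
        return True
    return False


# Hypothetical, heavily trimmed compose; real files have many more keys.
compose = json.loads("""
{
  "services": {
    "immich-machine-learning": {
      "image": "ghcr.io/immich-app/immich-machine-learning:release",
      "command": "start.sh"
    }
  }
}
""")

changed = patch_ml_command(compose, "immich-machine-learning")
print(changed)
print(json.dumps(compose["services"]["immich-machine-learning"], indent=2))
```

The function only touches the service you name, so other containers that may still legitimately use their own start script are left alone.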
In portainer I already had 'python' '-m' 'immich_ml' in. I changed it to 'python -m immich_ml'
-> the container is now healthy
Thanks for the help!
But my Immich-SERVER can't reach the ML container still:
[Nest] 6 - 04/02/2025, 5:14:41 PM WARN [Microservices:MachineLearningRepository] Machine learning request to "http://immich-machine-learning:3003" failed: fetch failed
[Nest] 6 - 04/02/2025, 5:14:41 PM ERROR [Microservices:{"id":"8f6ef92f-8a95-4897-b337-c9121656093b"}] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request '{"clip":{"visual":{"modelName":"ViT-B-32__openai"}}}' failed for all URLs
I think the healthy part is with update 1.131.3 which fixed that @plasma 😛
As for the request, what do you get when you curl http://IP-for-docker-host:3003 from outside your docker host?
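If curl is not available on the machine you are testing from, roughly the same check can be done with Python's standard library. This is a minimal sketch: the address in the example call is a placeholder, and it assumes port 3003 is actually published on the docker host. Any HTTP answer, even an error status, means the network path to the container works.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return a short description of what an HTTP GET to `url` produced."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx; the network path works.
        return f"reachable: HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure and similar.
        return f"unreachable: {exc}"


# Placeholder address, replace 127.0.0.1 with the IP of your docker host.
print(probe("http://127.0.0.1:3003"))
```

An "unreachable" result from outside the host while the container is healthy would point at port publishing or networking rather than at the ML service itself.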