NAS completely froze overnight

Hi! I'm running Immich on my DS918+ with 8GB RAM, and tonight the entire system froze around 5AM; I had to hard-restart it. I already had a crash like this when I first started using Immich, after which I limited all jobs to run only 1 parallel and restricted the CPU cores each container can use a bit, so it doesn't hammer the CPU as much. RAM usage is usually around 35%. Unfortunately the NAS did not manage to save any RAM usage charts for this night (probably due to the crash/hang). The only pointer I have to why this could happen is the immich_server logs (in the next message).
26 Replies
Immich
Immich2w ago
:wave: Hey @Thunder, thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist: I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :blue_square: read applicable release notes.
3. :blue_square: reviewed the FAQs for known issues.
4. :blue_square: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;) If this ticket can be closed you can use the /close command, and re-open it later if needed.
Thunder
ThunderOP2w ago
logs for immich_server container for that point in time:
...
2025/05/11 10:27:17 stdout Starting microservices worker
2025/05/11 10:27:17 stdout Starting api worker
2025/05/11 10:27:04 stdout Detected CPU Cores: 4
2025/05/11 10:27:04 stdout Initializing Immich v1.132.3
2025/05/11 04:43:35 stderr Killing api process
2025/05/11 04:43:35 stderr microservices worker exited with code 1
2025/05/11 04:41:55 stderr at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:120:26)
2025/05/11 04:41:55 stderr microservices worker error: Error: getaddrinfo EAI_AGAIN database, stack: Error: getaddrinfo EAI_AGAIN database
2025/05/11 02:01:36 stdout [Nest] 7 - 05/11/2025, 2:01:36 AM  LOG [Microservices:BackupService] Database Backup Success
2025/05/11 02:00:00 stdout [Nest] 7 - 05/11/2025, 2:00:00 AM  LOG [Microservices:BackupService] Database Backup Starting. Database Version: 14
2025/05/11 00:00:05 stderr at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
2025/05/11 00:00:05 stderr at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
2025/05/11 00:00:05 stderr at async EventRepository.onEvent (/usr/src/app/dist/repositories/event.repository.js:126:13)
2025/05/11 00:00:05 stderr at async JobService.onJobStart (/usr/src/app/dist/services/job.service.js:166:28)
2025/05/11 00:00:05 stderr at async MediaService.handleGenerateThumbnails (/usr/src/app/dist/services/media.service.js:103:25)
2025/05/11 00:00:05 stderr at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2025/05/11 00:00:05 stderr ...
...
Thunder
ThunderOP2w ago
my docker-compose file is here. Note that I've also been running immich-power-tools for a few weeks now, but I don't see much in its logs that points to errors.
Thunder
ThunderOP2w ago
also notable: the system log of the nas from the time of the hang:
...
Info System 2025/05/11 10:25:46 SYSTEM System started to boot up.
Error System 2025/05/11 04:55:38 SYSTEM System failed to get External IP.
Info System 2025/05/11 04:36:02 SYSTEM USB disk [1] woke up from hibernation.
...
what would the next steps be to try and find out where this system hang could come from? Maybe it's not even related to Immich (but I've only had 2 crashes like this in the time I've had Immich on this system).
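One generic first step (not Immich-specific, and only a sketch): list which containers died and with what exit code, since that narrows down whether a container was killed before the hang.

```shell
# List all containers with their last status; an exit code of 137 usually
# means the process was SIGKILLed, often by the kernel OOM killer.
docker ps -a --format '{{.Names}}\t{{.Status}}' | grep -i 'exited' || true
```

For a specific container, `docker inspect <name> --format '{{.State.OOMKilled}}'` can confirm whether Docker recorded an OOM kill.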
NoMachine
NoMachine2w ago
looks like it lost connection to the DB around that time. Do you see anything in your postgres logs?
bbrendon
bbrendon2w ago
immich can be a resource hog when loading it up with images. Check your memory limits on Immich to make sure it doesn't crash your system. Maybe you need more memory.
Thunder
ThunderOP2w ago
the logs for that container don't contain much unfortunately:
immich_postgres
date stream content
2025/05/11 10:27:05 stderr 2025-05-11 08:27:05.721 UTC [1] HINT: Future log output will appear in directory "log".
2025/05/11 10:27:05 stderr 2025-05-11 08:27:05.721 UTC [1] LOG: redirecting log output to logging collector process
2025/05/11 10:27:05 stdout
2025/05/11 10:27:05 stdout PostgreSQL Database directory appears to contain a database; Skipping initialization
2025/05/11 10:27:05 stdout
2025/05/07 19:41:09 stderr 2025-05-07 17:41:09.137 UTC [1] HINT: Future log output will appear in directory "log".
2025/05/07 19:41:09 stderr 2025-05-07 17:41:09.137 UTC [1] LOG: redirecting log output to logging collector process
2025/05/07 19:41:08 stdout
2025/05/07 19:41:08 stdout PostgreSQL Database directory appears to contain a database; Skipping initialization
2025/05/07 19:41:08 stdout
can you explain what you mean with "check your memory limits on immich to make sure it doesn't crash your system."?
> maybe you need more memory.
unfortunately 8GB is the max officially supported amount of RAM for this NAS
Zeus
Zeus2w ago
just FYI: because of how Docker works, it should be impossible for the whole system to hang because of Immich. So likely it's just getting massively overloaded, or there's a hardware fault.
Thunder
ThunderOP2w ago
massively overloaded could be the case, tbh. I never really had issues with this NAS, but now with Immich I've managed to crash it twice. Also, redis has a warning in the logs when booting up:
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 * Server initialized
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 * Running mode=standalone, port=6379.
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.160 * monotonic clock: POSIX clock_gettime
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 # Warning: no config file specified, using the default config. In order to specify a config file use valkey-server /path/to/valkey.conf
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 * Valkey version=8.1.0, bits=64, commit=00000000, modified=0, pid=1, just started
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.152 * Background saving terminated with success
2025/05/11 04:40:57 stdout 6278:C 11 May 2025 02:40:57.082 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
2025/05/11 04:40:57 stdout 6278:C 11 May 2025 02:40:57.082 * DB saved on disk
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.051 * Background saving started by pid 6278
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.051 * 100 changes in 300 seconds. Saving...
> # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Zeus
Zeus2w ago
That doesn’t matter
Thunder
ThunderOP2w ago
kk
Zeus
Zeus2w ago
You could check the database logs which are in the logs folder of the database volume
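A sketch of doing that from the shell; the mount path below is a hypothetical example, so substitute whatever your compose file maps the postgres volume to:

```shell
# DB_DATA is an assumed path; on DSM it's typically somewhere under /volume1/docker.
DB_DATA="${DB_DATA:-/volume1/docker/immich/postgres}"
ls -lt "$DB_DATA/log" 2>/dev/null | head -n 5          # newest log files first
tail -n 50 "$DB_DATA/log"/*.log 2>/dev/null || true    # last lines of each log
```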
Thunder
ThunderOP2w ago
yeah, you mean the ones in the /immich/postgres/log folder? seems like some are empty there
Thunder
ThunderOP2w ago
(screenshot attached)
Thunder
ThunderOP2w ago
the latest logs look like this:
2025-05-11 08:27:05.721 UTC [1] LOG: starting PostgreSQL 14.10 (Debian 14.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-05-11 08:27:05.722 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-05-11 08:27:05.722 UTC [1] LOG: listening on IPv6 address "::", port 5432
2025-05-11 08:27:05.758 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-05-11 08:27:05.780 UTC [27] LOG: database system was interrupted; last known up at 2025-05-11 01:42:26 UTC
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100440".
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100442".
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100440/segments/66016ac2-6cd7-4f79-95f9-6b119b043678".
2025-05-11 08:27:05.931 UTC [27] LOG: database system was not properly shut down; automatic recovery in progress
2025-05-11 08:27:05.981 UTC [27] LOG: redo starts at 1/1122D1E0
2025-05-11 08:27:05.981 UTC [27] LOG: invalid record length at 1/1122D2C8: wanted 24, got 0
2025-05-11 08:27:05.981 UTC [27] LOG: redo done at 1/1122D290 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-05-11 08:27:06.276 UTC [1] LOG: database system is ready to accept connections
[2025-05-11T08:27:06Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100442/segments/f4744856-cfdc-40b9-bc44-7ad70a845bee".
bbrendon
bbrendon2w ago
docker supports setting memory limits
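A minimal sketch of what that could look like in docker-compose.yml; the service names and values here are illustrative assumptions, not recommendations, so check them against your own compose file:

```yaml
services:
  immich-server:
    mem_limit: 2g        # hard cap; the container is OOM-killed above this
  immich-machine-learning:
    mem_limit: 2g
  database:
    mem_limit: 1g
```

With a hard `mem_limit`, a runaway container gets killed instead of dragging the whole NAS down; `docker stats` can help pick values from observed peaks.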
Thunder
ThunderOP2w ago
I see, thanks, I'll look into it. Though from the history of the resource monitor, RAM usage doesn't seem to be very high, and CPU also seems alright to me. hm..
Thunder
ThunderOP2w ago
memory
(screenshot attached)
NoMachine
NoMachine2w ago
maybe try removing cpu_shares and cpuset and see what happens. I find it weird that a container is hanging the whole system, but maybe Synology's implementation is bugged.
Thunder
ThunderOP2w ago
cpu
(screenshot attached)
Thunder
ThunderOP2w ago
hm yeah, that was me trying to get the system to be more responsive when Immich is hammering it. My first crash was when I didn't have those settings; they managed to reduce CPU usage a bit and make the system more responsive again while ingesting images.
Checklist for myself:
- set sane memory limits for the containers, just in case
- investigate cpu_shares and cpuset: remove the settings, or alternatively leave a core completely free for the system, and test that
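The "leave a core completely free" variant could be sketched like this (illustrative values; the DS918+ has 4 cores, indexed 0-3, and the service name is an assumption):

```yaml
services:
  immich-server:
    cpuset: "0-2"      # pin to cores 0-2, leaving core 3 free for DSM itself
    cpu_shares: 512    # half the default weight of 1024; only matters under contention
```

Note that `cpuset` is a hard pin, while `cpu_shares` only takes effect when cores are contended.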
NoMachine
NoMachine2w ago
what about IO?
Thunder
ThunderOP2w ago
volume utilization: volume1 is hard drives (with SSD read cache), volume2 is SSDs (media is stored on the HDDs; thumbnails, database and so on are on the SSDs). Note that DSM automatically kicked off data scrubbing after the hard reboot, which is why the usage was so high today. Also note: this is total system load, so reads from nightly backups are also shown here.
(screenshot attached)
Zeus
Zeus2w ago
your database crashed / got OOM killed, it seems. It was able to recover on startup.
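If SSH access to the DSM shell is available, an OOM kill can usually be confirmed from the kernel log; a sketch (the syslog path is an assumption for DSM):

```shell
# Matches the lines the kernel writes when it kills a process for memory.
# On DSM you will likely need to run these as root.
pattern='invoked oom-killer|Out of memory|Killed process'
dmesg 2>/dev/null | grep -iE "$pattern" || true
grep -iE "$pattern" /var/log/messages 2>/dev/null | tail -n 10 || true
```

Keep in mind the ring buffer (`dmesg`) is cleared by a reboot, which is why the persisted syslog is worth checking too.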
bbrendon
bbrendon2w ago
in general, Linux mostly hangs because of memory. Synology is probably btrfs, so that could be causing it to crash, though I've only seen generic Linux kernels (not Synology) crash from btrfs.
Thunder
ThunderOP2w ago
yup, all volumes are set up as btrfs, so that sounds a bit like what you're describing. Is Docker memory usage perhaps not reported properly in the Synology activity monitor? I'll add some memory limits to my containers then.. are there some recommended values?
