NAS completely froze overnight

Hi! I'm running Immich on my DS918+ with 8GB RAM, and tonight the entire system froze around 5AM; I had to hard-restart it. I already had a crash like this when I first started using Immich, after which I limited all jobs to run only 1 parallel and restricted the CPU cores each container can use a bit, so it doesn't hammer the CPU as much. RAM usage is usually around 35%. Unfortunately the NAS did not manage to save any RAM usage charts for this night (probably due to the crash/hang). The only pointer I have to why this could happen is the immich_server logs (in the next message).
26 Replies
Immich
Immich2w ago
:wave: Hey @Thunder, thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.

References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Checklist: I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :blue_square: read applicable release notes.
3. :blue_square: reviewed the FAQs for known issues.
4. :blue_square: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;) If this ticket can be closed you can use the /close command, and re-open it later if needed.
Thunder
ThunderOP2w ago
logs for immich_server container for that point in time:
...
2025/05/11 10:27:17 stdout Starting microservices worker
2025/05/11 10:27:17 stdout Starting api worker
2025/05/11 10:27:04 stdout Detected CPU Cores: 4
2025/05/11 10:27:04 stdout Initializing Immich v1.132.3
2025/05/11 04:43:35 stderr Killing api process
2025/05/11 04:43:35 stderr microservices worker exited with code 1
2025/05/11 04:41:55 stderr at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:120:26)
2025/05/11 04:41:55 stderr microservices worker error: Error: getaddrinfo EAI_AGAIN database, stack: Error: getaddrinfo EAI_AGAIN database
2025/05/11 02:01:36 stdout [Nest] 7 - 05/11/2025, 2:01:36 AM  LOG [Microservices:BackupService] Database Backup Success
2025/05/11 02:00:00 stdout [Nest] 7 - 05/11/2025, 2:00:00 AM  LOG [Microservices:BackupService] Database Backup Starting. Database Version: 14
2025/05/11 00:00:05 stderr at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
2025/05/11 00:00:05 stderr at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
2025/05/11 00:00:05 stderr at async EventRepository.onEvent (/usr/src/app/dist/repositories/event.repository.js:126:13)
2025/05/11 00:00:05 stderr at async JobService.onJobStart (/usr/src/app/dist/services/job.service.js:166:28)
2025/05/11 00:00:05 stderr at async MediaService.handleGenerateThumbnails (/usr/src/app/dist/services/media.service.js:103:25)
2025/05/11 00:00:05 stderr at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2025/05/11 00:00:05 stderr ...
...
Thunder
ThunderOP2w ago
my docker-compose file is here. Note that I've also been running immich-power-tools for a few weeks now, but I don't see much in its logs that points to errors.
Thunder
ThunderOP2w ago
also notable: the system log of the nas from the time of the hang:
...
Info System 2025/05/11 10:25:46 SYSTEM System started to boot up.
Error System 2025/05/11 04:55:38 SYSTEM System failed to get External IP.
Info System 2025/05/11 04:36:02 SYSTEM USB disk [1] woke up from hibernation.
...
what would the next steps be to try and find out where this system hang could come from? Maybe it's not even related to Immich (but I've only had 2 crashes like this in the time I've had Immich on this system).
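One generic first step (not Immich-specific, and only a sketch): list which containers died and with what exit code, since that narrows down whether a container was killed before the hang.

```shell
# List all containers with their last status; an exit code of 137 usually
# means the process was SIGKILLed, often by the kernel OOM killer.
docker ps -a --format '{{.Names}}\t{{.Status}}' | grep -i 'exited' || true
```

For a specific container, `docker inspect <name> --format '{{.State.OOMKilled}}'` can confirm whether Docker recorded an OOM kill.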
NoMachine
NoMachine2w ago
looks like it lost connection to the DB around that time. Do you see anything in your postgres logs?
bbrendon
bbrendon2w ago
immich can be a resource hog when loading it up with images. Check your memory limits on Immich to make sure it doesn't crash your system. Maybe you need more memory.
Thunder
ThunderOP2w ago
the logs for that container don't contain much unfortunately:
immich_postgres
date stream content
2025/05/11 10:27:05 stderr 2025-05-11 08:27:05.721 UTC [1] HINT: Future log output will appear in directory "log".
2025/05/11 10:27:05 stderr 2025-05-11 08:27:05.721 UTC [1] LOG: redirecting log output to logging collector process
2025/05/11 10:27:05 stdout
2025/05/11 10:27:05 stdout PostgreSQL Database directory appears to contain a database; Skipping initialization
2025/05/11 10:27:05 stdout
2025/05/07 19:41:09 stderr 2025-05-07 17:41:09.137 UTC [1] HINT: Future log output will appear in directory "log".
2025/05/07 19:41:09 stderr 2025-05-07 17:41:09.137 UTC [1] LOG: redirecting log output to logging collector process
2025/05/07 19:41:08 stdout
2025/05/07 19:41:08 stdout PostgreSQL Database directory appears to contain a database; Skipping initialization
2025/05/07 19:41:08 stdout
can you explain what you mean with "check your memory limits on immich to make sure it doesn't crash your system."?
> maybe you need more memory.
unfortunately 8GB is the max officially supported amount of RAM for this NAS
Zeus
Zeus2w ago
just FYI: because of how Docker works, it should be impossible for the whole system to hang because of Immich. So likely it's just getting massively overloaded, or there's a hardware fault.
Thunder
ThunderOP2w ago
massively overloaded could be the case, tbh. I never really had issues with this NAS, but now with Immich I've managed to crash it twice. Also, redis has a warning in the logs when booting up:
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 * Server initialized
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.161 * Running mode=standalone, port=6379.
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.160 * monotonic clock: POSIX clock_gettime
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 # Warning: no config file specified, using the default config. In order to specify a config file use valkey-server /path/to/valkey.conf
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 * Valkey version=8.1.0, bits=64, commit=00000000, modified=0, pid=1, just started
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
2025/05/11 10:27:05 stdout 1:M 11 May 2025 08:27:05.159 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.152 * Background saving terminated with success
2025/05/11 04:40:57 stdout 6278:C 11 May 2025 02:40:57.082 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
2025/05/11 04:40:57 stdout 6278:C 11 May 2025 02:40:57.082 * DB saved on disk
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.051 * Background saving started by pid 6278
2025/05/11 04:40:57 stdout 1:M 11 May 2025 02:40:57.051 * 100 changes in 300 seconds. Saving...
> # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Zeus
Zeus2w ago
That doesn’t matter
Thunder
ThunderOP2w ago
kk
Zeus
Zeus2w ago
You could check the database logs which are in the logs folder of the database volume
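A sketch of doing that from the shell; the mount path below is a hypothetical example, so substitute whatever your compose file maps the postgres volume to:

```shell
# DB_DATA is an assumed path; on DSM it's typically somewhere under /volume1/docker.
DB_DATA="${DB_DATA:-/volume1/docker/immich/postgres}"
ls -lt "$DB_DATA/log" 2>/dev/null | head -n 5          # newest log files first
tail -n 50 "$DB_DATA/log"/*.log 2>/dev/null || true    # last lines of each log
```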
Thunder
ThunderOP2w ago
yeah, you mean the ones in the /immich/postgres/log folder? seems like some are empty there
Thunder
ThunderOP2w ago
(screenshot attached)
Thunder
ThunderOP2w ago
the latest logs look like this:
2025-05-11 08:27:05.721 UTC [1] LOG: starting PostgreSQL 14.10 (Debian 14.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-05-11 08:27:05.722 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-05-11 08:27:05.722 UTC [1] LOG: listening on IPv6 address "::", port 5432
2025-05-11 08:27:05.758 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-05-11 08:27:05.780 UTC [27] LOG: database system was interrupted; last known up at 2025-05-11 01:42:26 UTC
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100440".
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100442".
[2025-05-11T08:27:05Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100440/segments/66016ac2-6cd7-4f79-95f9-6b119b043678".
2025-05-11 08:27:05.931 UTC [27] LOG: database system was not properly shut down; automatic recovery in progress
2025-05-11 08:27:05.981 UTC [27] LOG: redo starts at 1/1122D1E0
2025-05-11 08:27:05.981 UTC [27] LOG: invalid record length at 1/1122D2C8: wanted 24, got 0
2025-05-11 08:27:05.981 UTC [27] LOG: redo done at 1/1122D290 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-05-11 08:27:06.276 UTC [1] LOG: database system is ready to accept connections
[2025-05-11T08:27:06Z INFO service::utils::clean] Find directory "pg_vectors/indexes/100442/segments/f4744856-cfdc-40b9-bc44-7ad70a845bee".
bbrendon
bbrendon2w ago
docker supports setting memory limits
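A minimal sketch of what that could look like in docker-compose.yml; the service names and values here are illustrative assumptions, not recommendations, so check them against your own compose file:

```yaml
services:
  immich-server:
    mem_limit: 2g        # hard cap; the container is OOM-killed above this
  immich-machine-learning:
    mem_limit: 2g
  database:
    mem_limit: 1g
```

With a hard `mem_limit`, a runaway container gets killed instead of dragging the whole NAS down; `docker stats` can help pick values from observed peaks.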
Thunder
ThunderOP2w ago
I see, thanks, I'll look into it. Though from the history of the resource monitor, RAM usage doesn't seem to be very high, and CPU also seems alright to me. hm..
Thunder
ThunderOP2w ago
memory
(screenshot attached)
NoMachine
NoMachine2w ago
maybe try removing cpu_shares and cpuset and see what happens. I find it weird that a container is hanging the whole system, but maybe Synology's implementation is bugged.
Thunder
ThunderOP2w ago
cpu
(screenshot attached)
Thunder
ThunderOP2w ago
hm yeah, that was me trying to get the system to be more responsive when Immich is hammering it. My first crash was when I didn't have those settings; they managed to reduce CPU usage a bit and make the system more responsive again while ingesting images.
Checklist for myself:
- set sane memory limits for the containers, just in case
- investigate cpu_shares and cpuset: remove the settings, or alternatively leave a core completely free for the system, and test that
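The "leave a core completely free" variant could be sketched like this (illustrative values; the DS918+ has 4 cores, indexed 0-3, and the service name is an assumption):

```yaml
services:
  immich-server:
    cpuset: "0-2"      # pin to cores 0-2, leaving core 3 free for DSM itself
    cpu_shares: 512    # half the default weight of 1024; only matters under contention
```

Note that `cpuset` is a hard pin, while `cpu_shares` only takes effect when cores are contended.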
NoMachine
NoMachine2w ago
what about IO?
Thunder
ThunderOP2w ago
volume utilization: volume1 is hard drives (with SSD read cache), volume2 is SSDs (media is stored on the HDDs; thumbnails, database and so on are on the SSDs). Note that DSM automatically kicked off data scrubbing after the hard reboot, which is why the usage was so high today. Also note: this is total system load, so reads from nightly backups are also shown here.
(screenshot attached)
Zeus
Zeus2w ago
your database crashed / got OOM killed, it seems. It was able to recover on startup.
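If SSH access to the DSM shell is available, an OOM kill can usually be confirmed from the kernel log; a sketch (the syslog path is an assumption for DSM):

```shell
# Matches the lines the kernel writes when it kills a process for memory.
# On DSM you will likely need to run these as root.
pattern='invoked oom-killer|Out of memory|Killed process'
dmesg 2>/dev/null | grep -iE "$pattern" || true
grep -iE "$pattern" /var/log/messages 2>/dev/null | tail -n 10 || true
```

Keep in mind the ring buffer (`dmesg`) is cleared by a reboot, which is why the persisted syslog is worth checking too.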
bbrendon
bbrendon2w ago
in general, Linux mostly hangs because of memory. Synology is probably btrfs, so that could be causing it to crash, though I've only seen generic Linux kernels (not Synology) crash from btrfs.
Thunder
ThunderOP2w ago
yup, all volumes are set up as btrfs, so that sounds a bit like what you're describing. Is Docker memory usage perhaps not reported properly in the Synology activity monitor? I'll add some memory limits to my containers then.. are there some recommended values?
