I am getting a database error after updating the docker compose stack

[Nest] 1 - 06/10/2023, 8:07:50 AM ERROR [TypeOrmModule] Unable to connect to the database. Retrying (2)... QueryFailedError: column "checksum" of relation "assets" contains null values at PostgresQueryRunner.query (/usr/src/app/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:211:19) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async RequireChecksumNotNull1684328185099.up (/usr/src/app/dist/apps/immich/libs/infra/src/migrations/1684328185099-RequireChecksumNotNull.js:10:9) at async MigrationExecutor.executePendingMigrations (/usr/src/app/node_modules/typeorm/migration/MigrationExecutor.js:225:17) at async DataSource.runMigrations (/usr/src/app/node_modules/typeorm/data-source/DataSource.js:260:35) at async DataSource.initialize (/usr/src/app/node_modules/typeorm/data-source/DataSource.js:148:17)
27 Replies
jrasm91
jrasm912y ago
GitHub
[BUG] "Unable to connect to the database" with the latest update 1....
The bug After the installation of the latest update (v1.58.0), my server is stuck in a bootloop. I had updated my compose with the new strings, but nothing works. The problem consists in two contai...
Foosinho
Foosinho2y ago
So I am having a very similar problem, but not exactly the same. The errors I'm getting after upgrading two days ago to the latest release. From my immich_server logs: QueryFailedError: null value in column "libraryId" of relation "assets" violates not-null constraint at PostgresQueryRunner.query (/usr/src/app/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:211:19) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async AddLocalDateTime1694525143117.up (/usr/src/app/dist/infra/migrations/1694525143117-AddLocalDateTime.js:10:9) at async MigrationExecutor.executePendingMigrations (/usr/src/app/node_modules/typeorm/migration/MigrationExecutor.js:225:17) at async DataSource.runMigrations (/usr/src/app/node_modules/typeorm/data-source/DataSource.js:260:35) at async DataSource.initialize (/usr/src/app/node_modules/typeorm/data-source/DataSource.js:148:17) From the immich_postgres logs: 2023-10-23 17:46:20.472 UTC [759] STATEMENT: UPDATE "assets" SET "localDateTime" = "fileCreatedAt" 2023-10-23 17:46:21.048 UTC [760] ERROR: null value in column "libraryId" of relation "assets" violates not-null constraint 2023-10-23 17:46:21.048 UTC [760] DETAIL: Failing row contains (43cd3093-06ab-413d-b48e-b1456bde49f7, 1000012859, 26c87ede-f893-405b-a784-d3b6bb4f9f70, 14827fda7b09786ff02c526019d56ec5d0e3b90c98ccbdf98817a8c6f26fd241, IMAGE, upload/library/26c87ede-f893-405b-a784-d3b6bb4f9f70/2023/2023-09..., upload/thumbs/26c87ede-f893-405b-a784-d3b6bb4f9f70/43cd3093-06ab..., 2023-09-25 15:13:55+00, 2023-09-25 15:13:58+00, f, 0:00:00.000000, upload/thumbs/26c87ede-f893-405b-a784-d3b6bb4f9f70/43cd3093-06ab..., null, \xe96960f2319262957a499fcfe111536b58b6acd3, t, null, 2023-09-25 15:17:45.27373+00, 2023-09-25 15:17:41.406338+00, f, PXL_20230925_151355856, null, f, \x1608121502fa886876a6788789887778c360170c45, f, null, f, null, 2023-09-25 15:13:55+00). Ideas? Didn't want to open a bug unless necessary.
jrasm91
jrasm912y ago
If you restart the immich-server container does it still happen?
Foosinho
Foosinho2y ago
Yep. Tried re-pulling everything and that doesn't work. Tried to modify the DB procedure mentioned in the referenced bug, and "libraryId" is not a valid column?
jrasm91
jrasm912y ago
Can you try running select * from "assets" where "libraryId" is null;?
Foosinho
Foosinho2y ago
Interesting: immich=# select * from "assets" where "libraryId" is null; ERROR: unexpected chunk number 1 (expected 0) for toast value 41849 in pg_toast_2619
jrasm91
jrasm912y ago
what is pg_toast?
Alex Tran
Alex Tran2y ago
do you know which version did you upgrade from?
Foosinho
Foosinho2y ago
Dunno. Doing some searching it appears I might need to reindex.
jrasm91
jrasm912y ago
Yeah, I'm seeing the same thing.
Foosinho
Foosinho2y ago
I'm not sure, Alex. Let me poke around and see if I can reconstruct that. I don't miss too many updates, but usually it takes me opening the mobile client to be prompted and I don't do that all the time.
jrasm91
jrasm912y ago
It sounds like there might be some invalid data somewhere. In the postgres index or something.
Foosinho
Foosinho2y ago
Well, I can confirm I'm running 1.82.1. I thought I was on a 1.81 version, but I'm not positive. In fact, looking at the tags and the lag in photo uploads it might have been a 1.7x. But I'm not sure how to tell for certain. I'm going to pull a backup of the VM image to see what version I was on. I was indeed on 1.81.1. And I'm running the VM as it was right before I updated, so I can attempt to do a different upgrade path than just pulling latest if that's desired. I still have the updated VM as well. Spun up a new one to pull the backup on to.
jrasm91
jrasm912y ago
The error you have is very strange. Can you try just shutting everything down, then just start the database and immich-server containers and see if it is successful this time?
Foosinho
Foosinho2y ago
OK, this is super interesting! I'm getting the exact same pg_toast error in the restored VM. The DB container volume is not on the NAS share where the photos are stored, so it was definitely restored to pre-update. All of the containers are running; after upgrading the server and... microservices(?) were in a reboot loop. Everything up and running now, but I'm getting these unexpected chunk number errors on lots of actions and the errors were showing up when the machine backup image was taken. 2023-10-15 06:38:05.727 UTC [13338] ERROR: unexpected chunk number 1 (expected 0) for toast value 41849 in pg_toast_2619 2023-10-15 06:38:05.727 UTC [13338] CONTEXT: automatic analyze of table "immich.public.assets" I'm going to take everything down and then just pull up the DB and server containers and see what we get Database container comes up clean: PostgreSQL Database directory appears to contain a database; Skipping initialization 2023-10-23 20:31:28.972 UTC [1] LOG: starting PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit 2023-10-23 20:31:28.972 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432 2023-10-23 20:31:28.972 UTC [1] LOG: listening on IPv6 address "::", port 5432 2023-10-23 20:31:29.603 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2023-10-23 20:31:30.049 UTC [27] LOG: database system was shut down at 2023-10-23 20:30:45 UTC 2023-10-23 20:31:30.941 UTC [1] LOG: database system is ready to accept connections Server comes up and complains a lot about not finding the redis server. No log messages about database problems; no more entries in the DB logs.
jrasm91
jrasm912y ago
Right, but this I think this is saying something about data inconsistency, specifically about data missing from indices or something.
Alex Tran
Alex Tran2y ago
If you remove the redis container and bring it up along with the stack, does it show any error message?
jrasm91
jrasm912y ago
Seems like this is what is happening.
Foosinho
Foosinho2y ago
I agree. It seems pretty clear (now) that this is a data corruption issue and likely should've been a new case as it's not that similar underneath to the root of this one. Oops. Now the server is just complaining about not finding typesense. Launching the typesense container got the server to the point where it is listening for connections. Since it hasn't hit the database, no corruption log entries yet. I'm going to have to figure out how to check for this in my monitoring, or a least run automated DB backups.
jrasm91
jrasm912y ago
So, how do you do database backups? Is it a filesystem level back, like the whole vm?
Foosinho
Foosinho2y ago
Right now, entire VM. I do weekly backups on ProxMox. Those are... not great, when it comes to affordable cloud replication, so I'm starting to migrate my entire homelab stack to store docker volumes on the NAS, where I can snapshot them. Might also just do weekly dumps? Unsure at this point. I think I'd have to one-by-one restore old backups until I found a non-corrupt database (minimum 15 minutes to restore each backup on ProxMox, plus then testing time), which sounds exceptionally painful. So monitoring to catch corruption when it happens again would be nice. It seems like I would have seen this if I opened the client regularly, but I mostly just use it to back up my phone quietly and UptimeKuma did not notice this problem. Might be easier to just start from scratch and re-import my phone camera roll, recreate any albums, and import photos from other sources manually. I'm running this in parallel to Google Photos at the moment. Still, would like to find a way to fix this if possible.
Alex Tran
Alex Tran2y ago
Do you use Promox backup server?
Foosinho
Foosinho2y ago
I run per-VM/LXC backup tasks every week.
jrasm91
jrasm912y ago
This would not happen or be a problem with pgdumpall btw
Foosinho
Foosinho2y ago
And I didn't have to go back very far to find a backup of the LXC that didn't have a corrupt DB. Just two weeks. Running 1.79.1. Could log in via the web and see everything working, and the logs looked clean. Brought everything down, did a docker-compose pull, and that appears to have brought everything back up to a working 1.82.1. So I don't know exactly what caused the DB corruption, but it clearly wasn't the 1.82.1 upgrade. Perhaps it was the 1.81.1 upgrade and I just didn't notice it then? I think I'll do regular pgdumpall operations so I don't need to restore the entire VM from a backup for future corruption.
Foosinho
Foosinho2y ago
For completeness for anyone else who searches this, there is a pretty good guide on how to do this already in the documentation: https://immich.app/docs/administration/backup-and-restore

Did you find this page helpful?