Corrupt database - any options? (Solution: use the backups Immich already saves by default)
Evidently my DB got corrupted at some point in the last two months, and now I'm stuck trying to figure out if there's any hope of recovery or if I need to rebuild from scratch
I was previously on (I think) 1.126, and had tried to update to 1.129 over the weekend, only to find that the main container wouldn't finish connecting to the DB container, and that the DB container logs showed it had been failing to fix some issues for quite some time. (The app itself never notified me that it couldn't connect to the server; that may be a separate feature request.)
Logs in the thread:
:wave: Hey @HashtagOctothorp,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :blue_square: verified I'm on the latest release (note that mobile app releases may take some time).
2. :blue_square: read applicable release notes.
3. :blue_square: reviewed the FAQs for known issues.
4. :blue_square: reviewed GitHub for known issues.
5. :blue_square: tried accessing Immich via local IP (without a custom reverse proxy).
6. :blue_square: uploaded the relevant information (see below).
7. :blue_square: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
Do you have storage-level snapshots from (over) 2 months ago?
There might be an option to roll back changes on Synology's software side.
Logs on the main container that clued me into it:
The DB docker container was also missing an IP, so I started digging into the logs, pulling a few interesting ones:
Most of the logs from the last several weeks look like the 03-09 logs, with these missing file logs.
Then the first notice of an error, in the 03-15 logs, with the issue:
No I don't have any storage-level backups of the container
I'm not exactly sure how experienced you are. How did you determine the db to be corrupted?
All the subsequent logs
Did you in any way touch the setup during the last 2-3 days?
Minus the "rebuild stack" option, attempting to get the latest version, no
"Rebuild the stack" is a Synology-specific thing? hm
No, portainer

If you don't have a backup, the only option is to do a full re-upload of all images
Can I reuse the images already in the location?
You can re-upload them or add an external library
Are there docs for recommendations on backup patterns?
My 2 clues are networking problems between the immich and db containers, and possible incompatibilities between immich and db software versions, and/or incomplete migrations.
My gut says Immich was probing the db, it failed a healthcheck, and the db got into a restart loop.
Either fs-level backups work fine, or Immich has built-in db (metadata-only) dumping (which you must combine with images from the uploads dir).
https://immich.app/docs/administration/backup-and-restore/
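in practice, the thing to copy off-box is the dump folder plus the originals; a minimal sketch, assuming the default layout where the automatic dumps land in a backups/ folder inside the upload location (paths here are illustrative, adjust to your setup):
```bash
# illustrative paths; point these at your actual UPLOAD_LOCATION and backup target
UPLOAD_LOCATION=/volume1/immich/library
DEST=/volume1/backup/immich

# the DB dumps are tiny; the originals (upload/, library/, profile/) are the bulk
rsync -a --delete "$UPLOAD_LOCATION/backups/" "$DEST/backups/"
rsync -a "$UPLOAD_LOCATION/upload/"  "$DEST/upload/"
rsync -a "$UPLOAD_LOCATION/library/" "$DEST/library/"
rsync -a "$UPLOAD_LOCATION/profile/" "$DEST/profile/"
```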
(in addition) Keeping track of container SHAs/versions is usually not needed, but increases sanity.
there was one log that seemed interesting too:
maybe it got shutdown mid-upgrade or something?
Not sure how I might have done that though, I never forced-shutdown any of the containers.
lemme look it on a computer
might be the pg-vect sha is pinned to an old version
Prior to any LLM-prompted troubleshooting, I duplicated the postgres folder, hoping that would help any recovery efforts, if that matters.
I have re-initialized with an empty folder to speed up my potential future of "manually recreate/reupload everything", and will be more diligent about backups moving forward.
meaning you want no further help and will recreate?
You can back up the dumps however you want, such as with Borg or restic
No, I'd like to recover if possible
You should keep at least a year's worth, with interval pruning
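e.g. with restic, something along these lines (repo path and retention policy are illustrative, not the one true way):
```bash
# one-time: create the repository (restic asks for, or reads, RESTIC_PASSWORD)
restic -r /volume1/backup/immich-restic init

# recurring: back up the auto-generated DB dumps plus the originals
restic -r /volume1/backup/immich-restic backup \
    /volume1/immich/library/backups /volume1/immich/library/upload

# keep roughly a year of history and prune whatever falls outside the policy
restic -r /volume1/backup/immich-restic forget --prune \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 12
```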
(Using the old data, not newly-initted), can you run only the database container with the healthcheck test removed?
with the goal being getting the database up before asking it to do anything (database system is ready to accept connections)
checking.
if it is anything close to being up, you can probably dump it (portainer is docker afaik?) somewhere:
docker compose exec database pg_dump immich > olddump.db
https://www.postgresql.org/docs/current/backup-dump.html
actually, Q: how would I remove the healthcheck on the existing container?
modifying the compose file, commenting the 'healthcheck:' portion out
ah, then recreating the stack
although disclaimer, I don't know anything about portainer
according to the screenshot above, recreating the stack sounds like docker compose pull && docker compose up -d
If you can modify the compose file in-between, then I guess, yes
yeah, that button is on the editor page
remove the entire healthcheck section?
yes, you can just comment it out (adding # to the beginning of the lines)
while you are at it you can probably comment out the immich: (container) section as well, which would ensure that immich doesn't try to access it meanwhile
this part?

yes
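then, if you run it from a shell rather than through portainer, something like this is enough to bring just the db up and watch whether it settles (service name assumed to be database, as in the default compose):
```bash
# start only the database service, with the healthcheck and immich-server commented out
docker compose up -d database
docker compose logs -f database   # look for "database system is ready to accept connections"
docker compose ps                 # make sure it isn't stuck in a restart loop
```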
worth adding DB_SKIP_MIGRATIONS=true while I'm in here?
doesn't matter right now, since immich won't be running anyway (and thus will not migrate)
but to ask again, you are sure you have 0 snapshots for containers?
that sounds like an unreasonable default to have on synology, the set-it-and-everything's-backed-up appliance (:
I setup the stack through portainer
and probably don't have any backup stuff setup
herp derp
and the folder locations are on-disk rather than docker volume
yeah, but on-disk where?
on the synology NAS, that's running the containers
I'd expect the whole synology fs to be snapshotted (no clue where you'd check-restore backups, haven't had the need to touch synology either)
hm
let me check
if you're talking about this: ... I don't have it set up

I don't think synology sets this up during setup by default. Either that or I missed it.
what, are synology boxes non-resilient to ransomware by default?
seems like. Which is ironic, since being ransomwared (long story from a younger, dumber me) was the primary push to buy the NAS in the first place
anything under 'backup tasks'?
the docs for synology are quite questionable as well, at least when I don't have a synology to play around with
can't find anything re backup
Hyper Backup and Snapshot Replication
¯\_(ツ)_/¯
also, not installed

:^)
🤡
that's solid dumb that there are no snapshots by default anywhere
ok, well
let's see if the db container is alive on some terms
ah yes. TLDR, prob not
previously I'd tried setting a recovery.signal file, as per AI debugging recommendation. It didn't work.
well, there's probably not much recovery unless you go surgical :)
can you access /volume1/immich/postgres from a shell somewhere?
do you know what a shell is?
yeppers
I have SSH keys setup to the NAS, don't worry
the internet says that you may be able to recover with partial metadata loss and/or broken state with pg_resetwal /volume1/immich/postgres
for pg_resetwal you have to have postgres (or only the tools) installed on the host
I think there's a pg package I can install
but the risk is we end up with an immich instance which will break down the road
probably the main things you have are album organizations? otherwise just recreating the instance and re-uploading (with say immich-go) all the files from the old instance's upload directory
album was one of them, yeah.
oh wait
do you have the app
yep
or can you open it in a browser where you have opened it before
it might still have offline data
so you could list what you have in albums
although especially on a phone, it is probably very inconvenient
OR if we (doesn't hurt) get the db online (even if it doesn't work with Immich) we can do it much more conveniently
as in, use the phone app to recreate the albums while reimporting the images from the folder?
as in, use the phone app to view what the server used to be (the phone doesn't know the server is dead, it has cached its state, at least filenames, in some amount, probably)
but try the pg_resetwal thing
see what pg thinks of it
and it'd be smart, while running the resetwal command, to make sure the db container isn't running (and restarting) at the same time
oh neat... it seems postgres is already native on the NAS OS.
do you think the version matters?
pg_resetwal (PostgreSQL) 11.11
Holy yes do not use that
lmao
welp. presumably I could copy this folder anywhere, run the commands, and pull it back? going to make a linux vm
gotta do some real work tho, will pick this up tonight
or you can modify the docker container start command (entrypoint is what you need)
so that the container just starts without starting postgres in it
command part, commented out... then get into the container shell?
yes but no
if you comment it out, it will be the default command and still start pg
ah hmm
set command to 'sleep infinity' for example
or set it straight to pg_resetwal /var/lib/postgresql/data and make sure restart is "no" (with quotes)
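so the whole dance, roughly (a sketch only; service name and data dir assumed to be the defaults, and check who owns the cluster files inside the container before running anything):
```bash
# 1) in the compose file, make the database service idle instead of starting postgres:
#      command: ["sleep", "infinity"]
#      restart: "no"
#    then recreate just that service
docker compose up -d database

# 2) check which OS user owns the data dir, then reset the WAL as that user
docker compose exec database ls -l /var/lib/postgresql/data
docker compose exec database gosu postgres pg_resetwal /var/lib/postgresql/data

# 3) put the original command/restart values back and start the db normally again
```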
sleep worked. I'm in the container
hmm. know the pg user by chance?
immich
pg_resetwal -U immich ...
(the user is set in your env from above)
invalid option -- 'U'
oh, as the command
hmm...
hmm, weird
ah wait
maybe?
internet says
gosu immich pg_resetwal /var/lib/postgresql/data
still no user immich
I think immich is the PG user, not the OS user
there's a postgres user tho
gunna try that
sorry, I didn't double-check
above is DB_USERNAME=postgres, not immich :^)
lmao
that's what's in all the default .env files no?
so using 'postgres' instead of 'immich' here then
yep
Write-ahead log reset
🤞
I have mine set to something totally different, out of an old habit to prevent multiple DBs getting mixed up between multiple instances ¯\_(ツ)_/¯
well, try booting the database container up
makes sense
recreating with default command, one sec, unless you want me to run the command manually
all good, recreate it (but keep immich-server commented)
meanwhile on my end :D

(i have very nice multiple backups and fs snapshots; still it refuses, seems to be a problem of uptime-kuma cross-machine offline migrations (my own troubleshooting, unrelated))
V2 supports a real DB (sadly Maria but better than SQLite)
sqlite is a real db!! :D
and I'm still in wonder how cockroachdb is about the only one with a db that just stays up, just stays healthy, just heals itself, you don't have to hold its hand for upgrades or anything
sad part is its licensing makes it unattractive to take up in any project that's not a new corporate we-will-buy-it-in choice
i've had most problems with maria-mysql (not mentioning msdb, that's just unreasonable :D ft. upgrade migrations), pg and sqlite have been so-so
mongo tends to be bad, redis tends to be ok, but abused by developers :D
there is one other db, i don't remember the name of atm, and can't check :/
@HashtagOctothorp how is it going, do you have anything up or logs?
hmmm...
main container:
But still "unhealthy", while the DB container says "starting" but has normal log output
main container was supposed to be down? :)
but ok
oh sorry
the db itself doesn't produce any more logs other than what it said?
I didn't realize you wanted me to leave the main container down when booting the pg container
nope
can you dump the db, if you know how? (before proceeding)
no worries, it's just one step at a time, usually safer and saner
from here?
that's restoring the db, afaik
we want to back it up
we don't have a backup
but if something goes wrong (mainly dumb human mistakes) in the next 24h, we have something to come back to
so answer is no
🤦♂️


ig should be docker compose exec database pg_dump -U postgres immich > mydump.xyz
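and it doesn't hurt to compress it and sanity-check that the file actually contains SQL, roughly:
```bash
# -T keeps the pipe clean; service/db/user names as in this thread's compose
docker compose exec -T database pg_dump -U postgres immich | gzip > olddump.sql.gz
zcat olddump.sql.gz | head -n 20   # should start with a "-- PostgreSQL database dump" header
```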
ah
well
:D
you can make the backup anyway
but this looks a lot greener
i thought you didn't have any of those
yall went straight to synology snapshots
well, losing the WAL shouldn't be that big of a deal, but it might be smarter to recover anyway
after backing up our current state, I'd start restoring the db backups from newest and testing; if that fails, then an older copy
(after backing up) then yes
except that immich_postgres is called database in your case
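the shape of the restore would be roughly what the backup-and-restore page linked earlier describes, just with your service name; a sketch, assuming DB_USERNAME=postgres and one of the auto-generated .sql.gz dumps (filename/path illustrative, double-check the docs for your version):
```bash
docker compose down
# move the old (post-resetwal) postgres data dir aside rather than deleting it, just in case
mv /volume1/immich/postgres /volume1/immich/postgres.broken
docker compose up -d database
sleep 10   # give the fresh, empty cluster a moment to initialise
gunzip < /volume1/immich/library/backups/<newest-dump>.sql.gz \
    | docker compose exec -T database psql -U postgres -d postgres
docker compose up -d   # then bring the rest of the stack back up
```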
kk. I'm going to pause for now because I REALLY need to do some work-work.
will pick up tonight after the kids are down.
but this is promising.
yup, jfyi i might not be available, quite late atm and I'm planning to eat (smh cooking like 3rd or 4th time in 24h, I question where I can fit it all, quite skinny)
Sounds like me in my youth... I was eating ~6000 calories a day and barely keeping the weight on.
It's gunna be an 8-9 hour break XD
kids → make it 3d-7d
fml lol

BUT! The backup from the existing auto-generated backups... *chef's kiss*
*hacker voice* I'm in.
Thanks for your help guys.
This thread has been closed. To re-open, use the button below.
lol, totally forgot about discord existing meanwhile