Unable to join new cnpg instance
Good Morning,
I know you guys aren't CloudNativePG or pgvecto.rs support, but I'm having an issue that appears to stem from pgvecto.rs, and you all are way more knowledgeable in this area than I am. I also may not be the first or last of your users to encounter this.
My setup:
K8S
CloudNativePG running PostgreSQL 16.4, using the tensorchord cloudnative-pgvecto.rs image with pgvecto.rs 0.3.0
I recently replaced a node in my cluster, and CloudNativePG began the join process to replace the Postgres instance that was removed during the swap. However, it now produces an error indicating that the backup fails because a file name is too long for the tar format.
I've tried backing down to 16.3, but there's no change in behavior; it seems like this might be tied to the new pgvecto.rs.
I'll paste the log as an additional comment due to length limits.
:wave: Hey @Zilch,
Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker compose ps (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
Checklist
1. :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: I have read applicable release notes.
3. :ballot_box_with_check: I have reviewed the FAQs for known issues.
4. :ballot_box_with_check: I have reviewed Github for known issues.
5. :ballot_box_with_check: I have tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
7. :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
If this ticket can be closed, you can use the /close command, and re-open it later if needed.
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
{"level":"info","ts":"2024-08-22T14:36:38Z","logger":"pg_basebackup","msg":"WARNING: aborting backup due to backend exiting before pg_backup_stop was called","pipe":"stderr","logging_pod":"postgres16-8-join"}
{"level":"info","ts":"2024-08-22T14:36:38Z","logger":"pg_basebackup","msg":"pg_basebackup: error: backup failed: ERROR: file name too long for tar format: \"pg_vectors/indexes/0000000000000000000000000000000065de7f3829e7a01800096f010011f2ad/segments/4378fbe3-644b-4937-8671-86878244ed2c\"","pipe":"stderr","logging_pod":"postgres16-8-join"}
{"level":"info","ts":"2024-08-22T14:36:38Z","logger":"pg_basebackup","msg":"pg_basebackup: removing data directory \"/var/lib/postgresql/data/pgdata\"","pipe":"stderr","logging_pod":"postgres16-8-join"}
{"level":"error","ts":"2024-08-22T14:36:38Z","msg":"Error joining node","logging_pod":"postgres16-8-join","error":"error in pg_basebackup, exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/log.Error\n\tpkg/management/log/log.go:163\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/join.joinSubCommand\n\tinternal/cmd/manager/instance/join/cmd.go:139\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/join.NewCmd.func2\n\tinternal/cmd/manager/instance/join/cmd.go:72\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/[email protected]/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/[email protected]/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/[email protected]/command.go:1041\nmain.main\n\tcmd/manager/main.go:66\nruntime.main\n\t/opt/hostedtoolcache/go/1.22.5/x64/src/runtime/proc.go:271"}
Error: error in pg_basebackup, exit status 1
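For context: pg_basebackup streams the data directory in ustar format, and PostgreSQL's tar writer rejects member names longer than 99 characters. A quick sanity check (a sketch; the path is copied from the error above) shows the pgvecto.rs segment path is well past that limit:
# Print the length of the failing tar member name (well over the 99-character limit)
echo -n "pg_vectors/indexes/0000000000000000000000000000000065de7f3829e7a01800096f010011f2ad/segments/4378fbe3-644b-4937-8671-86878244ed2c" | wc -c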
That is definitely between pgvecto.rs and CloudNativePG, and I don't think there's anything we can suggest to work around it. I recommend you report it as an issue with pgvecto.rs.
Just to update here: since @bo0tzz helped me out on another server, I saved the definition of smart_search's clip_index to a script, then dropped clip_index. I was then able to join the new database instance to the cluster. I've rerun the script to recreate the index, and all seems well now.
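For reference, a minimal sketch of that workaround with psql; the database name immich and the unqualified index name clip_index are assumptions, so adjust to your setup:
# Save the definition of the oversized index, then drop it so the join's pg_basebackup can complete
psql -d immich -Atc "SELECT pg_get_indexdef('clip_index'::regclass);" > recreate_clip_index.sql
psql -d immich -c "DROP INDEX clip_index;"
# Once the new instance has joined and replication is healthy, recreate the index
psql -d immich -f recreate_clip_index.sql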
Curious how you linked
pg_vectors/indexes/0000000000000000000000000000000065de7f3829e7a01800096f010011f2ad/segments/4378fbe3-644b-4937-8671-86878244ed2c
to smart_search/clip_index?