Issues connecting to Postgres Database after redeploying service
Hi, as part of performing the required database-migration, I've redeployed my service for the first time in ~8 months. Building the project still seemed to work just fine, but suddenly there's errors connecting to the database (even before migrating).
The strange thing is that (if I remember correctly) originally I was still able to interact with the (old) database before migrating, but after migrating, the server just crashes. This means the timeline is something like this:
1. For 8 months, the service worked well
2. Redeploying the service lead to database-connection-errors, but it was still possible to write/read the database
3. Migrating the databese results in BOTH database-connection-errors as well as crashes and the server becoming inoperable
What could be causing these issues?
I'm thinking that maybe port/environment-variables changed after the 8 months without my knowledge?
If I can provide any logs, I'd be happy to do so. π
Solution:Jump to solution
Actually, it turns out that the crash was caused for different reasons at different places.
1. Rocket seems to decide how many connections to create. More specifically,
rocket_sync_db_pools
. That package "defaults to the configured number of workers * 4", which seems to be >100 on railway. I assume "workers" means "CPUs", but feel free to correct me if you know more than I do π Thus, on Railway, there were simply too many connections.
2. Locally, when trying to connect to the Railway-DB, my code only tries to create 32 connections (maybe because of 8 CPUs?).
- Thus, I was always able to connect to my local DB
- However, connecting to the Railway-DB failed because creating 32 connections within the default timeout
of 5s wasn't possible....86 Replies
Project ID:
N/A
do you use reference variables?
If you mean those, then yes.
and then is DATABASE_URL also a reference variable?
I'd like to provide a
Project ID
, but I'm not sure where to find it. The closest thing I can find is the Service ID
.Yep, and as far as I can tell it's referencing the new database. (That may have been my doing)
looks good to me
okay all that's in order, can you share the error message?
Sure!
The original error (after redeploying the server, but before migrating the database):
Part 2:
still trying to connect to the old database, have you hardcoded the database credentials into your code somewhere?
The current error, after migrating the database:
okay now that's the new database
Exactly
is this the only app you are connecting to the database from?
Yes. There's only one app in total (+ the old DB and the new DB)
how many pooled connections is your app trying to open?
I'm not sure what that is. How can I check it?
that would be an in code thing
Intuitively I'd say one π
not much of a pool then π
I'll see if I find something to the contrary
either way, remove any deployments on your apps service
This remove button?
yep
and then remove the active deployment on your postgres service
Both new and old?
(legacy and migrated)
legacy databases don't have deployments, so just the v2 database
Done π
then redeploy the v2 database
In any order?
wdym order?
Nevermind, I didn't see the edit
do you have some desktop software like dbgate where you could test a connection to postgres?
I have DBeaver CE, which probably has that functionality. However, I've only ever used it to explore a local database.
The DB built without errors, but the Deploy Logs show errors:
looks fine to me
connect to the v2 database with dbeaver to confirm its accepting connections
Do you know where to find the main info required to connect to the database? For example, it seems like I'm using an incorrect URL.
I'll do my best and respond in about 1-2h, thank you for your help so far!
it would be in the variables tab of the v2 postgres service
I managed to connect to the database!
It seems to be working fine when navigating it in DBeaver
---
I assume the errors I've been getting in Railway stem from something changing after redeploying the service that's using the DB
have you since redeployed your app?
I have, but it's still producing errors:
I tried locally rebuilding the project, and locally it's running just fine.
Which makes me think that it's not an in-code thing, but something about the environment-variables getting messed up
postgres has a default max connection limit of 100, to me, this looks like your code is trying to open more than 100 connections?
1. What about the errors makes you think that?
2. The code hasn't changed, and it used to work before, so I'm not sure whether this could be an issue.
1. because that's what the error is saying, too many clients
2. I know but that's all I can think of right now
what environment variables does your code use to connect to the database?
1. Ah, you're right. I read
Error: database pool init error for pool named 'voteon_date_db'
and somehow figured that it had issues initiating any connection.
The two obvious ones are: (these are just my local values)
I thought there was also a PORT variable, but I can't find it at the moment.
And these should be all that are requiredthe database url would include the port
I think it's time to start digging into your code and how it's trying to open the connection
The only thing I can find is that railway has
Database Version: 15.5.0
while locally I have Database Version: 14.10.0
.
Do you know whether that could be an issue?well the legacy database was postgres 13, but there has to be a deeper issue here, postgres isn't going to be telling you theres too many client connections for fun
Update:
1. I managed to reproduce the bug locally by trying to connect to the railway-database via my script.
2. I still have no idea what I could be doing wrong.
The good news is that now there's at least some more output...
I'd start to look at how your initialising your Postgres pool
hey @guess_wh0, could you show me how you connect to the database or is rocket automatically picking it up?
and, can you show me your cargo.toml please
Hi, of course, I'm happy to provide the info you need!
Cargo.toml:
right so Iβm guessing this is something diesel is doing
I donβt know much about diesel, Iβm going to look into it. please can I see how you intotialiss it, or do you just use the env variable?
I use an environment variable.
1 sec, working on providing the rust-code
(Reduced) main function: (Slightly outdated full version can be found here)
database::establish_manual_connection();
: (Slightly outdated full version can be found here) (heavily inspired from docs-example)
The local environment-variables look like this. The second line can be used to try to connect to the railway-DB after inserting the password (and receive the error):
---
By the way, the full source code can be found here. It's using slightly outdated packages (as opposed to the
Cargo.toml
i sent you), but it used to work just fine.
Should you require any additional information, I'd be happy to provide it!GitLab
Files Β· master Β· Glitchy-Tozier / voteon.date Β· GitLab
Check when your friends' free time matches!
Awesome, gonna take a deeper look when home but this seems like it is not wiwtint for a connection to reply and just starting new ones. Do you have lots of migrations or something?
There's 5 migrations if you include creating the database. However, all of them are >9 months old. There shouldn't be any new ones that require execution.
Does it work if you run a local Postgres database?
Yep, I'm able to connect to and use the local DB
However, versions differ β
What version is your local one?
The message isnβt loading on my phone give me a break brody
Oh right, finally, 14
I mean you could try that but in not sure how much success youβd have with it, worth a shot though maybe
Downgrading to 14 on railway that is
I could also see if i can manage to upgrade my local DB.
And check if I then receive a local error.
That'll take a few hours though since I need to get home first
that would be good, take your time π
Finally, after hours of fighting to first upgrade psql and then the actual database, I managed to upgrade to 15.5.0! (I'm new to this)
However, everything the web-server still works when connecting to the local DB, while failing to connect to the online one.
Do you know if you are using TLS?
In diesel
I'm not sure. Any idea how to find out?
The only thing I can find is that Rocket supports it.
ChatGPT tells me this:
However, Diesel itself doesn't directly handle TLS (Transport Layer Security) or encryption. The use of TLS in a Rust application using Diesel would typically depend on the database server and the connection settings you use when configuring your database connection.
For example, if you are connecting to a PostgreSQL database using Diesel, the TLS or SSL connection would be configured through the PostgreSQL connection settings rather than directly within Diesel.
You probably know best though what is and isn't possible.
I've checked the API docs for diesel, but I don't recognize anything there.
I tried disabling the migrations (see reducid code-sample) but unfortunately the error persists. Makes me think it's something else.
postgres has a a default max connection limit of 100, something you are doing is trying to open more than that
try not using manual connection?\
I get what you're saying, but haven't yet found out what could be the issue. The main function (see above) is the only place where I open a connection, and it's only called once.
Of course, there may be some dark magic going on in the background that I don't know about.
how many times is the rocket ufnction being called?
Only once. It's basically the
main()
-function of the server. See last reply to @Brody .im thinking it might accidentaly being called more then that
I did, but that cuts off all database-functionality, even with my local DB. That code is directly copied from Diesel's Getting Started guide by diesel's code, so I don't think I've made an error there.
hmmmm
See this comment for what the
establish_manual_connection()
-function looks like.can you add some logs in the rockejt function to see how many times the function is actually running
I did, it's run only once.
It seems to be async though β or the error is occuring later β as ALL print-statements are executed, even the ones at the very end of the function.
wtf, this is why i hate diesel
Oh right, here's the full backtrace.
The part of my code it highlights (
voteon_date/backend/src/main.rs:53:1
) doesn't seem particularly helpful. That line is just the closing bracket of the rocket()
(=main) function.I feel the same way π
all this tells me is that it is coming from the main function
so like
Okaaaay, so after some digging, it seems
.attach(DbConn::fairing())
is the problematic line (see rocket()
-function).
However, I don't see where I'm doing anything dangerous. DbConn
is just a rocket guard type and looks exactly like it's shown in the docs.
(Plus, I've added some methods to the struct. However, these 1. do not generate new connections and 2. aren't called until a route is requested.)
It feels strange that I'm able to connect to my local DB but not the Railway one.
Could it be that somehow they're set up differently? Allowing/disallowing different things?it's just a standard postgres image, and according to postgres docs the max concurrent connection limit is 100, and it's not going to tell you you've hit that just for fun, this is 100% a code issue
I understand that. The strange thing is that:
1. it used to work just fine and I haven't changed anything about my setup (though maybe the package-manager did some cheating)
2. it still works with my local DB.
This leads me to believe that the code isn't just straight out wrong in every regard but that there has to be some interaction that results in this particular issue.
yeah, unfortunately this doesn't look like it is a issue with postgres or railway, and most likely something to do with diesel. im not as experienced with diesel, but i can recommend sqlx to be a solid replacement. much more reliable in my experience
sqlx - Rust
The async SQL toolkit for Rust, built with β€οΈ by the LaunchBadge team.
going to second what milo said, this isn't going to be an issue with postgres or railway, and running locally hardly ever means it will run perfectly elsewhere, this is a code issue unfortunately, the problematic interaction is that your app is opening too many connections, find a way to stop that behaviour and your problem is solved, but I would still take milo's advice and switch to sqlx
Well that's unfortunate as it'll mean a lot of rewriting. However, if it's the main way to get the app running again, I'll have to do it.
Thank you for your suggestion regarding
sqlx
. I'll perform 1-2 more tests in hopes of finding the bug somewhere, but if they turn out unhelpful, I'll work on replacing diesel
with sqlx
.
I'll write an update when there's new information
The website's back up!! πSolution
Actually, it turns out that the crash was caused for different reasons at different places.
1. Rocket seems to decide how many connections to create. More specifically,
rocket_sync_db_pools
. That package "defaults to the configured number of workers * 4", which seems to be >100 on railway. I assume "workers" means "CPUs", but feel free to correct me if you know more than I do π Thus, on Railway, there were simply too many connections.
2. Locally, when trying to connect to the Railway-DB, my code only tries to create 32 connections (maybe because of 8 CPUs?).
- Thus, I was always able to connect to my local DB
- However, connecting to the Railway-DB failed because creating 32 connections within the default timeout
of 5s wasn't possible.
- Surprisingly, diesel
didn't do anything wrong this time (at least according to my understanding)
- Limiting pool_size
and timeout
allowed me to connect again.---
Question to you experts: Is there a certain value for
pool_size
(= nr of connections) you would recommend to use? Why?if "workers" means cpu cores, then thats 32 * 4 = 128 connections
yep too many pooled connections
respectfully, you could have saved a lot of time if you had taken my advice
but to answer your question, a pool size of 10 is fine for most scenarios