Lots of "SSL connection has been closed unexpectedly" errors
Hey there, we're using Neon as the DB for a control plane for batch processing jobs. We're on the Launch plan. Each batch execution has multiple k8s workers connecting to the DB to report the status of their job runs, which ends up creating a lot of concurrent connections. We've recently been seeing the "SSL connection has been closed unexpectedly" error, and it's been happening a lot.
We don't know if it's because we're triggering some kind of abuse protection system or getting rate limited in some way, but by all the docs I've read we're well within the limits. Monitoring says the most concurrent connections we've ever had is 103, which is below the point where we'd need to scale above 0.25 vCPU. We did try scaling up to a min of 1 vCPU to see if that would improve things, and the results were inconclusive. Could we get some help on this?
15 Replies
genetic-orange•2y ago
Hi! 103 connections is right around the limit supported by the 0.25 CU size.
Are you using the pooler URL that's provided by Neon? This will allow more connections to be opened: https://neon.tech/docs/connect/connection-pooling#enable-connection-pooling
If you're using the pooled URL and still seeing this error then it's worth opening a support ticket. They can pull logs to see if anything unusual is happening.
extended-salmonOP•2y ago
@ShinyPokemon Hey, so we were going off of this https://neon.tech/docs/manage/endpoints#how-to-size-your-compute which says 112 max connections. We will raise the limit to 1 vCPU to see if that changes anything...
genetic-orange•2y ago
Nice, it should help. You should use the pooled URL too. That way you'll connect through PgBouncer, which supports more connections. Under the hood, PgBouncer maintains a connection pool to Postgres, but it's transparent to your apps.
extended-salmonOP•2y ago
We did turn on connection pooling though..
genetic-orange•2y ago
When you say "turn on" do you mean you updated your k8s Deployments/Pods with the new URL? It's not an on/off thing. It's a new URL you need to use to connect to the database
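One quick way to confirm the Pods really are using the pooled connection string is to inspect the hostname. A minimal sketch in Python, assuming the convention from Neon's connection pooling docs that the pooled hostname carries a `-pooler` suffix on the endpoint ID. The function name and the example connection strings are hypothetical, and nothing in this thread says which language the workers use:

```python
from urllib.parse import urlparse


def is_pooled_url(database_url: str) -> bool:
    """Return True if the hostname's first label ends with "-pooler".

    Assumption: Neon's pooled connection strings add "-pooler" to the
    endpoint ID, which is the first label of the hostname.
    """
    host = urlparse(database_url).hostname or ""
    endpoint_id = host.split(".")[0]
    return endpoint_id.endswith("-pooler")


# Hypothetical connection strings, for illustration only.
direct = "postgresql://user:secret@ep-example-123456.us-east-2.aws.neon.tech/neondb?sslmode=require"
pooled = "postgresql://user:secret@ep-example-123456-pooler.us-east-2.aws.neon.tech/neondb?sslmode=require"

print(is_pooled_url(direct))  # False
print(is_pooled_url(pooled))  # True
```

Running a check like this against the value each Deployment actually injects (rather than the value you think it injects) rules out a stale Secret or ConfigMap still pointing at the direct URL.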
extended-salmonOP•2y ago
Oh, no. I know it's the connection string.
Yeah, we swapped out the connection string to the pooled one.
genetic-orange•2y ago
Ah good! OK, in that case it's odd you would be hitting limits. Are these Pods performing long running transactions?
extended-salmonOP•2y ago
Running a job now. See what it does.
No, they're not. We're not doing anything that complex. It's just doing simple inserts and updates. Mostly logging type operations.
genetic-orange•2y ago
Hmm, ok. The only other theory I have is that the database might be suspending after a few minutes once the job completes. This would terminate the SSL connection, and if not handled on the application side it will appear to be an error.
If you're seeing it mid-job though, then it's worth opening a support ticket to get them to pull logs
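If the drops do turn out to come from the compute suspending, "handled on the application side" usually means reconnecting and retrying. A minimal sketch in Python under stated assumptions: `ConnectionDropped` is a hypothetical stand-in for whatever error the driver raises (with psycopg2 that would be `psycopg2.OperationalError`), and `connect` / `operation` are callables you supply. This is not taken from the poster's code.

```python
import time


class ConnectionDropped(Exception):
    """Hypothetical stand-in for the driver's connection error,
    e.g. psycopg2.OperationalError if the workers use psycopg2."""


def run_with_reconnect(connect, operation, attempts=3, base_delay=0.5,
                       retry_on=(ConnectionDropped,), sleep=time.sleep):
    """Run operation(conn) on a fresh connection, retrying if it is dropped.

    connect:   zero-argument callable returning a new connection object
    operation: callable taking that connection and doing the insert/update
    """
    for attempt in range(attempts):
        conn = None
        try:
            conn = connect()
            return operation(conn)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, ...
        finally:
            if conn is not None:
                try:
                    conn.close()
                except Exception:
                    pass  # the connection may already be dead
```

A retry can repeat a write whose first attempt actually committed before the connection dropped, so this only suits status updates that are safe to apply twice (for example, an upsert keyed on the job run ID).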
extended-salmonOP•2y ago
Will let you know. Got lots of pods and lots of concurrent jobs executing right now.
Okay, we ran the job, but deeper investigation has exposed other things we need to look into and eliminate from consideration. If we are able to trace it back to Neon DB then I'll let you know.
genetic-orange•2y ago
Good stuff. Don't hesitate to reach out!
extended-salmonOP•2y ago
Cool, so we switched to a more expensive, but managed PG database for tests and our issues went away. Will work on opening a ticket.
genetic-orange•2y ago
@BitShift 🇺🇦 so the issue still happened even with pooled connections?
extended-salmonOP•2y ago
Yes
We filed the ticket and included a link to this thread.
genetic-orange•2y ago
Good stuff. Support will be able to check logs and hopefully narrow down what's unique about the situation with Neon
Sorry you're having this issue