Determine if connection is dead or not
Hello!
I'm migrating from a managed Postges solution to Neon and I have a serverless python function connecting to the database. Despite stateless nature of these functions they are actually cached for some time (~15-30 min) and allow to reuse the same variables (including open connections), which allows to connect once and work with the same connection on multiple function invocations.
The code used to have the connection alive at all times, but with Neon the connection is no longer alive after 5 mins of inactivity. This is as expected (cluster scales to zero) - https://neon.tech/docs/connect/connection-latency#check-the-status-of-a-compute
Upon querying the code gave either some EOF error, or hang until a python function timeout.
https://stackoverflow.com/questions/26741175/psycopg2-db-connection-hangs-on-lost-network-connection
https://stackoverflow.com/questions/24130305/postgres-ssl-syscall-error-eof-detected-with-python-and-psycopg
1. I was trying to set it up to somehow keep the connection alive with "keepalive" messages or to check if a connection is dead but soon realized that a serverless function would not be able to do that after execution.
2. My next idea was to somehow check a connection with a test "SELECT 1;" statement, and with a timeout. Since the test execution takes 0.1-0.3 sec for me, and a new connection - 0.6-1 sec, there would be some gain in response time on most invocations.
This was buried by inability to time a query in a serverless python environment since the platform controls the threads and you need another thread to work with a timer.
3. Now I'm kind of set with opening a new connection and closing it on each invocation, which does work reliably, even though a bit slower.
But the question remains - how to check on the client that a connection is dead after the period of inactivity (5 min)?
Stack Overflow
Psycopg2 db connection hangs on lost network connection
Problem description
I'm using psycopg2 to connect to my PostgreSQL database on a remote host. I open a connection and wait for requests, then for each request I run queries on the connection and r...
Stack Overflow
Postgres SSL SYSCALL error: EOF detected with python and psycopg
Using psycopg2 package with python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected
It only occurs when I add a WHERE column LIKE ''%X%'' clause to my
Neon
Connection latency and timeouts - Neon Docs
Neon's Autosuspend feature ('scale to zero') is designed to minimize costs by automatically scaling a compute resource down to zero after a period of inactivity. By default, Neon scales a compute to...
18 Replies
harsh-harlequin•2y ago
Hmm. I'm surprised you need a timeout for that. Sending a SELECT 1 query on a terminated connection should immediately respond with RST and error
desirable-aquamarineOP•2y ago
@Conrad Ludgate Uhm, I'm using Psycopg library (v3), which uses libpq under the hood, and it just timeouts on a query after the connection sat idle for 5+ mins. The backend is also serverless, so I think if Neon sends some message about terminating the connection, the backend just doesn't receive it, and on the next invocation thinks that the connection is still active.
harsh-harlequin•2y ago
That shouldn't be how TCP works 🤔
If your python run is warm, and the connection is cached, then it will buffer the TCP packets until your program eventually reads them, including FIN or RST packets. If your python program scales to 0 then the connection is dropped and you don't have this problem anyway.
I'll have a test myself, this is suspicious
As for work arounds:
1) maybe pgbouncer?
2) switch to asyncio so you don't need threaded timers?
desirable-aquamarineOP•2y ago
Thank you @Conrad Ludgate
1) PGBouncer - will research and try, thank you for the info. Does it require to install something on Neon side?
2) Uhm, tried asyncio, doesn't seem to work for same reason (loops are controlled by the platform), or maybe I'm just noobed out on that. Code worked locally though.
Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.?
Does Neon send smth or properly terminate a connection before scaling to 0?
harsh-harlequin•2y ago
will research and try, thank you for the info. Does it require to install something on Neon side?Neon provides it for free, it keeps connections warm on the server side for you. To use it with neon, change your connection host from
ep-foo-bar-1234.us-east-2.aws.neon.tech to ep-foo-bar-1234-pooler.us-east-2.aws.neon.tech, adding the -pooler suffixdesirable-aquamarineOP•2y ago
Seems I already use it. I'll test without it though.
harsh-harlequin•2y ago
Hmm, fair enough then. 1s to start a new connection is quite slow 🤔
desirable-aquamarineOP•2y ago
Free tier from another region, and on cold start. When Neon is up, it's more like 0.6s. Wasn't better in a managed Postgres for warm state, so not a problem for me.
harsh-harlequin•2y ago
Ah yeah, there's quite a few round trips in a postgres startup...
desirable-aquamarineOP•2y ago
what's your opinion on this? @Conrad Ludgate
Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.?
Does Neon send smth or properly terminate a connection before scaling to 0?
harsh-harlequin•2y ago
We do. We should be sending the message "terminating connection due to administrator command". Let me run a test
harsh-harlequin•2y ago
yep. That TLS data is a postgres message, being followed immediately by a TCP FIN message, closing the connection

harsh-harlequin•2y ago
And I tried to run a query on this connection, and within 1ms it had determined that the connection was broken and started a new one
desirable-aquamarineOP•2y ago
Thanks a lot for the help 🙏
I'll test this also locally. My guess is that on serverless python function it just doesn't receive anything there (maybe address changes or the platform completely disregards all messages after the function execution). So then the new instance with a cached connection assumes everything is ok, sends a query and waits endlessly.
harsh-harlequin•2y ago
Which runtime is this? Lambda?
desirable-aquamarineOP•2y ago
Google Cloud Functions gen1, python 3.11
harsh-harlequin•2y ago
Thanks
like-gold•13mo ago
hi @Conrad Ludgate, found this thread and seems very relevant to the issue i'm experiencing. I'm drizzle + node in aws lambda and I'm constantly hit by the
Terminating connection due to administrator command error. I understand why it happens due to neon compute being suspended due to inactivity.
From my understanding the best solution is for the client to reconnect to the db in case when this happens, however i'm not exactly sure how to achieve this following the neon docs.
I'm using drizzle + node in aws lambda. Here is how db connection pool is created:
And I am very curios can the Terminating connection due to administrator command error be accounted for in this setup, thanks in advance.