N
Neon2y ago
desirable-aquamarine

Determine if connection is dead or not

Hello! I'm migrating from a managed Postges solution to Neon and I have a serverless python function connecting to the database. Despite stateless nature of these functions they are actually cached for some time (~15-30 min) and allow to reuse the same variables (including open connections), which allows to connect once and work with the same connection on multiple function invocations. The code used to have the connection alive at all times, but with Neon the connection is no longer alive after 5 mins of inactivity. This is as expected (cluster scales to zero) - https://neon.tech/docs/connect/connection-latency#check-the-status-of-a-compute Upon querying the code gave either some EOF error, or hang until a python function timeout. https://stackoverflow.com/questions/26741175/psycopg2-db-connection-hangs-on-lost-network-connection https://stackoverflow.com/questions/24130305/postgres-ssl-syscall-error-eof-detected-with-python-and-psycopg 1. I was trying to set it up to somehow keep the connection alive with "keepalive" messages or to check if a connection is dead but soon realized that a serverless function would not be able to do that after execution. 2. My next idea was to somehow check a connection with a test "SELECT 1;" statement, and with a timeout. Since the test execution takes 0.1-0.3 sec for me, and a new connection - 0.6-1 sec, there would be some gain in response time on most invocations. This was buried by inability to time a query in a serverless python environment since the platform controls the threads and you need another thread to work with a timer. 3. Now I'm kind of set with opening a new connection and closing it on each invocation, which does work reliably, even though a bit slower. But the question remains - how to check on the client that a connection is dead after the period of inactivity (5 min)?
Stack Overflow
Psycopg2 db connection hangs on lost network connection
Problem description I'm using psycopg2 to connect to my PostgreSQL database on a remote host. I open a connection and wait for requests, then for each request I run queries on the connection and r...
Stack Overflow
Postgres SSL SYSCALL error: EOF detected with python and psycopg
Using psycopg2 package with python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected It only occurs when I add a WHERE column LIKE ''%X%'' clause to my
Neon
Connection latency and timeouts - Neon Docs
Neon's Autosuspend feature ('scale to zero') is designed to minimize costs by automatically scaling a compute resource down to zero after a period of inactivity. By default, Neon scales a compute to...
18 Replies
harsh-harlequin
harsh-harlequin2y ago
Hmm. I'm surprised you need a timeout for that. Sending a SELECT 1 query on a terminated connection should immediately respond with RST and error
desirable-aquamarine
desirable-aquamarineOP2y ago
@Conrad Ludgate Uhm, I'm using Psycopg library (v3), which uses libpq under the hood, and it just timeouts on a query after the connection sat idle for 5+ mins. The backend is also serverless, so I think if Neon sends some message about terminating the connection, the backend just doesn't receive it, and on the next invocation thinks that the connection is still active.
harsh-harlequin
harsh-harlequin2y ago
That shouldn't be how TCP works 🤔 If your python run is warm, and the connection is cached, then it will buffer the TCP packets until your program eventually reads them, including FIN or RST packets. If your python program scales to 0 then the connection is dropped and you don't have this problem anyway. I'll have a test myself, this is suspicious As for work arounds: 1) maybe pgbouncer? 2) switch to asyncio so you don't need threaded timers?
desirable-aquamarine
desirable-aquamarineOP2y ago
Thank you @Conrad Ludgate 1) PGBouncer - will research and try, thank you for the info. Does it require to install something on Neon side? 2) Uhm, tried asyncio, doesn't seem to work for same reason (loops are controlled by the platform), or maybe I'm just noobed out on that. Code worked locally though. Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.? Does Neon send smth or properly terminate a connection before scaling to 0?
harsh-harlequin
harsh-harlequin2y ago
will research and try, thank you for the info. Does it require to install something on Neon side?
Neon provides it for free, it keeps connections warm on the server side for you. To use it with neon, change your connection host from ep-foo-bar-1234.us-east-2.aws.neon.tech to ep-foo-bar-1234-pooler.us-east-2.aws.neon.tech, adding the -pooler suffix
desirable-aquamarine
desirable-aquamarineOP2y ago
Seems I already use it. I'll test without it though.
harsh-harlequin
harsh-harlequin2y ago
Hmm, fair enough then. 1s to start a new connection is quite slow 🤔
desirable-aquamarine
desirable-aquamarineOP2y ago
Free tier from another region, and on cold start. When Neon is up, it's more like 0.6s. Wasn't better in a managed Postgres for warm state, so not a problem for me.
harsh-harlequin
harsh-harlequin2y ago
Ah yeah, there's quite a few round trips in a postgres startup...
desirable-aquamarine
desirable-aquamarineOP2y ago
what's your opinion on this? @Conrad Ludgate Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.? Does Neon send smth or properly terminate a connection before scaling to 0?
harsh-harlequin
harsh-harlequin2y ago
We do. We should be sending the message "terminating connection due to administrator command". Let me run a test
harsh-harlequin
harsh-harlequin2y ago
yep. That TLS data is a postgres message, being followed immediately by a TCP FIN message, closing the connection
No description
harsh-harlequin
harsh-harlequin2y ago
And I tried to run a query on this connection, and within 1ms it had determined that the connection was broken and started a new one
desirable-aquamarine
desirable-aquamarineOP2y ago
Thanks a lot for the help 🙏 I'll test this also locally. My guess is that on serverless python function it just doesn't receive anything there (maybe address changes or the platform completely disregards all messages after the function execution). So then the new instance with a cached connection assumes everything is ok, sends a query and waits endlessly.
harsh-harlequin
harsh-harlequin2y ago
Which runtime is this? Lambda?
desirable-aquamarine
desirable-aquamarineOP2y ago
Google Cloud Functions gen1, python 3.11
harsh-harlequin
harsh-harlequin2y ago
Thanks
like-gold
like-gold13mo ago
hi @Conrad Ludgate, found this thread and seems very relevant to the issue i'm experiencing. I'm drizzle + node in aws lambda and I'm constantly hit by the Terminating connection due to administrator command error. I understand why it happens due to neon compute being suspended due to inactivity. From my understanding the best solution is for the client to reconnect to the db in case when this happens, however i'm not exactly sure how to achieve this following the neon docs. I'm using drizzle + node in aws lambda. Here is how db connection pool is created:
import { drizzle } from 'drizzle-orm/neon-serverless'

const pool = new Pool({
connectionString: "myconnectionstring",
});
const db = drizzle(pool, { schema: { ...schema, ...relations } });
import { drizzle } from 'drizzle-orm/neon-serverless'

const pool = new Pool({
connectionString: "myconnectionstring",
});
const db = drizzle(pool, { schema: { ...schema, ...relations } });
And I am very curios can the Terminating connection due to administrator command error be accounted for in this setup, thanks in advance.

Did you find this page helpful?