Neon•2y ago

desirable-aquamarine

Determine if connection is dead or not

Hello! I'm migrating from a managed Postges solution to Neon and I have a serverless python function connecting to the database. Despite stateless nature of these functions they are actually cached for some time (~15-30 min) and allow to reuse the same variables (including open connections), which allows to connect once and work with the same connection on multiple function invocations. The code used to have the connection alive at all times, but with Neon the connection is no longer alive after 5 mins of inactivity. This is as expected (cluster scales to zero) - https://neon.tech/docs/connect/connection-latency#check-the-status-of-a-compute Upon querying the code gave either some EOF error, or hang until a python function timeout. https://stackoverflow.com/questions/26741175/psycopg2-db-connection-hangs-on-lost-network-connection https://stackoverflow.com/questions/24130305/postgres-ssl-syscall-error-eof-detected-with-python-and-psycopg 1. I was trying to set it up to somehow keep the connection alive with "keepalive" messages or to check if a connection is dead but soon realized that a serverless function would not be able to do that after execution. 2. My next idea was to somehow check a connection with a test "SELECT 1;" statement, and with a timeout. Since the test execution takes 0.1-0.3 sec for me, and a new connection - 0.6-1 sec, there would be some gain in response time on most invocations. This was buried by inability to time a query in a serverless python environment since the platform controls the threads and you need another thread to work with a timer. 3. Now I'm kind of set with opening a new connection and closing it on each invocation, which does work reliably, even though a bit slower. But the question remains - how to check on the client that a connection is dead after the period of inactivity (5 min)?

Stack Overflow

Psycopg2 db connection hangs on lost network connection

Problem description I'm using psycopg2 to connect to my PostgreSQL database on a remote host. I open a connection and wait for requests, then for each request I run queries on the connection and r...

Stack Overflow

Postgres SSL SYSCALL error: EOF detected with python and psycopg

Using psycopg2 package with python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected It only occurs when I add a WHERE column LIKE ''%X%'' clause to my

Neon

Connection latency and timeouts - Neon Docs

Neon's Autosuspend feature ('scale to zero') is designed to minimize costs by automatically scaling a compute resource down to zero after a period of inactivity. By default, Neon scales a compute to...

18 Replies

harsh-harlequin•2y ago

Hmm. I'm surprised you need a timeout for that. Sending a SELECT 1 query on a terminated connection should immediately respond with RST and error

desirable-aquamarineOP•2y ago

@Conrad Ludgate Uhm, I'm using Psycopg library (v3), which uses libpq under the hood, and it just timeouts on a query after the connection sat idle for 5+ mins. The backend is also serverless, so I think if Neon sends some message about terminating the connection, the backend just doesn't receive it, and on the next invocation thinks that the connection is still active.

harsh-harlequin•2y ago

That shouldn't be how TCP works 🤔 If your python run is warm, and the connection is cached, then it will buffer the TCP packets until your program eventually reads them, including FIN or RST packets. If your python program scales to 0 then the connection is dropped and you don't have this problem anyway. I'll have a test myself, this is suspicious As for work arounds: 1) maybe pgbouncer? 2) switch to asyncio so you don't need threaded timers?

desirable-aquamarineOP•2y ago

Thank you @Conrad Ludgate 1) PGBouncer - will research and try, thank you for the info. Does it require to install something on Neon side? 2) Uhm, tried asyncio, doesn't seem to work for same reason (loops are controlled by the platform), or maybe I'm just noobed out on that. Code worked locally though. Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.? Does Neon send smth or properly terminate a connection before scaling to 0?

harsh-harlequin•2y ago

will research and try, thank you for the info. Does it require to install something on Neon side?

Neon provides it for free, it keeps connections warm on the server side for you. To use it with neon, change your connection host from ep-foo-bar-1234.us-east-2.aws.neon.tech to ep-foo-bar-1234-pooler.us-east-2.aws.neon.tech, adding the -pooler suffix

desirable-aquamarineOP•2y ago

Seems I already use it. I'll test without it though.

harsh-harlequin•2y ago

Hmm, fair enough then. 1s to start a new connection is quite slow 🤔

desirable-aquamarineOP•2y ago

Free tier from another region, and on cold start. When Neon is up, it's more like 0.6s. Wasn't better in a managed Postgres for warm state, so not a problem for me.

harsh-harlequin•2y ago

Ah yeah, there's quite a few round trips in a postgres startup...

desirable-aquamarineOP•2y ago

what's your opinion on this? @Conrad Ludgate Also I hear you that the query on a dead connection should give an error instantly. Do you imply that it needed to be properly closed or this is expected even with network issues etc.? Does Neon send smth or properly terminate a connection before scaling to 0?

harsh-harlequin•2y ago

We do. We should be sending the message "terminating connection due to administrator command". Let me run a test

harsh-harlequin•2y ago

yep. That TLS data is a postgres message, being followed immediately by a TCP FIN message, closing the connection

harsh-harlequin•2y ago

And I tried to run a query on this connection, and within 1ms it had determined that the connection was broken and started a new one

desirable-aquamarineOP•2y ago

Thanks a lot for the help 🙏 I'll test this also locally. My guess is that on serverless python function it just doesn't receive anything there (maybe address changes or the platform completely disregards all messages after the function execution). So then the new instance with a cached connection assumes everything is ok, sends a query and waits endlessly.

harsh-harlequin•2y ago

Which runtime is this? Lambda?

desirable-aquamarineOP•2y ago

Google Cloud Functions gen1, python 3.11

harsh-harlequin•2y ago

Thanks

like-gold•13mo ago

hi @Conrad Ludgate, found this thread and seems very relevant to the issue i'm experiencing. I'm drizzle + node in aws lambda and I'm constantly hit by the Terminating connection due to administrator command error. I understand why it happens due to neon compute being suspended due to inactivity. From my understanding the best solution is for the client to reconnect to the db in case when this happens, however i'm not exactly sure how to achieve this following the neon docs. I'm using drizzle + node in aws lambda. Here is how db connection pool is created:

import { drizzle } from 'drizzle-orm/neon-serverless'

const pool = new Pool({
  connectionString: "myconnectionstring",
});
const db = drizzle(pool, { schema: { ...schema, ...relations } });

import { drizzle } from 'drizzle-orm/neon-serverless'

const pool = new Pool({
  connectionString: "myconnectionstring",
});
const db = drizzle(pool, { schema: { ...schema, ...relations } });

And I am very curios can the Terminating connection due to administrator command error be accounted for in this setup, thanks in advance.

Gaming

Programming

Determine if connection is dead or not

Did you find this page helpful?