Connections stuck in idle/ClientRead - Django application
Hey everyone 
We've been battling a weird issue since Dec 16 and I'm honestly stuck. Hoping someone can point us in the right direction.
What's happening:
Our Django app randomly freezes - heartbeat stops etc. When I check pg_stat_activity, I see a bunch of connections stuck like this:
PID | STATE | WAIT_EVENT | DURATION | QUERY
10342 | idle | ClientRead | 32:32 | SELECT pid, state...
7452 | idle | ClientRead | 02:36 | select 1
12732 | idle | ClientRead | 02:36 | select 1
8199 | idle in transaction| ClientRead | 00:42 | SAVEPOINT "s139..."
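For anyone who wants to reproduce the listing above, here is a rough sketch of how it can be pulled from Python. This is not our exact query (ours is truncated in the output above); the column choices, the `state_change`-based duration, and the helper names `format_row` and `dump_activity` are assumptions for illustration.

```python
from datetime import timedelta

# Assumed reconstruction of the diagnostic query; the original is only
# visible as "SELECT pid, state..." in the listing.
ACTIVITY_SQL = """
SELECT pid,
       state,
       wait_event,
       now() - state_change AS duration,
       left(query, 40) AS query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY duration DESC
"""


def format_row(pid, state, wait_event, duration, query):
    """Render one pg_stat_activity row in the 'PID | STATE | ...' layout."""
    total_seconds = int(duration.total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{pid} | {state} | {wait_event} | {minutes:02d}:{seconds:02d} | {query}"


def dump_activity(dsn):
    """Print current sessions; dsn is a placeholder connection string."""
    import psycopg  # psycopg 3, imported lazily so format_row works without it

    with psycopg.connect(dsn) as conn:
        for row in conn.execute(ACTIVITY_SQL):
            print(format_row(*row))
```

Measuring from `state_change` means the duration is "time spent in the current state", which is what matters for idle sessions; swap in `query_start` or `xact_start` if you care about something else.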
So Postgres shows idle (query finished) but ClientRead (waiting for client to read the response). The response just... never makes it back to our app?
The fact that even select 1 gets stuck for 2+ minutes makes me think this isn't something on our end?
What we've tried:
- CONN_MAX_AGE=0 (fresh connection per request)
- DISABLE_SERVER_SIDE_CURSORS=True
- CONN_HEALTH_CHECKS=True
- TCP keepalives
- Tried both pooler AND direct endpoint - same issue on both
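To make the list above concrete, this is roughly what those settings look like in a Django `DATABASES` block. Treat it as a sketch: the environment variable names, the defaults, the `sslmode` value, and the keepalive numbers are placeholders I picked for illustration, not our production values. The keepalive keys are standard libpq connection parameters passed through `OPTIONS`.

```python
import os


def build_databases(env=os.environ):
    """Build a Django DATABASES dict with the settings listed above."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # Placeholder env var names and defaults (assumptions).
            "NAME": env.get("PGDATABASE", "appdb"),
            "USER": env.get("PGUSER", "app"),
            "PASSWORD": env.get("PGPASSWORD", ""),
            "HOST": env.get("PGHOST", "localhost"),
            "PORT": env.get("PGPORT", "5432"),
            "CONN_MAX_AGE": 0,  # fresh connection per request
            "CONN_HEALTH_CHECKS": True,
            "DISABLE_SERVER_SIDE_CURSORS": True,
            "OPTIONS": {
                "sslmode": "require",
                # TCP keepalives (libpq parameters); values are illustrative.
                "keepalives": 1,
                "keepalives_idle": 30,
                "keepalives_interval": 10,
                "keepalives_count": 5,
            },
        }
    }


# In settings.py this would be: DATABASES = build_databases()
```

As far as I understand the Django docs, CONN_HEALTH_CHECKS only applies to persistent connections that get reused, so combined with CONN_MAX_AGE=0 it probably doesn't do much.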
Our setup:
- Django 4.x + psycopg3 (psycopg[binary,pool])
- AWS ECS Fargate
- Following the Neon Django docs config
Timeline:
This started Dec 16. I noticed there were some "Apply config" events in our Neon project around 9:18-9:43 PM that day. Could something have changed on the infrastructure side?
Impact:
When it happens, we have to restart the container. It's happening multiple times a day now.
Anyone seen anything like this? Or any ideas what else we could check?
Thanks
