What happened:
Our production database went down suddenly. CPU utilization spiked to ~99%, driven by IOWait, then dropped once Postgres crashed. The database is currently unreachable.
Postgres logs show:
PANIC: could not open file "global/pg_control": Input/output error
After that, the read replica repeatedly logs:
FATAL: could not connect to the primary server: connection to server at "db.<project-ref>.supabase.co", port 5432 failed: Connection refused
Current state:
PostgREST: all calls return 503
/rest-admin/v1/ready: 503
Auth health checks: 200 (healthy)
Storage: 200 (healthy)
Project dashboard still shows ACTIVE_HEALTHY
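In case it helps anyone reproduce the checks above, here is a minimal Python sketch of how the status codes can be collected and grouped. The `probe` and `summarize` helpers are hypothetical names made up for this post, not part of any Supabase tooling. The URLs to probe are left to the caller, and any API key headers the endpoints require are an assumption you would need to supply yourself.

```python
import urllib.error
import urllib.request


def probe(url, headers=None, timeout=5.0):
    """Return the HTTP status code for a GET on `url`.

    Returns 0 when no HTTP response arrives at all
    (connection refused, DNS failure, timeout).
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as 503) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def summarize(statuses):
    """Group service names by outcome: healthy (2xx), unavailable (503 or no response), other."""
    report = {"healthy": [], "unavailable": [], "other": []}
    for name, code in statuses.items():
        if 200 <= code < 300:
            report["healthy"].append(name)
        elif code in (0, 503):
            report["unavailable"].append(name)
        else:
            report["other"].append(name)
    return report


# Status codes as reported in this post (entered by hand, no network calls).
observed = {"PostgREST": 503, "rest-admin ready": 503, "Auth": 200, "Storage": 200}
print(summarize(observed))
```

To run it against a live project, build the dictionary with `probe(...)` calls instead of the hand-entered values. The split makes the mismatch visible: everything that depends on the database is unavailable, while services that answer their own health checks still report healthy.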
Environment:
Region: us-east-2
Read replica enabled
Ask:
This looks like a disk I/O failure on the primary instance. Is there a known incident in us-east-2? Can the primary be restarted, or can we fail over to the read replica?