Neon 2y ago
national-gold

Sudden error in an RO Replica

Dear Neon Team,

we are currently switching from a self-hosted database to Neon and have been very pleased so far. I am performing the migration myself and have now run into an error that occurs when querying a table, but only when I access it via the RO Replica. The SQL is executed via Dapper in C# and is very simple:

SELECT * FROM "Devices"."Android" WHERE "Identifier" = @Identifier AND "Salt" = @Salt

When I execute this via the RW Node, everything works, but I would still like to report it as a bug, as I am unsure why this is happening. I think that if I delete the RO Replica and recreate it, everything should work again. I haven't done this yet, as I don't know whether it would make it harder for you to understand how the bug arose on the server side.

The entire stack trace is attached below. I hope this helps to fix the bug and make Neon a better service. If further information is required, I will of course be happy to provide it within my means.

Npgsql.PostgresException (0x80004005): 58030: [NEON_SMGR] [shard 0] could not read block 0 in rel 1663/40976/2691.0 from page server at lsn 0/908AB390

POSITION: 8
DETAIL: page server returned error: Bad request: tried to request a page version that was garbage collected. requested at 0/908AB390 gc cutoff 0/B67C2088
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.PostgresDatabaseInfo.LoadBackendTypes(NpgsqlConnector conn, NpgsqlTimeout timeout, Boolean async)
at Npgsql.PostgresDatabaseInfo.LoadPostgresInfo(NpgsqlConnector conn, NpgsqlTimeout timeout, Boolean async)
at Npgsql.PostgresDatabaseInfoFactory.Load(NpgsqlConnector conn, NpgsqlTimeout timeout, Boolean async)
at Npgsql.Internal.NpgsqlDatabaseInfo.Load(NpgsqlConnector conn, NpgsqlTimeout timeout, Boolean async)
at Npgsql.NpgsqlDataSource.Bootstrap(NpgsqlConnector connector, NpgsqlTimeout timeout, Boolean forceReload, Boolean async, CancellationToken cancellationToken)
at Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.PoolingDataSource.OpenNewConnector(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.PoolingDataSource.<Get>g__RentAsync|34_0(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnection.<Open>g__OpenAsync|42_0(Boolean async, CancellationToken cancellationToken)
at Dapper.SqlMapper.QueryRowAsync[T](IDbConnection cnn, Row row, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 488
at [REDACTED] (our own code)
Exception data:
Severity: ERROR
SqlState: 58030
MessageText: [NEON_SMGR] [shard 0] could not read block 0 in rel 1663/40976/2691.0 from page server at lsn 0/908AB390
Detail: page server returned error: Bad request: tried to request a page version that was garbage collected. requested at 0/908AB390 gc cutoff 0/B67C2088
Position: 8
File: pagestore_smgr.c
Line: 2209
Routine: neon_read_at_lsn
2 Replies
ratty-blush 2y ago
cc @John @ Neon @Christian Schwarz
dependent-tan 2y ago
This can creep in if the replica falls too far behind the primary (e.g., it scaled to zero while the RW node was busy). I think the PR that addresses this issue is pending review and deployment: https://github.com/neondatabase/neon/pull/6357
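
As a rough illustration of "too far behind", the two LSNs from the error detail can be compared directly. This is a minimal sketch that assumes the standard PostgreSQL LSN text format (two hexadecimal halves separated by a slash, high 32 bits first). The helper name `parse_lsn` is hypothetical and not part of Neon or Npgsql.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/908AB390' into a byte position.

    Hypothetical helper for illustration only.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Values copied from the error detail in the original post.
requested = parse_lsn("0/908AB390")   # LSN the replica asked the page server for
gc_cutoff = parse_lsn("0/B67C2088")   # oldest LSN the page server still keeps

# The request is older than the cutoff, so that page version is gone.
print(requested < gc_cutoff)          # True

# Distance between the requested LSN and the cutoff, in MiB of WAL.
lag_bytes = gc_cutoff - requested
print(f"{lag_bytes / 2**20:.0f} MiB")  # 607 MiB
```

Under these assumptions, the replica requested a page version roughly 600 MiB of WAL older than the garbage-collection cutoff, which fits the explanation that it had fallen behind a busy RW node.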
