Kord7mo ago
DarkAtra

Kord does not recover from UnknownHostException

It seems like Kord doesn't recover from UnknownHostException, at least not fully: the Discord bot appears offline and no longer reacts to application commands until it is restarted. I ran into this issue twice in the past few weeks. The bot uses Google DNS to resolve hostnames. The stack trace is attached as a screenshot because of the Discord message character limit. Also, here's the project in case you want to look something up: https://github.com/DarkAtra/v-rising-discord-bot
21 Replies
SchlaubiBus
SchlaubiBus7mo ago
Would you like it to retry after an UnknownHostException? I think it's intended that it doesn't recover from name resolution issues, as they indicate a persistent problem.
DarkAtra
DarkAtraOP7mo ago
Yes, I would expect Kord to attempt to recover in such a case. In my experience, most UnknownHostExceptions are temporary and usually recoverable by retrying with backoff.
Is there any way for me to force gateway reconnects when an UnknownHostException occurs for the time being? @SchlaubiBus any idea?
SchlaubiBus
SchlaubiBus7mo ago
I don't think there is a way
DarkAtra
DarkAtraOP6mo ago
The same issue just happened again today at around 1am. My bot was unable to resolve the hostname gateway-us-east1-b.discord.gg for about 2 minutes (using Google DNS). I thought about switching the DNS provider that OkHttp uses from the system default (Google DNS) to something else. However, this could still fail since there is no guarantee that DNS lookups always succeed. I think the only way of dealing with this issue permanently is to make Kord more resilient and attempt reconnects with backoff. Right now it's getting stuck in a weird state where it can query and update messages in a Discord server but doesn't react to application commands anymore.
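For illustration, "retry with backoff" at the DNS level could look roughly like the sketch below. It is standalone and is not Kord's or OkHttp's API: `resolveWithBackoff` and all of its parameters are hypothetical names, and the 2 s / 60 s values are arbitrary. The resolver and the sleep function are injectable only so the helper can be exercised without a network.

```kotlin
import java.net.InetAddress
import java.net.UnknownHostException

/**
 * Resolves [host], retrying with exponential backoff when the lookup fails.
 * Hypothetical helper for illustration; not part of Kord or OkHttp.
 */
fun resolveWithBackoff(
    host: String,
    maxAttempts: Int = 5,
    initialDelayMs: Long = 2_000,
    maxDelayMs: Long = 60_000,
    resolve: (String) -> Array<InetAddress> = { InetAddress.getAllByName(it) },
    sleep: (Long) -> Unit = { Thread.sleep(it) },
): Array<InetAddress> {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    for (attempt in 1..maxAttempts) {
        try {
            return resolve(host)
        } catch (e: UnknownHostException) {
            // Treat the failure as transient unless this was the last attempt.
            if (attempt == maxAttempts) throw e
            sleep(delayMs)
            // Double the wait for the next attempt, but never exceed the cap.
            delayMs = minOf(delayMs * 2, maxDelayMs)
        }
    }
    error("unreachable: the loop either returns or throws")
}
```

A call such as `resolveWithBackoff("gateway-us-east1-b.discord.gg")` would then only fail after all attempts are exhausted, instead of on the first failed lookup.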
SchlaubiBus
SchlaubiBus6mo ago
When does this happen? It wouldn't make sense for this to happen during a connection.
DarkAtra
DarkAtraOP6mo ago
The cause likely is:
2025-04-09T01:13:50.030Z ERROR 1 --- [atcher-worker-3] dev.kord.gateway.DefaultGateway :
java.net.SocketException: Socket closed
at java.base@21.0.5/sun.nio.ch.NioSocketImpl.endRead(NioSocketImpl.java:243) ~[na:na]
at java.base@21.0.5/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) ~[na:na]
at java.base@21.0.5/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346) ~[na:na]
at java.base@21.0.5/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796) ~[na:na]
at java.base@21.0.5/java.net.Socket$SocketInputStream.read(Socket.java:1099) ~[na:na]
at java.base@21.0.5/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489) ~[na:na]
at java.base@21.0.5/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:483) ~[na:na]
at java.base@21.0.5/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70) ~[na:na]
at java.base@21.0.5/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1461) ~[na:na]
at java.base@21.0.5/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1066) ~[na:na]
at okio.InputStreamSource.read(JvmOkio.kt:93) ~[na:na]
at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:153) ~[na:na]
SchlaubiBus
SchlaubiBus6mo ago
I mean, that's just internal code of OkHttp
DarkAtra
DarkAtraOP6mo ago
discord message length fucked up the stack trace - i've updated it.
SchlaubiBus
SchlaubiBus6mo ago
it's still just okhttp
DarkAtra
DarkAtraOP6mo ago
this is the disconnect from the gateway and then kord attempts to reconnect right after - this is where the hostname exception occurs
SchlaubiBus
SchlaubiBus6mo ago
I don't see that in that error
DarkAtra
DarkAtraOP6mo ago
correct me if i'm wrong but this is how i understand the gateway connection: There is a websocket connection that kord uses to receive events from discord, for example for commands issued by users on a server. When this websocket connection is closed by the peer for whatever reason, kord attempts to reconnect to the gateway. Now if the hostname of the gateway cannot be resolved, the attempt is aborted and kord never reconnects.
i can't post the full stack trace as i don't have nitro. the logs read:
2025-04-09T01:13:50.030Z ERROR 1 --- [atcher-worker-3] dev.kord.gateway.DefaultGateway :
java.net.SocketException: Socket closed
...
...
2025-04-09T01:13:58.060Z ERROR 1 --- [atcher-worker-3] dev.kord.gateway.DefaultGateway :
java.net.UnknownHostException: gateway-us-east1-b.discord.gg: Temporary failure in name resolution
...
...
SchlaubiBus
SchlaubiBus6mo ago
You can upload them as a file?
SchlaubiBus
SchlaubiBus6mo ago
This is the reconnect code
DarkAtra
DarkAtraOP6mo ago
DarkAtra
DarkAtraOP6mo ago
the errors are 8 minutes apart tho.. so yeah not sure if they are related
SchlaubiBus
SchlaubiBus6mo ago
There should be more logs about retrying
DarkAtra
DarkAtraOP6mo ago
nope, nothing else besides the two errors i just sent you. The UnknownHostException reoccurs like 20 times or so tho
SchlaubiBus
SchlaubiBus6mo ago
Then you need to increase the log level
DarkAtra
DarkAtraOP3mo ago
ok, i'll set it to trace for kord and report back when it happens again
@SchlaubiBus so it finally happened again while i had DEBUG logs enabled. Here's what i found:
* Kord attempts to reconnect a total of 10 times. However, none of the reconnect attempts succeeded so it closes the gateway connection with:
2025-07-06T02:09:27.837Z TRACE 1 --- [atcher-worker-3] dev.kord.gateway.retry.LinearRetry : retry attempt 10, delaying for 20s
2025-07-06T02:09:47.838Z WARN 1 --- [atcher-worker-3] dev.kord.gateway.DefaultGateway : retry limit exceeded, gateway closing

* i think the bot lost its internet connection for a few minutes that night, which would explain why it didn't reconnect in time
Is there a way to customize the retry behaviour so that it's a bit more forgiving, e.g. exponential backoff over 5 minutes or something like that? I'm currently trying this:
kord = Kord(
    token = botProperties.discordBotToken
) {
    gateways { resources, shards ->
        // shared between all shards
        val rateLimiter = IdentifyRateLimiter(resources.maxConcurrency, defaultDispatcher)
        shards.map {
            DefaultGateway {
                client = resources.httpClient
                identifyRateLimiter = rateLimiter
                reconnectRetry = UnlimitedExponentialRetry(
                    initialInterval = Duration.ofSeconds(2),
                    maxInterval = Duration.ofMinutes(1),
                    multiplier = 2.0
                )
            }
        }
    }
}
where UnlimitedExponentialRetry is this: https://github.com/DarkAtra/v-rising-discord-bot/blob/b1dfdf1ba81a92754e2f14e753a96494c1ff8fc5/src/main/kotlin/de/darkatra/vrising/discord/UnlimitedExponentialRetry.kt
I'll close this thread if this fixes the issue.
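To get a feel for how much longer such a schedule keeps trying, here is a rough, standalone comparison of the two delay schedules. It does not reproduce Kord's LinearRetry or the linked UnlimitedExponentialRetry class. Both functions are hypothetical, and the linear one rests on an assumption: the logs only confirm 10 attempts with a final delay of 20s, so the 2s starting value and the linear interpolation are guesses.

```kotlin
/**
 * Assumed linear schedule: delays grow in equal steps until they reach [maxMs]
 * on the last attempt. Only the final 20s delay is confirmed by the logs.
 */
fun linearDelaysMs(firstMs: Long, maxMs: Long, attempts: Int): List<Long> {
    val stepMs = (maxMs - firstMs) / attempts
    return (1..attempts).map { attempt -> firstMs + stepMs * attempt }
}

/** Exponential schedule: each delay is [multiplier] times the previous one, capped at [maxMs]. */
fun exponentialDelaysMs(initialMs: Long, maxMs: Long, multiplier: Double, attempts: Int): List<Long> =
    generateSequence(initialMs.toDouble()) { it * multiplier }
        .map { minOf(it.toLong(), maxMs) }
        .take(attempts)
        .toList()

fun main() {
    val linear = linearDelaysMs(firstMs = 2_000, maxMs = 20_000, attempts = 10)
    val exponential = exponentialDelaysMs(initialMs = 2_000, maxMs = 60_000, multiplier = 2.0, attempts = 10)
    println("linear total wait: ${linear.sum() / 1000}s")
    println("exponential total wait after 10 attempts: ${exponential.sum() / 1000}s")
}
```

Under those assumptions the linear schedule has used up its budget after roughly two minutes of waiting, while the capped exponential schedule has already waited about six minutes after ten attempts and, being unlimited, keeps retrying once per minute after that.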
SchlaubiBus
SchlaubiBus3mo ago
Another option would be to use Docker health checks so that the container restarts
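A health check needs some signal to look at. One possible shape, purely as a sketch: record when the bot last saw gateway activity and report unhealthy once it has been silent for too long. `GatewayWatchdog` is a hypothetical class, not part of Kord. Where to call `recordActivity()` and how to expose `isHealthy()` to the container (HTTP endpoint, file, exit code) are left open here.

```kotlin
import java.util.concurrent.atomic.AtomicLong

/**
 * Tracks the time of the last observed gateway activity so a health probe
 * can flag a connection that has gone silent. Hypothetical helper.
 */
class GatewayWatchdog(
    private val maxSilenceMs: Long,
    // Injectable clock so the class can be tested without waiting.
    private val nowMs: () -> Long = System::currentTimeMillis,
) {
    private val lastActivityMs = AtomicLong(nowMs())

    /** Call this whenever a gateway event arrives. */
    fun recordActivity() = lastActivityMs.set(nowMs())

    /** True while the last recorded activity is no older than [maxSilenceMs]. */
    fun isHealthy(): Boolean = nowMs() - lastActivityMs.get() <= maxSilenceMs
}
```

The silence threshold has to be longer than the quietest normal period of the bot, otherwise an idle but healthy connection would be reported as stuck.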
