Seeing periodic "Workers runtime canceled this request" errors across multiple endpoints
In the past week or so we've started seeing a number of these errors:
They occur across a number of different endpoints within the Worker, so it seems like more of a systemic problem than a dangling Promise in a particular endpoint.
We're using Rust/Wasm with custom bindings to Cloudflare Workers, but this setup has been running in production for a few months now, and these problems only began recently.
We are also using Hyperdrive to connect to the database, and have been steadily increasing our usage of database connections.
2 Replies
I found this other thread of someone mentioning seeing similar errors with Hyperdrive: https://discord.com/channels/595317990191398933/1150557986239021106/1391880095819632700
Here are some request IDs where we see this happening:
- 97552d2c09e57725
- 97554aee6b6834ff
- 97554a01fc5c1ed8
Also, I should mention that we're seeing these errors on some endpoints that do not connect to the database over Hyperdrive, so it's not clear to me that Hyperdrive is the issue.
An update on this: I found that for one of the endpoints I'm seeing an occurrence of this error when trying to parse a JSON request body that has a Content-Length of 53483111 (53MB).
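For the oversized-body case, one possible guard is to check the declared Content-Length before attempting to parse. The following is only a sketch under assumptions: `MAX_JSON_BODY_BYTES`, `declaredBodyTooLarge`, `handleJson`, and the `RequestLike`/`SimpleResponse` shapes are hypothetical names made up for illustration, and the 10 MiB limit is arbitrary. It is written in TypeScript rather than the Rust/Wasm bindings mentioned above, and it is not confirmed to resolve the cancellation error.

```typescript
// Hypothetical limit; the thread does not say what size the endpoint should accept.
const MAX_JSON_BODY_BYTES = 10 * 1024 * 1024;

// Minimal structural types so the sketch is self-contained.
// A standard Fetch API Request also satisfies RequestLike.
interface RequestLike {
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

interface SimpleResponse {
  status: number;
  body: string;
}

// Returns true only when a Content-Length header is present, numeric,
// and larger than the limit. A missing or malformed header returns false,
// so this check alone does not bound bodies sent without Content-Length.
function declaredBodyTooLarge(contentLength: string | null, maxBytes: number): boolean {
  if (contentLength === null) return false;
  const declared = Number(contentLength);
  return Number.isFinite(declared) && declared > maxBytes;
}

// Hypothetical handler: reject early instead of parsing a very large body.
async function handleJson(request: RequestLike): Promise<SimpleResponse> {
  if (declaredBodyTooLarge(request.headers.get("content-length"), MAX_JSON_BODY_BYTES)) {
    return { status: 413, body: "Payload too large" };
  }
  const payload = await request.json();
  return { status: 200, body: JSON.stringify({ received: payload !== null }) };
}
```

With these assumed values, the 53483111-byte request from the report would be rejected with a 413 before any JSON parsing is attempted.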
So that is at least one lead (and something that I'll have to address).

As a further clue:
We have also been seeing this, but in our case it appears to be a side effect of our deliberate policy of injecting delays before responding to Worker fetches with invalid parameters. The errors started the day we first deployed that policy.
When our ingress worker decides that an incoming request is invalid, it waits on a promise of up to 25s (depending on the nature of the invalidity) before returning the response. The idea is that the sending end is forced to wait that long before retrying.
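A rough sketch of what such a delay policy might look like, for anyone trying to reproduce the behavior. Everything here is an assumption made for illustration: the `Invalidity` categories, the per-category delay values, and the names `delayMsFor`, `sleep`, and `rejectAfterDelay` are hypothetical. Only the 25s upper bound comes from the description above.

```typescript
// Hypothetical categories of invalid request; the real ones are not listed in the thread.
type Invalidity = "malformed" | "badSignature" | "blockedClient";

// Upper bound taken from the description above: delays of up to 25 seconds.
const MAX_DELAY_MS = 25_000;

// Hypothetical per-category delays, chosen only for illustration.
const DELAY_MS: Record<Invalidity, number> = {
  malformed: 5_000,
  badSignature: 15_000,
  blockedClient: 60_000, // deliberately above the cap to show clamping
};

// Picks the delay for a category, never exceeding MAX_DELAY_MS.
function delayMsFor(kind: Invalidity): number {
  return Math.min(DELAY_MS[kind], MAX_DELAY_MS);
}

// Resolves after roughly `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

interface SimpleResponse {
  status: number;
  body: string;
}

// Waits on a promise before returning the error response, as described above.
// `wait` is injectable so the delay can be replaced in tests.
async function rejectAfterDelay(
  kind: Invalidity,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<SimpleResponse> {
  await wait(delayMsFor(kind));
  return { status: 400, body: `invalid request: ${kind}` };
}
```

The request stays open for the whole wait, so any request that is cut short during those seconds ends before the response is ever produced.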
Inserting that delay has greatly reduced the number of spam connection attempts that we have to deal with, and hence the number of logged events to process each day.
...but clearly the Workers runtime doesn't like it. Maybe in our case we would be better off doing something other than just adding a delay before the response.