Apologies for going silent on this. The issue - and homing in on a workaround that actually works -
Apologies for going silent on this. The issue - and homing in on a workaround that actually works - has taken all this time.
The root of the problem is that console logs written from a hibernatable-websocket DO during a wakeup that was triggered by a websocket message don't get through to logpush (we opened https://github.com/cloudflare/workerd/issues/2089 on this last July; it's still open). Our solution to that had been to send all log output from such a DO to a separate Logging worker, originally using a fetch but recently with an RPC invocation.
What we hadn't noticed until a couple of weeks ago was that something about that external-logging setup was keeping the main DO awake full time - every second of every day. Turning it off is what immediately collapsed our duration costs... but also meant we were missing all those log messages.
We now have a different workaround, which writes logs generated during websocket-triggered wakeups into persistent store, and dumps them out to the console on the next non-websocket wakeup. There were some nuances in juggling how often to write to storage, and how to ensure that a non-websocket wakeup would happen in a sufficiently timely manner to avoid the stored logs growing indefinitely. As of now, we seem to be in a good place: we're getting all the log messages, and our duration metric is under control.
The root of the problem is that console logs written from a hibernatable-websocket DO during a wakeup that was triggered by a websocket message don't get through to logpush (we opened https://github.com/cloudflare/workerd/issues/2089 on this last July; it's still open). Our solution to that had been to send all log output from such a DO to a separate Logging worker, originally using a fetch but recently with an RPC invocation.
What we hadn't noticed until a couple of weeks ago was that something about that external-logging setup was keeping the main DO awake full time - every second of every day. Turning it off is what immediately collapsed our duration costs... but also meant we were missing all those log messages.
We now have a different workaround, which writes logs generated during websocket-triggered wakeups into persistent store, and dumps them out to the console on the next non-websocket wakeup. There were some nuances in juggling how often to write to storage, and how to ensure that a non-websocket wakeup would happen in a sufficiently timely manner to avoid the stored logs growing indefinitely. As of now, we seem to be in a good place: we're getting all the log messages, and our duration metric is under control.
GitHub
We have one DurableObject that uses the hibernation API. It implements webSocketMessage() in which we use console.log() to log stuff. On miniflare (wrangler dev) that works perfectly fine, the logs...