Cloudflare Developers•2mo ago

Hey Wes. We aren't using Party Server

Hey Wes. We aren't using Party Server here; however it sounds like you are possibly awaiting the fetch result inside the realtime path. Is the await fetch() inside the onMessage handler and sibling to the broadcast() call? You should be able to resolve this by moving the fetch to another codepath or ensuring the fetch isn't blocking the broadcast.

31 Replies

wesbos•2mo ago

I'm not awaiting it, but I am calling the fetch from within the onMessage handler

MarakOP•2mo ago

Can you share a code snippet? You may want to create a separate code path for the fetch (I think it's called a Party?) that will sync the state such that the main broadcast loop is always responsive.

wesbos•2mo ago

ohh ok, let me look into that - 1 sec

MarakOP•2mo ago

It is expected that any slow events inside the realtime path will block operations. I'd be surprised if a fire and forget wasn't working intuitively inside the loop; however moving the fetch to an outside code path that syncs state might be a better design. We aren't using Party Server here, someone else might know better.

Larry•2mo ago

PartyServer has lifecycle hooks onMessage (WebSocket) and onRequest (fetch). I think Marak is on the right track. I'm just helping him remember the PartyServer terminology

wesbos•2mo ago

yeah I was thinking about just firing up a second process that is a "client" of the web socket instead, but I'm puzzled why this "fire and forget" seems to be blocking the network. I'm actually not doing either - in onStart() I run a recursive function every 1 second that fires a fetch

Larry•2mo ago

In general, I avoid outgoing fetches from Durable Objects. It's outside of the ideal concurrency model with inputGates and outputGates. If you do NOT await a fetch, then it will be held until all other events are done processing by the output gates. It doesn't sound like that's what's happening here exactly but that's an example why fetches are problematic.

wesbos•2mo ago

So the better approach would be to listen for messages on another worker?

Larry•2mo ago

If it's truly fire and forget, I make an RPC call to a Worker that then forwards the fetch. You can even pass a Request object to the Worker over the RPC. The other way around is what I'm suggesting. Make the fetch use another Worker.
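
As a rough sketch of the shape described here (not code from the thread or from the LED-GRID repo): the Durable Object hands the outgoing request to a separate forwarder and does not await the result. `FetchForwarder`, `GridServer`, and `FetchLike` are hypothetical names, and plain classes stand in for the real pieces so the example runs anywhere. In an actual project the forwarder would presumably extend `WorkerEntrypoint` from `cloudflare:workers` and be called through a service binding.

```typescript
// Hypothetical stand-in for fetch(), injected so the sketch has no runtime dependencies.
type FetchLike = (
  url: string,
  init: { method: string; body: string }
) => Promise<{ status: number }>;

// Stand-in for the forwarding Worker (assumption: a real one would be an
// RPC entrypoint reached through a service binding).
class FetchForwarder {
  private readonly doFetch: FetchLike;

  constructor(doFetch: FetchLike) {
    this.doFetch = doFetch;
  }

  // Performs the slow outgoing request on behalf of the caller.
  async forward(url: string, payload: unknown): Promise<number> {
    const res = await this.doFetch(url, {
      method: "POST",
      body: JSON.stringify(payload),
    });
    return res.status;
  }
}

// Stand-in for the Durable Object side of the call.
class GridServer {
  readonly errors: unknown[] = [];
  private readonly forwarder: FetchForwarder;

  constructor(forwarder: FetchForwarder) {
    this.forwarder = forwarder;
  }

  onPixels(url: string, updates: number[]): void {
    // Deliberately not awaited: the caller returns right away and
    // failures are only recorded, never thrown into the realtime path.
    this.forwarder.forward(url, updates).catch((err) => this.errors.push(err));
  }
}
```

The `.catch` matters even for fire and forget, since an unhandled rejection would otherwise surface somewhere unrelated.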

MarakOP•2mo ago

Moving the fetch to another worker might be the better move

wesbos•2mo ago

https://github.com/wesbos/LED-GRID/blob/main/src/server.ts#L173

wesbos•2mo ago

this is the function which does not seem to be blocking. But I like your approach of moving it to another worker much better

Larry•2mo ago

I haven't done this myself but if you make the other "Worker" (actually another DO) be an RpcTarget, I hear that it's super local and efficient.

wesbos•2mo ago

I don't think it needs to be a durable object, does it? It's just a fetch call

Larry•2mo ago

A short-lived DO with no storage is essentially a Worker, but yeah.

wesbos•2mo ago

ahh gotcha

Larry•2mo ago

Think of it more like another process. I believe you can even export it from the same Worker project as you export the DO.

wesbos•2mo ago

yeah exactly - I'll try that now. Hmm - so I moved all the logic to a second worker and it's still blocking the websocket

MarakOP•2mo ago

It's a bit challenging to read, I think the issue is around here:

this.wled.sendPixels(updates).then(() => {
  console.log('WLED sent', Date.now());
  setTimeout(this.updateLED.bind(this), 1000)
})

Using then() is most likely blocking

wesbos•2mo ago

I removed all of that and it still persisted

MarakOP•2mo ago

You may want to try something like:

setTimeout(() => {
  this.wled?.sendPixels(updates).catch(console.error);
}, 0);

wesbos•2mo ago

Tried it - even removed async from all the functions. I'm going to try to just make a separate script that acts as a client and sends the data over websockets

MarakOP•2mo ago

Yes, I would suggest removing the timer loop and async awaits outside the broadcast code paths if possible

wesbos•2mo ago

weird, if I replace the call with a slow fetch it doesn't block.. So there is something weird going on outside of the fetch. I'll dig in some more - thanks for the help! I really appreciate it. And now the original code is working just fine.. time for a break haha

MarakOP•2mo ago

Happy to help. I would suggest decoupling the timer updates for state from the broadcasting code paths. It may be a better design to have the loop updating state separated from the code broadcasting the state to the clients.
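
A minimal sketch of that separation, assuming a hypothetical `Room` class (not PartyServer's API and not the LED-GRID code): the message handler only updates state and broadcasts, while a separate `syncOnce` step pushes the latest state to the hardware. Whatever schedules `syncOnce` (a timer, an alarm, or an outside client) is left out on purpose.

```typescript
// Hypothetical callback type for sending a message to one connected client.
type Send = (message: string) => void;

class Room {
  private state: number[] = [];
  private dirty = false;
  private readonly clients: Send[] = [];

  join(send: Send): void {
    this.clients.push(send);
  }

  // Realtime path: update state and broadcast. Nothing slow happens here.
  onMessage(pixels: number[]): void {
    this.state = pixels;
    this.dirty = true;
    const message = JSON.stringify(pixels);
    for (const send of this.clients) send(message);
  }

  // Sync path: pushes the latest state only if it changed since the last push.
  // `push` is an assumed stand-in for the slow call to the device.
  async syncOnce(push: (pixels: number[]) => Promise<void>): Promise<boolean> {
    if (!this.dirty) return false;
    this.dirty = false; // cleared before awaiting so newer messages re-mark it
    await push(this.state);
    return true;
  }
}
```

With this split, a slow or failing `push` can delay the next hardware update but has no code path into `onMessage`.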

Larry•2mo ago

setTimeout also breaks the concurrency model

wesbos•2mo ago

What a rabbit hole. So I stripped everything out. It has nothing to do with the promises or callstack. It seems like a bug at the workerd network level. I tried running wrangler dev --remote and there are zero issues. Holy smokes. So the endpoint that I was hitting was a .local domain: http://wled-grid.local - I swapped it to the straight IP address http://192.168.1.100 and it's still giving slow requests, but the blocked websockets are no longer happening 😐 I know Macs have trouble with .local DNS resolution so maybe it's even a lower level issue with my Mac
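
For anyone hitting the same thing, a small sketch of a dev-time check that flags `.local` endpoints so they can be swapped for an IP address. `usesLocalMdnsName` is a hypothetical helper, and whether `.local` resolution is really the root cause here is still only a guess.

```typescript
// Hypothetical helper: true when the endpoint's hostname ends in ".local",
// the suffix conventionally resolved via mDNS rather than regular DNS.
function usesLocalMdnsName(endpoint: string): boolean {
  const host = new URL(endpoint).hostname.toLowerCase();
  return host.endsWith(".local");
}

// Example: pick which configured endpoints deserve a warning in local dev.
const endpoints = ["http://wled-grid.local", "http://192.168.1.100"];
const flagged = endpoints.filter(usesLocalMdnsName);
```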

MarakOP•2mo ago

If it's possible you could try switching to RPC invocations via service bindings instead of making HTTP fetch requests.

wesbos•2mo ago

I tried it. The fetch needs to happen because it's a piece of hardware running a web server, but I moved the fetch to a separate worker and called it via RPC and the issue persisted

Larry•2mo ago

Thanks for circling back to us. I wonder if it's a timing thing that only occurs locally? I had some tests that failed locally because the clock didn't advance between invocations, so they occurred in the same millisecond. Whereas in production, no two requests from the same user ever arrive in the same millisecond. Do you key off of milliseconds ever in your code?

wesbos•2mo ago

nope.. One more thing I noticed is that workerd would spike the upload once the websockets could go through. It really feels like a lower level DNS thing
