Hey! I have a general architecture question - I have a bunch of always-on, always-connected IoT devices that I need to be able to send commands to. For security reasons, we don't want these to have open endpoints, so we're experimenting with having them open WebSockets to Durable Objects. We need to have a UI be able to send and receive commands from the devices, so we have started opening WebSockets from the UI to the Durable Object worker. It seems a little brittle, as we sometimes lose connection to the devices. Does this sound like a reasonable architecture, or is there a better way to approach this project? Additionally, we'd like to monitor the connections, but each device is attaching to its own DO. Any suggestions on how to monitor across a collection of them, or a better way to do it?
9 Replies
Larry, 2mo ago
DOs sound like an ideal platform for your system.

First, you'll need to consider whether you need robust reconnection semantics. Take a look at reconnecting-websocket. PartySocket is a fork of that, and Cloudflare's own AgentClient uses that fork. However, that only makes the client reconnect. You may also need a way for either or both sides to catch up after a reconnection. That's typically done with semantics like SSE's Last-Event-ID header.

Second, you'll need to find the right granularity for your DOs. A DO per IoT device is a clean boundary, but that could be really high fanout for your UI needs. A pool of IoT service DOs will reduce the fanout, but then you have the complexity of managing the pool. Take a look at durable-utils/StaticShardedDO. It was created by this Discord server's most helpful contributor, Lambros Petrou. You may be able to use that for a pooled fanout.

Good luck, and come back here often to update us on your progress and get help when needed.
GitHub: lambrospetrou/durable-utils (Utilities for Cloudflare Durable Objects and Workers)
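To make the reconnection advice above concrete, here is a minimal sketch of how a client might space out its reconnect attempts. It is not the algorithm used by reconnecting-websocket or PartySocket; the function name, option names, and default values are all assumptions chosen for illustration.

```typescript
// Hypothetical helper: names and defaults are illustrative only and are not
// taken from reconnecting-websocket or PartySocket.
interface BackoffOptions {
  minDelayMs: number; // delay ceiling for the first retry
  maxDelayMs: number; // upper bound on any delay
  factor: number;     // growth multiplier per failed attempt
}

const defaultBackoff: BackoffOptions = { minDelayMs: 1000, maxDelayMs: 30000, factor: 2 };

// Returns the delay before retry number `attempt` (0-based).
// `random` is injectable so the jitter can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions = defaultBackoff,
  random: () => number = Math.random,
): number {
  const exponential = opts.minDelayMs * Math.pow(opts.factor, attempt);
  const capped = Math.min(exponential, opts.maxDelayMs);
  // Jitter: pick a value in [capped / 2, capped] so that devices which
  // dropped at the same moment do not all reconnect at the same moment.
  return Math.round(capped / 2 + random() * (capped / 2));
}
```

A device would call `reconnectDelayMs(attempt)` after each failed connection, wait that long, and reset `attempt` to 0 once a connection succeeds. The jitter matters most for a fleet of devices that all lose their connection at once.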
Dan T (OP), 2mo ago
Thanks so much!!! One thing I've noticed is that there's support for an idle "ping" response for durable objects - I'm not super super accustomed to websockets - from the client side, is there a need for sending pings to keep the connection alive reliably?
Larry, 2mo ago
Prefer sending WS ping frames (opcode 0x9). If your client is a browser, though, you can't send them, so that's when you need the Cloudflare auto-response feature. 25 seconds is a conservative heartbeat interval because almost all intermediaries will keep an idle connection open for at least 30 seconds.
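As a rough illustration of the device-side heartbeat described above, the sketch below sends a ping every 25 seconds and reports the connection as dead if no pong has arrived within a timeout. The `Heartbeat` class, the `PingableSocket` interface, and the 60-second timeout are assumptions made for this example, not part of any Cloudflare API.

```typescript
// Minimal shape of a socket that can send protocol-level ping frames.
// Hypothetical interface: a Node.js client such as the `ws` package offers
// a ping() method, while the browser WebSocket API does not.
interface PingableSocket {
  ping(): void;
}

class Heartbeat {
  private readonly socket: PingableSocket;
  private readonly intervalMs: number;
  private readonly timeoutMs: number;
  private lastPingAt: number;
  private lastPongAt: number;

  // 25 s interval follows the advice above; the 60 s timeout is an assumption.
  constructor(socket: PingableSocket, startedAt: number, intervalMs = 25000, timeoutMs = 60000) {
    this.socket = socket;
    this.intervalMs = intervalMs;
    this.timeoutMs = timeoutMs;
    this.lastPingAt = startedAt;
    this.lastPongAt = startedAt;
  }

  // Call whenever a pong frame arrives from the peer.
  onPong(now: number): void {
    this.lastPongAt = now;
  }

  // Call periodically (for example once a second).
  // Returns false when the peer should be considered dead.
  tick(now: number): boolean {
    if (now - this.lastPongAt > this.timeoutMs) return false;
    if (now - this.lastPingAt >= this.intervalMs) {
      this.socket.ping();
      this.lastPingAt = now;
    }
    return true;
  }
}
```

When `tick` returns false, the device would close the socket and start its reconnect logic. For browser clients, which cannot send ping frames, the Durable Object side can instead register an application-level pair with `setWebSocketAutoResponse(new WebSocketRequestResponsePair("ping", "pong"))` so that a text "ping" is answered without waking a hibernating object.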
Dan T (OP), 2mo ago
Awesome! This is all super helpful. Will sending WS ping frames result in higher costs with the durable objects? (mostly, will this wake up a hibernating one?)
Larry, 2mo ago
Nope. They are free and automatic
Dan T (OP), 2mo ago
Amazing! So, to recap as I attempt to implement this:
- From the IoT device, I should open a WebSocket and send ping frames every 25 seconds
- From a client UI (browser-based), when I connect, I should open a WebSocket and may need to use pings with auto responses
- I can use a Durable Object to pool multiple connections into a single "room", which will let me get stats across all of them more easily
- If I want to keep a single DO per IoT device, I can use the sharding utility lib to handle the sharding and try to query across them

Does that sound about right?
Larry, 2mo ago
Almost. The sharding lib handles pooling, so it won't be one DO per device, but it will be as easy as if it were. Note that Lambros's sharding lib is experimental. Read the code and understand what it is doing.
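To show what pooling devices into a fixed set of DOs can look like in principle, here is a minimal sketch that maps a device ID to one of N shard names with a stable hash. This is not the durable-utils API or its algorithm; `fnv1a32`, `shardNameFor`, and the `iot-shard` prefix are hypothetical names used only for illustration.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
// For plain ASCII device IDs this matches byte-wise FNV-1a.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Hypothetical helper: deterministically picks one shard name out of
// `numShards` for a device, so every device always lands on the same DO.
function shardNameFor(deviceId: string, numShards: number, prefix = "iot-shard"): string {
  if (!Number.isInteger(numShards) || numShards <= 0) {
    throw new RangeError("numShards must be a positive integer");
  }
  return `${prefix}-${fnv1a32(deviceId) % numShards}`;
}
```

In a Worker, the returned name could be passed to the Durable Object namespace's `idFromName()` to reach the shard, and a monitoring request would fan out to only `numShards` objects rather than one per device. Changing `numShards` later remaps most devices, which is one reason to read how the library handles this before relying on it.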
Dan T (OP), 2mo ago
Ahhh got it, thank you! Are there any examples of a command queue pattern in Durable Objects you're aware of that I should reference, or should I be doing that by hand?
Larry, 2mo ago
I rolled my own that's not open sourced (yet), otherwise I would share it. After mine was already in production, I looked at the pre-release Actor base class that the super-smart and helpful Brayden Wilmoth is working on. It is inspired by the Actor programming model, which is also the reference concept for Cloudflare DOs. I personally like my own queuing system better than Brayden's, but his is publicly available and mine is not... yet.

It could have changed since I last looked and last spoke to Brayden, but his implementation only had a queue on the receiving end. Cloudflare DOs already queue messages in order on the receiving end if you follow certain rules (and I do), so I didn't see the value in that. For me, the real value is the queue on the sending side that allows any client to catch up in a reconnection situation. YMMV, and it's a good exercise to look at Brayden's work regardless. BTW, after he developed the first/best GUI for Cloudflare D1, he was acqui-hired and his Outerbase offering became the "Cloudflare D1 Data Explorer". Amazing work! Great guy!!!

There is also the Cloudflare Queues offering, and look at Cloudflare's RPC capabilities. Cloudflare RPC supports passing full Request objects around as a parameter. I have this thought that one day I'll use that and make a JavaScript Request object be the envelope for my commands. Other Cloudflare products support any Structured Clone-able data type, not just JSON. The Cloudflare DO KV API is one. Cloudflare Queues is another. I don't think they all support Request objects though. I only know for certain that Cloudflare RPC does.

Note, I'm pretty sure Cloudflare Queues is built on Cloudflare DOs, so in theory DOs can do anything Queues can do. In practice though, Cloudflare intentionally does not expose the Cap'n Proto serialization capability that will serialize (almost) any Structured Clone-able data type. So, as Yogi Berra says, "In theory there is no difference between theory and practice. In practice there is."
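Since the sending-side queue described above is not public, the following is only a guess at the general shape of the idea, not Larry's implementation or Brayden's Actor class. It keeps a bounded, in-memory log of numbered commands so that a reconnecting peer can ask for everything after the last ID it saw, much like SSE's Last-Event-ID. The names `CommandOutbox`, `Envelope`, `replayAfter`, and `ack` are hypothetical, and a real Durable Object would persist the log in storage rather than in memory.

```typescript
// A command plus the monotonically increasing ID used for catch-up.
interface Envelope<T> {
  id: number;
  payload: T;
}

// Hypothetical sending-side queue: in-memory only, bounded in size.
class CommandOutbox<T> {
  private nextId = 1;
  private readonly log: Envelope<T>[] = [];
  private readonly maxEntries: number;

  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
  }

  // Record a command and return the envelope to send over the socket.
  enqueue(payload: T): Envelope<T> {
    const envelope: Envelope<T> = { id: this.nextId++, payload };
    this.log.push(envelope);
    // Drop the oldest entry once the bound is exceeded.
    if (this.log.length > this.maxEntries) this.log.shift();
    return envelope;
  }

  // After a reconnect, the peer reports the last ID it saw and receives
  // every retained command newer than that, oldest first.
  replayAfter(lastSeenId: number): Envelope<T>[] {
    return this.log.filter((entry) => entry.id > lastSeenId);
  }

  // Forget commands the peer has confirmed, up to and including `upToId`.
  ack(upToId: number): void {
    const keepFrom = this.log.findIndex((entry) => entry.id > upToId);
    this.log.splice(0, keepFrom === -1 ? this.log.length : keepFrom);
  }
}
```

On reconnect, the device would send its last seen ID as the first message (or as a query parameter), and the DO would send `replayAfter(lastSeenId)` before resuming live traffic. If the requested ID is older than the oldest retained entry, the peer has missed commands that can no longer be replayed, so a real system needs a policy for that case.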
