Looking for thoughts on the following

Looking for thoughts on the following thought process I'm having with DOs: I have a number of users who will open a browser or similar client. Each will pull a DO stub for their own user_id, connect over WebSockets, and listen for notifications. An admin user posts a new message, which pulls a list of all subscribed user_ids, generates a stub for each user, pulls all open WebSockets, and sends the message (a single user may have multiple clients listening for notifications at once).

What's the best way to handle this pattern, in theory? Each message could be going out to any number of users (tens of thousands under reasonable scale assumptions). Is there a way to use subrequests or similar to instantiate many stubs and send a message to many users? I don't want to spend too long going down a particular experiment only to find there's an existing solution, or an architectural reason why this pattern shouldn't be used.
14 Replies
1984 Ford Laser (OP) · 7mo ago
Or if I use Service Bindings am I all good?
DaniFoldi · 7mo ago
The main constraint you will be working around is the 1000-subrequest limit. A system that works for up to ~1M users is to "nominate" some of the DOs to relay messages (you could use another class, but this is slightly more efficient, although more complicated to reason about). In practice:

- Wherever your message comes in from: find the list of users that should be notified. Let's assume you use IDs or usernames; the important thing is that they are well below 1KB in length.
- Split them up into batches of up to 1000, depending on other subrequests like logs or the maximum connections per user.
- For each batch, find the first user and send the message and the ID list to that user's DO.
- In a DO, send the message to the user, then send every other DO in the batch the same message with an empty list.

To make it simpler, you can split the broadcast system into a separate DO, and you can make the fanout unlimited as well: instead of the user ID list, send only, say, the first and last ID it should fan out to. Then, in the DO, you can query the list; if the length is over 1000, split it up further, otherwise send the message.
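A minimal sketch of this batched relay idea in Workers TypeScript. It assumes per-user DOs that already hold client WebSockets; the binding name (USER_DO), class name, method name (relay), and batch size are illustrative, not a fixed API:

```ts
import { DurableObject } from "cloudflare:workers";

const BATCH_SIZE = 900; // headroom under the 1000-subrequest limit

interface Env {
  USER_DO: DurableObjectNamespace<UserNotifierDO>;
}

export class UserNotifierDO extends DurableObject<Env> {
  // Deliver to this user's open sockets, then relay to the rest of the batch.
  async relay(message: string, batch: string[]): Promise<void> {
    for (const ws of this.ctx.getWebSockets()) ws.send(message);
    await Promise.all(
      batch.slice(1).map((id) => {
        const stub = this.env.USER_DO.get(this.env.USER_DO.idFromName(id));
        return stub.relay(message, []); // empty list so the fan-out stops here
      })
    );
  }
}

// Wherever the notification originates: split recipients into batches and
// hand each batch to its first DO, which relays to the others.
export async function broadcast(env: Env, userIds: string[], message: string) {
  for (let i = 0; i < userIds.length; i += BATCH_SIZE) {
    const batch = userIds.slice(i, i + BATCH_SIZE);
    const first = env.USER_DO.get(env.USER_DO.idFromName(batch[0]));
    await first.relay(message, batch);
  }
}
```

Each hop stays well under the subrequest limit: the originating Worker makes one call per batch, and each nominated DO makes at most BATCH_SIZE - 1 calls.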
lambrospetrou · 7mo ago
As Dani said, once you are in many-thousands-of-users territory you need to shard across multiple DOs. A single DO has certain capabilities; it's not autoscalable. Once you need many DOs doing the same logic, my StaticShardedDO abstraction could be useful to make it easy to call them all. Check it out at https://github.com/lambrospetrou/durable-utils?tab=readme-ov-file#staticshardeddo and, if you do try it out, let me know if you have feedback on its API.

Also, the list of blog posts on our new page has a lot of examples of how to architect production apps: https://developers.cloudflare.com/durable-objects/what-are-durable-objects/#durable-objects-in-cloudflare. The post about Cloudflare Queues in particular is very nice to read.
1984 Ford Laser (OP) · 7mo ago
I will have a read, thank you. My current approach is this:

- Worker A, invoked by fetch(), pulls a list of IDs from a 3rd party (let's assume the list is in the thousands).
- The list of IDs is sliced into batches and pushed to a queue.
- The consumer worker (which happens to be bound within this same worker) iterates over the list of IDs and invokes a Service Binding RPC method (also bound to this same worker) called stubTest(id).
- stubTest() does an idFromName() and get() for a stub, then calls wsTest() on the stub.
- wsTest() does this.state.getWebSockets() and iterates over each possible connection.

Using Queues for this has seemed to work okay so far. If I have a list of 2000 IDs, my DO metrics show 4K DO requests and around 40 GB-sec of usage for those 2K IDs. The Workers metrics show 2K requests, though I'm unsure whether those fall under billed requests or not.

While I understand the use case for sharded DOs, is there anything about my process that stands out? If I already had the calculated ID from idFromName(), instead of pulling it each time, would that halve the number of DO requests?
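A condensed sketch of that flow, keeping the stubTest/wsTest names from the message. The binding names (NOTIFY_QUEUE, USER_DO, SELF), the batch size, the payload, and the helper for the 3rd-party lookup are all assumptions:

```ts
import { DurableObject, WorkerEntrypoint } from "cloudflare:workers";

interface Env {
  NOTIFY_QUEUE: Queue<string[]>;                  // batches of user IDs
  USER_DO: DurableObjectNamespace<UserDO>;
  SELF: Service<NotifyWorker>;                    // service binding back to this worker
}

declare function fetchSubscriberIds(): Promise<string[]>; // 3rd-party lookup, not shown

export class UserDO extends DurableObject<Env> {
  async wsTest(message: string): Promise<void> {
    // One send per open client; a user may hold several sockets at once.
    for (const ws of this.ctx.getWebSockets()) ws.send(message);
  }
}

export default class NotifyWorker extends WorkerEntrypoint<Env> {
  // Producer: slice the ID list into batches and enqueue them.
  async fetch(_req: Request): Promise<Response> {
    const ids = await fetchSubscriberIds();
    for (let i = 0; i < ids.length; i += 100) {
      await this.env.NOTIFY_QUEUE.send(ids.slice(i, i + 100));
    }
    return new Response("queued");
  }

  // Consumer: one RPC call per user ID in each queue batch.
  async queue(batch: MessageBatch<string[]>): Promise<void> {
    for (const msg of batch.messages) {
      await Promise.all(msg.body.map((id) => this.env.SELF.stubTest(id, "hello"))); // placeholder message
      msg.ack();
    }
  }

  // RPC target: resolve the per-user DO and deliver over its websockets.
  async stubTest(userId: string, message: string): Promise<void> {
    const stub = this.env.USER_DO.get(this.env.USER_DO.idFromName(userId));
    await stub.wsTest(message);
  }
}
```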
lambrospetrou · 7mo ago
> If I already had the calculated ID from idFromName(), instead of pulling it each time, would that halve the number of DO requests?

I don't understand what you mean here. Are the IDs you pull in step 1 the DO IDs, or the application-specific message you will broadcast?
1984 Ford Laser (OP) · 7mo ago
Just not sure if that produces a request to the DO, or if it's the call to getWebSockets() that produces the second request per stub invocation? The IDs are from elsewhere, not necessarily in DO UUID format. Like in my testing, I called a list of 2K IDs and produced 4K DO requests in my dash. I assume the first request per stub is the creation of the stub, and the second is the call to getWebSockets().
lambrospetrou · 7mo ago
When you create a stub there is no request happening. The request is only when you call something on the stub.
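For illustration, idFromName() and get() are local operations; only the call made on the stub shows up as a DO request. Binding, route, and function names here are made up:

```ts
interface Env {
  USER_DO: DurableObjectNamespace;
}

async function sendOne(env: Env, userId: string, message: string) {
  const id = env.USER_DO.idFromName(userId); // pure computation, no request
  const stub = env.USER_DO.get(id);          // builds a stub, still no request
  await stub.fetch("https://do/notify", {    // this call is the DO request
    method: "POST",
    body: message,
  });
}
```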
1984 Ford Laser (OP) · 7mo ago
I only call one function on each stub; I will need to investigate further.
lambrospetrou · 7mo ago
Going back to the beginning, though: since these notifications come from an admin and would be broadcast to all users, do these specific notifications need to go through each individual per-user DO? That seems wasteful. Depending on how many admin broadcasts you have, did you think of segregating these admin notifications into their own flow with dedicated DOs to handle them, and retaining your user DOs for user-specific messages?

For example, you could have a group of 10-100 DOs, each handling a chunk of your users (depending on how long each broadcast takes, that could be hundreds or low thousands of users per DO). Then you would only need to send the announcement to those DOs, and they would broadcast over all their websockets.

Maybe that's what you described above, but the per-user DO part was quite confusing. Creating 10K DO stubs for each message you want to broadcast is going to hurt you perf-wise, which is why I am wondering if you could consolidate them more.
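A rough sketch of that dedicated-broadcast idea: a fixed pool of group DOs, each holding the sockets for a slice of the user base (how users are assigned to a group at connect time, e.g. by hashing the user ID, is left out). The GROUP_DO binding, pool size, and route are assumptions:

```ts
interface Env {
  GROUP_DO: DurableObjectNamespace;
}

const GROUP_COUNT = 50; // tune for the expected number of sockets per DO

async function announce(env: Env, message: string): Promise<void> {
  // One subrequest per group DO, regardless of how many users are subscribed.
  await Promise.all(
    Array.from({ length: GROUP_COUNT }, (_, i) => {
      const stub = env.GROUP_DO.get(env.GROUP_DO.idFromName(`group-${i}`));
      return stub.fetch("https://do/announce", { method: "POST", body: message });
    })
  );
}
```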
1984 Ford Laser (OP) · 7mo ago
Admin user was probably the wrong analogy. The main reason for using a DO per user is that a user can be subscribed to an arbitrary number of channels/admins at any one time, which can and will be entirely different from any other user's. Following this flow would allow a user to receive a message from these admins whenever they are sent out, on whatever clients they have open and connected to their DO instance. Kinda similar to a PubSub setup, I guess. Notifications are also sent out very irregularly, which is why I wanted to simplify the architecture where possible.
lambrospetrou · 7mo ago
As I always say, it depends on the exact scenario, with real numbers. The dimensions you need to consider are: how many fan-outs do you do per second/minute? What is the degree of that fan-out (how many users need to receive each message)? How many different "channels" does a user subscribe to? Depending on the answers to the above, you need to see whether it's worth having DOs at the channel level instead of the user level. I am not an expert on websockets, but it's something to consider.
1984 Ford Laser (OP) · 7mo ago
To simplify further, I'm essentially creating a push notification system where a channel creates a new notification that is sent out to all subscribed users, and users can be subscribed to any number of channels. The way my process is laid out means that scale is handled, no worries. A channel might only send one notification every couple of days, but a user might be subscribed to 200 channels, so they could expect a semi-steady stream if they stayed connected. I will investigate the sharding solution, though I am wondering if I would be better served by an existing solution from elsewhere. Pity PubSub is not taking anyone new for the foreseeable future.
lambrospetrou · 7mo ago
Another thing to consider, which might be simpler: hash your whole userbase across N DOs, so all messages for a given user go to that one DO. It's not a DO per user, it's a DO per X users. Also, each channel has its own DO that keeps a list of subscribed users. Once a message is sent to that channel DO, the channel DO takes all subscribed users from its local storage, hashes their IDs into the "per-user-group DOs" needed, and sends batch requests to those (this could go through your Queue as well).

This way, you do not fan out across thousands of user DOs but across 2-3 orders of magnitude fewer, and you still scale with the number of channels you can have. The limitation of this approach is the volume of notifications a user needs to receive per second; that is what defines what N and X must be, i.e. the number of DOs to have for a group of users.

Anyway, we seem to be doing general architecture diagrams now, so I will stop here until there is a concrete use case.
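A sketch of that layout, assuming the channel DO stores its subscriber IDs in its own storage and the user-group DOs tag each accepted WebSocket with the user's ID. All binding, class, and method names are made up:

```ts
import { DurableObject } from "cloudflare:workers";

const GROUP_COUNT = 50; // N user-group DOs; tune for per-user message volume

interface Env {
  GROUP_DO: DurableObjectNamespace<UserGroupDO>;
}

// Stable hash (FNV-1a) so a user always maps to the same group DO.
function groupIndex(userId: string): number {
  let h = 2166136261;
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) % GROUP_COUNT;
}

export class UserGroupDO extends DurableObject<Env> {
  // Deliver only to the named users; assumes each socket was accepted with
  // its userId as a WebSocket tag.
  async deliver(message: string, userIds: string[]): Promise<void> {
    for (const id of userIds) {
      for (const ws of this.ctx.getWebSockets(id)) ws.send(message);
    }
  }
}

export class ChannelDO extends DurableObject<Env> {
  // Called when a message is posted to this channel.
  async publish(message: string): Promise<void> {
    // Subscriber IDs stored under a "sub:<userId>" prefix in this channel's storage.
    const subs = await this.ctx.storage.list({ prefix: "sub:" });
    const byGroup = new Map<number, string[]>();
    for (const key of subs.keys()) {
      const userId = key.slice("sub:".length);
      const group = groupIndex(userId);
      const ids = byGroup.get(group) ?? [];
      ids.push(userId);
      byGroup.set(group, ids);
    }
    // One request per touched group DO instead of one per subscriber.
    await Promise.all(
      [...byGroup].map(([group, ids]) => {
        const stub = this.env.GROUP_DO.get(this.env.GROUP_DO.idFromName(`group-${group}`));
        return stub.deliver(message, ids);
      })
    );
  }
}
```

The deliver() calls could equally be replaced by pushing the per-group batches onto a Queue, as mentioned above, at the cost of some delivery latency.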
1984 Ford Laser (OP) · 7mo ago
I like the idea of hashing the userbase across DOs. A user might have multiple clients open at once, but usually a single user would be receiving messages every few minutes, not every few seconds, so a single DO might be able to handle thousands of users at once, as only certain users will need to receive any given message. Appreciate your input, great food for thought.
