I have a cron job (the scheduled export in the worker entrypoint) which interacts with some DOs. So in this handler, I get DO stubs using idFromName, call the init function so they know their own entity id, and then call whatever other DO methods I need. In this case it wouldn’t be as nice to reimagine this as a fetch call where I inject my own headers
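Concretely, the handler looks roughly like this (EXPORT_DO is a placeholder binding name, and init()/exportData() are stand-ins for my actual DO methods; Env is assumed to declare the binding):

```ts
// Sketch only: EXPORT_DO, init() and exportData() are placeholder names.
export default {
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
    for (const entityId of ["entity-a", "entity-b"]) {
      const stub = env.EXPORT_DO.get(env.EXPORT_DO.idFromName(entityId));
      // A DO can't recover the name passed to idFromName, so pass it in explicitly.
      await stub.init(entityId);
      await stub.exportData();
    }
  },
} satisfies ExportedHandler<Env>;
```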
Makes sense. Using Workers RPC is more efficient. If it's possible that the first access to a DO instance comes from this cron job RPC call, then your init call is also the only way I can think of to handle it. However, if you can be assured that it will always have been called before the cron job, then you might be able to use the headers approach.
A nice pattern to handle this is to have init return an RpcTarget with the methods you want to be public, and make everything else on the DO private. This makes it impossible to access the DO without going through your init fn
You can pipeline the init call to reduce round trips, and it can work out cheaper since multiple methods invoked on the same RpcTarget instance are counted as a single session
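Roughly like this, assuming the DO extends DurableObject from cloudflare:workers (EntitySession and the method/table names are invented):

```ts
import { DurableObject, RpcTarget } from "cloudflare:workers";

// Only the methods on this class are reachable by callers; everything
// else stays behind init().
class EntitySession extends RpcTarget {
  constructor(
    private readonly entityId: string,
    private readonly storage: DurableObjectStorage,
  ) {
    super();
  }

  async exportData() {
    // Placeholder query over the DO's SQLite storage.
    return this.storage.sql.exec("SELECT * FROM exports").toArray();
  }
}

export class ExportDO extends DurableObject {
  // The single public entry point: returns a session scoped to the entity id.
  init(entityId: string): EntitySession {
    return new EntitySession(entityId, this.ctx.storage);
  }
}

// Caller side: no await on init(), so exportData() is pipelined onto its
// result and both calls travel in a single round trip.
async function runExport(env: Env, entityId: string) {
  const stub = env.EXPORT_DO.get(env.EXPORT_DO.idFromName(entityId));
  return await stub.init(entityId).exportData();
}
```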
Are there any footguns with that approach? For instance, do such sessions time out? Or is that a non-issue when calling from a cron Worker? Might it be if calling from another DO?
If you use the “using” keyword it should automatically be disposed. Unsure about max or idle timeouts. It's a good idea to have a wrapper on the calling side to do retries with backoff
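Something like this on the calling side (the attempt count and backoff delays are arbitrary; exportWithRetries/EXPORT_DO are placeholder names):

```ts
// Generic retry-with-exponential-backoff helper for RPC calls.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Backoff: 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** i));
    }
  }
  throw lastError;
}

async function exportWithRetries(env: Env, entityId: string) {
  return withRetries(async () => {
    const stub = env.EXPORT_DO.get(env.EXPORT_DO.idFromName(entityId));
    // `using` disposes the session stub when this block exits, ending the session.
    using session = await stub.init(entityId);
    return await session.exportData();
  });
}
```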
Ah thanks for your response! Very helpful. For my use-case it would actually be more ideal for the DO to not hop around to be closer to the requests. I know that sounds unintuitive, but I'm building an app where the choice of location is intentional, and requests may come from customers anywhere in the world - moving the DO due to a high number of requests from a distant location would not be ideal. Happy to get in touch directly for more info
Not for the init() + actual call if they are pipelined. But you can even use the same RpcTarget across requests, and follow-up RPCs would go straight to the target host/DO instead of the usual routing. Whether this matters to your app is a whole other story
But, if I understand this correctly, the problem with using the same RpcTarget across requests is that you now have a resource on the server you have to worry about being left dangling, right?
Well, at some point it's going to be torn down since the client worker will be evicted, but yes. If you are not careful it could end up consuming resources and eventually resetting your DO, if clients holding the RpcTarget never terminate.
We can't make fetch private though, so there would still be a way to bypass the init call for websocket upgrade requests, right? So there's at least one place we need to remember to call init or add the header, whichever method we use to pass in the entity id?
Yup, I use a custom header on the WebSocket upgrade request, which I then store using serializeAttachment. This way I don't have to store it in a variable on the DO, and it can also hold other context about the WebSocket connection
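Along these lines, assuming hibernatable WebSockets (the X-Entity-Id header name and RoomDO are invented):

```ts
import { DurableObject } from "cloudflare:workers";

export class RoomDO extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    const entityId = request.headers.get("X-Entity-Id");
    const [client, server] = Object.values(new WebSocketPair());
    // Hibernation-friendly accept: the DO may be evicted between messages.
    this.ctx.acceptWebSocket(server);
    // Attach context to the socket itself; it survives hibernation, so no
    // DO instance variable is needed.
    server.serializeAttachment({ entityId, connectedAt: Date.now() });
    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
    const { entityId } = ws.deserializeAttachment();
    // ... handle the message with entityId in hand ...
  }
}
```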
We have found it's easier to reason about if DO instance variables are only things that can be set in the constructor. Otherwise it's easy to accidentally use an instance variable which hasn't been set, like the entity ID
I think in my case I'll keep it in DO storage, not an individual ws client connected to the DO. I might still want to know the original entity id even when I have no connected clients.
So I'm finally making the jump to SQLite in DOs and don't have any prior experience with SQL - are there any gotchas I need to be aware of when using things like VIEWs or TRIGGERs? The pipelines I'd be looking at would be at most hundreds of rows total in a table, using indexed columns where possible. Generated columns from JSON data also sound interesting for some uses. My use cases would usually be write once, read a few times at most; I wanna simplify how I'm storing and managing my relational data
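To make the generated-columns part concrete, here's the kind of thing I mean against DO SQLite storage (the events table and its JSON shape are invented):

```ts
import { DurableObject } from "cloudflare:workers";

export class PipelineDO extends DurableObject {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    this.ctx.storage.sql.exec(`
      CREATE TABLE IF NOT EXISTS events (
        id   INTEGER PRIMARY KEY,
        data TEXT NOT NULL,  -- raw JSON payload
        -- extracted from the JSON at query time, not duplicated on disk
        kind TEXT GENERATED ALWAYS AS (json_extract(data, '$.kind')) VIRTUAL
      );
      CREATE INDEX IF NOT EXISTS idx_events_kind ON events (kind);
    `);
  }

  eventsOfKind(kind: string) {
    return this.ctx.storage.sql
      .exec("SELECT id, data FROM events WHERE kind = ?", kind)
      .toArray();
  }
}
```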
Interesting, I'll take a look. I don't want to add a bunch of extra complexity to my projects by having to work with an ORM, but if it isn't intrusive then I'm down. Is it a heavy dependency?
Yup, and their types are wrong in some scenarios, which can cause you headaches to debug in the future. I still find it to be good if you avoid the relational API, which is where most bugs appear
I found good results using drizzle kit to define schema and migrations, but for runtime, I have a simple custom handler for casting types on return from the DO using the existing schema definitions in TS.
The only thing one needs to be careful about with raw SQL and this approach is consistent type casting on writes, for fields like dates stored as numbers or strings, however you choose to store them.
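A rough illustration of that read/write symmetry (the jobs table and field conventions here are made up, not Drizzle output):

```ts
// Rows come back from ctx.storage.sql as Record<string, SqlStorageValue>;
// this helper casts them into the app-level shape.
type JobRow = { id: number; createdAt: Date; payload: unknown };

function castJobRow(raw: Record<string, SqlStorageValue>): JobRow {
  return {
    id: raw.id as number,
    createdAt: new Date(raw.createdAt as number), // stored as epoch millis
    payload: JSON.parse(raw.payload as string),   // stored as a JSON string
  };
}

// Writes must mirror the same conventions, or reads will mis-cast.
function insertJob(sql: SqlStorage, job: JobRow) {
  sql.exec(
    "INSERT INTO jobs (id, createdAt, payload) VALUES (?, ?, ?)",
    job.id,
    job.createdAt.getTime(),
    JSON.stringify(job.payload),
  );
}
```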
The side perk to custom SQL is it's much, much easier for an LLM to debug/refine code based on the SQL. With ORMs there's sometimes an extra level of indirection.
I find myself more and more optimizing my software design decisions based upon how cognizable they are to an LLM. A funny thing about my own code and LLM use is that poorly named identifiers can throw it into a doom loop. I called something "serialize" and there was JSON in the area, so it constantly thought the output was a string, but it was really pre-processing before turning it into a string. Once I renamed it, the LLM had no trouble.
I have a semi-short-lived use case for stateful requests, and don't feel like they need to write to a persistent store, but I'm worried they'll be prematurely killed.
Is there a way to execute a worker from a DO where the walltime of this worker is not added to the walltime of the DO?
It might be a bit cheeky, but what I am trying to do:
Have a DO take care of administration for my users.
Once in a while (via alarms), the DO asks a worker to check in with an external API for an update.
This external API can be extremely slow. Think 10+ seconds to come up with an answer. As such, the worker returns a response near-instantly, while in the background the external API fetch finishes and a D1 table is updated.
This way the DO isn't blocked on the fetch for very long.
From a cost perspective, paying for 10s of walltime is ofc very expensive (especially as it is just waiting). So the idea of this setup was that the DO would run in short bursts and let the worker do the waiting on the external API.
However, and the documentation suggests the same: Cloudflare is being nice (which I appreciate btw) and doesn't charge for the second worker invocation. But it also adds the walltime of the second worker to the DO walltime. And I can't find a way to stop it doing that
I tried several things:
Use service-bindings (works, but walltime is added together)
Use service-bindings and set smart placement on the worker (works, but DO now shows both requests in the GB-s graph, suggesting they are both billed under DO pricing)
Use direct HTTP (doesn't work, as you can't call a worker on your own account from another worker without a service-binding, it seems)
Any advice or best practices on how to deal with DOs and fetches to external APIs that are very slow?
DO invokes Worker via Service Binding; the Worker returns after initiating work with ctx.waitUntil(), allowing the Durable Object to be evicted. The Worker continues processing, and then once complete, uses the calling DO's ID to create a stub to it and returns the response.
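Sketched out (the SLOW_WORKER/JOBS_DO bindings, the X-DO-Id header, and the method names are all placeholders, and the two halves live in separate scripts):

```ts
import { DurableObject } from "cloudflare:workers";

// Worker side: acknowledge immediately, keep the slow fetch alive with
// ctx.waitUntil, then report back to the calling DO by its string id.
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    const doId = request.headers.get("X-DO-Id")!;
    ctx.waitUntil(
      (async () => {
        const res = await fetch("https://slow.example.com/api"); // 10+ seconds
        const stub = env.JOBS_DO.get(env.JOBS_DO.idFromString(doId));
        await stub.receiveResult(await res.text());
      })(),
    );
    return new Response("accepted", { status: 202 });
  },
} satisfies ExportedHandler<Env>;

// DO side: fire the request from the alarm and return right away, so the
// DO can go idle while the worker does the waiting.
export class JobsDO extends DurableObject {
  async alarm() {
    await this.env.SLOW_WORKER.fetch("https://worker/check", {
      headers: { "X-DO-Id": this.ctx.id.toString() },
    });
  }

  async receiveResult(body: string) {
    // ... update DO state / D1 with the result ...
  }
}
```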