Windmill · 4mo ago
rubenf

I will write a blog post on it asap, but this weekend we achieved a very cool property for a distributed system:
- Scripts were always 100% reliable, as in they would either execute and complete with a success or failure, or be retried if the worker crashed at ANY point (and I really mean any, even mid-transaction; that's the beauty of relying on the beast that is PostgreSQL). This was achieved using atomic statements for pulling jobs and writing back their progress timestamps, regularly and on completion.
- Flows were 99% reliable but had some extremely brief points in time where, if a crash happened, a flow could be stuck forever. Those events were so rare and unlikely on modern infra that we didn't prioritize fixing them, but that is now done: flows are now guaranteed to complete once they are scheduled, given that enough workers are there to process them. This is done through a series of atomic statements in the right places of the finite state machine that runs the flows. If such a machine crash happens, the flow is guaranteed to progress in a finite amount of time and propagate the error back up, where it is then handled by error handlers, if any, making Windmill 100% observable.
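The pattern described above (claim a job atomically, write back a progress timestamp, let another worker take over when the timestamp goes stale) can be sketched roughly as follows. This is not Windmill's actual code: the `queue` table, its columns, the `STALE_AFTER` value, and all function names are hypothetical, and SQLite stands in for PostgreSQL so the sketch runs without a server. In PostgreSQL, a claim like this is commonly written with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3

# Assumed value: seconds without a ping before a job counts as orphaned.
STALE_AFTER = 30


def make_queue():
    """Create an in-memory stand-in for the job queue (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue ("
        "id INTEGER PRIMARY KEY, worker TEXT, last_ping REAL, "
        "completed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn):
    """Add a pending job and return its id."""
    with conn:
        return conn.execute("INSERT INTO queue DEFAULT VALUES").lastrowid


def pull_job(conn, worker, now):
    """Claim one pending or orphaned job; return its id, or None."""
    cutoff = now - STALE_AFTER
    with conn:
        row = conn.execute(
            "SELECT id FROM queue WHERE completed = 0 "
            "AND (worker IS NULL OR last_ping < ?) ORDER BY id LIMIT 1",
            (cutoff,),
        ).fetchone()
        if row is None:
            return None
        # The claim is one conditional UPDATE: if another worker claimed the
        # job in between, no row matches and this worker gets nothing.
        cur = conn.execute(
            "UPDATE queue SET worker = ?, last_ping = ? "
            "WHERE id = ? AND completed = 0 "
            "AND (worker IS NULL OR last_ping < ?)",
            (worker, now, row[0], cutoff),
        )
        return row[0] if cur.rowcount == 1 else None


def ping(conn, job_id, worker, now):
    """Write back a progress timestamp while the job is running."""
    with conn:
        conn.execute(
            "UPDATE queue SET last_ping = ? WHERE id = ? AND worker = ?",
            (now, job_id, worker),
        )


def complete(conn, job_id, worker):
    """Mark the job as finished so it is never pulled again."""
    with conn:
        conn.execute(
            "UPDATE queue SET completed = 1 WHERE id = ? AND worker = ?",
            (job_id, worker),
        )
```

If a worker crashes, it simply stops calling `ping`; once `last_ping` is older than `STALE_AFTER`, the next `pull_job` call from any worker picks the job up again and the script is re-run.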
7 Replies
andness · 4mo ago
This is very interesting, but I'm missing one clarification: this only applies to scripts that are part of a flow and have retries enabled, right?
rubenf · 4mo ago
No, it applies to all flows. It's not about a flow failing because the script errored, it's about nodes/machines literally crashing without Windmill being informed of it. Windmill will now handle that properly 100% of the time.
andness · 4mo ago
Ok so you can actually restart the script exactly at the point it stopped? Like ...
db.execute("drop table foo")
--- CRASH ---
db.execute("create table bar")
And you will not retry the drop table but continue with the create table?
rubenf · 4mo ago
Not exactly where it stopped; the script is restarted from the start.
andness · 4mo ago
Yeah so if the script isn't written to be restartable it will fail then (since it tries to drop the now non-existing table foo).
rubenf · 4mo ago
Yes, idempotency right now still needs to be implemented at the user level.
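As a rough illustration of what idempotency at the user level could look like for the drop/create example above, here is a minimal sketch. SQLite is used only so the snippet runs as-is; the `run_step` and `table_names` helpers are hypothetical, and the column definition for `bar` is made up.

```python
import sqlite3


def run_step(conn):
    """Safe to re-run from the top after a crash at any point."""
    # IF EXISTS: a retry does not fail when foo was already dropped.
    conn.execute("DROP TABLE IF EXISTS foo")
    # IF NOT EXISTS: a retry does not fail when bar was already created.
    conn.execute("CREATE TABLE IF NOT EXISTS bar (id INTEGER PRIMARY KEY)")


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER)")
    run_step(conn)  # first attempt
    run_step(conn)  # simulated restart: same end state, no error
    print(table_names(conn))
```

Each statement checks the current state before acting, so running the script twice leaves the database in the same state as running it once.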
andness · 4mo ago
For data pipelines I generally try to write them so that they are idempotent and restartable, since that makes for a more robust system. But a risk with this auto-restarting is that you might get duplicated data if you're not aware of it. Say a flow uses a first script to determine where to start loading data, and a second script does the loading: if the second script is restarted, you risk loading all the data twice. When I implement things like that I try to make the flow as a whole restartable, and having it automatically restart a single step like that would actually undermine the idempotency I've built in.
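One way to guard a loading step against the double-loading risk described above is to key each row on a stable identifier and let the database skip rows it already has. This is only a sketch under assumptions: the `events` table, the `event_id` key, and the helper names are hypothetical, and SQLite's `INSERT OR IGNORE` stands in for whatever upsert the real target database offers.

```python
import sqlite3


def setup_events():
    """Create a hypothetical target table keyed on a stable event_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)"
    )
    return conn


def load_rows(conn, rows):
    """Insert rows; rows whose event_id is already present are skipped."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
            rows,
        )


def count_events(conn):
    """Return the number of rows currently loaded."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With this shape, a restarted load step that replays the same batch leaves the row count unchanged. It does not replace making the flow as a whole restartable; it only makes the single step harmless to repeat.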