I had to restart the server mid-job and now have several workflows stuck in status: "running". They won't progress because the worker process that was executing them died.
The Problem:
I'm using Effect's workflow system with Postgres persistence. The architecture: an RPC endpoint receives a request from Convex → immediately returns submitted: true → uses Effect.forkDaemon to execute the workflow in the background. Fire-and-forget style.
Convex tracks status and currentStep per request, but when the server crashes, those stay as "running" indefinitely. The workflow is supposed to update the state in Convex after every action inside it.
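To make "stuck" concrete, here is a minimal sketch of how a stale request could be detected. It assumes each Convex record also carries a lastUpdatedAt timestamp that is refreshed on every state update; that field, the RequestRecord shape, and the function names are hypothetical and not part of the setup described above.

```typescript
// Hypothetical shape of the per-request record kept in Convex.
// `lastUpdatedAt` is an assumption: only status and currentStep are tracked today.
interface RequestRecord {
  id: string;
  status: "running" | "completed" | "failed";
  currentStep: string;
  lastUpdatedAt: number; // epoch milliseconds of the last state update
}

// A record counts as stuck when it still says "running" but has not been
// updated within `timeoutMs`.
function isStuck(record: RequestRecord, now: number, timeoutMs: number): boolean {
  return record.status === "running" && now - record.lastUpdatedAt > timeoutMs;
}

// Returns only the records that look abandoned at time `now`.
function findStuck(records: RequestRecord[], now: number, timeoutMs: number): RequestRecord[] {
  return records.filter((r) => isStuck(r, now, timeoutMs));
}
```

The timeout would need to be longer than the slowest single step, otherwise a healthy but slow workflow gets flagged as stuck.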
A hacky fix? A scheduled cron job that auto-resubmits stuck workflows with the same idempotency ID?
How should I best handle resuming workflows after an application crash?
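A rough sketch of the decision step such a cron job might run on each tick. The StuckWorkflow shape, the attempts counter, and planSweep are all hypothetical. It is also an assumption that resubmitting with the same idempotency id makes the workflow engine pick up the persisted execution instead of starting a new one; that would need to be checked against the engine's behavior.

```typescript
// Hypothetical description of a workflow that the sweeper found stuck.
interface StuckWorkflow {
  requestId: string;
  idempotencyId: string; // same id that was used for the original submission
  attempts: number; // how many times the sweeper has already resubmitted it
}

interface SweepPlan {
  resubmit: StuckWorkflow[]; // to be sent again with the same idempotency id
  giveUp: StuckWorkflow[]; // to be marked as failed instead of retried forever
}

// Splits stuck workflows into those worth resubmitting and those that have
// exhausted their retry budget, so a permanently broken job cannot loop.
function planSweep(stuck: StuckWorkflow[], maxAttempts: number): SweepPlan {
  const plan: SweepPlan = { resubmit: [], giveUp: [] };
  for (const wf of stuck) {
    if (wf.attempts < maxAttempts) {
      plan.resubmit.push({ ...wf, attempts: wf.attempts + 1 });
    } else {
      plan.giveUp.push(wf);
    }
  }
  return plan;
}
```

The cron handler would then call the existing RPC endpoint for each entry in resubmit and set the entries in giveUp to a failed status in Convex.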