Hey @avenceslau We have been encountering Workflow instances that occasionally terminate on their own while running, without throwing an error as expected. It seems to happen about once a day now, and it requires manual intervention for us to kick the workflow off again.
Essentially these Workflow instances appear to be receiving a "terminate" signal but we are not the ones initiating them. Any idea on what might be up?
Some examples:
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/77ede742-c001-4d0c-9813-7b6efabc1acf
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/e4f91796-31f8-42db-a693-4f6da6b51c46
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/f4bf7053-3cbc-42a1-821d-8ff2c0d33c81
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/ea4014ca-fa34-4531-b161-2739abad5ebc
Going to investigate, can you give me the time when it happens? (Ideally in UTC)
I think the best timestamp we have is the last end date in each of the workflows:
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/fe7fbb5e-5807-47e7-8db5-6def93dd70d6 2025-10-30T01:18:35.701Z
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/ea4014ca-fa34-4531-b161-2739abad5ebc 2025-10-29T15:02:39.630Z
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/f4bf7053-3cbc-42a1-821d-8ff2c0d33c81 2025-10-28T19:03:01.096Z
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/e4f91796-31f8-42db-a693-4f6da6b51c46 2025-10-24T11:40:31.139Z
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/77ede742-c001-4d0c-9813-7b6efabc1acf 2025-10-21T07:57:01.801Z
but it's hard to tell if that end time corresponds exactly with when it self-terminates
Yup thanks that helps.
Thanks for investigating!
What do you mean by "aren't throwing an error as expected"?
Feel free to share with me the workflow via DM
@ajay1495 @andrew | dreamlit.ai sorry for the ping, want to get this fixed for you guys asap if it's a bug
Thanks for the follow up, sharing shortly
Here's our Workflow wrapper. Essentially, it wraps all the code in a try/catch so we can gracefully handle any error that's thrown.
Meaning, if an error is thrown, it gets gracefully handled (we log the error and report it to Sentry) before terminating.
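For context, a minimal sketch of the kind of wrapper being described; the class and method names here are assumptions, not the actual code:

```ts
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

// Hypothetical base class: subclasses implement execute(), and run() wraps it
// in a try/catch so errors are logged and reported before the instance ends.
export abstract class BaseWorkflow<Env, Params> extends WorkflowEntrypoint<Env, Params> {
  abstract execute(event: WorkflowEvent<Params>, step: WorkflowStep): Promise<void>;

  async run(event: WorkflowEvent<Params>, step: WorkflowStep): Promise<void> {
    try {
      await this.execute(event, step);
    } catch (error) {
      // Graceful handling: log the error (and report it to Sentry here) ...
      console.error("Workflow failed", error);
      // ... then rethrow so the instance ends up errored rather than complete.
      throw error;
    }
  }
}
```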
What we noticed is that these ongoing polling jobs (which extend that base class) will just spontaneously terminate.
And we can't figure out why; they're not going through the exception handling there, so it can't be an error being thrown.
You call terminate on the workflow?
We only call terminate explicitly when we want to pause polling.
In those handful of instances above, there was no pausing polling initiated
That would mean we'd pause service for our customer
Also, when we call .terminate() on our end, we log that in our db.
So I checked all these instances and I see binding calls to terminate matching those timestamps.
Could this be a bug on your end?
Is this logging done inside of the workflow that you are terminating?
Terminating a workflow from within itself might lead to undefined behavior
"Meaning, if an error is thrown, it gets gracefully handled (we log the error and report it to Sentry) before terminating" is what I interpreted from here.
So we restarted these polling jobs; the restart calls .terminate() before starting again (because our system thinks the worker is still running, since it never went through the error fallback logic).
So maybe that's what you're seeing on your end?
Timeline is basically:
- PollWorkflow runs, we log the workerId
- spontaneously terminates (which doesn't clear out the workerId in our db since it doesn't go through the error fallback logic)
- we notice, and we restart polling
- restarting polling calls .terminate() because it thinks the workerId is still running based on the db, then kicks off a new PollWorkflow instance.
We don't terminate a workflow within itself.
Here's the worker code, and the only place we call .terminate() on this workflow.
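Roughly something along these lines (a simplified sketch, not the actual code; the POLL_WORKFLOW binding name and function names are placeholders):

```ts
interface Env {
  // Workflows binding from wrangler config; the name is a placeholder.
  POLL_WORKFLOW: Workflow;
}

// Simplified restart-polling path: the only place .terminate() gets called.
async function restartPolling(env: Env, existingInstanceId: string | null) {
  if (existingInstanceId) {
    const instance = await env.POLL_WORKFLOW.get(existingInstanceId);
    // Only terminate if the previous instance still looks like it's running.
    if ((await instance.status()).status === "running") {
      await instance.terminate();
    }
  }
  // Kick off a fresh PollWorkflow instance.
  return env.POLL_WORKFLOW.create();
}
```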
Hang on... we are seeing CANCEL logs for these. Maybe this is on our end.
I suspect this is the culprit
(await instance.status()).status === "running"
I don't know if your workflow sleeps or uses waitForEvent, but if it does, the state might change to "waiting"; you would still call terminate then, right? (Same for any other state.)
Actually, that would mean we would not call .terminate() based on the logic there if the workflow is sleeping (which is certainly a bug, but not the culprit here).
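For what it's worth, one way to also cover the sleeping case would be to treat any non-final status as terminable. A rough sketch (the status strings come from the Workflows InstanceStatus docs; the surrounding code is an assumption):

```ts
// Treat any non-final status as "still active" so a sleeping ("waiting")
// instance also gets terminated before polling restarts.
const ACTIVE_STATES = new Set(["queued", "running", "paused", "waiting"]);

async function terminateIfActive(instance: WorkflowInstance): Promise<boolean> {
  const { status } = await instance.status();
  if (ACTIVE_STATES.has(status)) {
    await instance.terminate();
    return true;
  }
  return false;
}
```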
Bad reading on my side. Please investigate a bit further; if you find out that this is on us, let me know and I will be happy to help.
Appreciate it. Checked our session replay for the user; it appears they actually initiated this flow based on the path they took.
So this is completely expected, no issue from CF Workflows side.
Many thanks for investigating regardless.
You are welcome 🫡