Right, we should have expected retry
Right, we should have expected retry functionality there correct? It appears that it was in that stuck in that
Running
state. When I queried for the events for this instanceId via the GraphQL analytics API - there was only 1 ATTEMPT_START for that ID, I do not see that any retries were executed.
Even after I terminated the instance, while the workflow run changed the status to Terminated
, the step status still says Running
. See screenshot. Perhaps I've misconfigured something?
Thank you for your help!
8 Replies
@Matt Silverlock For context here are the events from that run:
[
{
"datetime": "2025-03-16T06:37:31Z",
"eventType": "ATTEMPT_START",
"instanceId": "cb2f6bf4-b248-4d1f-9504-cec844eb75f3",
"stepCount": 1,
"stepName": "get client database connection",
"wallTime": 0,
"workflowName": "poll-database-workflow",
"wallTimeTillNextStep": 0
},
{
"datetime": "2025-03-16T06:37:31Z",
"eventType": "STEP_START",
"instanceId": "cb2f6bf4-b248-4d1f-9504-cec844eb75f3",
"stepCount": 1,
"stepName": "get client database connection",
"wallTime": 0,
"workflowName": "poll-database-workflow",
"wallTimeTillNextStep": 0
},
{
"datetime": "2025-03-16T06:37:31Z",
"eventType": "WORKFLOW_START",
"instanceId": "cb2f6bf4-b248-4d1f-9504-cec844eb75f3",
"stepCount": 0,
"stepName": "",
"wallTime": 0,
"workflowName": "poll-database-workflow",
"wallTimeTillNextStep": 1
},
{
"datetime": "2025-03-16T06:37:32Z",
"eventType": "STEP_SUCCESS",
"instanceId": "cb2f6bf4-b248-4d1f-9504-cec844eb75f3",
"stepCount": 1,
"stepName": "get client database connection",
"wallTime": 0,
"workflowName": "poll-database-workflow",
"wallTimeTillNextStep": 0
},
{
"datetime": "2025-03-16T06:37:32Z",
"eventType": "ATTEMPT_SUCCESS",
"instanceId": "cb2f6bf4-b248-4d1f-9504-cec844eb75f3",
"stepCount": 1,
"stepName": "get client database connection",
"wallTime": 0,
"workflowName": "poll-database-workflow",
"wallTimeTillNextStep": 0
}
]
@HardlyWorkin' @Matt Silverlock
Let me know if any guidance on this. We're hoping to rely on Cloudflare Workflows for a polling process, so this seemingly spurious error gives us pause.
Having some clarity here would help us greatly.
cc @Diogo Ferreira
@ajay1495 What is your account ID?
Sorry for the delay. Account ID is: 470d4729e23e8936fd2a8f6569770873
@Matt Silverlock
@Matt Silverlock @Diogo Ferreira
@HardlyWorkin' appreciate any guidance here 🙂
Are these still showing as running?
If the Workflow is showing as terminated - the step showing as “running” is a bug we’re fixing. The fix won’t be retroactive.
No, please see the prior message. Here's the screenshot again for reference.
The workflow was stuck in a running state and it wasn't timing out and retrying.

It was stuck like that for about 1 day and 8 hours, until I manually terminated it.
@Matt Silverlock
I'm sorry - what is your ask? What I said above still applies:
If the Workflow is showing as terminated - the step showing as “running” is a bug we’re fixing. The fix won’t be retroactive.You then said:
The workflow was stuck in a running state and it wasn't timing out and retrying.My understanding is thus the steps are (incorrectly) showing as running, which is a metrics/metadata bug, but the instance is now terminated. What is not clear is whether this is still occurring? We fixed a number of metadata/state bugs, including cases like these.
Sorry - I know it's confusing. There's actually two issues at play here:
1. I noticed that an instance was in a RUNNING state for a long time (both the instance and its single step): about 1 day and 8 hours. See first screenshot (before terminating). The single step that was in the instance only had a single try inside of it that was RUNNING for that entire 1 day and 8 hours. This is counter to expectation.
What should have happened is: after 2 minutes in the single step, the system should have timed out that step, and retried it. You can see the retry policy there.
Instead, the step was stuck in that RUNNING state (along with the instance) for over 1 day.
My question is: do we know why the retry policy wasn't being applied here? Why didn't the system auto retry after 2 minutes and instead had the step running for 1 day 8 hours?
We have a polling process that needs to be online to detect events and this caused an outage for us. Having clarity here would greatly help us build confidence we can rely on CF workflows moving forward.
2. Then, I manually terminated instance (1 day and 8 hrs into the instance being started - see second screenshot). The instance status updated correctly to TERMINATED, but the steps were showing as RUNNING. This is less of an issue and sounds like it may have been addressed.
Thanks for taking the time, @Matt Silverlock !