Workflows stuck in QUEUED
We are seeing a handful of workflows occasionally get stuck in QUEUED. They never leave this state, and we have to restart them manually. This wasn't an issue until about a week ago; now we see it happen sporadically when we start workflows. Any idea what might be up?

We are investigating.
Thank you
Hey 👋. We're looking into why your instances are not being picked up internally. Can you run the /link command so that we can take a closer look? It would also help if you could provide an instance ID for one of the faulty instances. Sorry for the inconvenience.
Yup, doing now
Done
Here are a few instances of this issue:
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/5b469a93-a966-4780-913f-7dc9d87ea3cd
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/545acf71-af0e-44c6-95a0-4c00fc84b1dd
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/f0f020bb-a98d-489f-88d4-935d4cc02e73
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/18ed7d36-38cf-45ec-bfc2-6d5f1b977365
Thanks for looking into this @Caio - lmk if you need anything else
Just saw another instance of this.
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/9b756d9a-0356-4c12-b4e9-90c448bd3b22
This is quite concerning if we're basically just dropping events every so often.
@avenceslau @Caio
We are also facing an issue: retrying doesn't work. Hours have passed since the first run and it still seems stuck. It was working two or three days ago.

and it is happening for every workflow
Hi can you please link your account with /link
Okay
But can you access it? Isn't it private?
You need to do /link here on discord 😅
I can’t
@avenceslau is there any update
Yup I will have a quick look
if it is needed I can provide other workflow links
Can you send me a link to an instance where that has happened?
but I already shared with you that instance
I think you deleted it. At least I don't see it
I shared it again just now, can you see it?
I copied the URL of the instance and used it with /link
That does not work the way you think. Just paste it here
Thanks will report back in a bit
is there any update
@avenceslau is there anything that should be done by us
@avenceslau
You have to give us some time to investigate. And please don't ping us.
Hey, can you tell me roughly how many instances you have running on this workflow?
Like max 30 a day
This problem still occurs. Today just one instance ran; other than that, they are still stuck retrying.
https://dash.cloudflare.com/d717a4f9813d81c0515ede7c76004bd1/workers/workflows/MeetingSummary/instance/GLFRuniao_de_Elegibilidade_do_Visto_L1_41hsu1
Hey, I just DM'ed you. Can you please check?
@avenceslau | Workflows @Caio we are continuing to see this issue across our workflows

It's having an impact on our customers. We would appreciate any update you can provide. This is obviously very concerning for us, and we really would like to stay on Cloudflare Workflows.
Sorry about that, we are taking a look at what's wrong
Thank you. About half our customers were affected
Which workflow is this instance from?
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/f79fb9d5-bcdf-4663-b3cd-aafced457b97
Is another instance of this
But what's really weird is that as soon as I load the page, it's like it "notices" it was stuck and then resumes
I assume you're able to pull the workflow from that url, but let me know if not.
Here's another instance, this time I'm screenshotting
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/11eec632-b356-47c8-961e-7a68bd383b72

According to our logs, it was stuck in QUEUED for about a day. But then as soon as I loaded the status page there, it seemed to "wake up"
Was this one "stuck" for a day?
What about this one?
Yes they were both stuck

This is from our internal dashboard. It was stuck in that state for about 1 day (since yesterday at about 11am EST).
For context: they are polling jobs (they poll, wait 1 minute, then recursively invoke). When they start, we log an event in our DB, and when they restart, we also log an event in our DB. That's how we're able to detect when they get stuck
(we see a RESTART event with an instance ID, but we don't see a START event with that instance ID)
Has this happened with any instances from today? (We rolled out a mitigation for this kind of error.)
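The check described above could look roughly like the sketch below. It is only an illustration: the event shape (`WorkflowEvent`), the function name `findStuckInstances`, and the `graceMs` threshold are assumptions, since the thread does not show the real schema or query. It also assumes the RESTART event is logged with the ID of the newly created instance, and the START event is logged by that instance once it actually begins running.

```typescript
// Hypothetical event shape; the real DB schema is not shown in this thread.
type EventKind = "START" | "RESTART";

interface WorkflowEvent {
  instanceId: string; // ID of the workflow instance the event refers to
  kind: EventKind;
  loggedAt: number; // epoch milliseconds
}

// Returns the IDs of instances that have a RESTART event but no START event,
// and whose earliest RESTART is older than graceMs. The grace period avoids
// flagging instances that were only just created.
function findStuckInstances(
  events: WorkflowEvent[],
  now: number,
  graceMs: number
): string[] {
  const started: Record<string, boolean> = {};
  const requestedAt: Record<string, number> = {};

  for (const event of events) {
    if (event.kind === "START") {
      started[event.instanceId] = true;
    } else {
      // Keep the earliest RESTART timestamp seen for this instance.
      const previous = requestedAt[event.instanceId];
      if (previous === undefined || event.loggedAt < previous) {
        requestedAt[event.instanceId] = event.loggedAt;
      }
    }
  }

  return Object.keys(requestedAt).filter((id) => {
    const at = requestedAt[id];
    return at !== undefined && !started[id] && now - at > graceMs;
  });
}
```

Running something like this on a schedule with a grace period of a few minutes would flag instances such as the ones linked in this thread. The right threshold depends on how long an instance normally waits before it starts.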
I see one instance of this at 3am est today (13 hours ago)


https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/449c661d-b99d-4aa9-8b00-5b491f8feed8
(which, as I noted before, only started after I navigated to that page, after being stuck in QUEUED for 13 hours)
We rolled out this change 5h ago; instances created after that should not be affected
Okay, as in we shouldn't see any more workflow instances stuck in that QUEUED state?
New ones, yes. Instances that were created before today might still be affected
Okay thank you. Will keep an eye on it and let you know if we see it again!
Yup do let me know
We are expecting a ton of traffic on Tuesday this coming week, so hopefully this will be ironed out by then
Thanks again @avenceslau | Workflows , nice to know you guys got our back!
Hey @avenceslau it just happened again.
I see 3 instances stuck in QUEUED. For example:
https://dash.cloudflare.com/470d4729e23e8936fd2a8f6569770873/workers/workflows/poll-database-workflow/instance/c5681934-852f-4033-a880-a5e38b384bcc

They were stuck in QUEUED for about 18 minutes each. As soon as we noticed them and navigated to the status pages, they started.
You can check the background time and wall time to confirm (~18 minutes)

Just to keep you posted, we are investigating.
great, thank you. (ajay and i work together)