When architecting workflows and steps,

When architecting workflows and steps, what are the pros and cons of using a queue in conjunction with a workflow vs. just running a loop and performing a step?

Example: I read a list of URLs from the db that need to be moved over to R2. Which would be the better architecture: read in the list, loop through it, and perform batch steps for each? Or add each one to a queue, which then triggers a new workflow for each?

Here is what I have come up with; curious to hear what else I should be thinking about.

No Queue (stay within workflow) approach:
Pros:
- avoids queue pricing
- can aggregate the results of all items and then do more processing once they are all done
Cons:
- may hit the 1024 steps limit, depending on the size of the list and the number of inner steps
- all items share the same workflow storage, so may hit the 1GB storage limit

Use Queue (run new workflow per item) approach:
Pros:
- can rely on queue retries
- starting a new workflow per item reduces the number of inner steps, avoiding the 1024 limit
- allows 1GB of storage for each workflow
Cons:
- consumes more workflow requests, increasing price
- no easy way to run processing after all files in the list are done
- added cost from using the queue

Anything I'm missing or misinterpreting? Curious how others architect their workflows. Thanks!
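For the "stay within workflow" option, the step-limit concern can be estimated up front. Below is a minimal sketch, not Workflows API code: `chunk` and `minBatchSize` are hypothetical helper names, the 1024 figure is the limit quoted above, and it assumes every batch uses the same number of inner steps.

```typescript
// Hypothetical helper: split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: smallest batch size that keeps
// (number of batches * stepsPerBatch) within the step limit.
// The default of 1024 is the limit mentioned in the post (an assumption here).
function minBatchSize(
  itemCount: number,
  stepsPerBatch: number,
  stepLimit: number = 1024,
): number {
  if (!Number.isInteger(stepsPerBatch) || stepsPerBatch < 1) {
    throw new Error("stepsPerBatch must be a positive integer");
  }
  const maxBatches = Math.floor(stepLimit / stepsPerBatch);
  if (maxBatches < 1) {
    throw new Error("stepsPerBatch exceeds the step limit");
  }
  return Math.max(1, Math.ceil(itemCount / maxBatches));
}
```

For example, 5000 URLs with 2 inner steps per batch would need batches of at least 10 URLs, which gives 500 batches and 1000 steps. If the required batch size gets uncomfortably large, that is a hint the queue approach may fit better.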
1 Reply
scook, 3mo ago
Following, because I have the same questions. Here are a couple of other pros/cons that depend on your use case:

1. The "retryability" is fundamentally different between workflows and queues. If a workflow step fails, it will keep retrying that step until either it succeeds or the entire workflow ends with an error. If a queue item fails, the queue marks that one item for retry/delay and continues on with the other items. Thus a problem with a single item can block the entire workflow, whereas with a queue one bad item does not hold up the rest.

2. Queues can have multiple producers sending messages to them. So theoretically you could have multiple workflows, or multiple invocations of the same workflow running in parallel, all writing to the same queue. I can imagine some scenarios where you'd want the queue to be a single "chokepoint" in your pipeline to control throughput. If you have all your logic in a Workflow alone, consider whether it's dangerous for multiple instances of your workflow to be running simultaneously.
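To make the retry difference in point 1 concrete, here is a small simulation in plain TypeScript. It does not use the Workflows or Queues APIs; `runInline`, `runIsolated`, and `Handler` are hypothetical names, and retries are assumed to be already exhausted when the handler throws.

```typescript
// Hypothetical handler type: throws when an item fails.
type Handler<T> = (item: T) => void;

// Workflow-style loop: an uncaught failure ends the run,
// so items after the failing one are never processed.
function runInline<T>(
  items: T[],
  handle: Handler<T>,
): { done: T[]; error?: unknown } {
  const done: T[] = [];
  try {
    for (const item of items) {
      handle(item);
      done.push(item);
    }
  } catch (error) {
    return { done, error };
  }
  return { done };
}

// Queue-style processing: a failed item is set aside for a later retry
// and the remaining items still get processed.
function runIsolated<T>(
  items: T[],
  handle: Handler<T>,
): { done: T[]; retry: T[] } {
  const done: T[] = [];
  const retry: T[] = [];
  for (const item of items) {
    try {
      handle(item);
      done.push(item);
    } catch {
      retry.push(item);
    }
  }
  return { done, retry };
}
```

With items `["a", "bad", "c"]` and a handler that throws on `"bad"`, the inline version finishes only `"a"`, while the isolated version finishes `"a"` and `"c"` and leaves `"bad"` for retry. As far as I understand, a similar effect is possible inside a workflow by catching the error around each item's step, at the cost of tracking the failed items yourself.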
