So let's break this apart:
1. You have clients blocking (HTTP? WebSocket?) while the background job processes
2. Some jobs take > 15 minutes to process, over several HTTP calls. That means each consumer is effectively tied up for as long as it's working on that prompt. This couples the number of consumers you need to the number of users you have - so even if we could raise the limit to, say, 20 or 30, you'd quickly run into the same problem at just 2-3x volume, because it sounds like it scales linearly. It gets worse if new users ask for more work to be done.
3. You're (somehow?) snapshotting/checkpointing progress per job?
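
To put rough numbers on the linear scaling in point 2, here's a back-of-the-envelope sketch using Little's law (average jobs in flight = arrival rate x time per job). The function name and the traffic figures are made up for illustration; only the 15-minute job length comes from your description, so plug in your own rates.

```python
import math


def consumers_needed(jobs_per_hour: float, minutes_per_job: float) -> int:
    """Estimate the consumers needed when each job holds one consumer for its full duration.

    Little's law: average jobs in flight = arrival rate * time per job.
    This is the steady-state average only, with no headroom for bursts.
    """
    jobs_in_flight = jobs_per_hour * (minutes_per_job / 60.0)
    return math.ceil(jobs_in_flight)


# Hypothetical traffic: 40 jobs/hour at 15 minutes each.
baseline = consumers_needed(jobs_per_hour=40, minutes_per_job=15)

# Same job length at 3x the volume.
tripled = consumers_needed(jobs_per_hour=120, minutes_per_job=15)

print(baseline, tripled)
```

With those assumed numbers, tripling the volume triples the consumers you need, which is why raising the cap to 20 or 30 only buys a little time.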