Mastra Cloud start-async returns 524 on cold start
Mastra Cloud
Hi,
I’m intermittently getting HTTP 524 (Cloudflare timeout) when calling:
POST /api/workflows/<id>/start-async
from a GitHub Actions cron, even though the payload is tiny and valid.
Symptom
The start-async call returns 524 A timeout occurred (Cloudflare's HTML error page). On other days, the exact same request (same payload, same workflow) returns 2xx and the run completes fine. There is no --max-time on this curl call, so the timeout is happening between Cloudflare and Mastra Cloud, not on the GitHub side.
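For reference, this is roughly how the cron step classifies the response today (a sketch; classifyResponse is my own helper name, not a Mastra or Cloudflare API):

```typescript
// Classify the cron call's response: Cloudflare returns 524 with an HTML
// body when the origin doesn't answer in time; a healthy start-async
// returns 2xx with a JSON body.
function classifyResponse(status: number, contentType: string): string {
  if (status === 524) return "cloudflare-timeout"; // origin too slow
  if (status >= 200 && status < 300 && contentType.includes("json")) {
    return "started";
  }
  return "other-error";
}
```

On the bad days the cron log shows the "cloudflare-timeout" branch; on good days, "started", with no change on my side in between.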
What I’ve already done
1. Removed local debug network calls.
2. Lowered LLM concurrency, simplified the agent prompt, and made RAG more targeted (low topK, condensed corpus). Occasional OpenAI rate_limit_exceeded errors are handled via a fallback and don’t break the workflow.
3. Fixed PgVector cold-start init. Previously, new PgVector(DATABASE_URL) ran at module load in my Mastra entrypoint, which could block cold start. Now PgVector is only created lazily (as a singleton) inside my RAG tools on the first RAG call, i.e. after start-async should already have responded.
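The lazy init now looks roughly like this (simplified sketch; createHeavyClient and its shape stand in for the real new PgVector(DATABASE_URL) call and are not the Mastra API):

```typescript
// Lazy singleton: build the heavy client on first use, not at module load.
type HeavyClient = { query: (text: string) => string };

let instance: HeavyClient | null = null;
let constructions = 0; // counts how many times the client was built

function createHeavyClient(): HeavyClient {
  constructions++;
  // In the old entrypoint, this is where `new PgVector(DATABASE_URL)` ran
  // eagerly at module load; now it only runs inside the RAG tool.
  return { query: (text) => `results for: ${text}` };
}

function getHeavyClient(): HeavyClient {
  if (instance === null) {
    instance = createHeavyClient(); // only on the first RAG call
  }
  return instance;
}
```

Every RAG tool goes through getHeavyClient(), so nothing DB-related should execute before the workflow actually runs a RAG step.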
Despite this, I still see 524 on start-async.
Hypothesis & Questions
I expected start-async to:
1. Enqueue a run,
2. Respond quickly (200/202) with { runId, status: "started" },
3. Do the heavy work (DB, RAG, LLM) in the background.
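Client-side I treat this as fire-then-poll. The helper below is generic, with the status fetcher injected, since I don't want to assume anything about Mastra's status endpoint or payload shape:

```typescript
// Generic fire-then-poll helper: after start-async hands back a runId,
// poll an injected status fetcher until the run reaches a terminal state.
async function pollUntilDone(
  getStatus: () => Promise<string>,
  intervalMs = 1000,
  maxAttempts = 30
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "success" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`run not terminal after ${maxAttempts} polls`);
}
```

This pattern only works if the initial start-async actually responds, which is exactly the step that 524s.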
Observed behavior suggests that on some cold starts, start-async is blocked by startup work (bundle download / init / DB) and Cloudflare times out before the first response.
Questions:
- Can start-async be delayed by cold-start initialization on Mastra Cloud?
- Are there known cases where this leads to 524 even with a tiny payload?
- Any recommended way (config / pattern) to guarantee that start-async responds quickly regardless of cold start?
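One mitigation I'm considering on my side (I'm not sure it's the recommended pattern) is retrying with backoff so the first attempt absorbs the cold start. Sketch with the actual call injected, so nothing Mastra-specific is assumed:

```typescript
// Retry a start call with exponential backoff: the first attempt may hit a
// cold instance and 524; later attempts should find it warm.
async function startWithRetry(
  start: () => Promise<number>, // returns the HTTP status code
  maxAttempts = 3,
  baseDelayMs = 2000
): Promise<number> {
  let status = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    status = await start();
    if (status >= 200 && status < 300) return status;
    // back off: 2s, 4s, 8s... under the default values above
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  return status;
}
```

The caveat that makes me hesitant: if the cold attempt eventually succeeds server-side after Cloudflare has already returned 524, a retry could enqueue a duplicate run, so I'd want to know whether start-async is idempotent before relying on this.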
Happy to share specific request IDs / headers in DM if that helps trace a failing call.