Hey all,
I’m working on a project where we ingest U.S. Federal Procurement Data System (FPDS) contract data into Supabase. The source is the FPDS Atom feed (XML), which we parse and store into a table.
Right now we already have ~2.8M records ingested successfully, but the process is hitting limits as we scale up to larger departments. The main challenges are:
• Supabase Edge Function runtime (23s cap). It works fine for small slices, but long-running pulls time out.
• Unstable FPDS API. It often times out or drops the connection, which breaks ingestion.
• No automatic resume. If a run fails, we either lose progress or need to re-trigger manually.
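For context, here's roughly the shape of the checkpoint-driven loop I've been considering to work around the runtime cap and the no-resume problem. All names are mine, and the checkpoint store and page fetcher are abstracted out (in the real version they'd be a Supabase table and the FPDS Atom feed) — this is just a sketch of the resume logic:

```typescript
// Hypothetical sketch: checkpoint-driven ingestion slice.
// Store and fetcher are abstracted; real code would use a Supabase
// table for the checkpoint and the FPDS Atom feed for pages.

type CheckpointStore = {
  load: () => Promise<number>;             // last fully ingested offset
  save: (offset: number) => Promise<void>;
};

// Returns one page of records starting at `offset`, or null when exhausted.
type FetchPage = (offset: number) => Promise<string[] | null>;

// Pull pages starting from the saved checkpoint, persisting progress
// after every page so a crashed or timed-out run resumes where it left off.
async function runSlice(
  store: CheckpointStore,
  fetchPage: FetchPage,
  sink: (records: string[]) => Promise<void>,
  budgetMs = 20_000, // stop before the Edge Function wall-clock cap
): Promise<{ done: boolean; offset: number }> {
  const start = Date.now();
  let offset = await store.load();
  while (Date.now() - start < budgetMs) {
    const page = await fetchPage(offset);
    if (page === null) return { done: true, offset };
    await sink(page);          // idempotent upsert in the real version
    offset += page.length;
    await store.save(offset);  // checkpoint AFTER the page is committed
  }
  return { done: false, offset }; // re-invoke later to continue
}
```

The idea is that each invocation only does a bounded slice of work, and something external (e.g. a scheduled trigger) keeps re-invoking the function until `done` comes back true.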
What I’m trying to figure out is the most efficient and scalable way to handle this kind of workload in Supabase. Specifically:
• Is it realistic to manage a dataset of this scale (millions of records) fully within Supabase Edge Functions?
• How do people usually approach resuming and recovery when the upstream API is flaky and fails in many ways (timeouts, gateway errors, dropped connections, etc.)?
• For very large backfills, is the common practice to stay entirely within Supabase, or to offload the ingestion to an external worker service and only use Supabase for storage?
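On the flaky-API point specifically: the pattern I've seen suggested elsewhere is wrapping each upstream page fetch in a retry with exponential backoff, so transient timeouts and gateway errors don't kill the whole run. A generic sketch (helper name and backoff values are mine, not tuned for FPDS):

```typescript
// Hypothetical retry-with-backoff wrapper for a flaky upstream call.
// Attempt count and delays are illustrative defaults.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 4,
  delayMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // exponential backoff: 500ms, 1s, 2s, ...
        await new Promise((r) => setTimeout(r, delayMs * 2 ** i));
      }
    }
  }
  throw lastErr; // all attempts exhausted
}
```

Each FPDS page request would go through something like `withRetry(() => fetchPage(offset))`, so only the failing page is retried rather than the whole slice. Is that the usual approach, or do people lean on a queue with per-job retries instead?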
I’d love to hear from others who’ve dealt with big/unstable external data sources and bulk ingestion in Supabase. What patterns have worked well for you?
Thanks in advance!