Retry failed pages during batchscrape [FIRECRAWL SDK]
Is there a way to retry failed pages in batch scrape? I'm using the webhook method with batch scrape in the Firecrawl SDK, and only the successfully scraped pages are returned via batch_scrape.page. Am I missing something here? I can see in the dashboard that we are actually logging the failed pages, but I don't think they are being returned to the webhook.
9 Replies
Hi @pratosh Yes, failed pages in batch scrape are sent to your webhook, but as separate events with success: false. You're likely only processing the successful batch_scrape.page events and missing the failure events.

There's no automatic retry mechanism, but you can:
1. Collect failed URLs from webhooks
Listen for webhook events where success: false and collect the sourceURL from metadata (see the sketch after this list).
2. Use the errors endpoint
After a batch completes, fetch all errors:
```js
// JS SDK
const errors = await firecrawl.checkBatchScrapeErrors(batchId);
// errors.data contains failed URLs with error messages
```
Then retry with a new batch containing just the failed URLs.
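For illustration, here is a minimal sketch of both steps in one webhook handler, assuming an Express endpoint and the JS SDK. The payload field names (type, success, metadata.sourceURL, id), the shape of errors.data entries, and the startBatchScrape method name are assumptions and may not match your SDK version:
```js
// Sketch only: payload field names and SDK method names are assumptions;
// verify them against the webhook events you actually receive.
import express from 'express';
import Firecrawl from '@mendable/firecrawl-js'; // exported class name may differ by version

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
const failedUrls = new Set(); // URLs collected from failure events, if they arrive

const app = express();
app.post('/firecrawl-webhook', express.json(), async (req, res) => {
  const event = req.body;
  res.sendStatus(200); // acknowledge quickly, then process

  // Step 1: collect failed URLs from page events with success: false.
  if (event.type === 'batch_scrape.page' && event.success === false) {
    const url = event.metadata?.sourceURL; // field location assumed
    if (url) failedUrls.add(url);
  }

  // Step 2: when the batch completes, cross-check the errors endpoint
  // and retry everything that failed in a fresh batch.
  if (event.type === 'batch_scrape.completed') {
    const errors = await firecrawl.checkBatchScrapeErrors(event.id); // batch id field assumed
    for (const err of errors.data ?? []) {
      if (err.url) failedUrls.add(err.url); // entry shape assumed
    }
    if (failedUrls.size > 0) {
      await firecrawl.startBatchScrape([...failedUrls]); // method name assumed
      failedUrls.clear();
    }
  }
});

app.listen(3000);
```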
From what I observed, the failed webhooks were not received by our server at all. All the batch_scrape.page events that I received had success: true. In cases where a few pages failed to be scraped, the batch_scrape.completed event was also not received. After increasing the waitFor limit to 3000, the pages now seem to be scraped and the completed webhook is also received.
From the docs I can see "You’ll receive one batch_scrape.page event for every URL successfully scraped," which confirms what I observed, right?

Yeah, that's correct, but the likely cause is the low waitFor limit: pages are timing out before they can be properly processed as either a success or a failure. This causes:
- No webhook events for timed-out pages
- No batch_scrape.completed event (because the batch doesn't cleanly finish)
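As a rough illustration of where waitFor sits when starting the batch (the class/method names and option shape below are assumptions about the JS SDK and may differ between versions):
```js
// Sketch only: method names and option nesting are assumptions;
// check your installed SDK's batch scrape signature.
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
const urls = ['https://example.com/a', 'https://example.com/b'];

// waitFor is a per-page scrape option in milliseconds; raising it gives
// slow pages time to load before they are treated as timed out.
const job = await firecrawl.startBatchScrape(urls, {
  options: { formats: ['markdown'], waitFor: 3000 },
  webhook: { url: 'https://your-server.example/firecrawl-webhook' },
});
console.log('batch started:', job.id);
```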
Could you increase the waitFor to some higher value? If there are failures, they will be captured: https://github.com/firecrawl/firecrawl/blob/9f4f011a7834a2067cb40cc884379bbc719a968f/apps/api/openapi.json#L168
Yeah, I have increased the value of waitFor and it seems to be working as expected. Is 3000 a good enough value for waitFor, or is it too high?
Also, is there a reason why failed pages are not emitted to the webhook? It would really be helpful to rescrape them on the fly instead of waiting for the whole batch scrape to finish and then rescraping the failed URLs.
Also, I don't think https://docs.firecrawl.dev/api-reference/endpoint/batch-scrape-get-errors is working for v2. Can we check this?
Yeah, 3000 is good enough if it's working for your case. Regarding https://docs.firecrawl.dev/api-reference/endpoint/batch-scrape-get-errors: what response do you get when executing this API call?
This is what I get
Are you using the /v2 endpoint for this? If so, I've added a fix here: https://github.com/firecrawl/firecrawl/pull/2471
Yes, I'm using the v2 endpoint, thanks for the help.
Also, is it possible to get failed pages (scrapes) notified via webhooks as well?
I think you'll need to handle that in your webhook? I'll have to check. Feel free to open a GitHub issue on https://github.com/firecrawl/firecrawl.
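In the meantime, once a failed URL is known (from a failure event or from the errors endpoint), an on-the-fly retry from the webhook handler could look roughly like the sketch below; the single-URL scrape method name and its return shape are assumptions about the JS SDK:
```js
// Sketch only: the scrape method name and return shape are assumptions;
// adjust to match your installed SDK version.
async function retryFailedPage(firecrawl, url) {
  try {
    // Re-scrape a single URL immediately, with a higher waitFor.
    const doc = await firecrawl.scrape(url, { formats: ['markdown'], waitFor: 3000 });
    return doc; // handle/store the re-scraped document
  } catch (err) {
    console.error(`Retry failed for ${url}:`, err.message);
    return null;
  }
}
```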