batch scrape repeating results
I shared this on X this morning, but batch scrape seems to be failing for me today. I send three URLs and I get three results, but they're all for the same URL (usually the first). I am using the cloud API.
There is a gist here:
https://gist.github.com/kristoph/ee658b7d7fe0ea16a1d435a069be8295
Am I doing something wrong?
10 Replies
This is a link to a cleaner gist:
https://gist.github.com/kristoph/b4f0a9e1b239cf50a1ef90c1fb36fbf5
The first line is the payload that creates the batch scrape, followed by the URLs being called and the JSON results.
You can see that even though I passed three distinct URLs, I am only getting data for a single one.
@mogery is looking at it! Thanks for sharing these!
Hi there. We have a bug where the results are returned in reverse order. If you do not specify a skip, you should see all 3 results. Working on fixing this.
But will this work longer term, or are you just suggesting it as a short-term workaround?
Oh I see, so without the skip the payload repeats until all the results are returned.
This is a shorter-term workaround. We're working on a fix so that it works properly and the workaround becomes unnecessary.
Weird! Looking into it
This actually makes sense to me. The get call without a skip should return whatever results were accumulated thus far.
It's arguably not very efficient but you could poll until status was completed and then just iterate over the results.
(By the way, I also found onlyMainContent isn't working with batch scrape, but I think someone else already reported that.)
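For anyone landing here, the polling workaround described above (call the status endpoint without a skip until status is completed, then iterate over the results) might look roughly like the sketch below. The status field and its completed value come from this thread. The base URL, the endpoint path, the data field name, the failed status value, and the helper names fetch_status and wait_for_results are assumptions for illustration, so check them against the current API docs. It also ignores any pagination of large result sets.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed base URL and path; verify against the Firecrawl API docs.
API_BASE = "https://api.firecrawl.dev/v1"


def fetch_status(job_id: str, api_key: str) -> dict:
    """GET the batch scrape job WITHOUT a skip parameter (hypothetical helper)."""
    req = urllib.request.Request(
        f"{API_BASE}/batch/scrape/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_results(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_polls: int = 150,
) -> list:
    """Poll until the job reports status == "completed", then return its results.

    `fetch` is any zero-argument callable returning the decoded status body,
    so the polling logic can be exercised without touching the network.
    """
    for _ in range(max_polls):
        body = fetch()
        status = body.get("status")
        if status == "completed":
            # "data" is the assumed name of the accumulated results list.
            return body.get("data", [])
        if status == "failed":  # assumed failure value
            raise RuntimeError("batch scrape job failed")
        time.sleep(interval)
    raise TimeoutError(f"job not completed after {max_polls} polls")
```

Usage would be something like wait_for_results(lambda: fetch_status(job_id, api_key)), followed by a plain loop over the returned list. As noted above this is not very efficient, since every poll re-downloads whatever has accumulated so far, but it avoids the skip parameter entirely.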
Oh, I misunderstood your message. That is correct.
Hey is there a timeline for when this might be fixed?
@mogery
Hey @ikristoph007 I'm having trouble reproducing both of these. onlyMainContent straight-up works for me (it's true by default, disabling it works also). Going to attempt the order bug a few more times