Batch Scrape Delay
Hi there,
Might be something very obvious that I'm missing, but I'll ask anyhow. I'm trying to play a bit with batch scrape. I used the documentation sample and I'm not getting any results with either sync or async use. After checking the IDs I get back (they appear normally in the activity log on the dashboard), they seem to be stuck in status "scraping" when I check through the API. It's already been 12 hours since I sent the request. Is that normal? And if so, what does 'success': True mean anyhow?
{'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-02T20:13:18.000Z', 'data': [], 'error': None, 'next': 'https://api.firecrawl.dev/v1/batch/.....'}
Thanks in advance
22 Replies
I am getting the exact same problem
I am using this with 1 URL, and the total is 1 when I first call the status API, but then it goes to 0 on the next call.
It's like the ID is getting disassociated from the URL it is supposed to track
@Moderator Can we get some help with this? It's blocking my work atm
been getting the same issue; it takes forever to scrape 2 pages, but a regular single-page scrape works fine
@Adobe.Flash is there any fix for this?
Hey yall, we are investigating!
@Justin @lol882192 can you share the urls you are trying so I can replicate here?
Hey Nick,
This is the code
I pulled this code from the Firecrawl AI bot to figure out where I was making the mistake, but couldn't get any answers
Let me know if you wanna see the output on my end
@Adobe.Flash I sent it through the pylon chat
@Adobe.Flash I am getting the same issue by just trying to scrape the sites from the documentation example:
batch_scrape_result = app.batch_scrape_urls(['firecrawl.dev', 'mendable.ai'], {'formats': ['markdown', 'html']})
Hey @MadBlackPig @Justin @lol882192 we just deployed a fix for this, sorry for the inconvenience.
Just tried it out, I confirm that everything works on my side. Thanks a lot!
Just checked it on my side, it's still taking quite a while for 2 pages:

just checked the logs -- on our end it seems like the scrapes finished almost immediately
still nothing?
console keeps printing out the id, I'll check what the status is on my end with the ID alone
Not sure what happened, but I briefly closed the program to try and upgrade my firecrawl-py install, and when I came back it said it was all done.
ah, odd
Gonna try it again
How long should I be waiting to get a return?
like, 5 seconds should get both pages
just started the job: 6f35f09b-44f2-4d53-aaa2-e71da3580819
https://api.firecrawl.dev/v1/batch/scrape/6f35f09b-44f2-4d53-aaa2-e71da3580819?skip=0
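For anyone checking a job by its ID alone, the status endpoint above can also be queried directly over HTTP. A minimal sketch, assuming the v1 batch status endpoint shown in that URL and standard Bearer-token auth (the key name `FIRECRAWL_API_KEY` and the helper names are placeholders, not part of the SDK):

```python
# Sketch: query a batch scrape job's status by ID alone.
# Assumes GET https://api.firecrawl.dev/v1/batch/scrape/{id} with a
# Bearer token -- verify against the Firecrawl API docs for your version.
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev/v1/batch/scrape"

def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for a batch scrape job's status."""
    return urllib.request.Request(
        f"{API_BASE}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def fetch_status(job_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON status payload (network call)."""
    with urllib.request.urlopen(build_status_request(job_id, api_key)) as resp:
        return json.load(resp)
```

Polling this in a loop (re-fetching each time, not reprinting a stale dict) is equivalent to what the SDK's status check does.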
ok I still see the same thing:

oh lol
import time

while True:
    batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
    print(batch_scrape_status)
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)
print(batch_scrape_status)
try this
you were printing the same status over and over again
not querying again
omg that is weirdly embarrassing
haha it's fine :D
i have a few more embarrassing commits on the firecrawl repo :p
👍
When I do a batch scrape, is there any way to isolate the markdowns of each of the sources? I need to scrape both, but I'd like to store them separately
Yeah, each object in the data array will have its own markdown field
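To make that concrete, here's a minimal sketch of splitting a batch result into one markdown file per source. It assumes each entry in the `data` array carries a `markdown` field and a `metadata.sourceURL` field (check your actual response shape); the `save_markdowns` helper is illustrative, not part of the SDK:

```python
# Sketch: store each source's markdown from a batch scrape separately.
# Field names ('data', 'markdown', 'metadata.sourceURL') are assumptions
# based on the response shape discussed above.
from pathlib import Path

def save_markdowns(batch_result: dict, out_dir: str = "scrapes") -> list[str]:
    """Write each source's markdown to its own file; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, doc in enumerate(batch_result.get("data", [])):
        url = doc.get("metadata", {}).get("sourceURL", f"source_{i}")
        # Derive a filesystem-safe name from the source URL.
        name = "".join(c if c.isalnum() else "_" for c in url).strip("_")
        path = out / f"{name}.md"
        path.write_text(doc.get("markdown", ""), encoding="utf-8")
        written.append(str(path))
    return written
```

You can also just index `batch_scrape_result['data'][0]['markdown']` and `[1]['markdown']` in memory if you don't need files.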
I will check it out, thanks again!