Batch Scrape Delay

Hi there, this might be something very obvious that I'm missing, but I'll ask anyhow. I'm trying to play a bit with batch scrape. I'm using the documentation sample and not getting any results with either sync or async use. Checking the IDs I get back (they appear normally in the activity log on the dashboard), the jobs seem to be stuck in status "scraping" when I check through the API. It's already been 12 hours since I sent the request. Is that normal? And if so, what does 'success': True mean anyhow?
{'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-02T20:13:18.000Z', 'data': [], 'error': None, 'next': 'https://api.firecrawl.dev/v1/batch/.....'}
Thanks in advance
22 Replies
Justin · 10mo ago
I am getting the exact same problem. I am using this with 1 URL, and the total is 1 when I first call the status API, but then it goes to 0 on the next call. It's like the ID is getting disassociated from the URL it is supposed to track. @Moderator Can we get some help with this? It's blocking my work atm.
lol882192 · 10mo ago
Been getting the same issue: it takes forever to scrape 2 pages, but a regular single-page scrape works fine. @Adobe.Flash is there any fix for this?
Adobe.Flash · 10mo ago
Hey y'all, we are investigating! @Justin @lol882192 can you share the URLs you are trying so I can replicate here?
lol882192 · 10mo ago
Hey Nick, this is the code:
batch_scrape_job = app.async_batch_scrape_urls(urls=['firecrawl.dev', 'mendable.ai'], params={'formats': ['markdown']})
print(batch_scrape_job)

# (async) You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
while True:
    print(batch_scrape_status)
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)

print(batch_scrape_status)
I pulled this code from the Firecrawl AI bot to figure out where I was making the mistake, but could not get any answers. Let me know if you wanna see the output on my end.
Justin · 10mo ago
@Adobe.Flash I sent it through the Pylon chat.
MadBlackPig (OP) · 10mo ago
@Adobe.Flash I am getting the same issue by just trying to scrape the sites from the documentation example:
batch_scrape_result = app.batch_scrape_urls(['firecrawl.dev', 'mendable.ai'], {'formats': ['markdown', 'html']})
mogery · 10mo ago
Hey @MadBlackPig @Justin @lol882192, we just deployed a fix for this. Sorry for the inconvenience.
MadBlackPig (OP) · 10mo ago
Just tried it out; I can confirm that everything works on my side. Thanks a lot!
lol882192 · 10mo ago
Just checked it on my side, it's still taking quite a while for 2 pages:
(screenshot attached)
mogery · 10mo ago
Just checked the logs on our side -- on our end it seems like the scrapes finished almost immediately. Still nothing?
lol882192 · 10mo ago
The console keeps printing out the ID; I'll check what the status is on my end with the ID alone. Not sure what happened, but I briefly closed the program to try and upgrade my firecrawl-py install, and when I came back it said it was all done.
mogery · 10mo ago
ah, odd
lol882192 · 10mo ago
Gonna try it again. How long should I be waiting to get a return?
mogery · 10mo ago
like, 5 seconds should get both pages
lol882192 · 10mo ago
Just started the job: https://api.firecrawl.dev/v1/batch/scrape/6f35f09b-44f2-4d53-aaa2-e71da3580819?skip=0
lol882192 · 10mo ago
OK, I still see the same thing:
(screenshot attached)
mogery · 10mo ago
oh lol
while True:
    batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
    print(batch_scrape_status)
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)

print(batch_scrape_status)
Try this -- you were printing the same status over and over again, not querying again.
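For reference, here is a slightly fuller sketch of the same polling loop (not from the thread): it re-queries the status on every iteration and gives up after a cap, so a stuck job can't spin forever. It assumes the same firecrawl-py app and batch_scrape_job objects used above; the MAX_WAIT_SECONDS name and the 60-second cap are illustrative, not part of the library.
import time

MAX_WAIT_SECONDS = 60  # illustrative cap, not a library setting
deadline = time.time() + MAX_WAIT_SECONDS
batch_scrape_status = None

while time.time() < deadline:
    # Re-query the status each iteration -- the bug above was polling once
    # and then printing the same stale response in a loop.
    batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)
else:
    print('Timed out waiting for the batch scrape to complete')

print(batch_scrape_status)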
lol882192 · 10mo ago
omg that is weirdly embarrassing
mogery · 10mo ago
haha it's fine :D I have a few more embarrassing commits on the firecrawl repo :p
lol882192 · 10mo ago
👍 When I do a batch scrape, is there any way to isolate the markdown of each of the sources? I need to scrape both, but I'd like to store them separately.
mogery · 10mo ago
Yeah, each object in the data array will have its own markdown field.
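A minimal sketch of what that could look like (not from the thread), assuming the completed status response has the shape shown earlier in this thread; the metadata.sourceURL lookup is an assumption based on Firecrawl's scrape responses, with an index fallback in case it is absent:
# Assumes batch_scrape_status is a completed response dict as above.
for i, page in enumerate(batch_scrape_status.get('data', [])):
    # metadata.sourceURL is an assumption; fall back to the index if missing.
    source = page.get('metadata', {}).get('sourceURL', f'page_{i}')
    filename = source.replace('https://', '').replace('/', '_') + '.md'
    with open(filename, 'w') as f:
        f.write(page.get('markdown', ''))
This writes each source's markdown to its own file, which keeps the two scrapes stored separately as asked above.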
lol882192 · 10mo ago
I will check it out, thanks again!
