Batch Scrape Delay

Hi there, this might be something very obvious that I'm missing, but I'll ask anyhow. I'm trying to play a bit with batch scrape. I'm using the documentation sample and not getting any results with either sync or async use. Checking the IDs I get back (they appear normally in the activity log on the dashboard), the jobs seem to be stuck in status "scraping" when I check through the API. It's already been 12 hours since I sent the request. Is that normal? And if so, what does 'success': True mean anyhow?
{'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-02T20:13:18.000Z', 'data': [], 'error': None, 'next': 'https://api.firecrawl.dev/v1/batch/.....'}
Thanks in advance
22 Replies
Justin · 10mo ago
I am getting the exact same problem. I am using this with 1 URL, and the total is 1 when I first call the status API, but then it goes to 0 on the next call. It's like the ID is getting disassociated from the URL it is supposed to track. @Moderator Can we get some help with this? It's blocking my work atm.
lol882192 · 10mo ago
Been getting the same issue: it takes forever to scrape 2 pages, but a regular single-page scrape works fine. @Adobe.Flash is there any fix for this?
Adobe.Flash · 10mo ago
Hey y'all, we are investigating! @Justin @lol882192 can you share the URLs you are trying so I can replicate here?
lol882192 · 10mo ago
Hey Nick, this is the code:
batch_scrape_job = app.async_batch_scrape_urls(urls=['firecrawl.dev', 'mendable.ai'], params={'formats': ['markdown']})
print(batch_scrape_job)

# (async) You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
while True:
    print(batch_scrape_status)
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)

print(batch_scrape_status)
I pulled this code from the Firecrawl AI bot to figure out where I was making the mistake, but could not get any answers. Let me know if you wanna see the output on my end.
Justin · 10mo ago
@Adobe.Flash I sent it through the Pylon chat.
MadBlackPig (OP) · 10mo ago
@Adobe.Flash I am getting the same issue by just trying to scrape the sites from the documentation example:
batch_scrape_result = app.batch_scrape_urls(['firecrawl.dev', 'mendable.ai'], {'formats': ['markdown', 'html']})
mogery · 10mo ago
Hey @MadBlackPig @Justin @lol882192, we just deployed a fix for this. Sorry for the inconvenience.
MadBlackPig (OP) · 10mo ago
Just tried it out; I can confirm that everything works on my side. Thanks a lot!
lol882192 · 10mo ago
Just checked it on my side, it's still taking quite a while for 2 pages:
(screenshot attached)
mogery · 10mo ago
Just checked the logs on our side -- on our end it seems like the scrapes finished almost immediately. Still nothing?
lol882192 · 10mo ago
The console keeps printing out the ID; I'll check what the status is on my end with the ID alone. Not sure what happened, but I briefly closed the program to try and upgrade my firecrawl-py install, and when I came back it said it was all done.
mogery · 10mo ago
ah, odd
lol882192 · 10mo ago
Gonna try it again. How long should I be waiting to get a return?
mogery · 10mo ago
like, 5 seconds should get both pages
lol882192 · 10mo ago
Just started the job: https://api.firecrawl.dev/v1/batch/scrape/6f35f09b-44f2-4d53-aaa2-e71da3580819?skip=0
lol882192 · 10mo ago
OK, I still see the same thing:
(screenshot attached)
mogery · 10mo ago
oh lol
while True:
    batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
    print(batch_scrape_status)
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)

print(batch_scrape_status)
Try this -- you were printing the same status over and over again, not querying again.
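For reference, here is a slightly fuller sketch of the same polling loop (not from the thread): it re-queries the status on every iteration and gives up after a cap, so a stuck job can't spin forever. It assumes the same firecrawl-py app and batch_scrape_job objects used above; the MAX_WAIT_SECONDS name and the 60-second cap are illustrative, not part of the library.
import time

MAX_WAIT_SECONDS = 60  # illustrative cap, not a library setting
deadline = time.time() + MAX_WAIT_SECONDS
batch_scrape_status = None

while time.time() < deadline:
    # Re-query the status each iteration -- the bug above was polling once
    # and then printing the same stale response in a loop.
    batch_scrape_status = app.check_batch_scrape_status(batch_scrape_job['id'])
    if batch_scrape_status.get('status') == 'completed':
        break
    time.sleep(1)
else:
    print('Timed out waiting for the batch scrape to complete')

print(batch_scrape_status)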
lol882192 · 10mo ago
omg that is weirdly embarrassing
mogery · 10mo ago
haha it's fine :D I have a few more embarrassing commits on the firecrawl repo :p
lol882192 · 10mo ago
👍 When I do a batch scrape, is there any way to isolate the markdown of each of the sources? I need to scrape both, but I'd like to store them separately.
mogery · 10mo ago
Yeah, each object in the data array will have its own markdown field.
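A minimal sketch of what that could look like (not from the thread), assuming the completed status response has the shape shown earlier in this thread; the metadata.sourceURL lookup is an assumption based on Firecrawl's scrape responses, with an index fallback in case it is absent:
# Assumes batch_scrape_status is a completed response dict as above.
for i, page in enumerate(batch_scrape_status.get('data', [])):
    # metadata.sourceURL is an assumption; fall back to the index if missing.
    source = page.get('metadata', {}).get('sourceURL', f'page_{i}')
    filename = source.replace('https://', '').replace('/', '_') + '.md'
    with open(filename, 'w') as f:
        f.write(page.get('markdown', ''))
This writes each source's markdown to its own file, which keeps the two scrapes stored separately as asked above.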
lol882192 · 10mo ago
I will check it out, thanks again!
