scrape job status
Hi!
I noticed that when scraping websites with a lot of pages (>1000), the scrape job (or at least its status) gets stuck in a state where I can no longer tell what is going on.
For example, right now I have a running job (the job has been limited to a max of 1000 scrape URLs), and if I fetch the status using the API, I get the following data:
- status: active
- current: 1000
- total: 1000
- data: 0 items in the list
- partial_data: 50 items in the list
This state has been the same for hours now. The items in partial_data are the same, from index 951 to 1000.
And that's it. Nothing is coming to the webhook, and the job isn't listed in the logs (https://www.firecrawl.dev/app/logs).
Should this kind of behaviour be expected?
Should we wait for hours to get the complete data?
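For context, this is roughly how we poll the job status. A minimal sketch in Python, assuming the v0-style crawl status endpoint implied by the fields above; the API key and job ID are placeholders, so adjust to your setup. The main point is to give up after a timeout instead of waiting indefinitely:

import time
import requests

API_KEY = "fc-..."   # placeholder: your Firecrawl API key
JOB_ID = "<job-id>"  # placeholder: the crawl job to watch
STATUS_URL = f"https://api.firecrawl.dev/v0/crawl/status/{JOB_ID}"

def poll_status(timeout_s=1800, interval_s=30):
    """Poll the crawl status until it leaves 'active' or we hit a timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.get(
            STATUS_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        print(body.get("status"), body.get("current"), body.get("total"))
        if body.get("status") != "active":
            return body  # finished: completed or failed
        time.sleep(interval_s)
    raise TimeoutError("job still 'active' after timeout; treating it as stuck")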
Finally, the job failed; the status is now failed.
But this came after a very long time and without any information to our webhook or anywhere else.
Hm, that's odd @arrrlo. Looking into it.
Can you dm me your email so we can analyze the logs and see what happened?
It shouldn't have had this behavior
dm sent
I am getting a very similar issue on several websites I am attempting to crawl. I don't even get the current or total; it's just stuck in active, but on the dashboard I can download the documents. JobID: 52204c2f-6360-4851-97a0-a353fd2f4569
The status request returns:
{
  "success": true,
  "status": "active",
  "data": null,
  "partial_data": []
}
For other jobs I get no activity log at all, so there is nothing to look into for the reason of the failure.
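A small check that flags this response shape might help others spot it. This is a sketch based only on the fields in the reply above; looks_stuck is a hypothetical helper, not part of the Firecrawl API:

def looks_stuck(body: dict) -> bool:
    """True if the job claims 'active' but exposes no progress at all."""
    return (
        body.get("status") == "active"
        and body.get("data") is None
        and not body.get("partial_data")
        and "current" not in body
        and "total" not in body
    )

# The response shown above triggers the check:
print(looks_stuck({
    "success": True,
    "status": "active",
    "data": None,
    "partial_data": [],
}))  # -> True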
Hey all! Wanted to give y'all an update. We've built a fix for this and we're currently testing it. This behavior is an edge case that currently only happens occasionally. If all goes well, the fix will be live soon.
hi @mogery, did it go well?
Hey, it's looking promising but we still need to iterate and test. I'll keep you updated 🤞🏻
thanks 👍
Hey all, we've merged a fix for this; it should be much better once the deployment goes through @arrrlo @iwdwebman