Firecrawl · 16mo ago
arrrlo

scrape job status

Hi! I noticed that when scraping websites with a lot of pages (>1000), the scrape job (or at least its status) gets stuck in a state where I can no longer tell what's going on. For example, I currently have a running job (limited to a maximum of 1000 scrape URLs), and when I fetch its status via the API I get:

- status: active
- current: 1000
- total: 1000
- data: 0 items in the list
- partial_data: 50 items in the list

This state has been unchanged for hours. The items in partial_data stay the same, covering indices 951 to 1000. And that's it: nothing arrives at the webhook, and the job isn't listed in the logs (https://www.firecrawl.dev/app/logs). Is this behaviour expected? Should we wait for hours to get the complete data?
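In the meantime, a client-side guard can avoid waiting hours on a job stuck like this. The sketch below polls any status-fetching callable (so it works with whatever HTTP client or SDK you use to hit the Firecrawl status endpoint) and gives up once the response stops changing. The helper name, thresholds, and stall heuristic are my own illustration, not part of Firecrawl's API.

```python
import time

def wait_for_crawl(fetch_status, timeout_s=600, poll_interval_s=5, stall_limit=3):
    """Poll a crawl job until it finishes, fails, or stalls.

    `fetch_status` is any zero-argument callable returning a dict shaped
    like the status response quoted above, e.g.
        {"status": "active", "current": 951, "total": 1000, "partial_data": [...]}
    Returns the final status dict on "completed" or "failed"; raises
    TimeoutError if progress is unchanged for `stall_limit` consecutive
    polls, or if the overall deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    last_progress = None
    stalled = 0
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        # Treat (current, number of partial results) as the progress signal.
        progress = (job.get("current"), len(job.get("partial_data") or []))
        if progress == last_progress:
            stalled += 1
            if stalled >= stall_limit:
                raise TimeoutError(f"crawl job stalled at {progress}")
        else:
            stalled = 0
            last_progress = progress
        time.sleep(poll_interval_s)
    raise TimeoutError("crawl job did not finish before the deadline")
```

Pointing `fetch_status` at the real API is one `requests.get(...).json()` call; the stall check is what spares you from watching an "active" job that will never move again.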
9 Replies
arrrlo (OP) · 16mo ago
The job finally failed; its status is now "failed". But this came only after a very long time, and with no notification to our webhook or anywhere else.
Adobe.Flash · 16mo ago
Hm, that's odd @arrrlo. Looking into it. Can you DM me your email so we can analyze the logs and see what happened? It shouldn't have behaved this way.
arrrlo (OP) · 16mo ago
DM sent
iwdwebman · 16mo ago
I am getting a very similar issue on several websites I am attempting to crawl. I don't even get current or total; the job is just stuck in active, yet on the dashboard I can download the documents.

JobID: 52204c2f-6360-4851-97a0-a353fd2f4569

Response:

```json
{
  "success": true,
  "status": "active",
  "data": null,
  "partial_data": []
}
```

For other jobs I get no activity log at all, so there is nothing to investigate as the reason for the failure.
mogery · 16mo ago
Hey all! Wanted to give y'all an update. We've built a fix for this and are currently testing it. This behavior is an edge case that should only happen occasionally. If all goes well, the fix will be live soon.
arrrlo (OP) · 16mo ago
hi @mogery , did it go well?
mogery · 16mo ago
Hey, it's looking promising but we still need to iterate and test. I'll keep you updated 🤞🏻
arrrlo (OP) · 15mo ago
thanks 👍
mogery · 15mo ago
Hey all, we've merged a fix for this. It should be much better once the deployment goes through. @arrrlo @iwdwebman
