scrape job status
Hi!
I noticed that when scraping websites with a lot of pages (>1000), the scrape job (or at least its status) gets stuck in a state where I can no longer tell what is going on.
For example, right now I have a running job (the job has been limited to a max of 1000 scrape URLs), and if I fetch the status using the API, I get the following data:
- status: active
- current: 1000
- total: 1000
- data: 0 items in the list
- partial_data: 50 items in the list
This state has been the same for hours now. The items in partial_data are the same, from index 951 to 1000.
And that's it. Nothing is coming to the webhook, and the job isn't listed in the logs (https://www.firecrawl.dev/app/logs).
Should this kind of behaviour be expected?
Should we wait for hours to get the complete data?
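For context, this is roughly how we poll the job status. A minimal sketch in Python, assuming the v0-style crawl status endpoint implied by the fields above; the API key and job ID are placeholders, so adjust to your setup. The main point is to give up after a timeout instead of waiting indefinitely:

import time
import requests

API_KEY = "fc-..."   # placeholder: your Firecrawl API key
JOB_ID = "<job-id>"  # placeholder: the crawl job to watch
STATUS_URL = f"https://api.firecrawl.dev/v0/crawl/status/{JOB_ID}"

def poll_status(timeout_s=1800, interval_s=30):
    """Poll the crawl status until it leaves 'active' or we hit a timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.get(
            STATUS_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        print(body.get("status"), body.get("current"), body.get("total"))
        if body.get("status") != "active":
            return body  # finished: completed or failed
        time.sleep(interval_s)
    raise TimeoutError("job still 'active' after timeout; treating it as stuck")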
Finally, the job failed; the status is now failed.
But this came after a very long time and without any information to our webhook or anywhere else.
Hm, that's odd @arrrlo. Looking into it.
Can you dm me your email so we can analyze the logs and see what happened?
It shouldn't have had this behavior
dm sent
I am getting a very similar issue on several websites I am attempting to crawl. I don't even get the current or total; it's just stuck in active, but on the dashboard I can download the documents. JobID: 52204c2f-6360-4851-97a0-a353fd2f4569
The status request returns:
{
  "success": true,
  "status": "active",
  "data": null,
  "partial_data": []
}
For other jobs I get no activity log at all, so there is nothing to look into for the reason of the failure.
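A small check that flags this response shape might help others spot it. This is a sketch based only on the fields in the reply above; looks_stuck is a hypothetical helper, not part of the Firecrawl API:

def looks_stuck(body: dict) -> bool:
    """True if the job claims 'active' but exposes no progress at all."""
    return (
        body.get("status") == "active"
        and body.get("data") is None
        and not body.get("partial_data")
        and "current" not in body
        and "total" not in body
    )

# The response shown above triggers the check:
print(looks_stuck({
    "success": True,
    "status": "active",
    "data": None,
    "partial_data": [],
}))  # -> True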
Hey all! Wanted to give y'all an update. We've built a fix for this and we're currently testing it. This behavior is an edge case that currently only happens occasionally. If all goes well, the fix will be live soon.
hi @mogery, did it go well?
Hey, it's looking promising but we still need to iterate and test. I'll keep you updated 🤞🏻
thanks 👍
Hey all, we've merged a fix for this; it should be much better once the deployment goes through @arrrlo @iwdwebman