Sachin · 16mo ago

Unstable behavior of Crawl Jobs for V1

I'm running into an issue with incomplete crawls: the number of pages a crawl job actually scrapes doesn't match the website's real page count.

The /map endpoint returns the correct count of URLs in the sitemap, which is 104 (website: freenome.com). The crawl job, however, only manages to scrape 55, 28, or 29 pages, varying from run to run.

One peculiar thing I've noticed is that credit usage stays high regardless of how many pages are actually scraped.
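For reference, the /map check was a single call, roughly like this (a sketch assuming the v1 REST endpoint; the "links" field name in the response is my assumption):

# Sketch: count the URLs that /map discovers for the site.
# Assumes POST /v1/map and a "links" array in the response.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/map",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://freenome.com"},
)
resp.raise_for_status()
links = resp.json().get("links", [])
print(len(links))  # 104 in my case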

Here's the output:
Job Info {'status': 'completed', 'completed': 69, 'total': 69, 'creditsUsed': 69, 'expiresAt': '2024-09-04T09:08:31.000Z', 'next': 'http://api.firecrawl.dev/v1/crawl/72dda60f-5384-498d-81b1-0a830dfa0cc8?skip=28'}

Shape of the "data" key saved into a pandas DataFrame: (28, 2)
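Side note on the numbers above: the 'next' field in the job info suggests the response is paginated, and the (28, 2) DataFrame shape lines up with the ?skip=28 link, so the remaining documents for this job may simply be on later pages (though that still wouldn't explain 69 completed pages vs. 104 in the sitemap). A minimal sketch of following the pagination, assuming the status endpoint shown above and a FIRECRAWL_API_KEY env var (fetch_all_crawl_data is just an illustrative helper):

# Sketch: collect every page of a crawl job's results by following "next" links.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def fetch_all_crawl_data(status_url: str) -> list[dict]:
    """Follow 'next' links until the last page and return all documents."""
    documents = []
    url = status_url
    while url:
        resp = requests.get(url, headers=HEADERS)
        resp.raise_for_status()
        body = resp.json()
        documents.extend(body.get("data", []))
        url = body.get("next")  # absent on the last page
    return documents

docs = fetch_all_crawl_data(
    "https://api.firecrawl.dev/v1/crawl/72dda60f-5384-498d-81b1-0a830dfa0cc8"
)
print(len(docs))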


Parameters used:
crawl_params = {
    "excludePaths": [],
    "includePaths": [],
    "maxDepth": 2,
    "ignoreSitemap": False,
    "limit": 400,
    "allowBackwardLinks": False,
    "allowExternalLinks": False,
    "scrapeOptions": {
        "formats": ["markdown"],
        "headers": {},
        "includeTags": [],
        "excludeTags": [],
        "onlyMainContent": True,
        "waitFor": 300,
    },
}
NOTE - I have also tried setting ignoreSitemap to True, but that didn't help either.
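For completeness, this is roughly how the crawl gets started with those params (a sketch assuming the v1 REST API; if you use the firecrawl-py SDK, the call will look different):

# Sketch: start the crawl job over the REST API with the params above.
# Assumes POST /v1/crawl accepts the crawl parameters alongside "url".
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={"url": "https://freenome.com", **crawl_params},
)
resp.raise_for_status()
print(resp.json())  # expect a job id / status URL to poll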

CCing - @Adobe.Flash @mogery @rafaelmiller