Firecrawl doesn't seem to crawl everything
I'm trying to run Firecrawl on PyTorch's documentation, and I only get ~15 results with these URLs:
Clearly it's missing out on a whole lot of pages. Here's how I'm calling it:
Am I missing something? Are any of the other default parameters truncating the crawl somehow?
3 Replies
Hey @Julia from Storia, your parameters for maxDepth and limit don't seem right. You should use 10000.
I tested the URL:
and I was able to get 15 URLs for this page. Another option to consider for retrieving more URLs is setting crawlerOptions.allowBackwardCrawling = true. This allows the crawler to retrieve URLs beyond those containing the base URL.
I also got 15 URLs, but I was expecting hundreds.
Also, in Python, 10_000 means 10000
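For anyone who wants to double-check the point about underscores, here is a quick sketch. The first part is plain Python behavior per PEP 515. The options dict at the end is only illustrative: the option names come from this thread, but nesting limit and maxDepth under crawlerOptions is an assumption, and nothing here actually calls Firecrawl.

```python
# PEP 515: underscores in numeric literals are visual separators only.
limit_with_underscore = 10_000
limit_plain = 10000
assert limit_with_underscore == limit_plain

# Since Python 3.6, int() also accepts underscores in numeric strings.
assert int("10_000") == 10000

# Hypothetical options dict using the names mentioned in this thread.
# Assumption: limit and maxDepth sit under crawlerOptions alongside
# allowBackwardCrawling; check the Firecrawl docs for your SDK version.
crawl_params = {
    "crawlerOptions": {
        "limit": 10_000,
        "maxDepth": 10_000,
        "allowBackwardCrawling": True,
    }
}

# The stored value is an ordinary int, identical to 10000.
print(crawl_params["crawlerOptions"]["limit"])
```

So writing 10_000 versus 10000 in the call should make no difference to what gets sent, which suggests the truncation comes from somewhere else.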
Python Enhancement Proposals (PEPs)
PEP 515 – Underscores in Numeric Literals | peps.python.org