maxDiscoveryDepth
Hello, I would like to ask how maxDiscoveryDepth works. Right now I am setting the depth to two and the limit to 10 for https://books.toscrape.com/ to test these parameters, and I somehow don't get it. The results were like this:
Does it follow the maxDepth that I wanted?
Hi @edsaur
maxDiscoveryDepth means the maximum number of "hops" from the first/parent page that Firecrawl will follow when discovering new URLs.
If you set maxDiscoveryDepth: 2:
Depth 0 → only the first/parent page. https://books.toscrape.com
Depth 1 → the first/parent page + pages directly linked from it. https://books.toscrape.com/catalogue/category/books/travel_2/index.html
Depth 2 → the first/parent page → pages linked from it → pages linked from those.
And limit is the total number of pages it will fetch, even if more are available.
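For example (a minimal sketch, not an official snippet; it assumes the hosted v1 REST API and a placeholder API key, and self-hosted users would swap in their own base URL), the crawl you describe could be started like this:

```python
import requests

API_KEY = "fc-..."  # your Firecrawl API key (placeholder)

# Start a crawl of books.toscrape.com with the two parameters from this thread.
resp = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://books.toscrape.com/",
        "maxDiscoveryDepth": 2,  # follow links at most 2 hops from the start page
        "limit": 10,             # fetch at most 10 pages total, even if more are discovered
    },
)
print(resp.json())  # returns a job id; poll GET /v1/crawl/{id} for the crawled pages
```

With limit: 10, the crawl stops after 10 pages even though depth 2 discovers far more, which is why you only see a small slice of the discovered pages.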
I hope this answers how maxDiscoveryDepth works - https://docs.firecrawl.dev/advanced-scraping-guide#maxdiscoverydepth
Would it be the "links" that count as depth 2? Because I think it's purely https://books.toscrape.com/catalogue/category/books/travel_2/index.html that I receive when the depth is two and the limit is 10...
So to get other pages, the limit should be more than 10, right?
Thank you so much! I am just so new to this T_T
Yes, correct.
Thanks a lot for the help @Gaurav Chadha!
Another question: since we have the "prompt" in scraping, and I believe also in the crawl endpoint, if a website is in a language other than English, could I prompt it to translate what we are scraping? And does Firecrawl support multilingual sites?
You could use JSON mode to get translated content. But /crawl's prompt is just there to automatically generate the crawl parameters for you.
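As a rough sketch of that (assuming the v1 /scrape endpoint's JSON format; the URL, prompt, and schema below are made up for illustration):

```python
import requests

API_KEY = "fc-..."  # your Firecrawl API key (placeholder)

# Ask JSON mode to return an English translation of a non-English page.
# The prompt and schema are illustrative, not an official recipe.
resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/some-non-english-page",  # hypothetical URL
        "formats": ["json"],
        "jsonOptions": {
            "prompt": "Translate the main page content to English.",
            "schema": {
                "type": "object",
                "properties": {"translated_text": {"type": "string"}},
                "required": ["translated_text"],
            },
        },
    },
)
data = resp.json()
print(data)  # the extracted object should appear under data["data"]["json"]
```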
Thanks! Do I ever need the OPENAI_BASE_URL? Because I tried /extract and it gave me an error saying that I don't have any BASE_URL.
I am using a self-hosted instance, btw.
Yes, as /extract does structured data extraction from scraped content using an LLM, OPENAI_BASE_URL will be required for a self-hosted environment.
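For reference, a minimal sketch of the relevant variables in a self-hosted .env (placeholder values; check the self-hosting guide for the full list):

```
OPENAI_API_KEY=sk-...                      # key for your LLM provider
OPENAI_BASE_URL=https://api.openai.com/v1  # or any OpenAI-compatible endpoint
```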
Noted, sir, thank you so much! But for the normal /scrape and /crawl prompt we just need the OPENAI_API_KEY, right?
Only if you need to use the LLM; otherwise you can skip it.