Tom - Hi guys, please would any of you know how...
Hi guys, please would any of you know how to handle this use case:
I need to scrape jobs (headlines are enough) on career pages of companies.
The pages are often paginated.
If I use the crawl method, the crawler goes through the pagination, which is good, BUT it also crawls all the child pages (e.g. the details of every single job posting). I don't want the child pages scraped (slow, expensive).
How can I scrape just the first level of every page in pagination?
Note: I don't know in advance which pages I'll be scraping, so I don't know what attributes a pagination button will have.
1 Reply
Hi @Tom, use maxDiscoveryDepth (https://docs.firecrawl.dev/advanced-scraping-guide#maxdiscoverydepth), which controls how deep the crawler follows links based on discovery order.
Set maxDiscoveryDepth: 1 to crawl only:
The root page (discovery depth 0)
Pages directly linked from the root (discovery depth 1)
This will capture pagination pages but stop before following links to individual job postings.
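For reference, here is a minimal sketch of what such a crawl request could look like in Python, using only the standard library. The endpoint URL, the `build_crawl_request` helper, and the placeholder API key are assumptions for illustration and are not taken from this thread; only the `maxDiscoveryDepth` option comes from the reply above. Check the Firecrawl API reference for the exact endpoint and request shape of the version you use.

```python
import json
import urllib.request

# Assumed endpoint (not stated in this thread); verify it against the API reference.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v2/crawl"


def build_crawl_request(start_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a crawl request capped at discovery depth 1.

    Hypothetical helper for illustration only.
    """
    payload = {
        "url": start_url,
        # Depth 0 is the root page, depth 1 is pages linked directly from it.
        "maxDiscoveryDepth": 1,
    }
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and key; replace with real values.
    req = build_crawl_request("https://example.com/careers", "fc-YOUR-API-KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually start the crawl, send it with: urllib.request.urlopen(req)
```

Since depth is counted from the root, any link that appears on the root page itself is also at depth 1, so it is worth checking the first crawl result to confirm that only the pages you expect were fetched.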