Firecrawl 2mo ago
Tom

Hi guys, would any of you know how to handle this use case? I need to scrape jobs (headlines are enough) from the career pages of companies. The pages are often paginated. If I use the crawl method, the crawler goes through the pagination, which is good, BUT it also crawls all the child pages (e.g. the details of every single job posting). I don't want the child pages scraped (slow, expensive). How can I scrape just the first level of every page in the pagination? Note: I don't know in advance which pages I'll be scraping, so I don't know what attributes a pagination button will have.
1 Reply
Gaurav Chadha 2mo ago
Hi @Tom, use maxDiscoveryDepth (https://docs.firecrawl.dev/advanced-scraping-guide#maxdiscoverydepth), which controls how deep the crawler follows links, based on discovery order.

Set maxDiscoveryDepth: 1 to crawl only:
- The root page (discovery depth 0)
- Pages directly linked from the root (discovery depth 1)

This will capture pagination pages but stop before following links to individual job postings.
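For illustration, here is a minimal Python sketch of what a crawl request with maxDiscoveryDepth: 1 might look like. It only builds the HTTP request and does not send it. The endpoint URL, the Bearer-token header, and the limit field are assumptions on my part, and the helper names (build_crawl_payload, build_crawl_request) are made up for this example, so please check them against the Firecrawl API reference before relying on this.

```python
import json
import urllib.request

# Assumption: endpoint path/version; verify in the Firecrawl API reference.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v2/crawl"


def build_crawl_payload(start_url: str, max_discovery_depth: int = 1, limit: int = 50) -> dict:
    """Build the JSON body for a crawl limited by discovery depth."""
    return {
        "url": start_url,
        # Depth 0 is the root page, depth 1 is pages linked directly from it.
        "maxDiscoveryDepth": max_discovery_depth,
        # Assumption: a page cap to keep cost bounded while testing.
        "limit": limit,
    }


def build_crawl_request(start_url: str, api_key: str, **options) -> urllib.request.Request:
    """Wrap the payload in a POST request (hypothetical helper, nothing is sent here)."""
    body = json.dumps(build_crawl_payload(start_url, **options)).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        headers={
            # Assumption: Bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_crawl_request("https://example.com/careers", "fc-YOUR-API-KEY")
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually start the crawl: urllib.request.urlopen(request)
```

One thing to watch: every link on the root page counts as discovery depth 1, so it may be worth checking the results to see whether job detail pages linked from the first listing page show up as well.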