Firecrawl•2mo ago

Tom - Hi guys, please would any of you know how...

Hi guys, would any of you know how to handle this use case? I need to scrape jobs (headlines are enough) from the career pages of companies. The pages are often paginated. If I use the crawl method, the crawler goes through the pagination, which is good, BUT it also crawls all the child pages (e.g. the details of every single job posting). I don't want the child pages scraped (slow, expensive). How can I scrape just the first level of every page in the pagination? Note: I don't know in advance which pages I'll be scraping, so I don't know what attributes the pagination buttons have.

1 Reply

Gaurav Chadha•2mo ago

Hi @Tom, use maxDiscoveryDepth (https://docs.firecrawl.dev/advanced-scraping-guide#maxdiscoverydepth), which controls how deep the crawler follows links based on discovery order. Set maxDiscoveryDepth: 1 to crawl only:

- The root page (discovery depth 0)
- Pages directly linked from the root (discovery depth 1)

This will capture pagination pages but stop before following links to individual job postings.
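For anyone wanting to try this, here is a minimal sketch of a crawl request with maxDiscoveryDepth set to 1, sent to Firecrawl's REST API using only the Python standard library. The endpoint path, the `limit` field, the placeholder API key, and the example URL are assumptions for illustration and are not taken from this thread, so check them against the current Firecrawl API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; verify the path and API version in the Firecrawl docs.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v2/crawl"


def build_crawl_payload(start_url: str, max_discovery_depth: int = 1, limit: int = 50) -> dict:
    """Build the JSON body for a crawl job that stops at the given discovery depth."""
    return {
        "url": start_url,
        # Depth 0 is the root page, depth 1 is pages linked directly from it.
        "maxDiscoveryDepth": max_discovery_depth,
        # Assumed safety cap on the total number of pages crawled.
        "limit": limit,
    }


def start_crawl(api_key: str, payload: dict) -> dict:
    """POST the payload to the (assumed) crawl endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical career page; printing the payload needs no network access.
    payload = build_crawl_payload("https://example.com/careers")
    print(json.dumps(payload, indent=2))
    # To actually start the job, uncomment and supply a real key:
    # print(start_crawl("YOUR_API_KEY", payload))
```

Keeping the payload builder separate from the network call makes it easy to inspect exactly what is sent before spending credits on a crawl.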

