Scraping a single page with a load more button

Hi, I just discovered Crawlee and it seems like a great project. I'm scraping a single URL (https://jobs.workable.com/search) that contains a list of items with a load more button. Each time an item is clicked, a floating modal shows the item's information. In this scenario, Crawlee's ability to remember visited URLs, retry requests, etc. doesn't help much.

My idea is:
- From the start page, click on each of the initial items and scrape its content
- Click on the load more button and repeat the process

I'd like help applying best practices for:
- how to "remember/store" the last scraped item index/id
- how to handle errors

Thanks in advance
3 Replies
metropolitan-bronze (3y ago)
I'd recommend checking out the infiniteScroll function in Crawlee: https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#infiniteScroll Your use case with a load more button can be solved with the buttonSelector option, which checks for a button while scrolling and clicks it if it appears. See more in the docs: https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector
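For illustration, here is a minimal sketch of how that could look. It assumes the `infiniteScroll` helper that `PuppeteerCrawler` passes to its request handler, and the `button.load-more` selector is a made-up placeholder, so inspect the page to find the real one.

```javascript
// Hypothetical selector: replace it with the real "load more" button selector.
const LOAD_MORE_SELECTOR = 'button.load-more';

// Request handler for the single listing page. `infiniteScroll` and `log`
// come from the crawling context that PuppeteerCrawler provides.
async function handleListingPage({ infiniteScroll, log }) {
    log.info('Scrolling the page and clicking the load more button');
    await infiniteScroll({
        // Clicked whenever it shows up while scrolling.
        buttonSelector: LOAD_MORE_SELECTOR,
        // Upper bound on scrolling time, in seconds (an arbitrary choice here).
        timeoutSecs: 60,
    });
}

// Wires the handler into a crawler. Crawlee is imported lazily so the
// handler above can be loaded and tested without starting a browser.
async function main() {
    const { PuppeteerCrawler } = await import('crawlee');
    const crawler = new PuppeteerCrawler({ requestHandler: handleListingPage });
    await crawler.run(['https://jobs.workable.com/search']);
}
```

Calling `main()` starts the crawl. It is left uncalled here so the snippet has no side effects when loaded.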
metropolitan-bronze (3y ago)
And clicking each item on the page can be done in the stopScrollCallback option: https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector
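A rough sketch of that idea, which also touches on remembering scraped items and handling errors. Everything page-specific is an assumption: the `data-job-id` attribute, the `[role="dialog"]` modal selector, the `button.load-more` selector, and closing the modal with Escape are placeholders to adapt. Scraped ids are kept in an in-memory `Set`, and a failing item is logged and skipped instead of aborting the whole run.

```javascript
// Hypothetical selectors and attribute name: adjust after inspecting the page.
const ITEM_SELECTOR = '[data-job-id]';
const MODAL_SELECTOR = '[role="dialog"]';

// Clicks every item that has not been scraped yet and reads the modal text.
// An error on one item is logged and the loop moves on to the next item.
async function scrapeNewItems(page, seenIds, results, log) {
    const items = await page.$$(ITEM_SELECTOR);
    for (const item of items) {
        const id = await item.evaluate((el) => el.getAttribute('data-job-id'));
        if (!id || seenIds.has(id)) continue; // already scraped, or no id
        try {
            await item.click();
            await page.waitForSelector(MODAL_SELECTOR, { timeout: 10000 });
            const text = await page.$eval(MODAL_SELECTOR, (el) => el.textContent.trim());
            results.push({ id, text });
            seenIds.add(id); // remember the item only after it succeeded
        } catch (err) {
            log.warning(`Item ${id} failed: ${err.message}`);
        } finally {
            // Assumption: Escape closes the modal on this page.
            await page.keyboard.press('Escape');
        }
    }
}

// Request handler: scrolls, clicks "load more", and scrapes after each scroll.
async function handleListingWithItems({ page, infiniteScroll, log }) {
    const seenIds = new Set();
    const results = [];
    await infiniteScroll({
        buttonSelector: 'button.load-more', // hypothetical selector
        timeoutSecs: 300,
        stopScrollCallback: async () => {
            await scrapeNewItems(page, seenIds, results, log);
            return false; // returning true would stop the scrolling
        },
    });
    log.info(`Scraped ${results.length} items`);
    return results;
}
```

Because a failed item is never added to `seenIds`, it gets another attempt on the next callback. Add an attempt counter if a permanently broken item should be skipped for good.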
like-gold (OP, 3y ago)
Thanks!
