infinite scrolling of pages

i have a crawler that goes through stores' collection pages, scrapes their product links, and then visits each product page link to get the product data. When collecting the product links on the collection pages, many sites use infinite scrolling to render in all the products. How do I implement infinite scrolling in this specific crawler route handler below, so that all the products get rendered and I'm sure I scraped every product URL on the page?
kotnRouter.addHandler('KOTN_DETAIL', async ({ page, log }) => {
  log.info('Scraping product URLs');

  await page.goto(page.url(), { waitUntil: 'domcontentloaded' });

  const productUrls: string[] = [];

  const links = await page.$$eval('a', (elements) =>
    elements.map((el) => el.getAttribute('href'))
  );

  for (const link of links) {
    if (link && !link.startsWith('https://')) {
      const productUrl = 'https://www.kotn.com' + link;
      if (productUrl.includes('/products')) {
        productUrls.push(productUrl);
      }
    }
  }

  // Push unique URLs to the dataset
  const uniqueProductUrls = Array.from(new Set(productUrls));
  console.log(uniqueProductUrls);
  await Dataset.pushData({
    urls: uniqueProductUrls,
  });

  await Promise.all(
    uniqueProductUrls.map((link) =>
      kotnCrawler.addRequests([{ url: link, label: 'KOTN_PRODUCT' }])
    )
  );

  linksCount += uniqueProductUrls.length;

  console.log(uniqueProductUrls);
  console.log(`Total product links scraped so far: ${linksCount}`);
});
6 Replies
xenophobic-harlequin (OP) · 2y ago
(PLAYWRIGHT crawler btw)
eastern-cyan · 2y ago
playwrightUtils | API | Crawlee
A namespace that contains various utilities for Playwright - the headless Chrome Node API.
xenophobic-harlequin (OP) · 2y ago
I'm not sure how to implement those playwrightUtils helpers properly to keep scrolling incrementally and use them in my router. Sorry, I'm not as experienced with the utils.
optimistic-gold · 2y ago
Here is an example of how to use it. It's using Puppeteer, but it works exactly the same with Playwright; scroll down to the infiniteScroll example: https://docs.apify.com/academy/node-js/dealing-with-dynamic-pages#scraping-dynamic-content
How to scrape from dynamic pages | Academy | Apify Documentation
Learn about dynamic pages and dynamic content. How can we find out if a page is dynamic? How do we programmatically scrape dynamic content?
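
For reference, a minimal sketch of how the infiniteScroll helper from that example could be wired into the handler from the question. It assumes Crawlee's playwrightUtils.infiniteScroll; the timeoutSecs/waitForSecs values are illustrative, and the explicit page.goto is dropped because a Crawlee handler only runs after the crawler has already navigated:

```ts
import { Dataset, playwrightUtils } from 'crawlee';

kotnRouter.addHandler('KOTN_DETAIL', async ({ page, log }) => {
  log.info('Scraping product URLs');

  // Keep scrolling until no new content loads, so lazily rendered
  // products are in the DOM before we read the links.
  await playwrightUtils.infiniteScroll(page, {
    timeoutSecs: 0, // 0 = no hard time limit on scrolling
    waitForSecs: 4, // stop once no new content has loaded for ~4s
  });

  // Only collect the links after scrolling has finished.
  const links = await page.$$eval('a', (elements) =>
    elements.map((el) => el.getAttribute('href'))
  );

  const productUrls = links
    .filter((link): link is string => !!link && !link.startsWith('https://'))
    .map((link) => 'https://www.kotn.com' + link)
    .filter((url) => url.includes('/products'));

  const uniqueProductUrls = [...new Set(productUrls)];
  await Dataset.pushData({ urls: uniqueProductUrls });

  // One addRequests call with the whole batch instead of one per URL.
  await kotnCrawler.addRequests(
    uniqueProductUrls.map((url) => ({ url, label: 'KOTN_PRODUCT' }))
  );
});
```

Crawlee also exposes the same helper on the crawling context, so destructuring `async ({ page, log, infiniteScroll })` and calling `await infiniteScroll()` may work as well, depending on the Crawlee version.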
xenophobic-harlequin (OP) · 2y ago
Thanks! Does this implement the scroll and pause too?
optimistic-gold · 2y ago

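On the scroll-and-pause question: as far as I can tell from the docs, infiniteScroll handles the pausing itself, repeatedly scrolling and waiting for new content before continuing. A sketch of the options that control this behaviour; the values, the .product-card selector, and the stop condition are illustrative assumptions, not something from this thread:

```ts
import { playwrightUtils } from 'crawlee';
import type { Page } from 'playwright';

// Runs inside a handler, where `page` comes from the crawling context.
async function scrollAllProducts(page: Page) {
  await playwrightUtils.infiniteScroll(page, {
    waitForSecs: 5,        // pause until no new content has loaded for ~5s
    scrollDownAndUp: true, // scroll up a bit after each scroll down; some
                           // sites only trigger lazy loading on that movement
    stopScrollCallback: async () => {
      // Called after every scroll; returning true stops the scrolling.
      // Hypothetical early exit once "enough" products are rendered.
      const count = await page.locator('.product-card').count();
      return count >= 500;
    },
  });
}
```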