infinite scrolling

I'm trying to get infinite scrolling to render all of the products, and to scrape them as the page is being scrolled down. I looked at the documentation but didn't understand how to do this:
kotnRouter.addHandler('KOTN_DETAIL', async ({ log, page, parseWithCheerio }) => {
    log.info(`Scraping product URLs`);

    const $ = await parseWithCheerio();

    const productUrls: string[] = [];

    $('a').each((_, el) => {
        let productUrl = $(el).attr('href');
        if (productUrl) {
            // turn relative hrefs into absolute ones
            if (!productUrl.startsWith('https://')) {
                productUrl = 'https://www.kotn.com' + productUrl;
            }
            // keep only product pages
            if (productUrl.includes('/products')) {
                productUrls.push(productUrl);
            }
        }
    });

    // Push unique URLs to the dataset
    const uniqueProductUrls = Array.from(new Set(productUrls));

    await Dataset.pushData({
        urls: uniqueProductUrls,
    });

    // kotnPw (a second crawler) and linksCount are defined elsewhere;
    // addRequests() accepts an array, so one call is enough
    await kotnPw.addRequests(uniqueProductUrls.map((link) => ({ url: link, label: 'KOTN_PRODUCT' })));

    linksCount += uniqueProductUrls.length;

    await infiniteScroll(page, {
        maxScrollHeight: 0,
    });

    console.log(uniqueProductUrls);
    console.log(`Total product links scraped so far: ${linksCount}`);

    // Run the kotnPw crawler once, after pushing the first batch of product requests
    if (linksCount === uniqueProductUrls.length) {
        await kotnPw.run();
    }
});
passive-yellow (OP) • 2y ago
I also want to make sure it scrolls up a little bit every time it scrolls fully down, so that the content renders in properly.
unwilling-turquoise • 2y ago
To make it scroll up a bit every time after it scrolls down, you can use this option: https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#scrollDownAndUp

@harish For scraping the products you can do one of the following (see the sketch below):

1. Wait for the scroll to finish, then select all the products and add them to the queue.
2. Add the infiniteScroll call to a Promise.all or Promise.race, so it keeps scrolling while you run another function beside it in the same Promise.all or Promise.race.
3. Run the infiniteScroll function and, inside its stopScrollCallback option, collect the products and stop once you don't find any more: https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#stopScrollCallback
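A minimal sketch of the third option, assuming the same setup as in the question (kotnRouter and the 'KOTN_PRODUCT' label come from your snippet; the 'a[href*="/products"]' selector is just an illustration, adjust it to the real markup):

import { playwrightUtils } from 'crawlee';

kotnRouter.addHandler('KOTN_DETAIL', async ({ page, enqueueLinks }) => {
    let previousCount = 0;

    await playwrightUtils.infiniteScroll(page, {
        // scroll back up a little after every scroll down, so lazy-loaded
        // products render properly
        scrollDownAndUp: true,
        // called repeatedly while scrolling; returning true stops the scroll
        stopScrollCallback: async () => {
            const currentCount = await page.locator('a[href*="/products"]').count();
            // naive heuristic: stop once no new products appeared since the last check
            const done = currentCount === previousCount;
            previousCount = currentCount;
            return done;
        },
    });

    // once scrolling has stopped, enqueue every rendered product link;
    // you could keep your kotnPw.addRequests() call here instead
    await enqueueLinks({
        selector: 'a[href*="/products"]',
        label: 'KOTN_PRODUCT',
    });
});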
passive-yellow (OP) • 2y ago
How do you implement this in the router? Do you write it under a playwrightUtils class, or what do you do?
correct-apricot • 2y ago
Hey @harish, you can either use the context-aware method from the context object, or you can use the method from playwrightUtils/puppeteerUtils, which needs the page object as an argument.
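For illustration, the two forms look roughly like this (a sketch only; in practice you would use one or the other, not both):

import { playwrightUtils } from 'crawlee';

kotnRouter.addHandler('KOTN_DETAIL', async ({ page, infiniteScroll }) => {
    // context-aware helper: already bound to the current page
    await infiniteScroll({ scrollDownAndUp: true });

    // standalone utility: the same helper from playwrightUtils,
    // which takes the page object as its first argument
    await playwrightUtils.infiniteScroll(page, { scrollDownAndUp: true });
});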
