Hello! I need to scrape multiple links within the same page using Puppeteer

The title of my post is not very explicit. I am scraping a website with multiple products, but the category tree of the menu is a little complex: there are product categories at the top level and product families at the sub-level. I need to retrieve a count of the products in each category, but the only way to do that is to go through each product family. I am stuck at this point: how do I enqueue links within the category page (there is more than one family within a category)? In addition, I need to push the name of the category to the dataset, along with the name of each family, like this: { "category": { "family": "family_name", "detail": 10 } }. Here is my code:
// Assumption: the opening of the default handler was cut off in the original post.
router.addDefaultHandler(async ({ enqueueLinks }) => {
    // Enqueue every category page linked from the menu.
    await enqueueLinks({
        globs: ['https://www.1fotrade.com/pro/categorie-*'],
        selector: 'td.element_menu a',
        label: 'list',
    });
});

router.addHandler('list', async ({ enqueueLinks, request, page, log }) => {
    log.debug(`Extracting data: ${request.url}`);
    const category = await page.$eval('h1', (el) => el.textContent);

    // Enqueue every family page linked from this category page.
    await enqueueLinks({
        globs: ['https://www.1fotrade.com/pro/famille-*'],
        selector: '.menu_titre_famille a',
        label: 'detail',
    });
    await Dataset.pushData({ category });
});

router.addHandler('detail', async ({ request, page, log }) => {
    const title = await page.title();
    log.info(`${title}`, { url: request.loadedUrl });
    // Count the product rows listed on the family page.
    const items = await page.$$('.liste_produit tr.productLine');
    const detail = items.length;
    const name = await page.$eval('h1', (el) => el.textContent);

    await Dataset.pushData({ name, detail });
});
1 Reply
adverse-sapphire•3y ago
For use cases this complex and specific, I would recommend manually looping through each category, and then, for each category, looping through each family under it. Within each iteration, manually create a request and scrape the data you need. enqueueLinks is just not flexible enough for your case 😄
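A minimal sketch of what that manual loop might look like, written as plain functions so it runs without a browser. The helper names `buildFamilyRequests` and `toRecord`, the `famille-a`/`famille-b` URLs, and the sample names are all hypothetical, and it assumes the category name is meant to be the outer key of each record. The idea is to attach the category to every family request through `userData`, so the `detail` handler can still see it when it counts the products.

```javascript
// Hypothetical helper: build one request per family link found on a category page.
// In a real handler, `familyUrls` could be collected with something like
// page.$$eval('.menu_titre_famille a', (links) => links.map((a) => a.href)).
function buildFamilyRequests(category, familyUrls) {
    return familyUrls.map((url) => ({
        url,
        label: 'detail',
        // userData travels with the request, so the 'detail' handler can read the category.
        userData: { category: category.trim() },
    }));
}

// Hypothetical helper: shape one dataset record roughly the way the question asks.
// Assumption: the category name is used as the outer key.
function toRecord(category, family, detail) {
    return { [category]: { family: family.trim(), detail } };
}

// Placeholder data standing in for what the 'list' and 'detail' pages would yield.
const requests = buildFamilyRequests(' Tools ', [
    'https://www.1fotrade.com/pro/famille-a',
    'https://www.1fotrade.com/pro/famille-b',
]);
const record = toRecord(requests[0].userData.category, ' Hammers ', 10);
console.log(JSON.stringify(record)); // prints {"Tools":{"family":"Hammers","detail":10}}
```

In a Crawlee handler, the resulting objects could probably be handed to `crawler.addRequests(requests)` from the `list` handler, and the `detail` handler would then read `request.userData.category` before calling `Dataset.pushData(...)`. Treat that wiring as an assumption to check against the Crawlee version in use.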
