Mixing headful and headless modes in a PlaywrightCrawler

I want to check the content of some requests in headful mode, approve it, and then let the crawler scrape it in headless mode. I've tried @crawlee/browser-pool, but it doesn't seem to have an autoscaledPool.
4 Replies
Pepa J (2y ago)
Hi @le fishe au chocolat, I am not sure I understand. First of all, how would you like to "confirm it"? You may want to run two crawlers: one in headful mode and the second in headless mode. In the first crawler, you can keep a counter of the requests that have been handled and abort the crawler once those requests have been processed.
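import { createPlaywrightRouter, Dataset } from 'crawlee';

export const router = createPlaywrightRouter();

// Counts the requests handled by the headful crawler.
let i = 0;

router.addHandler('detail', async ({ request, page, log, crawler }) => {
    const title = await page.title();

    console.log(i);
    // Once enough requests have been checked, stop the headful crawler.
    if (i++ > 1) {
        await crawler.autoscaledPool?.abort();
        // crawler.headless = true;
    }
    log.info(`${title}`, { url: request.loadedUrl });

    await Dataset.pushData({
        url: request.loadedUrl,
        title,
    });
});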
Then the second crawler starts up and continues in headless mode.
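The two-crawler flow described above could be sketched roughly as follows. This is only an illustration under a few assumptions: both crawlers are given the same request queue explicitly, and the `maxRequestsPerCrawl` option stands in for the manual counter and `abort()` call. The names `phaseOptions`, `runTwoPhases`, and `reviewLimit` are made up for this sketch.

```javascript
// Build the options for one phase. This is a plain function, so it can be
// inspected without launching a browser.
function phaseOptions({ headless, requestQueue, requestHandler, limit }) {
    const options = { headless, requestQueue, requestHandler };
    // Assumption: maxRequestsPerCrawl replaces the counter + abort() approach.
    if (limit !== undefined) options.maxRequestsPerCrawl = limit;
    return options;
}

async function runTwoPhases(startUrls, reviewLimit) {
    // Dynamic import, so crawlee is only loaded when the crawl actually runs.
    const { PlaywrightCrawler, RequestQueue, Dataset } = await import('crawlee');

    // Both crawlers share the default request queue.
    const requestQueue = await RequestQueue.open();
    await requestQueue.addRequests(startUrls.map((url) => ({ url })));

    const requestHandler = async ({ request, page, log }) => {
        const title = await page.title();
        log.info(title, { url: request.loadedUrl });
        await Dataset.pushData({ url: request.loadedUrl, title });
    };

    // Phase 1: visible browser, limited to roughly `reviewLimit` requests.
    const headful = new PlaywrightCrawler(
        phaseOptions({ headless: false, requestQueue, requestHandler, limit: reviewLimit }),
    );
    await headful.run();

    // Phase 2: a fresh headless crawler works through what is left in the queue.
    const headless = new PlaywrightCrawler(
        phaseOptions({ headless: true, requestQueue, requestHandler }),
    );
    await headless.run();
}
```

A call such as `await runTwoPhases(['https://example.com'], 2)` would run both phases in sequence; the URL is a placeholder.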
conscious-sapphire (2y ago)
Another way is to use two request queues.
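A minimal sketch of the two-queue idea, assuming the approval step is a caller-supplied async function. The names `isApproved`, `toApprovedRequest`, and `reviewThenScrape`, and the queue names `to-review` and `approved`, are hypothetical and not part of Crawlee. Named queues persist between runs, so they may need to be dropped before a rerun.

```javascript
// Copy a reviewed request into the shape added to the second queue.
function toApprovedRequest(request) {
    return { url: request.url, userData: { ...request.userData, approved: true } };
}

async function reviewThenScrape(startUrls, isApproved) {
    // Dynamic import, so crawlee is only loaded when the crawl actually runs.
    const { PlaywrightCrawler, RequestQueue, Dataset } = await import('crawlee');

    // Queue names are placeholders chosen for this sketch.
    const reviewQueue = await RequestQueue.open('to-review');
    const approvedQueue = await RequestQueue.open('approved');
    await reviewQueue.addRequests(startUrls.map((url) => ({ url })));

    // Headful crawler: shows each page and forwards the approved ones.
    const reviewer = new PlaywrightCrawler({
        headless: false,
        requestQueue: reviewQueue,
        maxConcurrency: 1, // one visible page at a time
        requestHandler: async ({ request, page }) => {
            if (await isApproved(page, request)) {
                await approvedQueue.addRequest(toApprovedRequest(request));
            }
        },
    });
    await reviewer.run();

    // Headless crawler: scrapes only what was approved.
    const scraper = new PlaywrightCrawler({
        headless: true,
        requestQueue: approvedQueue,
        requestHandler: async ({ request, page }) => {
            await Dataset.pushData({ url: request.loadedUrl, title: await page.title() });
        },
    });
    await scraper.run();
}
```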
Pepa J (2y ago)
The headless option is set on the crawler, not on an individual request. You may simply run into issues by reusing one actor twice.
exotic-emerald (OP, 2y ago)
Thank you so much, I finally made it work with 2 crawlers.
