Getting puppeteer-har and autoconsent to work with puppeteer crawler
Hi guys,
I am totally new to crawlee, so this might or might not be an easy question.
I want to collect all the cookies and third-party trackers or resources from our website and monitor any changes. The changes are made by a web agency, and I want to be sure we stay compliant with privacy regulations.
So I thought it would be a good idea to use DuckDuckGo's autoconsent to first consent to all cookies. Then I want to list all connections that are made, e.g. to Google Fonts or CDNs. For this I thought of using puppeteer-har.
I have originally done this with vanilla puppeteer and this worked, but I need a crawler to get all the links on our website. So I stumbled upon crawlee. I tried to put my original script inside the requestHandler but the result.har file is empty:
{"log":{"version":"1.2","creator":{"name":"chrome-har","version":"0.11.12","comment":"https://github.com/sitespeedio/chrome-har"},"pages":[],"entries":[]}}
I guess this is because page.goto has already been invoked before puppeteer-har is initialized. So I need to build something like this:
const puppeteer = require('puppeteer');
const PuppeteerHar = require('puppeteer-har');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const har = new PuppeteerHar(page);
await har.start({ path: 'results.har' });
await page.goto('http://example.com');
await har.stop();
await browser.close();
})();
with puppeteerCrawler.
If I am totally lost and all of this can be done much easier, just tell me.
Thanks for your time and your answers!
statutory-emerald•3y ago
You need to start the collection in preNavigationHooks and stop it in requestHandler.
You need to connect these two, so I recommend just having a map object between request.uniqueKey and the initialized har object.
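A minimal sketch of that suggestion, assuming Crawlee's PuppeteerCrawler and the same puppeteer-har calls used in the question. The names createHarHooks and runCrawler, and the one-file-per-request naming scheme, are made up for illustration and are not part of either library. The recorder is started in a pre-navigation hook (before page.goto runs) and looked up again in requestHandler through a Map keyed by request.uniqueKey:

```javascript
// Sketch only: createHarHooks and runCrawler are hypothetical helper names.

// Build a pre-navigation hook and a request handler that share one Map,
// keyed by request.uniqueKey, holding the HAR recorder started for each request.
function createHarHooks(createRecorder) {
    const recorders = new Map();

    // Runs before page.goto(), so the recorder sees the navigation traffic.
    const preNavigationHook = async ({ page, request }) => {
        const har = createRecorder(page);
        // Assumed naming scheme: one HAR file per request, derived from uniqueKey.
        const name = request.uniqueKey.replace(/[^a-z0-9]+/gi, '_').slice(0, 100);
        await har.start({ path: `${name}.har` });
        recorders.set(request.uniqueKey, har);
    };

    // Runs after navigation: stop the matching recorder, then keep crawling.
    const requestHandler = async ({ request, enqueueLinks }) => {
        const har = recorders.get(request.uniqueKey);
        if (har) {
            recorders.delete(request.uniqueKey);
            await har.stop(); // writes the file given to har.start()
        }
        await enqueueLinks(); // follow links found on the page
    };

    return { preNavigationHook, requestHandler, recorders };
}

// Wiring into Crawlee. The requires live inside the function so the helper
// above can be exercised without either package installed.
async function runCrawler(startUrls) {
    const { PuppeteerCrawler } = require('crawlee');
    const PuppeteerHar = require('puppeteer-har');

    const { preNavigationHook, requestHandler } = createHarHooks(
        (page) => new PuppeteerHar(page),
    );

    const crawler = new PuppeteerCrawler({
        preNavigationHooks: [preNavigationHook],
        requestHandler,
    });
    await crawler.run(startUrls);
}
```

Calling runCrawler(['http://example.com']) would then start the crawl. If a navigation fails, requestHandler is not reached and the recorder for that request is never stopped, so a real setup would probably also want to clean up the Map entry in a failure handler. The autoconsent step is not covered here.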