Getting puppeteer-har and autoconsent to work with puppeteer crawler

Hi guys, I am totally new to Crawlee, so this might or might not be an easy question.

I want to collect all the cookies and third-party trackers or resources from our website and monitor any changes. The changes are made by a website agency, and I want to be sure we stay compliant with privacy regulations. So I thought it would be a good idea to use DuckDuckGo's autoconsent to first consent to all cookies. Then I want to list all connections that are made, e.g. to Google Fonts or CDNs. For this I thought of using puppeteer-har.

I originally did this with vanilla Puppeteer and it worked, but I need a crawler to get all the links on our website, so I stumbled upon Crawlee. I tried to put my original script inside the requestHandler, but the resulting HAR file is empty:

    {"log":{"version":"1.2","creator":{"name":"chrome-har","version":"0.11.12","comment":"https://github.com/sitespeedio/chrome-har"},"pages":[],"entries":[]}}

I guess this is because page.goto has already been invoked before puppeteer-har is initialized. So I need to build something like this, but with PuppeteerCrawler:

    const puppeteer = require('puppeteer');
    const PuppeteerHar = require('puppeteer-har');

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();

        // Start recording before navigating so the HAR includes the page load.
        const har = new PuppeteerHar(page);
        await har.start({ path: 'results.har' });

        await page.goto('http://example.com');

        await har.stop();
        await browser.close();
    })();

If I am totally lost and all of this can be done much more easily, just tell me. Thanks for your time and your answers!
1 Reply
statutory-emerald (3y ago)
You need to start the collection in preNavigationHooks and stop it in requestHandler. The two need to be connected, so I recommend keeping a map from request.uniqueKey to the initialized har object.
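
A minimal sketch of that idea, in case it helps. It assumes Crawlee v3's PuppeteerCrawler and the same puppeteer-har start/stop calls as in the question. The names createHarHooks, harPathFor, and crawl are made up for illustration and are not part of either library. I have not included autoconsent here.

```javascript
// Sketch: pair a HAR recorder with each request via a Map keyed by request.uniqueKey.
// createHarHooks, harPathFor and crawl are hypothetical helper names, not Crawlee APIs.

function harPathFor(uniqueKey) {
    // uniqueKey is usually the URL, so turn it into a safe file name.
    return `${uniqueKey.replace(/[^a-z0-9]+/gi, '_')}.har`;
}

function createHarHooks(HarClass) {
    const recorders = new Map(); // request.uniqueKey -> running recorder

    // Runs before page.goto(), so the recorder sees the navigation itself.
    const preNavigationHook = async ({ page, request }) => {
        const har = new HarClass(page);
        await har.start({ path: harPathFor(request.uniqueKey) });
        recorders.set(request.uniqueKey, har);
    };

    // Looks up the recorder for this request, stops it, and forgets it.
    // Returns false if no recorder was registered for the request.
    const stopRecording = async (request) => {
        const har = recorders.get(request.uniqueKey);
        if (!har) return false;
        recorders.delete(request.uniqueKey);
        await har.stop();
        return true;
    };

    return { preNavigationHook, stopRecording, recorders };
}

// Wiring into Crawlee. The packages are required lazily so the helpers
// above can be loaded and tried out without them installed.
async function crawl(startUrls) {
    const { PuppeteerCrawler } = require('crawlee');
    const PuppeteerHar = require('puppeteer-har');
    const { preNavigationHook, stopRecording } = createHarHooks(PuppeteerHar);

    const crawler = new PuppeteerCrawler({
        preNavigationHooks: [preNavigationHook],
        async requestHandler({ request, enqueueLinks }) {
            try {
                await enqueueLinks(); // queue links found on the page
            } finally {
                await stopRecording(request); // writes the .har file for this page
            }
        },
    });

    await crawler.run(startUrls);
}
```

Calling crawl(['https://example.com']) would then write one .har file per visited page into the working directory. Note that if a navigation fails, requestHandler is never called, so that request's recorder stays in the map; whether that needs extra cleanup depends on how long the crawl runs.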
