Custom headers

I have a super-secure website that I'm trying to scrape, and now I want to try using the sitemaps with google.com as the referer. How can I set this header for all requests?
7 Replies
Pepa J (3y ago)
Hello @NeoNomade, which crawler do you use: Cheerio or Puppeteer?
NeoNomade (OP, 3y ago)
Puppeteer, @Pepa J
Pepa J (3y ago)
@NeoNomade Have you tried setting extra headers in postPageCreateHooks?
// ...
postPageCreateHooks: [async (page) => {
    // Browsers send a full URL as the Referer, not a bare host name
    await page.setExtraHTTPHeaders({
        referer: 'https://www.google.com/',
    });
}],
// ...
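The hook above runs once for each new page, and Puppeteer's page.setExtraHTTPHeaders() then attaches the headers to every request that page initiates. A minimal sketch, assuming you want a malformed value such as a bare 'google.com' to fail early: the hypothetical makeRefererHook() helper (not part of Crawlee or Puppeteer) checks that the referer is an absolute http(s) URL before building the hook. It is exercised here against a stub page object rather than a real Puppeteer page.

```javascript
// Hypothetical helper (not part of Crawlee or Puppeteer): builds a
// postPageCreateHooks entry that sets a Referer header on each new page.
function makeRefererHook(referer) {
    // new URL() throws a TypeError for a bare host such as 'google.com',
    // so a malformed value fails when the crawler is configured.
    const url = new URL(referer);
    if (url.protocol !== 'https:' && url.protocol !== 'http:') {
        throw new TypeError(`Referer must be an http(s) URL, got ${referer}`);
    }
    const headers = { referer: url.href };
    // Browser-pool page hooks receive the page as their first argument.
    return async (page) => {
        // Puppeteer sends these headers with every request the page initiates.
        await page.setExtraHTTPHeaders(headers);
    };
}

// Usage with a stub that records the call instead of a real Puppeteer page.
const calls = [];
const stubPage = { setExtraHTTPHeaders: async (h) => { calls.push(h); } };
const hook = makeRefererHook('https://www.google.com');
hook(stubPage);
console.log(calls); // [ { referer: 'https://www.google.com/' } ]
```

Note that new URL() normalizes 'https://www.google.com' to 'https://www.google.com/', which matches what a browser would send.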
NeoNomade (OP, 3y ago)
No. Does this go in the crawler constructor options in main.js?
Pepa J (3y ago)
yes
NeoNomade (OP, 3y ago)
ok, will try
ArgumentError: Did not expect property `postPageCreateHooks` to exist, got `async (page) => {
await page.setExtraHTTPHeaders({
referer: 'https://www.google.com'
})
}` in object `PuppeteerCrawlerOptions`
at ow (/run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/node_modules/ow/dist/index.js:36:24)
at new PuppeteerCrawler (/run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/node_modules/@crawlee/puppeteer/internals/puppeteer-crawler.js:77:26)
at file:///run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/src/main.js:22:17
at ModuleJob.run (node:internal/modules/esm/module_job:194:25) {
validationErrors: Map(1) {
'PuppeteerCrawlerOptions' => Set(1) {
'Did not expect property `postPageCreateHooks` to exist, got `async (page) => {\n' +
' await page.setExtraHTTPHeaders({\n' +
" referer: 'https://www.google.com'\n" +
' }) \n' +
' }` in object `PuppeteerCrawlerOptions`'
}
}
}

Node.js v18.15.0
Pepa J (3y ago)
Ah, sorry, the hook has to go inside browserPoolOptions. It should probably be:
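import { PuppeteerCrawler } from 'crawlee';

// proxyConfiguration and router are assumed to be defined elsewhere
// (e.g. in main.js and routes.js of a Crawlee project template).
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    // postPageCreateHooks is not a top-level crawler option; it belongs here.
    browserPoolOptions: {
        postPageCreateHooks: [async (page) => {
            // Puppeteer sends these headers with every request the page initiates.
            await page.setExtraHTTPHeaders({
                referer: 'https://www.google.com/',
            });
        }],
    },
});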
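If only the page navigations themselves need the referer (for example, each URL taken from the sitemap), another option is Crawlee's crawler-level preNavigationHooks. Crawlee calls each hook with the crawling context and a gotoOptions object that it later passes to Puppeteer's page.goto(), and page.goto() accepts a referer option. A sketch under those assumptions; setGoogleReferer is a hypothetical name, and the hook is plain JavaScript, so it is exercised here with a stand-in gotoOptions object instead of a real crawler.

```javascript
// Sketch of a Crawlee preNavigationHooks entry. Crawlee calls each hook as
// hook(crawlingContext, gotoOptions) before navigating; gotoOptions is the
// options object later handed to Puppeteer's page.goto().
const GOOGLE_REFERER = 'https://www.google.com/';

async function setGoogleReferer(_crawlingContext, gotoOptions) {
    // page.goto() sends this as the Referer of the navigation request only;
    // sub-resources (images, XHR) loaded by the page are not affected.
    gotoOptions.referer = GOOGLE_REFERER;
}

// In main.js this would be wired up roughly as (not executed here):
//   new PuppeteerCrawler({ requestHandler: router, preNavigationHooks: [setGoogleReferer] });

// Stub run with an empty context and a stand-in gotoOptions object.
const gotoOptions = { timeout: 60000 };
setGoogleReferer({}, gotoOptions);
console.log(gotoOptions); // { timeout: 60000, referer: 'https://www.google.com/' }
```

The trade-off: setExtraHTTPHeaders() in postPageCreateHooks tags every request the page makes, while the goto referer only affects the top-level navigation, which looks closer to a user arriving from a search result.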
