userAgent in different crawlers

How to set the userAgent in different crawlers?
5 Replies
foreign-sapphire · 3y ago
In different crawlers? Or in different requests for the same crawler?
optimistic-gold (OP) · 3y ago
Like Cheerio and Puppeteer. The APIs are not always the same, and in my case I need to set one user agent for the Cheerio crawler and a different one for the Puppeteer crawler.
inland-turquoise · 3y ago
You should use preNavigationHooks for this (https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks). Example for Cheerio, where the hook's second argument holds the HTTP request options:
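preNavigationHooks: [
    // The second argument holds the HTTP request options (got-scraping
    // options in Crawlee v3); headers set here are sent with the request.
    (crawlingContext, gotOptions) => {
        gotOptions.headers = {
            ...gotOptions.headers, // keep headers that were already set
            'User-Agent': 'La Centrale/6.17.1 (iPhone; iOS 13.6; Scale/2.00)',
            'accept-language': 'en-US;q=1',
            Accept: 'application/json',
        };
    },
],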
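The same Cheerio hook can also be written as a standalone named function, which makes it easy to reuse or unit-test on its own. This is a minimal sketch: the function name setCheerioHeaders and the constant CHEERIO_USER_AGENT are hypothetical, and it assumes Crawlee calls the hook as (crawlingContext, gotOptions) as described above.

```javascript
// Hypothetical user agent for the Cheerio (plain HTTP) crawler.
const CHEERIO_USER_AGENT = 'La Centrale/6.17.1 (iPhone; iOS 13.6; Scale/2.00)';

// Hypothetical pre-navigation hook for CheerioCrawler. It receives the
// crawling context and the HTTP request options; mutating
// gotOptions.headers changes which headers are sent.
function setCheerioHeaders(crawlingContext, gotOptions) {
    gotOptions.headers = {
        // Keep any headers that were already set instead of overwriting them.
        ...gotOptions.headers,
        'User-Agent': CHEERIO_USER_AGENT,
        'accept-language': 'en-US;q=1',
        Accept: 'application/json',
    };
}
```

You would then pass it as preNavigationHooks: [setCheerioHeaders] in the CheerioCrawler options. Spreading the existing headers first means anything set earlier survives, while the user agent set here takes precedence over an earlier value with the same key.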
For Puppeteer, work with the page object instead. You can try page.setExtraHTTPHeaders(), also inside preNavigationHooks: https://pptr.dev/next/api/puppeteer.page.setextrahttpheaders Example:
preNavigationHooks: [
    async ({ page }) => {
        // Extra headers are sent with every request this page makes.
        await page.setExtraHTTPHeaders({
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
            'upgrade-insecure-requests': '1',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'en-US,en;q=0.9',
        });
    },
],
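Puppeteer also has page.setUserAgent(), which, as far as I know, changes both the User-Agent request header and navigator.userAgent inside the page, whereas an extra HTTP header only affects outgoing requests. A minimal sketch of a Puppeteer pre-navigation hook using it; the function name setPuppeteerUserAgent and the constant PUPPETEER_USER_AGENT are hypothetical, and it assumes the crawling context passed to the hook exposes page.

```javascript
// Hypothetical desktop Chrome user agent for the Puppeteer (browser) crawler.
const PUPPETEER_USER_AGENT =
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36';

// Hypothetical pre-navigation hook for PuppeteerCrawler. It pulls the
// Puppeteer page out of the crawling context and overrides the user agent
// before the navigation happens.
async function setPuppeteerUserAgent({ page }) {
    await page.setUserAgent(PUPPETEER_USER_AGENT);
}
```

Register it with preNavigationHooks: [setPuppeteerUserAgent] in the PuppeteerCrawler options. Because each crawler gets its own hook, the Cheerio and Puppeteer user agents stay independent.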
eastern-cyan · 3y ago
Or just add the header to the request object: { url, headers: { 'user-agent': '[UA-STRING]' } }
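With the request-object approach, a small helper can build the request list for each crawler so every request carries the matching user agent. A minimal sketch; the helper toRequests and the USER_AGENTS map are hypothetical, and the user agent strings are just the ones from earlier in this thread.

```javascript
// Hypothetical user agents, one per crawler type.
const USER_AGENTS = {
    cheerio: 'La Centrale/6.17.1 (iPhone; iOS 13.6; Scale/2.00)',
    puppeteer:
        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
};

// Hypothetical helper: turns a list of URLs into plain request objects of
// the form { url, headers } using the user agent for the given crawler type.
function toRequests(urls, crawlerType) {
    const userAgent = USER_AGENTS[crawlerType];
    if (!userAgent) {
        throw new Error(`Unknown crawler type: ${crawlerType}`);
    }
    return urls.map((url) => ({ url, headers: { 'user-agent': userAgent } }));
}
```

For example, cheerioCrawler.run(toRequests(urls, 'cheerio')) and puppeteerCrawler.run(toRequests(urls, 'puppeteer')), assuming each crawler is fed its own list. Throwing on an unknown type avoids silently sending requests without the intended user agent.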
optimistic-gold (OP) · 3y ago
@Alexey Udovydchenko this works like a charm, thanks!
