Accessing browser.newPage() inside PuppeteerCrawler

Hi, I'm trying to integrate puppeteer-extra-plugin-recaptcha into my crawler, and I've gotten everything working except for one bit: the plugin's documentation says I need to create a new page with
const page = await browser.newPage()
However, I can't figure out where to hook in that call so the captcha integration works properly. My guess was that it would need to be done in preNavigationHooks, maybe through crawlingContext? Any ideas or pointers would be greatly appreciated!
narrow-tan (3y ago)
preNavigationHooks in PuppeteerCrawlerOptions should work for you: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks Example:
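import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        async ({ page }, gotoOptions) => {
            // Runs before navigation: adjust the options later passed to page.goto()
            gotoOptions.waitUntil = 'domcontentloaded';
            // The page Crawlee already created for this request is available here
            await page.title();
        },
    ],
});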
harsh-harlequin (OP, 3y ago)
That part makes sense, but is there any way to set the page to the result of browser.newPage(), or does Crawlee do that automatically in the background?
inland-turquoise (3y ago)
In the background, Crawlee already calls browser.newPage() for you. Launching browsers or creating pages yourself bypasses Crawlee's browser pool and can cause issues.
wise-white (3y ago)
You can pass your own launcher (basically the object you get when you import Puppeteer) to the crawler: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerLaunchContext#launcher
wise-white (3y ago)
Basically, you import puppeteer, wrap it with puppeteer-extra, and then pass that into the crawler's options.
