Accessing browser.newPage() inside PuppeteerCrawler
Hi, I'm trying to integrate puppeteer-extra-plugin-recaptcha into my crawler, and I've gotten everything working except for one bit: the documentation says I need to create a new page with browser.newPage().
However, I can't figure out where I can hook the page with that call to get the captcha integration working properly. My thoughts were that it would need to be done in the preNavigationHooks - maybe through crawlingContext?
Any ideas/pointers would be greatly appreciated!
5 Replies
narrow-tan•3y ago
preNavigationHooks in PuppeteerCrawlerOptions should work for you:
https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks
example:
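A minimal sketch of such a hook, assuming Crawlee v3 with Puppeteer installed. The hook name `prepareDemoPage`, the viewport size, the `waitUntil` value, and the start URL are illustrative assumptions, not taken from this thread. The point it shows: by the time a pre-navigation hook runs, the crawler has already created `page`, so the hook only configures it.

```javascript
// Sketch only: viewport size and waitUntil value are illustrative assumptions.
// The crawler has already opened `page` for this request before hooks run.
const prepareDemoPage = async ({ page, request, log }, gotoOptions) => {
    log.info(`Preparing page for ${request.url}`);
    await page.setViewport({ width: 1280, height: 800 });
    // gotoOptions is forwarded to page.goto() by the crawler.
    gotoOptions.waitUntil = 'networkidle2';
};

// Hypothetical wiring; needs the `crawlee` and `puppeteer` packages installed.
async function main() {
    const { PuppeteerCrawler } = require('crawlee');
    const crawler = new PuppeteerCrawler({
        preNavigationHooks: [prepareDemoPage],
        requestHandler: async ({ page, log }) => {
            log.info(`Title: ${await page.title()}`);
        },
    });
    await crawler.run(['https://crawlee.dev']);
}
```

Calling `main()` would start the crawl. The hook is kept as a standalone function so it can be exercised with a fake `page` object without launching a browser.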
harsh-harlequinOP•3y ago
that part makes sense, but is there any way to set the page to browser.newPage() or does it do that in the background automatically?
inland-turquoise•3y ago
In the background, Crawlee already runs browser.newPage() for you. Launching browsers or creating pages yourself can cause issues.
wise-white•3y ago
You can pass your own launcher (basically what you get from importing Puppeteer) to the crawler - https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerLaunchContext#launcher
PuppeteerLaunchContext | API | Crawlee
Apify extends the launch options of Puppeteer. You can use any of the Puppeteer compatible LaunchOptions options by providing the launchOptions property.
Example:
```js
// launch a headless Chrome (not Chromium)
const launchContext = {
    // Apify helpers
    useCh...
```
wise-white•3y ago
Basically, you import puppeteer, wrap it with puppeteer-extra, and then pass that to the crawler through its options.
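A rough sketch of that wiring, assuming Crawlee v3 plus the `puppeteer`, `puppeteer-extra`, and `puppeteer-extra-plugin-recaptcha` packages. The helper name `buildCrawlerOptions`, the `TWOCAPTCHA_TOKEN` environment variable, the 2captcha provider choice, and the demo URL are assumptions for illustration only. Since the crawler creates pages through the supplied launcher, the plugin's `page.solveRecaptchas()` should be available in the request handler without calling browser.newPage() yourself.

```javascript
// Pure helper (hypothetical name): builds crawler options around a given launcher.
function buildCrawlerOptions(launcher) {
    return {
        launchContext: {
            // The puppeteer-extra instance replaces the default Puppeteer launcher.
            launcher,
            launchOptions: { headless: true },
        },
        requestHandler: async ({ page, request, log }) => {
            // solveRecaptchas() is added to pages by puppeteer-extra-plugin-recaptcha.
            const { solved } = await page.solveRecaptchas();
            log.info(`${request.url}: solved ${solved.length} captcha(s)`);
        },
    };
}

// Hypothetical entry point; needs the packages above and a solver API token.
async function main() {
    const { PuppeteerCrawler } = require('crawlee');
    const puppeteerExtra = require('puppeteer-extra');
    const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

    // Register the plugin before the crawler launches any browser.
    puppeteerExtra.use(RecaptchaPlugin({
        provider: { id: '2captcha', token: process.env.TWOCAPTCHA_TOKEN },
        visualFeedback: true,
    }));

    const crawler = new PuppeteerCrawler(buildCrawlerOptions(puppeteerExtra));
    await crawler.run(['https://www.google.com/recaptcha/api2/demo']);
}
```

Calling `main()` would start the crawl. Keeping the options in a separate function is just a convenience here, so the launcher wiring can be checked without starting a browser.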