BrowserPoolOptions
Hello, I have browserPoolOptions in my PlaywrightCrawler like this:
browserPoolOptions: {
    maxOpenPagesPerBrowser: 0,
    useFingerprints: true,
    preLaunchHooks: [async (pageId, launchContext) => {
        launchContext.launchOptions = {
            ...launchContext.launchOptions,
            // eslint-disable-next-line max-len
            userAgent: 'Mozilla/5.0 (Linux; U; Android 3.2; nl-nl; GT-P6800 Build/HTJ85B) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13',
            bypassCSP: true,
            ignoreHTTPSErrors: true,
            hasTouch: true,
            isMobile: true,
            deviceScaleFactor: 1,
        };
    }],
},
and when I run this scraper, it starts scraping page by page, but the problem is that it doesn't close the scraped pages, so every window stays open. After a while the actor reaches a critical level of memory consumption and crashes.
Do you have any ideas how to fix it?
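(A minimal sketch of one possible mitigation, assuming a Crawlee 3.x browser-pool that supports retireBrowserAfterPageCount and closeInactiveBrowserAfterSecs; the numbers are illustrative, not from this thread:)

import { PlaywrightCrawler } from 'crawlee';

// Crawlee normally closes each page once the requestHandler resolves;
// recycling browsers aggressively limits the damage if pages still leak.
const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        maxOpenPagesPerBrowser: 1,         // one page per browser; the 0 above looks unintended
        retireBrowserAfterPageCount: 10,   // retire a browser after it has served 10 pages
        closeInactiveBrowserAfterSecs: 60, // close browsers that sit idle for a minute
    },
    requestHandler: async ({ page, log }) => {
        log.info(`Scraping ${page.url()}`);
    },
});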
foreign-sapphire•2y ago
Hello! What is your browser-pool version?
conscious-sapphireOP•2y ago
Hello Petr, I use these versions:
"apify": "^3.1.4",
"crawlee": "^3.3.1",
Hello @Lukas Sirhal,
Could you provide us with the full configuration, and maybe even the website that you are scraping? Can you replicate the same behaviour on other websites? Does it happen with a configuration using mostly default values? Can you test whether it is related to any specific option? What is the error in the log, only a timeout on the page?
conscious-sapphireOP•2y ago
const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    proxyConfiguration,
    persistCookiesPerSession: true,
    requestHandlerTimeoutSecs: 180,
    navigationTimeoutSecs: 180,
    requestQueue,
    launchContext: {
        // launcher: firefox,
        launchOptions: {
            // useChrome: true,
            headless: false,
        },
    },
    sessionPoolOptions: {
        maxPoolSize: 80,
    },
    autoscaledPoolOptions: {
        desiredConcurrency: 1,
        maxConcurrency: 1,
    },
    browserPoolOptions: {
        maxOpenPagesPerBrowser: 0,
        useFingerprints: true,
        preLaunchHooks: [async (pageId, launchContext) => {
            launchContext.launchOptions = {
                ...launchContext.launchOptions,
                // eslint-disable-next-line max-len
                userAgent: 'Mozilla/5.0 (Linux; U; Android 3.2; nl-nl; GT-P6800 Build/HTJ85B) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13',
                bypassCSP: true,
                ignoreHTTPSErrors: true,
                hasTouch: true,
                isMobile: true,
                deviceScaleFactor: 1,
            };
        }],
    },
    preNavigationHooks: [async ({ page }, gotoOptions) => {
        gotoOptions.waitUntil = 'domcontentloaded';
        await playwrightUtils.blockRequests(page);
    }],
    requestHandler: router,
});
Hello, here is the whole configuration.
And I try this listing URL, where I pick the detail URLs.
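(A memory-related note on the preNavigationHooks in the configuration above: blockRequests also accepts an options object, so it can block more than the default static assets; a minimal sketch, where the extra patterns are purely illustrative:)

preNavigationHooks: [async ({ page }, gotoOptions) => {
    gotoOptions.waitUntil = 'domcontentloaded';
    // blockRequests blocks common static assets by default;
    // extraUrlPatterns (illustrative values here) extends that list.
    await playwrightUtils.blockRequests(page, {
        extraUrlPatterns: ['*.mp4', '*.woff2', 'adsbygoogle.js'],
    });
}],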
conscious-sapphireOP•2y ago
But there aren't any errors in the console... it just starts crawling, opens every page, and scrapes the data, which is fine. But the crawler doesn't close the windows, so after a few pages the memory is critically overloaded.
@Lukas Sirhal Works fine for me; can you share which page you are trying to scrape?
conscious-sapphireOP•2y ago
Do you by any chance wait in your code for networkidle? I see a lot of new requests going out even when the page is fully loaded.
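(If a handler does wait for networkidle, a targeted wait is usually more robust on pages that keep polling in the background; a minimal sketch, where '.product-detail' is a hypothetical selector:)

// Instead of waiting for the network to go quiet (it may never settle
// on pages that poll in the background):
// await page.waitForLoadState('networkidle');

// wait only for the element the handler actually needs
// ('.product-detail' is a hypothetical selector):
await page.waitForSelector('.product-detail', { timeout: 30_000 });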
conscious-sapphireOP•2y ago
Do you think it will help?
conscious-sapphireOP•2y ago
Because I have the same issue: https://console.apify.com/view/runs/tgrq2cNQDjG87u8cJ