Sessions and proxies?

I am having a hard time understanding sessions and proxies. I have the following crawler setup:
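```javascript
import { PuppeteerCrawler } from 'crawlee';
import PuppeteerExtra from 'puppeteer-extra';

// requestList, proxyConfiguration and router are defined elsewhere in the project.
const crawler = new PuppeteerCrawler({
    requestList,
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    requestHandler: router,
    requestHandlerTimeoutSecs: 100,
    headless: false,
    minConcurrency: 20,
    maxConcurrency: 30,
    launchContext: {
        launcher: PuppeteerExtra,
        useIncognitoPages: true,
    },
});
```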
Basically, I want to run the same task concurrently with different proxies. Unless I set useIncognitoPages: true, all the concurrent pages share a single session, and therefore a single proxy. Is this how it should work? What is the point of having a session pool if only one session is used?
rare-sapphire•3y ago
The session logic is to stick with one IP (proxy) until errors occur, and to keep the cookies for as long as the session stays "alive". So if you want a random IP for each request, don't use sessions; if you still need cookies, handle them with your own logic.
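The session behaviour described above can be pictured with a toy model. This is not Crawlee's implementation: ToyProxyRotator, ToySession, maxErrors and the proxy URLs are all hypothetical, and only one session is modeled (a real session pool holds many). With a session, the same proxy and cookie jar are reused until the session has failed maxErrors times and is replaced by a fresh one; without a session, every request simply takes the next proxy.

```javascript
// Toy model only, NOT Crawlee's implementation. ToyProxyRotator, ToySession,
// maxErrors and the proxy URLs are hypothetical names used for illustration.

class ToyProxyRotator {
    constructor(proxyUrls) {
        this.proxyUrls = proxyUrls;
        this.next = 0;
    }

    // Round-robin over the list, so consecutive calls return different proxies.
    newUrl() {
        const url = this.proxyUrls[this.next % this.proxyUrls.length];
        this.next += 1;
        return url;
    }
}

class ToySession {
    constructor(rotator, maxErrors = 3) {
        this.rotator = rotator;
        this.maxErrors = maxErrors;
        this.reset();
    }

    // A fresh session: new proxy, empty cookie jar, no errors yet.
    reset() {
        this.proxyUrl = this.rotator.newUrl();
        this.cookies = {};
        this.errors = 0;
    }

    // Successful requests keep the proxy and accumulate cookies; after
    // maxErrors failures the session is retired and replaced by a fresh one.
    recordResult(ok, newCookies = {}) {
        if (ok) {
            Object.assign(this.cookies, newCookies);
            return;
        }
        this.errors += 1;
        if (this.errors >= this.maxErrors) this.reset();
    }
}

// Returns the proxy URL used for each simulated request.
function simulate({ useSession, results }) {
    const rotator = new ToyProxyRotator(['http://proxy-a', 'http://proxy-b', 'http://proxy-c']);
    const session = useSession ? new ToySession(rotator) : null;
    const used = [];
    for (const ok of results) {
        if (session) {
            used.push(session.proxyUrl); // sticky: same proxy while the session lives
            session.recordResult(ok);
        } else {
            used.push(rotator.newUrl()); // no session: next proxy for every request
        }
    }
    return used;
}

console.log(simulate({ useSession: true, results: [true, true, false, false, false, true] }));
console.log(simulate({ useSession: false, results: [true, true, true] }));
```

In the first run, the session keeps http://proxy-a for five requests and only switches to http://proxy-b after the third failure retires it. The session-less run gets a different proxy for each of its three requests.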
like-gold (OP)•3y ago
So with concurrency, Crawlee uses the same session in parallel when I use the session pool? As for handling cookies manually, it's probably easier to set useIncognitoPages: true. That way each page gets its own proxy and everything is handled. How would I use random proxies without useSessionPool? With the config below, Puppeteer runs everything on the same proxy.
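```javascript
import { PuppeteerCrawler } from 'crawlee';

// requestList, proxyConfiguration and router are defined elsewhere in the project.
const crawler = new PuppeteerCrawler({
    // useSessionPool: true,
    requestHandler: router,
    maxConcurrency: 2,
    headless: false,
    proxyConfiguration,
    requestList,
});

await crawler.run();
```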
And pages also share cookies.
rare-sapphire•3y ago
If you need random access, the expected way is useSessionPool: false and persistCookiesPerSession: false. With other session settings combined with incognito pages I'm not sure exactly how it will end up: maybe the SDK will enforce cookies, maybe not. I've never actually tried it that way 😉 To check in more detail, you can add some log output based on context.session.
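A minimal sketch of the suggested setup, assuming Crawlee v3 in an ESM project (for top-level await). The proxy URLs are placeholders. As far as I know, useSessionPool defaults to true in Crawlee v3, so commenting it out (as in the config earlier in the thread) does not disable the session pool; here it is set to false explicitly. The request handler logs context.session and context.proxyInfo for every request, so you can see which session and proxy actually served each page. Uncommenting useIncognitoPages lets you compare the two modes.

```javascript
// Sketch only: assumes Crawlee v3 and an ESM project (top-level await).
// The proxy URLs below are placeholders, not real endpoints.
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1.example.com:8000', 'http://proxy-2.example.com:8000'],
});

const crawler = new PuppeteerCrawler({
    // Random access: no session pool, no per-session cookies.
    useSessionPool: false,
    persistCookiesPerSession: false,
    proxyConfiguration,
    maxConcurrency: 2,
    headless: false,
    launchContext: {
        // Uncomment to give every page its own browser context, then compare the logs.
        // useIncognitoPages: true,
    },
    async requestHandler({ request, session, proxyInfo, log }) {
        // session is expected to be undefined while the session pool is disabled.
        log.info(`${request.url} session=${session?.id ?? 'none'} proxy=${proxyInfo?.url ?? 'none'}`);
    },
});

await crawler.run(['https://example.com/', 'https://example.org/']);
```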