Crawl using the same tab and session

Hey guys, I'm using Crawlee to crawl a site, but I need it to use the same browser tab to visit consecutive pages with the same session. Right now it opens a new tab for each request and generates a new session for it, then closes that tab and uses a new tab and a new session for the next request. That is horrible for anti-bot detection, because retiring the session also throws away the cookies it got from the first request. Any ideas what I'm doing wrong?
5 Replies
national-gold 3y ago
I already replied in a different thread, but will keep it here for visibility: actually, the browser keeps using the same session until that session gets retired. Maybe you have it set to retire the session after each request? By default it should start with one browser and keep opening new tabs while still using the same IP/session, and I think by default it retires the session after something like 100 requests.
stormy-gold (OP) 3y ago
I don't know why it's starting a new session every time, but here's the configuration that I'm using:

    browserPoolOptions: { maxOpenPagesPerBrowser: 100 },
    minConcurrency: 1,
    maxConcurrency: 1,
    maxRequestRetries: 1,
    maxRequestsPerMinute: 2,
    useSessionPool: true,
    sessionPoolOptions: {
        maxPoolSize: 10,
        sessionOptions: {
            maxUsageCount: 300,
            maxAgeSecs: 18000,
        },
        blockedStatusCodes: [404],
        persistStateKeyValueStoreId: 'my-key-value-store-for-sessions',
        persistStateKey: 'my-session-pool',
    },
    persistCookiesPerSession: true,
national-gold 3y ago
Why do you think it's creating a new session, though? It should open a new tab, that's to be expected, but the session should be the same. At least I don't see anything in your config that would force a new session on every request...
stormy-gold (OP) 3y ago
The debug logs in the console show: DEBUG PlaywrightCrawler:SessionPool: Created new Session - session_XXXXXXXXX on every request load. There is also another bug: when the session pool size is greater than 1, it creates two sessions immediately when you run the app, and then a new one after each request. The app also keeps two separate tabs open at all times: one is always "about:blank" and the other loads the actual request. The new-session-on-each-request problem is resolved when I set maxPoolSize to 1, but it still creates two tabs, one blank and one for the request.
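
For anyone landing here, a minimal sketch of that workaround, assuming the option names from the configuration posted earlier in the thread. It is a plain options object, not a tested crawler: the name crawlerOptions is made up for illustration, and passing it to the PlaywrightCrawler constructor together with a request handler is assumed rather than shown.

```typescript
// Hypothetical options object mirroring the config posted in this thread.
// The only change is maxPoolSize: 1, so the pool can hold a single session
// and every request has to reuse it until it is retired.
const crawlerOptions = {
  minConcurrency: 1,
  maxConcurrency: 1, // one request at a time, so one active tab at a time
  useSessionPool: true,
  persistCookiesPerSession: true, // cookies stay attached to the one session
  sessionPoolOptions: {
    maxPoolSize: 1,
    sessionOptions: {
      maxUsageCount: 300, // retire the session after this many uses
      maxAgeSecs: 18000, // or after this many seconds
    },
  },
};

// Assumed usage (not executed here):
// new PlaywrightCrawler({ ...crawlerOptions, requestHandler })
```

This does not make the crawler reuse the same tab. A new tab per request is expected, as mentioned above; only the session and its cookies are shared.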
sunny-green 3y ago
I guess it creates new sessions only at the beginning of the run, while it fills the session pool. Once the pool is full, it keeps using existing sessions from the pool. Btw, you can check your sessions with the SDK_SESSION_POOL_STATE var in the Key-Value store. The blank tab is fine. It's how Puppeteer/Playwright works.
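
To illustrate the pool-filling behaviour described above, here is a toy model in TypeScript. It is a simplified sketch, not Crawlee's actual SessionPool code: ToySessionPool and countCreated are made-up names, and the model assumes sessions are never retired. It hands out a brand-new session while the pool has room and only starts reusing sessions once the pool is full.

```typescript
// Toy model only: NOT Crawlee's implementation. Sessions never retire here.
class ToySessionPool {
  private sessions: string[] = [];
  private created = 0;
  private maxPoolSize: number;

  constructor(maxPoolSize: number) {
    this.maxPoolSize = maxPoolSize;
  }

  getSession(): string {
    // While the pool has room, every call creates a new session.
    if (this.sessions.length < this.maxPoolSize) {
      this.created += 1;
      const id = `session_${this.created}`;
      this.sessions.push(id);
      return id;
    }
    // Once the pool is full, pick one of the existing sessions at random.
    const index = Math.floor(Math.random() * this.sessions.length);
    return this.sessions[index];
  }

  get createdCount(): number {
    return this.created;
  }
}

// Counts how many sessions get created while serving `requests` requests.
function countCreated(maxPoolSize: number, requests: number): number {
  const pool = new ToySessionPool(maxPoolSize);
  for (let i = 0; i < requests; i++) {
    pool.getSession();
  }
  return pool.createdCount;
}
```

Under this model, countCreated(10, 25) is 10 and countCreated(1, 25) is 1. With maxPoolSize: 10 and maxRequestsPerMinute: 2, the first ten requests (about five minutes of crawling) would each log a new session, which can easily look like "a new session on every request".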
