Pause concurrent requests?

Hello, I have the following issue: I'm scraping a website where I need to log in again every 100-150 items. If I run more than one concurrent request, then by the time a login is needed there are already requests in progress, and those go wrong. I extract a marker from the page to know when I need to log in again. I want to run with more than one concurrent request, stop everything when that marker is found, do the login, and then resume. Is that possible?
17 Replies
NeoNomade (OP), 2y ago
@Andrey Bykov
deep-jade, 2y ago
There's no straightforward way to do it, but you could keep a variable that you check at the beginning of requestHandler. If the variable says a login is in progress, the handler just waits and reloads the page once the login has succeeded. Another option is to set concurrency to 1, throw an error for every other page, proceed with only one page, log in, and then increase concurrency again. There may be other ways, but these two options are the first that came to mind.
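The first option (a shared variable checked at the start of requestHandler) could be sketched roughly as below. This is only an illustration: `LoginGate`, `waitUntilReady`, and `login` are hypothetical names, not part of Crawlee, and the actual login routine is assumed to be supplied by the caller.

```javascript
// Hypothetical helper: a shared "login in progress" flag backed by a promise.
class LoginGate {
  constructor() {
    this.pending = null; // promise of the login currently running, or null
  }

  // Call at the start of requestHandler: resolves once no login is running.
  async waitUntilReady() {
    while (this.pending) {
      // Swallow login failures here; the caller of login() sees the error.
      await this.pending.catch(() => {});
    }
  }

  // Call when the "need to log in" marker is found. Concurrent callers
  // share a single login instead of each starting their own.
  login(loginFn) {
    if (!this.pending) {
      this.pending = Promise.resolve()
        .then(loginFn)
        .finally(() => {
          this.pending = null;
        });
    }
    return this.pending;
  }
}
```

In a handler, this would presumably be used as `await gate.waitUntilReady()` first, and `await gate.login(doLogin)` followed by a page reload when the marker is found. Requests that were already past the check when the login started are not paused by this; they would still need to detect the marker and retry.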
NeoNomade (OP), 2y ago
Can I set concurrency from the routes? I tried crawler.maxConcurrency but it doesn't work.
deep-jade, 2y ago
It should be something like crawler.autoscaledPool.maxConcurrency. I'm not 100% sure and need to double-check, but I think that option is correct. Yeah, it should be correct.
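A minimal sketch of the "drop to 1, log in, restore" idea, assuming (as suggested above, and not verified here) that `crawler.autoscaledPool` exposes a writable `maxConcurrency`. The helper name `withConcurrencyOfOne` is hypothetical.

```javascript
// `pool` is assumed to be crawler.autoscaledPool; `task` is the login routine.
async function withConcurrencyOfOne(pool, task) {
  const previous = pool.maxConcurrency; // remember the configured limit
  pool.maxConcurrency = 1;              // ask the pool to stop scaling past 1
  try {
    return await task();
  } finally {
    pool.maxConcurrency = previous;     // restore even if the login throws
  }
}
```

Called from a route as `await withConcurrencyOfOne(crawler.autoscaledPool, doLogin)`. Lowering the limit does not cancel requests that are already running, and the pool may take a moment to scale down, so in-flight pages would still need to be retried after the login.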
NeoNomade (OP), 2y ago
ok trying right now
Pepa J, 2y ago
Hello @NeoNomade, have you been successful with your approach? Could you share it with us if you were? I think I tried to solve something similar in the past, and the issue may be cookies shared between tabs when your implementation uses a concurrency higher than one.
    launchContext: {
        useIncognitoPages: true,
    },
And also set new/empty cookies in preNavigationHooks for each login.
NeoNomade (OP), 2y ago
It didn’t work as expected, I will also try the incognito pages
Pepa J, 2y ago
You may also use it in combination with:
    browserPoolOptions: {
        maxOpenPagesPerBrowser: 1,
    },
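For orientation, the two fragments suggested so far would sit side by side in the crawler options, roughly as follows. This is a sketch only: the variable name `crawlerOptions` is hypothetical, and it is assumed the object is passed to a browser-based Crawlee crawler (for example PlaywrightCrawler or PuppeteerCrawler) together with the requestHandler.

```javascript
// Options object combining both suggestions from the thread.
const crawlerOptions = {
  launchContext: {
    // Each page gets its own incognito context, so cookies are not shared.
    useIncognitoPages: true,
  },
  browserPoolOptions: {
    // One page per browser instance, to isolate sessions further.
    maxOpenPagesPerBrowser: 1,
  },
};
```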
NeoNomade (OP), 2y ago
If I go incognito, I trigger a check that they have and I need to wait for 8 seconds
Pepa J, 2y ago
@NeoNomade Are you talking about some protection like Cloudflare? Did this help solve the original issue? As long as we don't know which website it is, we probably cannot help more. 🙂
NeoNomade (OP), 2y ago
elcorteingles.es/supermercado/ @Pepa J here is the website. I have to stay logged in to keep my location, otherwise they send you to a default location. If you open the website or a product page in private browsing, you'll see a loading circle that makes you wait for 5-8 seconds.
Pepa J, 2y ago
@NeoNomade I believe this is related to the cookies. What you need to do is get the cookies after the waiting and before the login, remember them, and then set them in preNavigationHooks. There might be some additional magic, for example the server may generate another sessionId for you and refuse to let you log out with the old one, but that is something you should try. Also, a concurrency of 10 with an 8 s login should always be faster than 10 requests in sequence where each of them has its own 8 s login.
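The cookie hand-over described above might look something like this. It is a sketch under assumptions: it uses the Playwright-style `page.context().cookies()` and `page.context().addCookies()` calls (a Puppeteer setup would need its own equivalents), and `createCookieCarrier`, `remember`, and `hook` are hypothetical names.

```javascript
// Hypothetical helper that remembers cookies once and re-applies them
// to every page before navigation.
function createCookieCarrier() {
  let savedCookies = [];

  return {
    // Call after the 5-8 second wait has finished, before logging in.
    async remember(page) {
      savedCookies = await page.context().cookies();
    },

    // Intended for preNavigationHooks: runs before each page.goto().
    hook: async ({ page }) => {
      if (savedCookies.length > 0) {
        await page.context().addCookies(savedCookies);
      }
    },
  };
}
```

The hook would then be registered as `preNavigationHooks: [carrier.hook]`. Whether the site accepts reused cookies across incognito contexts is exactly the part that has to be tried out.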
NeoNomade (OP), 2y ago
it doesn't work @Pepa J
Pepa J, 2y ago
@NeoNomade I am sorry, at this point I don't have enough information about what is happening in your browser, nor code to test.
NeoNomade (OP), 2y ago
I will drop my main and routes here @Pepa J:
main: https://pastebin.com/pjVQSn09
routes: https://pastebin.com/891s1KL7
Pepa J, 2y ago
I cannot see any of our suggestions being implemented 🙂
NeoNomade (OP), 2y ago
They have been tested and didn't work. This is the implementation that works; it's slow, but it works.
