How to manually pass datasets, sessions, cookies, proxies between Requests?
It might be obvious, but I have not been able to figure this out, neither in the documentation nor in the forums.
I want to manage my datasets and sessions manually: make a Request use a session I have created, and pass the dataset on to the request's handler.
I know I could pass it via userData, or create it in a different file and simply import it, but these seem like the wrong approaches.
6 Replies
stormy-gold•3y ago
For datasets - you could open several named datasets and then save to one or another depending on some condition. For the session pool - you could provide a createSessionFunction in sessionPoolOptions - https://crawlee.dev/api/core/interface/SessionPoolOptions#createSessionFunction
You could also use BasicCrawler https://crawlee.dev/api/basic-crawler and explicitly make the request, mark the session good/bad, etc.
but I guess the main question is - what exactly are you trying to achieve?
multiple-amethyst•3y ago
dataset management should be separate logic imho, since it's not related to the way you make requests; cookies travel in headers on every raw request, so if you want to keep making requests as a logged-in user, find the auth cookies and reuse them
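A tiny sketch of the cookie-reuse idea above: serialize the auth cookies once and attach them as a `Cookie` header on every raw request. The cookie names here are hypothetical examples.

```typescript
// Build a Cookie header value from a name → value map.
export function cookieHeader(cookies: Record<string, string>): string {
    return Object.entries(cookies)
        .map(([name, value]) => `${name}=${value}`)
        .join('; ');
}

// Usage sketch: attach to every outgoing request, e.g. via request headers.
// The cookie names below are made up for illustration.
const headers = {
    Cookie: cookieHeader({ sessionid: 'abc123', csrftoken: 'xyz' }),
};
```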
fascinating-indigoOP•3y ago
Honestly, I just want to make sure I am using the authentication cookies with the same proxy in the same session, since I don't know exactly how Crawlee handles the session when a Request is made.
You are absolutely right, the dataset logic makes a lot more sense kept separate. I am worried about hitting the server from different proxies that carry the same auth cookies.
fascinating-indigoOP•3y ago
"Having our cookies and other identifiers used only with a specific IP will reduce the chance of being blocked." https://crawlee.dev/docs/guides/session-management i was trying to do this,
fascinating-indigoOP•3y ago
The documentation is very good at explaining how to use every class separately, but it does not provide examples of how to use them together in the crawler.
stormy-gold•3y ago
in crawlers, `session` is part of the `crawlingContext`. Also you could provide `sessionPoolOptions`, where you can specify the session options, createSessionFunction, etc. You could even specify one session to make sure you're going with only one IP, or specify that a session should be retired after, let's say, 500 requests, etc. Do you need to use only one account, and thus want to use only one session all the time?