How to manually pass datasets, sessions, cookies, proxies between Requests?

It might be obvious, but I have not been able to figure this out, either in the documentation or in the forums. I want to manually manage my datasets and sessions, but I want a Request to use a session I have created and to pass the dataset on to the request's handler. I know I could pass it via userData, or I could create it in a different file and simply import it, but these seem like the wrong approaches.
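For reference, a minimal sketch of the userData workaround mentioned above (the dataset name and URL are illustrative): because userData must be JSON-serializable, you pass the dataset's name rather than the Dataset object itself, and reopen it inside the handler.

```ts
import { CheerioCrawler, Dataset } from 'crawlee';

// Pass the dataset's *name* through userData (userData must be
// JSON-serializable, so the Dataset object itself cannot travel with
// the request) and reopen the dataset inside the handler.
const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const dataset = await Dataset.open(request.userData.datasetName);
        await dataset.pushData({ url: request.url, title: $('title').text() });
    },
});

await crawler.run([
    // 'my-dataset' is an illustrative name.
    { url: 'https://example.com', userData: { datasetName: 'my-dataset' } },
]);
```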
6 Replies
stormy-gold · 3y ago
For datasets: you could open several named datasets and then save to one or another depending on some condition. For the session pool: you could provide createSessionFunction in sessionPoolOptions (https://crawlee.dev/api/core/interface/SessionPoolOptions#createSessionFunction). You could also use BasicCrawler (https://crawlee.dev/api/basic-crawler) and make the request explicitly, mark the session good/bad, etc. But I guess the main question is: what exactly are you trying to achieve?
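A sketch combining both suggestions, assuming a CheerioCrawler and illustrative names (the dataset names, the auth cookie, and the PRODUCT label are not from the original question):

```ts
import { CheerioCrawler, Dataset, Session } from 'crawlee';

// Two named datasets; items are routed to one or the other by a condition.
const products = await Dataset.open('products');
const other = await Dataset.open('other');

const crawler = new CheerioCrawler({
    sessionPoolOptions: {
        // Customize how new sessions are created, e.g. to seed your own cookies.
        createSessionFunction: (sessionPool) => {
            const session = new Session({ sessionPool });
            // Illustrative auth cookie; replace with cookies from your login flow.
            session.setCookies(
                [{ name: 'auth', value: 'my-auth-token' }],
                'https://example.com',
            );
            return session;
        },
    },
    async requestHandler({ request, $ }) {
        const item = { url: request.url, title: $('title').text() };
        // The routing condition: requests labelled PRODUCT go to 'products'.
        if (request.label === 'PRODUCT') await products.pushData(item);
        else await other.pushData(item);
    },
});

await crawler.run([{ url: 'https://example.com/item/1', label: 'PRODUCT' }]);
```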
multiple-amethyst · 3y ago
Dataset management should be separate logic imho, since it's not related to how you make requests. Cookies for raw requests always travel in the headers, so if you want to keep making requests as a logged-in user, find the auth cookies and reuse them.
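For example, a hedged sketch of reusing captured auth cookies (the cookie value is illustrative; on Crawlee's HTTP-based crawlers, preNavigationHooks receive the outgoing got-scraping options, so a header can be attached to every raw request):

```ts
import { CheerioCrawler } from 'crawlee';

// Illustrative cookie header captured from a manual login.
const AUTH_COOKIE = 'sessionid=abc123';

const crawler = new CheerioCrawler({
    preNavigationHooks: [
        async (_crawlingContext, gotOptions) => {
            // Attach the same auth cookie to every outgoing request.
            gotOptions.headers = { ...gotOptions.headers, cookie: AUTH_COOKIE };
        },
    ],
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com/account']);
```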
fascinating-indigo (OP) · 3y ago
Honestly, I just want to make sure I am using the authentication cookies with the same proxy in the same session, since I don't know exactly how Crawlee handles the session when a Request is made. You are absolutely right, the dataset logic makes a lot more sense kept separate. I am worried about hitting the server from different proxies that share the same auth cookies.
fascinating-indigo (OP) · 3y ago
"Having our cookies and other identifiers used only with a specific IP will reduce the chance of being blocked." https://crawlee.dev/docs/guides/session-management i was trying to do this,
fascinating-indigo (OP) · 3y ago
The documentation is very good at explaining how to use every class separately, but it does not provide examples of how to use them together in a crawler.
stormy-gold · 3y ago
In crawlers, the session is part of the crawlingContext. You can also provide sessionPoolOptions, where you can specify the session options, createSessionFunction, etc. You could even limit the pool to a single session to make sure you're going out through only one IP, or specify that a session should be retired after, say, 500 requests. Do you need to use only one account, and is that why you want to use only one session all the time?
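A minimal sketch of that single-session setup, assuming a CheerioCrawler (the block-detection check is illustrative):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    useSessionPool: true,
    // Keep cookies tied to the session, so the same cookies always travel
    // with the same session (and the same proxy IP, when one is configured).
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: 1, // a single session, hence a single IP at a time
        sessionOptions: {
            maxUsageCount: 500, // retire the session after 500 requests
        },
    },
    async requestHandler({ session, request, $ }) {
        // The session is part of the crawlingContext.
        console.log(`Fetched ${request.url} with session ${session?.id}`);
        // Illustrative block check; retiring rotates to a fresh session.
        if ($('title').text().includes('Access denied')) session?.retire();
    },
});

await crawler.run(['https://example.com']);
```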
