POST request with JSON data to get cookies, then use those cookies to scrape further URLs

Hello all, I have a special situation: the website's response depends on the location of the IP address, but there is a possibility to change it. The way it works is by calling an endpoint which returns the cookies. I want to scrape the URLs once I have the cookies. How can I do that with Crawlee, and how will those cookies be managed with sessions? It's a bit complicated to explain, but I hope you get the idea of what I want. Thank you for reading this long post.
5 Replies
conscious-sapphire
conscious-sapphire3y ago
Hey @curioussoul, the simplest solution would be to make one request to this endpoint, parse the set-cookie header, and set the cookie header while enqueueing the other requests. An alternative would be to call the endpoint in the createSessionFunction of the session pool and set the cookies there.
evident-indigo
evident-indigoOP3y ago
Thank you @vojtechmaslan, I will try it out. But I also need to send JSON data to get the cookies. Would you be kind enough to give me some sample code if you have any by chance? Thanks a lot for your help.
conscious-sapphire
conscious-sapphire3y ago
Something like this could work. A JSON POST request is the same as a GET request; you just have to specify the payload and method:
const request = {
    url: 'https://example.com',
    method: 'POST',
    // tell the server the body is JSON
    headers: { 'content-type': 'application/json' },
    payload: JSON.stringify({ foo: 'bar' }),
    // route this request to the 'cookies' handler below
    label: 'cookies',
};
Then, in the handler for this request, you can access the response's set-cookie headers:
router.addHandler('cookies', async ({ crawler, response }) => {
    const { headers } = response;

    // parse necessary cookies from headers['set-cookie']
    // ...

    // enqueue new requests with parsed cookies
    const request = {
        url: 'https://example.com',
        headers: {
            cookie: parsedCookie,
        },
    };
    await crawler.requestQueue.addRequest(request);
});
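For the "parse necessary cookies" step above, here is a minimal sketch of one way to do it. `cookieHeaderFromSetCookie` is a hypothetical helper (not part of Crawlee) that collapses an array of Set-Cookie header values into a single Cookie request-header string, keeping only the name=value pairs and dropping attributes like Path, Expires, and HttpOnly:

```javascript
// Hypothetical helper: turn Set-Cookie response header values into
// a single Cookie request header ("name1=value1; name2=value2").
function cookieHeaderFromSetCookie(setCookieValues) {
    return setCookieValues
        .map((value) => value.split(';')[0].trim()) // keep only "name=value"
        .join('; ');
}

// Example with two made-up Set-Cookie values:
const parsedCookie = cookieHeaderFromSetCookie([
    'region=us-east; Path=/; HttpOnly',
    'sessionid=abc123; Expires=Wed, 21 Oct 2026 07:28:00 GMT',
]);
// parsedCookie is 'region=us-east; sessionid=abc123'
```

Note that depending on the HTTP client, `headers['set-cookie']` may be a single string rather than an array, so you may need to wrap it in an array first.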
evident-indigo
evident-indigoOP3y ago
Thank you very much @vojtechmaslan, but doing that would double the requests. The cookies won't change for each request, so a one-time request for the cookies is fine. I am currently trying to do it via createSessionFunction, but the docs are not helping that much. Do you have some guidance for that? You may look at my code in the chat. Thank you very much. This is how I am trying to do it:

createSessionFunction: async (sessionPool, options) => {
    const new_session = new Session({ ...options, sessionPool });
    new_session.setCookiesFromResponse({}); // <- How to get the response here to set the cookies?
    return new_session;
},
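For reference, a minimal sketch of how the createSessionFunction approach could look. This is an assumption-heavy illustration, not a confirmed answer from the thread: the endpoint URL and JSON body are placeholders, it assumes Node 18+ (global `fetch` and `Headers.getSetCookie()`), and the exact shape `Session.setCookiesFromResponse` accepts may differ across Crawlee versions:

```javascript
import { Session } from 'crawlee';

const sessionPoolOptions = {
    createSessionFunction: async (sessionPool, options) => {
        const session = new Session({ ...options, sessionPool });

        // Fetch the cookies yourself inside the session factory.
        // 'https://example.com/set-region' and the body are placeholders.
        const response = await fetch('https://example.com/set-region', {
            method: 'POST',
            headers: { 'content-type': 'application/json' },
            body: JSON.stringify({ foo: 'bar' }),
        });

        // setCookiesFromResponse expects a response-like object with
        // `url` and `headers`; fetch's Headers object is adapted here.
        session.setCookiesFromResponse({
            url: response.url,
            headers: { 'set-cookie': response.headers.getSetCookie() },
        });

        return session;
    },
};
```

These options would then be passed to the crawler, e.g. `new CheerioCrawler({ useSessionPool: true, persistCookiesPerSession: true, sessionPoolOptions, ... })`, so every new session starts with the location cookies already set.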
