Request works in Postman but doesn't work with CheerioCrawler; request object headers are empty
Dear all, I am trying to scrape data from a public IP. For some reason CheerioCrawler is not getting the data back, but in Postman I can get the data easily. The proxy IP is whitelisted; I am using the same IP for Postman and for Cheerio.
Postman adds some default headers, but when I look at my request object, the headers are empty. Does anyone know at which point Cheerio sets the headers and generates fingerprints, and how I can see them?
Request {
id: 'OBTRQI5zvA4aIJ9',
url: 'https://someapi.com',
loadedUrl: 'https://someapi.com',
uniqueKey: '22586062-3f0d-40be-b499-f1a00261b5d3',
method: 'GET',
payload: undefined,
noRetry: false,
retryCount: 0,
errorMessages: [],
headers: {},
userData: [Getter/Setter],
handledAt: undefined
}
Any help would be highly appreciated. Thanks!
correct-apricot•2y ago
Request is our structure that stores which URL to call, which HTTP method to use, and which headers/payload to send with it.
You probably want response.headers, where response comes from the context passed to the requestHandler function.
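A hedged illustration of why the headers object in the dump is empty: CheerioCrawler sends requests through got-scraping, which generates browser-like headers when the request is sent, so request.headers normally holds only the headers you set yourself. The sketch below does not use Crawlee; it is a hypothetical model of that merge. The names HeaderMap, generatedHeaders and buildOutgoingHeaders, the header values, and the rule that explicit headers win on conflict are all assumptions made for illustration.

```typescript
// Hypothetical model, not Crawlee's actual code: headers stored on a request are
// only the explicit ones; the rest are generated when the request is sent.
type HeaderMap = Record<string, string>;

// Placeholder for browser-like headers a header generator might produce.
const generatedHeaders: HeaderMap = {
  "User-Agent": "Mozilla/5.0 (placeholder)",
  Accept: "text/html,application/xhtml+xml",
  "Accept-Language": "en-US,en;q=0.9",
};

// Merge generated defaults with explicit request headers. Names are compared
// case-insensitively and explicit headers win on conflict (an assumption).
function buildOutgoingHeaders(generated: HeaderMap, explicit: HeaderMap): HeaderMap {
  const merged: HeaderMap = {};
  for (const [name, value] of Object.entries(generated)) {
    merged[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(explicit)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}

// A request with no explicit headers, like the dump in the question.
const requestHeaders: HeaderMap = {};
const outgoing = buildOutgoingHeaders(generatedHeaders, requestHeaders);
console.log(outgoing);
```

One practical way to see the headers a crawler really sends is to point it at an endpoint that echoes them back, such as https://httpbin.org/headers, and print the response body in the requestHandler.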
fascinating-indigoOP•2y ago
Thanks @vladdy for your response. Actually, I am more interested in what is being sent in the request headers. I debugged it further and found that when I try to scrape the API, it doesn't work on the first try, but when I refresh the browser opened by Crawlee, it does work. To check what is going on, I used Playwright in headful mode and could see an error on the first load, but when I refreshed the same page I got the response back. The API I am trying to scrape is very sensitive to some headers, as you can see in the picture. I think some headers are not set properly in the initial request, and on refresh the browser adds its default headers and then it works.

correct-apricot•2y ago
Oh, those headers.
You can add them yourself!
When you enqueue the link, you can enqueue it as an object with url and headers, and pass in any headers you need on the initial request.
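A minimal sketch of that suggestion, assuming the required headers are known in advance. In Crawlee, the requests passed to crawler.run() or crawler.addRequests() can be plain objects with url and headers fields. The helper toRequestOptions, the interface name, and the header values are hypothetical placeholders, since the headers from the picture are not reproduced in the thread.

```typescript
// Shape of a request-options object carrying extra headers (hypothetical name).
interface RequestOptionsSketch {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: attach the headers the API expects on the very first request.
function toRequestOptions(
  urls: string[],
  headers: Record<string, string>,
): RequestOptionsSketch[] {
  // Copy the headers per request so editing one request's headers does not affect others.
  return urls.map((url) => ({ url, headers: { ...headers } }));
}

// Placeholder headers; replace them with the ones the target API actually requires.
const initialRequests = toRequestOptions(["https://someapi.com"], {
  Accept: "application/json",
  Referer: "https://someapi.com/",
});

// With Crawlee installed, these objects could be handed to the crawler, e.g.
// await crawler.run(initialRequests);
console.log(JSON.stringify(initialRequests, null, 2));
```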
fascinating-indigoOP•2y ago
It still doesn't work. With the same proxy it works in a regular browser, but when I use it with Crawlee it doesn't.
@HonzaS do you have any idea?
correct-apricot•2y ago
Did you test it with different proxy groups?
fascinating-indigoOP•2y ago
It wasn't related to the proxy but rather to cookies. It's solved now.
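The thread does not say which cookie was involved, but the symptom (failing on the first load, succeeding after a refresh) fits a server that sets a cookie on its first response and requires it on later requests. The self-contained sketch below models that with a hypothetical fakeServer function and a tiny CookieJar class; it is not Crawlee code, and the cookie name and value are placeholders. In Crawlee itself, the usual way to keep cookies between requests is the session pool (the useSessionPool and persistCookiesPerSession crawler options), which is worth checking against the docs for your version.

```typescript
// Hypothetical server: rejects requests without a session cookie,
// but hands one out in its first (failing) response.
interface FakeResponse {
  status: number;
  setCookie?: string; // e.g. "session=abc123; Path=/"
}

function fakeServer(cookieHeader: string | undefined): FakeResponse {
  if (cookieHeader?.includes("session=abc123")) {
    return { status: 200 };
  }
  // First visit: deny, but set the cookie the next request must carry.
  return { status: 403, setCookie: "session=abc123; Path=/" };
}

// Minimal cookie jar: remembers name=value pairs from Set-Cookie and replays them.
class CookieJar {
  private cookies = new Map<string, string>();

  store(setCookie: string | undefined): void {
    if (!setCookie) return;
    // Keep only the name=value part, ignoring attributes such as Path or Expires.
    const pair = setCookie.split(";")[0];
    const eq = pair.indexOf("=");
    if (eq > 0) this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }

  header(): string | undefined {
    if (this.cookies.size === 0) return undefined;
    return Array.from(this.cookies, ([k, v]) => `${k}=${v}`).join("; ");
  }
}

// Send a request with whatever cookies the jar holds, then store new ones.
function fetchWithJar(jar: CookieJar): number {
  const res = fakeServer(jar.header());
  jar.store(res.setCookie);
  return res.status;
}

const jar = new CookieJar();
const first = fetchWithJar(jar); // like the first load in the crawler
const second = fetchWithJar(jar); // like the manual refresh in the browser
console.log(first, second); // prints: 403 200
```

The second call succeeds only because the jar kept the cookie from the first response, which is the same pattern the refresh in the headful browser showed.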