Can I get a 403 status?
Hi.
I guess this question might be a bit dumb, but I wanted to ask: how does Crawlee work with requests?
If I try to access a particular website using plain http requests or axios, I get a 403 error, but with Crawlee's CheerioCrawler I get the result I want.
I figured the retry mechanism and the session rotation have something to do with it, since it happened a few times in my use case.
I know it's a lot to ask, but I'm just wondering: is it going through some proxies, how are user agents handled, the TLS handshake, etc.?
I'm asking all of this because I'm wondering whether the website could somehow block me again 😄
Sorry for another newbie question 🙂
2 Replies
broad-brown•2y ago
crawlee handles a lot of the blocking prevention automatically: it respects the site's limits, changes fingerprints, and manages request pools, much better than anything else, especially bare axios or http requests, which do none of this
most of the time you don't need to worry; crawlee does 99% of the behind-the-scenes work. You can still occasionally get blocked if you run continuous tests back to back, or just by chance
mostly it comes down to making you look more like a realistic user
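To make that concrete, here's a rough sketch of one part of it, request headers. All header values below are invented for illustration; they are not what Crawlee actually sends, since it generates its own realistic, mutually consistent sets.

```javascript
// Illustration only: header values are made up for this example.

// What a bare axios request typically announces itself as:
const bareClientHeaders = {
  'user-agent': 'axios/1.6.0', // a default library UA is an obvious bot marker
};

// What a real browser sends (simplified):
const browserLikeHeaders = {
  'user-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.9',
  'accept-encoding': 'gzip, deflate, br',
};

// Anti-bot systems check that headers like these exist *and* agree with
// each other (and with the TLS fingerprint), so spoofing just the user
// agent is usually not enough.
console.log(bareClientHeaders['user-agent']);
console.log(Object.keys(browserLikeHeaders).length);
```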
i don't know all the details, but like i said, it's handled by crawlee. You can tweak some of the crawler's settings (see the links below), but most of the time it won't change much
that was the reason i switched to crawlee: i was getting blocked on practically every site i scraped outside of amazon
there are tutorials you can find on using proxies with regular axios requests when scraping with cheerio or another library, but crawlee is more often than not superior
check out this guide if you want to rotate proxies and avoid getting blocked:
https://crawlee.dev/docs/guides/proxy-management
Proxy Management | Crawlee
Using proxies to get around those annoying IP-blocks
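As a rough sketch of what proxy rotation means: the proxy URLs below are made-up placeholders, and in Crawlee you would normally pass such a list to `new ProxyConfiguration({ proxyUrls })` and let it handle the rotation for you.

```javascript
// Minimal round-robin rotation sketch; the proxy URLs are placeholders,
// not real servers.
const proxyUrls = [
  'http://proxy-1.example.com:8000',
  'http://proxy-2.example.com:8000',
  'http://proxy-3.example.com:8000',
];

let nextIndex = 0;
function nextProxyUrl() {
  const url = proxyUrls[nextIndex % proxyUrls.length];
  nextIndex += 1;
  return url;
}

// Each request goes out through a different IP, so a per-IP block or
// rate limit on the target site only affects part of your traffic.
console.log(nextProxyUrl()); // http://proxy-1.example.com:8000
console.log(nextProxyUrl()); // http://proxy-2.example.com:8000
```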
rival-black•2y ago
There are a lot of things: header generation, browser-like ciphers, HTTP/2, proxy rotation, etc. For a deeper understanding: https://docs.apify.com/academy/anti-scraping
Anti-scraping protections | Academy | Apify Documentation
Understand the various anti-scraping measures different sites use to prevent bots from accessing them, and how to appear more human to fix these issues.