Can I get a 403 status?
Hi.
I guess this question might be a bit dumb, but I wanted to ask: how does Crawlee work with requests?
If I try to access a particular website using plain http requests or axios, I get a 403 error, but with Crawlee's CheerioCrawler I get the result I want.
I figured the retry mechanism and the session rotation have something to do with it, since it happened a few times in my use case.
I know it's a lot to ask, but I'm just wondering: is it going through some proxies, how are user agents handled, the TLS handshake, etc.?
I'm asking all of this because I'm wondering whether the website could somehow block me again 😄
Sorry for another newbie question 🙂
2 Replies
broad-brown•2y ago
crawlee handles a lot of the blocking prevention automatically: it respects the site's limits, changes fingerprints, and manages request pools, much better than anything else, especially bare axios or http requests, which do none of this
most of the time you don't need to worry; crawlee does 99% of the behind-the-scenes work. You can still occasionally get blocked if you run continuous tests back to back, or just by chance
mostly it comes down to making you look more like a realistic user
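To make that concrete, here's a rough sketch of one part of it, request headers. All header values below are invented for illustration; they are not what Crawlee actually sends, since it generates its own realistic, mutually consistent sets.

```javascript
// Illustration only: header values are made up for this example.

// What a bare axios request typically announces itself as:
const bareClientHeaders = {
  'user-agent': 'axios/1.6.0', // a default library UA is an obvious bot marker
};

// What a real browser sends (simplified):
const browserLikeHeaders = {
  'user-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.9',
  'accept-encoding': 'gzip, deflate, br',
};

// Anti-bot systems check that headers like these exist *and* agree with
// each other (and with the TLS fingerprint), so spoofing just the user
// agent is usually not enough.
console.log(bareClientHeaders['user-agent']);
console.log(Object.keys(browserLikeHeaders).length);
```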
i don't know all the details, but like i said, it's handled by crawlee. You can tweak some of the crawler's settings (see the links below), but most of the time it won't change much
that was the reason i switched to crawlee: i was getting blocked on practically every site i scraped outside of amazon
there are tutorials you can find on using proxies with regular axios requests when scraping with cheerio or another library, but crawlee is more often than not superior
check out this guide if you want to rotate proxies and avoid getting blocked:
https://crawlee.dev/docs/guides/proxy-management
Proxy Management | Crawlee
Using proxies to get around those annoying IP-blocks
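As a rough sketch of what proxy rotation means: the proxy URLs below are made-up placeholders, and in Crawlee you would normally pass such a list to `new ProxyConfiguration({ proxyUrls })` and let it handle the rotation for you.

```javascript
// Minimal round-robin rotation sketch; the proxy URLs are placeholders,
// not real servers.
const proxyUrls = [
  'http://proxy-1.example.com:8000',
  'http://proxy-2.example.com:8000',
  'http://proxy-3.example.com:8000',
];

let nextIndex = 0;
function nextProxyUrl() {
  const url = proxyUrls[nextIndex % proxyUrls.length];
  nextIndex += 1;
  return url;
}

// Each request goes out through a different IP, so a per-IP block or
// rate limit on the target site only affects part of your traffic.
console.log(nextProxyUrl()); // http://proxy-1.example.com:8000
console.log(nextProxyUrl()); // http://proxy-2.example.com:8000
```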
rival-black•2y ago
There are a lot of things: header generation, browser-like ciphers, HTTP/2, proxy rotation, etc. For a deeper understanding: https://docs.apify.com/academy/anti-scraping
Anti-scraping protections | Academy | Apify Documentation
Understand the various anti-scraping measures different sites use to prevent bots from accessing them, and how to appear more human to fix these issues.