Crawl Taobao, 1688
Hi everyone! My name is Giang and I am an entry-level developer. I am building a web ordering site for products from Taobao, 1688, Tmall, and similar sites, but I have a big problem crawling Taobao: I can only crawl 10 to 15 products before the anti-scraping protection blocks me. I thought about using proxies, but product pages can only be viewed after logging in. I have tried reusing cookies, but I think that if the same account logs in from many IPs, it will easily get blocked. If anyone has experience scraping Tmall/Taobao and could offer some advice or help, that would be hugely helpful. Thanks!

1 Reply
afraid-scarlet•2y ago
Dealing with blocks can be quite challenging: every website and project is different, and there's no universal solution that works flawlessly.
Check out this link (you may find some tips on how to bypass the protection):
https://docs.apify.com/academy/anti-scraping
Also, if you scrape while logged in, try using low concurrency (10-20 maybe?):
https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#maxConcurrency
or the maxRequestsPerMinute option (try experimenting with the value, maybe 80-100?):
https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#maxRequestsPerMinute
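To make the two settings concrete, here is a minimal sketch, not a tested Taobao configuration. `ThrottleOptions`, `throttle`, and `averageDelayMs` are hypothetical names made up for illustration; only `maxConcurrency` and `maxRequestsPerMinute` come from the CheerioCrawlerOptions docs linked above. The values are just the low end of the ranges suggested, so treat them as a starting point to tune.

```typescript
// Hypothetical local type mirroring the two CheerioCrawlerOptions fields
// mentioned above, so this sketch runs without Crawlee installed.
interface ThrottleOptions {
    maxConcurrency: number;
    maxRequestsPerMinute: number;
}

// Assumed starting values (low end of the suggested ranges); tune per site.
const throttle: ThrottleOptions = {
    maxConcurrency: 10,
    maxRequestsPerMinute: 80,
};

// Smallest average gap between request starts that the per-minute cap allows.
function averageDelayMs(options: ThrottleOptions): number {
    return 60000 / options.maxRequestsPerMinute;
}

console.log(averageDelayMs(throttle)); // 750

// With Crawlee installed, the same object can be spread into the crawler:
// import { CheerioCrawler } from 'crawlee';
// const crawler = new CheerioCrawler({
//     ...throttle,
//     requestHandler: async ({ request, $ }) => {
//         console.log(request.url, $('title').text());
//     },
// });
// await crawler.run(['https://example.com/']);
```

At 80 requests per minute the crawler averages no more than one request every 750 ms, regardless of how many run in parallel, so lowering that number is usually the first thing to try when an account starts getting blocked.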