Is anyone scraping Indeed with Apify and

Is anyone scraping Indeed with Apify and having Cloudflare CAPTCHA issues over the past two weeks?
14 Replies
adverse-sapphire•16mo ago
Indeed now has much better bot protection. What used to work with Cheerio now needs Playwright with very good proxies.
ratty-blush•16mo ago
It's not an Apify issue. Bot protection is a lot better, especially fingerprinting. I'm also using other tools and facing the same issues.
fascinating-indigo•16mo ago
I'm using Playwright and residential proxies.
ratty-blush•16mo ago
Try going headful with xvfb, and use specific waits so the scripts have time to load. The whole trick with CAPTCHAs is to learn what triggers them and avoid that as much as possible. Just throwing residential proxies at the problem does not solve it.
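To learn what triggers the CAPTCHA, it helps to flag challenged responses consistently so they can be logged next to the request settings that produced them. Below is a minimal, dependency-free sketch. The `ResponseSnapshot` type and `looksLikeChallenge` function are hypothetical names, not part of Crawlee or the Apify SDK, and the markers (the `cf-mitigated: challenge` header, the "Just a moment" title text, a 403/503 status) are heuristics assumed here, not an official or complete list.

```typescript
// Hypothetical shape for the parts of a response we inspect.
// This is NOT a Crawlee, Apify, or Playwright type.
interface ResponseSnapshot {
  status: number;
  headers: { [name: string]: string };
  body: string;
}

// Assumed body markers for a Cloudflare challenge page (heuristic only).
const BODY_MARKERS: string[] = ["just a moment", "challenge-platform"];

function looksLikeChallenge(res: ResponseSnapshot): boolean {
  // Header names are case-insensitive, so normalize before the lookup.
  const headers: { [name: string]: string } = {};
  Object.keys(res.headers).forEach((name) => {
    headers[name.toLowerCase()] = res.headers[name].toLowerCase();
  });

  // Assumption: challenged responses carry this header.
  if (headers["cf-mitigated"] === "challenge") {
    return true;
  }

  // Fallback: a blocking status combined with a known body marker.
  const blockedStatus = res.status === 403 || res.status === 503;
  const body = res.body.toLowerCase();
  return blockedStatus && BODY_MARKERS.some((m) => body.indexOf(m) !== -1);
}
```

Calling this in the request handler and logging the result together with the proxy, headless/headful mode, and wait strategy makes it easier to see which change actually reduces challenges.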
fascinating-indigo•16mo ago
The only way I've gotten it working locally right now is via puppeteer-real-browser. I'm not sure if that will work if I wrap it in Apify code and deploy it to the platform.
ratty-blush•16mo ago
puppeteer-real-browser is just a collection of settings for Chrome. Nothing magic happens.
adverse-sapphire•16mo ago
Crawlee should do the same, no?
fascinating-indigo•16mo ago
That's what I was thinking, so I rewrote our code to use the latest Apify SDK and Crawlee, but no luck. So I started going down the rabbit hole with other potential solutions.
adverse-sapphire•16mo ago
Can you share a URL that always gets a CAPTCHA no matter how many times you retry?
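If the crawl already records whether each attempt was challenged, URLs like that can be picked out of the log. A small sketch, assuming a hypothetical `Attempt` record and an arbitrary threshold of three attempts (neither comes from Crawlee or Apify):

```typescript
// Hypothetical log record: one entry per attempt at a URL.
interface Attempt {
  url: string;
  challenged: boolean;
}

// Returns URLs that were tried at least `minAttempts` times and
// were challenged on every single attempt.
function alwaysChallenged(attempts: Attempt[], minAttempts: number = 3): string[] {
  const stats: { [url: string]: { total: number; challenged: number } } = {};

  for (const a of attempts) {
    if (!stats[a.url]) {
      stats[a.url] = { total: 0, challenged: 0 };
    }
    stats[a.url].total += 1;
    if (a.challenged) {
      stats[a.url].challenged += 1;
    }
  }

  return Object.keys(stats).filter((url) => {
    const s = stats[url];
    return s.total >= minAttempts && s.challenged === s.total;
  });
}
```

URLs that come back from this are good candidates to share, since they rule out bad luck with a single proxy or session.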
adverse-sapphire•16mo ago
That URL really does not load for me in any automated browser, with or without proxies, so the proxy is not the issue. I did not try the real browser plugin for Puppeteer. If that works, it should work on the platform too.
fascinating-indigo•16mo ago
Thanks for checking on that for me.
Louis Deconinck•6mo ago
How do you integrate puppeteer-real-browser with Crawlee / Apify, @danimalweb & @NeoNomade?
