Got captcha and HTTP 403 using PlaywrightCrawler

Got captcha and HTTP 403 when accessing wellfound.com

I get a captcha every time I access links like these (basically, any job ad on wellfound):
https://wellfound.com/company/kalepa/jobs/2651640-tech-lead-manager-full-stack-europe
https://wellfound.com/company/pinatacloud/jobs/2655889-principal-software-engineer
https://wellfound.com/company/wingspanapp/jobs/2629420-senior-software-engineer

Screenshot attached. And this is not Cloudflare protection - it's some other anti-bot thing.

I am using:
- US residential proxies from smartproxy.com
- PlaywrightCrawler with useSessionPool: false and persistCookiesPerSession: false
- headless Firefox, both as the launcher and in fingerprintGeneratorOptions browsers
- locale en-US and timezone America/New_York (to match the US proxies)
- in fingerprintGeneratorOptions, devices: ['desktop']
- in launchContext: { useIncognitoPages: true }
- pluginContent set in preNavigationHooks to fix the "plugin length" problem, as described here: https://discord.com/channels/801163717915574323/1059483872271798333

And still this site detects me as a robot! Any ideas how to overcome this?

UPDATE1: the IP on the screenshot is somewhere in US/Texas...
UPDATE2: when I open these links in my desktop browser in incognito mode, I get this captcha too...
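For reference, the setup listed above corresponds roughly to the options object below. This is only a sketch: it is written as a plain object so it runs on its own, `crawlerOptions` is a name made up here, and passing it to `new PlaywrightCrawler(...)` from `crawlee` (with `launcher: firefox` from `playwright`) is assumed rather than shown. The proxy, locale, timezone and pluginContent hook are left out because the post does not say exactly how they were configured.

```typescript
// Plain object mirroring the options named in the post.
// Assumption: it would be passed to `new PlaywrightCrawler(crawlerOptions)`.
const crawlerOptions = {
  useSessionPool: false,
  persistCookiesPerSession: false,
  launchContext: {
    // Assumption: `launcher: firefox` (imported from 'playwright') goes here.
    useIncognitoPages: true,
    launchOptions: { headless: true },
  },
  browserPoolOptions: {
    fingerprintOptions: {
      fingerprintGeneratorOptions: {
        // A typed project may need the enums from '@crawlee/browser-pool'
        // instead of plain strings.
        browsers: ["firefox"],
        devices: ["desktop"],
      },
    },
  },
};
```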
4 Replies
extended-salmon (OP), 3y ago
UPDATE 1: the IP on the screenshot is somewhere in US/Texas.
UPDATE 2: when I open these links in my desktop browser in incognito mode, I get this captcha too...
UPDATE 3: it is some variation of Cloudflare; I just looked into the source code of this captcha HTML. For people who like to look inside: https://wellfound.com/cdn-cgi/apps/head/JIiAUxCYLtpv-hVKsQ6mzsTHfds.js
UPDATE 4: It seems that opening wellfound.com in a desktop browser, scrolling the page down to the end, and then opening one of the above links in the same window worked once or twice, with no captcha. It looks like this thing thinks "aha, this is normal end-user behavior".

I am sure many people here have already seen this protection, so let us share the experience... Any hints on how to fight this protection?
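The manual sequence from UPDATE 4 could be scripted roughly like this. It is a sketch, not a confirmed bypass: `warmUpThenOpen` and `PageLike` are names made up here, the step count, scroll distance and pause are guesses, and a fixed number of wheel steps does not guarantee reaching the very end of the page. `PageLike` only lists the Playwright `Page` methods the helper calls (`goto`, `mouse.wheel`, `waitForTimeout`), so a real Playwright page can be passed in.

```typescript
// Minimal structural type covering the Playwright Page methods used below,
// so the sketch compiles without the playwright package installed.
interface PageLike {
  goto(url: string): Promise<unknown>;
  mouse: { wheel(deltaX: number, deltaY: number): Promise<void> };
  waitForTimeout(ms: number): Promise<void>;
}

// Hypothetical helper: open the home page, scroll down in steps,
// then open the job ad in the same page (same window).
async function warmUpThenOpen(
  page: PageLike,
  homeUrl: string,
  jobUrl: string,
  scrollSteps = 10, // assumed value
  stepPx = 800, // assumed value
  pauseMs = 300, // assumed value
): Promise<void> {
  await page.goto(homeUrl);
  for (let i = 0; i < scrollSteps; i++) {
    await page.mouse.wheel(0, stepPx); // scroll down by stepPx pixels
    await page.waitForTimeout(pauseMs); // short pause between scroll steps
  }
  await page.goto(jobUrl);
}
```

In a crawler this would run inside the request handler, for example `await warmUpThenOpen(page, 'https://wellfound.com', request.url)`, instead of navigating straight to the job ad.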
stormy-gold, 3y ago
Did you try headful mode? Modifying headers can also have a positive impact. And maybe don't use incognito mode.
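A minimal sketch of how these three suggestions might be applied, with loud caveats: the thread never says which headers matter, so the `Accept-Language` value is purely a placeholder chosen to match an en-US locale, and `suggestedLaunchContext`, `extraHeaders`, `HeaderPage` and `applyExtraHeaders` are names made up here. `setExtraHTTPHeaders` is the Playwright `Page` method; calling the helper from a `preNavigationHooks` entry is an assumption.

```typescript
// Launch-related suggestions as a plain object (assumed to be merged into
// the crawler's launchContext).
const suggestedLaunchContext = {
  useIncognitoPages: false, // "maybe don't use incognito mode"
  launchOptions: { headless: false }, // headful mode
};

// Only the Playwright Page method this sketch needs.
interface HeaderPage {
  setExtraHTTPHeaders(headers: Record<string, string>): Promise<void>;
}

// Placeholder header set; which headers actually help is not stated in the thread.
const extraHeaders: Record<string, string> = {
  "Accept-Language": "en-US,en;q=0.9",
};

// Hypothetical helper, e.g. called as `async ({ page }) => applyExtraHeaders(page)`
// from a pre-navigation hook.
async function applyExtraHeaders(page: HeaderPage): Promise<void> {
  await page.setExtraHTTPHeaders(extraHeaders);
}
```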
unwilling-turquoise, 3y ago
Try a higher number of retries; it can help sometimes. I think the rationale is that some IPs are burnt, so you need to go through the pool of IPs, find those that work with that site, and then ride those IPs.
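The "find the IPs that work and ride them" idea can be sketched as below. This is a standalone illustration under assumptions, not Crawlee's API: `ProxyRotation` and `fetchWithRotation` are hypothetical names, and `attempt` stands for whatever performs the request and throws when it hits the 403 or the captcha. In Crawlee itself the `maxRequestRetries` option and the session pool cover similar ground.

```typescript
// Hypothetical tracker: hands out proxy URLs, remembers which ones worked
// and forgets the ones that got blocked.
class ProxyRotation {
  private readonly untried: string[];
  private readonly good: string[] = [];

  constructor(proxyUrls: string[]) {
    this.untried = [...proxyUrls];
  }

  // Prefer a proxy that already worked; otherwise take a fresh one.
  next(): string | undefined {
    return this.good[0] ?? this.untried.shift();
  }

  markGood(url: string): void {
    if (!this.good.includes(url)) this.good.push(url);
  }

  markBurnt(url: string): void {
    const i = this.good.indexOf(url);
    if (i !== -1) this.good.splice(i, 1);
  }
}

// Retry `attempt` with a different proxy after each failure, up to
// maxRetries retries, and keep using a proxy once it has succeeded.
async function fetchWithRotation<T>(
  rotation: ProxyRotation,
  attempt: (proxyUrl: string) => Promise<T>, // should throw when blocked
  maxRetries: number,
): Promise<T> {
  let lastError: unknown = new Error("no proxies left");
  for (let i = 0; i <= maxRetries; i++) {
    const proxy = rotation.next();
    if (proxy === undefined) break; // pool exhausted
    try {
      const result = await attempt(proxy);
      rotation.markGood(proxy);
      return result;
    } catch (err) {
      rotation.markBurnt(proxy);
      lastError = err;
    }
  }
  throw lastError;
}
```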
extended-salmon (OP), 3y ago
> modifying headers can have positive impact
Can you show an example? What should be changed? I mean, there are a lot of headers...
> Did you try to use headful mode?
I did not. How would headful mode help? In the end it must run in headless mode anyway...
