Site can detect headless mode
I have a Crawlee Playwright bot that logs into a website and performs some actions on a schedule. I made a public version here without the site or actions: https://github.com/raywalz/web-automation-starter
For some reason, the website can detect headless mode despite the stealth plugin, though it works fine in headed mode. Any ideas? The setup is documented in that project's README.
I may give up and run headed under Xvfb all the time, as a previous post here suggests, but I'd like to keep it headless if I can.
3 Replies
This post was marked as solved by foxt141.
inland-turquoise•5mo ago
Hi! As explained in a post from the webscraping community on Reddit, I would recommend trying something other than the puppeteer stealth plugin, for example Crawlee's `PlaywrightCrawler`.
If that doesn't work, I would try `PuppeteerCrawler`: some websites are able to detect Playwright but fail to detect Puppeteer.
Also, refer to Crawlee's "Avoid getting blocked" guide.
If that still doesn't help, disabling `headless` might be necessary. In my experience, some websites with advanced anti-scraping protection do run scripts that can tell whether the browser is headless.
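If it does come down to running headed, here is a rough sketch of how the toggle could look so the same bot stays headless by default and only goes headed when asked. It is not taken from the linked repo. The `HEADED` flag, the `buildCrawlerOptions` helper and the `CrawlerOptionsSketch` type are names made up for illustration. Only `headless` and `browserPoolOptions.useFingerprints` are meant to mirror real `PlaywrightCrawler` options.

```typescript
// Local stand-in for the few PlaywrightCrawler option fields used here,
// so the sketch compiles without crawlee installed (assumption, not Crawlee's own type).
interface CrawlerOptionsSketch {
  headless: boolean;
  browserPoolOptions: { useFingerprints: boolean };
}

type Env = Record<string, string | undefined>;

// HEADED is a hypothetical flag name; DISPLAY is the variable an X server
// such as Xvfb (for example via xvfb-run) exposes on Linux.
function buildCrawlerOptions(env: Env, platform: string = 'linux'): CrawlerOptionsSketch {
  const wantsHeaded = env['HEADED'] === '1';
  // A headed browser on Linux needs an X server; without DISPLAY, stay headless.
  const canShowWindow = platform !== 'linux' || Boolean(env['DISPLAY']);
  return {
    headless: !(wantsHeaded && canShowWindow),
    // Crawlee generates browser fingerprints by default; set explicitly for clarity.
    browserPoolOptions: { useFingerprints: true },
  };
}

// Example: a scheduled job started under xvfb-run, which sets DISPLAY.
const options = buildCrawlerOptions({ HEADED: '1', DISPLAY: ':99' });
console.log(options.headless); // false
```

With Crawlee installed, the result could be spread into the crawler, e.g. `new PlaywrightCrawler({ ...buildCrawlerOptions(process.env), requestHandler })`, and the job started with something like `HEADED=1 xvfb-run -a node main.js` (the script name is a placeholder).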
ambitious-aquaOP•5mo ago
Thanks, I’ll look into those. To be clear, I am using Playwright, not Puppeteer; I’m just using the puppeteer-extra-plugin-stealth plugin with it. Though it has puppeteer in the name, it’s compatible with both, since Playwright grew out of Puppeteer and the two share a very similar API.