Email obfuscated when using HTMLRewriter

Hi, I'm using HTMLRewriter to parse some HTML data. When deployed, the parsed text has the email obfuscated, but on my laptop the email shows just fine. I couldn't find this behavior documented anywhere. Is it possible to disable this? This is using a workers.dev domain. The exact text is [email protected]
5 Replies
squareclamp · 2mo ago
This is fetching data from a third-party website. I tried disabling Scrape Shield but still no luck. It seems like the fetch is being proxied through Cloudflare's cache.
Cyb3r-Jak3 · 2mo ago
If the 3rd party site is using Cloudflare then you can’t override their scrape shield setting on your fetch.
squareclamp · 2mo ago
They are not using Cloudflare, judging from the response headers, and I can see the email in plain text when running locally.
Cyb3r-Jak3 · 2mo ago
The difference between local and deployed is likely due to the IP address being used. When running locally, the request comes from your own IP address, which is less likely to be flagged as automated than Cloudflare's. If you run with --remote then you'll probably see the email address as protected, because the request will come from Cloudflare. I'd imagine there are other services that protect email addresses from scraping, which they could be using.
squareclamp · 2mo ago
Interesting, I didn't expect that. I tried using Colab too and it grabs the email.
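
One way to narrow this down is to log what the deployed Worker actually receives before HTMLRewriter touches it. Below is a minimal sketch, not an official API: `inspectResponse` and `ObfuscationReport` are hypothetical names, and it assumes that a Cloudflare-served response carries a `server: cloudflare` or `cf-ray` header and that Cloudflare's email obfuscation leaves `data-cfemail` or `/cdn-cgi/l/email-protection` in the markup. Other protection services would need different markers.

```typescript
// Hypothetical helper, not part of the Workers runtime.
interface ObfuscationReport {
  servedByCloudflare: boolean;    // assumption: inferred from `server` / `cf-ray` headers
  hasObfuscationMarkers: boolean; // assumption: inferred from markup left by email obfuscation
}

function inspectResponse(
  headers: Record<string, string>,
  body: string
): ObfuscationReport {
  // Header names are case-insensitive, so normalize them first.
  const lower: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = value;
  }

  const servedByCloudflare =
    (lower["server"] ?? "").toLowerCase() === "cloudflare" || "cf-ray" in lower;

  // Markers assumed to be present when the email was rewritten upstream.
  const hasObfuscationMarkers =
    body.includes("data-cfemail") ||
    body.includes("/cdn-cgi/l/email-protection");

  return { servedByCloudflare, hasObfuscationMarkers };
}
```

In a Worker you could pass it something like `Object.fromEntries(response.headers)` and `await response.clone().text()`, then compare the report from a local run with the one from `--remote` or the deployed version. If the markers only show up in the deployed run, the rewrite is happening upstream of your code rather than in HTMLRewriter.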