Help Needed with Scraping Website Behind Anti-Bot Protection!
I've been trying to scrape this website: https://de.pandora.net/de/charms-armbander/charms/charms-mit-anhanger/bicolor-fahrrad-mit-drehenden-radern-charm-anhanger/763354C01.html
The script works perfectly on my local playground setup, but when I move it to a self-hosted environment, it fails to scrape.
I’ve also added a proxy server to bypass any unauthorized access issues, but I still can't get it to work. Here's the error message I'm encountering:
{
  "content": "",
  "markdown": "",
  "html": "",
  "linksOnPage": [],
  "metadata": {
    "sourceURL": "https://de.pandora.net/de/charms-armbander/charms/charms-mit-anhanger/bicolor-fahrrad-mit-drehenden-radern-charm-anhanger/763354C01.html",
    "pageStatusCode": 401,
    "pageError": "UNAUTHORIZED"
  }
}
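For anyone who wants to detect this case in code, here is a minimal sketch. It assumes the response has exactly the shape shown above; `looksBlocked` is a hypothetical helper (not part of any library), and treating 401, 403, and 429 as "probably blocked" is my own assumption, not a documented rule.

```typescript
// Shape of the response shown above; only the fields seen there are typed.
interface ScrapeResult {
  content: string;
  markdown: string;
  html: string;
  linksOnPage: string[];
  metadata: {
    sourceURL: string;
    pageStatusCode?: number;
    pageError?: string;
  };
}

// Hypothetical helper: returns true when the page came back empty AND the
// status code is one that anti-bot layers commonly answer with.
// The status list is an assumption for illustration only.
function looksBlocked(result: ScrapeResult): boolean {
  const status = result.metadata.pageStatusCode ?? 0;
  const isEmpty =
    result.content === "" && result.markdown === "" && result.html === "";
  const suspiciousStatuses = [401, 403, 429];
  return isEmpty && suspiciousStatuses.includes(status);
}
```

A check like this only tells you that the request was rejected. It cannot tell you whether the cause is the IP address, the browser fingerprint, or something else.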
Has anyone dealt with similar issues, or does anyone have ideas on how to scrape websites behind anti-bot measures effectively? Any advice or tips would be greatly appreciated!
8 Replies
Hey there Julie. To do this, you'll have to set up your own proxy network. Alternatively, you can use the cloud service, where we handle all of this for you!
I am running the Playwright service (TS) with the env keys set, but I'm confused: do I need to include the port, or just the host address?
PROXY_SERVER=host_address:port
@Caleb
@Adobe.Flash Bringing you into the convo; I'm not familiar with setting up proxies here.
I am using Geonode Site Unblocker proxies.
Hey @Julie Grace, I believe you need both the host and the port: PROXY_SERVER=http://PROXY_SERVER:PROXY_PORT
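To make the host-versus-host:port question concrete, here is a small sketch that checks a PROXY_SERVER value before it is used. `normalizeProxyServer` is a hypothetical helper, not something the Playwright service ships. It assumes plain `host:port` values (no IPv6 literals, no inline credentials) and defaults the scheme to `http` when none is given, which is my assumption.

```typescript
// Hypothetical helper: accepts "host:port" or "scheme://host:port" and
// returns "scheme://host:port". Throws when the port is missing.
function normalizeProxyServer(raw: string): string {
  // Groups: 1 = optional scheme, 2 = host, 3 = port (1 to 5 digits).
  const pattern = /^(?:([a-z][a-z0-9+.-]*):\/\/)?([^\s:\/@]+):(\d{1,5})\/?$/i;
  const match = pattern.exec(raw.trim());
  if (match === null) {
    throw new Error(`Expected [scheme://]host:port, got "${raw}"`);
  }
  // The scheme group is undefined when absent, so the default applies.
  const [, scheme = "http", host, portText] = match;
  const port = Number(portText);
  if (port < 1 || port > 65535) {
    throw new Error(`Port out of range in "${raw}"`);
  }
  return `${scheme.toLowerCase()}://${host}:${port}`;
}

// Example: a bare host:port gets the http:// prefix added.
const normalized = normalizeProxyServer("proxy.example.com:9000");
```

Failing loudly on a missing port is deliberate: a value with only the host would otherwise be passed along and fail later in a way that is much harder to spot.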
Hi @Adobe.Flash
I did set PROXY_SERVER=http://PROXY_SERVER:PROXY_PORT, and I also tried three different providers, but I don't see any traffic being routed through the proxy.
The console says the server is running on port 3000, but my proxy server's port is 9000. Sorry, I don't know much about proxies.
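On the 3000 versus 9000 confusion: as far as I can tell, 3000 is the port the scraping service itself listens on for incoming requests, while 9000 is the port the browser connects out to on the proxy, so the two are not expected to match. Below is a rough sketch of turning env values into the `{ server, username, password }` object that Playwright's `launch({ proxy })` option accepts. `proxyFromEnv` is a hypothetical helper, and the `PROXY_USERNAME` / `PROXY_PASSWORD` names are assumptions; only `PROXY_SERVER` appears in this thread.

```typescript
// Same field names as Playwright's `proxy` launch option.
interface ProxySettings {
  server: string;
  username?: string;
  password?: string;
}

// Hypothetical helper: builds proxy settings from an env-like record.
// Returns undefined when PROXY_SERVER is unset, meaning no proxy is used
// and traffic leaves directly from the host.
function proxyFromEnv(
  env: Record<string, string | undefined>
): ProxySettings | undefined {
  const server = env.PROXY_SERVER;
  if (!server) {
    return undefined;
  }
  const settings: ProxySettings = { server };
  // PROXY_USERNAME and PROXY_PASSWORD are assumed names for credentials.
  if (env.PROXY_USERNAME) settings.username = env.PROXY_USERNAME;
  if (env.PROXY_PASSWORD) settings.password = env.PROXY_PASSWORD;
  return settings;
}

// Example with the proxy on port 9000; the service's own port is unrelated.
const example = proxyFromEnv({ PROXY_SERVER: "http://proxy.example.com:9000" });

// Intended use (requires the playwright package, so shown as a comment):
//   const browser = await chromium.launch({ proxy: proxyFromEnv(process.env) });
```

If a helper like this returns undefined inside the running service, the env variable is not reaching the process at all, which would also explain seeing no traffic on the proxy side.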
Hey @Julie Grace, gotcha. CCing @thomas here, who can provide better help with proxies.
Hi @thomas, any ideas or updates on this?
Hi @Adobe.Flash, can we use a headless browser service like Browserbase, or could this be a feature request?