How to handle a 403 error response in Puppeteer and JS when clicking a button that hits an API
We are building a scraper for a site that uses client-side pagination. When we click Next page, the page calls an API, but the API returns 403 because it detects the request is coming from a bot. How can we bypass that, either when launching the browser or while scraping?
Any suggestion would be helpful.
4 Replies
deep-jade•2mo ago
Hi, if you are getting blocked like this you could try a few anti-blocking techniques.
First of all I would try residential proxies, and also try specifying the country code. If that does not help, I would try different browsers, mainly full Chrome or Firefox.
https://crawlee.dev/js/docs/examples/playwright-crawler-firefox
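For example, here is a minimal sketch of a Crawlee PlaywrightCrawler running Firefox through a residential proxy. The proxy URL and the country-code-in-username convention are placeholders; check your provider's docs for the actual format:

```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { firefox } from 'playwright';

// Hypothetical residential proxy URL. Many providers let you pin a
// country through the username (e.g. `user-cc-US`); the exact syntax
// depends on your provider.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://username-cc-US:password@proxy.example.com:7777'],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    launchContext: {
        launcher: firefox, // full Firefox instead of the default Chromium
    },
    async requestHandler({ page, request, log }) {
        log.info(`Scraping ${request.url}`);
        // ... click through the pagination here ...
    },
});

await crawler.run(['https://example.com']);
```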
Additionally you could try a stealth browser like Camoufox, a custom stealthy fork of Firefox:
https://apify.com/templates/js-crawlee-playwright-camoufox
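A rough sketch of how that could be wired up, assuming the camoufox-js helper package that the template above is built on:

```javascript
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';
import { launchOptions } from 'camoufox-js';

const crawler = new PlaywrightCrawler({
    launchContext: {
        launcher: firefox,
        // Camoufox patches Firefox at a low level to hide the usual
        // automation fingerprints, which helps against bot detection.
        launchOptions: await launchOptions({ headless: true }),
    },
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);
        // ... your scraping logic ...
    },
});

await crawler.run(['https://example.com']);
```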
optimistic-goldOP•2mo ago
Thanks for the reply @Lukas Celnar, but I am already using residential proxies from Oxylabs and it is still failing. I will try the stealth plugin and see if that works.
wise-white•2w ago
You might also check this option for the crawler: https://crawlee.dev/js/api/core/interface/SessionPoolOptions#blockedStatusCodes
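By default Crawlee treats 401, 403, and 429 as "blocked" statuses and retires the session when it sees one. A small sketch of overriding that so a 403 does not immediately kill the session and you can handle it yourself:

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    sessionPoolOptions: {
        // Default is [401, 403, 429]; dropping 403 means Crawlee will
        // not automatically retire the session on that status code.
        blockedStatusCodes: [401, 429],
    },
    async requestHandler({ page, response, log }) {
        // Inspect the navigation response yourself if needed.
        if (response?.status() === 403) {
            log.warning('Got a 403, consider rotating the session/proxy.');
        }
        // ... rest of the handler ...
    },
});
```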