harsh-harlequin
Sketchy scrolling distance (apify/web-scraper)
Hi,
I'm using apify/web-scraper to scrape a website. The pages don't load all of their content initially; you need to scroll down to see more.
I have set the scroll distance setting to the highest possible value, but it still only works about half of the time.
Is there any way to improve the scroll behavior? Right now it works on only about 50% of the requests, and which pages it works on differs from run to run. Is there another way to control this functionality?
Thanks,
Bob
3 Replies
rival-black•2y ago
@BOBPG I've had good success using the infiniteScroll option, so you might want to try that, though I'm not sure how you are currently attempting it.
Example of how to use it:
https://blog.apify.com/how-to-scrape-the-web-with-playwright-ece1ced75f73/
manual-pink•2y ago
We would need to see the runs where it fails. Scrolling is tricky because sometimes the loading takes too long, or there is a button that has to be clicked to load more, etc.
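For the "loading takes too long" case, one option is to keep scrolling until the page height stops growing instead of scrolling a fixed distance. Below is a minimal sketch, assuming the code runs in the browser context (as a Web Scraper page function does). `scrollUntilStable` and its options `maxRounds`, `waitMs` and `stableRounds` are hypothetical names for illustration, not part of apify/web-scraper or Crawlee.

```javascript
// Small promise-based delay, so no scraper-specific wait helper is needed.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: scroll to the bottom repeatedly and stop once the
// document height has stayed the same for `stableRounds` consecutive checks.
// Assumes it runs inside the page, where `window` and `document` exist.
async function scrollUntilStable({ maxRounds = 20, waitMs = 1500, stableRounds = 2 } = {}) {
    let lastHeight = 0;
    let unchanged = 0;

    for (let round = 1; round <= maxRounds; round++) {
        // Jump to the current bottom, then give lazy-loaded content time to arrive.
        window.scrollTo(0, document.body.scrollHeight);
        await sleep(waitMs);

        const height = document.body.scrollHeight;
        unchanged = height === lastHeight ? unchanged + 1 : 0;
        lastHeight = height;

        if (unchanged >= stableRounds) {
            return { rounds: round, height, settled: true };
        }
    }

    // Gave up before the height settled; the caller can log this or take a screenshot.
    return { rounds: maxRounds, height: lastHeight, settled: false };
}
```

Inside a page function you would define the helper in the function body and call `await scrollUntilStable()` before extracting data. A `settled: false` result marks the requests where the content was probably still loading.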
Taking screenshots is a good way to see what went wrong. At worst, you would need to rewrite it as a new Actor where you have more control over the scrolling functionality: https://crawlee.dev/api/playwright-crawler/namespace/playwrightUtils#infiniteScroll
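A rough sketch of what the request handler in such an Actor could look like, assuming a Crawlee `PlaywrightCrawler` and a reasonably recent Crawlee 3.x. The handler name `handlePage`, the `.item` and `button.load-more` selectors, and the timeout values are placeholders I made up for illustration. `infiniteScroll` is the helper Crawlee puts on the crawling context, and it takes the same options as `playwrightUtils.infiniteScroll`.

```javascript
// Hypothetical request handler for a Crawlee PlaywrightCrawler.
async function handlePage({ page, request, infiniteScroll, pushData, log }) {
    // Keep scrolling until no new content has loaded for `waitForSecs`,
    // or until `timeoutSecs` has elapsed, whichever comes first.
    await infiniteScroll({
        timeoutSecs: 60,
        waitForSecs: 5,
        // If the page needs a "load more" click, a selector can be passed:
        // buttonSelector: 'button.load-more', // placeholder selector
    });

    // '.item' is a placeholder; use the selector of the lazy-loaded elements.
    const itemCount = await page.locator('.item').count();
    log.info(`Found ${itemCount} items on ${request.url}`);

    await pushData({ url: request.url, itemCount });
}

// Wiring, assuming the `crawlee` and `playwright` packages are installed:
//   import { PlaywrightCrawler } from 'crawlee';
//   const crawler = new PlaywrightCrawler({ requestHandler: handlePage });
//   await crawler.run(['https://www.example.com']);
```

Storing `itemCount` per URL makes it easy to spot which requests stopped scrolling too early when you compare runs.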
harsh-harlequinOP•2y ago
Thanks both for the suggestions; unfortunately I didn't get it to work. I had to hire a freelancer, who built me a crawler outside of the Apify platform. This would definitely be a great feature to have within the web-scraper actor itself for the next guy 🙂