Crawlee & Apify•2y ago
harsh-harlequin

Sketchy scrolling distance (apify/web-scraper)

Hi, I'm using apify/web-scraper to scrape a website. The pages don't load all their content initially; you need to scroll down to see more. I have set the scroll distance setting to the highest possible value, but it still only works about half of the time: roughly 50% of the requests succeed, and which pages work differs on each run. Is there any way to improve the scroll behavior, or another way to control this functionality? Thanks, Bob
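Because the failures are intermittent and differ per run, one option is to make a partially loaded page fail loudly instead of being saved as-is. The sketch below assumes the scraper retries a request when the page function throws (Apify/Crawlee scrapers retry failed requests up to a configured limit) and that you know a rough minimum item count for the page; `requireMinItems`, `countItems`, and `minItems` are hypothetical names.

```javascript
// Hypothetical guard: throw when lazy loading stopped short, so the request
// is retried (assuming retry-on-error is enabled) instead of yielding a
// partial result.
async function requireMinItems(countItems, minItems) {
  // countItems is caller-supplied, e.g. counting result cards on the page
  const count = await countItems();
  if (count < minItems) {
    throw new Error(`Only ${count} items loaded (expected at least ${minItems})`);
  }
  return count;
}
```

Inside a Web Scraper page function this could be called after scrolling as `await requireMinItems(async () => document.querySelectorAll('.item').length, 20)`, where `.item` and `20` are placeholders for the real selector and threshold.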
rival-black•2y ago
@BOBPG I've had good success using the infiniteScroll option; you might want to try that, though I'm not sure how you are currently attempting it. Example of how to use it: https://blog.apify.com/how-to-scrape-the-web-with-playwright-ece1ced75f73/
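The core idea behind an infinite-scroll helper can be sketched without any library: scroll to the bottom, wait, and stop only after the page height has stopped growing for several consecutive rounds, so one slow response does not end the scroll early. This is a minimal sketch, not Crawlee's implementation; `scrollUntilStable` and its callback hooks are hypothetical names.

```javascript
// Library-agnostic sketch of "scroll until the page stops growing".
// getHeight and scrollToBottom are hypothetical hooks supplied by the caller.
async function scrollUntilStable({
  getHeight,          // async () => current scrollable height in px
  scrollToBottom,     // async () => scrolls the viewport to the bottom
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  waitMs = 2000,      // pause after each scroll so new content can load
  stableRounds = 3,   // consecutive rounds without growth before stopping
  maxRounds = 100,    // hard cap so an endless feed still terminates
} = {}) {
  let lastHeight = await getHeight();
  let unchanged = 0;
  let rounds = 0;
  while (rounds < maxRounds && unchanged < stableRounds) {
    await scrollToBottom();
    await sleep(waitMs);
    rounds += 1;
    const height = await getHeight();
    if (height > lastHeight) {
      lastHeight = height;
      unchanged = 0; // page grew: reset the "no change" counter
    } else {
      unchanged += 1;
    }
  }
  return { rounds, finalHeight: lastHeight, stable: unchanged >= stableRounds };
}
```

With Playwright, the hooks could be wired as `getHeight: () => page.evaluate(() => document.body.scrollHeight)` and `scrollToBottom: () => page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))`. Crawlee's `infiniteScroll` exposes comparable knobs, such as `waitForSecs` (how long to wait for new content before giving up) and `timeoutSecs`.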
manual-pink•2y ago
We would need to see the runs where it fails. Scrolling is tricky because sometimes the content takes too long to load, or there is a button that has to be clicked, etc. Taking screenshots helps with debugging. At worst, you would need to rewrite it as a new Actor, where you have more control over the scrolling functionality: https://crawlee.dev/api/playwright-crawler/namespace/playwrightUtils#infiniteScroll
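For the "there is a button" case, scrolling alone never loads more items; the button has to be clicked until it disappears. A minimal sketch, assuming the caller can check the button's visibility and click it; `clickLoadMoreUntilGone` and its hooks are hypothetical names.

```javascript
// Sketch for pages that need a "Load more" button instead of (or in addition
// to) scrolling. The hooks are hypothetical and supplied by the caller.
async function clickLoadMoreUntilGone({
  isButtonVisible,   // async () => boolean
  clickButton,       // async () => void
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  waitMs = 2000,     // give the next batch time to render before re-checking
  maxClicks = 50,    // safety cap
} = {}) {
  let clicks = 0;
  while (clicks < maxClicks && (await isButtonVisible())) {
    await clickButton();
    clicks += 1;
    await sleep(waitMs);
  }
  return clicks; // number of batches requested
}
```

With Playwright, a placeholder selector such as `button.load-more` could be wired as `isButtonVisible: () => page.locator('button.load-more').isVisible()` and `clickButton: () => page.locator('button.load-more').click()`, followed by `page.screenshot({ path: 'after-load.png', fullPage: true })` to inspect failing pages. Crawlee's `infiniteScroll` can also click such a button while scrolling through its `buttonSelector` option.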
harsh-harlequinOP•2y ago
Thanks to you both for the suggestions; unfortunately, I didn't get it to work. I had to hire a freelancer who built me a crawler outside the Apify platform. This would definitely be a great feature to have within the Web Scraper actor for the next person 🙂
