Crawlee & Apify•3y ago

How to scrape sites that generate elements with dynamic attributes?

I am trying to scrape a site that generates different CSS classes for the target elements each time the page is rendered. There are no other attributes to select on and no suitable parent elements to traverse from, and I would prefer not to use XPath. Is it possible to decode this HTML back to its original form so it is easier to scrape? Also, is there any technique for detecting changed or newly added pages?

2 Replies

Pepa J•3y ago

Hello @Casper, it would be nice to have an example of such a website so we can investigate further. I was thinking about a solution that would iterate over all text nodes on the page (using XPath), since in most cases that is what you want to scrape, and check the computed CSS styles and the computed position of each element on the page. That way it would be possible to obtain data based on business-level input such as: select all text nodes with color #333 and font-size 10px ±10%, located below the navigation and to the right of the left menu, without caring about the HTML structure at all. But currently it is only an idea. 😦 Maybe you could put together a PoC that would be just enough for your use case.
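A rough sketch of the filtering half of that idea, in TypeScript. It assumes the text nodes have already been collected in the browser (for example with `getComputedStyle()` and `getBoundingClientRect()` on each text node's parent element) into plain records. The `TextNodeInfo` and `StyleRule` shapes, the `selectByStyle` helper, and the sample values are all hypothetical and only illustrate the matching logic; this is not an existing Crawlee feature.

```typescript
// Hypothetical record: one entry per visible text node, as it could be
// collected in the page from computed styles and bounding boxes.
interface TextNodeInfo {
  text: string;
  color: string;      // computed color, typically reported as "rgb(r, g, b)"
  fontSizePx: number; // computed font-size in pixels
  x: number;          // left edge of the element's bounding box
  y: number;          // top edge of the element's bounding box
}

// Hypothetical "business" rule: style and position instead of selectors.
interface StyleRule {
  color: string;             // hex color, "#333" or "#333333"
  fontSizePx: number;        // target font size
  fontSizeTolerance: number; // 0.1 means +-10%
  minX: number;              // e.g. right edge of the left menu
  minY: number;              // e.g. bottom edge of the navigation
}

// Convert "#333" or "#333333" into the "rgb(51, 51, 51)" form.
function hexToRgb(hex: string): string {
  let h = hex.replace("#", "");
  if (h.length === 3) h = h.split("").map((c) => c + c).join("");
  const n = parseInt(h, 16);
  return `rgb(${(n >> 16) & 255}, ${(n >> 8) & 255}, ${n & 255})`;
}

// Keep only the text nodes whose style and position satisfy the rule.
function selectByStyle(nodes: TextNodeInfo[], rule: StyleRule): string[] {
  const wantedColor = hexToRgb(rule.color);
  const minSize = rule.fontSizePx * (1 - rule.fontSizeTolerance);
  const maxSize = rule.fontSizePx * (1 + rule.fontSizeTolerance);
  return nodes
    .filter(
      (n) =>
        n.color === wantedColor &&
        n.fontSizePx >= minSize &&
        n.fontSizePx <= maxSize &&
        n.x >= rule.minX &&
        n.y >= rule.minY,
    )
    .map((n) => n.text.trim());
}

// Made-up sample data standing in for what the page would yield.
const sample: TextNodeInfo[] = [
  { text: "Home", color: "rgb(0, 0, 0)", fontSizePx: 14, x: 20, y: 10 },
  { text: " 8.895 kr ", color: "rgb(51, 51, 51)", fontSizePx: 10.5, x: 320, y: 240 },
  { text: "Filters", color: "rgb(51, 51, 51)", fontSizePx: 10, x: 20, y: 240 },
];

const rule: StyleRule = {
  color: "#333",
  fontSizePx: 10,
  fontSizeTolerance: 0.1,
  minX: 200,
  minY: 80,
};

const matches = selectByStyle(sample, rule);
console.log(matches);
```

Because the rule never mentions class names or DOM structure, randomized classes would not affect it, though any redesign that changes colors, sizes, or layout would still require updating the rule.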

xenophobic-harlequinOP•3y ago

yeah that might work. the site is https://www.boligportal.dk/lejligheder/odense/82m2-3-vaer-id-5276909

