How to scrape sites that generate elements with dynamic attributes?
I am trying to scrape a site that generates different CSS classes for my target elements each time the page is rendered. There are no other attributes to select on and no suitable parent elements to traverse from, and I would prefer not to use XPath. Is it possible to decode this HTML to its original form so it is easier to scrape?
Also, is there any technique that would make it possible to detect changed or newly added pages?
2 Replies
Hello @Casper, it would be nice to have an example of such a website so we can investigate further.
I was thinking about creating a solution that would basically iterate over all text nodes on the page using XPath (since in most cases this is what you want to scrape) and check the computed CSS styles and computed position of each element on the page.
This way it would be possible to obtain data based on some business input like "select all text nodes with color #333, font-size 10px ±10%, located under the navigation and to the right of the left menu", without caring about the HTML structure at all.
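The filtering half of this idea can be sketched in Python. This is only an illustration under assumptions: the `TextNode` record, the `select_by_style` function, and the sample data are all hypothetical, and it assumes something else (for example a headless browser) has already collected each text node's computed color, font size, and position, with colors normalized to six-digit hex.

```python
from dataclasses import dataclass


@dataclass
class TextNode:
    """One visible text node plus the computed values a browser would report."""
    text: str
    color: str           # computed color, assumed normalized to hex, e.g. "#333333"
    font_size_px: float  # computed font-size in pixels
    x: float             # left edge of the bounding box, in pixels
    y: float             # top edge of the bounding box, in pixels


def select_by_style(nodes, color, font_size_px, tolerance=0.10,
                    below_y=0.0, right_of_x=0.0):
    """Keep nodes with the given color, a font size within +/- tolerance,
    positioned below `below_y` and to the right of `right_of_x`."""
    low = font_size_px * (1 - tolerance)
    high = font_size_px * (1 + tolerance)
    return [
        n for n in nodes
        if n.color.lower() == color.lower()
        and low <= n.font_size_px <= high
        and n.y > below_y
        and n.x > right_of_x
    ]


# Hypothetical sample data standing in for what a browser would return.
nodes = [
    TextNode("Home", "#333333", 10.0, 20.0, 10.0),         # inside the navigation
    TextNode("Filters", "#333333", 10.0, 15.0, 200.0),     # inside the left menu
    TextNode("Listing price", "#333333", 10.5, 320.0, 240.0),
    TextNode("Contact", "#999999", 10.0, 320.0, 400.0),    # wrong color
]

# Assumed layout: navigation ends at y=80, left menu ends at x=250.
matches = select_by_style(nodes, "#333333", 10.0, below_y=80.0, right_of_x=250.0)
print([n.text for n in matches])
```

Because the rule only looks at rendered appearance and position, it keeps working when the generated class names change between renders, as long as the visual design stays the same.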
But for now it is only an idea. 😦 Maybe you could put together a PoC that would be just enough for your use case.
xenophobic-harlequin (OP) • 3y ago
yeah that might work. the site is https://www.boligportal.dk/lejligheder/odense/82m2-3-vaer-id-5276909