I want the main content and all the css + js links in the html. How do I do it?
I tried setting:
'formats': ['html', 'rawHtml'],
'onlyMainContent': True
hoping that rawHtml would be the html plus the css and js files. But that is not the case: rawHtml is just the original html without the noise-content filtering.
The most important thing for us is the html content after noise filtering,
but we also want the css so we can detect hidden elements in the page.
2 Replies
Hi, can anyone help me here ....
Hey Trung! Sorry that this was missed.
We will take a look into this. I believe there is a way to get CSS and JS today, but it may require using our V0 endpoint.
So, we don't filter out any inline styles or JS. However, if the CSS and JS are linked from separate files, we don't fetch those files.
You'd have to do the parsing on your end to get those other files.
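For what it's worth, here is a minimal sketch of what that parsing step could look like on your end, using only the Python standard library. It assumes you already have the rawHtml string and the page URL from the scrape response; `AssetLinkParser` and `extract_asset_urls` are hypothetical names made up for this example, not part of any SDK.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetLinkParser(HTMLParser):
    """Collects the URLs of externally linked stylesheets and scripts."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs/srcs
        self.css_urls = []
        self.js_urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("href"):
            # rel can hold several space-separated tokens
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.css_urls.append(urljoin(self.base_url, attr["href"]))
        elif tag == "script" and attr.get("src"):
            # inline scripts have no src and are already in the html
            self.js_urls.append(urljoin(self.base_url, attr["src"]))


def extract_asset_urls(raw_html, base_url):
    """Return (css_urls, js_urls) found in raw_html, as absolute URLs."""
    parser = AssetLinkParser(base_url)
    parser.feed(raw_html)
    parser.close()
    return parser.css_urls, parser.js_urls


# Example with a hand-written page instead of a real scrape result
sample = (
    '<html><head>'
    '<link rel="stylesheet" href="/static/site.css">'
    '<script src="app.js"></script>'
    '</head><body></body></html>'
)
css, js = extract_asset_urls(sample, "https://example.com/docs/")
```

You could then download each URL with whatever HTTP client you prefer and inspect the CSS for rules that hide elements. Note that this only covers `<link rel="stylesheet">` and `<script src>`; stylesheets pulled in through `@import` or injected by JS would need extra handling.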