HTMLRewriter treats self closing tags like <img/> as text
I am using HTMLRewriter to get the plain text of a website. With the following code, which listens to the `text` event, ugly `<img .../>` and `<iframe ...>` tags end up in my result:
```typescript
let total = "";
const rewriter = new HTMLRewriter()
  .on(`article`, {
    text(text: Text): void | Promise<void> {
      total += text.text;
    },
  });
```
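One possible explanation, offered as an assumption to check against the raw page source: fallback markup like this often sits inside `<noscript>`. Under the HTML parsing rules with scripting enabled, `noscript` content is raw text rather than child elements, so a `text` handler on `article` may receive the tags verbatim. If that is what is happening here, a sketch like the following skips text while inside such elements. `createTextCollector`, `ElementLike`, `TextChunkLike` and the default tag list are hypothetical names chosen for illustration; the two interfaces only mirror the `onEndTag()` method and `text` property of HTMLRewriter's `Element` and `Text`, so the logic can run outside a Worker.

```typescript
// Hypothetical structural stand-ins for the parts of HTMLRewriter's
// Element and Text that this sketch uses. In a Worker, the runtime's
// real Element and Text objects satisfy these shapes.
interface ElementLike {
  onEndTag(handler: () => void): void;
}
interface TextChunkLike {
  readonly text: string;
}

// Assumption: the leaked markup lives inside one of these elements.
// Adjust the list after inspecting the page's actual HTML.
function createTextCollector(
  skipTags: readonly string[] = ["noscript", "script", "style"],
) {
  let total = "";
  let skipDepth = 0; // > 0 while the parser is inside a skipped element

  return {
    // One selector per skipped tag, registered with separate .on() calls.
    skipSelectors: skipTags.map((tag) => `article ${tag}`),
    skipHandler: {
      // Runs on the start tag, before any text inside the element.
      element(el: ElementLike): void {
        skipDepth++;
        // Runs on the end tag, after the element's text has been seen.
        el.onEndTag(() => {
          skipDepth--;
        });
      },
    },
    textHandler: {
      text(chunk: TextChunkLike): void {
        if (skipDepth === 0) total += chunk.text;
      },
    },
    result(): string {
      return total;
    },
  };
}

// Wiring inside a Worker (not executed here):
//   const c = createTextCollector();
//   let rewriter = new HTMLRewriter().on("article", c.textHandler);
//   for (const sel of c.skipSelectors) rewriter = rewriter.on(sel, c.skipHandler);
//   await rewriter.transform(response).text(); // consume the body so handlers run
//   const plain = c.result();
```

The counter brackets exactly the skipped content because the element handler fires before the element's text and the `onEndTag()` callback fires after it. An element whose end tag is missing from the source might never decrement the counter; this sketch does not handle that case.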
After transforming an HTML response, the result can look like this:
```html
<iframe class="hide_iframe_" src="[...]"></iframe><img class="hide_iframe" height="1" width="1" alt="" src="[...]"/> Actual normal text <img src="[...]" alt="..." placeholder="blur" /> ...rest of the article
```
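If the stray markup only needs to be scrubbed from the final string, a cruder fallback is to post-process `total` after the rewriter has finished. The sketch below uses `stripTagLikeText`, a hypothetical helper: it replaces every `<...>` run with a space and collapses whitespace. It is a regex heuristic, not an HTML parser, so an attribute value containing `>` or ordinary text such as `a < b > c` will be mangled.

```typescript
// Hypothetical helper: crude removal of anything that looks like a tag.
// Not HTML-aware; see the caveats above.
function stripTagLikeText(raw: string): string {
  return raw
    .replace(/<[^>]*>/g, " ") // replace each "<...>" run with a space
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim(); // drop leading/trailing spaces left by removed tags
}

const sample =
  '<iframe class="hide_iframe_" src="[...]"></iframe><img class="hide_iframe" height="1" width="1" alt="" src="[...]"/> Actual normal text <img src="[...]" alt="..." placeholder="blur" /> ...rest of the article';
console.log(stripTagLikeText(sample));
```

Applied to the sample output above, this yields `Actual normal text ...rest of the article`.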