Awesome tool! How do I extract the text only?

Great tool here, excited to start testing. I don't need the markdown, and I don't want the links, image hosting URLs, etc. How do I get just the text content of the page?
1 Reply
Adobe.Flash, 16mo ago
Hey @steamwire_labs, there are a couple of ways you can do it. The most straightforward is to pass pageOptions.includeHtml = true and, when you get the HTML back, use bs4 or cheerio to extract the text (.get_text() in bs4, .text() in cheerio). Alternatively, you can pass pageOptions.removeTags = [ 'img', 'a' ] to remove the HTML elements you don't want parsed to markdown.
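
For the first option, here is a rough sketch of the "HTML in, plain text out" step. It uses only Python's standard library as a dependency-free stand-in for bs4; with bs4 installed, the rough equivalent is `BeautifulSoup(html, "html.parser").get_text()`. The `TextExtractor` class, the `html_to_text` helper, and the sample HTML are all made up for illustration; in practice the input would be whatever HTML comes back when includeHtml is set.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes (href, src, alt) are ignored, so link and image URLs drop out.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample standing in for the HTML returned by the scrape.
sample_html = (
    "<html><body><h1>Title</h1>"
    '<p>Some <a href="https://example.com">linked</a> text.</p>'
    '<img src="https://example.com/a.png" alt="pic">'
    "<script>var x = 1;</script></body></html>"
)

print(html_to_text(sample_html))
```

Because each text node is joined with a single space, the original whitespace and line breaks are not preserved. If paragraph structure matters, joining with newlines per block element would be the next thing to try.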
