How to get only the text after an HTML tag?

How can I scrape only the text that comes after the <label> tag (e.g. "text after label 1", "text after label 2")?

<div>
<label>some label 1</label>
text after label 1
</div>
<div>
<label>some label 2</label>
text after label 2
</div>
3 Replies
rival-black (3y ago)
In CheerioCrawler, use $('table').html()
robust-apricot (3y ago)
I'd recommend doing something like this:
import { load } from 'cheerio';

const $ = load(`<div>
<label>some label 1</label>
text after label 1
</div>

<div>
<label>some label 2</label>
text after label 2
</div>`);

// The "contents" function will also return text nodes
const texts = [...$('div').contents()]
    // Keep only text nodes that have non-whitespace content
    .filter((elem) => elem.type === 'text' && $(elem).text().trim())
    // For each remaining text node, return its trimmed text
    .map((elem) => $(elem).text().trim());

console.log(texts)
The output of this code is:
[ 'text after label 1', 'text after label 2' ]
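The filter itself does not depend on the sample HTML, so it can be pulled into a small helper that takes whatever nodes .contents() returns. Below is a minimal sketch of that idea; the name textAfterLabels is hypothetical, and the plain objects are only stand-ins shaped like the text and tag nodes Cheerio yields (text nodes have type "text" and hold their content in data), so the snippet runs without any page.

```javascript
// Hypothetical helper (not part of Cheerio): collects the trimmed,
// non-empty text of every text node in `nodes`.
// `nodes` can be any iterable of DOM nodes, e.g. $('div').contents().
function textAfterLabels(nodes) {
  const texts = [];
  for (const node of nodes) {
    // Skip elements such as <label>; only text nodes are of interest
    if (node.type !== 'text') continue;
    const text = node.data.trim();
    // Drop whitespace-only nodes (the line breaks between tags)
    if (text) texts.push(text);
  }
  return texts;
}

// Stand-in nodes, assumed to mirror what .contents() yields for one <div>
const sampleNodes = [
  { type: 'text', data: '\n' },
  { type: 'tag', name: 'label' },
  { type: 'text', data: '\ntext after label 1\n' },
];

console.log(textAfterLabels(sampleNodes)); // [ 'text after label 1' ]
```

Inside a CheerioCrawler requestHandler, the handler context already provides a $ for the page that was fetched, so there is no need to call load yourself; the call would be something like textAfterLabels($('div').contents()), with the selector adjusted to the real page.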
harsh-harlequin (OP, 3y ago)
Thanks for the answers, I will try it soon. But how can I do this in my actual code? The example passes fixed HTML to load: const $ = load(`<div> <label>some label 1</label> text after label 1 </div> <div> <label>some label 2</label> text after label 2 </div>`); I don't know in advance what the text on each page will be. That was just example text, and the text on every page is different.
