C#•3w ago
yourFriend

Regex to match text between nested tags

Question:
Write a program that extracts all the text without any tags and attribute values from an HTML document.
Sample text:
<html> <head><title>News</title></head> <body><p><a href="http://softuni.org">Software University</a>aims to provide free real-world practical training for young people who want to turn into skillful software engineers.</p></body> </html>
Sample result:
News Software University aims to provide free real-world practical training for young people who want to turn into skillful software engineers.
I solved it without using Regex. Here's the code: https://paste.mod.gg/orqiwpelwkzq/0 But I was wondering what the regex pattern would look like for matching text within nested tags. :catderp: I asked a similar question before, but it didn't involve nested tags. Link: https://discord.com/channels/143867839282020352/1358651997745709267
5 Replies
yourFriendOP•3w ago
Ok, so it might be possible with balancing groups, but it's definitely not an ideal choice for real projects.
Regular Expression Language - Quick Reference - .NET
In this quick reference, learn to use regular expression patterns to match input text. A pattern has one or more character literals, operators, or constructs.
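For readers curious what "balancing groups" means here, below is a minimal sketch, not a recommended solution. It assumes the nested tag is a plain <div> with no comments, scripts, or '>' characters inside attribute values. The helper name MatchBalancedDivs and the group name Depth are made up for illustration. The pattern pushes onto a stack for every nested opening tag and pops for every closing tag, so each match ends at the closing tag that pairs with the outermost opening tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: finds each outermost <div>...</div>, including any nested divs.
static MatchCollection MatchBalancedDivs(string html)
{
    const string pattern = @"
        <div\b[^>]*>                      # outermost opening tag
        (?>
            <div\b[^>]*>  (?<Depth>)      # nested opening tag: push onto the Depth stack
          | </div>        (?<-Depth>)     # nested closing tag: pop (fails if the stack is empty)
          | (?!</?div\b) .                # any character that does not start a div tag
        )*
        (?(Depth)(?!))                    # fail if an opening tag is still unclosed
        </div>                            # closing tag that pairs with the outermost one
    ";
    return Regex.Matches(html, pattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
}

string input = "<div>outer <div>inner</div> tail</div><div>second</div>";
foreach (Match m in MatchBalancedDivs(input))
{
    Console.WriteLine(m.Value);
}
```

Even in this toy form the pattern is hard to read and only handles one tag name, which is a fair illustration of why it is not a good fit for real projects.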
yourFriendOP•3w ago
Thanks, will close the post after 24 hrs. It makes so much sense. Funnily enough, I also look for text between > and < when traversing the string character by character, but I tried to use full HTML tags in the regex 😅 Thank you so much for the explanation.
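The "text between > and <" idea can be sketched as a regex roughly like this. It is only an illustration for simple input such as the sample in the question: it assumes no comments, no <script> or <style> blocks, and no '>' inside attribute values. ExtractText is a hypothetical helper name, not something from the linked paste.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every run of characters that sits
// between a '>' and the next '<', ignoring whitespace-only runs.
static string ExtractText(string html)
{
    // (?<=>)  the run must be preceded by '>' (the end of a tag)
    // [^<>]+  one or more characters that are not tag delimiters
    // (?=<)   the run must be followed by '<' (the start of the next tag)
    var matches = Regex.Matches(html, @"(?<=>)[^<>]+(?=<)");

    var pieces = matches
        .Cast<Match>()
        .Select(m => m.Value.Trim())
        .Where(s => s.Length > 0);

    return string.Join(" ", pieces);
}

string sample = "<html> <head><title>News</title></head> <body><p><a href=\"http://softuni.org\">Software University</a>aims to provide free real-world practical training for young people who want to turn into skillful software engineers.</p></body> </html>";

Console.WriteLine(ExtractText(sample));
```

Because the pattern never looks at tag names, nesting depth does not matter here. Attribute values are skipped because they sit inside a tag and are not directly preceded by '>'.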
Anton•3w ago
please don't write a parser
Jimmacle•3w ago
you can't parse HTML with regex $htmlregex
MODiX•3w ago
Stack Overflow
RegEx match open tags except XHTML self-contained tags
I need to match all of these opening tags: <p> <a href="foo"> But not self-closing tags: <br /> <hr class="foo" /> I came up with this and wanted to make
