C# · 4mo ago
Quark

✅ Regex to capture all attributes of an HTML tag, but it captures only the last one

Hi, I have this code (see the picture) and I don't understand why I get only the last attribute. I need all of them. Please help me. Code to use: MatchCollection attrmatches = new Regex(@"<[a-zA-Z.]+(?:\s+(\w+)\s*=\s*""[^""]*"")*\s*>", RegexOptions.Compiled).Matches(line); Thanks.
16 Replies
leowest · 4mo ago
It looks like you're reading an XML file. Why not just use a library to read XML?
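For reference, a minimal sketch of the library route suggested here, using System.Xml.Linq from the .NET base class library. It assumes the input is well-formed XML, so the sample tag from later in this thread is given a closing tag; the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumption: the line is well-formed XML (the closing tag was added for this sketch).
string line = @"<SimBase.Document Type=""MissionFile"" version=""1,0""></SimBase.Document>";

XElement element = XElement.Parse(line);

// Collect every attribute name in document order.
string[] names = element.Attributes().Select(a => a.Name.LocalName).ToArray();

Console.WriteLine($"Tag: {element.Name.LocalName}");
foreach (XAttribute attribute in element.Attributes())
{
    Console.WriteLine($"\tAttribute: {attribute.Name.LocalName} = {attribute.Value}");
}
```

This only works when each fragment parses as XML on its own; an unclosed opening tag such as the one in the question would make XElement.Parse throw.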
Joschi · 4mo ago
https://stackoverflow.com/a/1732454 A general note on trying to parse HTML with regex: unless you want something really simple, it won't work.
Joschi · 4mo ago
What exactly are you trying to do? Do you have examples on what should/shouldn't be matched?
i like chatgpt · 4mo ago
Please give more details, for example your input text and the expected output. It should be easy to solve.
Pobiega · 4mo ago
AngleSharp is a very good HTML parser library for C#. It would make this trivial to solve.
i like chatgpt · 4mo ago
using System;
using System.Text.RegularExpressions;

string input =
@"
<doc att0 = "" att0 value "" att1 = "" att1 value "" att2="" att2 value ""> Something 1</doc>
<temp att3 = "" att3 value "" att4="" att4 value "" />
";

string pattern =
@"(?x)          # ignore whitespace and allow comments inside this pattern
\b              # word boundary
(?<attr>\w+)    # group named attr: one or more word characters
\s*             # zero or more whitespace characters
=               # a literal equals sign
";

// Each match is one attribute name followed by an equals sign.
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine(match.Groups["attr"].Value);
}
Quark · 4mo ago
Hi, thanks, but I don't want to use a library (did I ask for one?). The line I'm reading is in the SyncRoot variable (see the picture). I'm trying to do exactly what you see: use a regex to capture all the attributes in a <tag>. @talk is cheap, show me the code But I also need to identify the tag itself, to get its index and length in my line. If I use your regex, when I have several tags on the same line, or text plus a tag, I can't tell which tag an attribute belongs to (its position in the text). E.g.: blablabla <tag f="tt"> <tag j="klkll"> blabla
Quark · 4mo ago
I understand, but why can the online regex testers analyse the same line correctly? On https://regex101.com/ choose .NET, with pattern <[\w.]+(?:\s+(\w+)\s*=\s*""[^""]*"")*\s*> and text <SimBase.Document Type="MissionFile" version="1,0">. It captures all the attributes. So for me the problem is not that HTML is irregular and regex is regular: regex can parse my text, but when I use it in real code, it fails. It's the same with http://regexstorm.net/tester and pattern <[\w.]+(?:\s+(\w+)\s*=\s*\"[^\"]*\")*\s*/?>
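A likely explanation, assuming the code in the picture (not visible here) reads the group's Value: in .NET, a capturing group inside a repeated construct records every capture in Group.Captures, while Group.Value returns only the last one. The sketch below uses the pattern and text quoted in this message; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = @"<SimBase.Document Type=""MissionFile"" version=""1,0"">";
Regex regex = new Regex(@"<[\w.]+(?:\s+(\w+)\s*=\s*""[^""]*"")*\s*>");

Match match = regex.Match(line);
Group attrGroup = match.Groups[1];

// Group.Value holds only the text of the LAST capture of the repeated group.
string lastOnly = attrGroup.Value;

// Group.Captures keeps every capture made while the group repeated.
string[] all = attrGroup.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine($"Value: {lastOnly}");
Console.WriteLine($"Captures: {string.Join(", ", all)}");
```

Under that assumption, the regex itself matches the whole tag in both places, and the difference lies only in which property is read from the match.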
viceroypenguin · 4mo ago
Honestly, you should not try to parse HTML using regex. It is not valid to do so. Using a library is a better means of accomplishing your goal.
i like chatgpt · 4mo ago
using System;
using System.Text.RegularExpressions;

string input =
@"
<doc att0 = "" att0 value "" att1 = "" att1 value "" att2="" att2 value ""> Something 1</doc>
<temp att3 = "" att3 value "" att4="" att4 value "" />
<nested att5 = "" att5 value "">
<nested.nested att6 = "" att6 value "" att7="" att7 value "">
<nested.nested.nested att8 = "" att8 value "" />
</nested.nested>
</nested>
";

string pattern =
@"(?x)                  # ignore whitespace and allow comments inside this pattern
<
(?<tag> \w[\w.]*)       # tag name: a word character, then word characters or dots
(?: \s+ (?<attr>\w+) \s* = \s* "" [^""]* "" )*   # zero or more quoted attributes
\s*
/?                      # optional slash of a self-closing tag
>
";

foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine($"Tag: {match.Groups["tag"].Value}");
    // Captures holds every attribute name; Value would give only the last one.
    foreach (Capture capture in match.Groups["attr"].Captures)
    {
        Console.WriteLine($"\tAttribute: {capture.Value}");
    }
}
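On the earlier question about locating each tag in the line: Match and Capture both expose Index and Length, so the same approach can report positions. This is a minimal sketch on the example line from that message, with the pattern written on one line and the tag name as \w[\w.]* so that single-character tag names also match; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string line = @"blablabla <tag f=""tt""> <tag j=""klkll""> blabla";
string pattern = @"<(?<tag>\w[\w.]*)(?:\s+(?<attr>\w+)\s*=\s*""[^""]*"")*\s*/?>";

MatchCollection matches = Regex.Matches(line, pattern);
foreach (Match match in matches)
{
    // Index and Length describe the whole tag, from < to >, within the line.
    Console.WriteLine($"Tag: {match.Groups["tag"].Value} at {match.Index}, length {match.Length}");
    foreach (Capture capture in match.Groups["attr"].Captures)
    {
        // Each capture carries its own position, so attributes stay tied to their tag.
        Console.WriteLine($"\tAttribute: {capture.Value} at {capture.Index}");
    }
}
```

Because the attributes are read from the captures of each match, two tags on the same line never get mixed up.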
Quark · 4mo ago
@talk is cheap, show me the code Thanks, it works. I see my error when comparing it with my code.
i like chatgpt · 4mo ago
You can close this question by typing /close and pressing enter.
Quark · 4mo ago
Hip hip hooray for @talk is cheap, show me the code