C# 3mo ago
Kaenguruu24

Converting Regex to a DFA

I am currently developing a small simulation in which I can create and edit DFAs (deterministic finite automata). The idea came from a video about regex engines, which mentioned that one of them actually uses DFAs to handle regex. I am aware that regex isn't just "regex" and that there are many different flavours, but I'd like to implement at least the core features they share. What I've found so far is that I should first build an NFA and then convert it into a DFA algorithmically. At the moment, though, I'm stuck on the very first step: converting a regex pattern into something like a tree structure. I've searched for libraries that do this but haven't found anything I could use. Any ideas on how to parse a regex, in the sense of splitting it into tokens? As an example, [a-z]|[A-Z] should basically get converted to
{
"or_branches" : [
"[a-z]",
"[A-Z]"
]
}
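One common way to get from a pattern to a tree like the one above is a small recursive-descent parser, with one method per precedence level (alternation, then concatenation, then repetition, then single atoms). The following is only a minimal sketch in Python under some assumptions: the `or_branches` key is taken from the example above, while `sequence`, `repeat`, and `of` are made-up names for illustration, and the supported syntax is limited to `|`, grouping with `( )`, `*`, `+`, `?`, and simple `[...]` classes with no escapes.

```python
class RegexParser:
    """Recursive-descent parser for a small regex subset (illustrative only)."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.pos = 0

    def peek(self):
        # Next character, or None at the end of the pattern.
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def take(self):
        ch = self.peek()
        if ch is None:
            raise ValueError("unexpected end of pattern")
        self.pos += 1
        return ch

    def parse(self):
        node = self.alternation()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return node

    def alternation(self):
        # alternation := concatenation ('|' concatenation)*
        branches = [self.concatenation()]
        while self.peek() == "|":
            self.take()
            branches.append(self.concatenation())
        return branches[0] if len(branches) == 1 else {"or_branches": branches}

    def concatenation(self):
        # concatenation := repetition*   (stops at '|', ')' or end of input)
        items = []
        while self.peek() not in (None, "|", ")"):
            items.append(self.repetition())
        return items[0] if len(items) == 1 else {"sequence": items}

    def repetition(self):
        # repetition := atom ('*' | '+' | '?')*
        node = self.atom()
        while self.peek() in ("*", "+", "?"):
            node = {"repeat": self.take(), "of": node}
        return node

    def atom(self):
        ch = self.take()
        if ch == "(":
            node = self.alternation()
            self.take()  # consumes ')'; raises if the pattern ended early
            return node
        if ch == "[":
            start = self.pos - 1
            while self.take() != "]":  # simplification: no escaped ']'
                pass
            return self.pattern[start:self.pos]  # keep the class as text
        if ch in "*+?":
            raise ValueError(f"nothing to repeat at position {self.pos - 1}")
        return ch  # plain literal character


def parse_regex(pattern):
    return RegexParser(pattern).parse()


print(parse_regex("[a-z]|[A-Z]"))
# {'or_branches': ['[a-z]', '[A-Z]']}
```

Character classes are kept as raw text here to mirror the example; a fuller version would probably expand them into ranges, and each method maps fairly directly onto a method in a C# class.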
3 Replies
Kaenguruu24 3mo ago
Currently reading this, which seems fairly promising: https://deniskyashif.com/2019/02/17/implementing-a-regular-expression-engine/
Implementing a Regular Expression Engine
In this article we learn how to implement a simple and efficient regular expression engine following the Thompson's construction algorithm.
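For the NFA-then-DFA route from the original question, a rough sketch of how the two steps might fit together: Thompson's construction turns a syntax tree into an NFA with epsilon transitions, and the subset (powerset) construction then turns that NFA into a DFA. This is not the article's code. The tuple-based tree format (`"char"`, `"cat"`, `"alt"`, `"star"`) and all function names are assumptions made up for this example, and it is written in Python only to keep it short.

```python
from itertools import count

EPS = None  # label used for epsilon transitions


def thompson(node, edges, ids):
    """Append an NFA fragment for `node` to `edges`; return (start, accept)."""
    kind = node[0]
    if kind == "char":
        s, t = next(ids), next(ids)
        edges.append((s, node[1], t))
        return s, t
    if kind == "cat":
        s1, t1 = thompson(node[1], edges, ids)
        s2, t2 = thompson(node[2], edges, ids)
        edges.append((t1, EPS, s2))  # glue the two fragments together
        return s1, t2
    if kind == "alt":
        s, t = next(ids), next(ids)
        for child in node[1:]:
            cs, ct = thompson(child, edges, ids)
            edges.append((s, EPS, cs))
            edges.append((ct, EPS, t))
        return s, t
    if kind == "star":
        s, t = next(ids), next(ids)
        cs, ct = thompson(node[1], edges, ids)
        # enter the body, loop back, leave the body, or skip it entirely
        edges.extend([(s, EPS, cs), (ct, EPS, cs), (ct, EPS, t), (s, EPS, t)])
        return s, t
    raise ValueError(f"unknown node kind: {kind!r}")


def closure(states, edges):
    """All NFA states reachable from `states` via epsilon transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def to_dfa(start, accept, edges):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    alphabet = {label for _, label, _ in edges if label is not EPS}
    start_set = closure({start}, edges)
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = {dst for src, label, dst in edges
                     if src in current and label == symbol}
            if moved:
                target = closure(moved, edges)
                table[current][symbol] = target
                todo.append(target)
    accepting = {state for state in table if accept in state}
    return start_set, table, accepting


def dfa_accepts(dfa, text):
    state, table, accepting = dfa
    for ch in text:
        state = table[state].get(ch)
        if state is None:  # no transition on this character: reject
            return False
    return state in accepting


# Hand-written tree for (a|b)*c
tree = ("cat", ("star", ("alt", ("char", "a"), ("char", "b"))), ("char", "c"))
edges = []
start, accept = thompson(tree, edges, count())
dfa = to_dfa(start, accept, edges)
print(dfa_accepts(dfa, "abac"))  # True
print(dfa_accepts(dfa, "ab"))    # False
```

The edge list is scanned linearly to keep the sketch short; a real implementation would index transitions by source state, and would usually minimize the resulting DFA afterwards.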
lycian 3mo ago
The .NET implementation of regex parsing is open source (it lives in the dotnet/runtime repository), so you could use that as a reference.