such as .*ab
For this regex, I can build an NFA using the Thompson algorithm like:
When I try to use the subset construction algorithm to change it to a DFA, I find the result is weird.
the DFA graph:
I try to use it to match aaab
, but faild
When the DFA is in state "A" and reads a character a
, I don't know which is the next state.
If I choose "B" as the next state, it means that <any>
condition has a higher priority; thus, the DFA will circulate in state "B"
If I choose "C" as the next state, it means a
condition has a higher priorty and this DFA can not match aaaab
I have check the Compilers: Principles,Techniques,and Tools, but the book do not talk about how to handle the .
I believe I built a wrong DFA (or NFA), but I can't find where I went wrong.
The NFA is correct, but the construction table you have created has the following issues:
<any>
is a set of characters that also includes the characters a
and b
, but that makes the table ambiguous. The columns that describe the targets of transitions should be mutually exclusive, i.e. there should be no doubt which column to choose, but this is not the case for the characters a
and b
.
A solution is to rename the middle column to <other-than-a-b>
.
Related to this, is that when the input symbol is b
, you may follow the <any>
transition in the NFA diagram, which you have not taken into account when filling in the last column in the table.
Specifically, in the first data row of the table, the b
column should not be empty, as you can arrive in state 3 with it (and thus also 2, 4 and 5).
Also, when arriving with a
in states 6 and 7, you also have the possibility to still follow the <any>
transition instead, and stay in states 2, 3, 4, 5. This means there is no DFA state that matches only with NFA-states {6,7}.
And still related to this, when you reach state 8, you could have used the same characters to still be in the <any>
loop, so that 8 should be taken together with 2,3,4,5. You could still get more input, so there are outgoing transitions (out of the corresponding DFA state that will be an accept state because of the NFA 8-state).
Here is the corrected table:
NFA state | DFA state | <other-than-a-b> |
a |
b |
---|---|---|---|---|
{1,2,4,5} | A | B | C | B |
{2,3,4,5} | B | B | C | B |
{2,3,4,5,6,7} | C | B | C | D |
{2,3,4,5,8} | D | B | C | B |
You can now make the DFA diagram from this:
Here the labels "not a" are short for "b
or <other-than-a-b>
".
This diagram can still be simplified: the states A and B can be merged.