
Formulation of language and regular expressions


I can't figure out the formal language and the regular expression for this automaton:

[Image: screenshot of the DFA]

I know that the number of a's and the number of b's each have to be even. At first I thought the language was:

L = {(a^i)(b^j) | i(mod2) = j(mod2) = 0, i,j>=0}

But the automaton can also start with 'b', so that language is incorrect. Also, the regular expression I found, ((aa)* + (bb)), doesn't match either: it can't generate abab, for example.


Solution

  • The regex I got by progressively ripping out nodes (order: 3,1,2,0) is:

    (aa|bb|(ab|ba)(bb|aa)*(ab|ba))*
    

    As far as I can tell, that's as simple as it gets. (I'd love to know if anyone has a simpler reduction; I'm actually taking a test on this stuff this week!)
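If you want to sanity-check the regex, a brute-force comparison is easy to write. This is only a sketch: the helper names `even_a_even_b` and `mismatches` are mine, and I'm assuming Python's `re` syntax, where `|` is union. It lists every string over {a, b} up to a given length on which the regex and the "even a's, even b's" condition disagree.

```python
import re
from itertools import product

# The regex from the answer, written in Python's `re` syntax.
PATTERN = re.compile(r"(aa|bb|(ab|ba)(bb|aa)*(ab|ba))*")


def even_a_even_b(w: str) -> bool:
    """True when w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def mismatches(max_len: int) -> list[str]:
    """Strings of length <= max_len where the regex and the parity test disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (PATTERN.fullmatch(w) is not None) != even_a_even_b(w):
                bad.append(w)
    return bad
```

An empty result from `mismatches(10)` doesn't prove the regex correct, but it does rule out mistakes on short strings such as abab.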

    Step-by-step process

    We start off by adding a new start state and a new accept state. The new start state gets an ε transition to the old start state, and every old accept state (in this case, there's only one) gets linked to the new accept state with an ε transition:

    step 1—added new start and end states

    Next, we rip out state 3. We need to preserve all paths that run through state 3. In this case we've added a path from state 0 back to itself, paths between state 0 and state 2 (one in each direction), and a path from state 2 back to itself:

    step 2—removed state 3

    We do the same with state 1:

    step 3—state 1 removed

    We can simplify this a bit: we'll merge parallel transitions between the same pair of states into a single label, separated by commas. (At the end, each comma turns into the union operator: |, +, etc., depending on your notation.)

    step 4—simplified

    We'll remove state 2 next, and get everything smooshed onto one big loop:

    step 5—state 2 removed

    Loops become stars; we remove the last state so we just have a transition from the start state to the end state connected with one big regular expression:

    step 6—all states removed

    And that's our regular expression!
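The elimination steps above can be sketched in code. A few assumptions, since the pictures aren't reproduced here: I'm guessing the transition table (0 and 1 swap on a, 0 and 3 swap on b, 1 and 2 swap on b, 3 and 2 swap on a, with state 0 as both start and accept state), "S" and "F" are my names for the new start and accept states, and ε is written as the empty string. The output is unsimplified, so it is much uglier than the hand-derived regex, but it should describe the same language.

```python
import re
from itertools import product


def union(x, y):
    """Union of two edge labels; None means 'no edge'."""
    if x is None:
        return y
    if y is None:
        return x
    return f"{x}|{y}"


def eliminate(edges, state):
    """Rip `state` out, rerouting every path that ran through it."""
    loop = edges.pop((state, state), None)
    star = f"({loop})*" if loop is not None else ""
    ins = {p: r for (p, q), r in edges.items() if q == state}
    outs = {q: r for (p, q), r in edges.items() if p == state}
    for key in [k for k in edges if state in k]:
        del edges[key]
    for p, r_in in ins.items():
        for q, r_out in outs.items():
            bypass = f"({r_in}){star}({r_out})"
            edges[(p, q)] = union(edges.get((p, q)), bypass)


def dfa_to_regex(order=(3, 1, 2, 0)):
    # Assumed transition table (hypothetical numbering, see the note above).
    edges = {
        ("S", 0): "", (0, "F"): "",      # epsilon edges to/from the new states
        (0, 1): "a", (1, 0): "a",
        (0, 3): "b", (3, 0): "b",
        (1, 2): "b", (2, 1): "b",
        (3, 2): "a", (2, 3): "a",
    }
    for state in order:
        eliminate(edges, state)
    return edges[("S", "F")]


def agrees_with_parity(regex, max_len):
    """Check the regex against 'even a's and even b's' on all short strings."""
    pattern = re.compile(regex)
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            expected = w.count("a") % 2 == 0 and w.count("b") % 2 == 0
            if (pattern.fullmatch(w) is not None) != expected:
                return False
    return True
```

Calling `agrees_with_parity(dfa_to_regex(), 8)` is a quick way to confirm that the mechanical result and the parity condition agree on short inputs. Trying a different elimination order gives a different-looking but equivalent expression.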

    Language definition

    You're pretty close with the language definition. If you can allow something a little looser, it would be this:

    L = { w ∈ {a,b}* | w contains an even number of a's and an even number of b's }
    

    The problem with your definition is that it forces all the a's to come before all the b's, whereas the only restriction is on the parity of the number of a's and b's. A string like abab is accepted by the automaton but isn't of the form a^i b^j.
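To make the difference concrete, here is a small sketch comparing the two definitions. The function names are mine. `accepts` simulates the automaton by tracking only the two parities, which is my reading of what the four states encode, and `in_original_definition` tests membership in the a^i b^j language from the question.

```python
def accepts(w: str) -> bool:
    """Simulate the DFA: the state is the pair (parity of a's, parity of b's)."""
    a_odd = b_odd = False
    for ch in w:
        if ch == "a":
            a_odd = not a_odd
        elif ch == "b":
            b_odd = not b_odd
        else:
            return False  # symbol outside the alphabet {a, b}
    return not a_odd and not b_odd


def in_original_definition(w: str) -> bool:
    """Membership in {a^i b^j | i, j even}: a block of a's, then a block of b's."""
    i = len(w) - len(w.lstrip("a"))  # length of the leading run of a's
    rest = w[i:]
    return rest == "b" * len(rest) and i % 2 == 0 and len(rest) % 2 == 0
```

For aabb both functions return True, but for abab only `accepts` does, which is exactly the gap between the two languages.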