regexcontext-free-grammarfinite-automatasymmetric-difference

Can a RegEx with negative lookahead be represented as a finite automaton?


I'm working on a tool for manipulating context-free languages, and the internal representation of a grammar is stored as a finite automaton. Looking farther into EBNF and RegEx, I learned that EBNF has "exceptions" and RegEx has negative lookahead. I can see how these might be modeled by a symmetric difference NFA, but had suspected they are beyond the capabilities of a regular DFA or NFA.

But then I came across this which pretty plainly suggests that I was wrong. Can anyone point to a free resource that might show how to convert an EBNF with exceptions, or a RegEx with negative lookahead, into a DFA?


Solution

  • You can replace a negative lookahead with a positive lookahead on the full set of possible matches minus the negative lookahead match. E.g. if we were working with single characters a-z as the match space, /what(?!n)/ is the same as /what(?=[a-mo-z]|$)/ (the $ is needed as end-of-string is one of the possible matches). This is OK in theory but not so great in practice when dealing with larger matches, like /afraid of (?!chinese)/.

    https://cs.stackexchange.com/questions/2557/how-to-simulate-backreferences-lookaheads-and-lookbehinds-in-finite-state-auto gives a good treatment of converting lookaheads into something DFA-like, with an important caveat at the end.