regexparsinggrammarcontext-free-grammarchomsky-hierarchy

Regexp parse type-3 grammar


Reading Chomsky hierarchy ... ... I know regexp can't parse type-2 grammars (context-free grammars), and also type-1 and type-0. Can regular expressions parse/catch ALL type-3 grammars (regular grammars)?


Solution

  • Yes, provided they support alternation, concatenation, and the Kleene star. This is the case for regexes of the PCRE (Perl/Java/JavaScript/PHP/...) type: alternation is implemented by ((...)|(...)), concatenation by (...)(...), and the Kleene star by (...)*. (There are a few other details — in most of these languages you need to use something like \A and \z to indicate "start-of-string" and "end-of-string", which in a regular grammar is taken for granted — but that's the idea.)

    But not everything called a "regular expression" in a programming context necessarily has all of the above; for example, POSIX Basic Regular Expressions supports only a very limited form of alternation, where all "branches" of the alternation consist of a single character (e.g., whereas PCREs has both (a|b|c) and the special-case-equivalent [abc], POSIX BREs only have [abc], so can't express something like (ab|c)).