python, regex, recursive-regex

Matching Nested Structures With Regular Expressions in Python


I seem to remember that regular expressions in .NET have a special mechanism that allows nested structures to be matched correctly, such as the grouping in "( (a ( ( c ) b ) ) ( d ) e )".

What is the Python equivalent of this feature? Can it be achieved with regular expressions using some workaround? (It seems to be the sort of problem that current regex implementations aren't designed for.)


Solution

  • You can't do this in general with Python's standard re module. (.NET regular expressions have been extended with "balancing groups", which are what allow nested matches.)

    However, PyParsing is a very nice package for this type of thing:

    from pyparsing import nestedExpr

    # nestedExpr() matches balanced "(" and ")" by default
    data = "( (a ( ( c ) b ) ) ( d ) e )"
    print(nestedExpr().parseString(data).asList())
    

    The output is:

    [[['a', [['c'], 'b']], ['d'], 'e']]
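
If installing a package is not an option, one possible workaround is to let the re module handle only the flat part of the problem (splitting the text into tokens) and track the nesting with an explicit stack. The following is a minimal sketch of that idea, not what PyParsing does internally; the function name parse_nested and the token pattern are assumptions made for this illustration.

```python
import re


def parse_nested(text):
    """Parse parenthesised text into nested lists of tokens (illustrative helper)."""
    # A token is a single parenthesis or a run of characters
    # that are neither whitespace nor parentheses.
    tokens = re.findall(r"[()]|[^\s()]+", text)

    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for tok in tokens:
        if tok == "(":
            # Open a new nested list and descend into it.
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == ")":
            # Close the current list; the root itself can never be closed.
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(tok)

    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


data = "( (a ( ( c ) b ) ) ( d ) e )"
print(parse_nested(data))
```

For the sample string this yields the same nested list as the PyParsing call above. Unlike a pure regex, the stack makes the nesting depth unbounded, and unbalanced input raises ValueError instead of silently producing a partial match.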
    

    More on PyParsing: