
Unexpected behavior around recursive regex


I am trying to match a C++ argument type, which can contain balanced < and > characters.

With this regex: (\<(?>[^<>]|(?R))*\>)

On this string: QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>

It matches everything except the first 4 characters (QMap).
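This behavior can be reproduced with a short Perl script. The following is a minimal sketch that uses only the pattern and input string from the question. The variable names are illustrative. Because the pattern begins with a literal <, the earliest possible match starts at the first <, so QMap cannot be part of it:

```perl
use strict;
use warnings;

# Input string and pattern taken from the question.
my $input = 'QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>';

# (?R) recurses the whole pattern. In Perl, \< and \> are just literal < and >.
my ($matched) = $input =~ /(\<(?>[^<>]|(?R))*\>)/
    or die "no match\n";

# $-[0] holds the offset where the overall match starts.
my $start = $-[0];

print "starts at offset $start: $matched\n";
# prints: starts at offset 4: <QgsFeatureId, QPair<QMap<Something, Complex> >>
```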

If I add \w+ at the start of my regex, it only matches the end of the string (QPair<QMap<Something, Complex> >>) instead of the whole string.

What is the explanation, and how can I solve this?


This is intended for use in Perl 5.10+ (5.24).


Solution

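  • The (?R) construct recurses the entire pattern. Once you add \w+ at the start, it becomes part of every recursive call too, so each nested <...> block can only be matched by a recursion that first consumes one or more word characters. What you actually want to recurse is just the Group 1 subpattern, the balanced <...> part.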

    Instead, use a subroutine call that recurses only the capturing group holding the <...> part. Since (\w+) is now Group 1, that part becomes Group 2, so the call is (?2):

    (\w+)(<(?:[^<>]++|(?2))*>)
    


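    Details:

    (\w+) - Group 1: one or more word characters (the type name, here QMap)
    (<(?:[^<>]++|(?2))*>) - Group 2: a <, then zero or more occurrences of either one or more characters other than < and > (matched possessively) or a recursive call to the Group 2 subpattern via (?2), and then a >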

    Results:

    Match:   QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>
    Group 1: QMap
    Group 2: <QgsFeatureId, QPair<QMap<Something, Complex> >>
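The following is a minimal Perl 5.10+ sketch of the fix. It applies the answer's pattern to the string from the question. The variable names and the second C++ signature string are hypothetical and used only for illustration. The two groups are adjacent and together span the entire match, so the overall match is recovered from the offsets in @- and @+:

```perl
use strict;
use warnings;

my $input = 'QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>';

# (\w+)                 - Group 1: the type name, e.g. QMap
# (<(?:[^<>]++|(?2))*>) - Group 2: a balanced <...> block; (?2) recurses only
#                         this group, so nested brackets do not need a \w+ prefix.
# Possessive ++ and (?2) both require Perl 5.10 or later.
my $re = qr/(\w+)(<(?:[^<>]++|(?2))*>)/;

my ($name, $args) = $input =~ $re
    or die "no match\n";

# Recover the overall match from the start/end offsets of the last match.
my $whole = substr $input, $-[0], $+[0] - $-[0];

print "Match:   $whole\n";
print "Group 1: $name\n";
print "Group 2: $args\n";

# Usage with //g on a hypothetical C++ signature: in list context the match
# returns the (Group 1, Group 2) pair of every successive match.
my $signature = 'void f(QMap<int, QList<QString>> a, QPair<A, B> b)';
my @pairs = $signature =~ /$re/g;
while (my ($type, $params) = splice @pairs, 0, 2) {
    print "$type$params\n";    # one full argument type per line
}
```

The possessive [^<>]++ prevents the engine from giving back characters from a run of plain text. This reduces backtracking when a nested match fails.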