Tags: regex, regex-lookarounds, pcre, backreference

PCRE Regex - Backreference not working inside lookahead or after pipe


My regex query is the following (demo):

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')|(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

The problem I'm facing is that backreferences to the named capture group (?'a'[~_]) fail to match in the part of the query to the right of the main pipe:

(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

They do, however, work in the part to the left of the pipe:

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')

The purpose of the query is to match only the surrounding delimiters of strings such as ~test~ or _test_, subject to a few additional criteria. It first matches the opening delimiter using a lookahead (demo), and then matches the closing delimiter using a variable-length lookbehind (demo with literals instead of backreferences).
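To make the opening-delimiter half concrete, here is a rough sketch in Python rather than PCRE. Python's `re` module writes named groups as (?P<a>...) and (?P=a) instead of (?'a'...) and \k'a', and it does not support variable-length lookbehind, so only the part to the left of the pipe is reproduced. The helper name `opening_spans` and the sample strings are made up for illustration.

```python
import re

# Left side of the pipe, translated to Python's named-group syntax.
# It consumes only the opening delimiter; the rest is checked by lookahead.
OPENING = re.compile(r"(?P<a>[~_])(?=(?!\s)(?:(?!(?P=a)).)+(?<!\s)(?P=a))")

def opening_spans(text):
    """Return the (start, end) span of every opening delimiter in text."""
    return [m.span() for m in OPENING.finditer(text)]

print(opening_spans("~test~"))   # [(0, 1)]
print(opening_spans("~ test~"))  # [] (whitespace right after the opener)
print(opening_spans("~test ~"))  # [] (whitespace right before the closer)
```

The closing delimiter is deliberately left unmatched here, since that is the half the question is about.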

While I am aware that the query could be greatly simplified using \K or capture groups, neither is an option for me.


Solution

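  • Your regex is nearly there; it only needs a small correction. The version below is laid out over several lines, so it presumably relies on the x (extended) flag to ignore the whitespace.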

    (?'a'[~_])(?=
       (?'d'(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a') |
       (?=(?'b'.*))(?'c'
          ^(?>\k'a'(?&d)|.)*\k'a'(?&d)(?=\k'b'\z) |
          (?<=(?=x^|(?&c)).)
       )
    )
    

    Demo

    That said, I expect a regex like this to perform poorly.
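
A possible explanation for the original failure, offered as an assumption rather than something verified against the demo: by default, PCRE makes a backreference fail when the group it refers to has not captured anything in the current match attempt. On the right side of the pipe, (?'a'[~_]) never takes part in the match, so \k'a' has nothing to refer to. The corrected pattern matches (?'a'[~_]) first and places both alternatives after it. The sketch below reproduces the effect with Python's `re` module, which is not PCRE and writes named groups as (?P<a>...) and (?P=a). The patterns, the `spans` helper, and the sample string are simplified stand-ins, not the regex above.

```python
import re

text = "~test~"

# Group "a" lives only in the left alternative; the right one refers back to it.
split = re.compile(r"(?P<a>[~_])(?=\w+(?P=a))|(?P=a)")

# Same shape, but the right alternative uses a literal class instead.
literal = re.compile(r"(?P<a>[~_])(?=\w+(?P=a))|[~_]")

def spans(pattern, s):
    """Return the (start, end) span of every match of pattern in s."""
    return [m.span() for m in pattern.finditer(s)]

print(spans(split, text))    # [(0, 1)]
print(spans(literal, text))  # [(0, 1), (5, 6)]
```

With the backreference on the right, the closing ~ is never matched, because group "a" is unset whenever that alternative is tried. Swapping in a literal class makes it match, which mirrors the working "literals instead of backreferences" demo mentioned in the question.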