My regex query is the following (demo):
(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')|(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'
The problem I'm facing is that backreferences to the named capture group (?'a'[~_]) fail to match in the part of the query to the right of the main pipe:
(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'
They do however work on the part to the left of the pipe:
(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')
The purpose of the query is to match only the surrounding delimiters of strings such as ~test~ or _test_, with a few additional criteria. It does this by first matching the opening delimiter with a lookahead (demo), and then using a variable-length lookbehind to match the closing delimiter (demo with literals instead of backreferences).
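To make the first step concrete, here is a minimal sketch of the opening-delimiter half only, translated to Python's `re` syntax. The translation is an assumption on my part: the query above uses the `(?'a'...)` and `\k'a'` spellings, which `re` writes as `(?P<a>...)` and `(?P=a)`. The function name `opening_delimiters` is purely illustrative, and the closing-delimiter half is not covered because `re` does not support variable-length lookbehind.

```python
import re

# Left-hand alternative of the query, respelled for Python's `re` module.
# (?P<a>[~_])  captures the opening delimiter.
# The lookahead requires: no whitespace right after it, one or more
# characters that are not the delimiter, no whitespace right before the
# closing delimiter, and then the same delimiter again.
OPENING = re.compile(r"(?P<a>[~_])(?=(?!\s)(?:(?!(?P=a)).)+(?<!\s)(?P=a))")

def opening_delimiters(text):
    """Return the start index of every qualifying opening delimiter."""
    return [m.start() for m in OPENING.finditer(text)]

print(opening_delimiters("a ~b~ and _c_"))  # [2, 10]
print(opening_delimiters("~ test~"))        # [] (whitespace after the opener)
print(opening_delimiters("~test ~"))        # [] (whitespace before the closer)
```

Only the opening `~` or `_` is consumed; the content and the closing delimiter are checked by the lookahead but are not part of the match.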
While I am aware the query could be greatly simplified using \K or capture groups, neither is an option for me.
Your regex is almost there. It only needs a small correction:
(?'a'[~_])(?=
(?'d'(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a') |
(?=(?'b'.*))(?'c'
^(?>\k'a'(?&d)|.)*\k'a'(?&d)(?=\k'b'\z) |
(?<=(?=x^|(?&c)).)
)
)
However, I suspect a regex like this will perform poorly.