I was wondering if it was possible to call a subroutine but not capture the result of that call.
For instance, let's say I want to recursively match and capture a balanced bracket {} structure like
{dfsdf{sdfdf{ {dfsdf} }}dfsf}
I could use this regex:
(^(?'nest'\{(?>[^{}]|(?&nest))*\}))
The first group is what I want to capture.
However my definition of 'nest':
(?'nest' ... )
and my recursive call to the 'nest' subroutine:
(?&nest)
are also capturing groups. I would like to make my regex more efficient and save space by not capturing those groups. Is there any way to do this?
edit: I expect it's impossible to not capture a subroutine definition, since its pattern needs to be captured for use elsewhere.
edit2:
I'm testing this regex with boost::regex as well as Notepad++ regex. They actually appear to define different capturing groups, which is odd to me. I'm under the impression that they both use Perl regex by default.
Anyway, upon asking the question, I had the regex:
^\w+\s+[^\s]+\s+(?'header'(?'nest'\{(?>[^{}]|(?&nest))*\}))(?>\s+[^\s]+){5}\s+(?'data'(?>\{(?>[^{}]|(?&nest))*\}))\s+(?'class'(?>\{(?>[^{}]|(?&nest))*\}))
which I later realized contained needless characters that 'nest' already encapsulated. And I now have:
^\w+\s+[^\s]+\s+(?'nest'\{(?>[^{}]|(?&nest))*\})(?>\s+[^\s]+){5}\s+((?&nest))\s+((?&nest))
Notepad++ provides me with 3 capture groups when I do a replace statement
\\1: \1 \n \\2: \2 \n 3: \3 \n 4: \4
It tells me that "1 occurrence was replaced, next occurrence not found". The replacement has no text after the 4:, making me believe that the 4th capture group doesn't exist.
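As a rough cross-check on the count of three, here is a minimal Python sketch. It rests on loud assumptions: Python's built-in re module has no (?&nest) subroutine calls, so each call is swapped for a flat, non-recursive \{[^{}]*\} stand-in, (?P<nest>...) is used as Python's spelling of (?'nest'...), and the atomic groups are written as plain non-capturing groups. Only the parenthesis layout mirrors the regex above, and the input line is made up.

```python
import re

# Stand-in pattern: same capture-group layout as the regex above, but each
# (?&nest) call is replaced by a flat \{[^{}]*\} because Python's re module
# has no subroutine calls.
pattern = re.compile(
    r"^\w+\s+\S+\s+"
    r"(?P<nest>\{[^{}]*\})"   # named group: it is also numbered, as group 1
    r"(?:\s+\S+){5}\s+"       # non-capturing, so it gets no number
    r"(\{[^{}]*\})\s+"        # group 2
    r"(\{[^{}]*\})"           # group 3
)

group_count = pattern.groups          # number of capture groups in the pattern
named = dict(pattern.groupindex)      # name -> group number

# Hypothetical input line, invented for this example.
m = pattern.match("foo bar {h} a b c d e {d} {c}")
captured = m.groups()

print(group_count, named, captured)
```

Groups are numbered by their opening parenthesis, left to right, and a named group takes a number too, which would be consistent with Notepad++ reporting groups 1 to 3 and nothing for \4.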
HOWEVER boost::regex_match returns an object with 6 positions:
0: metadata on the match
1: the entire match
2: the entire match
3: group1 from notepad++
4: group2 from notepad++
5: group3 from notepad++
I'm still trying to make sense of positions 1 and 2.
edit3:
I misunderstood yet another piece of the puzzle...
boost::cmatch.m_subs[i] != boost::cmatch[i]
I thought that they were equal. After some more debugging, it turns out that indexing into the object works exactly like the documentation says. But I incorrectly assumed that the object would contain a structure that mirrored what boost::cmatch[i] returned. It appears that boost::cmatch[i] first removes all entries from m_subs that have matched == false. The remaining entries line up with what boost::cmatch[i] returns.
Any subroutine defined inside a (?(DEFINE)...) construct won't capture anything: the group still gets a number, but it never takes part in the match itself.
If you just want to avoid having any captures, it's done like this:
https://regex101.com/r/aT4TlM/1
Note the subpattern definition construct:
(?(DEFINE)(?'nest'\{(?>[^{}]|(?&nest))*\}))
May only be used to define functions. No matching is done in this group.
^(?&nest)(?(DEFINE)(?'nest'\{(?>[^{}]|(?&nest))*\}))
And since you have that beginning-of-string anchor ^ there, it's the only way.
I.e. (?R) is not an option, since it would recurse into the whole pattern, anchor included.
Expanded
^
(?&nest)
(?(DEFINE)
(?'nest' # (1 start)
\{
(?>
[^{}]
| (?&nest)
)*
\}
) # (1 end)
)
Output
** Grp 0 - ( pos 0 , len 29 )
{dfsdf{sdfdf{ {dfsdf} }}dfsf}
** Grp 1 [nest] - NULL
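
For anyone who wants to see what the nest subroutine is doing without a recursion-capable engine, here is a minimal hand-written equivalent in Python. This is not a regex and not how any engine implements it; match_nest is a hypothetical helper name. It only mirrors the logic of \{(?>[^{}]|(?&nest))*\}: consume an opening brace, then either a non-brace character or a nested block, repeatedly, then a closing brace. Like the DEFINE version, it records nothing for the inner blocks.

```python
def match_nest(text, pos=0):
    """Return the index just past a balanced {...} block starting at pos,
    or None if there is no balanced block at that position."""
    if pos >= len(text) or text[pos] != "{":
        return None                      # \{ failed
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "}":
            return i + 1                 # closing \} matched
        if ch == "{":
            end = match_nest(text, i)    # the (?&nest) call
            if end is None:
                return None              # nested block never closes
            i = end
        else:
            i += 1                       # the [^{}] branch
    return None                          # ran out of input before \}

sample = "{dfsdf{sdfdf{ {dfsdf} }}dfsf}"
end = match_nest(sample)
print(end)  # 29, the same length reported for Grp 0 above
```

The function returns only the end position of the outermost block, which corresponds to Grp 0; nothing about the inner calls is kept, just as Grp 1 stays NULL.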
Metrics
----------------------------------
* Format Metrics
----------------------------------
Atomic Groups = 1
Capture Groups = 1
Named = 1
Recursions = 2
Conditionals = 1
DEFINE = 1
Character Classes = 1