regex boost-regex

Invalid Boost Regex Lookbehind with OR and ^


I'm having an issue with Boost regex and suspect it's a bug, but I figured someone here would know for sure, and whether there's a workaround.

I'm checking that a selection starts at the start of the string, or after whitespace or an underscore, using:

(?<=^|\s|_)

However, under Boost this produces an error:

ERROR: Bad regular expression at char 0. Invalid lookbehind assertion encountered in the regular expression.

Without the ^, all is well, and with just the ^ it's fine too.

Any help getting around this would be gratefully received.

Cheers


Solution

  • Brief

    The pattern you presented, (?<=^|\s|_), is a lookbehind with three alternatives:

    1. ^ Assert position at start of the line
    2. \s Match any whitespace character
    3. _ Match the underscore character literally

    Note that alternatives 2 and 3 each match exactly one character, while alternative 1 matches zero characters (it is a position assertion).

    Since 1 has width 0 while 2 and 3 have width 1, the lookbehind as a whole is variable-width. Some regex flavours allow zero-width assertions alongside fixed-width alternatives inside a lookbehind, while others do not.

    In flavours that require fixed-width lookbehinds, quantifiers or alternatives whose matches differ in length cause errors like the one you've seen.
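The width argument above can be checked in any engine that enforces fixed-width lookbehinds. The sketch below uses Python's `re` module as a stand-in, not Boost, on the assumption that it applies a comparable rule to this pattern. The helper name `compiles` is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the engine accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Mixed widths: ^ is width 0, \s and _ are width 1.
print(compiles(r"(?<=^|\s|_)"))
# Only width-1 alternatives.
print(compiles(r"(?<=\s|_)"))
# Only the width-0 assertion.
print(compiles(r"(?<=^)"))
```

In Python the first pattern is rejected and the other two are accepted, which mirrors what the question reports for Boost: each half is fine on its own, and only the mix of widths is refused.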

    Solution

    For flavours that reject this kind of lookbehind, a workaround is needed.

    For your specific case, you can likely use the following regex. It moves the ^ outside the lookbehind, so the lookbehind itself contains only width-1 alternatives:

    (?:^|(?<=\s|_))
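A minimal sketch of the workaround in use, again with Python's `re` as a stand-in engine rather than Boost. The literal `foo` and the helper `match_starts` are hypothetical and exist only to give the boundary something to sit in front of. In C++ source the backslash would need doubling, or a raw string literal.

```python
import re

# Workaround from the answer, followed by a placeholder word to match.
BOUNDARY_FOO = re.compile(r"(?:^|(?<=\s|_))foo")

def match_starts(text: str) -> list[int]:
    """Offsets where 'foo' begins at string start, after whitespace, or after '_'."""
    return [m.start() for m in BOUNDARY_FOO.finditer(text)]

# 'foo' at offset 13 is preceded by 'x', so it is skipped.
print(match_starts("foo bar_foo xfoo foo"))  # [0, 8, 17]
```

Since \s and _ are both single characters, `(?:^|(?<=[\s_]))` is an equivalent spelling that some may find easier to read.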