
"The Best Regex Trick" in Raku


The Best Regex Trick is about writing regexes that match r1 but not r2. The example they give is a regex that matches Tarzan (including the Tarzan in "Tarzan and Jane") but not the quoted "Tarzan". After going through some approaches that don't work, they give the "best regex trick ever":

"Tarzan"|(Tarzan)

This supposedly matches the "bad string" first, skipping the good string inside it, and does not include the bad string in a capture group. If only the good string appears, the second alternative matches it and includes it in the capture group.

One downside of the "best regex trick" is that it still matches "Tarzan", even if it doesn't capture it. You can't, e.g., use it in a conditional without some extra boilerplate.

This is based on PCRE-style regexes. Raku uses an entirely different regex notation. Is it possible to do the trick more simply? Ideally this should be possible:

> ('"Tarzan"', 'Tarzan', '"Tarzan and Jane"') <<~~>> /some-regex/
(Nil 「Tarzan」 「Tarzan」)

Solution

  • Direct equivalent of "The Best Regex Trick"

    say (« '"Tarzan"' Tarzan '"Tarzan and Jane"' » «~~» /'"Tarzan"' | (Tarzan)/)»[0]
    
    # (Nil 「Tarzan」 「Tarzan」)
    

    Discussion of the code, reading from right to left:
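
    One way to see what each piece contributes is to unroll the hyperoperators into an explicit loop. The following is only a sketch: the helper name good-tarzan is hypothetical and not part of the original answer. It performs the smartmatch for one string and then takes the first positional capture with [0], which is what »[0] does for every element of the list.

    ```raku
    # Hypothetical helper: returns the captured "good" Tarzan, or Nil when
    # the string did not match or only the quoted form matched.
    sub good-tarzan(Str $string) {
        # Step 1: smartmatch one string against the regex.
        my $match = $string ~~ / '"Tarzan"' | (Tarzan) /;
        # Step 2: [0] is the first positional capture, i.e. the (Tarzan) group.
        return $match ?? $match[0] !! Nil;
    }

    for '"Tarzan"', 'Tarzan', '"Tarzan and Jane"' -> $string {
        my $capture = good-tarzan($string);
        say $string, ' => ', $capture.defined ?? "captured $capture" !! 'no capture';
    }

    # "Tarzan" => no capture
    # Tarzan => captured Tarzan
    # "Tarzan and Jane" => captured Tarzan
    ```

    Checking the capture rather than the match itself is the extra boilerplate needed to use the trick in a conditional.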


  • "Is it possible to do the trick more simply?"

    A step that's arguably in the right direction is:

    say « '"Tarzan"' Tarzan '"Tarzan and Jane"' » «~~» / '"Tarzan"' <()> | <(Tarzan)> /
    
    # (「」 「Tarzan」 「Tarzan」)
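
    Because the quoted case now yields an empty match instead of Nil, a conditional can test the matched text directly. A minimal sketch, assuming the same regex as above (the helper name has-good-tarzan is hypothetical):

    ```raku
    # Hypothetical helper: True only when the overall match text is non-empty,
    # i.e. when the bare Tarzan alternative matched.
    sub has-good-tarzan(Str $string --> Bool) {
        my $match = $string ~~ / '"Tarzan"' <()> | <(Tarzan)> /;
        # Test the length of the matched text rather than the Match object
        # alone, so that an empty match counts as "bad".
        return ?($match && $match.chars);
    }

    for '"Tarzan"', 'Tarzan', '"Tarzan and Jane"' -> $string {
        say $string, ' => ', has-good-tarzan($string) ?? 'good' !! 'bad';
    }
    ```

    Given the results shown above, this should report the quoted "Tarzan" as bad and the other two strings as good.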
    

    Notes:


    The original specification for Raku regexes included some relevant features that have not yet been implemented. They may one day be implemented and provide a simpler solution.


    See also @tshiono's answer.


    See also @alatennaub's excellent answer on reddit.