regex permutation raku negation

perl6 Negating multiple words and permutations of their chars inside a regex


What is the best way to perform, inside a regex, negation of multiple words and permutations of chars that make up those words?

For instance: I do not want

"zero dollar"
"roze dollar"
"eroz dollar"
"one dollar"
"noe dollar"
"oen dollar"

but I do want

"thousand dollar"
"million dollar"
"trillion dollar"

If I write

not m/ [one | zero] \s dollar /

it does not cover permutations of the chars, and the "not" outside the regex makes the test succeed for everything else, such as "big bang", which does not contain "dollar" at all.

m/ <- [one] | [zero] > \s dollar/ # this is a syntax error.

Solution

  • Using a code assertion:

    You could match any word, and then use a <!{ }> assertion to reject words that are permutations of "one" or "zero":

    say "two dollar" ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
    

  • Using before/after:

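    Alternatively, you could pre-generate all permutations of the disallowed words, and then reject them using a <!before > or <!after > assertion in the regex. Note that these assertions only look at the start (or end) of the word, so a longer word that merely begins (or ends) with a disallowed permutation, such as "oneness", is rejected as well: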

    my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;
    
    say "two dollar" ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;
    say "two dollar" ~~ / :s ^ \w+<!after @disallowed> dollar $ /;
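
To check both approaches against the examples from the question, here is a minimal sketch. The names @inputs, @sorted-keys and %verdict are made up for this illustration, and the junction is built with any() instead of being written inline; otherwise the regexes are the ones shown above.

```raku
# Example strings from the question: the first six should be
# rejected, the last three accepted.
my @inputs = "zero dollar", "roze dollar", "eroz dollar",
             "one dollar",  "noe dollar",  "oen dollar",
             "thousand dollar", "million dollar", "trillion dollar";

# Code-assertion approach: sorted letters of each disallowed word.
my @sorted-keys = <one zero>.map(*.comb.sort.join);

# Lookahead approach: every permutation of each disallowed word.
my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;

my %verdict;   # hypothetical name, only used to collect the results
for @inputs -> $s {
    # Reject if the sorted letters of the first word equal a key.
    my $by-assertion = so $s ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq any(@sorted-keys) }> dollar $ /;
    # Reject if the first word starts with a pre-generated permutation.
    my $by-lookahead = so $s ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;

    %verdict{$s} = $by-assertion && $by-lookahead ?? "accepted" !! "rejected";
    say $s, " => ", %verdict{$s};
}
```

Both regexes should agree on these nine inputs. They can differ on words that only start with a disallowed permutation, because the code assertion compares the whole captured word while the lookahead only inspects its beginning.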