regex raku

Raku regex to match two consecutive dots but not three


I'm looking for a Raku regex that matches a run of two or more consecutive dots, unless the run is exactly three dots long.

These strings should match:

Yes.. Please go away.
I have the ball..
In this case....I vote yes.

These strings should not match:

You said this because...?
I dream...
Suppose...oh, wait.

I have difficulty reading Perl constructs such as negative lookahead assertions, and my understanding is they don't work in this case anyway:

\.\.(?!\.)

So, yeah, I'm hoping for something that might be more readable in Raku. Thx.


Solution

  • Turns out this is pretty simple in Raku:

    / [ ^ | <-[.]> ] [ \. ** 4..* | \. ** 2 ] [ <-[.]> | $ ] / 
    

    Reading the code, the regex asks for three [] (non-capturing) groupings in sequence:

    1. a [] grouping of either ^ (start of string; with -n each input line is its own string, so this is effectively start-of-line) or <-[.]> (a negated character class matching any character other than a . dot), immediately followed by

    2. a [] grouping of either \. ** 4..* (four or more . dots) or \. ** 2 (exactly two . dots), immediately followed by

    3. a [] grouping of either <-[.]> (any character other than a . dot) or $ (end of string, here the end of the line).
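
    For readability, the same pattern can also be spread over several lines with one comment per grouping, since whitespace and # comments inside a Raku regex are ignored by default. This is a minimal sketch; the names $two-or-four-plus, @should-match and @should-not are illustrative and not part of the original answer:

    ```raku
    # Same three groupings as the one-liner, one per line.
    my $two-or-four-plus = rx/
        [ ^ | <-[.]> ]              # start of string, or any non-dot character
        [ \. ** 4..* | \. ** 2 ]    # four or more dots, or exactly two dots
        [ <-[.]> | $ ]              # any non-dot character, or end of string
    /;

    # Sample strings taken from the question.
    my @should-match = 'Yes.. Please go away.',
                       'I have the ball..',
                       'In this case....I vote yes.';
    my @should-not   = 'You said this because...?',
                       'I dream...',
                       'Suppose...oh, wait.';

    # Report, for every sample string, whether the regex finds a match.
    for flat @should-match, @should-not -> $line {
        say ($line ~~ $two-or-four-plus ?? 'match   ' !! 'no match'), ": $line";
    }
    ```

    Run as a script, this should report a match for the first three strings only, in line with the one-liner output shown further down.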

    Sample Input (1):

    These strings should match:
    
    Yes.. Please go away.
    I have the ball..
    In this case....I vote yes.
    
    These strings should not match:
    
    You said this because...?
    I dream...
    Suppose...oh, wait.
    

    Sample Output (using Raku one-liner):

    ~$ raku -ne '.put if / [ ^ | <-[.]> ] [ \. ** 4..* | \. ** 2 ] [ <-[.]> | $ ] /;'   file
    Yes.. Please go away.
    I have the ball..
    In this case....I vote yes.
    

    Additional notes: The above is a grep-like answer that returns a line if it contains a run of dots of a desired length (exactly 2 dots, or 4 or more dots). What if a run of an undesired length (e.g. 3 dots) appears on the same line as a desired one? The code below also removes these (complex) lines:

    ~$ raku -ne '.put if / \. ** 2..* / and not / [ ^ | <-[.]> ] \. ** 3 [ <-[.]> | $ ] /;'   file
    

    The (second) Raku code block immediately above removes each of the lines below:

    These strings should not / do not match:
    
    You said this because...?
    I dream...
    Suppose...oh, wait.
    foo...bar.. (three-dots-then-two)
    one period should not match.
    ...
    ...foo
    bar...
    foo..bar...baz (two-dots-then-three)
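
    The same two-condition filter can be written as a small script, which may be easier to read than the one-liner. This is only a sketch: keep-line is a hypothetical helper name, and @lines holds a few of the sample strings from this page:

    ```raku
    # Hypothetical helper (not from the original answer): true when the line
    # has a run of two or more dots and no run of exactly three dots.
    sub keep-line(Str $line) {
        my $has-dot-run   = ?( $line ~~ / \. ** 2..* / );
        my $has-three-run = ?( $line ~~ / [ ^ | <-[.]> ] \. ** 3 [ <-[.]> | $ ] / );
        $has-dot-run && !$has-three-run;
    }

    my @lines =
        'Yes.. Please go away.',
        'I have the ball..',
        'In this case....I vote yes.',
        'I dream...',
        'foo...bar.. (three-dots-then-two)',
        'one period should not match.',
        'foo..bar...baz (two-dots-then-three)';

    # Print only the lines that pass both conditions.
    .put for @lines.grep(&keep-line);
    ```

    Of these sample lines, only the first three should be printed. The line with a single period is dropped by the first condition, the others by the second.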
    

    https://docs.raku.org/language/regexes