I'm looking for a Raku regex that matches two or more consecutive dots, unless there are exactly three dots.
These strings should match:
Yes.. Please go away.
I have the ball..
In this case....I vote yes.
These strings should not match:
You said this because...?
I dream...
Suppose...oh, wait.
I have difficulty reading Perl constructs such as negative lookahead assertions, and my understanding is they don't work in this case anyway:
\.\.(?!\.)
So, yeah, I'm hoping for something that might be more readable in Raku. Thx.
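As an illustration (added here, not part of the original post), this sketch writes the question's Perl pattern in Raku syntax. Raku spells a negative lookahead <!before pattern> and a negative lookbehind <!after pattern>. The lookahead alone falls short because the engine is free to start at the second dot of a three-dot run. The $both-sides variant, which also rules out a dot immediately before the run, is a hypothetical alternative and not the answer's method. All variable names are made up for the example.

```raku
# The question's Perl pattern \.\.(?!\.) in Raku syntax:
# two dots that are not followed by a third dot.
my $lookahead-only = / \.\. <!before \.> /;

# It still matches 'I dream...': the match simply starts at the
# second of the three dots, and the last two dots are followed by
# the end of the string rather than by another dot.
my $m = 'I dream...' ~~ $lookahead-only;
say $m.so;      # True
say $m.from;    # 8

# Hypothetical two-sided variant (not from the original post): also rule
# out a dot just before the run, then accept a complete run of exactly
# two dots or of four or more dots.
my $both-sides = / <!after \.> [ \. ** 4..* | \. ** 2 ] <!before \.> /;

for 'Yes.. Please go away.', 'I have the ball..', 'In this case....I vote yes.',
    'You said this because...?', 'I dream...', 'Suppose...oh, wait.' -> $s {
    say $s, ' => ', ($s ~~ $both-sides).so;
}
```

The loop should print True for the three strings that should match and False for the other three.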
Turns out this is pretty simple in Raku:
/ [ ^ | <-[.]> ] [ \. ** 4..* | \. ** 2 ] [ <-[.]> | $ ] /
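Reading the code, the regex asks for three [] groupings:
1. a [] grouping of ^ (start of string) | (or) <-[.]> (a negated custom character class matching any character other than a . dot), immediately followed by
2. a [] grouping of \. ** 4..* (four or more dots) | (or) \. ** 2 (exactly two dots), immediately followed by
3. a [] grouping of <-[.]> (again, any character other than a . dot) | (or) $ (end of string).
Because the dots must be bounded on each side by a non-dot character or a string boundary, the middle grouping can only match a complete run of dots, so a run of exactly three dots never matches. With raku -n each input line is matched as its own string, so start and end of string here mean start and end of the line.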
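Since whitespace is not significant inside a Raku regex and # starts a comment, the same regex can be laid out with one grouping per line. The sketch below does that and checks the six strings from the question; the variable names ($dots, @should-match, @should-not) are illustrative only.

```raku
# The answer's regex, one [] grouping per line. Whitespace is not
# significant inside a Raku regex, and # starts a comment.
my $dots = /
    [ ^ | <-[.]> ]              # start of string, or any character other than a dot
    [ \. ** 4..* | \. ** 2 ]    # four or more dots, or exactly two dots
    [ <-[.]> | $ ]              # any character other than a dot, or end of string
/;

my @should-match = 'Yes.. Please go away.', 'I have the ball..', 'In this case....I vote yes.';
my @should-not   = 'You said this because...?', 'I dream...', 'Suppose...oh, wait.';

for |@should-match, |@should-not -> $s {
    my $m = $s ~~ $dots;
    # On success the matched text includes the neighboring non-dot
    # characters, because the first and last groupings consume one when present.
    say $s, ' => ', ($m ?? 'matched ' ~ $m.Str.raku !! 'no match');
}
```

Run as a script, this should report a match for the first three strings (for the first one the matched text is "s.. ", including the trailing space) and no match for the other three.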
Sample Input (1):
These strings should match:
Yes.. Please go away.
I have the ball..
In this case....I vote yes.
These strings should not match:
You said this because...?
I dream...
Suppose...oh, wait.
Sample Output (using a Raku one-liner):
~$ raku -ne '.put if / [ ^ | <-[.]> ] [ \. ** 4..* | \. ** 2 ] [ <-[.]> | $ ] /;' file
Yes.. Please go away.
I have the ball..
In this case....I vote yes.
Additional notes: The code above is grep-like: it returns lines that contain a desired run of dots (exactly 2 dots, or 4 or more dots). But what if an undesired run (e.g. 3 dots) appears on the same line as a desired one? The code below removes these (complex) lines:
~$ raku -ne '.put if / \. ** 2..* / and not / [ ^ | <-[.]> ] \. ** 3 [ <-[.]> | $ ] /;' file
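The (second) Raku code block immediately above prints a line only if it contains a run of two or more dots and no run of exactly three dots, so it removes each of the lines below: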
These strings should not / do not match:
You said this because...?
I dream...
Suppose...oh, wait.
foo...bar.. (three-dots-then-two)
one period should not match.
...
...foo
bar...
foo..bar...baz (two-dots-then-three)
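The same two tests can be tried without an input file. This is a minimal sketch, assuming the lines are held in an array rather than read with raku -n; it uses the three lines that should match plus the lines listed above (with the parenthetical notes dropped). The names @lines and @kept are illustrative.

```raku
# Lines that should survive the filter, followed by lines it should drop.
my @lines =
    'Yes.. Please go away.',
    'I have the ball..',
    'In this case....I vote yes.',
    'You said this because...?',
    'I dream...',
    'Suppose...oh, wait.',
    'foo...bar..',
    'one period should not match.',
    '...',
    '...foo',
    'bar...',
    'foo..bar...baz';

# Keep a line if it has a run of two or more dots (first test)
# and no run of exactly three dots (second test).
my @kept = @lines.grep: -> $line {
    my $two-or-more   = ($line ~~ / \. ** 2..* /).so;
    my $exactly-three = ($line ~~ / [ ^ | <-[.]> ] \. ** 3 [ <-[.]> | $ ] /).so;
    $two-or-more && !$exactly-three;
};

.put for @kept;
```

Only the first three lines should be printed, matching the sample output of the first one-liner, while the mixed lines such as foo...bar.. are now dropped as well.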