Using grep
(GNU grep 3.3) to search for all words with three consecutive double-letters (resulting in "bookkeeper"):
grep -E "((.)\2){3}" /usr/share/dict/american-english
Changing this to search for words with three double-letters, each of them followed by the letter "i" (resulting in "Mississippi"):
grep -E "((.)\2i){3}" /usr/share/dict/american-english
Changing this to search for words with three double-letters, each of them followed by any single letter (with a couple of results):
grep -E "((.)\2.){3}" /usr/share/dict/american-english
Changing this to search for words with three double-letters separated by an optional, single letter (even more results):
grep -E "((.)\2.?){3}" /usr/share/dict/american-english
Now, finally, my original task: Search for all words containing three double-letters:
grep -E "((.)\2.*){3}" /usr/share/dict/american-english
But this results in an empty set. Why? How can .?
match something .*
does not?
The POSIX regex engine does not handle patterns with back-references well, matching back references is an NP complete problem might provide some hints on why it is that difficult.
Since you are using a GNU grep
, the problem is easily solved with the PCRE engine,
grep -P '((.)\2.*){3}' file
since the PCRE engine can handle back-references in a more efficient way than the POSIX regex engine.
See the online demo.