regex, repeat, regex-group, kleene-star

Using PCRE2 regex with repeating groups to find email addresses


I need to find all email addresses that consist of an arbitrary number of alphanumeric words separated by periods. To test the regex, I'm using the website https://regex101.com/.

The structure of a valid email address is word1.word2.wordN@word1.word2.wordN.word.

The regex /[a-zA-Z0-9.]+@[a-zA-Z0-9.]+.[a-zA-Z0-9]+/gm finds all email addresses included in the document string, but also includes invalid addresses like ........@....com, if present.

I tried to group the repeating parts using parentheses and a Kleene star, but that causes the regex engine to fail.

Failing regex:

/([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+.[a-zA-Z0-9]+/gm

Although there are many posts about regex groups, I was unable to find an explanation of why the regex engine fails. The engine seems to get stuck while trying to find a match.

How can I avoid this problem, and what is the correct solution?


Solution

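  • I think the main issue that caused you trouble is that
    . (outside of []) matches any character.
    You probably meant \. instead, which matches only a literal dot.

    There is also no need to make the dot optional with ?, because the alphanumeric part of your regex already matches the word characters. With the optional separator, the nested quantifiers in ([a-zA-Z0-9]+.?)* can split the same run of characters in a huge number of ways, and the engine backtracks through all of them before giving up (catastrophic backtracking). That is most likely why it appears to get stuck.

    I also simplified the right-hand part (x*x is the same as x+), added the case-insensitive flag, and ended up with this: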

    /([a-z0-9]+\.)*[a-z0-9]+@([a-z0-9]+\.)+[a-z0-9]+/gmi
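
As a quick check, the pattern above can be tried in Python's re module. This is only a sketch under a few assumptions: Python's engine is not PCRE2 (though this pattern uses no PCRE2-specific syntax), the groups are made non-capturing so that findall returns whole matches, and the sample text is invented.

```python
import re

# The answer's pattern with non-capturing groups (?:...), an adaptation for
# Python's findall, which would otherwise return only the group contents.
# re.IGNORECASE mirrors the "i" flag; re.MULTILINE mirrors "m" (it has no
# effect here because the pattern contains no ^ or $ anchors).
EMAIL = re.compile(
    r"(?:[a-z0-9]+\.)*[a-z0-9]+@(?:[a-z0-9]+\.)+[a-z0-9]+",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical sample text mixing valid and invalid candidates.
sample = "john.doe@example.com ........@....com A.B.C@x.y.Z plain@host"

emails = EMAIL.findall(sample)
print(emails)
```

The dots-only address is rejected because every period must now be preceded by at least one alphanumeric character, and plain@host is rejected because the right-hand side requires at least one period.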