I need to find all email addresses that consist of an arbitrary number of alphanumeric words separated by periods. To test the regex, I'm using the website https://regex101.com/.
The structure of a valid email address is word1.word2.wordN@word1.word2.wordN.word.
The regex /[a-zA-Z0-9.]+@[a-zA-Z0-9.]+.[a-zA-Z0-9]+/gm finds all email addresses in the document string, but it also matches invalid addresses like ........@....com, if present.
I tried to group the repeating parts using parentheses and a Kleene star, but that causes the regex engine to fail.
Failing regex:
/([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+.[a-zA-Z0-9]+/gm
Although there are many posts about regex groups, I was unable to find an explanation of why the regex engine fails. It seems that the engine gets stuck while trying to find a match.
How can I avoid this problem, and what is the correct solution?
I think the main issue that caused you trouble is that . (outside of []) matches any character; you probably meant to specify \. instead, which matches only a literal dot character.
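To make the difference concrete, here is a minimal sketch in Python's re module (an assumption on my part; the question uses regex101, and the a.c pattern and the matches helper are just illustrative names). It compares an unescaped dot with an escaped one:

```python
import re

# Unescaped dot: matches any single character (except a newline by default).
any_char = re.compile(r"a.c")
# Escaped dot: matches only a literal period.
literal_dot = re.compile(r"a\.c")

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(any_char, "a.c"), matches(any_char, "abc"))        # True True
print(matches(literal_dot, "a.c"), matches(literal_dot, "abc"))  # True False
```

This is why the original pattern accepts things it should not: every bare . in it can stand in for a letter, a digit, or another dot.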
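Also, there is no need to make it optional with ?, because the non-dot part of your regex will match the alphanumeric characters anyway. With the separator optional, the group can split a single run of alphanumeric characters in many different ways, and the engine tries all of them before giving up; this is known as catastrophic backtracking, and it is why the engine appears to get stuck.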
I also simplified the right part (x*x is the same as x+), added the case-insensitive flag, and ended up with this:
/([a-z0-9]+\.)*[a-z0-9]+@([a-z0-9]+\.)+[a-z0-9]+/gmi
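If you want to check this outside regex101, the following is a rough sketch of the same pattern in Python (my choice of language, not something from the question). The function name find_emails and the sample text are made up for illustration; the g flag has no Python equivalent and is covered by finditer.

```python
import re

# Same pattern as above; i -> re.IGNORECASE, m -> re.MULTILINE.
EMAIL_RE = re.compile(
    r"([a-z0-9]+\.)*[a-z0-9]+@([a-z0-9]+\.)+[a-z0-9]+",
    re.IGNORECASE | re.MULTILINE,
)

def find_emails(text):
    """Return every substring of text that matches the pattern."""
    # group(0) is the whole match; findall would return only the groups.
    return [m.group(0) for m in EMAIL_RE.finditer(text)]

# Hypothetical sample input, including the invalid address from the question.
sample = (
    "Contact john.doe@example.com or A.B.C@mail.server.org; "
    "ignore ........@....com and foo@bar."
)
print(find_emails(sample))
# ['john.doe@example.com', 'A.B.C@mail.server.org']
```

Because every dot now has to be preceded by at least one alphanumeric character, the all-dots address is no longer matched, and the engine does not get stuck on it.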