What I want to accomplish is to match a word even if it is preceded or followed by non-alphanumeric characters.
For example, in the string "This string contains word1 and word2* and anotherword1", I would like to get two matches, word1 and word2, but not anotherword1, nor the word1 inside anotherword1.
What I have right now is:
\b(word1|word2)\b
but this does not match word2 (ignoring the *).
From what I have read, \b only matches between an alphanumeric character and a non-alphanumeric character, but I have no idea how to handle these special characters trailing my target words.
Later edit: I think (?i)(?<=^|[^a-zA-Z0-9])(word1|word2)(?=$|[^a-zA-Z0-9]) does the trick... but does it look OK? Is there a simpler way of doing this?
You are looking for an adaptive word boundary (yes, it is my concept that I described here):
(?!\B\w)(word1|word2)(?!\B\w)
Or, if you just want to make sure there is no word character on either end:
(?<!\w)(word1|word2)(?!\w)
The (?<!\w) and (?!\w) lookarounds are unambiguous leading ((?<!\w)) and trailing ((?!\w)) word boundaries.
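To see how the two patterns above behave on the string from the question, here is a minimal Python sketch. The term word2\* in the adaptive pattern is an assumption added for illustration (a search term that itself ends with the asterisk); it is not part of the original patterns.

```python
import re

text = "This string contains word1 and word2* and anotherword1"

# Unambiguous boundaries: no word character may touch either end of the match.
strict = re.compile(r"(?<!\w)(word1|word2)(?!\w)")

# Adaptive boundaries: a boundary is only enforced on a side of the term
# that starts or ends with a word character. "word2\*" is a hypothetical
# term that includes the trailing asterisk.
adaptive = re.compile(r"(?!\B\w)(word1|word2\*)(?!\B\w)")

strict_matches = strict.findall(text)
adaptive_matches = adaptive.findall(text)

print(strict_matches)    # ['word1', 'word2']
print(adaptive_matches)  # ['word1', 'word2*']
```

In both cases the word1 inside anotherword1 is skipped, because a word character sits directly before it.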
The meaning of \b depends on the context: \bw will match the w in *w, because \b before a word character requires a non-word character (or the start of the string) on the other side, whereas \b\* requires a word character before the *, because * is a non-word character.
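The context dependence of \b can be checked with a few short searches. This is only a sketch in Python; the test strings "aw", "a*" and " *" are made up for the demonstration.

```python
import re

# \b before a word character: the other side must be a non-word character
# or the start of the string.
w_after_star = re.search(r"\bw", "*w")    # matches: "*" is a non-word character
w_after_letter = re.search(r"\bw", "aw")  # no match: "a" is a word character

# \b before a non-word character (*): the other side must be a word character.
star_after_letter = re.search(r"\b\*", "a*")  # matches: "a" is a word character
star_after_space = re.search(r"\b\*", " *")   # no match: " " is a non-word character

print(bool(w_after_star), bool(w_after_letter))        # True False
print(bool(star_after_letter), bool(star_after_space)) # True False
```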
In languages that do not support lookbehinds, (?<!\w) should be replaced with (^|\W), and further manipulation should be done in code, because that group consumes the character before the word.
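As a rough illustration of that "further manipulation", the sketch below reads the word from the second capture group instead of from the whole match. Python does support lookbehinds, so it is used here only to show the technique; the names pattern and terms are illustrative.

```python
import re

text = "This string contains word1 and word2* and anotherword1"

# (^|\W) consumes the character before the term, so the whole match may
# start one character too early. The term itself is captured in group 2.
pattern = re.compile(r"(^|\W)(word1|word2)(?!\w)")

terms = [m.group(2) for m in pattern.finditer(text)]
# span(2) gives the position of the term without the consumed character.
spans = [m.span(2) for m in pattern.finditer(text)]

print(terms)  # ['word1', 'word2']
```

The trailing check stays a lookahead so that the character after one term is still available as the leading character of the next term.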