How can I rewrite my anchor to be more general and correct in all situations? I have understood that using \b as an anchor is not optimal because it is implementation-dependent.
My goal is to match some type of word in a text file. For my question, the word to match is not of importance.
Assume \b is the word boundary anchor and a word character is [a-zA-Z0-9_].
I constructed two anchors, one for the left and one for the right side of the regex. Notice how I handle the underscore, as I don't want it to be a word character when I read my text file.
(?<=\b|_) positive lookbehind
(?=\b|_) positive lookahead
What would be the equivalent anchor constructs, but using the more general caret ^ and dollar sign $, to get the same effect?
[The OP did not specify which regex language they are using. This answer uses Perl's regex language, but the final solution should be easy to translate into other languages. Also, I use whitespace as if the x flag were provided, but that is also easily adjusted.]
With the help of a comment made by the OP, the following is my understanding of the question:
I have something like \b\w+\b, but I want to exclude _ from the definition of a word.
You can use the following:
(?<! [^\W_] ) [^\W_]+ (?! [^\W_] )
An explanation follows.
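\b is equivalent to (?: (?<!\w)(?=\w) | (?<=\w)(?!\w) ).
In \b \w+ \b, the leading \b is followed by \w, so only the (?<!\w)(?=\w) branch can match there, and its (?=\w) is redundant. The trailing \b is preceded by \w, so only the (?<=\w)(?!\w) branch can match there, and its (?<=\w) is redundant.
\b \w+ \b is therefore equivalent to (?<!\w) \w+ (?!\w) (after simplification).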
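The equivalence above can be spot-checked by running both patterns over a few strings. This is a minimal sketch in Python rather than Perl. The sample strings and the helper name same_matches are made up for illustration, and re.ASCII is assumed so that \w means [a-zA-Z0-9_] as in the question.

```python
import re

# Assumption: re.ASCII restricts \w to [a-zA-Z0-9_], matching the question.
WITH_B = re.compile(r"\b\w+\b", re.ASCII)
WITH_LOOKAROUND = re.compile(r"(?<!\w)\w+(?!\w)", re.ASCII)

# Hypothetical sample inputs, chosen to include underscores and punctuation.
SAMPLES = ["foo_bar baz", "  leading and trailing  ", "a-b_c, d9!", "", "___"]


def same_matches(text):
    """Return True if both patterns find the same words in text."""
    return WITH_B.findall(text) == WITH_LOOKAROUND.findall(text)


results = [same_matches(s) for s in SAMPLES]
```

A handful of samples is not a proof, but it is a cheap way to catch a mistake in the simplification.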
So now we just need a pattern that matches everything \w matches except _. There are a few approaches that can be taken.
(?[ \w - [_] ])
(?!_)\w
\w(?<!_)
[^\W_]
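As a rough check that the portable approaches in the list above agree, the sketch below applies three of them to one string in Python. The (?[ ... ]) form is left out because it is Perl syntax that Python's re module does not support. The dictionary keys, the helper matched_chars, and the sample string are illustrative names, and re.ASCII is assumed so that \w means [a-zA-Z0-9_].

```python
import re

# The three approaches from the list that Python's re module accepts.
VARIANTS = {
    "lookahead": r"(?!_)\w",      # refuse "_" before consuming a word character
    "lookbehind": r"\w(?<!_)",    # consume a word character, then reject "_"
    "negated class": r"[^\W_]",   # neither a non-word character nor "_"
}


def matched_chars(pattern, text):
    """Return every single character in text that the pattern matches."""
    return re.findall(pattern, text, re.ASCII)


sample = "a_B-9 _x"  # hypothetical input
per_variant = {name: matched_chars(p, sample) for name, p in VARIANTS.items()}
```

Each variant should pick out the letters and digits and skip the underscores, the hyphen, and the space.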
Even though it's the least readable, I'm going to use the last one since it's the best supported.
We now have
(?<! [^\W_] ) [^\W_]+ (?! [^\W_] )
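For readers not using Perl, here is a minimal sketch of the same pattern in Python. re.VERBOSE plays the role of the x flag, and re.ASCII is an assumption that keeps \w equal to [a-zA-Z0-9_]. The names WORD and words are hypothetical.

```python
import re

# Same pattern as above; re.VERBOSE ignores the spacing, like Perl's x flag.
# Assumption: re.ASCII restricts \w to [a-zA-Z0-9_].
WORD = re.compile(
    r"(?<! [^\W_] ) [^\W_]+ (?! [^\W_] )",
    re.VERBOSE | re.ASCII,
)


def words(text):
    """Return runs of letters and digits, treating "_" as a separator."""
    return WORD.findall(text)


found = words("foo_bar baz_ _qux 42")
```

Dropping re.ASCII would let \w also cover non-ASCII letters and digits, which may or may not be wanted depending on the text file.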