regex word-boundary

What is a word boundary in regex?


I'm trying to use regexes to match space-separated numbers, but I can't find a precise definition of \b ("word boundary"). I had assumed that -12 would be an "integer word" (matched by \b\-?\d+\b), but it appears that this does not work. I'd be grateful to know of ways of matching space-separated numbers.

[I am using Java regexes in Java 1.6]

Example:

// Pattern with \b placed before the optional minus sign
Pattern pattern = Pattern.compile("\\s*\\b\\-?\\d+\\s*");
String plus = " 12 ";
System.out.println(pattern.matcher(plus).matches());

String minus = " -12 ";
System.out.println(pattern.matcher(minus).matches());

// Same pattern with the \b removed
pattern = Pattern.compile("\\s*\\-?\\d+\\s*");
System.out.println(pattern.matcher(minus).matches());

This returns:

true
false
true

Solution

  • A word boundary, in most regex dialects, is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]).

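    So, in the string "-12", \b would match before the 1 or after the 2. The dash is not a word character, so in " -12 " there is no word boundary between the leading space and the dash (both are non-word characters). That is why the pattern with \b in front of \-? fails on that input.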
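
To make the definition above concrete, here is a minimal Java sketch. The class and method names (WordBoundaryDemo, boundaryPositions, isPaddedInteger, findIntegers) are hypothetical and only for illustration. It lists the positions where \b matches in "-12", and then shows two possible ways to match space-separated integers: moving \b after the optional minus sign, or replacing \b with whitespace lookarounds. The lookaround variant is a suggested alternative, not something the answer above prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // \b moved after the optional minus sign, so the boundary is checked
    // between '-' (or whitespace) and the first digit.
    static final Pattern WHOLE = Pattern.compile("\\s*-?\\b\\d+\\b\\s*");

    // Alternative without \b: the number must not be preceded or followed
    // by a non-whitespace character.
    static final Pattern TOKEN = Pattern.compile("(?<!\\S)-?\\d+(?!\\S)");

    // Returns every index in s at which \b matches.
    static List<Integer> boundaryPositions(String s) {
        List<Integer> positions = new ArrayList<Integer>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    // True if the whole string is one integer with optional surrounding whitespace.
    static boolean isPaddedInteger(String s) {
        return WHOLE.matcher(s).matches();
    }

    // Collects every whitespace-delimited integer token found in s.
    static List<String> findIntegers(String s) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(boundaryPositions("-12"));            // [1, 3]
        System.out.println(isPaddedInteger(" 12 "));             // true
        System.out.println(isPaddedInteger(" -12 "));            // true
        System.out.println(findIntegers("7 -12 x3 4y 100"));     // [7, -12, 100]
    }
}
```

The lookaround version treats "x3" and "4y" as non-numbers because each digit run touches a non-whitespace character, which is probably closer to what "space-separated" means here than \b alone.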