
Regex matching on word boundary OR non-digit


I'm trying to use a regex pattern (in Java) to find a run of exactly 3 digits in a row. A run of 4 digits shouldn't match, and neither should a run of 2.

The obvious pattern to me was:

"\b(\d{3})\b"

That matches against many source string cases, such as:

">123<"
" 123-"
"123"

But it won't match against a source string of "abc123def". Letters and digits are both word characters, so neither the c/1 transition nor the 3/d transition counts as the "word boundary" that \b expects.
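To make the behavior concrete, here is a minimal Java sketch that runs the pattern above against the sample strings. The class name `WordBoundaryDemo` and the helper `found` are made up for illustration; only `java.util.regex.Pattern` is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, used only to reproduce the behavior described above.
final class WordBoundaryDemo {
    // The pattern from the question; every regex backslash is doubled in Java source.
    static final String WORD_BOUNDARY_PATTERN = "\\b(\\d{3})\\b";

    // Returns true if the regex matches anywhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(WORD_BOUNDARY_PATTERN, ">123<"));     // true
        System.out.println(found(WORD_BOUNDARY_PATTERN, " 123-"));     // true
        System.out.println(found(WORD_BOUNDARY_PATTERN, "123"));       // true
        // c/1 and 3/d are word-character to word-character transitions,
        // so \b does not match at either position.
        System.out.println(found(WORD_BOUNDARY_PATTERN, "abc123def")); // false
    }
}
```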

I would have expected the solution to be adding a character class that includes both non-Digit (\D) and the word boundary (\b). But that appears to be illegal syntax.

"[\b\D](\d{3})[\b\D]"

Does anybody know what I could use as an expression that would extract "123" for a source string situation like:

"abc123def"

I'd appreciate any help. And yes, I realize that in Java one must double-escape codes like \b to \\b, but that's not my issue and I didn't want to limit this to Java folks.


Solution

  • You should use lookarounds for those cases:

    (?<!\d)(\d{3})(?!\d)
    

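    This matches exactly 3 digits that are neither preceded nor followed by another digit. Unlike \D, the lookarounds consume no characters, so they also succeed at the very start and end of the string.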

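As a rough sketch of how the lookaround pattern could be used from Java, the example below collects every run of exactly three digits in a string. The class `ThreeDigitFinder` and its method `findAll` are hypothetical names introduced here, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only to illustrate the pattern.
final class ThreeDigitFinder {
    // (?<!\d) : the position is not preceded by a digit
    // (\d{3}) : capture exactly three digits
    // (?!\d)  : the position is not followed by a digit
    // Every regex backslash is doubled in Java source.
    private static final Pattern EXACTLY_THREE_DIGITS =
            Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    // Returns every run of exactly three digits found in the input, in order.
    static List<String> findAll(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = EXACTLY_THREE_DIGITS.matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def"));   // [123]
        System.out.println(findAll(">123<"));       // [123]
        System.out.println(findAll("a1234b"));      // []
        System.out.println(findAll("12 345 6789")); // [345]
    }
}
```

Because both lookarounds are zero-width, the captured group and the overall match are the same three characters, so `matcher.group()` would work equally well here.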