java regex word-boundaries

Java RegEx Syntax with Word Boundaries?


Firstly, my syntax will not be part of a script as such but it will be parsed via a form input--so any 'existing' solution pointing to Java code will not apply per se.

Okay, so here is what I need to do: I need to be able to input a term like:

'This is your airport and this is your car.' into an input field so that only the word 'airport' or 'airports' is matched. Nothing like '99airport' or 'airport99' should match. And I am close!

(?i).*\bair[port|ports].*

If I input the above as RegEx in a test site:

http://www.ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/#!;t=123-45-6789%0A9876-5-4321%0A987-65-4321%20(attack)%0A987-65-4321%20%0A192-83-7465&r=(%3Fm)%5E(%5Cd%7B3%7D-%3F%5Cd%7B2%7D-%3F%5Cd%7B4%7D)%24&x=Found%20good%20SSN%3A%20%241

then, indeed, '99airport' does not match because of the leading word boundary \b. However, I don't know how to put a \b at the end of the word so that 'airport99' also does not match. I have tried a few things with no luck. I think it is the syntax around the [] that needs to be figured out.

And please don't pay too much attention to what needs to be matched or not--these are just random words. Currently, if my input has 'airport99' it does get matched but it shouldn't if I can figure out a solution.

Thanks!


Solution

  • I see you are using Matcher.matches() to check for a word inside the input string. Because matches() requires the whole input to match the pattern, you need the .* before and after the keyword. Since the text comes from an input field, you do not need to match newline characters, so there is no need for the (?s) single-line/DOTALL modifier.

    However, you are confusing character classes ([...]) with groups ((...)). A character class matches exactly one character. For example, [port|ports] matches one character: p, o, r, t, |, or s. Groups can match specific sequences of characters, e.g. (port|ports) matches either port or ports.

    Thus, in your case, you can use

    (?i).*\bairports?\b.*
    

    or, less efficiently,

    (?i).*\bair(port|ports)\b.*
    

    In Java source code the backslashes must be doubled: String patrn = "(?i).*\\bairports?\\b.*";
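
To check the difference between the two patterns outside a form, here is a minimal sketch. The class name WordBoundaryDemo, the helper methods and the sample strings are made up for illustration; only the two regexes come from the question and the answer above.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Pattern from the answer: optional trailing "s", word boundary on both sides.
    static final Pattern FIXED = Pattern.compile("(?i).*\\bairports?\\b.*");

    // Pattern from the question, kept only for comparison.
    // [port|ports] is a character class, so it matches ONE of p, o, r, t, |, s.
    static final Pattern ORIGINAL = Pattern.compile("(?i).*\\bair[port|ports].*");

    // matches() must consume the whole input, hence the .* on both ends.
    static boolean containsAirport(String input) {
        return FIXED.matcher(input).matches();
    }

    static boolean originalMatches(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is your airport and this is your car.",
            "Two AIRPORTS nearby",
            "99airport",
            "airport99",
            "airs"
        };
        for (String s : samples) {
            System.out.println(s + " -> fixed=" + containsAirport(s)
                    + ", original=" + originalMatches(s));
        }
    }
}
```

With these samples, the fixed pattern accepts only the first two strings, while the original pattern also accepts 'airport99' and 'airs', because the character class consumes a single character and nothing constrains what follows it.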