java regex string

Java Pattern for Word without Spaces


I am wondering what the regex for a word would be; I can't seem to find it anywhere. The string I'm trying to match is "loop-num + 5", and I want to extract the "loop-num" part. I am unsure what the regex would be to do so.

Pattern pattern = Pattern.compile("(loop-.*)");
Matcher matcher = pattern.matcher("5 * loop-num + 5");
if(matcher.find()){
    String extractedString = matcher.group(1);
    System.out.println(extractedString);
}

From this I get: "loop-num + 5"


Solution

  • If you really plan to use the regex to match words (sequences of letters, optionally joined by hyphens), you can use the following regex:

    \b\pL+(?:-\pL+)*\b
    


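    Explanation:

      • \b - a word boundary
      • \pL+ - one or more Unicode letters
      • (?:-\pL+)* - zero or more sequences of a hyphen followed by one or more letters
      • \b - a trailing word boundary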

    In Java:

    Pattern pattern = Pattern.compile("\\b\\pL+(?:-\\pL+)*\\b", Pattern.UNICODE_CHARACTER_CLASS);
    Matcher matcher = pattern.matcher("5 * loop-num + 5");
    if(matcher.find()){
        String extractedString = matcher.group(0);
        System.out.println(extractedString);
    }
    

    Note: if words may also contain digits (but not as the first character of any hyphen-separated part), you can use \b\pL\w*(?:-\pL\w*)*\b with Pattern.UNICODE_CHARACTER_CLASS. Here, \w matches letters, digits, and underscores.
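
    As a rough sketch of the digit-tolerant variant from the note, the snippet below collects every match instead of only the first one. The class name WordExtractor, the method findWords, and the sample input are made up for illustration; only the pattern itself comes from the note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Pattern from the note: each hyphen-separated part starts with a letter
    // and may continue with letters, digits, or underscores.
    private static final Pattern WORD = Pattern.compile(
            "\\b\\pL\\w*(?:-\\pL\\w*)*\\b", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: returns all words found in the input, in order.
    static List<String> findWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(input);
        while (matcher.find()) {
            words.add(matcher.group()); // group() is the whole match, same as group(0)
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample input is an assumption, adapted from the question's string.
        System.out.println(findWords("5 * loop-num2 + count")); // prints [loop-num2, count]
    }
}
```

    Compiling the pattern once into a static field avoids recompiling it on every call, which matters if you run it over many expressions.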