java regex customization

Can I define custom character class shorthands?


Java provides some useful character classes like \d and \w. Can I define my own character classes? For example, it would be useful to be able to define shorthands for character classes like [A-Za-z_].


Solution

  • Can I define my own character classes?

    No, you can't.

    Personally, when I have a (slightly) complicated regex, I break it up into smaller sub-regexes and then "glue" them together with String.format(...), like this:

    public static boolean isValidIP4(String address) {
        // One octet, 0-255: 250-255, 200-249, 100-199, then 0-99 without a leading zero.
        String block_0_255 = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
        String regex = String.format(
                "%s(\\.%s){3}",
                block_0_255, block_0_255
        );
        return address.matches(regex);
    }
    

    which is far more readable than a single pattern:

    "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}"
    

    Note that this is just a quick example: validating IP addresses is probably better done by a class from the java.net package. If you do use a regex like this, build the pattern outside the method and pre-compile it (for example, into a static final Pattern), so it is not rebuilt and recompiled on every call.

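    Be careful with % signs inside your pattern! String.format(...) treats % as the start of a format specifier, so a literal % in the regex has to be written as %%.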
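
The two closing notes (pre-compiling the pattern and watching out for % signs) can be sketched as follows. This is only an illustration: the class name RegexFragments, the constant names, and the "name=NN%" pattern are made up for the example, and the octet fragment is just one way to cover 0-255. The WORD_CHAR fragment reuses the [A-Za-z_] class from the question to show how a named constant can stand in for a custom shorthand.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative.
class RegexFragments {

    // Named fragments that play the role of "custom shorthands".
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final String WORD_CHAR = "[A-Za-z_]";

    // Compiled once when the class is loaded, not on every call.
    private static final Pattern IPV4 =
            Pattern.compile(String.format("%s(\\.%s){3}", OCTET, OCTET));

    // A literal percent sign in the regex is written as %% in the format
    // string; the resulting regex is [A-Za-z_]+=\d{1,3}%
    private static final Pattern PERCENT_SETTING =
            Pattern.compile(String.format("%s+=\\d{1,3}%%", WORD_CHAR));

    static boolean isValidIPv4(String address) {
        return IPV4.matcher(address).matches();
    }

    static boolean isPercentSetting(String text) {
        return PERCENT_SETTING.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidIPv4("192.168.0.1"));
        System.out.println(isValidIPv4("256.1.1.1"));
        System.out.println(isPercentSetting("width=50%"));
    }
}
```

Plain string concatenation (OCTET + "(\\." + OCTET + "){3}") avoids the % issue entirely, at the cost of a slightly noisier expression.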