java regex string

Use regex in Java with non-printable chars


I'm using a regex found here (link) to extract a domain string, and it works fine.

The regex is:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$

I'm wondering how I could change it to match a domain that contains a non-printable character instead of the dot (.).

I know that regex escape codes look like \x01, \x02, etc., but if I replace the dot with one of them, the regex no longer matches.

Thanks in advance.


Solution

  • . matches any single character, printable or not, except line terminators (unless Pattern.DOTALL is set). What restricts the labels in your pattern is the character class [A-Za-z0-9-]. You could change it to "any character except a literal dot", i.e. [^.].

    // Needs: import java.util.regex.Pattern;
    // Each label may now contain any character except a literal dot.
    Pattern regex = Pattern.compile("^((?!-)[^.]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");
    System.out.println(regex.matcher("\u0001\u0002\u0003\u0004..com").find()); // => false
    System.out.println(regex.matcher("\u0001\u0002\u0003\u0004.com").find()); // => true
    System.out.println(regex.matcher("google.com").find()); // => true
    

    If you're attempting to validate user-entered IDNs (internationalized domain names), note that there are new gTLDs that contain non-ASCII characters, for example .شبكة (.network), which [A-Za-z]{2,6} will not match.
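If the goal is instead to keep the original label rules and only swap the separating dot for a non-printable character, something along these lines might work. This is a minimal sketch, not a definitive fix: the question does not show the failing attempt, so the choice of U+0001 as the separator is an assumption, and the class name SeparatorDemo and the helper isMatch are made up for illustration. Note that inside a Java string literal the regex escape has to be written with a doubled backslash ("\\x01"), just like "\\." in the original pattern.

```java
import java.util.regex.Pattern;

class SeparatorDemo {
    // Same pattern as in the question, but the label separator "\\." is
    // replaced by "\\x01" (assumed separator: the control character U+0001).
    static final Pattern DOMAIN = Pattern.compile(
            "^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\x01)+[A-Za-z]{2,6}$");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean isMatch(String input) {
        return DOMAIN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMatch("google\u0001com"));           // U+0001 as separator
        System.out.println(isMatch("www\u0001example\u0001org")); // several labels
        System.out.println(isMatch("google.com"));                // dot is no longer accepted
    }
}
```

To accept any ASCII control character as the separator rather than one specific code, the "\\x01" part could be swapped for "\\p{Cntrl}", which Java's Pattern defines as [\x00-\x1F\x7F].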