I'm using a regex found here (link) to extract a domain string, and it works fine.
The regex is:
^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$
I'm wondering how I could change it to match a domain that contains a non-printable character in place of the dot (.).
I know that regex escapes for such characters look like \x01, \x02, etc., but if I replace the dot with one of them, the regex no longer matches.
Thanks in advance.
An unescaped . matches any single character, printable or not (in Java it skips line terminators unless DOTALL is set). It is your current character class, [A-Za-z0-9-], that restricts what a label may contain. You could change it to "any character except a literal dot", i.e. [^.]:
Pattern regex = Pattern.compile("^((?!-)[^.]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004..com").find()); // => false
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004.com").find()); // => true
System.out.println(regex.matcher("google.com").find()); // => true
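If the goal is instead to accept a control character as the separator itself, one option is to replace the escaped dot with a small character class. The following is only a sketch under assumptions: the class name SeparatorDemo, the helper isDomain, and the choice of U+0001 as the alternative separator are all hypothetical. Note that inside a Java string literal the regex escape needs a doubled backslash ("\\x01"); a single backslash followed by x is not a valid Java string escape.

```java
import java.util.regex.Pattern;

class SeparatorDemo {
    // Hypothetical variant of the original pattern: labels keep the original
    // character class, but the separator may be a literal dot or U+0001.
    static final Pattern DOMAIN =
        Pattern.compile("^((?!-)[A-Za-z0-9-]{1,63}(?<!-)[.\\x01])+[A-Za-z]{2,6}$");

    static boolean isDomain(String s) {
        return DOMAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDomain("google.com"));      // dot separator
        System.out.println(isDomain("google\u0001com")); // U+0001 separator
        System.out.println(isDomain("google\u0002com")); // U+0002 is not accepted
    }
}
```

To accept a wider range of control characters, the separator class could be widened (for example [.\\x01-\\x1F]), at the cost of matching more input than you may want.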
If you're attempting to validate user entry of IDNs (internationalized domain names), note that there are new gTLDs that contain non-ASCII characters, for example .شبكة (.network). The final [A-Za-z]{2,6} part of the pattern will not match those.
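For IDN input, one possible approach is to convert the host name to its ASCII (Punycode) form with the standard java.net.IDN class before validating it. This is a rough sketch, not a complete validator; the class name IdnDemo and the helper toAscii are made up for illustration.

```java
import java.net.IDN;

class IdnDemo {
    // Converts a Unicode host name to its ASCII-compatible form.
    // IDN.toASCII throws IllegalArgumentException if the input cannot be converted.
    static String toAscii(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "\u0634\u0628\u0643\u0629" is the Arabic TLD from the example above,
        // written with Unicode escapes to avoid source-encoding issues.
        System.out.println(toAscii("example.\u0634\u0628\u0643\u0629"));
    }
}
```

Converted non-ASCII labels start with "xn--" and contain digits and hyphens, so the TLD part of the regex would still need to be loosened beyond [A-Za-z]{2,6} to accept them.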