
In Java, how do I split a String on the null char without using regex?


I have code of the form

 String[] splitValues = s.split("\\u0000");

that is called a lot. When I profiled it, I saw that each call compiled and ran a regex (Pattern), and this was causing a significant performance impact.

I can easily compile the pattern just once, but running split still takes up significant CPU.
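For reference, compiling the pattern once might look like the sketch below. The class and method names (`PrecompiledSplit`, `splitOnNul`) are hypothetical. It removes the per-call `Pattern.compile`, but every call still goes through the regex matcher, which fits the remaining CPU cost described above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused; "\\u0000" is the regex escape for U+0000 (NUL).
    private static final Pattern NUL = Pattern.compile("\\u0000");

    // Hypothetical helper: no recompilation per call,
    // but the regex matcher still runs every time.
    static String[] splitOnNul(String s) {
        return NUL.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNul("x\0y\0z"))); // [x, y, z]
    }
}
```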

I then looked at the code for String.split(). It has a fastpath when the argument is a single non-metacharacter, or a backslash followed by one char that is not an ASCII letter or digit. That does not help me because I specify null as \u0000, and I can't see how else to write it:

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.length() == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            // ... fastpath: splits on ch with indexOf(), no Pattern involved ...
        }
        // ... otherwise: return Pattern.compile(regex).split(this, limit);
    }
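To see which separator strings satisfy the fastpath test quoted above, here is a hypothetical standalone re-implementation of that condition. `usesFastpath` is not a JDK method, and other JDK versions may structure `split` differently. Bit tricks such as `((ch-'0')|('9'-ch)) < 0` simply mean "ch is outside '0'..'9'", so they are written as plain comparisons here.

```java
public class FastpathCheck {
    // Hypothetical re-implementation of the fastpath condition quoted from String.split.
    static boolean usesFastpath(String regex) {
        char ch;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            // (1) a single char that is not a regex metacharacter
            if (".$|()[{^?*+\\".indexOf(ch) != -1) return false;
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            // (2) backslash followed by a char that is not an ASCII digit or letter
            if ((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
                return false;
            }
        } else {
            // Any other length (e.g. the six-char "\\u0000") goes to Pattern
            return false;
        }
        // Surrogate chars are excluded in either case
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(usesFastpath("\\u0000")); // false: six chars long
        System.out.println(usesFastpath("\0"));      // true: one non-meta char
        System.out.println(usesFastpath("\\|"));     // true: backslash + punctuation
    }
}
```

The separator `"\\u0000"` is six characters long, so it fails both branches and `split` falls back to `Pattern`.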

How can I split on a null separator without needing a regular expression?
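One regex-free alternative, independent of `split`'s internal fastpath, is to scan with `String.indexOf(int, int)` directly. The sketch below (hypothetical names `NulSplit`, `splitOnChar`) mimics `String.split` with the default limit of 0: if the separator never occurs, the whole input comes back, and trailing empty strings are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NulSplit {
    // Hypothetical helper: splits s on every occurrence of sep using
    // String.indexOf(int, int); java.util.regex is never involved.
    static String[] splitOnChar(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int next;
        while ((next = s.indexOf(sep, start)) != -1) {
            parts.add(s.substring(start, next));
            start = next + 1;
        }
        if (start == 0) {
            // No separator found: return the whole input, as String.split does
            return new String[] { s };
        }
        parts.add(s.substring(start));
        // Drop trailing empty strings, matching String.split with limit 0
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "alpha\0beta\0\0gamma\0";
        String[] parts = splitOnChar(s, '\0');
        System.out.println(Arrays.toString(parts));             // [alpha, beta, , gamma]
        System.out.println(Arrays.equals(parts, s.split("\0"))); // true
    }
}
```

This does not rely on an implementation detail of `String.split`, at the cost of maintaining a small helper.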


Solution

  • Replacing

    String[] splitValues = s.split("\\u0000");
    

    with

    String[] splitValues = s.split("\0");
    

    continues to work and, importantly, lets String.split() use its fastpath: "\0" is a one-char String containing U+0000, which is not a regex metacharacter, so the split no longer requires a regular expression.

    What I find slightly confusing is why I had \\ originally: doesn't that mean the \ is treated as a literal backslash, so u0000 would not be treated as a Unicode char?
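The two literals differ at the Java-source level. `"\\u0000"` is the six-character string backslash, `u`, `0`, `0`, `0`, `0`, while `"\0"` is an octal escape for the single character U+0000. The first still works because `java.util.regex.Pattern` itself interprets the `\uhhhh` escape, so the regex engine (not the compiler) turns it into NUL; being six characters long, it can never qualify for the fastpath. A small demonstration (class name hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class NulLiteralDemo {
    public static void main(String[] args) {
        // Backslash, 'u', '0', '0', '0', '0': the compiler keeps the backslash,
        // and the regex engine later decodes the escape.
        String regexEscape = "\\u0000";
        // Octal escape: a single char, U+0000.
        String nulChar = "\0";

        System.out.println(regexEscape.length()); // 6
        System.out.println(nulChar.length());     // 1

        // Pattern understands the hex escape, so the 6-char regex matches one NUL.
        System.out.println(Pattern.matches(regexEscape, nulChar)); // true

        // Both separators give the same split result.
        String s = "a\0b\0c";
        System.out.println(Arrays.equals(s.split(regexEscape), s.split(nulChar))); // true
    }
}
```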