I have code of the form
String[] splitValues = s.split("\\u0000");
that is called a lot. When I profiled it, I saw that each call compiles and then runs a regex (Pattern), and this was causing a significant performance impact.
I can easily compile the pattern just once, but then running split still takes up significant CPU.
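For reference, compiling once looks roughly like the sketch below. The class name PrecompiledNulSplit and the method splitOnNul are made up for illustration and are not from my real code. This removes the repeated compilation, but every call still runs the regex matcher.

```java
import java.util.regex.Pattern;

public class PrecompiledNulSplit {
    // Compiled once and reused. The regex engine reads the six-character
    // escape (backslash, 'u', four zeros) as the NUL character.
    private static final Pattern NUL = Pattern.compile("\\u0000");

    // Hypothetical helper: splits on NUL using the shared Pattern.
    public static String[] splitOnNul(String s) {
        return NUL.split(s);
    }

    public static void main(String[] args) {
        String[] parts = splitOnNul("artist\0album\0title");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```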
I then looked at the code for String.split(). It has a fastpath when passed a single char, or a backslash followed by a single char, but that is not kicking in for me because I specify null as "\\u0000", which is longer than two chars.
I can't see how else I can do it:
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.length() == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
How can I split by a null separator without needing to use a regular expression?
Replacing
String[] splitValues = s.split("\\u0000");
with
String[] splitValues = s.split("\0");
continues to work, but importantly allows String.split() to use its fastpath, so the split works without requiring the use of regular expressions.
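A quick way to convince yourself that the two forms split identically is a check like the one below. This is only a sketch: the class name NulSplitCheck and the method sameResult are invented for this example, and it compares results only, it does not measure performance.

```java
import java.util.Arrays;

public class NulSplitCheck {
    // Hypothetical helper: true when both separators split s into the same pieces.
    public static boolean sameResult(String s) {
        // Six-character regex; too long for the fastpath, so it goes through Pattern.
        String[] viaRegex = s.split("\\u0000");
        // One-character string holding NUL itself; eligible for the fastpath.
        String[] viaChar = s.split("\0");
        return Arrays.equals(viaRegex, viaChar);
    }

    public static void main(String[] args) {
        System.out.println(sameResult("a\0b\0\0c"));
    }
}
```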
What I find slightly confusing is why I had the \\ originally: doesn't that mean the \ is treated as a literal backslash, so that u0000 would not be treated as a Unicode char?
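As far as I can tell, the explanation is that there are two layers of escaping. In Java source, "\\u0000" produces a six-character string (a literal backslash followed by u0000), and the regex engine then interprets that sequence itself as the character with code 0, since java.util.regex.Pattern has its own \uhhhh escape. "\0", on the other hand, is resolved by the compiler into a one-character string. A small sketch to check this (the class and member names are made up for illustration):

```java
import java.util.regex.Pattern;

public class EscapeLengths {
    // Literal backslash followed by 'u' and four zeros: six characters in total.
    public static final String REGEX_FORM = "\\u0000";
    // Octal escape resolved by the compiler: a single NUL character.
    public static final String CHAR_FORM = "\0";

    // Hypothetical helper: does the six-character regex match a lone NUL?
    public static boolean regexMatchesNul() {
        return Pattern.matches(REGEX_FORM, CHAR_FORM);
    }

    public static void main(String[] args) {
        System.out.println(REGEX_FORM.length());
        System.out.println(CHAR_FORM.length());
        System.out.println(regexMatchesNul());
    }
}
```

So both forms end up splitting on the same character; the difference is only whether the regex engine or the compiler does the translation.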