I'm preparing for the OCPJP exam and I ran into the following example:
class Test {
    public static void main(String args[]) {
        String test = "I am preparing for OCPJP";
        String[] tokens = test.split("\\S");
        System.out.println(tokens.length);
    }
}
This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain what the split() method actually does in this case? I just don't get it...
It splits on every match of "\\S", which the regex engine sees as \S: a single non-whitespace character.
So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character, let's iterate over the characters and mark each place where a split happens (we will use a pipe | for that).
'x' non-whitespace? YES, so let's mark it: | x
' ' non-whitespace? NO, so we leave it as is
'x' non-whitespace? YES, so let's mark it: | |
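If you want to see those marks for any input, here is a minimal sketch. It is only an illustration: it does not split anything, it just replaces every \S match with a pipe using String.replaceAll. The class name SplitMarks and the helper mark are made up for this example.

```java
class SplitMarks {
    // Hypothetical helper: replaces every non-whitespace character
    // (every place where split("\\S") would cut) with a pipe.
    static String mark(String input) {
        return input.replaceAll("\\S", "|");
    }

    public static void main(String[] args) {
        System.out.println(mark("x x"));                      // | |
        System.out.println(mark("I am preparing for OCPJP")); // | || ||||||||| ||| |||||
    }
}
```

Every pipe is one match of \S, so it is one place where the string gets cut.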
So as a result we need to split our string at the start and at the end, which initially gives us the result array
["", " ", ""]
   ^    ^ - here we split
But since trailing empty strings are removed, the result would be
[""," "]    <- result
       ,""] <- removed trailing empty string
so split returns the array ["", " "], which contains only two elements.
BTW, to turn off the removal of trailing empty strings you need to use split(regex, limit) with a negative limit, like split("\\S", -1).
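A small sketch to check the "x x" case yourself, with and without the negative limit. The class name SplitLimitDemo and its two helper methods are made up for this example; only String.split is the real API.

```java
class SplitLimitDemo {
    // Default behaviour: trailing empty strings are dropped.
    static String[] splitDefault(String s) {
        return s.split("\\S");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String s) {
        return s.split("\\S", -1);
    }

    public static void main(String[] args) {
        System.out.println(splitDefault("x x").length);      // 2 -> ["", " "]
        System.out.println(splitKeepTrailing("x x").length); // 3 -> ["", " ", ""]
    }
}
```

Note that the leading empty string stays in both cases; only the trailing ones are affected by the limit.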
Now let's get back to your example. With your data you are splitting at each of the marked positions:
I am preparing for OCPJP
| || ||||||||| ||| |||||
which means
""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""
So this represents this array
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
but since trailing empty strings "" are removed (if their existence was caused by split - more info at: Confusing output from String.split)
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
                                                     ^^ ^^ ^^ ^^ ^^
you get as a result an array which contains only this part:
[""," ",""," ","","","","","","","",""," ","",""," "]
which is exactly 16 elements.
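To double-check the counting, here is a rough sketch that runs the split on your sentence with the default limit (0) and with a negative one. The class name SplitCountDemo and the helper countPieces are made up for this example.

```java
class SplitCountDemo {
    // Hypothetical helper: number of pieces produced by splitting on \S
    // with the given limit (0 behaves the same as split(regex)).
    static int countPieces(String s, int limit) {
        return s.split("\\S", limit).length;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        System.out.println(countPieces(test, 0));  // 16: trailing empty strings removed
        System.out.println(countPieces(test, -1)); // 21: all pieces kept
    }
}
```

The sentence has 20 non-whitespace characters, so there are 20 cuts and 21 pieces; dropping the 5 trailing empty strings leaves the 16 you observed.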