```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceDemo {
    public static void main(String[] args) {
        String[] s = {"\\s", "\\S"};
        // "sabc", two TAB characters, then "abc dfg"
        String tobeMatched = "sabc" + "\t" + "\t" + "abc dfg";
        for (int i = 0; i < s.length; i++) {
            System.out.println("");
            Pattern p2 = Pattern.compile(s[i]);
            Matcher m2 = p2.matcher(tobeMatched);
            System.out.println("expression:" + m2.pattern());
            System.out.println(tobeMatched);
            System.out.println("012345678901234567890123456789");
            System.out.print("Position found:");
            // print the start index of every match
            while (m2.find()) {
                System.out.print(m2.start() + " ");
            }
        }
    }
}
```
If you look at the output below, when the tobeMatched string is printed each TAB character takes up 4 spaces, but matcher.find() counts each TAB as only one position, so the second TAB is reported at position 5. I was expecting the whitespace, including the TABs, to be found at positions 4, 8 and 9 by matcher.start().
Can someone explain the logic matcher.find() and matcher.start() use for the TAB character here?
Output:

```
expression:\s
sabc abc dfg
012345678901234567890123456789
Position found:4 5 9
expression:\S
sabc abc dfg
012345678901234567890123456789
Position found:0 1 2 3 6 7 8 10 11 12
```
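I think you misunderstood `\s`. It does not stand for "space" but for "whitespace character". In `java.util.regex.Pattern` it is shorthand for the class `[ \t\n\x0B\f\r]`, so it matches tabs as well as spaces.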
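One way to see the difference is to run the same string against a literal space, `\t` and `\s` and compare the start indices. This is a minimal sketch; the class name `SpaceVsWhitespace` and the helper `positions` are illustrative and not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceVsWhitespace {
    // Illustrative helper: collects the start index of every match of regex in input.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String tobeMatched = "sabc" + "\t" + "\t" + "abc dfg";
        System.out.println("' ' -> " + positions(" ", tobeMatched));   // [9]
        System.out.println("\\t -> " + positions("\\t", tobeMatched)); // [4, 5]
        System.out.println("\\s -> " + positions("\\s", tobeMatched)); // [4, 5, 9]
    }
}
```

A literal space finds only index 9, `\t` finds only the two tabs, and `\s` finds all three.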
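Space and tab are each a single whitespace character: in ASCII, space has the character code 0x20 and tab has 0x09. The matcher works on character indices in the String, not on columns on the screen, so how wide your console draws a tab has no effect on what matcher.start() returns. The two tabs sit at indices 4 and 5, and the space at index 9.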
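To make the index-versus-column distinction concrete, the sketch below prints the string length and the character codes, then converts a character index into the column where a terminal would draw it. The helper `displayColumn` is hypothetical, and it assumes tab stops every 4 columns, as the asker's console appears to use; many terminals default to 8.

```java
public class TabIndexVsColumn {
    // Hypothetical helper: the on-screen column at which the char at `index`
    // starts, assuming each tab advances to the next multiple of tabWidth.
    static int displayColumn(String text, int index, int tabWidth) {
        int col = 0;
        for (int i = 0; i < index; i++) {
            if (text.charAt(i) == '\t') {
                col = (col / tabWidth + 1) * tabWidth; // jump to next tab stop
            } else {
                col++;                                 // ordinary char: one column
            }
        }
        return col;
    }

    public static void main(String[] args) {
        String tobeMatched = "sabc" + "\t" + "\t" + "abc dfg";
        System.out.println(tobeMatched.length()); // 13: each tab is one char
        System.out.printf("space = 0x%02X, tab = 0x%02X%n", (int) ' ', (int) '\t');
        // prints: space = 0x20, tab = 0x09
        for (int index : new int[] {4, 5, 9}) {
            System.out.println("index " + index + " -> column "
                    + displayColumn(tobeMatched, index, 4));
        }
        // index 4 -> column 4, index 5 -> column 8, index 9 -> column 15
    }
}
```

The regex engine only ever sees the indices on the left; the columns on the right are purely a property of how the console renders tabs.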