I am trying to build a parser for a Sonar plugin in which spaces and tabs are kept as tokens, so that I can implement a rule that checks spacing. For that, I want to store spaces and tabs as different token types.
I defined a TokenType for each of them:
.withChannel(regexp(TokenType.TAB, "\t"))
.withChannel(regexp(TokenType.WHITESPACE, "\\s"))
But tabs are turned into whitespace tokens as well, because in Java the regexp \s matches any whitespace character (space, tab, line break, carriage return).
What's the right regexp to discriminate tabs from spaces?
With:
.withChannel(new BlackHoleChannel("\n")) //removes newlines from source code
.withChannel(regexp(TclTokenType.TAB, "\t")) //matches tabs
.withChannel(regexp(TokenType.WHITESPACE," ")) //matches spaces
Spaces are matched correctly, and tabs are recognized. The key is the BlackHoleChannel.
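The difference between the two patterns can be checked outside the plugin with plain java.util.regex. The following is only a minimal sketch and does not use the lexer's channel API: the class WhitespaceTokens, the enum Kind and the method classify are hypothetical names made up for this illustration, and dropping "\n" with String.replace merely imitates what the BlackHoleChannel does above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceTokens {
    // Hypothetical token kinds, for illustration only (not the plugin's TokenType).
    public enum Kind { TAB, SPACE, OTHER }

    // One named group per kind: a tab, a literal space, or a run of anything else.
    private static final Pattern TOKEN =
        Pattern.compile("(?<tab>\\t)|(?<space> )|(?<other>[^\\t ]+)");

    public static List<Kind> classify(String source) {
        List<Kind> kinds = new ArrayList<>();
        // Drop newlines first, imitating the BlackHoleChannel("\n") step.
        Matcher m = TOKEN.matcher(source.replace("\n", ""));
        while (m.find()) {
            if (m.group("tab") != null) {
                kinds.add(Kind.TAB);
            } else if (m.group("space") != null) {
                kinds.add(Kind.SPACE);
            } else {
                kinds.add(Kind.OTHER);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println("\t".matches("\\s")); // true: \s also matches a tab
        System.out.println("\t".matches(" "));   // false: a literal space does not
        System.out.println(classify("a\tb c\n")); // [OTHER, TAB, OTHER, SPACE, OTHER]
    }
}
```

A literal space in the pattern matches only the space character, so tabs and spaces end up in separate kinds regardless of the order in which the alternatives are tried.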
This is FILIaS's solution from revision 15 of the question.