java regex

RegExp for whitespaces without tabs


I am building a lexer for a Sonar plugin in which spaces and tabs are emitted as tokens, so that I can use them to implement a rule that checks spacing. Therefore, I want spaces and tabs to be stored as different token types.

I defined separate token types for tabs and spaces:

    .withChannel(regexp(TokenType.TAB, "\t"))
    .withChannel(regexp(TokenType.WHITESPACE, "\\s"))

But tabs are also tokenized as WHITESPACE, because in Java the regex \s matches any whitespace character: space, tab, newline, carriage return, form feed, and vertical tab.
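A quick check with java.util.regex shows the overlap. This standalone demo is not part of the plugin; the class and method names are made up for illustration. It compares the regex \s with a regex that is just a literal space:

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // True if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] samples = {" ", "\t", "\n", "\r"};
        String[] names = {"space", "tab", "newline", "carriage return"};
        for (int i = 0; i < samples.length; i++) {
            // "\\s" is the Java string literal for the regex \s
            System.out.printf("%-16s \\s: %-5b  \" \": %b%n",
                    names[i], matches("\\s", samples[i]), matches(" ", samples[i]));
        }
    }
}
```

\s matches all four samples, while the literal space matches only the space.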

What's the right regexp to discriminate tabs from spaces?
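If the goal is to keep matching every other whitespace character and exclude only tabs, Java's character-class syntax can say "whitespace except tab" directly. This is an alternative to replacing \s with a literal space, not what the original solution used. The class, pattern, and method names below are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonTabWhitespace {
    // Intersection: characters in \s AND not a tab.
    static final Pattern NON_TAB_WS = Pattern.compile("[\\s&&[^\\t]]");

    // Equivalent spelling: not (non-whitespace or tab).
    static final Pattern NON_TAB_WS_ALT = Pattern.compile("[^\\S\\t]");

    // Counts how many characters of the input the pattern finds.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "a\tb c\n";
        System.out.println(count(NON_TAB_WS, line));     // 2: the space and the newline
        System.out.println(count(NON_TAB_WS_ALT, line)); // 2: same characters
    }
}
```

Either pattern could be passed where "\\s" was used, but note that it still matches newlines, so it does not by itself produce a "spaces only" token.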


Solution

  • With:

    .withChannel(new BlackHoleChannel("\n"))          // removes newlines from the source code
    .withChannel(regexp(TokenType.TAB, "\t"))         // matches tabs
    .withChannel(regexp(TokenType.WHITESPACE, " "))   // matches spaces only
    

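    Spaces and tabs are now recognized as separate tokens. The key is the BlackHoleChannel: because the whitespace channel now matches only a literal space, newlines need a channel of their own, and the BlackHoleChannel consumes them without producing tokens.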

    This is FILIaS's solution from revision 15 of the question.
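The channel idea can be imitated outside Sonar to see why the ordering and the newline channel matter. The following is a sketch in plain java.util.regex, not the SSLR lexer API. It assumes, as the configuration above suggests, that channels are tried in the order they are added, and at each position it tries one pattern per channel. The Kind enum, Token record, and tokenize method are hypothetical names; OTHER stands in for whatever other channels a real grammar would have. It needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChannelSketch {
    // Hypothetical token kinds for this sketch only.
    enum Kind { TAB, WHITESPACE, OTHER }

    record Token(Kind kind, String text) {}

    // One pattern per "channel", tried in this order at every position.
    private static final Pattern NEWLINE = Pattern.compile("\\r\\n|\\r|\\n"); // discarded
    private static final Pattern TAB = Pattern.compile("\\t");
    private static final Pattern SPACE = Pattern.compile(" ");               // literal space, not \s
    private static final Pattern OTHER = Pattern.compile("[^ \\t\\r\\n]+");

    // Returns the end index if p matches at pos, otherwise -1.
    private static int match(Pattern p, String s, int pos) {
        Matcher m = p.matcher(s).region(pos, s.length());
        return m.lookingAt() ? m.end() : -1;
    }

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int end;
            if ((end = match(NEWLINE, source, pos)) >= 0) {
                // Like the BlackHoleChannel: consume the newline, emit no token.
            } else if ((end = match(TAB, source, pos)) >= 0) {
                tokens.add(new Token(Kind.TAB, source.substring(pos, end)));
            } else if ((end = match(SPACE, source, pos)) >= 0) {
                tokens.add(new Token(Kind.WHITESPACE, source.substring(pos, end)));
            } else if ((end = match(OTHER, source, pos)) >= 0) {
                tokens.add(new Token(Kind.OTHER, source.substring(pos, end)));
            } else {
                // Every character is covered by one of the patterns above.
                throw new IllegalStateException("no channel matched at " + pos);
            }
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("if\t x\n")) {
            System.out.println(t.kind() + " [" + t.text() + "]");
        }
    }
}
```

For "if\t x\n" this yields OTHER, TAB, WHITESPACE, OTHER, and the trailing newline produces no token. If SPACE were written as \s, the newline would still be discarded here only because NEWLINE is tried first, which is the same reason the order of withChannel calls matters.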