c++ antlr

Ignore certain unexpected literals in ANTLR


I need to parse C++ files (cpp, hpp, h) to extract some comments from them. I've decided to use this grammar.

As is done in the repo, all block comments are skipped:

BlockComment: '/*' .*? '*/' -> skip;

But I want to do something like this:

functionDefinition : attributeSpecifierSeq? declSpecifierSeq? declarator virtualSpecifierSeq? BlockComment? functionBody ;

Now, because all block comments are skipped, no BlockComment appears in the AST, obviously.

Ideally, the solution should not be tied to any target programming language, but if that is not possible, my target language is C++.

What I want to achieve: ignore all "unexpected" BlockComments (e.g. in the middle of a function body), but parse the ones that follow a function definition.

I've tried to achieve that behaviour using modes and predicates, but without success.


Solution

  • You only need to change -> skip to -> channel(HIDDEN):

    BlockComment: '/*' .*? '*/' -> channel(HIDDEN);
    

    After that, you can create a listener and override the method that gets triggered when a functionDefinition is entered. Tokens on the hidden channel are ignored by the parser but remain in the token stream, so inside that method you can look back one token before the start of functionBody and check whether it is a block comment.

    Here's a quick demo:

    import org.antlr.v4.runtime.CharStreams;
    import org.antlr.v4.runtime.CommonTokenStream;
    import org.antlr.v4.runtime.Token;
    import org.antlr.v4.runtime.tree.ParseTreeWalker;

    // CPP14Lexer, CPP14Parser and CPP14ParserBaseListener are generated by ANTLR from the grammar.
    public class CPPDemo {

        public static void main(String[] args) {
            String source = "void somethingWithoutCommentBlock() {\n" +
                    "}\n" +
                    "\n" +
                    "void somethingWithCommentBlock() /* block comment */ {\n" +
                    "}";

            CPP14Lexer lexer = new CPP14Lexer(CharStreams.fromString(source));
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            CPP14Parser parser = new CPP14Parser(tokenStream);

            ParseTreeWalker.DEFAULT.walk(new CPP14ParserBaseListener() {

                // functionDefinition
                //    : attributeSpecifierSeq? declSpecifierSeq? declarator virtualSpecifierSeq? functionBody
                //    ;
                @Override
                public void enterFunctionDefinition(CPP14Parser.FunctionDefinitionContext ctx) {
                    // Index of the first functionBody token in the full stream (hidden tokens included).
                    int index = ctx.functionBody().getStart().getTokenIndex();
                    if (index == 0) {
                        return; // nothing precedes the function body
                    }
                    Token beforeFunctionBody = tokenStream.get(index - 1);

                    // Report the function only if the preceding token is a block comment.
                    if (beforeFunctionBody.getType() == CPP14Lexer.BlockComment) {
                        System.out.println(ctx.declarator().getText() + ": " + beforeFunctionBody);
                    }
                }
            }, parser.translationUnit());
        }
    }
    

    Running the class above will print:

    somethingWithCommentBlock(): [@6,57:75='/* block comment */',<144>,channel=1,5:0]
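
Looking back exactly one token only works as long as nothing else sits on a non-default channel between the comment and the function body. If the grammar were changed so that whitespace also goes to the hidden channel instead of being skipped, the token right before `{` would be whitespace. A more tolerant lookback walks left over all off-channel tokens until it reaches a default-channel token. The sketch below models that idea without the ANTLR runtime so that it runs on its own; `HiddenLookback`, `Tok`, `commentBefore` and the token type constants are hypothetical names made up for illustration. With the real Java runtime, `BufferedTokenStream.getHiddenTokensToLeft(tokenIndex, channel)` should give you the same run of hidden tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class HiddenLookback {
    // Channel numbers mirror ANTLR's defaults (0 = default, 1 = hidden).
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    // Hypothetical token types, chosen only for this sketch.
    static final int BLOCK_COMMENT = 1;
    static final int WHITESPACE = 2;
    static final int OTHER = 3;

    // Minimal stand-in for an ANTLR token.
    static final class Tok {
        final int type;
        final int channel;
        final String text;

        Tok(int type, int channel, String text) {
            this.type = type;
            this.channel = channel;
            this.text = text;
        }
    }

    // Walks left from bodyIndex over off-channel tokens and returns the
    // nearest block comment, stopping at the first default-channel token.
    static Optional<String> commentBefore(List<Tok> tokens, int bodyIndex) {
        for (int i = bodyIndex - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                break; // reached real syntax: no comment directly before the body
            }
            if (t.type == BLOCK_COMMENT) {
                return Optional.of(t.text);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Models ") /* block comment */ {" with whitespace kept on the hidden channel.
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(OTHER, DEFAULT_CHANNEL, ")"));
        tokens.add(new Tok(BLOCK_COMMENT, HIDDEN_CHANNEL, "/* block comment */"));
        tokens.add(new Tok(WHITESPACE, HIDDEN_CHANNEL, " "));
        tokens.add(new Tok(OTHER, DEFAULT_CHANNEL, "{"));

        // Index 3 is the "{" that starts the function body.
        System.out.println(commentBefore(tokens, 3).orElse("<none>"));
        // prints: /* block comment */
    }
}
```

Because the walk stops at the first default-channel token, comments that appear earlier (for example before the return type, or inside a previous function body) are never attributed to this function, which matches the "ignore the unexpected ones" requirement from the question.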