c clang lint

clang compiler intermediate results


Problem Statement

I wish to write code to check compliance with a custom coding standard. Aspects include:

It seems to me that the easy part is writing the code that enforces the rules, and the harder part is writing a parser that breaks the source into a hierarchical data structure from which context and the various C constructs can be understood. For instance:

Integration with Compiler

The above seems like quite a large task. However, the preprocessor and compiler already do this work, so I would like to reuse their parsers and analyzers. Looking online, I saw that gcc or clang can be invoked with the -c, -S, and -E flags to generate partial results.

Question

Is there a way to get all intermediate results? Is there a way to output the map file and an AST? I wish to see how the compiler parses C code as it encounters it in various contexts.


Solution

  • With Clang, you can dump the AST from the command line with:

    clang -Xclang -ast-dump -fsyntax-only file.c
    

    Here's an example:

    #include <stdio.h>
    
    int main(void) {
        printf("%x\n", 123);
    }
    

    The output looks like this:

    [...] 
    bunch of nodes coming from #include <stdio.h>
    [...]
    `-FunctionDecl 0x26ac4940 <sprintf.c:3:1, line:5:1> line:3:5 main 'int (void)'
      `-CompoundStmt 0x26ac4b00 <col:16, line:5:1>
        `-CallExpr 0x26ac4aa0 <line:4:2, col:20> 'int'
          |-ImplicitCastExpr 0x26ac4a88 <col:2> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
          | `-DeclRefExpr 0x26ac49e0 <col:2> 'int (const char *, ...)' Function 0x26aaa528 'printf' 'int (const char *, ...)'
          |-ImplicitCastExpr 0x26ac4ae8 <col:9> 'const char *' <NoOp>
          | `-ImplicitCastExpr 0x26ac4ad0 <col:9> 'char *' <ArrayToPointerDecay>
          |   `-StringLiteral 0x26ac4a00 <col:9> 'char[4]' lvalue "%x\n"
          `-IntegerLiteral 0x26ac4a20 <col:17> 'int' 123
    

    This is, however, only really useful for human consumption. If you want to process the AST programmatically, you can either dump it in JSON form (replace -ast-dump with -ast-dump=json in the command above) or in binary form with clang -emit-ast file.c and write your own parser, or use libclang directly to generate the AST and explore it recursively.
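
    As a rough illustration of consuming the JSON dump, here is a minimal sketch in plain C that counts how many nodes of a given kind appear in the dumped text. It is a crude text scan, not a real JSON parser, and it assumes each node is serialized with a field spelled exactly "kind": "<NodeKind>" (one space after the colon). The function name count_kind is hypothetical; a real checker would use a proper JSON library.

    ```c
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Count AST nodes of a given kind in text produced by -ast-dump=json.
     * Assumption: every node carries a field spelled exactly
     * "kind": "<NodeKind>" (one space after the colon). */
    static size_t count_kind(const char *json, const char *kind)
    {
        char needle[128];
        int len = snprintf(needle, sizeof needle, "\"kind\": \"%s\"", kind);
        if (len < 0 || (size_t)len >= sizeof needle)
            return 0; /* kind name does not fit in the search buffer */

        size_t count = 0;
        for (const char *p = strstr(json, needle); p != NULL;
             p = strstr(p + len, needle))
            count++;
        return count;
    }
    ```

    For instance, after redirecting the JSON dump to a file and reading it into a buffer, count_kind(buffer, "CallExpr") would give the number of function calls seen, provided the field-spelling assumption holds.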

    The libclang documentation gives some examples of how this could be done. For example:

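    #include <clang-c/Index.h>
    #include <stdio.h>

    // Called for each AST node; must return a CXChildVisitResult
    static enum CXChildVisitResult visit(CXCursor cursor, CXCursor parent,
                                         CXClientData client_data) {
        // Do what you want here...
        // Continue visiting nodes, descending into this node's children
        return CXChildVisit_Recurse;
    }

    int main(void) {
        CXIndex index = clang_createIndex(0, 0);
        CXTranslationUnit unit = clang_parseTranslationUnit(
            index, "file.c", NULL, 0, NULL, 0, CXTranslationUnit_None);
        if (unit == NULL) {
            fprintf(stderr, "Unable to parse file.c\n");
            clang_disposeIndex(index);
            return 1;
        }

        // Walk the whole AST starting from the translation unit's root cursor
        CXCursor cursor = clang_getTranslationUnitCursor(unit);
        clang_visitChildren(cursor, visit, NULL);

        // Free up data
        clang_disposeTranslationUnit(unit);
        clang_disposeIndex(index);
        return 0;
    }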
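
    Once the compiler is doing the parsing, the rule-enforcement part can be a small predicate applied to data pulled from each cursor inside the visit callback. The sketch below shows one hypothetical naming rule (identifiers must be lower_snake_case). The rule and the name is_lower_snake_case are assumptions made for illustration, not part of any particular coding standard.

    ```c
    #include <ctype.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical rule: an identifier must start with a lowercase letter and
     * contain only lowercase letters, digits and underscores. */
    static bool is_lower_snake_case(const char *name)
    {
        if (name == NULL || !islower((unsigned char)name[0]))
            return false;
        for (const char *p = name + 1; *p != '\0'; p++) {
            unsigned char c = (unsigned char)*p;
            if (!islower(c) && !isdigit(c) && c != '_')
                return false;
        }
        return true;
    }
    ```

    In the visit callback, this could be applied to the string obtained from clang_getCursorSpelling(cursor) whenever clang_getCursorKind(cursor) is CXCursor_FunctionDecl, with clang_getCursorLocation supplying the position to report for a violation.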