c++ clang abstract-syntax-tree clang-query

How to get the entire expression on the right-hand side of a binary operator in the clang abstract syntax tree?


Consider a toy example. Suppose I have the following code in the file test.cpp:

int main()
{
    int gt = 3; 
    int g = 10 / gt;
}

I want to find the name of the variable in the denominator of the division. Using clang, I dump the abstract syntax tree (AST) of the code above with the command clang -Xclang -ast-dump -fsyntax-only test.cpp and get the following output:

TranslationUnitDecl 0x34f8100 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x34f8638 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x34f8350 '__int128'
|-TypedefDecl 0x34f8698 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x34f8370 'unsigned __int128'
|-TypedefDecl 0x34f8728 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x34f86f0 'char *'
|   `-BuiltinType 0x34f8190 'char'
|-TypedefDecl 0x34f8a48 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x34f89f0 'struct __va_list_tag [1]' 1 
|   `-RecordType 0x34f8810 'struct __va_list_tag'
|     `-CXXRecord 0x34f8778 '__va_list_tag'
`-FunctionDecl 0x34f8af0 <test.cpp:1:1, line:5:1> line:1:5 main 'int (void)'
  `-CompoundStmt 0x34f8dc0 <line:2:1, line:5:1>
    |-DeclStmt 0x34f8c98 <line:3:2, col:12>
    | `-VarDecl 0x34f8c18 <col:2, col:11> col:6 used gt 'int' cinit
    |   `-IntegerLiteral 0x34f8c78 <col:11> 'int' 3
    `-DeclStmt 0x34f8da8 <line:4:2, col:17>
      `-VarDecl 0x34f8cc0 <col:2, col:15> col:6 g 'int' cinit
        `-BinaryOperator 0x34f8d80 <col:10, col:15> 'int' '/'
          |-IntegerLiteral 0x34f8d20 <col:10> 'int' 10
          `-ImplicitCastExpr 0x34f8d68 <col:15> 'int' <LValueToRValue>
            `-DeclRefExpr 0x34f8d40 <col:15> 'int' lvalue Var 0x34f8c18 'gt' 'int'

Knowing this AST, I can use clang-query to get the variable name of the denominator with the following command:

clang-query> match declRefExpr(isExpansionInMainFile(), allOf(hasAncestor(binaryOperator(hasOperatorName("/"))), hasAncestor(declStmt())  ))

The output is:

Match #1:

/home/clang-llvm/cpp/code/test.cpp:4:15: note: "root" binds here
        int g = 10 / gt;
                     ^~
1 match.

Now that we are on the same page, I have two questions.

  1. In the toy example above, if I replace 10 with another variable, for example int g = gw / gt;, my query matches both variables (numerator and denominator). How can I restrict the clang-query match to only the denominator, that is, the variable on the right-hand side of the binary operator "/"?

  2. If the denominator is an expression rather than the single variable gt, how can I get the whole expression using clang? In other words, how do I get the expression on the right-hand side of the binary operator "/" in the abstract syntax tree? A simple example is int g = gw / (gt - gw); and a more complex one is int g = gw / gt - gw / gr * gg / sqrt( gt - gw ^ 2) + gq;

I appreciate any help in this regard.


Solution

  • Clang has a traversal matcher, hasRHS(), which matches the right-hand operand of a binary operator. That is exactly what you want. Test file:

    int main()
    {
        int gt = 3;
        int g = 10 / gt;
    
        int gw, gg, gr, gq;
        int g1 = gw / gt;
        int g2 = gw / (gt-gw);
        int g3 = gw / gt - gw / gr * gg / ( gt - gw ^ 2) + gq;
        return 0;
    }
    

    Query and output:

    clang-query> match varDecl(hasDescendant(binaryOperator(hasOperatorName("/"), hasRHS(expr().bind("myExpr")))))
    
    Match #1:
    /home/test.cpp:4:18: note: "myExpr" binds here
        int g = 10 / gt;
                     ^~
    /home/test.cpp:4:5: note: "root" binds here
        int g = 10 / gt;
        ^~~~~~~~~~~~~~~
    
    Match #2:
    /home/test.cpp:7:19: note: "myExpr" binds here
        int g1 = gw / gt;
                      ^~
    /home/test.cpp:7:5: note: "root" binds here
        int g1 = gw / gt;
        ^~~~~~~~~~~~~~~~
    
    Match #3:
    /home/test.cpp:8:19: note: "myExpr" binds here
        int g2 = gw / (gt-gw);
                      ^~~~~~~
    /home/test.cpp:8:5: note: "root" binds here
        int g2 = gw / (gt-gw);
        ^~~~~~~~~~~~~~~~~~~~~
    
    Match #4:
    /home/test.cpp:9:19: note: "myExpr" binds here
        int g3 = gw / gt - gw / gr * gg / ( gt - gw ^ 2) + gq;
                      ^~
    /home/test.cpp:9:5: note: "root" binds here
        int g3 = gw / gt - gw / gr * gg / ( gt - gw ^ 2) + gq;
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    4 matches.
    
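
    "myExpr" is bound to the expression you want. Note that hasDescendant() yields at most one match per varDecl, which is why only the first division in g3 is reported; replacing it with forEachDescendant() reports every "/" operator in the declaration. Ref: http://clang.llvm.org/docs/LibASTMatchersReference.html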
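
To see why hasRHS() hands back the whole denominator rather than a single variable, it can help to picture the AST as a plain tree in which the right operand of a "/" node is an entire subtree. The sketch below is only a toy model under that assumption: Node, leaf, binary, render and collectDenominators are hypothetical names invented for illustration and are not part of Clang's API. It walks the tree in pre-order and collects the right operand of every "/" node, which is similar in spirit to applying hasOperatorName("/") together with hasRHS() to every binaryOperator.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an AST expression node (NOT a Clang class).
// A leaf has an empty `op` and keeps a variable name or literal in `text`;
// an inner node keeps a binary operator and owns its two operands.
struct Node {
    std::string op;             // "" for leaves, otherwise "/", "-", "+", ...
    std::string text;           // variable name or literal (leaves only)
    std::unique_ptr<Node> lhs;  // left operand (inner nodes only)
    std::unique_ptr<Node> rhs;  // right operand (inner nodes only)
};

// Build a leaf such as `gt` or `10`.
std::unique_ptr<Node> leaf(std::string text) {
    auto n = std::make_unique<Node>();
    n->text = std::move(text);
    return n;
}

// Build an inner node such as `lhs / rhs`.
std::unique_ptr<Node> binary(std::string op,
                             std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
    assert(lhs && rhs);  // a binary operator always has two operands
    auto n = std::make_unique<Node>();
    n->op = std::move(op);
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Turn a subtree back into text, parenthesising every inner node.
std::string render(const Node& n) {
    if (n.op.empty()) return n.text;
    return "(" + render(*n.lhs) + " " + n.op + " " + render(*n.rhs) + ")";
}

// Pre-order walk: for every "/" node, record its whole right operand.
void collectDenominators(const Node& n, std::vector<std::string>& out) {
    if (n.op.empty()) return;                        // leaves have no operands
    if (n.op == "/") out.push_back(render(*n.rhs));  // the "hasRHS" step
    collectDenominators(*n.lhs, out);
    collectDenominators(*n.rhs, out);
}
```

For a tree built as binary("/", leaf("gw"), binary("-", leaf("gt"), leaf("gw"))), collectDenominators fills the vector with the single entry (gt - gw). For the tree of gw / gt - gw / gr it records gt and then gr, one entry per division.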