
How do I find setter and getter methods using Clang AST matchers?


I want to find setter and getter methods in a code base using a Clang AST matcher expression. For example, this code has one getter and one setter to report:

struct S {
  int m_x;

  int getX()                 // getter
  {
    return m_x;
  }

  void setX(int x)           // setter
  {
    m_x = x;
  }
};

For both kinds of method, I want to check that the body has a single statement. A getter body should be a return statement whose return value is a class member. A setter body should be an assignment statement that assigns the passed parameter to a class member. (This may not be a sufficiently tolerant set of criteria in practice, but I'd be satisfied with it as a first approximation.)

I found that I can use the cxxMethodDecl matcher to find all methods, but it's not clear how to dig into the bodies or check the various properties. None of the examples in the AST Matcher Reference do this; the closest seems to be this one:

cxxConstructorDecl(
  isCopyConstructor()
  ).bind("prepend_explicit")

but that appears to rely on a predefined matcher, isCopyConstructor, to classify the method (by its body, or perhaps just its signature; the documentation does not say exactly what each matcher checks), and there is no isSetter or similar.

How can I write a match expression to find all the setters and getters in a translation unit?


Solution

  • I interpret the question as asking for a Clang AST Matcher that will report setters and getters. Such a matcher can be tested at the command line using clang-query.

    The following shell script contains a match expression that will find at least some cases of such functions (depending on exactly how they are written). The comments explain what each part does, so it should be feasible to adjust as needed:

    #!/bin/sh
    
    PATH=/d/opt/clang+llvm-18.1.8-msvc/bin:$PATH
    
    matcher='
     cxxMethodDecl(                   # Report C++ method declarations
      hasBody(                        # where the body
       compoundStmt(                  # is a compound statement
        statementCountIs(1),          # with one contained statement
        hasAnySubstatement(           # that
         anyOf(                       # is either:
          returnStmt(                 # (1) a return statement
           hasReturnValue(            # whose return value
            implicitCastExpr(         # is an implicit conversion
             hasSourceExpression(     # of
              memberExpr(             # a class member
               hasObjectExpression(   # of the object
                cxxThisExpr()         # `*this`, or
               )
              )
             )
            )
           )
          ),
          binaryOperator(             # (2) is a binary expression
           isAssignmentOperator(),    # using the assignment operator
           hasLHS(                    # where the left-hand side
            memberExpr(               # is a class member
             hasObjectExpression(     # of the object
              cxxThisExpr()           # `*this`, and
             )
            )
           ),
           hasRHS(                    # where the right-hand side
            implicitCastExpr(         # is an implicit conversion
             hasSourceExpression(     # of
              declRefExpr(            # a reference to a declaration
               hasDeclaration(        # of
                parmVarDecl()         # a parameter.
               )
              )
             )
            )
           )
          )
         )
        )
       )
      )
     )
    '
    
    clang-query \
      -c "m $matcher" \
      test.cc --
    
    # EOF
    

    To test this matcher, I used this test case:

    // test.cc
    // Testcases for a matcher to find accessor methods.
    
    struct S {
      int m_x;
    
      int getX()
      {
        return m_x;
      }
    
      void setX(int x)
      {
        m_x = x;
      }
    };
    
    // EOF
    

    which has this AST:

    $ clang -fsyntax-only -Xclang -ast-dump test.cc
    TranslationUnitDecl 0x23df46300a0 <<invalid sloc>> <invalid sloc>
    |-CXXRecordDecl 0x23df4630900 <<invalid sloc>> <invalid sloc> implicit struct _GUID
    | `-TypeVisibilityAttr 0x23df46309b0 <<invalid sloc>> Implicit Default
    |-TypedefDecl 0x23df4630a28 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
    | `-BuiltinType 0x23df4630670 '__int128'
    |-TypedefDecl 0x23df4630a98 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
    | `-BuiltinType 0x23df4630690 'unsigned __int128'
    |-TypedefDecl 0x23df4630e40 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
    | `-RecordType 0x23df4630b80 '__NSConstantString_tag'
    |   `-CXXRecord 0x23df4630af0 '__NSConstantString_tag'
    |-CXXRecordDecl 0x23df4630e98 <<invalid sloc>> <invalid sloc> implicit class type_info
    | `-TypeVisibilityAttr 0x23df4630f50 <<invalid sloc>> Implicit Default
    |-TypedefDecl 0x23df4630fc8 <<invalid sloc>> <invalid sloc> implicit size_t 'unsigned long long'
    | `-BuiltinType 0x23df4630290 'unsigned long long'
    |-TypedefDecl 0x23df466b568 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
    | `-PointerType 0x23df4631020 'char *'
    |   `-BuiltinType 0x23df4630150 'char'
    |-TypedefDecl 0x23df466b5d8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
    | `-PointerType 0x23df4631020 'char *'
    |   `-BuiltinType 0x23df4630150 'char'
    `-CXXRecordDecl 0x23df466b630 <test.cc:4:1, line:16:1> line:4:8 struct S definition
      |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal
      | |-DefaultConstructor exists trivial needs_implicit
      | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
      | |-MoveConstructor exists simple trivial needs_implicit
      | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
      | |-MoveAssignment exists simple trivial needs_implicit
      | `-Destructor simple irrelevant trivial needs_implicit
      |-CXXRecordDecl 0x23df466b748 <col:1, col:8> col:8 implicit struct S
      |-FieldDecl 0x23df466b7f0 <line:5:3, col:7> col:7 referenced m_x 'int'
      |-CXXMethodDecl 0x23df466b8d8 <line:7:3, line:10:3> line:7:7 getX 'int ()' implicit-inline
      | `-CompoundStmt 0x23df466bba0 <line:8:3, line:10:3>
      |   `-ReturnStmt 0x23df466bb90 <line:9:5, col:12>
      |     `-ImplicitCastExpr 0x23df466bb78 <col:12> 'int' <LValueToRValue>
      |       `-MemberExpr 0x23df466bb48 <col:12> 'int' lvalue ->m_x 0x23df466b7f0
      |         `-CXXThisExpr 0x23df466bb38 <col:12> 'S *' implicit this
      `-CXXMethodDecl 0x23df466ba70 <line:12:3, line:15:3> line:12:8 setX 'void (int)' implicit-inline
        |-ParmVarDecl 0x23df466b998 <col:13, col:17> col:17 used x 'int'
        `-CompoundStmt 0x23df466bc98 <line:13:3, line:15:3>
          `-BinaryOperator 0x23df466bc78 <line:14:5, col:11> 'int' lvalue '='
            |-MemberExpr 0x23df466bc10 <col:5> 'int' lvalue ->m_x 0x23df466b7f0
            | `-CXXThisExpr 0x23df466bc00 <col:5> 'S *' implicit this
            `-ImplicitCastExpr 0x23df466bc60 <col:11> 'int' <LValueToRValue>
              `-DeclRefExpr 0x23df466bc40 <col:11> 'int' lvalue ParmVar 0x23df466b998 'x' 'int'
    

    The script produces this output:

    
    Match #1:
    
    $PWD\test.cc:7:3: note: "root" binds here
        7 |   int getX()
          |   ^~~~~~~~~~
        8 |   {
          |   ~
        9 |     return m_x;
          |     ~~~~~~~~~~~
       10 |   }
          |   ~
    
    Match #2:
    
    $PWD\test.cc:12:3: note: "root" binds here
       12 |   void setX(int x)
          |   ^~~~~~~~~~~~~~~~
       13 |   {
          |   ~
       14 |     m_x = x;
          |     ~~~~~~~~
       15 |   }
          |   ~
    2 matches.
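To probe the limits of the matcher, a struct like the following could be appended to test.cc (the name `T` and its members are hypothetical, not part of the original test). The expectations in the comments are reasoned from the AST shapes Clang produces for these statements rather than taken from an actual run, so rerunning the script is the way to confirm them.

```cpp
// Hypothetical extra cases for the accessor matcher.
struct T {
  int m_x;
  long m_big;

  // Expected to match: explicit `this->` is still a MemberExpr whose
  // object expression is a CXXThisExpr.
  int getXExplicit() const { return this->m_x; }

  // Expected to match: in a const method `this` has type `const T *`,
  // but the AST shape is the same.
  int getX() const { return m_x; }

  // Expected NOT to match: the return value is a BinaryOperator.
  int twiceX() const { return 2 * m_x; }

  // Expected NOT to match: the right-hand side is a literal, not a parameter.
  void resetX() { m_x = 0; }

  // Expected NOT to match: returning a reference needs no lvalue-to-rvalue
  // conversion, so no ImplicitCastExpr wraps the MemberExpr.
  int &refX() { return m_x; }

  // Expected NOT to match: `int` -> `long` adds an IntegralCast on top of
  // the LValueToRValue cast, and hasSourceExpression looks only one level down.
  void setBig(int v) { m_big = v; }
};
```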
    

    The procedure for creating the matcher was essentially to follow the AST dump line by line, turning each element I wanted to match into its corresponding matcher. In some cases that is straightforward (for example, the CXXMethodDecl AST node is matched by the cxxMethodDecl matcher), while for others I had to search the text of the matcher reference and use some trial and error to find the right combination.
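Part of that translation is mechanical: the dump prints node class names such as CXXMethodDecl, ReturnStmt and ImplicitCastExpr, and most node matchers are named after the class with its leading word lowercased. The function below is a hypothetical helper that captures this naming pattern as a heuristic only; not every node class has a matcher, and some names do not follow the pattern (the Objective-C matchers, for instance, spell the prefix `objc`, as in `objcMethodDecl`), so the matcher reference remains the authority.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical heuristic, not a Clang API: guess the node matcher name for
// an AST class name as printed by -ast-dump.
std::string guessMatcherName(const std::string &nodeClass) {
  std::string out = nodeClass;

  // Length of the leading run of capital letters ("CXXM" in "CXXMethodDecl").
  std::size_t n = 0;
  while (n < out.size() && std::isupper(static_cast<unsigned char>(out[n])))
    ++n;

  // If more characters follow a multi-letter capital run, its last capital
  // starts the next word, so it stays uppercase ("CXX" + "Method...").
  std::size_t end = (n > 1 && n < out.size()) ? n - 1 : n;
  for (std::size_t i = 0; i < end; ++i)
    out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
  return out;
}
```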