c++clangclang-tidyclang-ast-matchers

How do I match a for loop with vector declaring in it's body?


I hope to match all for-loop with vector declaring in it's body (might extend to while-loop and do-while-loop later):

#include <vector>

int main() {
  for (int i = 0; i < 10; i++) {
    std::vector<int> foo;
  }

  return 0;
}

The AST I got from this program is

`-FunctionDecl 0x55c1d33c8bc8 <main.cpp:3:1, line:9:1> line:3:5 main 'int ()'
  `-CompoundStmt 0x55c1d3402a48 <col:12, line:9:1>
    |-ForStmt 0x55c1d34029e0 <line:4:3, line:6:3>
    | |-DeclStmt 0x55c1d33c8d40 <line:4:8, col:17>
    | | `-VarDecl 0x55c1d33c8cb8 <col:8, col:16> col:12 used i 'int' cinit
    | |   `-IntegerLiteral 0x55c1d33c8d20 <col:16> 'int' 0
    | |-<<<NULL>>>
    | |-BinaryOperator 0x55c1d33c8db0 <col:19, col:23> 'bool' '<'
    | | |-ImplicitCastExpr 0x55c1d33c8d98 <col:19> 'int' <LValueToRValue>
    | | | `-DeclRefExpr 0x55c1d33c8d58 <col:19> 'int' lvalue Var 0x55c1d33c8cb8 'i' 'int'
    | | `-IntegerLiteral 0x55c1d33c8d78 <col:23> 'int' 10
    | |-UnaryOperator 0x55c1d33c8df0 <col:27, col:28> 'int' postfix '++'
    | | `-DeclRefExpr 0x55c1d33c8dd0 <col:27> 'int' lvalue Var 0x55c1d33c8cb8 'i' 'int'
    | `-CompoundStmt 0x55c1d34029c8 <col:32, line:6:3>
    |   `-DeclStmt 0x55c1d34029b0 <line:5:5, col:25>
    |     `-VarDecl 0x55c1d33cd388 <col:5, col:22> col:22 foo 'std::vector<int>' callinit destroyed
    |       `-CXXConstructExpr 0x55c1d3402988 <col:22> 'std::vector<int>' 'void () noexcept'
    `-ReturnStmt 0x55c1d3402a38 <line:8:3, col:10>
      `-IntegerLiteral 0x55c1d3402a18 <col:10> 'int' 0

Now how do I write a matcher (in clang-tidy customized check) to match such pattern?

I read the document for customizing clang-tidy checks and Matching the Clang AST, but they don't seem to provide enough information on how I should actually combine each APIs.


update: with @Scott McPeak's answer, I can match the for loop with clang-query in terminal but I have trouble transplanting this matcher into my clang-tidy check:

void LowPerfLoopCheck::registerMatchers(MatchFinder *Finder) {
  Finder->addMatcher(
      forStmt(hasDescendant(varDecl(hasType(hasDeclaration(cxxRecordDecl(
                                        matchesName("^::std::vector$")))))
                                .bind("vector-in-for"))),
      this);
}

When building clang-tidy, it says call of overloaded 'hasType()' is ambiguous:

llvm-project/clang-tools-extra/clang-tidy/misc/LowPerfLoopCheck.cpp: In member function ‘virtual void clang::tidy::misc::LowPerfLoopCheck::registerMatchers(clang::ast_matchers::MatchFinder*)’:
/home/wentao/Desktop/llvm-project/clang-tools-extra/clang-tidy/misc/LowPerfLoopCheck.cpp:19:44: error: call of overloaded ‘hasType(clang::ast_matchers::internal::PolymorphicMatcher<clang::ast_matchers::internal::HasDeclarationMatcher, void(clang::ast_matchers::internal::TypeList<clang::CallExpr, clang::CXXConstructExpr, clang::CXXNewExpr, clang::DeclRefExpr, clang::EnumType, clang::ElaboratedType, clang::InjectedClassNameType, clang::LabelStmt, clang::AddrLabelExpr, clang::MemberExpr, clang::QualType, clang::RecordType, clang::TagType, clang::TemplateSpecializationType, clang::TemplateTypeParmType, clang::TypedefType, clang::UnresolvedUsingType, clang::ObjCIvarRefExpr>), clang::ast_matchers::internal::Matcher<clang::Decl> >)’ is ambiguous
   19 |       forStmt(hasDescendant(varDecl(hasType(hasDeclaration(cxxRecordDecl(
      |                                     ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   20 |                                         matchesName("^::std::vector$")))))
      |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Why does the same code works for clang-query but not for clang-tidy checks?


Solution

  • You can find all for loops that contain, somewhere, a declaration of a std::vector variable using this AST matcher:

    #!/bin/sh
    
    PATH=$HOME/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH
    
    # In this query, the comments are ignored because clang-query (not the
    # shell) recognizes and discards them.
    query='m
      forStmt(                        # Report any "for" statement
        hasDescendant(                # where a descendant
          varDecl(                    # is a variable declaration
            hasType(                  # whose type
              cxxRecordDecl(          # is a struct/class/union (possibly templatized)
                matchesName(          # whose name matches the regex
                  "^::std::vector$"   # given here.
                )
              )
            )
          ).bind("vector_var_decl")   # Bind to the declaration AST node.
        )
      )
    '
    
    clang-query \
      -c="$query" \
      test.cc -- -w
    
    # EOF
    

    This matcher is expressed as a clang-query shell script for ease of experimentation, but the matcher expression can be copied directly into a C++ clang-tidy checker implementation like this:

    void HelloCheck::registerMatchers(MatchFinder *Finder) {
      Finder->addMatcher(
        forStmt(                        // Report any "for" statement
          hasDescendant(                // where a descendant
            varDecl(                    // is a variable declaration
              hasType(                  // whose type
                cxxRecordDecl(          // is a struct/class/union (possibly templatized)
                  matchesName(          // whose name matches the regex
                    "^::std::vector$"   // given here.
                  )
                )
              )
            ).bind("vector_var_decl")   // Bind to the declaration AST node.
          )
        ),
        this
      );
    }
    

    To create this matcher, I used the list of matchers at AST Matcher Reference. It takes some trial and error to find the right combination; I wish the documentation was better. One small tip I can give is you can use the anything() matcher as a placeholder that matches anything while developing the matcher incrementally.

    In this case, the first hard part (for me, despite some prior experience) was realizing I needed to use the cxxRecordDecl matcher before matchesName could be used. Originally, I tried hasDeclaration, but that matches declarations that don't necessarily have a name, and consequently matchesName will just silently fail (clang-query could be a lot more informative in cases like this). By using cxxRecordDecl, we're guaranteed to be working with a declaration that has a name to match, so it works.

    The second problem, when translating my original clang-query matcher into a C++ clang-tidy check, was an error about an ambiguous call to hasType. The original matcher had:

      varDecl(hasType(hasDeclaration(cxxRecordDecl(...))))
    

    Evidently, hasDeclaration in that spot causes problems in the C++ context (although I don't really understand why), and my solution was to just remove it:

      varDecl(hasType(cxxRecordDecl(...)))
    

    To expand to other kinds of loops, the whileStmt and doStmt matchers could be substituted for forStmt.

    Example input:

    #if 1
      #include <vector>
    #else
      // Simplified stand-in for easier AST inspection.
      namespace std {
        template <class T>
        class vector {};
      }
    #endif
    
    class SomethingElse {};
    
    int main() {
      for (int i = 0; i < 10; i++) {       // Report.
        std::vector<int> foo;              // Original example.
      }
    
      for (int i = 0; i < 10; i++) {       // Do not report.
        SomethingElse se;                  // Not a std::vector.
      }
    
      for (;;) {                           // Report.
        {
          std::vector<int> foo;            // Not an immediate child.
        }
      }
    
      return 0;
    }
    

    Corresponding output:

    $ ./cmd.sh
    
    Match #1:
    
    $PWD/test.cc:14:3: note: "root" binds here
      for (int i = 0; i < 10; i++) {       // Report.
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    $PWD/test.cc:15:5: note: "vector_var_decl" binds here
        std::vector<int> foo;              // Original example.
        ^~~~~~~~~~~~~~~~~~~~
    
    Match #2:
    
    $PWD/test.cc:22:3: note: "root" binds here
      for (;;) {                           // Report.
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    $PWD/test.cc:24:7: note: "vector_var_decl" binds here
          std::vector<int> foo;            // Not an immediate child.
          ^~~~~~~~~~~~~~~~~~~~
    2 matches.