I hope to match all for-loop with vector declaring in it's body (might extend to while-loop and do-while-loop later):
#include <vector>
int main() {
for (int i = 0; i < 10; i++) {
std::vector<int> foo;
}
return 0;
}
The AST I got from this program is
`-FunctionDecl 0x55c1d33c8bc8 <main.cpp:3:1, line:9:1> line:3:5 main 'int ()'
`-CompoundStmt 0x55c1d3402a48 <col:12, line:9:1>
|-ForStmt 0x55c1d34029e0 <line:4:3, line:6:3>
| |-DeclStmt 0x55c1d33c8d40 <line:4:8, col:17>
| | `-VarDecl 0x55c1d33c8cb8 <col:8, col:16> col:12 used i 'int' cinit
| | `-IntegerLiteral 0x55c1d33c8d20 <col:16> 'int' 0
| |-<<<NULL>>>
| |-BinaryOperator 0x55c1d33c8db0 <col:19, col:23> 'bool' '<'
| | |-ImplicitCastExpr 0x55c1d33c8d98 <col:19> 'int' <LValueToRValue>
| | | `-DeclRefExpr 0x55c1d33c8d58 <col:19> 'int' lvalue Var 0x55c1d33c8cb8 'i' 'int'
| | `-IntegerLiteral 0x55c1d33c8d78 <col:23> 'int' 10
| |-UnaryOperator 0x55c1d33c8df0 <col:27, col:28> 'int' postfix '++'
| | `-DeclRefExpr 0x55c1d33c8dd0 <col:27> 'int' lvalue Var 0x55c1d33c8cb8 'i' 'int'
| `-CompoundStmt 0x55c1d34029c8 <col:32, line:6:3>
| `-DeclStmt 0x55c1d34029b0 <line:5:5, col:25>
| `-VarDecl 0x55c1d33cd388 <col:5, col:22> col:22 foo 'std::vector<int>' callinit destroyed
| `-CXXConstructExpr 0x55c1d3402988 <col:22> 'std::vector<int>' 'void () noexcept'
`-ReturnStmt 0x55c1d3402a38 <line:8:3, col:10>
`-IntegerLiteral 0x55c1d3402a18 <col:10> 'int' 0
Now how do I write a matcher (in clang-tidy customized check) to match such pattern?
I read the document for customizing clang-tidy checks and Matching the Clang AST, but they don't seem to provide enough information on how I should actually combine each APIs.
update: with @Scott McPeak's answer, I can match the for loop with clang-query in terminal but I have trouble transplanting this matcher into my clang-tidy check:
void LowPerfLoopCheck::registerMatchers(MatchFinder *Finder) {
Finder->addMatcher(
forStmt(hasDescendant(varDecl(hasType(hasDeclaration(cxxRecordDecl(
matchesName("^::std::vector$")))))
.bind("vector-in-for"))),
this);
}
When building clang-tidy, it says call of overloaded 'hasType()' is ambiguous
:
llvm-project/clang-tools-extra/clang-tidy/misc/LowPerfLoopCheck.cpp: In member function ‘virtual void clang::tidy::misc::LowPerfLoopCheck::registerMatchers(clang::ast_matchers::MatchFinder*)’:
/home/wentao/Desktop/llvm-project/clang-tools-extra/clang-tidy/misc/LowPerfLoopCheck.cpp:19:44: error: call of overloaded ‘hasType(clang::ast_matchers::internal::PolymorphicMatcher<clang::ast_matchers::internal::HasDeclarationMatcher, void(clang::ast_matchers::internal::TypeList<clang::CallExpr, clang::CXXConstructExpr, clang::CXXNewExpr, clang::DeclRefExpr, clang::EnumType, clang::ElaboratedType, clang::InjectedClassNameType, clang::LabelStmt, clang::AddrLabelExpr, clang::MemberExpr, clang::QualType, clang::RecordType, clang::TagType, clang::TemplateSpecializationType, clang::TemplateTypeParmType, clang::TypedefType, clang::UnresolvedUsingType, clang::ObjCIvarRefExpr>), clang::ast_matchers::internal::Matcher<clang::Decl> >)’ is ambiguous
19 | forStmt(hasDescendant(varDecl(hasType(hasDeclaration(cxxRecordDecl(
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20 | matchesName("^::std::vector$")))))
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Why does the same code works for clang-query but not for clang-tidy checks?
You can find all for
loops that contain, somewhere, a declaration of
a std::vector
variable using this AST matcher:
#!/bin/sh
PATH=$HOME/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH
# In this query, the comments are ignored because clang-query (not the
# shell) recognizes and discards them.
query='m
forStmt( # Report any "for" statement
hasDescendant( # where a descendant
varDecl( # is a variable declaration
hasType( # whose type
cxxRecordDecl( # is a struct/class/union (possibly templatized)
matchesName( # whose name matches the regex
"^::std::vector$" # given here.
)
)
)
).bind("vector_var_decl") # Bind to the declaration AST node.
)
)
'
clang-query \
-c="$query" \
test.cc -- -w
# EOF
This matcher is expressed as a
clang-query
shell script for ease of experimentation, but the matcher expression can
be copied directly into a C++ clang-tidy
checker implementation like
this:
void HelloCheck::registerMatchers(MatchFinder *Finder) {
Finder->addMatcher(
forStmt( // Report any "for" statement
hasDescendant( // where a descendant
varDecl( // is a variable declaration
hasType( // whose type
cxxRecordDecl( // is a struct/class/union (possibly templatized)
matchesName( // whose name matches the regex
"^::std::vector$" // given here.
)
)
)
).bind("vector_var_decl") // Bind to the declaration AST node.
)
),
this
);
}
To create this matcher, I used the list of matchers at
AST Matcher Reference.
It takes some trial and error to find the right combination; I wish the
documentation was better. One small tip I can give is you can use the
anything()
matcher as a placeholder that matches anything while
developing the matcher incrementally.
In this case, the first hard part (for me, despite some prior
experience) was realizing I needed to use the cxxRecordDecl
matcher
before matchesName
could be used. Originally, I tried
hasDeclaration
, but that matches declarations that don't necessarily
have a name, and consequently matchesName
will just silently fail
(clang-query
could be a lot more informative in cases like this). By
using cxxRecordDecl
, we're guaranteed to be working with a declaration
that has a name to match, so it works.
The second problem, when translating my original clang-query
matcher
into a C++ clang-tidy
check, was an error about an ambiguous call to
hasType
. The original matcher had:
varDecl(hasType(hasDeclaration(cxxRecordDecl(...))))
Evidently, hasDeclaration
in that spot causes problems in the C++
context (although I don't really understand why), and my solution was to
just remove it:
varDecl(hasType(cxxRecordDecl(...)))
To expand to other kinds of loops, the whileStmt
and doStmt
matchers
could be substituted for forStmt
.
Example input:
#if 1
#include <vector>
#else
// Simplified stand-in for easier AST inspection.
namespace std {
template <class T>
class vector {};
}
#endif
class SomethingElse {};
int main() {
for (int i = 0; i < 10; i++) { // Report.
std::vector<int> foo; // Original example.
}
for (int i = 0; i < 10; i++) { // Do not report.
SomethingElse se; // Not a std::vector.
}
for (;;) { // Report.
{
std::vector<int> foo; // Not an immediate child.
}
}
return 0;
}
Corresponding output:
$ ./cmd.sh
Match #1:
$PWD/test.cc:14:3: note: "root" binds here
for (int i = 0; i < 10; i++) { // Report.
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$PWD/test.cc:15:5: note: "vector_var_decl" binds here
std::vector<int> foo; // Original example.
^~~~~~~~~~~~~~~~~~~~
Match #2:
$PWD/test.cc:22:3: note: "root" binds here
for (;;) { // Report.
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$PWD/test.cc:24:7: note: "vector_var_decl" binds here
std::vector<int> foo; // Not an immediate child.
^~~~~~~~~~~~~~~~~~~~
2 matches.