
Clang AST: VarDecl (global variables) and DeclStmt


I am currently working with the Clang AST and would like to implement a tool based on LibTooling. The tool should detect VarDecls where several variables are declared in a single declaration, for example:

int a = 1, b = 1;

In a further step, these VarDecls should then be split in order to have the individual variables in separate declarations. The desired result would therefore be:

int a = 1;
int b = 1;

I have seen that clang-tidy provides such functionality. However, it seems to be limited to local variables. A look at the clang-tidy source code shows that an ASTMatcher is used for detection. The matcher checks whether a given DeclStmt contains a single declaration or several declarations, like this:

declStmt(hasSingleDecl(anything()))

If you take a closer look at the Clang AST for different constructs, you will notice that only local variables have a DeclStmt as a parent node. Global variables, however, do not. The following example:

int a = 1, b = 1;

void func(){
    int l_a = 1, l_b = 1;
}

results in an AST like this:

|-VarDecl 0x7ffff371f480 <test.c:3:1, col:9> col:5 a 'int' cinit
| `-IntegerLiteral 0x7ffff371f530 <col:9> 'int' 1
|-VarDecl 0x7ffff371f568 <col:1, col:16> col:12 b 'int' cinit
| `-IntegerLiteral 0x7ffff371f5d0 <col:16> 'int' 1
`-FunctionDecl 0x7ffff371f658 <line:5:1, line:7:1> line:5:6 func 'void ()'
  `-CompoundStmt 0x7ffff371f870 <col:12, line:7:1>
    `-DeclStmt 0x7ffff371f858 <line:6:5, col:25>
      |-VarDecl 0x7ffff371f718 <col:5, col:15> col:9 l_a 'int' cinit
      | `-IntegerLiteral 0x7ffff371f780 <col:15> 'int' 1
      `-VarDecl 0x7ffff371f7b8 <col:5, col:24> col:18 l_b 'int' cinit
        `-IntegerLiteral 0x7ffff371f820 <col:24> 'int' 1

Now to my questions: Why is there a difference between VarDecls and DeclStmts for local and global variables? And how could one alternatively detect global variables where several VarDecls are given by one statement?

I would be very grateful for any help.


Solution

  • As far as I can tell, the Clang AST does not directly indicate when two global variables are declared as part of the same declaration. However, it is possible to look at the location of the type specifier in a DeclaratorDecl by calling clang::DeclaratorDecl::getTypeSpecStartLoc. If two DeclaratorDecl nodes have the same type specifier start location, but the declarators themselves have distinct locations (this second condition is needed to avoid reporting template instantiations), then we can conclude that they share a declaration.

    There's no way to search for nodes satisfying that condition using an AST matcher expression. Instead, one has to examine all of the declarations, build a map, and then walk the map afterward to detect declarators sharing type specifier locations.
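
The grouping step can be sketched without any Clang dependency. The following is a minimal model, not the tool itself: `DeclInfo`, `groupByTypeSpec` and `findMultipleDecls` are hypothetical names, and plain integers stand in for the `clang::SourceLocation` values that `getTypeSpecStartLoc` and `getLocation` would return.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the data read from one clang::DeclaratorDecl.
// Locations are modelled as integers; 0 models an invalid location.
struct DeclInfo {
  unsigned tsLoc;    // models getTypeSpecStartLoc()
  unsigned declLoc;  // models getLocation()
  std::string name;
};

// Outer key: type specifier location.  Inner key: declarator location.
using Groups = std::map<unsigned, std::map<unsigned, std::string>>;

// Build the two-level map.  A second entry with the same (tsLoc, declLoc)
// pair is discarded, which is how template instantiations are filtered out.
Groups groupByTypeSpec(const std::vector<DeclInfo> &decls) {
  Groups groups;
  for (const DeclInfo &d : decls) {
    assert(d.tsLoc != 0 && d.declLoc != 0);
    groups[d.tsLoc].emplace(d.declLoc, d.name);  // keeps the first one seen
  }
  return groups;
}

// Return one list of names per type specifier location that is shared by
// more than one distinct declarator location.
std::vector<std::vector<std::string>>
findMultipleDecls(const std::vector<DeclInfo> &decls) {
  std::vector<std::vector<std::string>> result;
  Groups groups = groupByTypeSpec(decls);
  for (const auto &kv : groups) {
    if (kv.second.size() > 1) {
      std::vector<std::string> names;
      for (const auto &inner : kv.second) {
        names.push_back(inner.second);
      }
      result.push_back(names);
    }
  }
  return result;
}
```

Two entries with the same `tsLoc` and the same `declLoc` (as two instantiations of one template member would produce) collapse into one, so that group has size one and is not reported.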

    Here is a complete stand-alone program demonstrating the method:

    // find-multiple-decl.cc
    // Find declarations that declare multiple variables.
    
    #include "clang/AST/ASTContext.h"                          // clang::ASTContext
    #include "clang/AST/Decl.h"                                // clang::DeclaratorDecl
    #include "clang/AST/DeclBase.h"                            // clang::Decl
    #include "clang/AST/RecursiveASTVisitor.h"                 // clang::RecursiveASTVisitor
    #include "clang/AST/Type.h"                                // clang::QualType
    #include "clang/Basic/Diagnostic.h"                        // clang::DiagnosticsEngine
    #include "clang/Basic/DiagnosticOptions.h"                 // clang::DiagnosticOptions
    #include "clang/Basic/SourceLocation.h"                    // clang::{FileID, SourceLocation}
    #include "clang/Basic/SourceManager.h"                     // clang::SourceManager
    #include "clang/Frontend/ASTUnit.h"                        // clang::ASTUnit
    #include "clang/Frontend/CompilerInstance.h"               // clang::CompilerInstance
    #include "clang/Frontend/CompilerInvocation.h"             // clang::CompilerInvocation
    #include "clang/Frontend/Utils.h"                          // clang::createInvocation
    #include "clang/Serialization/PCHContainerOperations.h"    // clang::PCHContainerOperations
    
    #include "llvm/ADT/ArrayRef.h"                             // llvm::ArrayRef
    
    #include <iostream>                                        // std::cout
    #include <map>                                             // std::map
    #include <memory>                                          // std::shared_ptr, std::unique_ptr
    #include <string>                                          // std::string
    #include <vector>                                          // std::vector
    
    #include <assert.h>                                        // assert
    
    using std::cout;
    using std::string;
    
    
    class Visitor : public clang::RecursiveASTVisitor<Visitor> {
    public:      // data
      clang::ASTUnit *m_astUnit;
      clang::ASTContext &m_astContext;
    
      // True to print out verbose diagnostics.  There is no switch to
      // turn it on; you have to change the initializer and recompile.
      bool m_verbose;
    
      // Map from the location of a type specifier to a map from the
      // location of a variable declared with that type specifier to the
      // DeclaratorDecl of that declaration.  If there are multiple
      // variables with the same declaration location (which can happen due
      // to macros or templates), only the first encountered in the AST
      // visit is retained.
      //
      // Note: Throughout, I call these things "variables", even though
      // structure fields are also included in the analysis, and the Clang
      // AST terminology does not consider fields to be variables.
      std::map<clang::SourceLocation,
               std::map<clang::SourceLocation, clang::DeclaratorDecl*> >
        m_tsLocToDeclLocToVarDecl;
    
    public:      // methods
      Visitor(clang::ASTUnit *astUnit)
        : clang::RecursiveASTVisitor<Visitor>(),
          m_astUnit(astUnit),
          m_astContext(astUnit->getASTContext()),
          m_verbose(false),
          m_tsLocToDeclLocToVarDecl()
      {}
    
      // Convenience methods to stringify some things.
      string locStr(clang::SourceLocation loc);
      string declLocStr(clang::Decl const *decl);
      string typeStr(clang::QualType qualType);
    
      // Visitor methods (called by RecursiveASTVisitor).
      bool VisitDeclaratorDecl(clang::DeclaratorDecl *varDecl);
    
      // Kick off the traversal.
      void traverseTU();
    
      // After 'traverseTU', examine 'm_tsLocToDeclLocToVarDecl' to find
      // cases where a single declaration declares multiple variables.
      void reportMultipleDecls();
    
      // Invoke 'traverseTU' followed by 'reportMultipleDecls'.
      void runAnalysis();
    };
    
    string Visitor::locStr(clang::SourceLocation loc)
    {
      return loc.printToString(m_astContext.getSourceManager());
    }
    
    string Visitor::declLocStr(clang::Decl const *decl)
    {
      return locStr(decl->getLocation());
    }
    
    string Visitor::typeStr(clang::QualType qualType)
    {
      return qualType.getAsString();
    }
    
    bool Visitor::VisitDeclaratorDecl(clang::DeclaratorDecl *varDecl)
    {
      clang::SourceLocation tsLoc = varDecl->getTypeSpecStartLoc();
      clang::SourceLocation declLoc = varDecl->getLocation();
    
      if (m_verbose) {
        cout << varDecl->Decl::getDeclKindName()
             << " name=\"" << varDecl->getQualifiedNameAsString()
             << "\", type=\"" << typeStr(varDecl->getType())
             << "\", declLoc=" << locStr(declLoc)
             << ", tsLoc=" << locStr(tsLoc)
             << "\n";
      }
    
      // This creates the secondary map, initially empty, if it does not
      // already exist.
      auto &declLocToVarDecl = m_tsLocToDeclLocToVarDecl[tsLoc];
    
      // This discards a second-or-later 'varDecl' with the same 'declLoc'.
      declLocToVarDecl.insert(std::make_pair(declLoc, varDecl));
    
      return true;
    }
    
    
    void Visitor::traverseTU()
    {
      this->TraverseDecl(m_astContext.getTranslationUnitDecl());
    }
    
    
    void Visitor::reportMultipleDecls()
    {
      cout << "reportMultipleDecls:\n";
      int multipleDeclCount = 0;
    
      // Initialized to the "invalid" location.
      clang::SourceLocation prevTSLoc;
      assert(prevTSLoc.isInvalid());
    
      // Consider every type specifier location...
      for (auto const &kv1 : m_tsLocToDeclLocToVarDecl) {
        clang::SourceLocation tsLoc = kv1.first;
        auto const &declLocToVarDecl = kv1.second;
    
        // Were there multiple distinct declaration locations sharing that
        // type specifier?
        size_t numDecls = declLocToVarDecl.size();
        if (numDecls > 1) {
          cout << "  " << numDecls
               << " decls with tsLoc=" << locStr(tsLoc) << ":\n";
    
          // Print all of the variables declared with 'tsLoc' (modulo
          // duplicates that were discarded during map construction).
          for (auto const &kv2 : declLocToVarDecl) {
            clang::SourceLocation declLoc = kv2.first;
            clang::DeclaratorDecl *varDecl = kv2.second;
    
            cout << "    name=\"" << varDecl->getQualifiedNameAsString()
                 << "\", declLoc=" << locStr(declLoc)
                 << "\n";
          }
    
          ++multipleDeclCount;
        }
    
        prevTSLoc = tsLoc;
      }
    
      cout << "  multipleDeclCount: " << multipleDeclCount << "\n";
    }
    
    
    void Visitor::runAnalysis()
    {
      traverseTU();
      reportMultipleDecls();
    }
    
    
    // This is all boilerplate for a program using the Clang C++ API
    // ("libtooling") but not using the "tooling" part specifically.
    int main(int argc, char const **argv)
    {
      // Copy the arguments into a vector of char pointers since that is
      // what 'createInvocation' wants.
      std::vector<char const *> commandLine;
      {
        // Path to the 'clang' binary that I am behaving like.  This path is
        // used to compute the location of compiler headers like stddef.h.
        commandLine.push_back(CLANG_LLVM_INSTALL_DIR "/bin/clang");
    
        for (int i = 1; i < argc; ++i) {
          commandLine.push_back(argv[i]);
        }
      }
    
      // Parse the command line options.
      std::shared_ptr<clang::CompilerInvocation> compilerInvocation(
        clang::createInvocation(llvm::ArrayRef(commandLine)));
      if (!compilerInvocation) {
        // Command line parsing errors have already been printed.
        return 2;
      }
    
      // Boilerplate setup for 'LoadFromCompilerInvocationAction'.
      std::shared_ptr<clang::PCHContainerOperations> pchContainerOps(
        new clang::PCHContainerOperations());
      clang::IntrusiveRefCntPtr<clang::DiagnosticsEngine> diagnosticsEngine(
        clang::CompilerInstance::createDiagnostics(
          new clang::DiagnosticOptions));
    
      // Run the Clang parser to produce an AST.
      std::unique_ptr<clang::ASTUnit> ast(
        clang::ASTUnit::LoadFromCompilerInvocationAction(
          compilerInvocation,
          pchContainerOps,
          diagnosticsEngine));
    
      if (ast == nullptr ||
          diagnosticsEngine->getNumErrors() > 0) {
        // Error messages have already been printed.
        return 2;
      }
    
      Visitor visitor(ast.get());
      visitor.runAnalysis();
    
      return 0;
    }
    
    
    // EOF
    

    Makefile:

    # Makefile
    
    # Default target.
    all:
    .PHONY: all
    
    
    # ---- Configuration ----
    # Installation directory from a binary distribution.
    # Has five subdirectories: bin include lib libexec share.
    CLANG_LLVM_INSTALL_DIR = $(HOME)/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04
    
    # ---- llvm-config query results ----
    # Program to query the various LLVM configuration options.
    LLVM_CONFIG := $(CLANG_LLVM_INSTALL_DIR)/bin/llvm-config
    
    # C++ compiler options to ensure ABI compatibility.
    LLVM_CXXFLAGS := $(shell $(LLVM_CONFIG) --cxxflags)
    
    # Directory containing the clang library files, both static and dynamic.
    LLVM_LIBDIR := $(shell $(LLVM_CONFIG) --libdir)
    
    # Other flags needed for linking, whether statically or dynamically.
    LLVM_LDFLAGS_AND_SYSTEM_LIBS := $(shell $(LLVM_CONFIG) --ldflags --system-libs)
    
    
    # ---- Compiler options ----
    # C++ compiler.
    CXX := $(CLANG_LLVM_INSTALL_DIR)/bin/clang++
    
    # Compiler options, including preprocessor options.
    CXXFLAGS =
    CXXFLAGS += -g
    CXXFLAGS += -Wall
    CXXFLAGS += -Werror
    
    # Get llvm compilation flags.
    CXXFLAGS += $(LLVM_CXXFLAGS)
    
    # Tell the source code where the clang installation directory is.
    CXXFLAGS += -DCLANG_LLVM_INSTALL_DIR='"$(CLANG_LLVM_INSTALL_DIR)"'
    
    # Linker options.
    LDFLAGS =
    
    LDFLAGS += -g -Wall
    
    # Pull in clang+llvm via libclang-cpp.so, which has everything, but is
    # only available as a dynamic library.
    LDFLAGS += -lclang-cpp
    
    # Arrange for the compiled binary to search the libdir for that library.
    # Otherwise, one can set the LD_LIBRARY_PATH envvar before running it.
    # Note: the -rpath switch does not work on Windows.
    LDFLAGS += -Wl,-rpath=$(LLVM_LIBDIR)
    
    # Get the needed -L search path, plus things like -ldl.
    LDFLAGS += $(LLVM_LDFLAGS_AND_SYSTEM_LIBS)
    
    
    # ---- Recipes ----
    # Compile a C++ source file.
    %.o: %.cc
        $(CXX) -c -o $@ $(CXXFLAGS) $<
    
    # Executable.
    all: find-multiple-decl.exe
    find-multiple-decl.exe: find-multiple-decl.o
        $(CXX) -o $@ $^ $(LDFLAGS)
    
    # Test.
    .PHONY: check
    check: find-multiple-decl.exe
        ./find-multiple-decl.exe test.cc
    
    .PHONY: clean
    clean:
        $(RM) *.o *.exe
    
    
    # EOF
    

    When run on the input:

    // test.cc
    // Testcases for find-multiple-decl.cc.
    
    // Try with globals.
    int g_solo;
    int g_duo1, g_duo2;                    // Reported.
    
    // Try with locals.
    void f()
    {
      int l_solo;
      int l_duo1, l_duo2;                  // Reported.
    }
    
    template <class T>
    struct S {
      static T s_solo;
      static T s_duo1, s_duo2;             // Reported.
    
      // These are represented as FieldDecl in the Clang AST.
      T m_solo;
      T m_duo1, m_duo2;                    // Reported.
    };
    
    // One syntactic declaration turns into multiple AST nodes due to
    // template instantiation, and we shouldn't report that.
    template <class T>
    T S<T>::s_solo;
    
    // Not allowed in C++.
    #if 0
    template <class T>
    T S<T>::s_duo1, S<T>::s_duo2;
    #endif
    
    // Instantiate the template twice.
    void instantiate()
    {
      S<int> s1;
      S<float> s2;
    }
    
    // EOF
    

    it produces the output:

    $ ./find-multiple-decl.exe test.cc
    reportMultipleDecls:
      2 decls with tsLoc=test.cc:6:1:
        name="g_duo1", declLoc=test.cc:6:5
        name="g_duo2", declLoc=test.cc:6:13
      2 decls with tsLoc=test.cc:12:3:
        name="l_duo1", declLoc=test.cc:12:7
        name="l_duo2", declLoc=test.cc:12:15
      2 decls with tsLoc=test.cc:18:10:
        name="S::s_duo1", declLoc=test.cc:18:12
        name="S::s_duo2", declLoc=test.cc:18:20
      2 decls with tsLoc=test.cc:22:3:
        name="S::m_duo1", declLoc=test.cc:22:5
        name="S::m_duo2", declLoc=test.cc:22:13
      multipleDeclCount: 4
    

    It should be straightforward to incorporate this logic into a clang-tidy check.
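
For the follow-up step from the question (emitting one declaration per variable), a text-level sketch could look like the following. It assumes the type specifier text and the text of each declarator have already been extracted from the source; `splitDeclaration` is a hypothetical helper, not part of Clang or clang-tidy, and how the strings are obtained from source ranges is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: rebuild one declaration per declarator.
//
// 'typeSpec' is the text shared by all declarators, e.g. "int" or
// "static T".  Each element of 'declarators' is the text of a single
// declarator including any pointer/array syntax and initializer,
// e.g. "a = 1" or "*p".  'indent' is prepended to every emitted line.
std::string splitDeclaration(const std::string &typeSpec,
                             const std::vector<std::string> &declarators,
                             const std::string &indent = "") {
  assert(!typeSpec.empty());
  std::string out;
  for (const std::string &d : declarators) {
    out += indent + typeSpec + " " + d + ";\n";
  }
  return out;
}
```

For example, `splitDeclaration("int", {"*p", "q = 0"})` yields `int *p;` and `int q = 0;` on separate lines. Note that the `*` belongs to the declarator, not to the shared type specifier, so it must travel with `p` only.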