c++ abstract-syntax-tree clang-ast-matchers

Rewrite a C++ variable declaration through the AST


I'm currently trying to rewrite C++ code from

Query q(
      "SELECT %LC FROM %T WHERE %W",
      getColumnArray(),
      "table_name",
      buildConstraints(id, "abc"));

to

auto q = QueryBuilder.from("table_name").select(getColumnArray()).where(id,"abc").build();

I'm able to grab the target code block using

varDecl(isExpansionInMainFile(),hasType(cxxRecordDecl(hasName("Query"))))

However, there are some further requirements I need help with:

  1. Would it be possible to match a Query where the argument contains "SELECT"?
  2. Once I have the target code block, how do I extract the arguments of the variable constructor (e.g., getColumnArray(), "table_name", etc.) to be able to reuse them in the new code?
  3. Would it be possible to get the name of the function that contains the matched code block?

Solution

  • I take it you are running the query in the question using clang-query. But you also state that your goal is to rewrite C++ code, and clang-query cannot do that; it only finds the code that matches a given expression and prints it to the console. To rewrite C++ code, you need to use one of the Clang APIs, such as LibTooling (C++) or the libclang Python bindings.

    Nevertheless, I'll answer the questions for a pure clang-query context. See the AST Matcher Reference for more information (albeit terse) on the individual matchers.

    Q1: Filtering on string literals

    clang-query cannot filter on the contents of string literals, as asked and answered previously.

    Q2a: Getting the constructor arguments

    Having matched a varDecl, use hasInitializer(cxxConstructExpr(hasArgument(...))) to get the arguments passed to the constructor.

    This assumes there is a fixed number of arguments.
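
    For the Query example in the question, where the arguments are known in advance, the rewrite step itself comes down to string assembly once the source text of each argument is in hand. Below is a minimal sketch of that step only. It assumes the argument texts have already been extracted by whichever Clang API performs the rewrite (not shown), and that the where() arguments are the arguments of the buildConstraints(...) call, as the question's before/after pair suggests. The helper name buildQueryBuilderDecl is hypothetical.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical helper: assemble the replacement declaration from the
    // source text of the pieces found by the matcher.  Extracting those
    // texts from the AST is assumed to happen elsewhere.
    std::string buildQueryBuilderDecl(const std::string &varName,
                                      const std::string &columnsExpr,
                                      const std::string &tableExpr,
                                      const std::vector<std::string> &whereArgs) {
      assert(!varName.empty() && "variable name must not be empty");

      // Join the where() arguments with commas, as in the target code.
      std::string whereList;
      for (std::size_t i = 0; i < whereArgs.size(); ++i) {
        if (i != 0) whereList += ",";
        whereList += whereArgs[i];
      }

      return "auto " + varName + " = QueryBuilder.from(" + tableExpr + ")" +
             ".select(" + columnsExpr + ")" +
             ".where(" + whereList + ").build();";
    }
    ```

    Called with "q", "getColumnArray()", "\"table_name\"" and the list {"id", "\"abc\""}, this yields the target line from the question. The format-string argument is deliberately not used, since the builder call chain replaces it.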

    Q2b: Getting arguments for a variable-argument callee

    In comments it was clarified that the Query class constructor takes a variable number of arguments, presumably something like:

    class Query {
    public:
      Query(char const *fmt, ...);
      ...
    };
    

    If all of the parameters had names and types, then we could use forEachArgumentWithParam, but they don't. So, instead, we have to use a sequence of optionally(hasArgument(N, ...)) for N up to some large but fixed limit.

    Q3: Getting the containing function

    Having matched a varDecl, use hasAncestor(functionDecl()) to get the definition of the containing function.

    Example query for Q2 and Q3

    Here is a shell script that runs clang-query and demonstrates the answers to Q2 and Q3:

    #!/bin/sh
    
    PATH=$HOME/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH
    
    query='m
    
      varDecl(
        isExpansionInMainFile(),
        hasType(cxxRecordDecl(hasName("S"))),
        hasInitializer(
          cxxConstructExpr(
            optionally(
              hasArgument(0,
                expr().bind("arg0")
              )
            ),
            optionally(
              hasArgument(1,
                expr().bind("arg1")
              )
            ),
            optionally(
              hasArgument(2,
                expr().bind("arg2")
              )
            ),
            optionally(
              hasArgument(3,
                expr().bind("arg3")
              )
            ),
            optionally(
              hasArgument(4,
                expr().bind("arg4")
              )
            ),
            optionally(
              hasArgument(5,
                expr().bind("arg5")
              )
            ),
            optionally(
              hasArgument(6,
                expr().bind("arg6")
              )
            )
          )
        ),
        hasAncestor(
          functionDecl().bind("containingFunction")
        )
      ).bind("varDecl")
    
    '
    
    if [ "x$1" = "x" ]; then
      echo "usage: $0 filename.cc -- <compile options like -I, etc.>"
      exit 2
    fi
    
    # Run the query.  Setting 'bind-root' to false means clang-query will
    # not also print a redundant "root" binding.
    clang-query \
      -c="set bind-root false" \
      -c="$query" \
      "$@"
    
    # EOF
    

    When run on the input:

    struct S {
      S(...);
    };
    
    void f()
    {
      S s1(0,1);
      S s2(0,1,2,3,4,5,6,7,8,9);
    }
    

    it produces the output:

    $ ./cmd.sh test.cc --
    
    Match #1:
    
    $PWD/test.cc:7:8: note: "arg0" binds here
      S s1(0,1);
           ^
    $PWD/test.cc:7:10: note: "arg1" binds here
      S s1(0,1);
             ^
    $PWD/test.cc:5:1: note: "containingFunction" binds here
    void f()
    ^~~~~~~~
    $PWD/test.cc:7:3: note: "varDecl" binds here
      S s1(0,1);
      ^~~~~~~~~
    
    Match #2:
    
    $PWD/test.cc:8:8: note: "arg0" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
           ^
    $PWD/test.cc:8:10: note: "arg1" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
             ^
    $PWD/test.cc:8:12: note: "arg2" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
               ^
    $PWD/test.cc:8:14: note: "arg3" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
                 ^
    $PWD/test.cc:8:16: note: "arg4" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
                   ^
    $PWD/test.cc:8:18: note: "arg5" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
                     ^
    $PWD/test.cc:8:20: note: "arg6" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
                       ^
    $PWD/test.cc:5:1: note: "containingFunction" binds here
    void f()
    ^~~~~~~~
    $PWD/test.cc:8:3: note: "varDecl" binds here
      S s2(0,1,2,3,4,5,6,7,8,9);
      ^~~~~~~~~~~~~~~~~~~~~~~~~
    2 matches.
    

    In the example, the bindings stop at seven arguments (arg0 through arg6) because that is as far as the query goes, but more optionally(hasArgument(N, ...)) clauses can be added as needed.
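
    Since the optionally(hasArgument(N, ...)) clauses differ only in N, the matcher text can also be generated rather than written by hand. The following is a rough sketch of such a generator; buildVarDeclMatcher is a hypothetical helper, not part of Clang, and it simply reproduces the matcher from the script above on a single line, with the same binding names.

    ```cpp
    #include <cassert>
    #include <string>

    // Hypothetical helper: build the matcher used in the script above, with
    // one optionally(hasArgument(N, ...)) clause for each N in [0, maxArgs).
    std::string buildVarDeclMatcher(const std::string &className,
                                    unsigned maxArgs) {
      assert(!className.empty() && "class name must not be empty");

      std::string m = "varDecl(isExpansionInMainFile(),";
      m += "hasType(cxxRecordDecl(hasName(\"" + className + "\"))),";
      m += "hasInitializer(cxxConstructExpr(";
      for (unsigned i = 0; i < maxArgs; ++i) {
        if (i != 0) m += ",";
        const std::string n = std::to_string(i);
        // Bind argument i as "arg<i>" if the call has that many arguments.
        m += "optionally(hasArgument(" + n + ",expr().bind(\"arg" + n + "\")))";
      }
      m += ")),";  // close cxxConstructExpr and hasInitializer
      m += "hasAncestor(functionDecl().bind(\"containingFunction\"))";
      m += ").bind(\"varDecl\")";
      return m;
    }
    ```

    The returned string is only the matcher expression; to use it with clang-query it still needs the leading "m " command, as in the script's query variable.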