
How to Exclude Tagless Structs Using ASTMatcher?


I'm currently working with Clang's ASTMatcher to extract struct declarations and exclude tagless structs. Here's an example of the kind of struct I want to avoid matching:

#include <stdlib.h>

struct {
    int b;
} a;

int func() {
    a.b = 1;
    return 0;
}

For my matcher, I used:

auto structDeclMatcher = recordDecl(
    isExpansionInMainFile(),
    isDefinition()
).bind("structDecl");

I also tried using isAnonymousStructOrUnion() and unless(hasName("")) to exclude those structs, but neither approach worked. Structs without a tag still seem to get matched.

How can I modify my ASTMatcher to exclude those structs and match only tagged structs? Are there specific matchers or conditions I should use to differentiate between tagged structs and structs without a tag?


Solution

    hasName

    After some experimenting and code tracing, I found that, for a struct declared without a tag, the effective name it uses for the hasName Clang AST matcher is:

    (anonymous)
    

    (This is a bit ironic considering this question's terminology history. So it goes.) See getNodeName() in ASTMatchersInternal.cpp.

    Consequently, to exclude tagless structure declarations with hasName, use:

    unless(
      hasName("(anonymous)")
    )
    

    matchesName

    Alternatively, if you use matchesName (which accepts a regex), then the effective name is entirely different(!). In that case, a tagless record declaration has an effective name like:

    ::struct (unnamed at /full/path/to/test.c:3:1)
    

    The exact construction comes from TypePrinter::printTag() and the matchesName definition. Note that it has the leading :: even if the input code was compiled as C, and that printTag() has some behavior variation depending on language options.

    Consequently, to exclude tagless structure declarations with matchesName, use:

    unless(
      matchesName("[`(]unnamed")
    )
    

    The regex accepts either a backtick or an open paren before unnamed because the former is used when emulating MSVC and the latter otherwise. The punctuation is required because omitting it would let the regex match structs whose name happens to include the substring "unnamed". Requiring a following space instead would fail if Policy.AnonymousTagLocations is not set.

    Demonstration

    We can use the clang-query command line tool (part of Clang) to test the match expression:

    $ cat test.c
    // test.c
    
    struct /*no tag*/ {
      int b;
    } a1;
    
    struct HasTag {
      int b;
    } a2;
    
    // EOF
    
    $ clang-query -c='m recordDecl(unless(hasName("(anonymous)")))' test.c --
    
    Match #1:
    
    $PWD/test.c:7:1: note: "root" binds here
    struct HasTag {
    ^~~~~~~~~~~~~~~
    1 match.