c++ clang clang-tidy

Identifying type definitions in macros with clang-tidy


I'm writing (my first) clang-tidy check. It should detect when macros define types, like:

#define MY_INT int
#define MY_STRING std::string
class MyClass {};
#define MY_CLASS MyClass
#define MY_INT_ARRAY std::array<int, 3>
template <typename T> class MyTemplateClass {};
#define MY_TEMPLATE_TYPE MyTemplateClass<int>

The goal is to flag macros that define types and suggest using typedef or using instead.

I started from the existing cppcoreguidelines-macro-usage check and stripped it down so that it only collects the macro definitions through registerPPCallbacks, which works fine. Because I need an ASTContext to process the type information, I also registered an arbitrary matcher via registerMatchers(MatchFinder *Finder), so that I can get the ASTContext from Result.Context.

User-defined types like MyClass can now be detected using:

IdentifierInfo &II = Context->Idents.get(Identifier);
DeclContext::lookup_result Result = Context->getTranslationUnitDecl()->lookup(&II);

// Iterate through the lookup result to find a TypeDecl
for (NamedDecl *ND : Result) {
  if (TypeDecl *TD = dyn_cast<TypeDecl>(ND)) {
    return Context->getTypeDeclType(TD);          // User-defined type! 
  }
}

Unfortunately, this approach fails for built-in types (int, uint8_t) and standard types (std::string, std::array<int, 3>), where the lookup result is empty. For such types I tried plain string comparisons, but I feel this is a very brittle approach:

if (Identifier == "int") return Context->IntTy;
if (Identifier == "float") return Context->FloatTy;

For more advanced types like the mentioned template classes or arrays I'm out of ideas...

Is there a robust way to detect whether a macro defines a type, especially for built-in, standard, and template types? Any suggestions on improving this approach, or other alternatives?


Solution

  • Is there a robust way to detect if a macro defines a type?

    No.

    The problem is that the lookup context at the point of the macro definition is different from the lookup context(s) at the point(s) where it is expanded. So, it is impossible to reliably predict how the tokens in the macro definition will be interpreted.

    Problem #1: Type is declared later

    Consider:

    #define MYCLASS myclass
    class myclass {};
    

    Lookup at the definition site will not find the declaration that comes after it.

    Problem #2: Type is in a different scope

    Consider:

    namespace NS {
      class myclass {};
    }
    using namespace NS;
    #define MYCLASS myclass
    

    Naive lookup of myclass in the global scope (which is where the macro appears here) will miss NS::myclass. A more sophisticated lookup that takes into account the using directive will be defeated if that directive appears after the macro definition.

    Relatedly, there is no need for the macro definition to appear in the same scope as its expansions, so looking it up in its definition scope is not necessarily right in the first place.
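
    As a minimal sketch of both points (the namespace Other and the helper macroNamesNsClass are hypothetical additions for illustration), the following compiles even though the using directive comes after the macro definition and the expansion sits in a different scope:

```cpp
#include <type_traits>

#define MYCLASS myclass   // at this point neither NS nor the directive exists

namespace NS {
  class myclass {};
}
using namespace NS;       // appears after the macro definition

namespace Other {
  // Hypothetical helper: the expansion happens in a different scope from the
  // definition, and unqualified lookup finds NS::myclass via the directive.
  constexpr bool macroNamesNsClass() {
    return std::is_same<MYCLASS, NS::myclass>::value;
  }
}
```

    A lookup performed at the #define line has no way to see either the namespace or the directive.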

    Problem #3: Definition relies on another macro

    Consider:

    template <typename T> class myclass {};
    #define MYCLASS_OF_MYINT myclass<MYINT>
    #define MYINT int
    

    If we feed the text myclass<MYINT> to the parser, in the proper scope of the definition point, it fails to parse because MYINT needs to be expanded, but that macro is not defined yet. If we instead remember all macro definitions that were in effect at the end of the translation unit, and expand occurrences of them in the definition of interest, we trip over redefinitions: the definition in effect at the end may differ from the one in effect at a given expansion. (Redefinitions are rare, but the question specifies "robust".)
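
    To make the ordering concrete, here is a small self-contained sketch (the helper expandsToMyclassOfInt and the static_assert are illustrative additions, not part of the original example). It compiles because MYINT is only expanded when MYCLASS_OF_MYINT itself is expanded, by which point MYINT has been defined:

```cpp
#include <type_traits>

template <typename T> class myclass {};

// MYINT is not defined yet; the preprocessor stores these tokens unexpanded.
#define MYCLASS_OF_MYINT myclass<MYINT>
#define MYINT int

// Hypothetical helper: at this expansion site MYINT is defined, so the
// macro expands to myclass<int>.
constexpr bool expandsToMyclassOfInt() {
  return std::is_same<MYCLASS_OF_MYINT, myclass<int>>::value;
}

static_assert(expandsToMyclassOfInt(),
              "MYCLASS_OF_MYINT names myclass<int> at this point");
```

    So the meaning of the macro body depends on the macro state at each expansion, not at the definition.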

    Problem #4: Interpretation is different at different expansions

    Consider:

    #define MYCLASS myclass
    namespace NS1 {
      class myclass {};
      MYCLASS x;           // Expands to a type.
    }
    namespace NS2 {
      int myclass;
      int f()
      {
        return MYCLASS;    // Expands to a variable.
      }
    }
    

    The macro is expanded twice, but that expansion is treated as a type in only one place. (Yes, this is pathological.)

    Possible alternative: checking expansion sites

    Instead of checking macro definitions, it would be feasible to check the macro expansion sites. Specifically, one could use RecursiveASTVisitor to search for all occurrences of TypeLoc (which represent syntactic occurrences of types), then, for each one, call getSourceRange() and check whether the "spelling" locations of its first and last tokens correspond to the first and last tokens of some macro definition. Macro definitions can be recorded using PPCallbacks (which the question indicates you're already doing).

    The drawback of this approach is that it will only report macros that are expanded somewhere in the translation unit being analyzed. But it avoids the problem of being unable to interpret a macro definition in isolation.
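
    The matching step itself can be modeled without any Clang dependency. The sketch below is a toolchain-free illustration only: plain file offsets stand in for source locations, and the names MacroBody, TypeSpelling, and findTypeMacro are hypothetical. In a real check, the recorded ranges would come from the PPCallbacks data and the type ranges from the spelling locations of each TypeLoc's source range.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a macro recorded by the preprocessor callback:
// offsets of the first and last token of its replacement list.
struct MacroBody {
  std::string Name;
  unsigned FirstTok;
  unsigned LastTok;
};

// Hypothetical stand-in for one type occurrence: spelling offsets of the
// first and last token of its source range.
struct TypeSpelling {
  unsigned FirstTok;
  unsigned LastTok;
};

// Returns the macro whose whole body spells this type, or nullptr if the
// type does not coincide exactly with any recorded macro body.
const MacroBody *findTypeMacro(const std::vector<MacroBody> &Macros,
                               const TypeSpelling &T) {
  for (const MacroBody &M : Macros)
    if (M.FirstTok == T.FirstTok && M.LastTok == T.LastTok)
      return &M;
  return nullptr;
}
```

    Requiring both ends to match is a deliberate choice in this sketch: a type that covers only part of a macro body, or that extends beyond it, is not reported.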