Tags: c++, clang, clang-tidy, clang-ast-matchers

Matching sugared QualType of a template parameter in a varDecl()


Background

The built-in C++ integer types don't have fixed sizes (the standard only guarantees a minimum size for each), so we require all integer types to use explicitly sized typedefs, which keeps the size the same across all platforms, compilers, and CPU architectures. The lone exception is size_t, since it is the return type of common operations like std::vector::size().

So we're looking at ways to detect via clang-tidy when a usage of size_t would cause a "narrowing conversion" warning on MSVC under x86 architectures.
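To make the concern concrete, here is a minimal sketch of the kind of size mismatch involved. The helper loses_range is a hypothetical name, not part of any library, and it only compares value bits; it is a rough approximation of what triggers the warning, not the standard's definition of a narrowing conversion. std::uint32_t stands in for size_t on a 32-bit target.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a library function): true when To has fewer
// value bits than From, so some From values cannot be represented in To.
// This is a size-based approximation, not the full narrowing rules.
template <typename From, typename To>
constexpr bool loses_range()
{
  return std::numeric_limits<To>::digits < std::numeric_limits<From>::digits;
}

// A 32-bit unsigned type (standing in for size_t on a 32-bit target)
// cannot hold every int64_t value.
static_assert(loses_range<std::int64_t, std::uint32_t>(),
              "int64_t -> 32-bit unsigned loses range");

// A 64-bit unsigned type has enough value bits by this measure.
static_assert(!loses_range<std::int64_t, std::uint64_t>(),
              "int64_t -> uint64_t keeps all value bits");
```

On a 64-bit target the same check with size_t would pass, which is why the problem only shows up on some platforms.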

The Problem

Given a function like this:

template <typename T>
void foo()
{
  int64_t i{0};
  T v = i;
}

I am trying to differentiate when the varDecl for v is created by a call to foo<size_t>() as opposed to the non-problematic foo<uint64_t>().

The AST dump for the above in Compiler Explorer:

|-FunctionTemplateDecl <line:4:1, line:9:1> line:5:6 foo
| |-TemplateTypeParmDecl <line:4:11, col:20> col:20 referenced typename depth 0 index 0 T
| |-FunctionDecl <line:5:1, line:9:1> line:5:6 foo 'void ()'
| | `-CompoundStmt <line:6:1, line:9:1>
| |   |-DeclStmt <line:7:3, col:16>
| |   | `-VarDecl <col:3, col:15> col:12 referenced v 'uint64_t':'unsigned long' listinit
| |   |   `-InitListExpr <col:13, col:15> 'uint64_t':'unsigned long'
| |   |     `-ImplicitCastExpr <col:14> 'uint64_t':'unsigned long' <IntegralCast>
| |   |       `-IntegerLiteral <col:14> 'int' 3
| |   `-DeclStmt <line:8:3, col:12>
| |     `-VarDecl <col:3, col:11> col:5 var 'T' cinit
| |       `-DeclRefExpr <col:11> 'uint64_t':'unsigned long' lvalue Var 0xcf7fc88 'v' 'uint64_t':'unsigned long'
| `-FunctionDecl <line:5:1, line:9:1> line:5:6 used foo 'void ()' implicit_instantiation
|   |-TemplateArgument type 'unsigned long'
|   | `-BuiltinType 'unsigned long'
|   `-CompoundStmt <line:6:1, line:9:1>
|     |-DeclStmt <line:7:3, col:16>
|     | `-VarDecl <col:3, col:15> col:12 used v 'uint64_t':'unsigned long' listinit
|     |   `-InitListExpr <col:13, col:15> 'uint64_t':'unsigned long'
|     |     `-ImplicitCastExpr <col:14> 'uint64_t':'unsigned long' <IntegralCast>
|     |       `-IntegerLiteral <col:14> 'int' 3
|     `-DeclStmt <line:8:3, col:12>
|       `-VarDecl <col:3, col:11> col:5 var 'unsigned long' cinit
|         `-ImplicitCastExpr <col:11> 'uint64_t':'unsigned long' <LValueToRValue>
|           `-DeclRefExpr <col:11> 'uint64_t':'unsigned long' lvalue Var 0xcf8a3a8 'v' 'uint64_t':'unsigned long'

What I've Tried

I can see there's an instantiation of the function template under the FunctionTemplateDecl node, and it looked like substTemplateTypeParmType() and hasReplacementType() were a promising combination to get the type of the varDecl, like so:

m varDecl(
  hasType(substTemplateTypeParmType(
    hasReplacementType(qualType())
  ))
)

...however, the replacement type has already been "desugared" by that point, and loses the size_t declaration name. This is despite the fact the documentation of SubstTemplateTypeParmType states:

They are used solely to record that a type was originally written as a template type parameter; therefore they are never canonical.

...and the class specifically has isSugared() and desugar() methods, which led me to believe that the type is supposed to still carry the size_t typedef as sugar.

(Maybe I'm incorrect in assuming that the fully desugared type is the "canonical" type?)

What I'm Asking

Is my assumption that SubstTemplateTypeParmType should still return the sugared type incorrect?

Is there a way to traverse from the varDecl, back up to the FunctionTemplateDecl, get the specialization that was used, and get the sugared type from there?


Solution

  • Template specializations use canonical types

    Your goal, as I understand it, is to distinguish the specialization foo<uint64_t> from the specialization foo<size_t>, when the target platform uses unsigned long for both uint64_t and size_t.

    This is not possible in a straightforward way because both are names for the same entity; for example, they would have the same address in memory. Like most (all?) C++ compilers, Clang's representation of a template specialization uses template arguments that refer to canonical types, where a canonical type is the chosen representative of its semantic equivalence class (here, the built-in type unsigned long).

    The AST dump in the question accurately depicts this:

    |       `-VarDecl <col:3, col:11> col:5 var 'unsigned long' cinit
                                                 ^^^^^^^^^^^^^
    

    Within the instantiation body, the type of var is just (a SubstTemplateTypeParmType that refers to) unsigned long. The fact that the specialization argument was originally spelled size_t is gone by the time the instantiation machinery starts running. Consequently, an AST matcher cannot detect that missing information.

    (Note: The code in the question is slightly different from the code that was used to create the AST dump. In the question code, that variable is called v, whereas it was called var when the dump was created.)
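
    The claim that both spellings name one and the same specialization can be checked with a small self-contained sketch. The aliases my_size_t and my_uint64_t are hypothetical stand-ins, assumed here to model a target where size_t and uint64_t are both typedefs of unsigned long, as in the AST dump; the function names are likewise made up for illustration.

    ```cpp
    #include <cassert>
    #include <type_traits>

    // Hypothetical aliases standing in for size_t and uint64_t on a target
    // where both are typedefs of unsigned long.
    using my_size_t = unsigned long;
    using my_uint64_t = unsigned long;

    template <typename T>
    void foo()
    {
      long long i{0};
      T v = static_cast<T>(i);  // explicit cast keeps the sketch warning-free
      (void)v;
    }

    // The two aliases denote the same type ...
    static_assert(std::is_same<my_size_t, my_uint64_t>::value,
                  "both aliases name unsigned long");

    // ... so both template-ids resolve to the same function.
    bool same_specialization()
    {
      void (*via_size)() = &foo<my_size_t>;
      void (*via_u64)() = &foo<my_uint64_t>;
      return via_size == via_u64;
    }

    // Small driver for the sketch; call it from main().
    void run_demo()
    {
      assert(same_specialization());
    }
    ```

    Because there is only one function, nothing inside its body can record which alias a particular caller wrote.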

    But SubstTemplateTypeParmType is not canonical?

    That's right, but it just means SubstTemplateTypeParmType itself is not canonical; the type it refers to is usually canonical. (I think the only exceptions are where the underlying type is also dependent, which arise when, for example, a class template that contains a member function template is instantiated.)

    Elaborating a bit, SubstTemplateTypeParmType is non-canonical because it is simply a wrapper for, and semantically equivalent to, some other type. The type inside is normally canonical. In some ways, SubstTemplateTypeParmType is like TypedefType, which is also a non-canonical wrapper for another type (which might or might not be canonical).

    For example, if we define a template:

    template <typename T>
    void foo()
    {
      typedef T *Tptr;
    }
    

    then implicitly instantiate it by naming foo<unsigned long>, the declaration of Tptr inside the resulting instantiation will be a TypedefDecl. Its getTypeForDecl() is a TypedefType pointing at a PointerType, which points at a SubstTemplateTypeParmType, which points at a BuiltinType. Only the final BuiltinType is canonical in that case.

    Is there another way to do this?

    Yes, but probably not with AST matchers.

    The template-id (specialization name) foo<size_t> within the expression foo<size_t>() is a DeclRefExpr. Its template_arguments() refer to size_t, and its getDecl() points at the instantiation. The tree dump shows the latter but not the former; I used my own tool, which prints more details, to investigate this.

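    So, given an instantiation that uses a SubstTemplateTypeParmType in a way that is potentially troublesome, one could find the DeclRefExpr nodes whose getDecl() points at that instantiation and check whether their template_arguments() were spelled as size_t.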

    This procedure would be reasonably straightforward to do using the Clang C++ API. I suspect it cannot be done using AST matchers alone, but I think that also depends on exactly how you're identifying potentially troublesome type usages.