c libclang

How do I get the enum type of a clang::EnumConstantDecl?


I'm trying to write a clang tool that finds all calls to a function with a variable number of arguments, and returns the type of each argument. When I pass in a constant enumerator value, for example:

static void foo(char *, ...) {}

enum test {
  TEST_VALUE1,
  TEST_VALUE2
};

void main() {
  foo("bla", TEST_VALUE1);
}

I get this AST for the argument:

DeclRefExpr 0x55933eb553b8 'int' EnumConstant 0x55933eb53ec0 'TEST_VALUE1' 'int'

I see how to get the clang::EnumConstantDecl object representing the enumerator TEST_VALUE1, but I don't see how to get the type of the enumeration, enum test. How do I get that?

Note: This is similar to the question Get function call argument enum type with libclang, but in that case, the argument was a variable of enum type rather than an enumerator, and the posted solution there does not work for this case.


Solution

  • First, I'll note that this answer is sort of a bug fix for my answer to the previous question. I failed to test the case of passing an enumerator directly, and if I had, would have noticed and corrected the problem then. But here we are.

    Enumerator types in C

    In C, the type of an enumerator (the constants inside an enumeration declaration) is int. Quoting C17 6.7.2.2 p3:

    The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.

    Thus, the primary reason Clang shows the type of an EnumConstantDecl in C as int is that the language mandates that type.

    Contrast this with C++17 10.2 p1:

    An enumeration is a distinct type (6.7.2) with named constants.

    BTW, this also shows why it is important to use the right tag when asking a question, in this case c versus c++, since these are different languages and the answer for one can differ from the answer for the other. (The solution code in the earlier question works if the input language is C++.)

    Getting the enumeration type from an EnumConstantDecl

    In the AST, we have a DeclRefExpr. To get the EnumConstantDecl, we call clang_getCursorReferenced, which in this case corresponds to the C++ API call DeclRefExpr::getDecl. (The question already indicates an understanding of how to do this, so this is just for completeness.)

    Then, to get the EnumDecl, we call clang_getCursorSemanticParent. In the C++ API, that would be done with Decl::getDeclContext followed by dyn_cast to downcast to the expected subclass of the context (e.g., dyn_cast<clang::EnumDecl>(enumConstantDecl->getDeclContext())).

    Finally, to get the EnumType corresponding to this EnumDecl, we call the usual clang_getCursorType, which corresponds in this case to the C++ call TypeDecl::getTypeForDecl.

    This is encapsulated in the following procedure that retrieves the type of an expression in the usual way, except if it is a DeclRefExpr to an enumerator, it navigates to the enumeration type:

    // Like `clang_getCursorType`, but if the cursor is a `DeclRefExpr` to
    // an `EnumConstant`, get the type of its enclosing `EnumDecl` rather
    // than (e.g.) `int`, which is what `clang_getCursorType` would return.
    CXType getCursorType_getEnumerationIfEnumerator(CXCursor c)
    {
      if (clang_getCursorKind(c) == CXCursor_DeclRefExpr) {
        // Get the declaration that the expression refers to.
        CXCursor referenced = clang_getCursorReferenced(c);
    
        if (clang_getCursorKind(referenced) == CXCursor_EnumConstantDecl) {
          // Get the `EnumDecl`.
          CXCursor semanticParent = clang_getCursorSemanticParent(referenced);
    
          // The type of that is what we want.
          return clang_getCursorType(semanticParent);
        }
      }
    
      // Just get the type normally.
      return clang_getCursorType(c);
    }
    

    Background: Declaration contexts

    The clang_getCursorSemanticParent function returns the "declaration context" of another declaration. All declarations are organized into a tree that is distinct from the main AST; the declaration tree only contains declarations.

    Furthermore, there are actually two declaration trees, one being "lexical" and the other "semantic". These are never different in C, but in C++, a method defined outside its class body has a lexical parent of (say) the translation unit, but a semantic parent of its class.

    Not every declaration can be a parent (interior node) in the declaration tree; only those that inherit DeclContext can be parents. But knowing that EnumDecl inherits DeclContext (via TagDecl) points us toward navigating in the declaration tree.

    See Declaration contexts in the Clang internals manual for more information.

    Complete updated program

    Below is an updated program that incorporates getCursorType_getEnumerationIfEnumerator, as well as fixing the issue that getUnderType failed to check if c itself could yield a type:

    // ---------------------------- BEGIN ADDED ----------------------------
    #include <clang-c/Index.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    
    // Client data for `getUnderTypeVisitor`.
    typedef struct GetUnderTypeData {
      // Underlying type, if any.
      CXType underType;
    
      // True if we find a type to use.
      bool found;
    } GetUnderTypeData;
    
    // Like `clang_getCursorType`, but if the cursor is a `DeclRefExpr` to
    // an `EnumConstant`, get the type of its enclosing `EnumDecl` rather
    // than (e.g.) `int`, which is what `clang_getCursorType` would return.
    CXType getCursorType_getEnumerationIfEnumerator(CXCursor c)
    {
      if (clang_getCursorKind(c) == CXCursor_DeclRefExpr) {
        // Get the declaration that the expression refers to.
        CXCursor referenced = clang_getCursorReferenced(c);
    
        if (clang_getCursorKind(referenced) == CXCursor_EnumConstantDecl) {
          // Get the `EnumDecl`.
          CXCursor semanticParent = clang_getCursorSemanticParent(referenced);
    
          // The type of that is what we want.
          return clang_getCursorType(semanticParent);
        }
      }
    
      // Just get the type normally.
      return clang_getCursorType(c);
    }
    
    // Visitor for `getUnderType`.
    enum CXChildVisitResult getUnderTypeVisitor(
      CXCursor c, CXCursor parent, CXClientData client_data)
    {
      GetUnderTypeData *data = (GetUnderTypeData *)client_data;
    
      enum CXCursorKind kind = clang_getCursorKind(c);
    
      // The AST node `ImplicitCastExpr` is surfaced in the C API as an
      // "unexposed" kind.  So if we see an unexposed kind, assume that it
      // means `ImplicitCastExpr` and recursively search the children.
      if (clang_isUnexposed(kind)) {
        return CXChildVisit_Recurse;
      }
    
      // For any other kind, we probably have a usable type.
      else {
        data->underType = getCursorType_getEnumerationIfEnumerator(c);
        data->found = true;
        return CXChildVisit_Break;
      }
    }
    
    // Try to get the type of `c` after skipping any `ImplicitCastExpr`
    // nodes.  Return true and set `*underType` if we can, and return false
    // otherwise.
    bool getUnderType(CXCursor c, CXType * /*OUT*/ underType)
    {
      // If `c` itself is not unexposed, get its type.
      if (!clang_isUnexposed(clang_getCursorKind(c))) {
        *underType = getCursorType_getEnumerationIfEnumerator(c);
        return true;
      }
    
      GetUnderTypeData data;
      data.found = false;
    
      clang_visitChildren(c, getUnderTypeVisitor, &data);
      if (data.found) {
        *underType = data.underType;
        return true;
      }
      else {
        return false;
      }
    }
    // ----------------------------- END ADDED -----------------------------
    
    static enum CXChildVisitResult visitFuncCalls(CXCursor current_cursor,
                                                  CXCursor parent,
                                                  CXClientData client_data) {
      if (clang_getCursorKind(current_cursor) != CXCursor_CallExpr) {
        return CXChildVisit_Recurse;
      }
    
      static const char *FUNCTION_NAME = "foo";
    
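      const CXString spelling = clang_getCursorSpelling(current_cursor);
      // Compare first, then dispose, so the string is released on both paths.
      const bool isTarget =
        strcmp(clang_getCString(spelling), FUNCTION_NAME) == 0;
      clang_disposeString(spelling);
      if (!isTarget) {
        return CXChildVisit_Recurse;
      }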
      
      for (int i = 0; i < clang_Cursor_getNumArguments(current_cursor); i++) {
        CXCursor argument = clang_Cursor_getArgument(current_cursor, i);
        CXType argument_type = clang_getCursorType(argument);
        CXString argument_type_spelling = clang_getTypeSpelling(argument_type);
    
        printf("Argument %d: %s\n", i, clang_getCString(argument_type_spelling));
    
        // -------------------------- BEGIN ADDED --------------------------
        CXType underType;
        if (getUnderType(argument, &underType)) {
          CXString underTypeSpelling = clang_getTypeSpelling(underType);
          printf("underType: %s\n", clang_getCString(underTypeSpelling));
          clang_disposeString(underTypeSpelling);
        }
        // --------------------------- END ADDED ---------------------------
    
        clang_disposeString(argument_type_spelling);
      }
    
      return CXChildVisit_Continue;
    }
    
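    int main() {
      // --------------------------- BEGIN ADDED ---------------------------
      CXIndex index = clang_createIndex(0, 0);
      // ---------------------------- END ADDED ----------------------------
      const CXTranslationUnit unit = clang_parseTranslationUnit(
            index, "file.c", NULL, 0, NULL, 0, CXTranslationUnit_None);
      if (unit == NULL) {
        fprintf(stderr, "Failed to parse file.c\n");
        clang_disposeIndex(index);
        return 2;
      }
      const CXCursor cursor = clang_getTranslationUnitCursor(unit);
      clang_visitChildren(cursor, visitFuncCalls, NULL /* client_data*/);
      // Release libclang resources before exiting.
      clang_disposeTranslationUnit(unit);
      clang_disposeIndex(index);
      return 0;
    }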
    

    When run on the input:

    void foo(int a, ...) {}
    
    enum test {
      ENUM_VAL1,
      ENUM_VAL2
    }; 
    
    int main() {
      enum test e = ENUM_VAL1;
      int a = 1;
      foo(a, ENUM_VAL1);
    }
    

    It prints:

    Argument 0: int
    underType: int
    Argument 1: int
    underType: enum test     <-- got it, again