
How to find out whether a member function is const or volatile with libclang?


I have an instance of CXCursor of kind CXCursor_CXXMethod. I want to find out if the function is const or volatile, for example:

class Foo {
public:
    void bar() const;
    void baz() volatile;
    void qux() const volatile;
};

I could not find anything useful in the documentation of libclang. I tried clang_isConstQualifiedType and clang_isVolatileQualifiedType but these always seem to return 0 on C++ member function types.


Solution

  • I can think of two approaches:

    Using the libclang lexer

    The code which appears in this SO answer works for me; it uses the libclang tokenizer to break a method declaration apart, and then records any keywords outside of the method parentheses.

    It does not access the AST of the code, and as far as I can tell doesn't involve the parser at all. If you are sure the code you investigate is proper C++, I believe this approach is safe.

    Disadvantages: This solution does not appear to take into account preprocessing directives, so the code has to be processed first (e.g., passed through cpp).

    Example code (the file to parse must be the first argument to your program, e.g. ./a.out bla.cpp):

    #include "clang-c/Index.h"
    #include <string>
    #include <set>
    #include <iostream>
    
    // Copies a CXString into a std::string and releases the CXString.
    std::string GetClangString(CXString str)
    {
      const char* tmp = clang_getCString(str);
      std::string translated = (tmp == NULL) ? "" : tmp;
      clang_disposeString(str);  // dispose on every path, including NULL
      return translated;
    }
    
    void GetMethodQualifiers(CXTranslationUnit translationUnit,
                             std::set<std::string>& qualifiers,
                             CXCursor cursor) {
      qualifiers.clear();
    
      CXSourceRange range = clang_getCursorExtent(cursor);
      CXToken* tokens;
      unsigned int numTokens;
      clang_tokenize(translationUnit, range, &tokens, &numTokens);
    
      bool insideBrackets = false;
      for (unsigned int i = 0; i < numTokens; i++) {
        std::string token = GetClangString(clang_getTokenSpelling(translationUnit, tokens[i]));
        if (token == "(") {
          insideBrackets = true;
        } else if (token == "{" || token == ";") {
          break;
        } else if (token == ")") {
          insideBrackets = false;
        } else if (clang_getTokenKind(tokens[i]) == CXToken_Keyword && 
                 !insideBrackets) {
          qualifiers.insert(token);
        }
      }
    
      clang_disposeTokens(translationUnit, tokens, numTokens);
    }
    
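    int main(int argc, char *argv[]) {
      // argv[1] is used below as the file name, so it must be present.
      if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <file.cpp>" << std::endl;
        return 1;
      }
      CXIndex Index = clang_createIndex(0, 0);
      CXTranslationUnit TU = clang_parseTranslationUnit(Index, 0,
              argv, argc, 0, 0, CXTranslationUnit_None);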
    
      // Set the file you're interested in, and the code location:
      CXFile file = clang_getFile(TU, argv[1]);
      int line = 5;
      int column = 6;
      CXSourceLocation location = clang_getLocation(TU, file, line, column);
      CXCursor cursor = clang_getCursor(TU, location);
    
      std::set<std::string> qualifiers;
      GetMethodQualifiers(TU, qualifiers, cursor);
    
      for (std::set<std::string>::const_iterator i = qualifiers.begin(); i != qualifiers.end(); ++i) {
        std::cout << *i << std::endl;
      }
    
      clang_disposeTranslationUnit(TU);
      clang_disposeIndex(Index);
      return 0;
    }
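
    One thing to watch for with the loop above: it records every keyword outside the parentheses, so a const in the return type (for example const char* name();) also ends up in the set. A possible refinement, sketched below on plain token spellings rather than on libclang tokens, is to count const and volatile only after the parameter list has closed. The names CvQualifiers and cvAfterParameterList are hypothetical, and the token vectors stand in for what clang_getTokenSpelling would return; this is a sketch under those assumptions, not a complete declarator parser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: cv-qualifiers of the method itself.
struct CvQualifiers {
  bool isConst = false;
  bool isVolatile = false;
};

// Scans the token spellings of one method declaration and reports only the
// const/volatile keywords that appear after the parameter list has closed.
CvQualifiers cvAfterParameterList(const std::vector<std::string>& tokens) {
  CvQualifiers result;
  int depth = 0;            // current parenthesis nesting level
  bool seenParams = false;  // set once a top-level "(...)" group has closed
  for (const std::string& tok : tokens) {
    if (tok == "(") {
      ++depth;
    } else if (tok == ")") {
      if (depth > 0 && --depth == 0) seenParams = true;
    } else if (depth == 0) {
      // Body, end of declaration, "= 0"/"= default", or trailing return type.
      if (tok == "{" || tok == ";" || tok == "=" || tok == "->") break;
      if (seenParams && tok == "const") result.isConst = true;
      if (seenParams && tok == "volatile") result.isVolatile = true;
    }
  }
  return result;
}

// Small self-check using hand-written token lists (assumed spellings).
void checkCvAfterParameterList() {
  CvQualifiers qux = cvAfterParameterList(
      {"void", "qux", "(", ")", "const", "volatile", ";"});
  assert(qux.isConst && qux.isVolatile);

  // A const return type must not be reported as a const method.
  CvQualifiers name = cvAfterParameterList(
      {"const", "char", "*", "name", "(", ")", ";"});
  assert(!name.isConst && !name.isVolatile);
}
```

    Tracking the nesting depth instead of a single flag keeps parenthesized parameters (function pointers, noexcept(...)) from ending the parameter list early. Declarations that put parenthesized attributes before the method name would still confuse this sketch.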
    

    Using libclang's Unified Symbol Resolution (USR)

    This approach involves using the parser itself, and extracting qualifier information from the AST.

    Advantages: Seems to work for code with preprocessor directives, at least for simple cases.

    Disadvantages: My solution parses the USR, which is undocumented, and might change in the future. Still, it's easy to write a unit-test to guard against that.

    Take a look at $(CLANG_SRC)/tools/libclang/CIndexUSRs.cpp; it contains the code that generates a USR, and therefore the information needed to parse a USR string. See specifically lines 523-529 (in LLVM 3.1's source downloaded from www.llvm.org) for the qualifier part.

    Add the following function somewhere:

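    void parseUsrString(const std::string& usrString, bool* isVolatile, bool* isConst, bool* isRestrict) {
      *isVolatile = *isConst = *isRestrict = false;
      // Use the last '#': in observed USRs, earlier '#' characters appear to
      // separate parameter types (an assumption, since the format is undocumented).
      size_t hashLocation = usrString.rfind('#');
      if (hashLocation == std::string::npos || hashLocation == usrString.length() - 1) {
        return;
      }
      char c = usrString[hashLocation + 1];
      if (c < '0' || c > '7') {
        return;  // not a qualifier digit
      }
      int x = c - '0';  // the USR stores the qualifier bits as '0' + bits
    
      *isConst = (x & 0x1) != 0;
      *isVolatile = (x & 0x4) != 0;
      *isRestrict = (x & 0x2) != 0;
    }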
    

    and in main(),

    CXString usr = clang_getCursorUSR(cursor);
    const char *usr_string = clang_getCString(usr);
    std::cout << usr_string << "\n";
    bool isVolatile, isConst, isRestrict;
    parseUsrString(usr_string, &isVolatile, &isConst, &isRestrict);
    // std::cout prints bools as 0/1 and avoids needing <cstdio> for printf.
    std::cout << "restrict, volatile, const: " << isRestrict << " " << isVolatile << " " << isConst << "\n";
    clang_disposeString(usr);
    

    Running on Foo::qux() from

    #define BLA const
    
    class Foo {
    public:
        void bar() const;
        void baz() volatile;
        void qux() BLA volatile;
    };
    

    produces the expected result of

    c:@C@Foo@F@qux#5
    restrict, volatile, const: 0 1 1
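
    Since the USR format is undocumented, it may be worth pinning the observed output in a small regression check. The sketch below is self-contained and does not call libclang: decodeMethodQualifiers is a hypothetical helper, the bit layout (const = 0x1, restrict = 0x2, volatile = 0x4) is the one observed above, and only the qux USR is taken from the run shown here. The bar, baz and plain strings are what that layout would imply, not recorded output. In a real test the strings would come from clang_getCursorUSR.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the decoded flags.
struct MethodQualifiers {
  bool isConst = false;
  bool isVolatile = false;
  bool isRestrict = false;
};

// Decodes the character after the last '#' of a method USR.
// Assumes the observed layout: const = 0x1, restrict = 0x2, volatile = 0x4.
MethodQualifiers decodeMethodQualifiers(const std::string& usr) {
  MethodQualifiers q;
  const std::string::size_type hash = usr.rfind('#');
  if (hash == std::string::npos || hash + 1 >= usr.size()) return q;
  const char c = usr[hash + 1];
  if (c < '0' || c > '7') return q;  // not a qualifier digit
  const int bits = c - '0';
  q.isConst = (bits & 0x1) != 0;
  q.isRestrict = (bits & 0x2) != 0;
  q.isVolatile = (bits & 0x4) != 0;
  return q;
}

void checkUsrDecoding() {
  // Recorded from the run above.
  MethodQualifiers qux = decodeMethodQualifiers("c:@C@Foo@F@qux#5");
  assert(qux.isConst && qux.isVolatile && !qux.isRestrict);

  // Implied by the assumed layout, not recorded output.
  assert(decodeMethodQualifiers("c:@C@Foo@F@bar#1").isConst);
  assert(decodeMethodQualifiers("c:@C@Foo@F@baz#4").isVolatile);

  // No qualifier digit after the '#': everything stays false.
  assert(!decodeMethodQualifiers("c:@C@Foo@F@plain#").isConst);
}
```

    If a libclang upgrade changes the encoding, a check like this fails on the recorded string instead of silently reporting wrong qualifiers.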
    

    Caveat: you might have noticed that libclang's source suggests my code should use isVolatile = x & 0x2 and not 0x4, so you may need to replace 0x4 with 0x2. It's possible that my implementation (OS X) has the two swapped.
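
    The two candidate layouts can be compared directly on the digit from the output above. The sketch below (all names are hypothetical) decodes '5' both ways: with volatile at 0x4 it reads as const volatile, which matches the declaration of qux; with volatile at 0x2 it would read as const restrict. It also shows why masking the raw character works, assuming ASCII: '0' is 0x30, so its low three bits are zero and '5' & 0x4 gives the same answer as ('5' - '0') & 0x4.

```cpp
#include <cassert>

// Bit positions under the two layouts discussed in the caveat.
struct Layout {
  int constBit;
  int volatileBit;
  int restrictBit;
};

const Layout kObserved = {0x1, 0x4, 0x2};    // what the run above suggests
const Layout kFromSource = {0x1, 0x2, 0x4};  // what the caveat reads from the source

bool isConstUnder(const Layout& l, char digit) {
  return ((digit - '0') & l.constBit) != 0;
}
bool isVolatileUnder(const Layout& l, char digit) {
  return ((digit - '0') & l.volatileBit) != 0;
}
bool isRestrictUnder(const Layout& l, char digit) {
  return ((digit - '0') & l.restrictBit) != 0;
}

void checkLayouts() {
  // qux is declared const volatile and its USR ends in '5'.
  assert(isConstUnder(kObserved, '5'));
  assert(isVolatileUnder(kObserved, '5'));
  assert(!isRestrictUnder(kObserved, '5'));

  // Under the other layout the same digit would mean const restrict.
  assert(isConstUnder(kFromSource, '5'));
  assert(!isVolatileUnder(kFromSource, '5'));
  assert(isRestrictUnder(kFromSource, '5'));

  // Masking the raw ASCII character yields the same bits as subtracting '0'.
  assert(('5' & 0x7) == ('5' - '0'));
}
```

    Running the same comparison on the USR of a method that is only volatile (baz in the example class) would be another way to tell the two layouts apart on a given libclang build.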