Consider the following example. I use `python clang_example.py` to parse the header `my_source.hpp` for function and method declarations:
```cpp
#pragma once

namespace ns {
    struct Foo {
        struct Bar {};
        Bar fun1(void*);
    };
    using Baz = Foo::Bar;
    void fun2(Foo, Baz const&);
}
```
I use the following code to parse the function & method declarations using libclang's python bindings:
```python
import clang.cindex
import typing


def filter_node_list_by_predicate(
    nodes: typing.Iterable[clang.cindex.Cursor], predicate: typing.Callable
) -> typing.Iterable[clang.cindex.Cursor]:
    for i in nodes:
        if predicate(i):
            yield i
        yield from filter_node_list_by_predicate(i.get_children(), predicate)


if __name__ == '__main__':
    index = clang.cindex.Index.create()
    translation_unit = index.parse('my_source.hpp', args=['-std=c++17'])

    for i in filter_node_list_by_predicate(
        translation_unit.cursor.get_children(),
        lambda n: n.kind in [clang.cindex.CursorKind.FUNCTION_DECL, clang.cindex.CursorKind.CXX_METHOD]
    ):
        print(f"Function name: {i.spelling}")
        print(f"\treturn type: \t{i.type.get_result().spelling}")
        for arg in i.get_arguments():
            print(f"\targ: \t{arg.type.spelling}")
```
This prints:

```
Function name: fun1
	return type: 	Bar
	arg: 	void *
Function name: fun2
	return type: 	void
	arg: 	Foo
	arg: 	const Baz &
```
Now I would like to extract the fully qualified name of the return type and argument types so I can correctly reference them from the outermost scope:
```
Function name: ns::Foo::fun1
	return type: 	ns::Foo::Bar
	arg: 	void *
Function name: ns::fun2
	return type: 	void
	arg: 	ns::Foo
	arg: 	const ns::Baz &
```
Using this SO answer I can get the fully qualified name of the function declaration, but not of the return and argument types.
How do I get the fully qualified name of a type (not a cursor) in clang?
Note: I tried using `Type.get_canonical` and it gets me close:

```python
print(f"\treturn type: \t{i.type.get_result().get_canonical().spelling}")
for arg in i.get_arguments():
    print(f"\targ: \t{arg.type.get_canonical().spelling}")
```

But `Type.get_canonical` also resolves typedefs and aliases, which I do not want. I want the second argument of `fun2` to be resolved as `const ns::Baz &` and not `const ns::Foo::Bar &`.
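To make the distinction concrete, here is a toy model in plain Python. It does not use libclang: `Record`, `Alias`, `canonical`, and `qualified_as_written` are hypothetical names chosen for illustration. Canonicalization follows every alias down to the underlying record, which is why the canonical type comes out as `const ns::Foo::Bar &`, whereas the desired output keeps the alias name and only adds qualification.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Record:
    """Toy stand-in for a named class type such as ns::Foo::Bar."""
    qualified_name: str


@dataclass
class Alias:
    """Toy stand-in for a typedef/using alias such as ns::Baz."""
    qualified_name: str
    target: "Named"


Named = Union[Record, Alias]


def canonical(t: Named) -> str:
    # Follow the alias chain to its end, as canonicalization does.
    while isinstance(t, Alias):
        t = t.target
    return t.qualified_name


def qualified_as_written(t: Named) -> str:
    # Keep the name the source used; only the qualification is added.
    return t.qualified_name


bar = Record("ns::Foo::Bar")
baz = Alias("ns::Baz", bar)

print(f"const {canonical(baz)} &")             # const ns::Foo::Bar &
print(f"const {qualified_as_written(baz)} &")  # const ns::Baz &
```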
EDIT:
After testing Scott McPeak's answer on my real application, I realized that I need this code to properly resolve template classes and nested types of template classes as well. Given the above code as well as:
```cpp
namespace ns {
    template <typename T>
    struct ATemplate {
        using value_type = T;
    };
    typedef ATemplate<Baz> ABaz;
    ABaz::value_type fun3();
}
```
I would want the return type to be resolved to `ns::ABaz::value_type` and not `ns::ATemplate::value_type` or `ns::ATemplate<ns::Foo::Bar>::value_type`. I would be willing to settle for `ns::ATemplate<Baz>::value_type`.
Also, I can migrate to the C++ API if the functionality of the Python bindings is too limited for what I want to do.
Unfortunately, there does not appear to be a simple way to print a type using fully-qualified names. Even in the C++ API, `QualType::getAsString(PrintingPolicy&)` ignores the `SuppressScope` flag due to the intervention of the `ElaboratedTypePolicyRAII` class (I don't know why, and the git commit history offers no clues that I could find). Even if the C++ API worked as I would have hoped/expected, `PrintingPolicy` isn't exposed in the C or Python APIs.
Consequently, to do this, we have to resort to taking apart the type structure in the client code, printing fully qualified names whenever we hit a named type, which is typically expressed as `TypeKind.ELABORATED`. (I'm not sure if they always are.)
The following example program demonstrates the technique, embodied by the `type_str` function. As a proof of concept, it does not exhaustively handle all of the cases, although it does cover the most common ones. You can look at the source of `TypePrinter.cpp` to get an idea of what handling all cases entails.
```python
#!/usr/bin/env python3
"""
Print types with fully-qualified names.

This demonstrates the basic approach, digging into the type structure to
print its details, including fully-qualified names when we encounter
named types. However, there are several unhandled cases, some of which
are indicated with TODOs below.

Also, beware that I was unable to get 'mypy' to work properly on this
(despite installing the 'types-clang' package), so the type annotations
below might be incorrect.
"""

import clang.cindex
import typing


def get_decl_fqn(decl: clang.cindex.Cursor) -> str:
    """
    Given a Cursor that refers to a Declaration, get its fully
    qualified name.
    """

    # The semantic parent is the enclosing class, namespace, or
    # translation unit.
    parent = decl.semantic_parent
    assert parent is not None

    # When we hit the TU, just return the simple identifier.
    if parent.kind == clang.cindex.CursorKind.TRANSLATION_UNIT:
        return decl.spelling

    # Otherwise, print the parent name as a qualifier.
    else:
        return get_decl_fqn(parent) + "::" + decl.spelling


def starts_with_letter(s: str) -> bool:
    """
    True if 's' starts with a letter.
    """
    return s != "" and s[0].isalpha()


def ends_with_letter(s: str) -> bool:
    """
    True if 's' ends with a letter.
    """
    return s != "" and s[-1].isalpha()


def join_type_strs(s1: str, s2: str) -> str:
    """
    Join two strings containing fragments of type syntax, inserting a
    space if both are non-empty and either has a letter adjacent to the
    joined edge.
    """
    if s1 != "" and s2 != "" and (ends_with_letter(s1) or starts_with_letter(s2)):
        return s1 + " " + s2
    else:
        return s1 + s2


def type_str(t: clang.cindex.Type) -> str:
    """
    Print 't' in C++ syntax, using fully qualified names for named
    types. (In contrast, 't.spelling' omits qualifiers.)
    """
    return join_type_strs(before_type_str(t), after_type_str(t))


def before_type_str(t: clang.cindex.Type) -> str:
    """
    Print the part of 't' that would go before the declarator name in a
    declaration of a variable with that type.
    """
    return join_type_strs(before_type_str_nq(t), cv_qualifiers_str(t))


def cv_qualifiers_str(t: clang.cindex.Type) -> str:
    """
    If 't' has any const/volatile/restrict qualifiers, return a string
    containing them, separated by spaces. Otherwise, return "".
    """
    qualifiers = []
    if t.is_const_qualified():
        qualifiers.append("const")
    if t.is_volatile_qualified():
        qualifiers.append("volatile")
    if t.is_restrict_qualified():
        qualifiers.append("restrict")
    return " ".join(qualifiers)


def before_type_str_nq(t: clang.cindex.Type) -> str:
    """
    Print the part of 't' that would go before the declarator name in a
    declaration of a variable with that type, ignoring any CV
    qualifiers.
    """
    if t.kind == clang.cindex.TypeKind.ELABORATED:
        # Most named types are represented with the "elaborated" node,
        # which typically has a name.
        return get_decl_fqn(t.get_declaration())

    elif t.kind == clang.cindex.TypeKind.POINTER:
        p = t.get_pointee()
        # TODO: This does not handle pointer-to-function properly, since
        # that requires additional parentheses.
        return join_type_strs(before_type_str(p), "*")

    elif t.kind == clang.cindex.TypeKind.LVALUEREFERENCE:
        p = t.get_pointee()
        return join_type_strs(before_type_str(p), "&")

    elif t.kind == clang.cindex.TypeKind.RVALUEREFERENCE:
        p = t.get_pointee()
        return join_type_strs(before_type_str(p), "&&")

    elif t.kind == clang.cindex.TypeKind.FUNCTIONPROTO:
        rettype = t.get_result()
        return before_type_str(rettype)

    # TODO: FUNCTIONNOPROTO, pointer-to-member, and possibly others.

    else:
        # For other types, just use the spelling as its "before" syntax.
        # TODO: 't.spelling' already includes any cv-qualifiers, so a type
        # like 'const int' would come out as 'const int const' once
        # before_type_str() appends the qualifiers again.
        return t.spelling


def after_type_str(t: clang.cindex.Type) -> str:
    """
    Print the part of 't' that would go after the declarator name in a
    declaration of a variable with that type.
    """
    if t.kind == clang.cindex.TypeKind.FUNCTIONPROTO:
        res = "("
        count = 0
        for argtype in t.argument_types():
            if count > 0:
                res += ", "
            count += 1
            res += type_str(argtype)
        res += ")"
        return res

    # TODO: FUNCTIONNOPROTO and the various array types.
    return ""


# ------------- Original code, edited to call 'type_str' ---------------

def filter_node_list_by_predicate(
    nodes: typing.Iterable[clang.cindex.Cursor], predicate: typing.Callable
) -> typing.Iterable[clang.cindex.Cursor]:
    for i in nodes:
        if predicate(i):
            yield i
        yield from filter_node_list_by_predicate(i.get_children(), predicate)


if __name__ == '__main__':
    index = clang.cindex.Index.create()
    translation_unit = index.parse('my_source.hpp', args=['-std=c++17'])

    for i in filter_node_list_by_predicate(
        translation_unit.cursor.get_children(),
        lambda n: n.kind in [clang.cindex.CursorKind.FUNCTION_DECL, clang.cindex.CursorKind.CXX_METHOD]
    ):
        print(f"Function name: {i.spelling}")

        # ---- Edited section ----
        # Compare the 'spelling' method to 'type_str' defined above.
        t = i.type
        print(f"\tFunction type spelling  : {t.spelling}")
        print(f"\tFunction type type_str(): {type_str(t)}")

# EOF
```
On your example input, it prints:

```
$ ./fq-type-name.py
Function name: fun1
	Function type spelling  : Bar (void *)
	Function type type_str(): ns::Foo::Bar (void *)
Function name: fun2
	Function type spelling  : void (Foo, const Baz &)
	Function type type_str(): void (ns::Foo, ns::Baz const &)
```
Notably, this fully qualifies `ns::Foo::Bar` in the return type of `fun1`. It also uses `ns::Baz` in the argument list of `fun2`, rather than using the underlying type, `Bar`.
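One of the TODOs in `before_type_str_nq` is pointer-to-function, and it shows why the printer is split into "before" and "after" halves. The sketch below uses hypothetical toy classes (`Builtin`, `Pointer`, `Function`) instead of libclang types, so it only illustrates the C/C++ declarator rule under that assumption: when a pointer's pointee is a function, the `*` must be wrapped in parentheses and the parameter list moves after the closing parenthesis. Applying the same idea to `clang.cindex.Type` is untested here; in particular, which `TypeKind` libclang reports for the pointee of a function pointer has not been checked.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy type nodes; these are NOT libclang classes.
@dataclass
class Builtin:
    name: str


@dataclass
class Pointer:
    pointee: "Ty"


@dataclass
class Function:
    result: "Ty"
    params: List["Ty"]


Ty = Union[Builtin, Pointer, Function]


def join(s1: str, s2: str) -> str:
    # Same spacing rule as join_type_strs: a blank only next to a letter.
    if s1 and s2 and (s1[-1].isalpha() or s2[0].isalpha()):
        return s1 + " " + s2
    return s1 + s2


def before(t: Ty) -> str:
    if isinstance(t, Pointer):
        # "int *(char)" would mean "function returning int *", so a
        # pointer to a function needs "(*" here and ")" in after().
        if isinstance(t.pointee, Function):
            return join(before(t.pointee), "(*")
        return join(before(t.pointee), "*")
    if isinstance(t, Function):
        return before(t.result)
    return t.name


def after(t: Ty) -> str:
    if isinstance(t, Pointer):
        if isinstance(t.pointee, Function):
            return ")" + after(t.pointee)
        return after(t.pointee)
    if isinstance(t, Function):
        params = ", ".join(type_str(p) for p in t.params)
        # The result type's own "after" part follows the parameter list.
        return "(" + params + ")" + after(t.result)
    return ""


def type_str(t: Ty) -> str:
    return join(before(t), after(t))


fp = Pointer(Function(Builtin("int"), [Builtin("char")]))
print(type_str(fp))           # int (*)(char)
print(type_str(Pointer(fp)))  # int (**)(char)
```

Because each node contributes text on both sides of the (absent) declarator name, nesting composes correctly: a function taking `int` and returning `fp` prints as `int (*(int))(char)`.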
The revised question asks about a case involving templates and a `typedef` that is used as a scope qualifier, and wants to recover a fully-qualified name that uses that `typedef`. This is not possible using the approach outlined above, because we construct the qualifiers by walking up the scope stack from the found declaration, ignoring how the type was expressed originally.
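The limitation can be reproduced with a toy model in plain Python. `Decl` and `decl_fqn` are hypothetical stand-ins for a declaration `Cursor` and `get_decl_fqn`, with `None` as the parent of a top-level declaration (playing the role of the translation unit). The toy declarations mirror `namespace ns { struct A { struct Inner {}; }; typedef A B; }`: however the source spells the type (`A::Inner` or `B::Inner`), lookup lands on the same `Inner` declaration, and walking its parent chain only ever visits `A` and `ns`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decl:
    """Toy declaration: a name plus its semantic parent (None = the TU)."""
    name: str
    parent: Optional["Decl"]


def decl_fqn(d: Decl) -> str:
    # Same recursion as get_decl_fqn: prepend the enclosing scopes until
    # the top level is reached.
    if d.parent is None:
        return d.name
    return decl_fqn(d.parent) + "::" + d.name


# namespace ns { struct A { struct Inner {}; }; typedef A B; }
ns = Decl("ns", None)
a = Decl("A", ns)
inner = Decl("Inner", a)
b = Decl("B", ns)  # the typedef exists, but it is nobody's parent

# Whether the source wrote "A::Inner" or "B::Inner", lookup yields 'inner',
# so the spelled qualifier 'B' cannot appear in the result.
print(decl_fqn(inner))  # ns::A::Inner
print(decl_fqn(b))      # ns::B
```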
Using the Python API, it is possible to see the original type syntax and its qualifiers by iterating over children, but the child list is difficult to interpret. For example, if the input is:
```cpp
namespace ns {
  struct A {
    struct Inner {};
  };
  typedef A B;
  B::Inner f(int x, A a);
}
```
and we use this code to print the TU:
```python
def print_ast(node: clang.cindex.Cursor, label: str, indentLevel: int) -> None:
    """
    Recursively print the subtree rooted at 'node'.
    """
    indent = "  " * indentLevel
    print(f"{indent}{label}: kind={node.kind} " +
          f"spelling='{node.spelling}' " +
          f"loc={node.location.line}:{node.location.column}")

    indentLevel += 1
    index = 0
    for c in node.get_children():
        print_ast(c, f"child {index}", indentLevel)
        index += 1

# Invoked on the root as: print_ast(translation_unit.cursor, "TU", 0)
```
then the output is:
```
TU: kind=CursorKind.TRANSLATION_UNIT spelling='test3.cc' loc=0:0
  child 0: kind=CursorKind.NAMESPACE spelling='ns' loc=1:11
    child 0: kind=CursorKind.STRUCT_DECL spelling='A' loc=2:10
      child 0: kind=CursorKind.STRUCT_DECL spelling='Inner' loc=3:12
    child 1: kind=CursorKind.TYPEDEF_DECL spelling='B' loc=5:13
      child 0: kind=CursorKind.TYPE_REF spelling='struct ns::A' loc=5:11
    child 2: kind=CursorKind.FUNCTION_DECL spelling='f' loc=6:12
      child 0: kind=CursorKind.TYPE_REF spelling='ns::B' loc=6:3
      child 1: kind=CursorKind.TYPE_REF spelling='struct ns::A::Inner' loc=6:6
      child 2: kind=CursorKind.PARM_DECL spelling='x' loc=6:18
      child 3: kind=CursorKind.PARM_DECL spelling='a' loc=6:23
        child 0: kind=CursorKind.TYPE_REF spelling='struct ns::A' loc=6:21
```
Observe that the qualified type `B::Inner` is expressed as this pair of adjacent children:

```
child 0: kind=CursorKind.TYPE_REF spelling='ns::B' loc=6:3
child 1: kind=CursorKind.TYPE_REF spelling='struct ns::A::Inner' loc=6:6
```
There's no simple way to see that the first child is the qualifier portion of the second child. This is a general problem with the Clang C API, and consequently with the Python API: accessors for specific roles are often missing, so one must resort to iterating over children and trying to reverse-engineer which is which. (I spent a couple of weeks going down this road for a different project, and eventually had to admit defeat.)
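To give a feel for that reverse-engineering, here is a heuristic sketch that runs on the `(kind, spelling, line, column)` values from the `print_ast` output above rather than on live cursors, so it needs no libclang. `pair_qualifiers` and `bare_name` are hypothetical helpers, and the heuristic rests on an unverified assumption: a qualifier `TYPE_REF` is written as `Name::` with no whitespace, so the next `TYPE_REF` on the same line starts exactly `len(Name) + 2` columns later. Spaces around `::`, template arguments, macros, or other kinds of qualifier would all defeat it, which is exactly the fragility described above.

```python
from typing import List, Tuple

# (kind, spelling, line, column) for the direct children of FUNCTION_DECL
# 'f', copied from the print_ast output above.
Child = Tuple[str, str, int, int]
children: List[Child] = [
    ("TYPE_REF", "ns::B", 6, 3),
    ("TYPE_REF", "struct ns::A::Inner", 6, 6),
    ("PARM_DECL", "x", 6, 18),
    ("PARM_DECL", "a", 6, 23),
]


def bare_name(spelling: str) -> str:
    # "struct ns::A::Inner" -> "Inner"; "ns::B" -> "B"
    return spelling.split()[-1].split("::")[-1]


def pair_qualifiers(kids: List[Child]) -> List[str]:
    """Guess which adjacent TYPE_REFs form one qualified name (heuristic)."""
    results: List[str] = []
    parts: List[str] = []
    for i, (kind, spelling, line, col) in enumerate(kids):
        if kind != "TYPE_REF":
            continue
        name = bare_name(spelling)
        parts.append(name)
        nxt = kids[i + 1] if i + 1 < len(kids) else None
        # Assumption: "Name::" is written without spaces, so the next
        # piece starts len(name) + 2 columns later on the same line.
        if (nxt is not None and nxt[0] == "TYPE_REF" and nxt[2] == line
                and nxt[3] == col + len(name) + 2):
            continue
        results.append("::".join(parts))
        parts = []
    return results


print(pair_qualifiers(children))  # ['B::Inner']
```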
Therefore, given the revised requirement (not merely a type string that uses fully-qualified names, but one that follows the original syntax as closely as possible), I think this will be difficult to accomplish robustly with the Python API, since that original syntax is hard to retrieve unambiguously.
I recommend instead using the C++ API. This is still non-trivial, but all the information is there and available through accessors that distinguish the various "child" roles. If you want a tip on getting started, I have a tool on GitHub called print-clang-ast that prints a lot (but by no means all) of the Clang AST in a moderately readable JSON format. I even just added code to print the details of `NestedNameSpecifier` (which is how qualified names are represented) while trying to see if the Python API could be used for what you want. If you try to accomplish this using the C++ API but run into trouble, you could then ask a new question based on where you get stuck.