We want to parse the following source file:
#pragma once
#include "Vec3.h"

template<typename T> struct Array;

struct S1
{
    int S1i;
    Array<Array<Vec3>> S1Grid;
};

struct S2
{
    int S2i;
    Array<Array<Vec3> > S2Grid;
};

struct S3
{
    int S3i;
    Array<Array<char>> S3Grid;
};
Using the following parser code:
#include <clang-c/Index.h>
#include <cstdio>
#include <string>

// Prints the numeric cursor kind and the spelling of every cursor in the AST.
static CXChildVisitResult CursorVisitorTest(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
    CXCursorKind Kind = clang_getCursorKind(cursor);
    CXString Spelling = clang_getCursorSpelling(cursor);
    printf("%d %s\n", Kind, clang_getCString(Spelling));
    clang_disposeString(Spelling); // CXString values must be released by the caller
    return CXChildVisit_Recurse;
}

void Test()
{
    CXIndex index = clang_createIndex(0, 0);
    std::string header_path = "Example.h";
    CXTranslationUnit TranslationUnit = nullptr;
    static const char* args[] = { "-std=c++17", "-xc++", "-DHEADER_TOOL" };
    CXErrorCode error = clang_parseTranslationUnit2(
        index,
        header_path.c_str(),
        args,
        3,
        nullptr,
        0,
        CXTranslationUnit_SingleFileParse,
        &TranslationUnit
    );
    if (error == CXError_Success && TranslationUnit != nullptr)
    {
        CXCursor cursor = clang_getTranslationUnitCursor(TranslationUnit);
        clang_visitChildren(cursor, &CursorVisitorTest, nullptr);
        clang_disposeTranslationUnit(TranslationUnit);
    }
    clang_disposeIndex(index);
}
We obtained the following output:
31 Array
27 T
2 S1
6 S1i
2 S2
6 S2i
6 S2Grid
2 S3
6 S3i
6 S3Grid
45 Array
45 Array
We observed that Clang fails to parse S1Grid as a field. Since S2Grid is parsed properly, we suspect that the >> in S1Grid's type is being parsed as a right-shift operator. Interestingly, S3Grid is also parsed properly, probably because char is a built-in type and Vec3 is not.
What can we do to make libclang parse nested templates correctly without manually adding a space in the source?
The Clang version returned by clang_getClangVersion is clang version 9.0.0 (tags/RELEASE_900/final).
The input file, Example.h, contains syntax errors. First, there is an #include of Vec3.h that is not found. If that is fixed (by just commenting it out), then Clang reports additional errors, specifically that Vec3 is an undeclared identifier, which is what is causing the problem.
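For comparison, here is a minimal sketch of a well-formed version of the input. The definitions of Vec3 and Array below are hypothetical stand-ins (the real ones are not shown in the question); the point is only that once both names are declared, the >> in S1Grid's type closes the two template argument lists, exactly like the "> >" spelling in S2Grid.

```cpp
#include <type_traits>

// Hypothetical stand-in for whatever Vec3.h really declares.
struct Vec3 { float x, y, z; };

// Hypothetical definition; the question only forward-declares Array.
template<typename T> struct Array { T* data = nullptr; int count = 0; };

struct S1
{
    int S1i;
    Array<Array<Vec3>> S1Grid;   // ">>" closes both template argument lists (C++11 and later)
};

struct S2
{
    int S2i;
    Array<Array<Vec3> > S2Grid;  // pre-C++11 spelling with a space
};

// Both spellings name the same type when the input is well-formed.
static_assert(std::is_same<decltype(S1::S1Grid), decltype(S2::S2Grid)>::value,
              "S1Grid and S2Grid should have identical types");
```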
Because there are syntax errors, Clang attempts to provide a "best effort" parse of what the code is supposed to mean, but it necessarily must use some heuristics to deal with the errors, and those heuristics are imperfect. When using Clang 9, and this particular input, the recovery heuristics evidently fail to recognize the >> as the closing delimiter of nested template-ids, and consequently S1Grid is not recorded as a field in the resulting AST.
Although Clang can be used to parse code containing syntax errors, and is regularly used in that way for IDE support, that isn't what it was originally designed to do, and in any case the heuristics are always going to be imperfect. So the simplest solution is to ensure that the code is free of syntax errors before parsing it.
If that is how you want to proceed, then you'll want to adjust the way you invoke Clang to check for and report syntax errors. See the question "Is there a way to get a meaningfull error message when compiling code through libclang?" for details on how to do that.
For this example input, Clang 9 does not report the S1Grid field, but Clang 11, 14, and 16 (all of the others I tested) do. Evidently the error recovery heuristics improved for this case. Again, there's no guarantee those versions will do the right thing on every input, but for this one, they do.