I'm visiting the AST of the following code:
int x = 123;
In VisitVarDecl( VarDecl *d ), I want the source range of the whole declaration. I expect something intuitive, as d->print() prints: int x = 123.
d->getSourceRange() ends at "1". That is, it only consists of the type (see clang source).
If I'd like the whole thing, I need to do something like:
auto end_loc = Lexer::getLocForEndOfToken( d->getTypeSourceInfo()->getTypeLoc().getEndLoc(), 0, sm, LangOptions() );
auto src_range = SourceRange( d->getSourceRange().getBegin(), end_loc );
Can someone elaborate on this? I'd expect a simple method of the declaration to give me an "intuitive" source range.
This was a simple example that applies to other declarations such as TypeAliasDecl.
Generally, Clang AST nodes record the "end" location as the start character of the last token considered to be a part of that node's subtree. The Clang Compiler Front-End (CFE) Internals Manual has a short section on this topic:
SourceRange and CharSourceRange
Clang represents most source ranges by [first, last], where “first” and “last” each point to the beginning of their respective tokens. For example consider the SourceRange of the following statement:
x = foo + bar;
^first    ^last
To map from this representation to a character-based representation, the “last” location needs to be adjusted to point to (or past) the end of that token with either Lexer::MeasureTokenLength() or Lexer::getLocForEndOfToken(). For the rare cases where character-level source ranges information is needed we use the CharSourceRange class.
So, the approach outlined in your question (adjusting the end location with Lexer::getLocForEndOfToken()) is indeed what one must do to get the character-level range of an AST node.
One random example of this usage in the Clang sources is in DurationDivisionCheck.cpp:
<< FixItHint::CreateInsertion(
Lexer::getLocForEndOfToken(
Result.SourceManager->getSpellingLoc(OpCall->getEndLoc()), 0,
*Result.SourceManager, Result.Context->getLangOpts()),
")");
The above is for an operator expression rather than a declaration, but the same approach is used for end locations of both constructs.
As a final remark, the "end" of statement AST nodes generally does not include the terminating semicolon; rather, it ends with the token just before the semicolon (when there is a semicolon). See the Q+A "Why is the source location end off by two characters for a statement ending in a semicolon?" for a bit more on that.
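To make the token-based versus character-based distinction concrete, here is a toy model that does not use Clang at all. TokenRange, measureTokenLength and textOfTokenRange are hypothetical names invented for this sketch, and the "lexer" only understands words, numbers and single-character punctuation. It merely mimics the idea of moving the "last" location past the end of its token, which is what Lexer::getLocForEndOfToken() does on real SourceLocations.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a token-based range: both offsets point at the
// START of a token, mirroring the [first, last] convention described above.
struct TokenRange {
    std::size_t first;
    std::size_t last;
};

// Toy analogue of measuring a token (NOT the Clang lexer): a run of
// alphanumeric/underscore characters is one token; any other character
// counts as a one-character token.
std::size_t measureTokenLength(const std::string &src, std::size_t pos) {
    if (pos >= src.size()) return 0;
    auto isWordChar = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!isWordChar(src[pos])) return 1;
    std::size_t len = 0;
    while (pos + len < src.size() && isWordChar(src[pos + len])) ++len;
    return len;
}

// Convert the token-based range to a half-open character range by advancing
// "last" past the end of its token, then return the covered text.
std::string textOfTokenRange(const std::string &src, TokenRange r) {
    std::size_t end = r.last + measureTokenLength(src, r.last);
    return src.substr(r.first, end - r.first);
}
```

With the source text "int x = 123;" and a range whose "last" offset is 8 (the "1" of 123), textOfTokenRange yields "int x = 123": the whole final token is included, and the semicolon is not.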