I am asking this basic question to set the record straight. I have referred to this question and its currently accepted answer, which is not convincing. The second most voted answer gives better insight, but it is not perfect either.
While reading below, try to distinguish between the inline keyword and the concept of "inlining".
Here is my take:
Inlining (the concept): This is done to save the call overhead of a function. It is more similar to macro-style code replacement. Nothing to be disputed here.
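For example (a conceptual sketch only; no particular compiler is guaranteed to emit exactly this):

int add(int a, int b) { return a + b; }

int caller(int x) {
    return add(x, 5);      // a normal call: set up arguments, call, return
}

// After inlining, the compiler effectively generates something like:
int caller_inlined(int x) {
    return x + 5;          // the body is substituted at the call site,
}                          // so the call overhead disappears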
The inline keyword (perception "A"): The inline keyword is a request to the compiler, usually used for smaller functions, so that the compiler can optimize it and make faster calls. The compiler is free to ignore it.
I partially dispute this, for the reasons below:
1. Bigger / recursive functions are not inlined anyway, and the compiler ignores the inline keyword completely.
2. Smaller functions are automatically "inlined" by the optimizer irrespective of whether inline is mentioned or not.

It's quite clear that the user doesn't have any control over function "inlining" with the use of the keyword inline.
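As a rough way to probe this claim (a sketch under my own assumptions about file names and flags; results vary by compiler, version and optimization level):

// probe.cpp -- hypothetical file name.
// Build twice and diff the generated assembly, e.g.:
//   g++ -O2 -S probe.cpp -o with_kw.s      (as written)
//   g++ -O2 -S probe.cpp -o without_kw.s   (after deleting `inline` below)
// At -O2 the call to square() is typically replaced by a multiplication in
// both builds, i.e. for such a tiny function the keyword makes no visible
// difference to the inlining decision.

inline int square(int x) { return x * x; }

int caller(int y) { return square(y) + 1; }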
Perception "B": inline has nothing to do with the concept of inlining. Putting inline ahead of big / recursive functions won't help, while a smaller function won't need it in order to be inlined. The only deterministic use of inline is to maintain the One Definition Rule, i.e. if a function is declared inline then only the following things are mandated (see the sketch after this list):
1. Even if that function is defined in multiple translation units (i.e. .cpp files), the compiler will generate only one definition and avoid a multiple-symbol linker error. (Note: if the bodies of that function are different, it is undefined behavior.)
2. The body of the inline function has to be visible / accessible in all the translation units that use it. In other words, declaring an inline function in a .h file and defining it in only one .cpp file will result in an "undefined symbol" linker error for the other .cpp files.
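A minimal sketch of both points (the file names are mine):

// twice.h -- hypothetical header: the definition lives in the header, so every
// .cpp that includes it gets its own copy of the definition.
#ifndef TWICE_H
#define TWICE_H
inline int twice(int x) { return 2 * x; }   // without `inline`: "multiple definition" link error
#endif

// a.cpp
//   #include "twice.h"
//   int from_a() { return twice(10); }
//
// b.cpp
//   #include "twice.h"
//   int from_b() { return twice(20); }
//
// Linking a.o and b.o works: the linker keeps a single definition of twice().
// Moving the body out of the header into a.cpp alone (keeping only a declaration
// in twice.h) would instead give an "undefined symbol" error when b.cpp uses it.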
IMO, perception "A" is entirely wrong and perception "B" is entirely right.
There are some quotes on this in the standard; however, I am expecting an answer which logically explains whether this verdict is correct or not.
Email reply from Bjarne Stroustrup:
"For decades, people have promised that the compiler/optimizer is or will soon be better than humans for inlining. This may be true in theory, but it still isn't in practice for good programmers, especially in an environment where whole-program optimization is not feasible. There are major gains to be had from judicious use of explicit inlining."
I wasn't sure about your claim:
Smaller functions are automatically "inlined" by the optimizer irrespective of whether inline is mentioned or not... It's quite clear that the user doesn't have any control over function "inlining" with the use of the keyword inline.
I've heard that compilers are free to ignore your inline request, but I didn't think they disregarded it completely.
I looked through the GitHub repositories for Clang and LLVM to find out. (Thanks, open source software!) I found out that the inline keyword does make Clang/LLVM more likely to inline a function.
Searching for the word inline in the Clang repository leads to the token specifier kw_inline. It looks like Clang uses a clever macro-based system to build the lexer and other keyword-related functions, so there's nothing direct like if (tokenString == "inline") return kw_inline to be found. But here in ParseDecl.cpp, we see that kw_inline results in a call to DeclSpec::setFunctionSpecInline():
case tok::kw_inline:
  isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
  break;
Inside that function, we set a bit and emit a warning if it's a duplicate inline:
if (FS_inline_specified) {
  DiagID = diag::warn_duplicate_declspec;
  PrevSpec = "inline";
  return true;
}
FS_inline_specified = true;
FS_inlineLoc = Loc;
return false;
Searching for FS_inline_specified elsewhere, we see it's a single bit in a bitfield, and it's used in a getter function, isInlineSpecified():
bool isInlineSpecified() const {
  return FS_inline_specified | FS_forceinline_specified;
}
Searching for call sites of isInlineSpecified(), we find the codegen, where we convert the C++ parse tree into LLVM intermediate representation:
if (!CGM.getCodeGenOpts().NoInline) {
  for (auto RI : FD->redecls())
    if (RI->isInlineSpecified()) {
      Fn->addFnAttr(llvm::Attribute::InlineHint);
      break;
    }
} else if (!FD->hasAttr<AlwaysInlineAttr>())
  Fn->addFnAttr(llvm::Attribute::NoInline);
We are done with the C++ parsing stage. Now our inline specifier is converted to an attribute of the language-neutral LLVM Function object. We switch from Clang to the LLVM repository.
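Before moving on, it's easy to see this handoff from outside the compiler. A minimal sketch (the file name and exact flags are my assumptions; -Xclang -disable-llvm-passes is meant to keep the optimizer from consuming the function before we can look at it):

// observe_hint.cpp -- hypothetical file name.
// Compile to textual IR, e.g.:
//   clang++ -O2 -Xclang -disable-llvm-passes -S -emit-llvm observe_hint.cpp
// Then search observe_hint.ll: the attribute group attached to add_one()
// should typically contain "inlinehint", while plain_add()'s should not.

inline int add_one(int x) { return x + 1; }       // inline-specified -> InlineHint

int plain_add(int x, int y) { return x + y; }     // no keyword -> no hint

int use(int v) { return add_one(v) + plain_add(v, 2); }  // forces add_one to be emitted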
Searching for llvm::Attribute::InlineHint yields the method Inliner::getInlineThreshold(CallSite CS) (with a scary-looking braceless if block):
// Listen to the inlinehint attribute when it would increase the threshold
// and the caller does not need to minimize its size.
Function *Callee = CS.getCalledFunction();
bool InlineHint = Callee && !Callee->isDeclaration() &&
    Callee->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                         Attribute::InlineHint);
if (InlineHint && HintThreshold > thres
    && !Caller->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                             Attribute::MinSize))
  thres = HintThreshold;
So we already have a baseline inlining threshold from the optimization level and other factors, but if it's lower than the global HintThreshold, we bump it up. (HintThreshold is settable from the command line.)
getInlineThreshold() appears to have only one call site, a member of SimpleInliner:
InlineCost getInlineCost(CallSite CS) override {
  return ICA->getInlineCost(CS, getInlineThreshold(CS));
}
It calls a virtual method, also named getInlineCost, on its member pointer to an instance of InlineCostAnalysis.
Searching for ::getInlineCost() to find the versions that are class members, we find one that's a member of AlwaysInline - a non-standard but widely supported compiler feature - and another that's a member of InlineCostAnalysis. The latter uses its Threshold parameter here:
CallAnalyzer CA(Callee->getDataLayout(), *TTI, AT, *Callee, Threshold);
bool ShouldInline = CA.analyzeCall(CS);
CallAnalyzer::analyzeCall() is over 200 lines and does the real nitty-gritty work of deciding whether the function is inlinable. It weighs many factors, but as we read through the method we see that all its computations manipulate either the Threshold or the Cost. And at the end:
return Cost < Threshold;
But the return value named ShouldInline is really a misnomer. In fact, the main purpose of analyzeCall() is to set the Cost and Threshold member variables on the CallAnalyzer object. The return value only indicates the case when some other factor has overridden the cost-vs-threshold analysis, as we see here:
// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())
  return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())
  return InlineCost::getAlways();
Otherwise, we return an object that stores the Cost and the Threshold:
return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
So we're not returning a yes-or-no decision in most cases. The search continues! Where is this return value of getInlineCost() used?
It's found in bool Inliner::shouldInline(CallSite CS), another big function. It calls getInlineCost() right at the beginning.
It turns out that getInlineCost analyzes the intrinsic cost of inlining the function - its argument signature, code length, recursion, branching, linkage, etc. - and some aggregate information about every place the function is used. On the other hand, shouldInline() combines this information with more data about a specific place where the function is used.
Throughout the method there are calls to InlineCost::costDelta(), which will use the InlineCost's Threshold value as computed by analyzeCall(). Finally, we return a bool. The decision is made. In Inliner::runOnSCC():
if (!shouldInline(CS)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}
// Attempt to inline the function.
if (!InlineCallIfPossible(CS, InlineInfo, InlinedArrayAllocas,
                          InlineHistoryID, InsertLifetime, DL)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}
++NumInlined;
InlineCallIfPossible() does the inlining based on shouldInline()'s decision.
So the Threshold was affected by the inline keyword, and it is used in the end to decide whether to inline.
Therefore, your perception "B" is partly wrong, because at least one major compiler changes its optimization behavior based on the inline keyword.
However, we can also see that inline is only a hint, and that other factors may outweigh it.
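To condense the chain we just traced into a deliberately simplified model (my own made-up code and numbers, not LLVM's actual API; the real defaults and bonuses vary by version and flags):

#include <algorithm>

struct FunctionInfo {
  bool hasInlineKeyword = false;  // what the programmer wrote
  int  estimatedCost    = 0;      // rough size/complexity of the callee's body
};

// Stand-ins for LLVM's -inline-threshold / -inlinehint-threshold knobs.
constexpr int kDefaultThreshold = 225;
constexpr int kHintThreshold    = 325;

bool wouldInline(const FunctionInfo& f) {
  int threshold = kDefaultThreshold;
  if (f.hasInlineKeyword)                    // the keyword only raises the budget...
    threshold = std::max(threshold, kHintThreshold);
  return f.estimatedCost < threshold;        // ...the cost model still has the last word
}

In this model a small inline-marked function is inlined because its cost is low anyway, a huge inline-marked function is still rejected because the hint doesn't lift the threshold above its cost, and a small unmarked function is inlined without any hint at all - which is exactly the "hint, not command" behavior traced above.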