I don't like to repeat myself in code, but I also don't want to lose performance to simple forwarding functions. Suppose a class has both operator+ and a function Add with the same functionality (the former as a handy way of using the class in expressions, the latter as an "explicit" way to do the same):
struct Obj {
Obj operator+(float);
Obj Add(float);
/* some other state and behaviour */
};
Obj AddDetails(Obj const& a, float b) {
return Obj(a.float_val + b, a.some_other_stuff);
}
Obj Obj::operator+(float b) {
return AddDetails(*this, b);
}
Obj Obj::Add(float b) {
return AddDetails(*this, b);
}
To make changes easier, both functions are implemented by calling an auxiliary function. As a result, every call to the operator makes two calls, which is not really pleasant.
But is the compiler smart enough to eliminate such double calls?
I tested with simple classes (containing built-in types and pointers), and the optimizer simply doesn't compute anything that isn't needed. But how does it behave in large systems (especially with hot calls)?
If this is where RVO takes place, does it also work across longer sequences of calls (3-4), folding them into a single call?
P.S. Yes, yes, premature optimization is the root of all evil, but I still want an answer.
Yes. See the instructions clang generated at https://godbolt.org/z/VB23-W, line 21:
movsd xmm0, qword ptr [rsp] # xmm0 = mem[0],zero
addsd xmm0, qword ptr [rip + .LCPI3_0]
It just applies the code of AddDetails directly instead of even calling your operator+. This is called inlining, and it worked even for this chain of value-returning calls.
RVO is not the only optimisation that can apply to single-line functions; every other optimisation, including inlining, can apply too. See https://godbolt.org/z/miX3u1 and https://godbolt.org/z/tNaSW.
Here you can see that gcc and clang heavily optimise even code that is not declared inline (https://godbolt.org/z/8Wf3oR):
#include <iostream>
struct Obj {
Obj(double val) : float_val(val) {}
Obj operator+(float b) {
return AddDetails(*this, b);
}
Obj Add(float b) {
return AddDetails(*this, b);
}
double val() const {
return float_val;
}
private:
double float_val{0};
static inline Obj AddDetails(Obj const& a, float b);
};
Obj Obj::AddDetails(Obj const& a, float b) {
return Obj(a.float_val + b);
}
int main() {
Obj foo{32};
Obj bar{foo + 1337};
std::cout << bar.val() << "\n";
}
Even without in-class (inline) definitions, no extra constructor calls can be seen with:
#include <iostream>
struct Obj {
Obj(double val) : float_val(val) {}
Obj operator+(float);
Obj Add(float);
double val() const {
return float_val;
}
private:
double float_val{0};
static Obj AddDetails(Obj const& a, float b);
};
Obj Obj::AddDetails(Obj const& a, float b) {
return Obj(a.float_val + b);
}
Obj Obj::operator+(float b) {
return AddDetails(*this, b);
}
Obj Obj::Add(float b) {
return AddDetails(*this, b);
}
int main() {
Obj foo{32};
Obj bar{foo + 1337};
std::cout << bar.val() << "\n";
}
However, some of the optimisation happens because the compiler knows that the value won't change, so let's change main to read it at run time:
int main() {
double d{};
std::cin >> d;
Obj foo{d};
Obj bar{foo + 1337};
std::cout << bar.val() << "\n";
}
But you can still see the optimisations on both compilers: https://godbolt.org/z/M2jaSH and https://godbolt.org/z/OyQfJI