I'm trying to understand and take advantage of the ARMv8 ABI when choosing between pass-by-value and pass-by-const&. In particular, I have a struct that's a "Homogeneous Floating-point Aggregate (HFA)" in ARM lingo. I'll paste in some results from godbolt using armv8-a clang 18.1.0 with the -O compiler switch. (I see similar disassembly results on my Mac using Xcode 15.)
struct Thing {
const double x,y;
Thing( double a, double b) : x(a), y(b){}
Thing( const Thing& other) = default;
// Thing( const Thing& other) : x(other.x), y(other.y) {}
};
double diff( Thing t) {
return t.x - t.y;
}
double baz(double x, double y)
{
Thing i = {x, 5.};
return diff(i);
}
I'm going to compare the defaulted copy constructor to the one I typed in. As written above, I get:
diff(Thing): // @diff(Thing)
fsub d0, d0, d1
ret
baz(double, double): // @baz(double, double)
fmov d1, #-5.00000000
fadd d0, d0, d1
ret
That looks good. The interface to diff(Thing) is for the caller to put the two doubles into the register pair (d0,d1) and for the answer to come back in register d0. There's no memory accessed anywhere--no mucking with the SP or LR; no stack frame created anywhere; no need to have the argument ever live in memory. And baz(double, double) looks good too. Everything got inlined--no Thing even came to life.
But when I switch to the other version of the copy constructor, I get this for diff(Thing):
diff(Thing): // @diff(Thing)
ldp d0, d1, [x0]
fsub d0, d0, d1
ret
Suddenly now the caller needs to place the Thing in memory someplace and pass a pointer to that memory in x0. Then diff(Thing) has to go fetch the pair of doubles from that memory location. Ouch.
The diff(Thing) signature didn't change, but suddenly the ABI is completely different! Is this drastic difference documented somewhere? Why isn't my hand-typed copy constructor just as good as the defaulted one? I'm glad so much got inlined for baz, but I want to know the best practice, something I can hang on my wall. For example, for structs like this do I want:
bool operator==( const Thing& lhs, const Thing& rhs)
or
bool operator==( Thing lhs, Thing rhs)
or
(let the compiler make one)
I would like an explanation of the drastic difference above and advice on value v. const& parameters.
I'm not a C++ ABI expert but I'll take a shot.
ARM64, like most modern platforms, generally follows the Itanium C++ ABI, with some modifications that are not relevant here. The Itanium ABI defines the notion of a type being non-trivial for the purposes of calls. An argument of such a type is never passed in registers: the caller must create a temporary in memory and pass its address instead, which is the pointer you see arriving in x0.
A type is considered non-trivial for the purposes of calls if:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
Notice that this is different from the C++ standard's notion of "not trivially copyable"; in particular it does not take assignment operators into consideration.
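To make that distinction concrete, here is a rough sketch. As far as I know there is no standard trait for "non-trivial for the purposes of calls", so looks_trivial_for_calls below is a hypothetical helper that only approximates the rule with standard type traits (the traits ask which constructor overload resolution would select, which is not quite the same question the ABI asks). Pair is also a made-up type, not something from the question.

```cpp
#include <type_traits>

// Hypothetical type: defaulted copy constructor, but a user-provided
// copy assignment operator.
struct Pair {
    double x, y;
    Pair(double a, double b) : x(a), y(b) {}
    Pair(const Pair&) = default;
    Pair& operator=(const Pair& other) {
        x = other.x;
        y = other.y;
        return *this;
    }
};

// Hypothetical helper, an approximation only: checks that copy construction,
// move construction, and destruction are all trivial. Assignment operators
// are deliberately not consulted.
template <class T>
constexpr bool looks_trivial_for_calls =
    std::is_trivially_copy_constructible<T>::value &&
    std::is_trivially_move_constructible<T>::value &&
    std::is_trivially_destructible<T>::value;

// The user-provided assignment operator makes Pair not trivially copyable...
static_assert(!std::is_trivially_copyable<Pair>::value,
              "user-provided assignment breaks trivial copyability");
// ...but assignment plays no part in the calling-convention check.
static_assert(looks_trivial_for_calls<Pair>,
              "copy/move construction and destruction are still trivial");
```

So a type like Pair fails the standard's "trivially copyable" test and yet, by this approximation, would still be eligible to be passed in registers.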
The notion of a "trivial copy constructor" is defined in the C++ standard, [class.copy.ctor] p11:
A copy/move constructor for class X is trivial if it is not user-provided and if:
- class X has no virtual functions ([class.virtual]) and no virtual base classes ([class.mi]), and
- the constructor selected to copy/move each direct base class subobject is trivial, and
- for each non-static data member of X that is of class type (or array thereof), the constructor selected to copy/move that member is trivial;
otherwise the copy/move constructor is non-trivial.
Your copy constructor Thing( const Thing& other) : x(other.x), y(other.y) {} is user-provided. Therefore it is not a trivial copy constructor, and therefore it makes Thing non-trivial for the purposes of calls, and so it cannot be passed in registers.
What your copy constructor actually does is irrelevant to this analysis, even if it does exactly the same thing as the default copy constructor would do. This makes a certain amount of sense: the calling convention should be determinable at a syntactic level rather than a semantic one, based only on the declaration of the class and not on the definitions of its member functions.
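If you want the compiler to confirm which side of the line a type falls on, a sketch like the following may help. The three struct names are hypothetical variants of the Thing from the question, and std::is_trivially_copy_constructible is used here as a stand-in check for "has a trivial copy constructor". The third variant illustrates that defaulting the constructor outside the class body still counts as user-provided, because it is not defaulted on its first declaration.

```cpp
#include <type_traits>

// Variant 1: copy constructor defaulted on its first declaration.
struct ThingDefaulted {
    const double x, y;
    ThingDefaulted(double a, double b) : x(a), y(b) {}
    ThingDefaulted(const ThingDefaulted& other) = default;
};

// Variant 2: hand-written copy constructor that does the same work.
struct ThingUserProvided {
    const double x, y;
    ThingUserProvided(double a, double b) : x(a), y(b) {}
    ThingUserProvided(const ThingUserProvided& other)
        : x(other.x), y(other.y) {}
};

// Variant 3: declared in the class, defaulted only at the definition.
struct ThingDefaultedLater {
    const double x, y;
    ThingDefaultedLater(double a, double b) : x(a), y(b) {}
    ThingDefaultedLater(const ThingDefaultedLater& other);
};
inline ThingDefaultedLater::ThingDefaultedLater(const ThingDefaultedLater&) = default;

// Only the first variant keeps a trivial copy constructor.
static_assert(std::is_trivially_copy_constructible<ThingDefaulted>::value,
              "defaulted on first declaration: trivial");
static_assert(!std::is_trivially_copy_constructible<ThingUserProvided>::value,
              "hand-written body: user-provided, so non-trivial");
static_assert(!std::is_trivially_copy_constructible<ThingDefaultedLater>::value,
              "defaulted after first declaration: still user-provided");
```

Dropping a static_assert like the first one next to a struct you intend to pass by value is a cheap way to notice if a later edit quietly makes its copy constructor non-trivial.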