(Sorry in advance for not having managed to reduce my problem to a simple failing test case...)
I have faced issues with upgrading to GCC 6.3.0 to build our codebase (relevant flags: -O3 -m32).
Specifically, my application segfaults within a struct ctor call because of GCC optimizations.
In this ctor, GCC used movaps:
movaps %xmm0,0x30a0(%ebx)
movaps requires the operand to be 16-byte aligned. But at this point in time, %ebx points to my object, which is not necessarily 16-byte aligned. From glibc:
“The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems).”
Hence the segfault (when built with -O3 -m32).
Why does GCC seem to assume that the allocated object is 16-byte aligned? Am I misunderstanding something?
Notes:
- The object is allocated with the new operator.
- Flag combinations tried:
  - -m32 -O2
  - -m32 -O2 -ftree-slp-vectorize
  - -m32 -O3 -fno-tree-slp-vectorize
  - -m32 -O3
This other project seems to have hit similar issues: https://github.com/godotengine/godot/issues/4623
Their investigation points to -fvect-cost-model=dynamic. Investigation on my codebase points to -ftree-slp-vectorize instead.
It's possible that the compiler has a reason to think the object has an alignment ≥ 16 bytes. It's possible to find out what the compiler thinks the alignment is by using the alignof() operator in C++11. GCC has an extension __alignof__ that is available in C and earlier C++ versions.
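As a minimal sketch of that check: the struct below is a hypothetical stand-in for the one in the question (its name and members are assumptions, not the real type). It shows how alignof can be compared against alignof(std::max_align_t), the largest fundamental alignment.

```cpp
#include <cstddef>  // std::size_t, std::max_align_t

// Hypothetical stand-in for the struct from the question; the real
// members are unknown, so these are illustrative only.
struct Suspect {
    char  header[0x30a0];
    float payload[4];
};

// Alignment the compiler assumes for the type (C++11 alignof).
constexpr std::size_t suspect_alignment = alignof(Suspect);

// True when T needs more than the largest fundamental alignment,
// i.e. when it is an "over-aligned" type in the standard's terms.
template <typename T>
constexpr bool is_over_aligned() {
    return alignof(T) > alignof(std::max_align_t);
}

// This stand-in contains only char and float, so it is not over-aligned.
static_assert(!is_over_aligned<Suspect>(), "Suspect should not be over-aligned");
```

Printing alignof for the real struct (or adding a static_assert like the one above) would show whether the compiler believes it needs 16-byte alignment.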
A structure's alignment is the highest alignment of anything in it, recursively. There could be something in there with higher alignment than expected.
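To illustrate how a high alignment can propagate outward from a nested member, here is a small sketch. All type names are hypothetical; Vec4 stands in for something like a SIMD vector type buried inside the real struct.

```cpp
#include <cstddef>

// Hypothetical 16-byte-aligned leaf type (e.g. a SIMD-friendly vector).
struct alignas(16) Vec4 {
    float v[4];
};

// Inner contains a Vec4, so it inherits the 16-byte requirement.
struct Inner {
    int  id;
    Vec4 position;
};

// Outer never mentions alignment itself, yet it ends up 16-byte aligned
// because one of its (nested) members requires it.
struct Outer {
    char  tag;
    Inner inner;
};

static_assert(alignof(Inner) == 16, "alignment propagates from Vec4");
static_assert(alignof(Outer) == 16, "and from Inner to Outer");
```

With a type like Outer, the compiler is entitled to use aligned instructions such as movaps on its members, even though nothing at the point of allocation hints at the requirement.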
While the C++11 standard guarantees that memory returned by new is aligned to the value needed by the "fundamental alignment requirement" of any object, this only applies to standard types and objects made of them. Using C++11 alignas() or the __attribute__((aligned(x))) GCC extension to request higher alignment might exceed what new provides.
A solution to this would be to use aligned_alloc() (C11; available as std::aligned_alloc() from C++17) or posix_memalign() (POSIX-only, but usable before C++11) to get aligned memory. This could be coupled with the placement form of the new operator to construct the object in that memory, or with class-specific overloads of operator new and operator delete.
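A rough sketch of the class-specific overload approach, assuming a POSIX system where posix_memalign() is available. The type Aligned16 and the helper is_aligned are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // posix_memalign (POSIX), std::free
#include <new>       // std::bad_alloc

// Hypothetical over-aligned type; stands in for the struct in the question.
struct alignas(16) Aligned16 {
    float data[4];

    // Class-specific allocation: ask for memory with the alignment
    // the type actually needs instead of relying on the default new.
    static void* operator new(std::size_t size) {
        void* p = nullptr;
        if (posix_memalign(&p, alignof(Aligned16), size) != 0) {
            throw std::bad_alloc();
        }
        return p;
    }

    // Memory from posix_memalign is released with free.
    static void operator delete(void* p) noexcept {
        std::free(p);
    }
};

// Helper (assumption, not a standard function): checks a pointer's alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With these overloads in place, a plain `new Aligned16()` goes through posix_memalign, so `is_aligned(obj, 16)` should hold for the returned pointer regardless of what the default allocator guarantees on a 32-bit target. Array forms (operator new[] and operator delete[]) would need matching overloads.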