I've been working with SSE for a while now, and I've seen my share of alignment issues. This, however, is beyond my understanding:
I get different alignment whether I run the program using F5 (debug) or whether I run it outside the debugger (Ctrl+F5)!
Some background info:
I'm using a wrapper for an SSE-enabled datatype, with overloaded operators and a custom allocator (overloaded new and delete operators using _mm_malloc and _mm_free). But in the example below, I've managed to reduce the problem even further, i.e. the issue also happens even if I don't use the custom allocator.
As you can see below, in main() I dynamically allocate a TestClass object on the heap, which contains an SSEVector type object. I'm using a dummy float[2] member variable to "misalign" the object layout a bit.
I obtain the following output when I run with F5:
object address 00346678
_memberVariable1 address 00346678
_sseVector address 00346688
And if I run with Ctrl+F5:
object address 00345B70
_memberVariable1 address 00345B70
_sseVector address 00345B80
As you can see, the alignment is different when I run it in the debugger: _sseVector is not 16-byte aligned there. Is it just a coincidence that the alignment is correct when using Ctrl+F5? I'm using Visual Studio 2010 with a new project (default settings).
If I declare the object on the stack, i.e. TestClass myObject;, this issue does not appear. Using __declspec(align(16)) does not help, either.
The code I used to reproduce the issue:
#include <iostream>
#include <string>
#include <xmmintrin.h> // SSE
//#include "DynAlignedAllocator.h"
//////////////////////////////////////////////////////////////
class SSEVector /*: public DynAlignedAllocator<16>*/
{
public:
    SSEVector() { }
    __m128 vec;
};
class TestClass
{
public:
    TestClass() { }
    /*__declspec(align(16))*/ float _memberVariable1 [2];
    SSEVector _sseVector;
};
//////////////////////////////////////////////////////////////
int main (void)
{
    TestClass* myObject = new TestClass;
    std::cout << "object address " << myObject << std::endl;
    std::cout << "_memberVariable1 address " << &(myObject->_memberVariable1) << std::endl;
    std::cout << "_sseVector address " << &(myObject->_sseVector) << std::endl;
    delete myObject;
    // wait for ENTER
    std::string dummy;
    std::getline(std::cin, dummy);
    return 0;
}
Any hints or comments are greatly appreciated. Thanks in advance.
When running under the debugger, you're using the debug heap, which may affect alignment.
Set _NO_DEBUG_HEAP=1 in your environment settings, and see if this helps.
See e.g. http://msdn.microsoft.com/en-us/library/aa366705%28v=vs.85%29.aspx
However, 16-byte alignment is not guaranteed when allocating with malloc or new, with or without the debug heap. The "correct" way of solving this in VS is to use _aligned_malloc (and to release the block with _aligned_free).
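A minimal sketch of the aligned-allocation approach, assuming the goal is a heap block whose address is a multiple of 16. The names alignedAlloc, alignedFree and isAligned are hypothetical helpers, not part of any library. On Windows they forward to _aligned_malloc and _aligned_free; the posix_memalign branch is an assumption added only so the sketch also builds with other toolchains.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: returns a block whose address is a multiple of
// `alignment` (which must be a power of two), or a null pointer on failure.
inline void* alignedAlloc(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void* p = 0;
    if (posix_memalign(&p, alignment, size) != 0)
        p = 0;
    return p;
#endif
}

// Hypothetical helper: releases a block obtained from alignedAlloc.
inline void alignedFree(void* p)
{
#if defined(_WIN32)
    _aligned_free(p);  // blocks from _aligned_malloc must not go to free()
#else
    free(p);
#endif
}

// True if the address of p is a multiple of `alignment`.
inline bool isAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}
```

With these helpers, isAligned applied to the two _sseVector addresses from the question reports 00346688 as not 16-byte aligned and 00345B80 as aligned, while a block returned by alignedAlloc(size, 16) is aligned regardless of which heap the process uses.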
When you want your SSEVector as a member of another structure, you need to change the packing of this structure (using #pragma pack), or the __declspec(align) of SSEVector.
See How align works with data packing
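To illustrate what an alignment requirement on the member type does to the layout, here is a small sketch. It makes several assumptions: ALIGN16 is a hypothetical macro, the non-MSVC branch uses the GCC/Clang attribute spelling, and a float[4] stands in for __m128 so the example does not depend on SSE headers. It only inspects the offset of the member inside the class; it says nothing about where the object itself ends up in memory.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))  // assumption: GCC/Clang spelling
#endif

// Stand-in for the question's SSEVector; float[4] replaces __m128 here.
struct ALIGN16 SSEVector
{
    float vec[4];
};

struct TestClass
{
    float     _memberVariable1[2];  // 8 bytes, so the next member cannot sit at offset 0
    SSEVector _sseVector;           // padded so that it starts at a multiple of 16
};

// Byte offset of the SSE member inside TestClass.
inline std::size_t sseVectorOffset()
{
    return offsetof(TestClass, _sseVector);
}

// Asserts the layout property that the alignment requirement provides.
inline void checkLayout()
{
    assert(sseVectorOffset() % 16 == 0);
}
```

The offset is relative to the start of the object, so the member's absolute address is only a multiple of 16 when the object itself starts on a 16-byte boundary. That matches the question's output, where _sseVector sits 0x10 bytes past the object address in both runs and only the object address differs.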
What happens in your case (apart from the seemingly coincidental debugger/non-debugger difference) is:
SSEVector is declared unaligned. If you allocate it directly using _aligned_malloc, it'll be aligned.
TestClass is also unaligned, and uses default packing. If you allocate it using _aligned_malloc, the TestClass instance will be properly aligned. This doesn't help you at all, since you want the SSEVector member variable to be aligned.
Adding an alignment requirement on SSEVector using __declspec(align) will tell the compiler that SSEVector stack variables must be aligned, and that SSEVector as a struct member must be aligned within the struct/class. Now, if you allocate a TestClass using _aligned_malloc, it will be properly aligned. And the SSEVector offset in the struct is also properly aligned due to the declspec, so the absolute address of the SSEVector will be correct for your use.
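Putting the two pieces together, one possible sketch is to give TestClass its own operator new and operator delete that use an aligned allocator, so that a plain "new TestClass" already yields a usable object. This is an illustration under stated assumptions, not the original poster's allocator: ALIGN16 and isAligned16 are hypothetical names, float[4] stands in for __m128, and the posix_memalign branch is only there so the code builds outside Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <new>       // std::bad_alloc
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))  // assumption: GCC/Clang spelling
#endif

// Stand-in for the question's SSEVector; float[4] replaces __m128 here.
struct ALIGN16 SSEVector
{
    float vec[4];
};

class TestClass
{
public:
    float     _memberVariable1[2];
    SSEVector _sseVector;  // at a 16-byte multiple offset inside the class

    // Class-specific allocation: every heap TestClass starts on a 16-byte boundary.
    static void* operator new(std::size_t size)
    {
#if defined(_WIN32)
        void* p = _aligned_malloc(size, 16);
#else
        void* p = 0;
        if (posix_memalign(&p, 16, size) != 0)
            p = 0;
#endif
        if (p == 0)
            throw std::bad_alloc();
        return p;
    }

    // Must release with the deallocator that matches the allocator above.
    static void operator delete(void* p)
    {
#if defined(_WIN32)
        _aligned_free(p);
#else
        free(p);
#endif
    }
};

// True if the address of p is a multiple of 16.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<uintptr_t>(p) % 16 == 0;
}
```

Because the object start and the member offset are both multiples of 16, isAligned16(&obj->_sseVector) holds for any obj created with new TestClass. Array forms (operator new[] and operator delete[]) would need the same treatment and are left out of this sketch.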