delphi assembly sse basm

How to use aligned data moves (movaps) with SSE in Delphi XE3?


I was trying to run the following:

type
  Vector = array [1..4] of Single;

{$CODEALIGN 16}
function add4(const a, b: Vector): Vector; register; assembler;
asm
  movaps xmm0, [a]
  movaps xmm1, [b]
  addps xmm0, xmm1
  movaps [@result], xmm0
end;

It raises an access violation on movaps. As far as I know, movaps is only safe when the memory operand is 16-byte aligned. The code works without problems if I use movups instead (which does not require alignment).
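A quick way to confirm that alignment is the cause is to print whether each address is a multiple of 16. The following is a minimal sketch; IsAligned16 and CheckLocal are hypothetical helper names, not part of the RTL.

```delphi
program AlignCheck;

{$APPTYPE CONSOLE}

type
  Vector = array [1..4] of Single;

// Hypothetical helper: True when the address is a multiple of 16.
function IsAligned16(p: Pointer): Boolean;
begin
  Result := (NativeUInt(p) and 15) = 0;
end;

var
  g: Vector; // global variable

procedure CheckLocal;
var
  v: Vector; // stack variable
begin
  Writeln('local  aligned: ', IsAligned16(@v));
end;

begin
  Writeln('global aligned: ', IsAligned16(@g));
  CheckLocal;
end.
```

Any variable that reports FALSE here would fault if used as a movaps operand, while movups would accept it.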

So my question is: why does {$CODEALIGN 16} seem to have no effect in this case in Delphi XE3?

EDIT

Very strange... I tried the following.

program Project3;

{$APPTYPE CONSOLE}

uses
  windows;  // without this unit, no error occurs at all

type
  Vector = array [1..4] of Single;

function add4(const a, b: Vector): Vector;
asm
  movaps xmm0, [a]
  movaps xmm1, [b]
  addps xmm0, xmm1
  movaps [@result], xmm0
end;

procedure test();
var
  v1, v2: vector;
begin
  v1[1] := 1;
  v2[1] := 1;
  v1 := add4(v1,v2);  // this works
end;

var
  a, b, c: Vector;

begin
  {$ifndef cpux64}
    {$MESSAGE FATAL 'this example is for x64 target only'}
  {$else}
  test();
  c := add4(a, b); // raises an AV here
  {$endif}
end.

If the windows unit is not in the uses clause, everything is fine. With 'uses windows', an exception is raised at c := add4(a, b), but not in test().

Can anyone explain this?

EDIT: It all makes sense to me now. The conclusions for Delphi XE3 (64-bit) are:

  1. Stack frames on x64 are 16-byte aligned (as required). {$CODEALIGN 16} aligns the code of procedures and functions to 16 bytes; it does not align data.
  2. Dynamic arrays live on the heap, which can be set to 16-byte alignment using SetMinimumBlockAlignment(mba16byte).
  3. However, stack variables are not always 16-byte aligned. For example, if you declare an integer variable before v1, v2 in test() in the example above, the example will no longer work.
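Point 2 can be checked with a small sketch like the one below. It assumes the default memory manager and uses a plain GetMem block rather than a dynamic array; PVector is a pointer type introduced here for illustration.

```delphi
program HeapAlign;

{$APPTYPE CONSOLE}

type
  Vector = array [1..4] of Single;
  PVector = ^Vector; // illustrative pointer type, not in the original code

var
  p: PVector;

begin
  // Ask the default memory manager to hand out 16-byte aligned blocks.
  SetMinimumBlockAlignment(mba16Byte);

  GetMem(p, SizeOf(Vector));
  try
    // Report whether the block address is a multiple of 16.
    Writeln('heap block aligned: ', (NativeUInt(p) and 15) = 0);
  finally
    FreeMem(p);
  end;
end.
```

A block obtained this way can be dereferenced (p^) and passed to a routine that uses movaps, provided every operand it touches is allocated the same way.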

Solution

  • You need your data to be 16-byte aligned, and that requires some care and attention. You can make sure that the heap allocator aligns to 16 bytes, but you cannot make the compiler align your stack-allocated variables to 16 bytes, because your array has an alignment of 4, the size of its elements. Any variables declared inside other structures will also have 4-byte alignment, which is a tough hurdle to clear.

    I don't think you can solve your problem in the currently available versions of the compiler, at least not unless you forgo stack-allocated variables, which I'd guess is too bitter a pill to swallow. You might have some luck with an external assembler.
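One workaround that is sometimes used, and that is not part of the answer above, is to over-allocate a byte buffer on the stack and round a pointer up to the next 16-byte boundary by hand. This does not make the compiler align Vector variables; it only sidesteps the limitation. The sketch below is written under stated assumptions: AlignUp16, Add4Aligned and PVector are hypothetical names, and the asm body assumes the Win64 calling convention. The result is returned through a var parameter so that no compiler-generated temporary of unknown alignment is involved.

```delphi
program StackAlign;

{$APPTYPE CONSOLE}

type
  Vector = array [1..4] of Single;
  PVector = ^Vector;

// Hypothetical helper: rounds an address up to the next multiple of 16.
function AlignUp16(p: Pointer): Pointer;
begin
  Result := Pointer((NativeUInt(p) + 15) and not NativeUInt(15));
end;

{$IFDEF CPUX64}
// Assumes the Win64 convention: a in RCX, b in RDX, r in R8, all passed
// by reference. All three addresses must be 16-byte aligned.
procedure Add4Aligned(const a, b: Vector; var r: Vector);
asm
  movaps xmm0, [rcx]
  movaps xmm1, [rdx]
  addps  xmm0, xmm1
  movaps [r8], xmm0
end;
{$ELSE}
// Plain Pascal fallback for other targets.
procedure Add4Aligned(const a, b: Vector; var r: Vector);
var
  i: Integer;
begin
  for i := 1 to 4 do
    r[i] := a[i] + b[i];
end;
{$ENDIF}

procedure Test;
var
  // Room for three vectors plus up to 15 bytes of alignment slack.
  raw: array [0..3 * SizeOf(Vector) + 15 - 1] of Byte;
  a, b, c: PVector;
  i: Integer;
begin
  a := AlignUp16(@raw);
  b := PVector(NativeUInt(a) + SizeOf(Vector));
  c := PVector(NativeUInt(b) + SizeOf(Vector));
  for i := 1 to 4 do
  begin
    a^[i] := i;
    b^[i] := 10 * i;
  end;
  Add4Aligned(a^, b^, c^);
  Writeln(c^[1]:0:1, ' ', c^[4]:0:1);
end;

begin
  Test;
end.
```

Because the aligned pointers are computed at run time, declaring further locals before raw does not affect the result. The cost is that the vectors are accessed through pointers instead of ordinary Vector variables.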