I have some Delphi/assembly code that compiles and works fine (XE2) for Win32, Win64, and OSX 32. However, since I need it to work on Linux, I have been looking at compiling FPC versions of it (so far, Win32/64, Linux32/64).
By and large, it works well, but the one thing I have not been able to get to work are calls/jumps to Delphi System
unit functions, like such:
jmp System.@FillChar
This appears to have the desired effect on FPC Win32/Linux32, but fails with an exception on FPC Win64/Linux64. (I am quite familiar with the calling convention differences among the platforms, so don't think that's the reason.)
What is the correct way of doing this on FPC for x64 platforms?
[Edit1] --- In response to David's comment, here is a simplified program that illustrates the problem (at least I hope it does so accurately):
program fpcx64example;
{$IFDEF FPC}
{$MODE DELPHI}
{$ASMMODE INTEL}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
procedure FillMemCall (p: pointer; len: longword; val: byte);
asm
// this function and the System function have the same parameters
// in the same order -- they are already in their proper places here
jmp System.@FillChar
end;
function MakeString (c: AnsiChar; len: longword): AnsiString;
begin
Setlength (Result, len);
if len > 0 then FillMemCall (PAnsiChar(Result), len, byte(c));
end;
begin
try
writeln (MakeString ('x',10));
except
writeln ('Exception!');
end;
end.
To compile with FPC:
[Win32:] fpc.exe fpcx64example.dpr
, [Win64:] ppcrossx64.exe fpcx64example.dpr
, [Linux32:] fpc.exe -Tlinux -XPi386-linux- -FD[path]\FPC\bin\i386-linux fpcx64example.dpr
, [Linux64:] ppcrossx64.exe -Tlinux -XPx86_64-linux- -FD[FPCpath]\bin\x86_64-linux fpcx64example.dpr
.
Works fine with Delphi (Win32/64). For FPC, removing jmp System.@FillChar
above gets rid of the exception on x64.
The solution (Thanks to FPK):
Delphi and FPC do not generate stack frames for functions under the exact same conditions, so that the RSP
register may have a different alignment in the versions compiled by the two. The solution is to avoid this difference. One way of doing so, for the FillMemCall example above, would look like such:
{$IFDEF CPU64} {$DEFINE CPUX64} {$ENDIF} // for Delphi compatibility
procedure FillMemCall (p: pointer; len: longword; val: byte);
{$IFDEF FPC} nostackframe; {$ENDIF} //Force same FPC behaviour as in Delphi
asm
{$IFDEF CPUX64}
{$IFNDEF FPC} .NOFRAME {$ENDIF} // To make it explicit (Delphi)...
// RSP = ###0h at the site of the last CALL instruction, so
// since the return address (QWORD) was pushed onto the stack by CALL,
// it must now be ###8h -- if nobody touched RSP.
movdqa xmm0, dqword ptr [rsp + 8] // <- Testing RSP misalignment -- this will crash if not aligned to DQWORD boundary
{$ENDIF}
jmp System.@FillChar
end;
This isn't exactly beautiful, but it now works for Win/Linux 32/64 for both Delphi and FPC.
Short answer: the correct way to do this is using a call instruction.
Long answer: x86-64 code requires that the stack is 16 byte aligned, so FillMemCall contains at the entry point a compiler generated sub rsp,8 and an add rsp,8 at the exit (the other 8 bytes are added/remove by the call/ret pair). Fillchar is on the other hand hand-coded assembler and uses the nostackframe directive so it does not contain a compiler generated sub/add pair and as soon fillchar is left, the stack is messed up because FillChar does not contain an add rsp,8 before the ret instruction.
Workarounds like using the nostackframe directive for FillMemCall or adjusting the stack before doing the jmp might be possible but are subject to be broken by any future compiler change.