I am porting 32-bit Delphi BASM code to 64-bit FPC (Win64 target OS) and wonder why the following instruction does not compile in 64-bit FPC:
{$IFDEF FPC}
{$ASMMODE INTEL}
{$ENDIF}
procedure DoesNotCompile;
asm
LEA ECX,[ECX + ESI + $265E5A51]
end;
// Error: Asm: 16 or 32 Bit references not supported
Possible workarounds are:
procedure Compiles1;
asm
ADD ECX,ESI
ADD ECX,$265E5A51
end;
procedure Compiles2;
asm
LEA ECX,[RCX + RSI + $265E5A51]
end;
I just don't understand what is wrong with the 32-bit LEA instruction on the Win64 target (it compiles OK in 32-bit Delphi, so it is a correct CPU instruction).
Optimization remarks:
The following code, compiled by 64-bit FPC 2.6.2,
{$MODE DELPHI}
{$ASMMODE INTEL}
procedure Test;
asm
LEA ECX,[RCX + RSI + $265E5A51]
NOP
LEA RCX,[RCX + RSI + $265E5A51]
NOP
ADD ECX,$265E5A51
ADD ECX,ESI
NOP
end;
generates the following assembler output:
00000000004013F0 4883ec08 sub $0x8,%rsp
project1.lpr:10 LEA ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%ecx
project1.lpr:11 NOP
00000000004013FB 90 nop
project1.lpr:12 LEA RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%rcx
project1.lpr:13 NOP
0000000000401404 90 nop
project1.lpr:14 ADD ECX,$265E5A51
0000000000401405 81c1515a5e26 add $0x265e5a51,%ecx
project1.lpr:15 ADD ECX,ESI
000000000040140B 01f1 add %esi,%ecx
project1.lpr:16 NOP
000000000040140D 90 nop
project1.lpr:17 end;
000000000040140E 4883c408 add $0x8,%rsp
And the winner, at 7 bytes, is:
LEA ECX,[RCX + RSI + $265E5A51]
The other three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which 64-bit FPC does not compile) are all 8 bytes long.
I am not sure that the winner is also the fastest.
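The size difference follows from the default sizes in 64-bit mode: the default operand size is 32 bits and the default address size is 64 bits. So LEA ECX,[RCX + RSI + disp32] needs no prefix at all, LEA RCX,[...] needs a REX.W prefix (48h, visible in the disassembly above), and LEA ECX,[ECX + ESI + disp32] needs the 67h address-size override. The following is a minimal Delphi/FPC sketch of that byte count; `LeaLength` is a hypothetical helper, and it assumes only the eight legacy registers are involved (no R8 to R15, so no other REX bits) and a displacement that needs the full 32 bits.

```pascal
program LeaEncodingLength;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  AddImm32Length = 1 + 1 + 4;  // ADD ECX,imm32: opcode 81h, ModRM, imm32
  AddRegLength = 1 + 1;        // ADD ECX,ESI:   opcode 01h, ModRM

// Length in bytes of LEA dest,[base + index + disp32] in 64-bit mode,
// assuming only the eight legacy registers appear (no R8..R15), so the
// only possible prefixes are REX.W and 67h.
function LeaLength(Operand64, Address32: Boolean): Integer;
begin
  Result := 1 + 1 + 1 + 4;     // opcode 8Dh, ModRM, SIB, disp32
  if Operand64 then
    Inc(Result);               // REX.W (48h): 64-bit destination (RCX)
  if Address32 then
    Inc(Result);               // 67h: 32-bit address registers (ECX, ESI)
end;

begin
  Writeln('LEA ECX,[RCX + RSI + disp32]: ', LeaLength(False, False)); // 7
  Writeln('LEA RCX,[RCX + RSI + disp32]: ', LeaLength(True, False));  // 8
  Writeln('LEA ECX,[ECX + ESI + disp32]: ', LeaLength(False, True));  // 8
  Writeln('ADD ECX,imm32 / ADD ECX,ESI:  ', AddImm32Length + AddRegLength); // 8
end.
```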
I would regard this as a bug in the FPC assembler. The asm code you present is valid: in 64-bit mode it is perfectly legal to use LEA with 32-bit address registers, as you have done; the instruction is simply encoded with the 67h address-size override prefix. The Intel processor documents are clear on the matter, and the Delphi 64-bit inline assembler accepts this code.
To work around this, you will need to hand-assemble the instruction:
DQ $265e5a510e8c8d67
In the Delphi CPU view this comes out as:
Project1.dpr.12: DQ $265e5a510e8c8d67
0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
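To see why that QWORD is the right instruction: DQ stores its operand least significant byte first, so the bytes in memory are 67 8D 8C 0E 51 5A 5E 26, that is, the 67h prefix, the LEA opcode 8Dh, a ModRM byte, a SIB byte and the 32-bit displacement. The sketch below decodes just that fixed shape; `ByteAt` and `DecodeLea` are illustrative helpers, not part of any library, and the decoder handles nothing beyond this one instruction form.

```pascal
program DecodeLeaQWord;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

const
  // The hand-assembled instruction from the answer.
  Encoded: UInt64 = $265E5A510E8C8D67;
  Reg32: array[0..7] of string =
    ('EAX', 'ECX', 'EDX', 'EBX', 'ESP', 'EBP', 'ESI', 'EDI');

// Byte I of Q as DQ lays it out in memory (least significant byte first).
function ByteAt(Q: UInt64; I: Integer): Byte;
begin
  Result := Byte(Q shr (8 * I));
end;

// Decodes only the fixed shape 67h 8Dh ModRM SIB disp32; anything else
// trips an assertion.
function DecodeLea(Q: UInt64): string;
var
  ModRM, SIB: Byte;
begin
  Assert(ByteAt(Q, 0) = $67);          // address-size override: 32-bit addressing
  Assert(ByteAt(Q, 1) = $8D);          // LEA opcode
  ModRM := ByteAt(Q, 2);               // $8C: mod=10, reg=001, rm=100
  SIB := ByteAt(Q, 3);                 // $0E: scale=00, index=001, base=110
  Assert((ModRM shr 6) = 2);           // mod=10: a disp32 follows
  Assert((ModRM and 7) = 4);           // rm=100: a SIB byte follows
  Result := Format('LEA %s,[%s+%s*%d+$%s]',
    [Reg32[(ModRM shr 3) and 7],       // destination register (reg field)
     Reg32[SIB and 7],                 // base register
     Reg32[(SIB shr 3) and 7],         // index register
     1 shl (SIB shr 6),                // scale factor
     IntToHex(Int64(Q shr 32), 8)]);   // bytes 4..7: disp32
end;

begin
  Writeln(DecodeLea(Encoded)); // LEA ECX,[ESI+ECX*1+$265E5A51]
end.
```

The decoded form matches the Delphi CPU view: base ESI, index ECX with scale 1, destination ECX, displacement $265E5A51.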
I performed a very simple benchmark to compare the use of 32-bit and 64-bit address operands with a version using two ADDs. The code looks like this:
program Project1;
{$APPTYPE CONSOLE}
uses
  System.Diagnostics;

// Each routine borrows ESI, which is callee-saved (RSI on Win64), and parks
// it in the volatile EDX/RDX. On Win64 all 64 bits of RSI must be kept:
// MOV EDX,ESI ... MOV ESI,EDX would clear the upper half of RSI.

function BenchWithTwoAdds: Integer;
asm
  {$IFDEF CPUX64}
  MOV RDX,RSI
  {$ELSE}
  MOV EDX,ESI
  {$ENDIF}
  XOR EAX,EAX                      // EAX accumulates the result
  MOV ESI,$98C34
  MOV ECX,$ffffffff                // the loop body runs $FFFFFFFF times
@loop:
  ADD EAX,ESI
  ADD EAX,$265E5A51
  DEC ECX
  CMP ECX,0                        // redundant: DEC already sets ZF
  JNZ @loop
  {$IFDEF CPUX64}
  MOV RSI,RDX
  {$ELSE}
  MOV ESI,EDX
  {$ENDIF}
end;

function BenchWith32bitOperands: Integer;
asm
  {$IFDEF CPUX64}
  MOV RDX,RSI
  {$ELSE}
  MOV EDX,ESI
  {$ENDIF}
  XOR EAX,EAX
  MOV ESI,$98C34
  MOV ECX,$ffffffff
@loop:
  LEA EAX,[EAX + ESI + $265E5A51]  // on Win64 this gets the 67h prefix
  DEC ECX
  CMP ECX,0
  JNZ @loop
  {$IFDEF CPUX64}
  MOV RSI,RDX
  {$ELSE}
  MOV ESI,EDX
  {$ENDIF}
end;

{$IFDEF CPUX64}
function BenchWith64bitOperands: Integer;
asm
  MOV RDX,RSI
  XOR EAX,EAX
  MOV ESI,$98C34                   // also clears the upper half of RSI
  MOV ECX,$ffffffff
@loop:
  LEA EAX,[RAX + RSI + $265E5A51]  // 64-bit address registers, no prefix
  DEC ECX
  CMP ECX,0
  JNZ @loop
  MOV RSI,RDX
end;
{$ENDIF}

var
  Stopwatch: TStopwatch;
begin
{$IFDEF CPUX64}
  Writeln('64 bit');
{$ELSE}
  Writeln('32 bit');
{$ENDIF}
  Writeln;
  Writeln('BenchWithTwoAdds');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWithTwoAdds);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;
  Writeln('BenchWith32bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith32bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;
{$IFDEF CPUX64}
  Writeln('BenchWith64bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith64bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
{$ENDIF}
  Readln;
end.
The output on my Intel i5-2300 (elapsed times in milliseconds):
32 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2615
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3915
----------------------
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2612
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3917
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3918
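As you can see, on this evidence there is nothing to choose between the two LEA options: the differences between their times are well within the variability of the measurement. However, the variant using two ADDs wins hands down. That fits Intel's optimization manual for Sandy Bridge (the i5-2300's microarchitecture), which gives a three-component LEA (base + index + displacement) a latency of 3 cycles, against 1 cycle for each ADD, so the dependency chain through EAX costs about 3 cycles per iteration with LEA versus 2 with the two ADDs.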
Other machines give different results. Here is the output on a Xeon E5530:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 3434
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3295
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3279
And on a Xeon E5-4640 v2:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 4102
BenchWith32bitOperands
Value = -644343429
Elapsed time = 5868
BenchWith64bitOperands
Value = -644343429
Elapsed time = 5868
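Every variant on every machine prints the same value, and it can be checked without running the loop: each iteration adds $98C34 + $265E5A51 = $2667E685 to EAX, the loop body runs $FFFFFFFF times, and all the arithmetic wraps modulo 2^32. A minimal sketch of that closed form, where `ExpectedValue` is a hypothetical helper name:

```pascal
program ExpectedBenchValue;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  Addend = $98C34;           // value loaded into ESI before the loop
  Disp = $265E5A51;          // LEA displacement / second ADD immediate
  Iterations = $FFFFFFFF;    // initial ECX; DEC ECX / JNZ runs the body this often

// Closed form of the loop: EAX = Iterations * (Addend + Disp) mod 2^32,
// read back as a signed 32-bit Integer.
function ExpectedValue: Integer;
var
  Step, Total: UInt64;
begin
  Step := Addend + Disp;               // $2667E685 added per iteration
  Total := Step * UInt64(Iterations);  // exact product, fits in 64 bits
  Result := Integer(Cardinal(Total));  // keep the low 32 bits, as EAX does
end;

begin
  Writeln('Value = ', ExpectedValue);  // Value = -644343429
end.
```

Since $FFFFFFFF is -1 modulo 2^32, the result is simply -$2667E685, which is -644343429.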