delphi assembly fpc basm

Why does this LEA instruction not compile?


I am porting 32-bit Delphi BASM code to 64-bit FPC (Win64 target OS) and wonder why the following instruction does not compile in 64-bit FPC:

{$IFDEF FPC}
  {$ASMMODE INTEL}
{$ENDIF}

procedure DoesNotCompile;
asm
      LEA   ECX,[ECX + ESI + $265E5A51]
end;

// Error: Asm: 16 or 32 Bit references not supported

Possible workarounds are:

procedure Compiles1;
asm
      ADD   ECX,ESI
      ADD   ECX,$265E5A51
end;

procedure Compiles2;
asm
      LEA   ECX,[RCX + RSI + $265E5A51]
end;

I just don't understand what is wrong with a 32-bit LEA instruction on the Win64 target. It compiles fine in 32-bit Delphi, so it is a valid CPU instruction.
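For what it's worth, the Compiles2 workaround should leave the same value in ECX as the original instruction: a 32-bit destination keeps only the low 32 bits of the computed address, and the low 32 bits of a sum depend only on the low 32 bits of its operands. Here is a minimal sketch of that reasoning in Python, used purely as a calculator. The function names and sample register values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
DISP = 0x265E5A51  # displacement from the question (positive, so sign extension does not matter)

def lea32_addr32(ecx, esi):
    """Model of LEA ECX,[ECX + ESI + disp]: address computed modulo 2**32."""
    return (ecx + esi + DISP) & MASK32

def lea32_addr64(rcx, rsi):
    """Model of LEA ECX,[RCX + RSI + disp]: 64-bit address, then truncated to 32 bits."""
    address = (rcx + rsi + DISP) & MASK64
    return address & MASK32

# Hypothetical register contents, including garbage in the upper halves.
samples = [
    (0, 0),
    (0x00098C34, 0x12345678),
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0xDEADBEEFFFFFFFF0, 0x0123456789ABCDEF),
]
for rcx, rsi in samples:
    # Both forms agree on the value written to ECX.
    assert lea32_addr32(rcx & MASK32, rsi & MASK32) == lea32_addr64(rcx, rsi)
```

So whatever sits in the upper halves of RCX and RSI cannot change the result in ECX.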


Optimization remarks:

The following code, compiled by 64-bit FPC 2.6.2,

  {$MODE DELPHI}
  {$ASMMODE INTEL}

procedure Test;
asm
        LEA     ECX,[RCX + RSI + $265E5A51]
        NOP
        LEA     RCX,[RCX + RSI + $265E5A51]
        NOP
        ADD     ECX,$265E5A51
        ADD     ECX,ESI
        NOP
end;

generates the following assembler output:

00000000004013F0 4883ec08                 sub    $0x8,%rsp
                         project1.lpr:10  LEA     ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26           lea    0x265e5a51(%rcx,%rsi,1),%ecx
                         project1.lpr:11  NOP
00000000004013FB 90                       nop
                         project1.lpr:12  LEA     RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26         lea    0x265e5a51(%rcx,%rsi,1),%rcx
                         project1.lpr:13  NOP
0000000000401404 90                       nop
                         project1.lpr:14  ADD     ECX,$265E5A51
0000000000401405 81c1515a5e26             add    $0x265e5a51,%ecx
                         project1.lpr:15  ADD     ECX,ESI
000000000040140B 01f1                     add    %esi,%ecx
                         project1.lpr:16  NOP
000000000040140D 90                       nop
                         project1.lpr:17  end;
000000000040140E 4883c408                 add    $0x8,%rsp

The shortest encoding (7 bytes) is:

LEA     ECX,[RCX + RSI + $265E5A51]

The other three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which 64-bit FPC refuses to compile) are each 8 bytes long.

I am not sure that the shortest encoding is also the fastest.
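The byte counts can be double-checked from the hex encodings themselves. The sketch below counts them in Python. The first three encodings are copied from the FPC listing above. The last one is the $67-prefixed (address-size override) encoding that an assembler accepting the 32-bit form would be expected to emit, which is an assumption here since FPC rejects that source line.

```python
# Hex encodings of each alternative; the two ADDs are counted together.
encodings = {
    "LEA ECX,[RCX + RSI + $265E5A51]": "8d8c31515a5e26",
    "LEA RCX,[RCX + RSI + $265E5A51]": "488d8c31515a5e26",
    "ADD ECX,$265E5A51 + ADD ECX,ESI": "81c1515a5e26" + "01f1",
    # Assumed encoding: address-size prefix 67 in front of the LEA opcode.
    "LEA ECX,[ECX + ESI + $265E5A51]": "678d8c0e515a5e26",
}

# Two hex digits per byte, so bytes.fromhex gives the instruction length.
lengths = {text: len(bytes.fromhex(code)) for text, code in encodings.items()}

for text, size in lengths.items():
    print(f"{size} bytes  {text}")
```

The extra byte comes from the REX.W prefix (48) in the 64-bit destination form and from the address-size prefix (67) in the 32-bit address form.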


Solution

  • I would regard this as a bug in the FPC assembler. The asm code you present is valid: in 64-bit mode it is perfectly legal to use LEA with 32-bit registers, as you have done. The Intel processor documentation is clear on the matter, and the Delphi 64-bit inline assembler accepts this code.

    To work around this, you will need to hand-assemble the code:

    DQ    $265e5a510e8c8d67
    

    In the Delphi CPU view this comes out as:

    Project1.dpr.12: DQ    $265e5a510e8c8d67
    0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
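To see how the DQ constant maps onto the instruction, here is a rough sketch in Python (used only as a calculator) that lays the quadword out in little-endian order and picks apart the ModRM and SIB bytes. It is not a general disassembler: it assumes exactly this instruction shape (address-size prefix, LEA opcode, ModRM, SIB, 32-bit displacement), and the variable names are illustrative.

```python
import struct

# DQ stores the quadword little-endian, so the lowest byte comes first in memory.
raw = struct.pack("<Q", 0x265E5A510E8C8D67)
assert raw.hex() == "678d8c0e515a5e26"

prefix, opcode, modrm, sib = raw[0], raw[1], raw[2], raw[3]
disp = struct.unpack("<i", raw[4:8])[0]  # signed 32-bit displacement

assert prefix == 0x67  # address-size override: 32-bit addressing in 64-bit mode
assert opcode == 0x8D  # LEA r32, m

REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

mod = modrm >> 6          # 2 means a 32-bit displacement follows
reg = (modrm >> 3) & 7    # destination register
rm = modrm & 7            # 4 means a SIB byte follows
scale = 1 << (sib >> 6)   # scale factor applied to the index register
index = (sib >> 3) & 7
base = sib & 7

decoded = f"LEA {REG32[reg]},[{REG32[base]} + {REG32[index]}*{scale} + ${disp:08X}]"
print(decoded)
```

The decoded operands match what the Delphi CPU view shows: base ESI, index ECX with scale 1, and the displacement $265E5A51.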
    

    I ran a very simple benchmark to compare the use of 32-bit and 64-bit operands against a version using two ADDs. The code looks like this:

    {$APPTYPE CONSOLE}
    
    uses
      System.Diagnostics;
    
    function BenchWithTwoAdds: Integer;
    asm
        MOV   EDX,ESI
        XOR   EAX,EAX
        MOV   ESI,$98C34
        MOV   ECX,$ffffffff
    @loop:
        ADD   EAX,ESI
        ADD   EAX,$265E5A51
        DEC   ECX
        CMP   ECX,0
        JNZ   @loop
        MOV   ESI,EDX
    end;
    
    function BenchWith32bitOperands: Integer;
    asm
        MOV   EDX,ESI
        XOR   EAX,EAX
        MOV   ESI,$98C34
        MOV   ECX,$ffffffff
    @loop:
        LEA   EAX,[EAX + ESI + $265E5A51]
        DEC   ECX
        CMP   ECX,0
        JNZ   @loop
        MOV   ESI,EDX
    end;
    
    {$IFDEF CPUX64}
    function BenchWith64bitOperands: Integer;
    asm
        MOV   EDX,ESI
        XOR   EAX,EAX
        MOV   ESI,$98C34
        MOV   ECX,$ffffffff
    @loop:
        LEA   EAX,[RAX + RSI + $265E5A51]
        DEC   ECX
        CMP   ECX,0
        JNZ   @loop
        MOV   ESI,EDX
    end;
    {$ENDIF}
    
    var
      Stopwatch: TStopwatch;
    
    begin
    {$IFDEF CPUX64}
      Writeln('64 bit');
    {$ELSE}
      Writeln('32 bit');
    {$ENDIF}
      Writeln;
    
      Writeln('BenchWithTwoAdds');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWithTwoAdds);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
      Writeln;
    
      Writeln('BenchWith32bitOperands');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWith32bitOperands);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
      Writeln;
    
    {$IFDEF CPUX64}
      Writeln('BenchWith64bitOperands');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWith64bitOperands);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
    {$ENDIF}
    
      Readln;
    end.
    

    The output on my Intel i5-2300:

    32 bit
    
    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 2615
    
    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3915
    
    ----------------------
    
    64 bit
    
    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 2612
    
    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3917
    
    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 3918
    

    As you can see, there is nothing to choose between the two LEA options on this machine: the difference between their times is well inside the variability of the measurement. The variant using two ADDs, however, wins hands down.
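As a sanity check that all variants do the same work, the printed Value can be reproduced without running the loop. Each iteration adds $98C34 + $265E5A51 to EAX modulo 2^32, and the loop body runs $FFFFFFFF times. A small sketch in Python (used only as a calculator, with a hypothetical helper to_int32 for the signed reinterpretation):

```python
ITERATIONS = 0xFFFFFFFF          # initial ECX; the loop exits when it reaches 0
STEP = 0x00098C34 + 0x265E5A51   # ESI plus the displacement, added each iteration

def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit Integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

# Repeated addition modulo 2**32 is the same as one multiplication modulo 2**32.
expected = to_int32(ITERATIONS * STEP)
print(expected)  # -644343429
```

This matches the Value line reported by every variant.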

    Results differ on other machines. Here's the output on a Xeon E5530:

    64 bit
    
    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 3434
    
    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3295
    
    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 3279
    

    And on a Xeon E5-4640 v2:

    64 bit
    
    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 4102
    
    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 5868
    
    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 5868