delphi delphi-xe2 delphi-2010

How to use modern CPU instructions in Delphi? (Java faster than Delphi?)


A friend sent me a comparison between a recent version of Delphi and Java (source code available if you want it). Believe it or not (better believe it), Java is now significantly faster than Delphi, because the Delphi compiler does not take advantage of modern CPU instructions! A big breakthrough for the 'slow' Java.

My question is: How can we use modern CPU instructions in Delphi WITHOUT resorting to ASM?

The FastCode project was a partial answer to the above question, but it is now abandoned. Is there any other project similar to FastCode?

This is another article showing that Java and C# are indeed MUCH faster than Delphi: http://webandlife.blogspot.com/2011/12/c-performance-vs-delphi-performance.html


JAVA

import java.util.Date;

public class j
{
  public static void xxx(int n, int m)
  {
    double t;
    int i, j;
    double d, r;
    t = 0.0;
    for (j = 1; j <= n; j++)
    {
      t = t / 1000.0;
      for (i = 1; i <= m; i++)
      {
        t = t + i / 999999.0;
        d = t * t + i;
        r = (t + d) / (200000.0 * (i + 1));
        t = t - r;
      }
    }
    System.out.println(t);
  }

  public static void main(String [] args)
  {
    Date t1, t2;

    t1 = new Date();
    xxx(1, 999999999);
    t2 = new Date();
    System.out.println((t2.getTime() - t1.getTime())/1000);
    t1 = new Date();
    xxx(1, 999999999);
    t2 = new Date();
    System.out.println((t2.getTime() - t1.getTime())/1000);
  }
}

25 sec

DELPHI

program d;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.DateUtils;
var
  t1, t2: TDateTime;

procedure xxx (n: integer; m: integer);
var
  t: double;
  i, j: integer;
  d, r: double;
begin
  t:= 0.0;
  for j:= 1 to n do
  begin
    t:= t / 1000.0;
    for i:= 1 to m do
    begin
      t:= t + i / 999999.0;
      d:= t * t + i;
      r:= (t + d) / (200000.0 * (i + 1));
      t:= t - r;
    end;
  end;
  writeln(t);
end;

begin
  t1:= Now;
  xxx(1, 999999999);
  t2:= Now;
  writeln(SecondsBetween(t2,t1));

  t1:= Now;
  xxx(1, 999999999);
  t2:= Now;
  writeln(SecondsBetween(t2,t1));
end.

37 sec


I wonder how Lazarus compares with Delphi from this point of view.


Solution

  • Judging from your code, what is slow with the 32-bit Delphi compiler is its floating-point arithmetic support, which is far from optimized and copies a lot of content to and from the FPU stack.

    When it comes to floating-point arithmetic, it is not only Java JITted code that will be faster. Even modern JavaScript JIT compilers can do much better than Delphi!

    This blog article is just one reference on the subject, and provides an asm-level explanation of why Delphi is slow at floating point.

    But if you use the Delphi compiler targeting the Win64 platform (available since XE2), it will emit SSE2 opcodes instead of x87 ones, and the code will be much faster. I suspect it will be comparable to the Java JITted executable.

    And, compared to Java, a Delphi executable will use much less memory than the JVM, so on that front Delphi executables are perfectly on track!

    If you want your code to be faster, do not reach for asm or low-level optimization tricks; change your algorithm instead. That can gain an order of magnitude more than any compilation hint. Dedicated processing can be done with inlined asm opcodes (take a look at this great set of articles for such low-level hacks), but that is not easy to master. Usually, proper software profiling and adding some cache is the best way to performance!
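    To illustrate the "add some cache" advice, here is a minimal sketch in Java (the other language used in this thread; the same idea carries over to Delphi with a dictionary). The class name CacheSketch and the methods expensive and cached are hypothetical names made up for this example, and the loop body is simply borrowed from the benchmark in the question as a stand-in for any costly computation. It assumes the function is pure, so that a result can safely be reused for the same argument.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheSketch {
    // Hypothetical stand-in for an expensive, pure floating-point computation
    // (the inner loop of the benchmark from the question).
    static double expensive(int m) {
        double t = 0.0;
        for (int i = 1; i <= m; i++) {
            t = t + i / 999999.0;
            double d = t * t + i;
            double r = (t + d) / (200000.0 * (i + 1));
            t = t - r;
        }
        return t;
    }

    // Results already computed, keyed by the argument.
    private static final Map<Integer, Double> cache = new HashMap<>();

    // Computes the value on the first call for a given m and
    // returns the stored result on every later call.
    static double cached(int m) {
        return cache.computeIfAbsent(m, CacheSketch::expensive);
    }

    public static void main(String[] args) {
        int m = 5_000_000;

        long t0 = System.nanoTime();
        double first = cached(m);   // runs the loop
        long t1 = System.nanoTime();
        double second = cached(m);  // served from the map
        long t2 = System.nanoTime();

        System.out.println("same result: " + (first == second));
        System.out.println("first call:  " + (t1 - t0) / 1_000 + " us");
        System.out.println("second call: " + (t2 - t1) / 1_000 + " us");
    }
}
```

    Whether a cache pays off depends on how often the same inputs recur, which is exactly what profiling should tell you before you write any of this.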