A friend sent me a comparison between a recent version of Delphi and Java (source code available if you want it). Believe it or not (better believe it), Java is now significantly faster than Delphi because the Delphi compiler does not take advantage of modern CPU instructions! A big breakthrough for the 'slow' Java.
My question is: How can we use modern CPU instructions in Delphi WITHOUT resorting to ASM?
The FastCode project was a partial answer to the above question, but it is now abandoned. Is there any other project similar to FastCode?
Here is another article showing that Java and C# are indeed MUCH faster than Delphi: http://webandlife.blogspot.com/2011/12/c-performance-vs-delphi-performance.html
JAVA
import java.util.Date;

public class j
{
    // Floating-point kernel: n outer passes, m inner iterations each
    public static void xxx(int n, int m)
    {
        double t;
        int i, j;
        double d, r;
        t = 0.0;
        for (j = 1; j <= n; j++)
        {
            t = t / 1000.0;
            for (i = 1; i <= m; i++)
            {
                t = t + i / 999999.0;
                d = t * t + i;
                r = (t + d) / (200000.0 * (i + 1));
                t = t - r;
            }
        }
        // Print the result so the computation is actually used
        System.out.println(t);
    }

    public static void main(String [] args)
    {
        Date t1, t2;
        // Time two consecutive runs, printing elapsed whole seconds
        t1 = new Date();
        xxx(1, 999999999);
        t2 = new Date();
        System.out.println((t2.getTime() - t1.getTime())/1000);
        t1 = new Date();
        xxx(1, 999999999);
        t2 = new Date();
        System.out.println((t2.getTime() - t1.getTime())/1000);
    }
}
25 sec
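Both harnesses only report whole seconds: the Java one divides `Date` milliseconds by 1000, and the Delphi one uses `SecondsBetween`. As a sketch (the class name `Bench`, the `kernel` method and the optional command-line iteration count are my own additions, not part of the original benchmark), the same Java kernel can be timed with `System.nanoTime()`, which the JDK provides for measuring elapsed time:

```java
public class Bench {
    // Same arithmetic as xxx() above, but returning t instead of printing it,
    // so the caller decides how to use (and print) the result.
    static double kernel(int n, int m) {
        double t = 0.0;
        for (int j = 1; j <= n; j++) {
            t = t / 1000.0;
            for (int i = 1; i <= m; i++) {
                t = t + i / 999999.0;
                double d = t * t + i;
                double r = (t + d) / (200000.0 * (i + 1));
                t = t - r;
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Hypothetical option: pass a smaller iteration count for a quick run
        int m = args.length > 0 ? Integer.parseInt(args[0]) : 999999999;
        for (int run = 1; run <= 2; run++) {
            long start = System.nanoTime();
            double t = kernel(1, m);
            long elapsedNs = System.nanoTime() - start;
            // Printing t keeps the result live, so the loop is not dead code
            System.out.println(t);
            System.out.printf("run %d: %.3f s%n", run, elapsedNs / 1e9);
        }
    }
}
```

Running `java Bench 100000000` would give a quicker, sub-second-resolution measurement; the Delphi side could be adjusted similarly with a high-resolution timer.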
DELPHI
program d;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.DateUtils;
var
  t1, t2: TDateTime;

{ Floating-point kernel: n outer passes, m inner iterations each }
procedure xxx(n: integer; m: integer);
var
  t: double;
  i, j: integer;
  d, r: double;
begin
  t := 0.0;
  for j := 1 to n do
  begin
    t := t / 1000.0;
    for i := 1 to m do
    begin
      t := t + i / 999999.0;
      d := t * t + i;
      r := (t + d) / (200000.0 * (i + 1));
      t := t - r;
    end;
  end;
  writeln(t);
end;

begin
  { Time two consecutive runs, printing elapsed whole seconds }
  t1 := Now;
  xxx(1, 999999999);
  t2 := Now;
  writeln(SecondsBetween(t2, t1));
  t1 := Now;
  xxx(1, 999999999);
  t2 := Now;
  writeln(SecondsBetween(t2, t1));
end.
37 sec
I wonder how Lazarus compares with Delphi from this point of view.
Judging from your code, what is slow in the 32-bit Delphi compiler is its floating-point arithmetic support, which is far from optimized and copies a lot of values to and from the x87 FPU stack.
When it comes to floating-point arithmetic, Java JIT-compiled code is not the only thing that will be faster: even modern JavaScript JIT compilers can do much better than Delphi!
This blog article is a good reference on the subject and provides an asm-level explanation of Delphi's floating-point slowness:
But if you use the Delphi compiler targeting the Win64 platform, it will emit SSE2 opcodes instead of x87 ones, and the code will be much faster, I suspect comparable to the Java JIT-compiled executable.
And compared with Java, any Delphi executable will use much less memory than the JVM, so in that respect Delphi executables are right on track!
If you want your code to be faster, do not reach for asm or low-level optimization tricks; change your algorithm instead. That can be orders of magnitude faster than any compiler hint. Specialized processing can be achieved with inlined asm opcodes (take a look at this great set of articles for such low-level hacks), but it is not easy to master, and usually proper profiling and adding some caching is the best path to performance!
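To illustrate the point about algorithms and caching, here is a minimal Java sketch; the class and method names are hypothetical, chosen only for illustration. The closed-form sum replaces an O(n) loop with O(1) arithmetic, and the cached Fibonacci turns exponential recursion into a linear number of calls. Gains of that kind are usually far beyond what compiler tweaks or hand-written asm can deliver:

```java
import java.util.HashMap;
import java.util.Map;

public class AlgoVsTweaks {
    // O(n): visits every term, however well the compiler optimizes it.
    static long sumLoop(long n) {
        long s = 0;
        for (long k = 1; k <= n; k++) s += k;
        return s;
    }

    // O(1): closed form n(n+1)/2, an algorithmic change rather than a trick.
    static long sumFormula(long n) {
        return n * (n + 1) / 2;
    }

    // Naive recursion recomputes the same subproblems exponentially often.
    static long fibNaive(int n) {
        return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Hypothetical cache: each Fibonacci value is computed at most once.
    private static final Map<Integer, Long> FIB_CACHE = new HashMap<>();

    static long fibCached(int n) {
        if (n < 2) return n;
        Long hit = FIB_CACHE.get(n);
        if (hit != null) return hit;
        long v = fibCached(n - 1) + fibCached(n - 2);
        FIB_CACHE.put(n, v);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(sumLoop(1_000_000) == sumFormula(1_000_000)); // true
        System.out.println(fibCached(50)); // 12586269025
    }
}
```

The same reasoning applies in Delphi: profile first, then look for a better algorithm or a cache before touching the generated opcodes.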