I recently started using the Intel C++ compiler for some of my projects, while also learning MASM assembly. I kept hearing that it wasn't worth learning assembly since compilers do a good job of optimizing code anyway, so I decided to see for myself which was faster. To do so, I wrote the following C++ code:
#include <iostream>
#include <time.h>
using namespace std;

extern "C" {
    int Add(int a, int b);
}

int main(int argc, char * argv[]){
    int startingTime = clock();
    for (int i = 0; i < 100; i++)
    {
        cout << "normal: " << i << endl;
        cout << 1000 + 1000 << endl;
    }
    int timeTaken1 = clock() - startingTime;

    startingTime = clock();
    for (int i = 0; i < 100; i++){
        cout << "assem" << i << endl;
        cout << Add(2000, 2000) << endl;
    }
    int timeTaken2 = clock() - startingTime;

    cout << "Time taken under normal addition: " << timeTaken1 << endl;
    cout << "Time taken under assembly addition: " << timeTaken2 << endl;
    cin.get();
    return 0;
}
And the following MASM code:
.model flat
.386
.code
public _Add
_Add PROC
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    mov ebx, [ebp + 12]
    add eax, ebx
    leave ; cleanup
    ret
_Add endp
end
I am compiling this in Visual Studio with the Intel Composer plugin. When I run it in Debug mode, it works perfectly: I can see "normal 99" and "assem 99" along with the relevant number. When I compile with /Od, it also works fine. However, when /O2, /Ox or /O3 is specified, it only shows the normal addition loop and the first value of the assembly addition loop, i.e. only assem 0 and 4000 are shown.
My guess is that the assembly code is being optimized out by the Intel compiler (this works fine with the VC++ compiler). I am curious to find out why this is occurring and how it can be worked around while still letting Intel optimize the C++ part.
Thanks SbSpider
EDIT: I know this is late, but thanks for all of the replies. It seems that the problem was an error in the assembly code rather than the Intel compiler not using the assembly code.
Your assembly code is trashing the EBX register (as Jongware noted), and this is likely why the second loop in your C++ code is only executed once. If i is being stored in EBX, then changing EBX to 2000 in Add will cause the next test of the loop condition i < 100 to fail.
You need to either save and restore the EBX register in your assembly code, or pick another register that isn't assumed to be preserved across function calls (EAX, EDX, or ECX).
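Both fixes can be sketched in MASM as follows. This is a minimal, untested sketch that assumes the same 32-bit cdecl setup as the question. The first procedure swaps EBX for ECX, which the caller does not expect to survive the call. The second, under the hypothetical name _AddPreserveEbx (not part of the original code), keeps using EBX but saves and restores it.

```asm
.386
.model flat
.code

public _Add
public _AddPreserveEbx           ; hypothetical second variant, for illustration only

; Variant 1: use a caller-saved register (ECX) instead of EBX.
_Add PROC
    push ebp
    mov  ebp, esp
    mov  eax, [ebp + 8]          ; first argument (a)
    mov  ecx, [ebp + 12]         ; second argument (b); ECX may be freely clobbered
    add  eax, ecx                ; result is returned in EAX
    pop  ebp
    ret                          ; cdecl: the caller removes the arguments
_Add ENDP

; Variant 2: keep using EBX, but save it on entry and restore it on exit.
_AddPreserveEbx PROC
    push ebp
    mov  ebp, esp
    push ebx                     ; save the caller's EBX
    mov  eax, [ebp + 8]          ; first argument (a)
    mov  ebx, [ebp + 12]         ; second argument (b)
    add  eax, ebx
    pop  ebx                     ; restore the caller's EBX
    pop  ebp
    ret
_AddPreserveEbx ENDP

end
```

For a function this small, the first variant is probably the simpler choice since it avoids the extra push and pop. The C++ side stays unchanged: Add is still declared as extern "C" int Add(int a, int b).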