I am running my code AMD optron 6270 machine. OS is Centos 6.2 release. I have made a simple program as
#include<stdio.h>
#include<stdlib.h>
int calling (long a);
int calling1 (long a);
int calling2 (long a);
int calling3 (long a);
int calling4 (long a);
int calling5 (long a);
void main()
{
long a,b=0;
printf("hi");
for (a=0; a<10000000; a++) b++;
b =Calling(a);
b =Calling5(a);
b =Calling4(a);
}
int Calling(long a)
{
long b=0;
for (a=0; a<100; a++) b = Calling1(a);
return 0;
}
int Calling1(long a)
{
long b=0;
for (a=0; a<10000000; a++) b++;
b =Calling2(a);
return 0;
}
int Calling2(long a)
{
long b=0;
for (a=0;a<10000000;a++) b++;
b =Calling3(a);
return 0;
}
int Calling3(long a)
{
long b=0;
for (a=0;a<10000000;a++) b++;
b =Calling4(a);
return 0;
}
int Calling4(long a)
{
long b=0;
for (a=0; a<10000000; a++) b++;
return 0;
}
int Calling5(long a)
{
long b=0;
for (a=0; a<10000000; a++) b++;
b=0;
for (a=0; a<10000000; a++) b++;
b=0;
for (a=0 ;a<10000000; a++) b++;
b=0;
for (a=0; a<10000000; a++) b++;
b=0;
return 0;
}
While profinling this code with gprof and Oprofile. I got different reports. Lets say i run the main.exe with gprof two times: 1st report with gprof Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
24.80 2.96 2.96 101 0.03 0.03 Calling4
24.71 5.91 2.95 100 0.03 0.12 Calling1
24.63 8.84 2.94 100 0.03 0.06 Calling3
23.78 11.68 2.84 100 0.03 0.09 Calling2
1.01 11.80 0.12 1 0.12 0.12 Calling5
0.34 11.84 0.04 main
0.00 11.84 0.00 1 0.00 11.65 Calling
2nd report with gprof
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
25.13 2.99 2.99 100 0.03 0.12 Calling1
24.88 5.95 2.96 101 0.03 0.03 Calling4
24.80 8.89 2.95 100 0.03 0.06 Calling3
23.48 11.69 2.79 100 0.03 0.09 Calling2
1.02 11.81 0.12 1 0.12 0.12 Calling5
0.17 11.83 0.02 main
0.00 11.83 0.00 1 0.00 11.66 Calling
Both the reports differ. Everytime I run my main.exe i get different profiling reports. When I tried Oprofile I also get different result as:
Oprofile report1 Using /var/lib/oprofile/samples/ for samples directory. warning: /no-vmlinux could not be found. CPU: AMD64 family15h, speed 2.2e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
92552 24.7672 main Calling4
91610 24.5151 main Calling3
91566 24.5033 main Calling1
91469 24.4774 main Calling2
3665 0.9808 main Calling5
1892 0.5063 no-vmlinux /no-vmlinux
916 0.2451 main main
10 0.0027 libc-2.12.so profil_counter
1 2.7e-04 ld-2.12.so _dl_cache_libcmp
1 2.7e-04 ld-2.12.so _dl_relocate_object
1 2.7e-04 ld-2.12.so _dl_sysdep_start
1 2.7e-04 ld-2.12.so strcmp
1 2.7e-04 libc-2.12.so __libc_fini
1 2.7e-04 libc-2.12.so _dl_addr
1 2.7e-04 libc-2.12.so _int_malloc
1 2.7e-04 libc-2.12.so exit
Oprofile report2
Using /var/lib/oprofile/samples/ for samples directory. warning: /no-vmlinux could not be found. CPU: AMD64 family15h, speed 2.2e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
92254 24.7719 main Calling4
91482 24.5646 main Calling1
91480 24.5641 main Calling3
91340 24.5265 main Calling2
3658 0.9822 main Calling5
1270 0.3410 no-vmlinux /no-vmlinux
916 0.2460 main main
6 0.0016 libc-2.12.so profil_counter
1 2.7e-04 ld-2.12.so _dl_lookup_symbol_x
1 2.7e-04 ld-2.12.so _dl_setup_hash
1 2.7e-04 ld-2.12.so _dl_sysdep_start
1 2.7e-04 ld-2.12.so bcmp
1 2.7e-04 libc-2.12.so __mcount_internal
1 2.7e-04 libc-2.12.so _dl_addr
1 2.7e-04 libc-2.12.so _int_free
1 2.7e-04 libc-2.12.so mcount
Can any one tell em why this happens? What are the possible reasons which cause this? How I can avoid this situation so that I can get a constant profiling results?
I wouldn't be concerned about subsequent reports differing. Reports can vary drastically depending on how the program is executed. Moreover, it's difficult to say much about what occurs between the two profiles. Depending on what other processes are running, both your system's cache and TLB will most certainly be in a different state than they were during the first profile. Unless you can ensure a controlled machine state, don't expect consistent results.
It's simple to understand why reports from each tool don't agree. The two tools are fundamentally different. Oprofile is a sampling-based profiler that, in essence, periodically interrupts the CPU. Gprof is instrumentation based; it must be compiled into your program. This produces a different binary than would otherwise run had gprof not been used. As a result, gprof will over estimate timing. Use oprofile for CPU-bound processes, and gprof for I/O-bound processes.