I am following Tim Mattson's lectures on OpenMP to learn how to implement some parallel programming concepts.
I was trying to observe the running-time behavior of a parallel program that computes the value of pi using 3x10^8 steps.
Here is the code:
#include <omp.h>
#include <stdio.h>
static long num_steps = 300000000;
double step;
#define PAD 8 // tried 50 too
#define NUM_THREADS 4
int main()
{
int i, nthreads;
double pi, sum[NUM_THREADS][PAD];
double ts, te;
ts = omp_get_wtime();
step = 1.0/(double) num_steps;
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
{
int i, id,nthrds;
double x;
id = omp_get_thread_num();
nthrds = omp_get_num_threads();
if (id == 0) nthreads = nthrds;
for (i=id, sum[id][0]=0.0;i< num_steps; i=i+nthrds) {
x = (i+0.5)*step;
sum[id][0] += 4.0/(1.0+x*x);
}
}
for(i=0, pi=0.0;i<nthreads;i++)
pi += sum[i][0] * step;
te = omp_get_wtime();
printf("%.10f\n", pi);
printf("%f\n", te-ts);
}
I am on Ubuntu 14.04 LTS running on a dual-core machine. A call to omp_get_num_procs() returned 2. The running time looked essentially random, ranging from 1.31 seconds to 4.46 seconds, whereas the serial program took about 2.31 seconds almost every time.
I tried creating 1, 2, 3, 4, and up to 10 threads. The running time varies a lot in every case, though the average is smaller with more threads. I wasn't running any other applications.
Can anyone explain why the running time varies so much?
How can I measure the running time accurately? The lecturer showed the running time on his computer, which seems consistent, and he was also using a dual-core processor.
Result : 3.1415926536
Number of CPU-s : 2
Duration : 2.4025482161
There seems to be a pretty consistent set of resulting code-execution times:
/* Duration : 2.3984972970
Duration : 2.4004815188
Duration : 2.3814983589
Duration : 2.4070654172
Duration : 2.3964317020
Duration : 2.3858104548
Duration : 2.3765923560
Duration : 2.3734730321
-O3:
Duration : 0.4159400249
Duration : 0.3089567909
Duration : 0.3106977220
Duration : 0.3312316008
Duration : 0.2856188160
Duration : 0.2984415500
Duration : 0.3282426349
Duration : 0.2836121118
FYI, #pragma overheads:
Duration : 0.0001377461
Duration : 0.0001228561
Duration : 0.0001215260
REF:
    Amdahl's Law >>> https://stackoverflow.com/revisions/18374629/3
    criticism on (not-)including also the real-world infrastructure add-on
    { setup | termination }-overhead costs of a #pragma omp parallel section
    ( simplified test w/o the add-on costs of global OpenMP setup & configuration )
*/
This turns attention to the background-workload noise on your system under test.
It is best to re-test your code on a headless platform, so as to keep any GUI-related workload from interfering with the computing part of the test.
You may enjoy this sandboxed online TIO platform for re-running the experiments.