As part of my research, I have computed the parallel solution of several banded systems using ScaLAPACK. I am interested in reporting the achieved speedup as a function of both the rank of the matrix, r, and its bandwidth, b.
What would be the best way to achieve this?
Here are the value sets I selected for both parameters:
r in {10,000 25,000 50,000 75,000 100,000 500,000 1,000,000 5,000,000 10,000,000}
b in {2 4 8 16 32 64 128 256 512 1024}
The cluster I am using has 64 cores in total, so p is in {1, ..., 64}.
I have computed both the speedup and the efficiency, s and e, as a function of p, r and b.
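For reference, here is a minimal sketch of how s and e could be derived from measured wall times, assuming the usual definitions s(p) = T(1)/T(p) and e(p) = s(p)/p. The function name and the `times` dictionary are only illustrative, not part of my actual scripts:

```python
def speedup_and_efficiency(times):
    """Compute speedup and efficiency for one (r, b) instance.

    times: dict mapping core count p -> measured wall time T(p).
           Assumption: the serial time T(1) is present under key 1.
    """
    t1 = times[1]
    # Speedup relative to the single-core run: s(p) = T(1) / T(p)
    speedup = {p: t1 / t for p, t in times.items()}
    # Efficiency: speedup per core, e(p) = s(p) / p
    efficiency = {p: s / p for p, s in speedup.items()}
    return speedup, efficiency


# Hypothetical timings (seconds) for a single (r, b) pair
s, e = speedup_and_efficiency({1: 8.0, 2: 4.0, 4: 2.5})
```

Repeating this for every (r,b) pair gives one speedup curve per pair, which is what has to be condensed into a single number.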
My goal is to show how the speedup behaves as a function of r and b. I was thinking of creating some kind of surface projection of the (r,b)-space. But how can I summarize the behavior of the speedup in a single value?
A suggestion I had was to compute the Pearson correlation coefficient between the attained and the ideal (linear) speedup. However, this does NOT seem to work, since it does not take into account the "speedup sweet spots" that arise for smaller values of r.
Any hint?
Thanks in advance!
After having had some time to think about this, I have decided to report the best achieved speedup multiplied by the Pearson linear correlation coefficient.
Such a plot looks as follows:
The best achieved speedup per instance of (r,b) is weighted by how "close to linear" it is, information contained in the Pearson linear correlation coefficient. Since the latter is a value in [-1,1], speedups far from linear will yield a value close to 0, while negative values will show slowdown, when this is expected. In the attached plot, we can see that the parallel solver does indeed show proper scalability for small values of the bandwidth, and that it gets worse as the bandwidth increases.
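In case it helps, the metric can be sketched roughly as follows. This is only an illustration under my own assumptions: the ideal speedup is taken as s(p) = p, the names `pearson` and `scalability_score` are made up for this post, and a constant speedup curve is treated as having zero correlation:

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a constant series carries no linear trend, so return 0
        return 0.0
    return cov / math.sqrt(vx * vy)


def scalability_score(cores, speedups):
    """Best achieved speedup weighted by its closeness to linear speedup.

    cores:    core counts p used for one (r, b) instance
    speedups: attained speedup s(p) for each entry of cores
    """
    ideal = [float(p) for p in cores]  # assumed ideal (linear) speedup s(p) = p
    return max(speedups) * pearson(speedups, ideal)


# Hypothetical curves: one perfectly linear, one that slows down with more cores
linear_score = scalability_score([1, 2, 4, 8], [1.0, 2.0, 4.0, 8.0])
slowdown_score = scalability_score([1, 2, 3, 4], [1.0, 0.8, 0.6, 0.4])
```

Each (r,b) pair then gets one score, and those scores are what the plot shows.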
If you guys have any hints or corrections, please let me know ;)