What is 'stride' in C and how can it be used?
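Generally, a stride is the distance between successive steps through something. For memory accesses, it is the distance between the addresses touched by consecutive accesses.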
In the addition routine, we have these loops:
    for (long i = 0; i < COLS; i++)
        for (long j = 0; j < ROWS; j++) {
            sum += table[j][i];
        }
Consider successive iterations of the innermost loop. If j equals x in one iteration, that iteration accesses table[x][i], and the next accesses table[x+1][i]. The distance between these two accesses is the size of one table[j], which is COLS (2000) elements of short (likely two bytes each), so likely 4000 bytes. So the stride is 4000 bytes.
This is generally bad for the cache memory on typical processors, as cache memory is designed mostly for memory accesses that are close to each other (small strides). This is the cause of the program’s slow performance.
Since the result of the operation in the loop, sum += table[j][i];, does not depend on the order in which the i and j values are visited, we can easily remedy this problem by swapping the two for statements:
    for (long j = 0; j < ROWS; j++)
        for (long i = 0; i < COLS; i++)
            sum += table[j][i];
Then successive iterations of the innermost loop will access table[j][x] and table[j][x+1], which have a stride of one short, likely two bytes.
On my system, the program runs about twenty times faster with this change.