So I have this question from my professor, and I can not figure out why vector2
is faster and has less cache misses than vector1
.
Assume that the code below is a valid compilable C code.
Vector2:
void incrementVector2(INT4* v, int n) {
for (int k = 0; k < 100; ++k) {
for (int i = 0; i < n; ++i) {
v[i] = v[i] + 1;
}
}
}
Vector1:
void incrementVector1(INT4* v, int n) {
for (int i = 0; i < n; ++i) {
for (int k = 0; k < 100; ++k) {
v[i] = v[i] + 1;
}
}
}
NOTE: INT4 means the integer is 4 Bytes in size.
In terms of CPU specs: cache size = 12288KB, line size=64B and only considering this single cache interacting with main memory.
Question
Why does vector2
have a faster runtime than vector1
? And why vector1
will have more cache misses than vector2
?
Me and a few classmates worked on this for a while and couldn't figure it out. We thought it could be due to the fact that vector2
has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1
, only one element is accessed at a time?
we are not sure if this is correct, and also not sure how to bring cache lines in to this either.
We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time?
Well, both codes have the same accessing pattern, iterating over the array v
with a stride of 1. Cache spacial-locality-wise both codes are the same. However, the second code:
void incrementVector1(INT4* v, int n) {
for (int i = 0; i < n; ++i) {
for (int k = 0; k < 100; ++k) {
v[i] = v[i] + 1;
}
}
}
Has a better temporal-locality because you access the same element 100 times, whereas in:
void incrementVector2(INT4* v, int n) {
for (int k = 0; k < 100; ++k) {
for (int i = 0; i < n; ++i) {
v[i] = v[i] + 1;
}
}
}
you only access it once on every 'n' iterations.
So either you did a mistake, your teacher is playing some kind of strange game or I am missing something obvious.