cperformancecachingcpu-cachecache-locality

Cache misses when accessing an array in nested loop


So I have this question from my professor, and I can not figure out why vector2 is faster and has less cache misses than vector1.

Assume that the code below is a valid compilable C code.

Vector2:

void incrementVector2(INT4* v, int n) {
     for (int k = 0; k < 100; ++k) {
          for (int i = 0; i < n; ++i) {
               v[i] = v[i] + 1;
          }
     }
}

Vector1:

void incrementVector1(INT4* v, int n) {
     for (int i = 0; i < n; ++i) {
          for (int k = 0; k < 100; ++k) {
               v[i] = v[i] + 1;
          }
     }
}

NOTE: INT4 means the integer is 4 Bytes in size.

In terms of CPU specs: cache size = 12288KB, line size=64B and only considering this single cache interacting with main memory.

Question

Why does vector2 have a faster runtime than vector1? And why vector1 will have more cache misses than vector2?

Me and a few classmates worked on this for a while and couldn't figure it out. We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time? we are not sure if this is correct, and also not sure how to bring cache lines in to this either.


Solution

  • We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time?

    Well, both codes have the same accessing pattern, iterating over the array v with a stride of 1. Cache spacial-locality-wise both codes are the same. However, the second code:

    void incrementVector1(INT4* v, int n) {
         for (int i = 0; i < n; ++i) {
              for (int k = 0; k < 100; ++k) {
                   v[i] = v[i] + 1;
              }
         }
    }
    

    Has a better temporal-locality because you access the same element 100 times, whereas in:

    void incrementVector2(INT4* v, int n) {
         for (int k = 0; k < 100; ++k) {
              for (int i = 0; i < n; ++i) {
                   v[i] = v[i] + 1;
              }
         }
    }
    

    you only access it once on every 'n' iterations.

    So either you did a mistake, your teacher is playing some kind of strange game or I am missing something obvious.