cpu-architecturebranch-prediction

2 bit branch predictor with two for loops


I got a 2 bit branch predictor, my starting state is weakly taken and I need to calculate the prediction accuracy:

for (int i=0; i < 100; i++)
{
   for (int j=0; j < 50; j++)
   {
   ...
   }
}

So with i = 0 we take the branch, so we are at i = 0 and j = 0 and set our predictor to strongly taken, right ? So if we iterate j now, does that mean we are not taking a new branch ? As we are still in the i = 0 branch, or does every iteration count as a new branch ?


Solution

  • Let's manually compile it into x86 assembly first for better understanding (any other would do to):

        mov ebx, 0        // this is our var i
    .L0:
    # /------------ inner loop start -----------\ 
        mov eax, 0        // this is our var j
    .L1: 
        // ...
        add     eax, 1
        cmp     eax, 50
        jl      .L1       // jump one
    # \------------ inner loop end -------------/
        add     ebx, 1
        cmp     ebx, 100
        jl      .L0       // jump two
    

    I think this code is pretty straight forward even if your not familiar with assembly:

    So on the first iteration we arrive at jump one and predict it will be taken. Since eax < 50 we take it and update it to strongly taken. Now we do this another 48 times. On the 50 iteration we don't jump because eax == 50. This is a single misprediction and we update to weakly taken.

    Now we arrive at jump two for the first time. since ebx < 100 we take it and update it to strongly taken. Now we start all over with that inner loop by jumping to L0. We do this another 98 times. On the 100 iteration of the inner loop we don't jump because ebx == 100. This is a single misprediction and we update to weakly taken.

    So we execute the innerloop 100 times with a single misprediction each for a total of 100 mispredictions for jump one and 100 * 49 = 4900 correct predictions. The outer loop is executed only once and has only 1 misprediction and 99 correct predictions.