
cortexa7 CPUs take much longer to execute a loop than cortexa15 CPUs


I am testing CPU performance on two ARMv7 boards with SMP support: a dual-core cortexa15 at 1.5 GHz and a dual-core cortexa7 at 1 GHz.

On each board, I execute the simple loop below and measure its execution time:

#define DEFAULT_CALC_LOOPS 1000
#define LOOPS_MULTIPLIER 4.2
...
loops = DEFAULT_CALC_LOOPS;
...
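void *calc(int loops)
{
    int i, j;
    // loops * LOOPS_MULTIPLIER is a double: 1000 * 4.2 = 4200 outer iterations
    for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
        for (j = 0; j < 125; j++) {
            // Sum of the numbers from 0 to j (the j-th triangular number);
            // volatile keeps the compiler from optimizing the loop away
            volatile int temp = j * (j + 1) / 2;
            (void)temp;
        }
    }
    return NULL;
}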
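One way to reproduce the measurement is to wrap `calc` with a monotonic clock. This is a minimal sketch, assuming a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available; `time_calc_ms` is a hypothetical helper name, not part of the original test program.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() in strict C modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define DEFAULT_CALC_LOOPS 1000
#define LOOPS_MULTIPLIER 4.2

/* The loop under test, unchanged from the question. */
void *calc(int loops)
{
    int i, j;
    for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
        for (j = 0; j < 125; j++) {
            volatile int temp = j * (j + 1) / 2;
            (void)temp;
        }
    }
    return NULL;
}

/* Hypothetical helper: elapsed time of one calc() call, in milliseconds. */
double time_calc_ms(int loops)
{
    struct timespec start, end;
    assert(loops >= 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    calc(loops);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Combine the seconds and nanoseconds fields into milliseconds. */
    return (double)(end.tv_sec - start.tv_sec) * 1e3 +
           (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

A caller would print `time_calc_ms(DEFAULT_CALC_LOOPS)`. Repeating the call several times and pinning the process to one core (for example with `taskset` on Linux) reduces noise from scheduling and frequency scaling.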

After a variety of tests, the two boards showed the following results:

There is a big difference between the results of the two boards.

Are there any dependencies or limitations affecting these results? Could anyone with experience in this area share some ideas? Thanks.


Solution

  • In my experience, the cortexa15 delivers 2x to 3x (or more) the performance of the cortexa7. In addition, my cortexa15 runs at 1.5 GHz while my cortexa7 runs at 1 GHz, so I also consider the results above reasonable.

    Below, I work through the cortexa15 case as an example of estimating the execution time:

    1. Formula to calculate CPU time:

      CPU execution time = Instruction count (I) x CPI x Clock cycle (C)

    I: number of instructions executed

    CPI: cycles per instruction (IPC = 1/CPI)

    C: clock cycle time (1 / CPU clock frequency), in seconds
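The formula can be written directly as a small C function. This is a sketch with a hypothetical name, `cpu_time_seconds`, and it assumes a fixed average CPI and a constant clock frequency, which real cores with caches, pipelines, and frequency scaling only approximate.

```c
#include <assert.h>

/* Hypothetical helper: CPU execution time = I x CPI x C, with C = 1 / clock.
 * instructions: I, number of instructions executed
 * cpi:          average cycles per instruction (1 / IPC), assumed constant
 * clock_hz:     CPU clock frequency in Hz, assumed constant */
double cpu_time_seconds(double instructions, double cpi, double clock_hz)
{
    assert(clock_hz > 0.0);
    double clock_cycle_s = 1.0 / clock_hz; /* C, in seconds */
    return instructions * cpi * clock_cycle_s;
}
```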

    2. Reference: https://en.wikipedia.org/wiki/Instructions_per_second

    Take the dual-core cortexa15 (the same CPU as on the iWave G1M/N boards) as an example.

    The cortexa15 executes 9,900 MIPS at 1.5 GHz, so the average IPC = 9,900 / 1,500 = 6.6

    CPI = 1/IPC = 1/6.6 = 0.1515 cycle/instruction
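The IPC and CPI values follow from dividing the MIPS rating by the clock in MHz. The sketch below uses hypothetical helper names and assumes, as the answer does, that the published MIPS rating is a fair average for this loop.

```c
#include <assert.h>

/* Hypothetical helper: average instructions per cycle from a MIPS rating,
 * IPC = (millions of instructions/s) / (millions of cycles/s). */
double ipc_from_mips(double mips, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return mips / clock_mhz;
}

/* Hypothetical helper: cycles per instruction is the reciprocal of IPC. */
double cpi_from_ipc(double ipc)
{
    assert(ipc > 0.0);
    return 1.0 / ipc;
}
```

With the figures above, `ipc_from_mips(9900.0, 1500.0)` gives 6.6 and `cpi_from_ipc(6.6)` gives about 0.1515.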

    3. The G1M/N boards run at a maximum of 1.5 GHz (clock range about 1.3 GHz to 1.5 GHz). I assume the boards run at their best effort (1.5 GHz):

    C = 1/(1.5 x 10^9) s = 0.6667 ns

    4. Translate the C code into ARM assembly and count the instructions:

      for (i = 0; i < loops * LOOPS_MULTIPLIER; i++) {
          for (j = 0; j < 125; j++) {
              // Sum of the numbers up to j
              volatile int temp = j * (j + 1) / 2;
              (void)temp;
          }
      }

    Reference (Compiler Explorer): https://godbolt.org

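    Counting the generated instructions (9 + 9 per inner-loop iteration, 17 more per outer-loop iteration, and 1000 * 4.2 = 4200 outer iterations):

    I = (((9+9) * 125) + 17) * 1000 * 4.2 = 9521400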
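The instruction count can be reproduced with a small function. This is a sketch with a hypothetical name; the per-iteration counts (18 and 17) come from the answer's reading of the assembly and will change with the compiler and optimization level. The outer loop runs 4200 times because `i < loops * LOOPS_MULTIPLIER` compares against 1000 * 4.2 = 4200.0.

```c
#include <assert.h>

/* Hypothetical helper: I = ((per_inner * inner_iters) + per_outer) * outer_iters
 * per_inner: instructions in one inner-loop iteration (9 + 9 = 18 in the answer)
 * per_outer: extra instructions per outer-loop iteration (17 in the answer) */
long long instruction_count(long long outer_iters, long long inner_iters,
                            long long per_inner, long long per_outer)
{
    assert(outer_iters >= 0 && inner_iters >= 0);
    return (per_inner * inner_iters + per_outer) * outer_iters;
}
```

`instruction_count(4200, 125, 18, 17)` returns 9521400, matching the value above.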

    Finally, the CPU execution time is about 0.000962 s, i.e. approximately 0.962 ms to execute the loop with the CPU at its best effort.

    In the worst case (at 1.3 GHz), the loop takes about 1.11 ms.

    My measurements gave the same values.
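The best- and worst-case figures follow from the same formula with I = 9521400 and IPC = 6.6. This is a sketch with a hypothetical helper name; it assumes the clock stays fixed at 1.5 GHz or 1.3 GHz for the whole run.

```c
#include <assert.h>

/* Hypothetical helper: estimated loop time in milliseconds,
 * I x (1 / IPC) x (1 / f), converted from seconds to ms. */
double estimate_ms(double instructions, double ipc, double clock_hz)
{
    assert(ipc > 0.0 && clock_hz > 0.0);
    return instructions / ipc / clock_hz * 1e3;
}
```

`estimate_ms(9521400.0, 6.6, 1.5e9)` gives about 0.962 ms, and `estimate_ms(9521400.0, 6.6, 1.3e9)` gives about 1.11 ms.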

    --

    I did the same calculation for the cortexa7 at 1 GHz:

    CPU execution time = I x CPI x C = 9521400 * 1/1.9 * 1 ns = 5.011 ms (average IPC = 1.9, C = 1 ns at 1 GHz)
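Because both boards run the same instruction count, I cancels out when the two estimates are compared, leaving (IPC_a15 x f_a15) / (IPC_a7 x f_a7) = (6.6 x 1.5) / (1.9 x 1.0) ≈ 5.2. The sketch below checks this; the names are hypothetical, and it assumes the answer's IPC figures (6.6 and 1.9) and that both boards run the same binary.

```c
#include <assert.h>

/* Same model as the CPU time formula: time in ms = I / IPC / f * 1000. */
static double estimate_loop_ms(double instructions, double ipc, double clock_hz)
{
    assert(ipc > 0.0 && clock_hz > 0.0);
    return instructions / ipc / clock_hz * 1e3;
}

/* Hypothetical helper: cortexa7 time divided by cortexa15 time.
 * IPC values are the answer's assumptions, not measured here. */
double a7_over_a15_ratio(double instructions)
{
    double a15_ms = estimate_loop_ms(instructions, 6.6, 1.5e9); /* cortexa15 @ 1.5 GHz */
    double a7_ms = estimate_loop_ms(instructions, 1.9, 1.0e9);  /* cortexa7 @ 1 GHz */
    return a7_ms / a15_ms;
}
```

Under these assumptions the cortexa7 needs about 5.2 times as long as the cortexa15 for the same loop, which is consistent with the large gap reported in the question.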