
ARM Cortex-A8 L2 cache miss overhead


I am reading the ARM Cortex-A8 data sheet, which states that a load that misses in L2 takes at least 28 core cycles to complete. What happens during those 28 cycles: does the CPU stall and insert bubbles into the pipeline, or does it keep executing other instructions until the load completes? What if a branch depends on the result of this load? And what if the very next instruction is another load that also misses in L2?


Solution

  • Even on a cache miss, the pipeline keeps going until a RAW (read-after-write) dependency forces it to wait.

    ldr     r12, [r0], #4
    subs    r12, r12, r1
    beq     end_loop
    

    The subs instruction cannot be executed at the same time as ldr due to the RAW dependency.
    The beq instruction cannot be executed at the same time as subs due to the CPSR RAW dependency.

    All in all, the sequence above takes 6 cycles in the best case (3 cycles of instruction execution plus 3 cycles of L1 hit latency) and at least 3 + 28 = 31 cycles in the worst case, when the load misses in both L1 and L2.
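
    The arithmetic above can be written out as a small back-of-the-envelope model. This is only a sketch of the estimate in this answer, not a cycle-accurate Cortex-A8 simulation: the function name `chain_cycles` and both constants are illustrative assumptions, with the 3-cycle L1 hit latency taken from this answer and the 28-cycle minimum L2 miss latency taken from the question.

```python
# Simplified estimate for a fully serialized dependency chain
# (ldr -> subs -> beq). Hypothetical helper, not an ARM-provided formula.

L1_HIT_LATENCY = 3    # cycles; figure used in this answer
L2_MISS_LATENCY = 28  # cycles; minimum quoted from the data sheet in the question


def chain_cycles(num_instructions: int, load_latency: int) -> int:
    """One cycle per instruction, plus the latency of the load that
    every later instruction in the chain has to wait for."""
    return num_instructions + load_latency


best_case = chain_cycles(3, L1_HIT_LATENCY)    # load hits in L1
worst_case = chain_cycles(3, L2_MISS_LATENCY)  # load misses in L1 and L2

print(best_case, worst_case)
```

    Because the data sheet gives 28 cycles as a minimum, the worst-case number should be read as a lower bound.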