cachingpapi

Total Cache misses fewer than data cache misses (PAPI_L1_DCM > PAPI_L1_TCM)


For my application (SpMV) I have more data cache misses (PAPI_L1_DCM) than total cache misses (PAPI_L1_TCM) in level 1 cache. How can that be? For Level 2 the values are ok. That is, what the PAPI counters offer:

[PAPI_L1_ICM ][PAPI_L1_DCM ][PAPI_L1_TCM ][PAPI_L2_ICM ][PAPI_L2_DCM ][PAPI_L2_TCM ]
          1256       3388225       1442386          1007       2389903       2390908

Furthermore, I have the case that my cache accesses are below the cache misses of a level. I can't explain it by myself.

[PAPI_L2_TCA ][PAPI_L2_TCM ][PAPI_L2_DCA ][PAPI_L2_DCM ]
       1427361       2367210       1456111       2326503

Maybe the papi_avail output can explain. It would be also good to know the exact explanation of the addressed PAPI counters by Intel, but I didn't found it in the Manual: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 5.4.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (63)
CPU Revision             : 2.000000
CPUID Info               : Family: 6  Model: 63  Stepping: 2
CPU Max Megahertz        : 2501
CPU Min Megahertz        : 1200
Hdw Threads per core     : 2
Cores per Socket         : 12
Sockets                  : 2
NUMA Nodes               : 2
CPUs per Node            : 24
Total CPUs               : 48
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 32
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  Yes   No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  Yes   No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  Yes   No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  Yes   No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  Yes   No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  Yes   No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   Yes  Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  No    No   Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  Yes   No   Level 1 load misses
PAPI_L1_STM  0x80000018  Yes   No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes   No   Level 2 load misses
PAPI_L2_STM  0x8000001a  Yes   No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  Yes   No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  Yes   No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  Yes   No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  Yes   Yes  Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  Yes   No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  Yes   No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  Yes   Yes  Unconditional branch instructions
PAPI_BR_CN   0x8000002b  Yes   No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   Yes  Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  Yes   No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes   Yes  Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  No    No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  No    No   Floating point instructions
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  Yes   No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  Yes   Yes  Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  No    No   Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  No    No   Level 2 data cache hits
PAPI_L1_DCA  0x80000040  No    No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes   No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes   Yes  Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  No    No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  Yes   No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  Yes   No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  No    No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  Yes   No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  Yes   No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  No    No   Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes   No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  No    No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  Yes   No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  Yes   No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  No    No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  Yes   No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  Yes   No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  No    No   Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  No    No   Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  Yes   Yes  Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  Yes   No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  No    No   Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  Yes   Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes   Yes  Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  Yes   No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  Yes   No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No    No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  No    No   Floating point operations
PAPI_SP_OPS  0x80000067  No    No   Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  No    No   Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  No    No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  No    No   Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  Yes   No   Reference clock cycles

================================================================================
  User Defined Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
--------------------------------------------------------------------------------
Of 108 possible events, 56 are available, of which 12 are derived.

avail.c                                     PASSED

Solution

  • I found an explanation of the first problem. (PAPI_L1_DCM > PAPI_L1_TCM):

    The referenced native event is the L1D:REPLACEMENT which "Counts the number of lines brought into the L1 data cache." source

    $ papi_avail -e PAPI_L1_DCM
    Available PAPI preset and user defined events plus hardware information.
    --------------------------------------------------------------------------------
    PAPI Version             : 5.4.1.0
    Vendor string and code   : GenuineIntel (1)
    Model string and code    : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (63)
    CPU Revision             : 2.000000
    CPUID Info               : Family: 6  Model: 63  Stepping: 2
    CPU Max Megahertz        : 2501
    CPU Min Megahertz        : 1200
    Hdw Threads per core     : 2
    Cores per Socket         : 12
    Sockets                  : 2
    NUMA Nodes               : 2
    CPUs per Node            : 24
    Total CPUs               : 48
    Running in a VM          : no
    Number Hardware Counters : 11
    Max Multiplex Counters   : 32
    --------------------------------------------------------------------------------
    
    Event name:                   PAPI_L1_DCM
    Event Code:                   0x80000000
    Number of Native Events:      1
    Short Description:           |L1D cache misses|
    Long Description:            |Level 1 data cache misses|
    Developer's Notes:           ||
    Derived Type:                |NOT_DERIVED|
    Postfix Processing String:   ||
     Native Code[0]: 0x40000006 |L1D:REPLACEMENT|
     Number of Register Values: 0
     Native Event Description: |L1D cache, masks:L1D Data line replacements|
    
    --------------------------------------------------------------------------------
    avail.c                                     PASSED
    

    I can't explain the second question (PAPI_L2_TCA < PAPI_L2_TCM). It could have to do with the speculation fetches for the L2 cache. The native events are L2_RQSTS:ALL_DEMAND_REFERENCES (PAPI_L2_TCA) and LLC_REFERENCES (PAPI_L2_TCM).