For my application (SpMV) I have more data cache misses (PAPI_L1_DCM) than total cache misses (PAPI_L1_TCM) in level 1 cache. How can that be? For Level 2 the values are ok. That is, what the PAPI counters offer:
[PAPI_L1_ICM ][PAPI_L1_DCM ][PAPI_L1_TCM ][PAPI_L2_ICM ][PAPI_L2_DCM ][PAPI_L2_TCM ]
1256 3388225 1442386 1007 2389903 2390908
Furthermore, I have the case that my cache accesses are below the cache misses of a level. I can't explain it by myself.
[PAPI_L2_TCA ][PAPI_L2_TCM ][PAPI_L2_DCA ][PAPI_L2_DCM ]
1427361 2367210 1456111 2326503
Maybe the papi_avail output can explain. It would be also good to know the exact explanation of the addressed PAPI counters by Intel, but I didn't found it in the Manual: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI Version : 5.4.1.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (63)
CPU Revision : 2.000000
CPUID Info : Family: 6 Model: 63 Stepping: 2
CPU Max Megahertz : 2501
CPU Min Megahertz : 1200
Hdw Threads per core : 2
Cores per Socket : 12
Sockets : 2
NUMA Nodes : 2
CPUs per Node : 24
Total CPUs : 48
Running in a VM : no
Number Hardware Counters : 11
Max Multiplex Counters : 32
--------------------------------------------------------------------------------
================================================================================
PAPI Preset Events
================================================================================
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 Yes No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 Yes Yes Level 2 data cache misses
PAPI_L2_ICM 0x80000003 Yes No Level 2 instruction cache misses
PAPI_L3_DCM 0x80000004 No No Level 3 data cache misses
PAPI_L3_ICM 0x80000005 No No Level 3 instruction cache misses
PAPI_L1_TCM 0x80000006 Yes Yes Level 1 cache misses
PAPI_L2_TCM 0x80000007 Yes No Level 2 cache misses
PAPI_L3_TCM 0x80000008 Yes No Level 3 cache misses
PAPI_CA_SNP 0x80000009 Yes No Requests for a snoop
PAPI_CA_SHR 0x8000000a Yes No Requests for exclusive access to shared cache line
PAPI_CA_CLN 0x8000000b Yes No Requests for exclusive access to clean cache line
PAPI_CA_INV 0x8000000c Yes No Requests for cache line invalidation
PAPI_CA_ITV 0x8000000d Yes No Requests for cache line intervention
PAPI_L3_LDM 0x8000000e Yes No Level 3 load misses
PAPI_L3_STM 0x8000000f No No Level 3 store misses
PAPI_BRU_IDL 0x80000010 No No Cycles branch units are idle
PAPI_FXU_IDL 0x80000011 No No Cycles integer units are idle
PAPI_FPU_IDL 0x80000012 No No Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013 No No Cycles load/store units are idle
PAPI_TLB_DM 0x80000014 Yes Yes Data translation lookaside buffer misses
PAPI_TLB_IM 0x80000015 Yes No Instruction translation lookaside buffer misses
PAPI_TLB_TL 0x80000016 No No Total translation lookaside buffer misses
PAPI_L1_LDM 0x80000017 Yes No Level 1 load misses
PAPI_L1_STM 0x80000018 Yes No Level 1 store misses
PAPI_L2_LDM 0x80000019 Yes No Level 2 load misses
PAPI_L2_STM 0x8000001a Yes No Level 2 store misses
PAPI_BTAC_M 0x8000001b No No Branch target address cache misses
PAPI_PRF_DM 0x8000001c Yes No Data prefetch cache misses
PAPI_L3_DCH 0x8000001d No No Level 3 data cache hits
PAPI_TLB_SD 0x8000001e No No Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f No No Failed store conditional instructions
PAPI_CSR_SUC 0x80000020 No No Successful store conditional instructions
PAPI_CSR_TOT 0x80000021 No No Total store conditional instructions
PAPI_MEM_SCY 0x80000022 No No Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023 No No Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024 Yes No Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025 Yes No Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026 Yes Yes Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027 Yes No Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028 Yes No Cycles with maximum instructions completed
PAPI_HW_INT 0x80000029 No No Hardware interrupts
PAPI_BR_UCN 0x8000002a Yes Yes Unconditional branch instructions
PAPI_BR_CN 0x8000002b Yes No Conditional branch instructions
PAPI_BR_TKN 0x8000002c Yes Yes Conditional branch instructions taken
PAPI_BR_NTK 0x8000002d Yes No Conditional branch instructions not taken
PAPI_BR_MSP 0x8000002e Yes No Conditional branch instructions mispredicted
PAPI_BR_PRC 0x8000002f Yes Yes Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030 No No FMA instructions completed
PAPI_TOT_IIS 0x80000031 No No Instructions issued
PAPI_TOT_INS 0x80000032 Yes No Instructions completed
PAPI_INT_INS 0x80000033 No No Integer instructions
PAPI_FP_INS 0x80000034 No No Floating point instructions
PAPI_LD_INS 0x80000035 Yes No Load instructions
PAPI_SR_INS 0x80000036 Yes No Store instructions
PAPI_BR_INS 0x80000037 Yes No Branch instructions
PAPI_VEC_INS 0x80000038 No No Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039 Yes No Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a No No Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b Yes No Total cycles
PAPI_LST_INS 0x8000003c Yes Yes Load/store instructions completed
PAPI_SYC_INS 0x8000003d No No Synchronization instructions completed
PAPI_L1_DCH 0x8000003e No No Level 1 data cache hits
PAPI_L2_DCH 0x8000003f No No Level 2 data cache hits
PAPI_L1_DCA 0x80000040 No No Level 1 data cache accesses
PAPI_L2_DCA 0x80000041 Yes No Level 2 data cache accesses
PAPI_L3_DCA 0x80000042 Yes Yes Level 3 data cache accesses
PAPI_L1_DCR 0x80000043 No No Level 1 data cache reads
PAPI_L2_DCR 0x80000044 Yes No Level 2 data cache reads
PAPI_L3_DCR 0x80000045 Yes No Level 3 data cache reads
PAPI_L1_DCW 0x80000046 No No Level 1 data cache writes
PAPI_L2_DCW 0x80000047 Yes No Level 2 data cache writes
PAPI_L3_DCW 0x80000048 Yes No Level 3 data cache writes
PAPI_L1_ICH 0x80000049 No No Level 1 instruction cache hits
PAPI_L2_ICH 0x8000004a Yes No Level 2 instruction cache hits
PAPI_L3_ICH 0x8000004b No No Level 3 instruction cache hits
PAPI_L1_ICA 0x8000004c No No Level 1 instruction cache accesses
PAPI_L2_ICA 0x8000004d Yes No Level 2 instruction cache accesses
PAPI_L3_ICA 0x8000004e Yes No Level 3 instruction cache accesses
PAPI_L1_ICR 0x8000004f No No Level 1 instruction cache reads
PAPI_L2_ICR 0x80000050 Yes No Level 2 instruction cache reads
PAPI_L3_ICR 0x80000051 Yes No Level 3 instruction cache reads
PAPI_L1_ICW 0x80000052 No No Level 1 instruction cache writes
PAPI_L2_ICW 0x80000053 No No Level 2 instruction cache writes
PAPI_L3_ICW 0x80000054 No No Level 3 instruction cache writes
PAPI_L1_TCH 0x80000055 No No Level 1 total cache hits
PAPI_L2_TCH 0x80000056 No No Level 2 total cache hits
PAPI_L3_TCH 0x80000057 No No Level 3 total cache hits
PAPI_L1_TCA 0x80000058 No No Level 1 total cache accesses
PAPI_L2_TCA 0x80000059 Yes Yes Level 2 total cache accesses
PAPI_L3_TCA 0x8000005a Yes No Level 3 total cache accesses
PAPI_L1_TCR 0x8000005b No No Level 1 total cache reads
PAPI_L2_TCR 0x8000005c Yes Yes Level 2 total cache reads
PAPI_L3_TCR 0x8000005d Yes Yes Level 3 total cache reads
PAPI_L1_TCW 0x8000005e No No Level 1 total cache writes
PAPI_L2_TCW 0x8000005f Yes No Level 2 total cache writes
PAPI_L3_TCW 0x80000060 Yes No Level 3 total cache writes
PAPI_FML_INS 0x80000061 No No Floating point multiply instructions
PAPI_FAD_INS 0x80000062 No No Floating point add instructions
PAPI_FDV_INS 0x80000063 No No Floating point divide instructions
PAPI_FSQ_INS 0x80000064 No No Floating point square root instructions
PAPI_FNV_INS 0x80000065 No No Floating point inverse instructions
PAPI_FP_OPS 0x80000066 No No Floating point operations
PAPI_SP_OPS 0x80000067 No No Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS 0x80000068 No No Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP 0x80000069 No No Single precision vector/SIMD instructions
PAPI_VEC_DP 0x8000006a No No Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b Yes No Reference clock cycles
================================================================================
User Defined Events
================================================================================
Name Code Avail Deriv Description (Note)
--------------------------------------------------------------------------------
Of 108 possible events, 56 are available, of which 12 are derived.
avail.c PASSED
I found an explanation of the first problem. (PAPI_L1_DCM > PAPI_L1_TCM):
The referenced native event is the L1D:REPLACEMENT which "Counts the number of lines brought into the L1 data cache." source
$ papi_avail -e PAPI_L1_DCM
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI Version : 5.4.1.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (63)
CPU Revision : 2.000000
CPUID Info : Family: 6 Model: 63 Stepping: 2
CPU Max Megahertz : 2501
CPU Min Megahertz : 1200
Hdw Threads per core : 2
Cores per Socket : 12
Sockets : 2
NUMA Nodes : 2
CPUs per Node : 24
Total CPUs : 48
Running in a VM : no
Number Hardware Counters : 11
Max Multiplex Counters : 32
--------------------------------------------------------------------------------
Event name: PAPI_L1_DCM
Event Code: 0x80000000
Number of Native Events: 1
Short Description: |L1D cache misses|
Long Description: |Level 1 data cache misses|
Developer's Notes: ||
Derived Type: |NOT_DERIVED|
Postfix Processing String: ||
Native Code[0]: 0x40000006 |L1D:REPLACEMENT|
Number of Register Values: 0
Native Event Description: |L1D cache, masks:L1D Data line replacements|
--------------------------------------------------------------------------------
avail.c PASSED
I can't explain the second question (PAPI_L2_TCA < PAPI_L2_TCM). It could have to do with the speculation fetches for the L2 cache. The native events are L2_RQSTS:ALL_DEMAND_REFERENCES (PAPI_L2_TCA) and LLC_REFERENCES (PAPI_L2_TCM).