PAPI

PAPI (hardware counters)

PAPI (the Performance Application Programming Interface) provides access to hardware performance counters, which can be used to understand how well an application is using the CPU and its associated memory caches. It is used by various profiling tools installed on the system but, if necessary, can also be called directly from applications.

The main website is http://icl.cs.utk.edu/papi/

Using with profilers

Please refer to your profiler's documentation for how to collect particular counters and how to use that data to derive other metrics, such as a flops rating (PAPI_FP_OPS / time) or the percentage of L2 cache accesses that miss (PAPI_L2_TCM / PAPI_L2_TCA).
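
As a rough illustration of the derived metrics mentioned above, the following Python sketch computes a flops rating and an L2 miss percentage from raw counter totals. The function names and the sample values are hypothetical and chosen only for illustration; they are not measurements from any of our services. The miss percentage here divides total L2 misses (PAPI_L2_TCM) by total L2 accesses (PAPI_L2_TCA).

```python
def flops_rate(fp_ops, seconds):
    """Floating point operations per second: PAPI_FP_OPS / time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return fp_ops / seconds


def l2_miss_percent(l2_tcm, l2_tca):
    """Percentage of L2 accesses that missed: 100 * PAPI_L2_TCM / PAPI_L2_TCA."""
    if l2_tca == 0:
        return 0.0  # no accesses recorded, so no misses to report
    return 100.0 * l2_tcm / l2_tca


# Illustrative values only (hypothetical, not from a real run).
counters = {
    "PAPI_FP_OPS": 4_000_000_000,
    "PAPI_L2_TCM": 25_000,
    "PAPI_L2_TCA": 1_000_000,
}
elapsed = 2.0  # wall-clock seconds for the measured region

gflops = flops_rate(counters["PAPI_FP_OPS"], elapsed) / 1e9
miss = l2_miss_percent(counters["PAPI_L2_TCM"], counters["PAPI_L2_TCA"])
print(f"{gflops:.2f} GFLOP/s")
print(f"{miss:.2f}% of L2 accesses missed")
```

The counter totals would normally come from your profiler's output; how they are exported depends on the profiler.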

What counters are available?

This depends on the CPU, so it varies between our services. To list the counters on a given service, load the papi module and then run the papi_avail command.

Note: a papi_avail command may be found on your path before the module is loaded, but it will fail with an error. This is normal – please load the papi module before executing the command.

Note: any counter with Yes in the Avail column is supported on the hardware; however, only a relatively small number of counters can be used at any one time.

Note: some counters, marked with a Yes in the Deriv column, are generated from several underlying counters, so fewer derived counters than non-derived counters can be used simultaneously.

Note: no counters are available on ARC1. This is due to a lack of functionality provided by the underlying operating system. Counters are available on all our other services.

For example, on ARC2 the following counters are available:

Name          Avail  Deriv  Description (Note)
PAPI_L1_DCM   Yes    No     Level 1 data cache misses
PAPI_L1_ICM   Yes    No     Level 1 instruction cache misses
PAPI_L2_DCM   Yes    Yes    Level 2 data cache misses
PAPI_L2_ICM   Yes    No     Level 2 instruction cache misses
PAPI_L1_TCM   Yes    Yes    Level 1 cache misses
PAPI_L2_TCM   Yes    No     Level 2 cache misses
PAPI_L3_TCM   Yes    No     Level 3 cache misses
PAPI_TLB_DM   Yes    Yes    Data translation lookaside buffer misses
PAPI_TLB_IM   Yes    No     Instruction translation lookaside buffer misses
PAPI_L1_LDM   Yes    No     Level 1 load misses
PAPI_L1_STM   Yes    No     Level 1 store misses
PAPI_L2_STM   Yes    No     Level 2 store misses
PAPI_STL_ICY  Yes    No     Cycles with no instruction issue
PAPI_BR_UCN   Yes    Yes    Unconditional branch instructions
PAPI_BR_CN    Yes    No     Conditional branch instructions
PAPI_BR_TKN   Yes    Yes    Conditional branch instructions taken
PAPI_BR_NTK   Yes    No     Conditional branch instructions not taken
PAPI_BR_MSP   Yes    No     Conditional branch instructions mispredicted
PAPI_BR_PRC   Yes    Yes    Conditional branch instructions correctly predicted
PAPI_TOT_INS  Yes    No     Instructions completed
PAPI_FP_INS   Yes    Yes    Floating point instructions
PAPI_LD_INS   Yes    No     Load instructions
PAPI_SR_INS   Yes    No     Store instructions
PAPI_BR_INS   Yes    No     Branch instructions
PAPI_TOT_CYC  Yes    No     Total cycles
PAPI_L2_DCH   Yes    Yes    Level 2 data cache hits
PAPI_L2_DCA   Yes    No     Level 2 data cache accesses
PAPI_L3_DCA   Yes    Yes    Level 3 data cache accesses
PAPI_L2_DCR   Yes    No     Level 2 data cache reads
PAPI_L3_DCR   Yes    No     Level 3 data cache reads
PAPI_L2_DCW   Yes    No     Level 2 data cache writes
PAPI_L3_DCW   Yes    No     Level 3 data cache writes
PAPI_L2_ICH   Yes    No     Level 2 instruction cache hits
PAPI_L2_ICA   Yes    No     Level 2 instruction cache accesses
PAPI_L3_ICA   Yes    No     Level 3 instruction cache accesses
PAPI_L2_ICR   Yes    No     Level 2 instruction cache reads
PAPI_L3_ICR   Yes    No     Level 3 instruction cache reads
PAPI_L2_TCA   Yes    Yes    Level 2 total cache accesses
PAPI_L3_TCA   Yes    No     Level 3 total cache accesses
PAPI_L2_TCR   Yes    Yes    Level 2 total cache reads
PAPI_L3_TCR   Yes    Yes    Level 3 total cache reads
PAPI_L2_TCW   Yes    No     Level 2 total cache writes
PAPI_L3_TCW   Yes    No     Level 3 total cache writes
PAPI_FDV_INS  Yes    No     Floating point divide instructions
PAPI_FP_OPS   Yes    Yes    Floating point operations
PAPI_SP_OPS   Yes    Yes    Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS   Yes    Yes    Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP   Yes    Yes    Single precision vector/SIMD instructions
PAPI_VEC_DP   Yes    Yes    Double precision vector/SIMD instructions
PAPI_REF_CYC  Yes    No     Reference clock cycles
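
Because derived counters are more constrained than non-derived ones, it can help to split a listing like the one above by its Deriv column when choosing what to collect. The following Python sketch does this for a few rows copied from the table. The parse_avail function is a hypothetical helper, and it assumes the four-column layout shown here (Name, Avail, Deriv, Description); the raw output of papi_avail may contain additional columns or header text, in which case the parsing would need adjusting.

```python
def parse_avail(text):
    """Parse 'Name Avail Deriv Description' rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        # Split into at most four fields so the description stays intact.
        parts = line.split(None, 3)
        # Skip the header and anything that is not a preset counter row.
        if len(parts) < 4 or not parts[0].startswith("PAPI_"):
            continue
        name, avail, deriv, description = parts
        rows.append({
            "name": name,
            "avail": avail == "Yes",
            "derived": deriv == "Yes",
            "description": description,
        })
    return rows


# A few rows copied from the ARC2 listing above.
sample = """\
Name Avail Deriv Description (Note)
PAPI_L1_DCM Yes No Level 1 data cache misses
PAPI_L2_DCM Yes Yes Level 2 data cache misses
PAPI_TOT_CYC Yes No Total cycles
PAPI_FP_OPS Yes Yes Floating point operations
"""

rows = parse_avail(sample)
derived = [r["name"] for r in rows if r["avail"] and r["derived"]]
native = [r["name"] for r in rows if r["avail"] and not r["derived"]]
print("derived:", derived)
print("non-derived:", native)
```

This only classifies counters. Whether a particular combination can be collected at the same time still depends on the hardware.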