ompP

ompP is a profiling library for OpenMP applications. It reports how each parallel region performs, together with measurements of overheads such as load imbalance and the cost of creating the region.

This page demonstrates usage on our systems and offers a short tour of some of the functionality. For more details, please see the project webpage:

http://www.ompp-tool.com/

or the documentation installed with the software (see $OMPP_HOME/doc/ompp-usage.txt).

Basic usage

Before using, please execute the following to make the software available:


module add ompP

Before an application can be profiled, it needs to be rebuilt by prepending kinst-ompp-papi to the normal compilation line, e.g.


kinst-ompp-papi ifort -qopenmp -o hello_world hello_world.f90

When the application is run, a text file containing the profile will be generated. The profile will offer information on each parallel region (including file/line numbers), along with estimates for various overheads incurred by OpenMP.
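Since each run writes a new text file, it can be handy to pick up the most recent one automatically. The following is a minimal sketch, not part of ompP: the helper name latest_ompp_report is hypothetical, and the *.ompp.txt pattern is assumed from the file name shown in the example output on this page.

```shell
# Hypothetical helper (not shipped with ompP): print the newest report in a directory.
# Assumes report files end in ".ompp.txt", as in the example on this page.
latest_ompp_report() {
    dir="${1:-.}"
    # ls -t lists newest first; errors are hidden if no report exists yet.
    ls -t "$dir"/*.ompp.txt 2>/dev/null | head -n 1
}
```

After running the instrumented program, something like cat "$(latest_ompp_report)" would then display the latest profile.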

Configuration

The behaviour of ompP can be influenced by setting the various OMPP_* environment variables, for example to print more or less detail or to produce machine-readable output. The profile can also include PAPI hardware performance counters, e.g.


export OMPP_CTR1=PAPI_L2_DCM

Please see the software’s documentation for more information.
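As a sketch of how the counter setting above might be extended, the example below requests a second counter and fixes the thread count before a run. The choice of PAPI_TOT_CYC is an assumption for illustration: it is a standard PAPI preset event, but whether it is available, and whether it can be counted together with PAPI_L2_DCM, depends on the processor. OMP_NUM_THREADS is the standard OpenMP variable and is not specific to ompP.

```shell
# Counter names are PAPI preset events; availability depends on the CPU.
export OMPP_CTR1=PAPI_L2_DCM    # L2 data cache misses (as in the example above)
export OMPP_CTR2=PAPI_TOT_CYC   # total cycles (assumed to be available here)

# Standard OpenMP setting, not ompP-specific.
export OMP_NUM_THREADS=16

# Show the ompP settings that the next run will pick up.
env | grep '^OMPP_' | sort
```

The settings that ompP actually used are echoed back in the "General Information" block at the top of the profile, which is a quick way to confirm that a counter was accepted.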

Example output


$ cat unknown.16-0.ompp.txt 
----------------------------------------------------------------------
----     ompP General Information     --------------------------------
----------------------------------------------------------------------
Start Date      : Mon Feb 09 10:59:02 2015
End Date        : Mon Feb 09 10:59:02 2015
Duration        : 0.11 sec
Application Name: unknown
Type of Report  : final
User Time       : 0.64 sec
System Time     : 0.01 sec
Max Threads     : 16
ompP Version    : 0.8.99
ompP Build Date : Jan 26 2015 13:08:12
PAPI Support    : available
Max Counters    : 4
PAPI Active     : yes
Used Counters   : 1
OMPP_CTR1       : PAPI_L2_DCM
OMPP_CTR2       : not set
OMPP_CTR3       : not set
OMPP_CTR4       : not set
Max Evaluators  : 4
Used Evaluators : 0
OMPP_EVAL1      : not set
OMPP_EVAL2      : not set
OMPP_EVAL3      : not set
OMPP_EVAL4      : not set

----------------------------------------------------------------------
----     ompP Region Overview     ------------------------------------
----------------------------------------------------------------------
PARALLEL: 1 region:
 * R00001 dot_product.f90 (16-39)

----------------------------------------------------------------------
----     ompP Callgraph     ------------------------------------------
----------------------------------------------------------------------

  Inclusive  (%)   Exclusive  (%)
   0.11 (100.0%)    0.04 (37.38%)           [unknown: 16 threads]
   0.07 (62.62%)    0.07 (62.62%) PARALLEL  +-R00001 dot_product.f90 (16-39)

----------------------------------------------------------------------
----     ompP Flat Region Profile (inclusive data)     ---------------
----------------------------------------------------------------------
R00001 dot_product.f90 (16-39) PARALLEL
 TID      execT      execC      bodyT   exitBarT   startupT   shutdwnT      taskT        PAPI_L2_DCM
   0       0.07          1       0.00       0.03       0.00       0.04       0.00                 71
   1       0.07          1       0.00       0.04       0.00       0.03       0.00                 39
   2       0.07          1       0.00       0.03       0.00       0.04       0.00                 35
   3       0.07          1       0.00       0.03       0.00       0.04       0.00                 33
   4       0.07          1       0.00       0.04       0.00       0.03       0.00                 36
   5       0.07          1       0.00       0.00       0.02       0.04       0.00                 33
   6       0.07          1       0.00       0.02       0.02       0.03       0.00                 33
   7       0.07          1       0.00       0.03       0.00       0.04       0.00                 44
   8       0.07          1       0.00       0.02       0.02       0.03       0.00                 45
   9       0.07          1       0.00       0.01       0.02       0.04       0.00                 39
  10       0.07          1       0.00       0.01       0.02       0.04       0.00                 29
  11       0.07          1       0.00       0.00       0.02       0.04       0.00                 40
  12       0.07          1       0.00       0.00       0.02       0.04       0.00                 40
  13       0.07          1       0.00       0.01       0.02       0.03       0.00                 30
  14       0.07          1       0.00       0.00       0.02       0.04       0.00                 33
  15       0.07          1       0.00       0.01       0.02       0.04       0.00                 37
 SUM       1.07         16       0.00       0.27       0.21       0.59       0.00                617

----------------------------------------------------------------------
----     ompP Callgraph Region Profiles (incl./excl. data)     -------
----------------------------------------------------------------------

[*00] unknown
[=01] R00001 dot_product.f90 (16-39) PARALLEL
 TID      execT      execC    bodyT/I    bodyT/E   exitBarT   startupT   shutdwnT      taskT        PAPI_L2_DCM/I        PAPI_L2_DCM/E
   0       0.07          1       0.00       0.00       0.03       0.00       0.04       0.00                   90                   90
   1       0.07          1       0.00       0.00       0.04       0.00       0.03       0.00                   51                   51
   2       0.07          1       0.00       0.00       0.03       0.00       0.04       0.00                   54                   54
   3       0.07          1       0.00       0.00       0.03       0.00       0.04       0.00                   57                   57
   4       0.07          1       0.00       0.00       0.04       0.00       0.03       0.00                   55                   55
   5       0.07          1       0.00       0.00       0.00       0.02       0.04       0.00                   53                   53
   6       0.07          1       0.00       0.00       0.02       0.02       0.03       0.00                   48                   48
   7       0.07          1       0.00       0.00       0.03       0.00       0.04       0.00                   62                   62
   8       0.07          1       0.00       0.00       0.02       0.02       0.03       0.00                   63                   63
   9       0.07          1       0.00       0.00       0.01       0.02       0.04       0.00                   52                   52
  10       0.07          1       0.00       0.00       0.01       0.02       0.04       0.00                   46                   46
  11       0.07          1       0.00       0.00       0.00       0.02       0.04       0.00                   56                   56
  12       0.07          1       0.00       0.00       0.00       0.02       0.04       0.00                   60                   60
  13       0.07          1       0.00       0.00       0.01       0.02       0.03       0.00                   50                   50
  14       0.07          1       0.00       0.00       0.00       0.02       0.04       0.00                   48                   48
  15       0.07          1       0.00       0.00       0.01       0.02       0.04       0.00                   54                   54
 SUM       1.07         16       0.00       0.00       0.27       0.21       0.59       0.00                  899                  899


----------------------------------------------------------------------
----     ompP Overhead Analysis Report     ---------------------------
----------------------------------------------------------------------
Total runtime (wallclock)   : 0.11 sec [16 threads]
Number of parallel regions  : 1
Parallel coverage           : 0.07 sec (62.62%)

Parallel regions sorted by wallclock time:
            Type                            Location      Wallclock (%)         PAPI_L2_DCM
R00001  PARALLEL             dot_product.f90 (16-39)       0.07 (62.62)                   0
                                                 SUM       0.07 (62.62)                   0

Overheads wrt. each individual parallel region:
          Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     1.07     1.07 (99.99)    0.00 ( 0.00)    0.27 (25.43)    0.00 ( 0.00)    0.80 (74.56)

Overheads wrt. whole program:
          Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     1.07     1.07 (62.61)    0.00 ( 0.00)    0.27 (15.92)    0.00 ( 0.00)    0.80 (46.69)
   SUM     1.07     1.07 (62.61)    0.00 ( 0.00)    0.27 (15.92)    0.00 ( 0.00)    0.80 (46.69)
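
The overhead figures in the report above can be cross-checked against the flat region profile. In this example the Imbal value equals the summed exitBarT (0.27) and the Mgmt value equals the summed startupT plus shutdwnT (0.21 + 0.59 = 0.80). The sketch below recomputes those sums from a report file. The function name ompp_overheads is hypothetical, the column-to-overhead mapping is inferred from the numbers in this example rather than taken from the ompP documentation, and only regions with the PARALLEL column layout shown above are handled.

```shell
# Hypothetical helper: recompute imbalance and management overhead per region
# from the "Flat Region Profile" section of an ompP text report.
ompp_overheads() {
    awk '
        /ompP Flat Region Profile/       { in_flat = 1; next }
        /ompP Callgraph Region Profiles/ { in_flat = 0 }
        !in_flat                         { next }
        # Region header, e.g. "R00001 dot_product.f90 (16-39) PARALLEL"
        /^R[0-9]+ /                      { region = $1; split("", col) }
        # Column header: remember the position of each named column
        $1 == "TID"                      { for (i = 1; i <= NF; i++) col[$i] = i }
        # SUM row: only report regions that have the PARALLEL timing columns
        $1 == "SUM" && ("exitBarT" in col) && ("startupT" in col) && ("shutdwnT" in col) {
            printf "%s execT=%.2f imbal=%.2f mgmt=%.2f\n", region,
                   $(col["execT"]), $(col["exitBarT"]),
                   $(col["startupT"]) + $(col["shutdwnT"])
        }
    ' "$1"
}
```

For the report shown on this page, ompp_overheads unknown.16-0.ompp.txt would print one line for R00001 with execT=1.07, imbal=0.27 and mgmt=0.80, matching the "Overheads wrt. each individual parallel region" table.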