HPCToolkit is an integrated suite of tools for measuring and analysing program performance, without any special compilation of the program. It can measure codes that use a variety of parallel programming models (e.g. serial/no parallelism, MPI, OpenMP, pthreads and hybrid models).

This page is a tour of some of the functionality; please see the HPCToolkit website (http://hpctoolkit.org) for more details.


Basic usage

Before using, please execute the following to make the software available:

Ordinary, fully optimised applications can now be measured and analysed. It helps if the application has been built with the -g compiler flag, which allows line numbers and similar source-level detail to be reported. Launch a program with:

Type of program    Job script command
Non-MPI            hpcrun <program>
MPI based          mpirun hpcrun <program>
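As a minimal sketch of the two launch forms in the table, assuming a hypothetical executable called ./my_app and an mpirun that takes its rank count from the batch system. The function only prints the command line, so the output can be pasted into a job script:

```shell
#!/bin/sh
# Print the hpcrun launch line for a program.
# ./my_app is a hypothetical executable name used for illustration only.
launch_cmd() {
    kind=$1; shift
    case "$kind" in
        serial) echo "hpcrun $*" ;;          # non-MPI program
        mpi)    echo "mpirun hpcrun $*" ;;   # MPI program, inside a batch job
        *)      echo "unknown program type: $kind" >&2; return 1 ;;
    esac
}

launch_cmd serial ./my_app
launch_cmd mpi ./my_app
```

Any arguments for the program itself go after the executable name, exactly as they would without hpcrun.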

This will generate a directory called hpctoolkit-<program>-measurements (or similar), containing a profile of the application run.

Examine the profile by executing:

  • hpcstruct <program> to generate a <program>.hpcstruct file containing an analysis of the program binary
  • hpcprof to generate a hpctoolkit-<program>-database directory containing a performance database, either:
    • hpcprof -S <program>.hpcstruct hpctoolkit-<program>-measurements
    • mpirun hpcprof-mpi -S <program>.hpcstruct hpctoolkit-<program>-measurements (from within a batch job)
  • hpcviewer hpctoolkit-<program>-database to view the performance database
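The steps above can be sketched as a short script. It assumes a hypothetical program called my_app and that the file and directory names embed the program name. The measurements directory may carry an extra suffix on some systems, so check the actual name with ls first. The commands are printed in order rather than executed:

```shell
#!/bin/sh
# Derive the names used by the analysis steps from the program name.
# "my_app" is a hypothetical name; adjust the directory names to match
# what hpcrun actually created on your system.
app=my_app
struct="$app.hpcstruct"
measurements="hpctoolkit-$app-measurements"
database="hpctoolkit-$app-database"

# Print the analysis commands in the order they should be run.
analysis_steps() {
    echo "hpcstruct ./$app"
    echo "hpcprof -S $struct $measurements"
    echo "hpcviewer $database"
}

analysis_steps
```

For an MPI run, the second line would become the mpirun hpcprof-mpi form and be executed from within a batch job.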

The default profile shows which routines account for most of the application's wallclock time.

Note: the information presented by hpcviewer differs depending on whether hpcprof or hpcprof-mpi was used. For example, a database created by hpcprof-mpi allows plots of how a metric varies across the different ranks.

Adding other measurements to profile (via options to hpcrun)

Hardware counters (PAPI)

HPCToolkit is able to sample PAPI counters, which keep track of how the underlying hardware is performing. To do this, add an option to the hpcrun command line of the form -e event@period (event = PAPI counter, period = number of events counted between samples).

For example, to collect data on the number of CPU cycles and floating point instructions per routine, use something similar to:
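A minimal sketch, assuming the PAPI presets PAPI_TOT_CYC (total cycles) and PAPI_FP_INS (floating point instructions) are available on the machine; the papi_avail utility lists what the hardware supports. The sampling periods and the executable name ./my_app are illustrative assumptions, not recommendations:

```shell
#!/bin/sh
# Build the -e options for hpcrun from a list of event@period pairs.
# The periods below are illustrative values only.
events="PAPI_TOT_CYC@4000000 PAPI_FP_INS@400000"

event_opts=""
for ev in $events; do
    event_opts="$event_opts -e $ev"
done

# ./my_app is a hypothetical executable; print the command for a job script.
echo "hpcrun$event_opts ./my_app"
```

Each counter gets its own -e option, so further events can be added by extending the list.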

Note that a smaller sampling period slows the application down more, but gives finer resolution of what the routines are doing.

Tracing what the application does over time

Using hpcrun -t instead of hpcrun will cause HPCToolkit to add a trace of what the application is doing over time to the other information collected in the measurements directory. To view it, follow the same steps as above, but run hpctraceviewer instead of hpcviewer.
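As a sketch of a traced run, assuming a hypothetical program called my_app and a database directory named after it. The run helper is an illustration, not part of HPCToolkit: with DRY_RUN=1 (the default here) it prints each command instead of executing it.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it; set DRY_RUN=0 in a job where HPCToolkit is available.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

app=my_app   # hypothetical program name

# Measure with tracing switched on.
run hpcrun -t "./$app"

# ... hpcstruct and hpcprof steps as for an ordinary profile ...

# Open the resulting database in the trace viewer.
run hpctraceviewer "hpctoolkit-$app-database"
```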


Using hpcrun -e IO will add file read and write information to the profile.

Install local copy of user interface

It may be easier to transfer a performance database to your local desktop before examining it. Before doing this, download the hpcviewer and hpctraceviewer applications from http://hpctoolkit.org/download/hpcviewer/
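One way to move a database is to pack the directory into a single archive first. The sketch below assumes tar is available; the helper name pack_database, the database name and the user@cluster address in the comments are hypothetical:

```shell
#!/bin/sh
# Pack a performance database directory into a compressed archive so it
# can be copied to a desktop machine as one file.
pack_database() {
    db=$1
    if [ ! -d "$db" ]; then
        echo "no such database directory: $db" >&2
        return 1
    fi
    # Archive relative to the parent so it unpacks as a single directory.
    tar -czf "$(basename "$db").tar.gz" -C "$(dirname "$db")" "$(basename "$db")"
}

# Example with hypothetical names: pack on the cluster, then copy from
# the desktop and unpack there.
#   pack_database hpctoolkit-my_app-database
#   scp user@cluster:hpctoolkit-my_app-database.tar.gz .
#   tar -xzf hpctoolkit-my_app-database.tar.gz
```

The unpacked directory can then be opened with the locally installed hpcviewer.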