HPCToolkit is an integrated suite of tools for measurement and analysis of program performance, without any special compilation of the program. It can measure codes that use a variety of parallel programming models (e.g. serial/no parallelism, MPI, OpenMP, pthreads and hybrid models).
This page is a tour of some of its functionality; please see the HPCToolkit website (http://hpctoolkit.org) for more details.
Before using, please execute the following to make the software available:
module add hpctoolkit
Ordinary, fully optimised applications can be measured and analysed. However, it helps if the application has been built with the -g compiler flag, which allows HPCToolkit to report source line numbers. Launch a program with:
Type of program | Job script command
Serial, OpenMP or pthreads | hpcrun <program>
MPI | mpirun hpcrun <program>
This generates a directory called hpctoolkit-<program>-measurements (or similar), containing a profile of the application run.
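A minimal job-script sketch of the build and launch steps above. It assumes a C program in myprog.c compiled with gcc or mpicc (hypothetical names; substitute your own program and compiler) and leaves out scheduler directives, which depend on the batch system. The DRY_RUN switch is also an assumption of this sketch: by default it only prints the commands, and setting DRY_RUN=0 inside a real job executes them.

```shell
#!/bin/bash
# Sketch only: myprog.c and gcc/mpicc are illustrative assumptions.
# Load the software first with: module add hpctoolkit
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Print a command, or execute it when DRY_RUN=0.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Build with optimisation plus -g (line numbers), then launch under hpcrun.
# $1 = serial (also OpenMP/pthreads; add e.g. -fopenmp) or mpi
profile_app() {
  case "$1" in
    serial)
      run gcc -O2 -g -o myprog myprog.c
      run hpcrun ./myprog ;;
    mpi)
      run mpicc -O2 -g -o myprog myprog.c
      run mpirun hpcrun ./myprog ;;
    *)
      echo "unknown program type: $1" >&2
      return 1 ;;
  esac
}

profile_app "${MODE:-serial}"
# hpcrun writes its samples to hpctoolkit-myprog-measurements (or similar).
```

Keeping -O2 alongside -g means the profile reflects the optimised code you actually run.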
Examine the profile by executing:
1. hpcstruct <program> to generate a <program>.hpcstruct file containing an analysis of the program binary
2. hpcprof to generate an hpctoolkit-<program>-database directory containing a performance database, either:
   - hpcprof -S <program>.hpcstruct hpctoolkit-<program>-measurements
   - mpirun hpcprof-mpi -S <program>.hpcstruct hpctoolkit-<program>-measurements (from within a batch job)
3. hpcviewer hpctoolkit-<program>-database to view the performance database
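The analysis steps above can be scripted roughly as follows. This is a sketch assuming a hypothetical program called myprog in the current directory and the default hpctoolkit-<program>-measurements naming; as before, DRY_RUN=1 (the default here) only prints the commands.

```shell
#!/bin/bash
# Sketch only: PROG=myprog is a hypothetical program name.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them
PROG=${PROG:-myprog}

# Print a command, or execute it when DRY_RUN=0.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# $1 = serial (hpcprof) or mpi (hpcprof-mpi, from within a batch job)
analyse() {
  local meas="hpctoolkit-${PROG}-measurements"
  # Step 1: analyse the binary, producing ${PROG}.hpcstruct
  run hpcstruct "./${PROG}"
  # Step 2: combine measurements and structure into a database
  case "$1" in
    serial) run hpcprof -S "${PROG}.hpcstruct" "$meas" ;;
    mpi)    run mpirun hpcprof-mpi -S "${PROG}.hpcstruct" "$meas" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

analyse "${MODE:-serial}"
# Step 3 is interactive, so run it outside the batch job:
#   hpcviewer hpctoolkit-${PROG}-database
```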
By default, the profile shows which routines account for the most (wallclock) time spent by the application.
Note: the information presented by hpcviewer differs depending on whether the database was created with hpcprof or hpcprof-mpi. For example, a database created by hpcprof-mpi allows plots of how a metric varies across the different MPI ranks.
Adding other measurements to the profile (via options to hpcrun)
Hardware counters (PAPI)
HPCToolkit can sample PAPI hardware counters, which track how the underlying hardware is performing. To do this, add an option of the form -e event@period to the hpcrun command line, where event is the PAPI counter name and period is the number of counted events between samples (for PAPI_TOT_CYC, a number of cycles).
For example, to collect the number of CPU cycles and floating-point instructions per routine, use something similar to:
hpcrun -e PAPI_TOT_CYC@4000000 -e PAPI_FP_INS@4000000 <program>
Note that the smaller the sampling period, the more the application slows down, but the finer the resolution of what each routine is doing.
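A sketch of the PAPI run above, plus a rough calculation of the period trade-off. It assumes a hypothetical program ./myprog and, for the arithmetic, a 2.5 GHz core clock (an illustrative figure, not a property of any particular machine). PAPI_FP_INS is not available on every CPU, so the sketch lists the supported events with hpcrun -L first; DRY_RUN=1 only prints the hpcrun commands.

```shell
#!/bin/bash
# Sketch only: ./myprog and the 2.5 GHz clock are assumptions.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Print a command, or execute it when DRY_RUN=0.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Approximate samples per second for a cycle-counting event:
# clock rate (Hz) divided by the sampling period (cycles).
samples_per_second() { echo $(( $1 / $2 )); }

run hpcrun -L ./myprog    # list the events hpcrun can sample on this machine
run hpcrun -e PAPI_TOT_CYC@4000000 -e PAPI_FP_INS@4000000 ./myprog

# Halving the period doubles the number of samples (and the overhead).
echo "period 4000000: $(samples_per_second 2500000000 4000000) samples/s"
echo "period 2000000: $(samples_per_second 2500000000 2000000) samples/s"
```

Under these assumptions a period of 4000000 cycles gives roughly 625 samples per second per busy thread, and 2000000 gives roughly 1250.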
Tracing what the application does over time
Running hpcrun -t instead of hpcrun makes HPCToolkit also record a trace of what the application does over time, stored in the measurements directory alongside the other information. To view it, follow the same steps as above, but run hpctraceviewer instead of hpcviewer.
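The tracing workflow can be sketched end to end as below, assuming a hypothetical serial program called myprog (for MPI, prefix the hpcrun line with mpirun). DRY_RUN=1 only prints the commands.

```shell
#!/bin/bash
# Sketch only: PROG=myprog is a hypothetical program name.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them
PROG=${PROG:-myprog}

# Print a command, or execute it when DRY_RUN=0.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

trace_workflow() {
  run hpcrun -t "./${PROG}"        # profile plus trace
  run hpcstruct "./${PROG}"        # binary analysis
  run hpcprof -S "${PROG}.hpcstruct" "hpctoolkit-${PROG}-measurements"
  run hpctraceviewer "hpctoolkit-${PROG}-database"   # view over time
}

trace_workflow
```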
File I/O
Running hpcrun -e IO adds file read and write information to the profile.
Installing a local copy of the user interface
It may be easier to copy a performance database to your local desktop and examine it there. Before doing this, download the hpcviewer and hpctraceviewer applications from http://hpctoolkit.org/download/hpcviewer/
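Once the viewers are installed, fetching and opening a database might look like the sketch below, run on your desktop. The remote location user@login.example.org:/path/to/run and the database name are hypothetical placeholders for your own login node, working directory and program; DRY_RUN=1 only prints the commands.

```shell
#!/bin/bash
# Sketch only: REMOTE and DB are hypothetical placeholders.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them
REMOTE=${REMOTE:-user@login.example.org:/path/to/run}
DB=${DB:-hpctoolkit-myprog-database}

# Print a command, or execute it when DRY_RUN=0.
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

fetch_and_view() {
  run scp -r "${REMOTE}/${DB}" .   # copy the whole database directory
  run hpcviewer "$DB"             # use hpctraceviewer for trace data
}

fetch_and_view
```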