Scalasca is a package for analysing profile and trace data from parallel programs, giving insight into problems such as load imbalance.

This page is a tour of some of the functionality – please see the Scalasca website for more details.

Setting the module environment

When you log in, you should ensure that your module list matches the one used to compile the program you wish to investigate. At minimum, this should include a compiler module (a set of Intel compilers is loaded by default); if the program uses MPI, it should also include an MPI module (a version of OpenMPI is loaded by default).

Once done, loading the scalasca module will make available a version appropriate to your selection. Scalasca also depends on a tool called Score-P, which instruments the program so that profile/tracing data can be collected. The scalasca module sets an environment variable, SCALASCA_MODULE_SCOREP, containing the name of the Score-P module it was built against. That module needs to be loaded when preparing the executable.

For example, if you would like to use the GNU GCC compilers and Intel’s MPI implementation, use:
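A rough sketch of such a sequence follows. The module names (intel, openmpi, gnu, intelmpi) are assumptions made for illustration; check the output of module avail for the names actually installed. Only the scalasca module and the SCALASCA_MODULE_SCOREP variable come from the text above.

```shell
# Hypothetical module names: replace with those shown by `module avail`.
default_compiler="intel"
default_mpi="openmpi"
compiler="gnu"
mpi="intelmpi"

if command -v module >/dev/null 2>&1; then
    # Swap the default compiler and MPI modules for the chosen ones.
    module swap "$default_compiler" "$compiler"
    module swap "$default_mpi" "$mpi"
    # Load Scalasca, then the Score-P module it was built against.
    module add scalasca
    module add "$SCALASCA_MODULE_SCOREP"
else
    echo "module command not available in this shell" >&2
fi
```

Running module list afterwards is a quick way to confirm that the compiler, MPI, scalasca and Score-P modules are all present.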

If you require access to the cube* commands when analysing the data, please load the cube module used to build Scalasca, via the module add $SCALASCA_MODULE_CUBE command.

Example usage

Prepare the executable

Use the normal commands to build the program, prefixing any call to mpif90, mpicc, mpicxx, ifort, gcc, etc. with scorep-, for example:
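As an illustration, assuming a hypothetical single-file MPI program in C called hello_mpi.c, an instrumented build might look like the sketch below. The file name, program name and the -O2 flag are placeholders rather than anything required by Score-P.

```shell
# Hypothetical source file and program name, for illustration only.
src="hello_mpi.c"
exe="hello_mpi"

# The uninstrumented build would be:  mpicc -O2 -o "$exe" "$src"
# The instrumented build uses the same arguments with the scorep- prefix.
if command -v scorep-mpicc >/dev/null 2>&1 && [ -f "$src" ]; then
    scorep-mpicc -O2 -o "$exe" "$src"
else
    echo "scorep-mpicc or $src not found; load the Score-P module first" >&2
fi
```

In a Makefile-based build, the same effect can usually be had by overriding the compiler variable, e.g. make CC=scorep-mpicc, assuming the Makefile honours CC.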

This ensures that the executable is linked against the appropriate libraries to intercept MPI calls, etc.

Collect profile/tracing data

Use a command of the following format from within a batch job:
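A minimal sketch, assuming a hypothetical instrumented executable ./hello_mpi and a batch system in which mpirun picks up the process count from the job allocation:

```shell
# Hypothetical program name; run this inside a batch job, not on a login node.
exe="./hello_mpi"

if command -v scalasca >/dev/null 2>&1; then
    # Prefix the usual MPI launch line with `scalasca -analyze`.
    scalasca -analyze mpirun "$exe"
else
    echo "scalasca not found; load the scalasca module first" >&2
fi
```

The only change from a normal run is the scalasca -analyze prefix; the rest of the launch line stays as it would be without measurement.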

This will create a directory with a name matching the format scorep__O_sum containing the collected data.

Analyse profile/tracing data

Use a command of the following format on a login node to analyse the data in a directory containing profile/tracing data and launch the GUI:
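A sketch of this step, in which the experiment directory name scorep_hello_mpi_O_sum is a hypothetical example (use the directory that your own run created):

```shell
# Hypothetical experiment directory produced by the collection step.
expt_dir="scorep_hello_mpi_O_sum"

if command -v scalasca >/dev/null 2>&1 && [ -d "$expt_dir" ]; then
    # Post-process the collected data and open it in the GUI.
    scalasca -examine "$expt_dir"
else
    echo "scalasca or $expt_dir not found" >&2
fi
```

The GUI needs a working graphical display, so an X-forwarding or remote desktop session to the login node is assumed here.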

Files containing a text-based version of the output can be obtained via commands like scalasca -examine -s.
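For instance, with a hypothetical experiment directory scorep_hello_mpi_O_sum, the text report could be generated and inspected as sketched below. The report file name scorep.score is what Scalasca 2.x typically writes into the experiment directory; treat it as an assumption and check the message printed by the command.

```shell
# Hypothetical experiment directory produced by the collection step.
expt_dir="scorep_hello_mpi_O_sum"

if command -v scalasca >/dev/null 2>&1 && [ -d "$expt_dir" ]; then
    # Write a text report instead of launching the GUI.
    scalasca -examine -s "$expt_dir"
    # Show the start of the report, if it was written where expected.
    if [ -f "$expt_dir/scorep.score" ]; then
        head -n 20 "$expt_dir/scorep.score"
    fi
else
    echo "scalasca or $expt_dir not found" >&2
fi
```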

Example GUI output:

Scalasca User Interface showing a computational imbalance

Collecting additional metrics

By default, only a limited number of metrics are collected about a program: time, memory allocation, and MPI.

Additional metrics can optionally be collected.

Please see the output of scorep-info config-vars for general details.

PAPI (hardware performance counters)

PAPI is a library that collects low-level hardware information (for example, use of the various caches inside a CPU), and Score-P/Scalasca can be configured to use it.

For example, export the following environment variable before the data collection step above to collect the number of L1 CPU data cache misses:
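A sketch of such a setting, using Score-P's SCOREP_METRIC_PAPI variable and the PAPI preset event PAPI_L1_DCM. It assumes that the installed Score-P was built with PAPI support and that the counter is available on the compute nodes (papi_avail lists the supported presets).

```shell
# PAPI_L1_DCM is the PAPI preset event for level 1 data cache misses.
# Assumes Score-P was built with PAPI support on this system.
export SCOREP_METRIC_PAPI=PAPI_L1_DCM
```

Several counters can be requested at once as a comma-separated list, subject to what the hardware can count simultaneously.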

MPI tracing

More detail about MPI operations can be collected by executing the following command ahead of the data collection step:
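The original command is not reproduced on this page. One Score-P setting that matches the description is SCOREP_MPI_ENABLE_GROUPS, which controls which groups of MPI functions are recorded; the sketch below is an assumption about what was intended, not a confirmed copy of it.

```shell
# Assumption: record every group of MPI functions rather than the default subset.
# Run `scorep-info config-vars` to see the accepted values on your installation.
export SCOREP_MPI_ENABLE_GROUPS=all
```

Recording more MPI groups increases measurement overhead and the amount of data written, so it is worth enabling only when that detail is needed.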