Valgrind provides a collection of tools for analysing an application: a memory error detector, a call-graph profiler, a heap profiler, and a cache and branch-prediction profiler.
This page describes very simple usage – please see the Valgrind website for more details.
Before using, please load the modules for your preferred development environment and then execute the following to make the software available:
module add valgrind
(A valgrind executable may be available before this command is executed, but it is an older version supplied by the operating system.)
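Because two versions may be present, it can be worth confirming which one your shell will actually run. The following is a minimal sketch; the function name which_valgrind is made up for this illustration and is not part of Valgrind or the module system.

```shell
# Hypothetical helper: report which valgrind executable is first on PATH
# and the version it identifies itself as.
which_valgrind() {
    if vg_path=$(command -v valgrind); then
        # "valgrind --version" prints a single line such as valgrind-<version>
        printf '%s (%s)\n' "$vg_path" "$(valgrind --version)"
    else
        echo "valgrind not found on PATH"
    fi
}
```

Running which_valgrind before and after "module add valgrind" should show the path change from the operating system copy to the module-provided one.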
Running Valgrind with codes:
Running Valgrind with MPI codes from within a batch job:
/bin/env LD_PRELOAD=$VALGRIND_HOME/lib/valgrind/libmpiwrap-amd64-linux.so mpirun valgrind
By default, valgrind applies the memcheck tool, which looks for common C/C++ memory errors like accessing unallocated/undefined memory, leaks, etc.
Select an alternative tool by using the --tool argument to valgrind, e.g.
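As an illustration of tool selection, here is a small wrapper sketch. The function name run_tool, the VALGRIND override variable and the program name ./my_program are all assumptions introduced for this example. The tool names memcheck, massif and cachegrind are standard Valgrind tools.

```shell
# Hypothetical wrapper: run_tool TOOL [VALGRIND OPTIONS...] PROGRAM [ARGS...]
# Assumption: VALGRIND is an optional override (e.g. VALGRIND=echo for a dry
# run that only prints the arguments); by default the valgrind on PATH is used.
run_tool() {
    tool=$1
    shift
    "${VALGRIND:-valgrind}" --tool="$tool" "$@"
}

# Example invocations (./my_program is a placeholder for your own executable):
#   run_tool memcheck --leak-check=full ./my_program   # memory errors and leaks
#   run_tool massif ./my_program                       # heap profiling
#   run_tool cachegrind ./my_program                   # cache usage profiling
```

The wrapper adds nothing beyond typing the command directly; it is only meant to show where the --tool argument sits relative to the other options and the program being analysed.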
Callgrind / kcachegrind
The callgrind tool can be used in combination with the kcachegrind graphical interface to profile and visualise which routines occupy the most time. Interestingly, it can also show you which CPU instructions the compiler generated for those routines.
For example, suppose you have a program that checks which numbers in a range are prime. You can run it under callgrind:
valgrind --tool=callgrind -v --dump-every-bb=100000 ./prime
The above command periodically dumps profile data while the program runs; the --dump-every-bb option controls how often, here every 100000 basic blocks.
- To relate functions to source code, ensure the binary has been built with the -g compiler flag
- To relate functions to the underlying assembly language, run valgrind with the --dump-instr=yes flag
- callgrind output files will be written to the current directory
- kcachegrind lets you visualise them
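With periodic dumps, a single run can leave many output files behind, named callgrind.out.<pid> with an optional .<part> suffix. The sketch below picks the most recently written one. The function name latest_callgrind_out is hypothetical and chosen only for this example.

```shell
# Hypothetical helper: print the newest callgrind output file in a directory
# (defaults to the current directory). Prints nothing if no file is found.
latest_callgrind_out() {
    dir=${1:-.}
    # ls -t sorts by modification time, newest first; keep only the first entry
    ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}

# Example use (commented out so that defining the helper has no side effects):
#   kcachegrind "$(latest_callgrind_out)"          # graphical view
#   callgrind_annotate "$(latest_callgrind_out)"   # plain-text summary
```

callgrind_annotate is shipped with Valgrind and is a useful fallback when no graphical display is available, for example inside a batch job.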
Example screen shots:
Showing the source code and assembly instructions windows:
Showing the call graph and a pictorial representation of wallclock time as area (top right), clearly showing that – for this short run of a toy program – the overhead of OpenMP parallelism simply isn’t worth it (routines __kmp* and for__acquire_semaphore_threaded):