Debugging with the GNU Debugger (GDB)

The GNU Debugger (GDB) is a well-known and popular tool for examining what a program does during execution. Numerous resources describe its use. Its main website is:

http://www.gnu.org/software/gdb/

Debugging MPI programs

GDB is mainly aimed at serial or multi-threaded programs. Using GDB with MPI programs can be a little fiddly. However, good MPI-aware alternatives tend to be commercial software whose licence limits the number of ranks that can be debugged at any one time, so GDB remains useful when debugging at scale.

Typically, the technique is to include an infinite loop in your code at the point where you want it to stop. Then use GDB to attach to the process for that rank and change the loop condition so that the loop exits. You can then step through the code for that rank.

If there is demand, we will provide a simple library to help with this. For now, here is an example in C, which is also callable from Fortran:

  1. Add this prototype to your source code:
  2. Add this function to your source code:
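
The listings for the two steps above are not reproduced here, so the following is only a minimal sketch of what such a prototype and function might look like, not the original code. It assumes a POSIX system. The helper arc_debug_banner and the variable wait_for_debugger are hypothetical names introduced for illustration, and the Fortran-callable wrapper mentioned above is not shown because its form depends on the compiler's calling convention.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for gethostname() under strict -std modes */
#endif

#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* gethostname, getpid, sleep (POSIX) */

/* Step 1: prototype. Place it in a header, or at the top of each
 * source file that calls the function. */
void arc_debug_attach(const char *msg);

/* Hypothetical helper: builds the text saying where this process is
 * waiting, including the host name and PID needed to attach gdb.
 * Returns the value returned by snprintf. */
int arc_debug_banner(char *buf, size_t len, const char *msg)
{
    char host[256];

    assert(buf != NULL && len > 0);
    if (gethostname(host, sizeof host) != 0) {
        snprintf(host, sizeof host, "unknown-host");
    }
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate on truncation */

    return snprintf(buf, len, "arc_debug_attach: %s (host %s, pid %ld)",
                    msg ? msg : "(no message)", host, (long)getpid());
}

/* Step 2: the function. It prints the banner, then spins until a
 * debugger sets wait_for_debugger to 0. */
void arc_debug_attach(const char *msg)
{
    /* volatile so the compiler re-reads the value on every iteration,
     * even after a debugger has changed it behind the program's back */
    volatile int wait_for_debugger = 1;
    char banner[512];

    arc_debug_banner(banner, sizeof banner, msg);
    printf("%s\n", banner);
    fflush(stdout);   /* make sure the text reaches the job output promptly */

    while (wait_for_debugger) {
        sleep(5);     /* avoid burning a core while waiting */
    }
}
```

With this sketch, releasing a waiting rank would mean attaching with gdb -p followed by the printed PID, selecting the arc_debug_attach frame (for example with the up command, since the process will usually be stopped inside sleep), and running set var wait_for_debugger = 0. After that, next or step continue from the call site.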

Add a line like arc_debug_attach("waiting here"); (C) or CALL ARC_DEBUG_ATTACH("waiting here") (Fortran) to be executed by the ranks you are interested in. When the program reaches the line, some text indicating this will be printed to standard output (normally ending up in the job output file):

To debug, log in from the login node to the compute node running your chosen MPI rank (in this case h7s3b14.arc2.leeds.ac.uk), attach gdb to the process and cause the infinite loop to exit.

Here is an example session, doing exactly that:

Note:

  • If the section of code you are debugging involves communication, it is probably easiest to have only a single rank execute the call, or at most as many ranks as you have gdb sessions you feel you can handle in different windows at the same time.
  • If it does not involve communication, having all ranks execute the call can be useful: if you step too far in one gdb session, other ranks are still waiting, so you can start a new session on one of them and try again without restarting the program.