Debugging with the GNU Debugger (GDB)

The GNU Debugger (GDB) is a well-known and popular tool for understanding what a program does during execution. Numerous resources describe its use. Its main website is:

Debugging MPI programs

GDB is mainly aimed at serial or multi-threaded programs. Using it for MPI programs can be a little fiddly. However, good MPI-aware alternatives tend to be commercial software with a limit on the number of ranks that can be debugged at any one time (depending on the license bought), so GDB remains useful when debugging at scale.

Typically, the technique is to include an infinite loop in your code at the point where you want it to stop. Then use GDB to attach to the process for the rank of interest and change the loop variable so that the loop exits. You can then step through the code for that rank.

If there is demand, we will provide a simple library to help with this. For now, here is an example in C, which is also callable from Fortran:

  1. Add this prototype to your source code:
    void arc_debug_attach(char *string);
  2. Add this function to your source code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Fortran name mangling */
    #define __arc_debug_attach arc_debug_attach_
    #define HOST_LEN 256
    /* volatile so that a change made from the debugger is seen by the loop */
    static volatile int loop = 1;
    void arc_debug_attach(char *string) {
        char hostname[HOST_LEN];
        /* Obtain hostname */
        gethostname(hostname, sizeof(hostname));
        hostname[HOST_LEN - 1] = 0;
        /* Signal to user that this process is ready */
        printf("PID %d on %s ready for debugger (%s)\n", (int)getpid(), hostname, string);
        /* Flush so the message reaches the job output file straight away */
        fflush(stdout);
        /* Loop until user sets variable "loop" to 0 with debugger */
        while (loop) {
            sleep(10);
        }
    }
    /* Fortran-callable stub */
    void __arc_debug_attach(char *string, int len) {
        /* Convert Fortran-style pointer/length string */
        /* to C-style null terminated string */
        char *new = malloc((len + 1) * sizeof(char));
        if (new == NULL) return;
        strncpy(new, string, len);
        new[len] = 0;
        arc_debug_attach(new);
        free(new);
    }

Add a line like arc_debug_attach("waiting here"); (C) or CALL ARC_DEBUG_ATTACH("waiting here") (Fortran) to be executed by the ranks you are interested in. When the program reaches that line, a message saying so is printed to standard output (which normally ends up in the job output file):

PID 63622 on ready for debugger (waiting here)

To debug, log in from the login node to the compute node that hosts your chosen MPI rank (the hostname is part of the "ready for debugger" message), attach gdb to the process, and make the infinite loop exit.

Here is an example session, doing exactly that:

[issmcd@login1.arc2 ~]$ grep 'ready for debugger' job_output.o9999
PID 63622 on ready for debugger (waiting here)
PID 64633 on ready for debugger (waiting here)
[issmcd@login1.arc2 ~]$ ssh h7s3b14
Last login: Fri Jul 11 12:16:10 2014 from

              Advanced Research Computing Node 2 (arc2)

[issmcd@h7s3b14.arc2 ~]$ gdb -p 64633
(gdb) bt
#0  0x00000030a50aca3d in nanosleep () from /lib64/
#1  0x00000030a50ac8b0 in sleep () from /lib64/
#2  0x00000000004018bd in arc_debug_attach (string=0x40c41c "waiting here")
    at arc_debug_c.c:56
#3  0x00000000004015d9 in main (argc=1, argv=0x7fffe89a58a8) at blabla.c:15
(gdb) frame 2
#2  0x00000000004018bd in arc_debug_attach (string=0x40c41c "waiting here")
    at arc_debug_c.c:56
56	        sleep(10);
(gdb) set loop=0
(gdb) break blabla.c:16
(gdb) cont

Breakpoint 1, main (argc=1, argv=0x7fffda0a3d98) at blabla.c:17
17	    MPI_Finalize();

Note:

  • If the section of code you are debugging involves communication, it is probably easiest to have only a single rank execute the attach call, or as many ranks as you feel you can handle with separate copies of gdb in different windows at the same time.
  • If it does not involve communication, having all ranks execute it can be useful: if you step too far in one gdb session, other ranks are still waiting, so you can start a new session on one of them and try again without restarting the program.
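
One way to choose at run time which ranks wait for the debugger is to read a list of rank numbers and call arc_debug_attach only on those ranks. The following is a minimal sketch, not part of the example above: the helper name arc_debug_rank_selected and the environment variable ARC_DEBUG_RANKS are hypothetical, and whether environment variables reach every rank depends on your MPI launcher.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the original example).
 * Return 1 if `rank` appears in the comma-separated list `ranks`
 * (for example "0,3,17"), otherwise 0.
 * A NULL or empty list selects no ranks.
 */
int arc_debug_rank_selected(const char *ranks, int rank)
{
    const char *p = ranks;

    if (p == NULL)
        return 0;

    while (*p != '\0') {
        char *end;
        long value = strtol(p, &end, 10);

        if (end == p) {
            /* No number starts here: skip one character and retry. */
            p++;
            continue;
        }
        if (value == rank)
            return 1;

        /* Move past the number and an optional comma separator. */
        p = end;
        if (*p == ',')
            p++;
    }
    return 0;
}

/*
 * Possible use inside an MPI program, after MPI_Comm_rank has set `rank`
 * (ARC_DEBUG_RANKS is an assumed variable name):
 *
 *     if (arc_debug_rank_selected(getenv("ARC_DEBUG_RANKS"), rank))
 *         arc_debug_attach("waiting here");
 */
```

With this in place, the set of waiting ranks can be changed between runs without recompiling, which fits both cases in the note above: a single rank for code that communicates, or all ranks for code that does not.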

Debugging a core file

If your program fails and produces a core file, you can use gdb to trace what caused the core dump. First, compile the source code with the GNU compiler using the -ggdb flag. Then start gdb with the executable and the core file as arguments (the names in angle brackets are placeholders):

    gdb <executable> <core file>

This runs on the command line, and you will notice that your prompt is now (gdb). Type bt, which stands for backtrace, to see the chain of function calls that led to the crash and so find the cause of the problem.
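
A core file is only written if the process's core file size limit allows it, and on many systems the soft limit defaults to zero. As a hedged sketch, a program can raise its own soft limit to the hard limit early in main using the POSIX getrlimit and setrlimit calls. The function name enable_core_dumps is hypothetical. If the hard limit itself is zero, or the system redirects core dumps elsewhere, no core file will appear in the working directory regardless.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft core-file size limit of the
 * calling process to its hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enable_core_dumps(void)
{
    struct rlimit lim;

    /* Read the current soft and hard limits for core files. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    return 0;
}
```

Calling enable_core_dumps() before the code that fails has a similar effect to running ulimit -c with the hard limit in the job script, but applies only to the process that calls it.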