Parallel Computing Tutorial

OpenMP

Fortran and C OpenMP example codes can be found in ~/GS/*/OpenMP , where * represents the language of your choice (either Fortran or C). The program takes an image that has undergone edge detection and reconstructs the original image iteratively. The program is initially set to perform 1000 iterations. Compile and run the code using full optimisation.

For the Intel compiler, compile the Fortran version with:
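A plausible form of the command, assuming the source file is called image.f90 (a placeholder name) and a recent Intel release (older releases use -openmp rather than -qopenmp):

    ifort -O3 -qopenmp -o image image.f90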

or the C version with:
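The corresponding C command, with the same placeholder names:

    icc -O3 -qopenmp -o image image.c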

Time the execution of this code on a single processor by launching it under time. The final residue figure gives a measure of convergence: it should approach zero. The reconstructed image is placed into the file finalimage.pgm, which can be viewed with the graphics program display.
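For example, assuming the executable is called image (a placeholder name):

    time ./image
    display finalimage.pgm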

The number of parallel threads is set through the OMP_NUM_THREADS environment variable. Increase the number of parallel threads to 2 and rerun the code. To set the number of threads to 2 via the environment variable, use:
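Assuming a bash-style shell (csh-style shells use setenv instead):

    export OMP_NUM_THREADS=2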

and to unset/delete the variable use:
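Again assuming a bash-style shell:

    unset OMP_NUM_THREADS    # the OpenMP runtime default then applies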

NOTE: When running the Intel-compiled Fortran code on the login node, you may get a segmentation fault. This is due to the way the compiler handles the stack when using OpenMP. You can resolve this by setting the stack size to unlimited, using ulimit -s unlimited on the command line. This does not happen when submitting via the queues, as unlimited is the default stack size on the compute nodes.

To compile the above code with the Portland Group Fortran compiler:
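A plausible form, using PGI's -mp flag to enable OpenMP and the same placeholder file names:

    pgf90 -O3 -mp -o image image.f90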

or C:
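And correspondingly for C:

    pgcc -O3 -mp -o image image.c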

OpenMP job submission

Below is a script that will launch an OpenMP parallel job:
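The original script is site-specific; the sketch below assumes a PBS-style batch system and an executable called image, so the directives, resource requests and file names are assumptions that may need adjusting for your system:

    #!/bin/bash
    #PBS -l select=1:ncpus=8
    #PBS -l walltime=00:20:00

    # Run from the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Match the thread count to the number of requested cores
    export OMP_NUM_THREADS=8

    ./image

Vary OMP_NUM_THREADS (and the ncpus request) to collect the timings requested below.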

Use this script to obtain accurate timing information for running this code on 1, 2, 4, 6 and 8 CPU cores for both the Intel and Portland Group compilers.

MPI

MPI is available via wrapper scripts which call the relevant compiler together with the necessary include files and library flags. Different wrapper scripts are available depending upon the choice of compiler and MPI library.

The MPI wrapper scripts take the form mpif77 (Fortran 77), mpif90 (Fortran 90), mpicc (C) and mpiCC (C++).

Compiling MPI

All compiler options applicable to the compiler being invoked are available to the wrapper scripts. Use the MPI wrappers to compile the code and link with the standard MPI library; for Intel Fortran:
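For example, with the placeholder source and executable names used above:

    mpif90 -O3 -o image image.f90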

or for C:
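And correspondingly for C:

    mpicc -O3 -o image image.c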

To launch the code, use the mpirun launcher. This takes the option -np <n>, where n is the number of processes to be launched. Execute and time the code for 1, 2 and 4 processes; e.g. for 2 processes use:
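Assuming the executable is called image (a placeholder name):

    time mpirun -np 2 ./image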

The output from this program will be placed into the file output.pgm, which can be viewed with the display command. E.g.:
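    display output.pgm    # view the reconstructed image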

To use the PGI compilers, first switch to the PGI module with module switch intel pgi . Then for PGI Fortran use:
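Assuming mpif90 now wraps the PGI Fortran compiler and the same placeholder file names:

    mpif90 -O3 -o image image.f90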

or for C:
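And correspondingly for C:

    mpicc -O3 -o image image.c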

You can then run the program in the same way as described above.

MPI Job Submission

By default, the system places jobs in an optimal fashion: cores are selected from the available resources so as to minimise the number of communication hops between the different processes. This reduces latency and should improve program performance.

Below is a script that will launch a MPI parallel job:
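Again a PBS-style sketch with assumed directives and the placeholder executable name image; adapt it to your site's scheduler:

    #!/bin/bash
    #PBS -l select=1:ncpus=8:mpiprocs=8
    #PBS -l walltime=00:20:00

    # Run from the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Launch one MPI process per requested core
    mpirun -np 8 ./image

Adjust -np (and the mpiprocs request) for the different process counts.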

Use this script to obtain accurate timings for 1, 2, 4, and 8 processes. If you have time, repeat the exercise with the PGI compiler as well as the Intel compiler.