Compiling and Running Codes

The aim of this practical tutorial is to ensure that users can compile and run different types of programs on ARC2.

The first part of the tutorial involves compiling and running a set of simple hello world type programs.

The second set of exercises involves compilation and execution of simple matrix vector multiplication code using different compiler options, introducing different optimisation levels and the -fast macro. This code is also linked to the Basic Linear Algebra Subroutine (BLAS) library.

You can download the exercises as a zip file, practicals.zip, or as a tarred and gzipped file, practicals.tar.gz (right-click on the link and save the file, then open the folder containing it). These files were also used in the web page Using Linux the Basics.

The example code is provided in both C and Fortran; please choose the language you are more comfortable with.

Note about modules

The modules command is installed on the system so that several compilers and their corresponding libraries can co-exist. In addition, numerous software applications (see this list of applications) are available to load via the module command. When a module is loaded or unloaded, the user environment is altered so that the desired software can be used. To check which modules are currently loaded, the command:


$ module list 

can be issued at any time. To see the complete list of modules, together with a brief description of each, use:


$ module whatis 

to view a list of all available modules, with no descriptions use:


$ module avail 

Modules can be loaded with:


$ module load <module>

unloaded with:


$ module unload <module>

To swap one currently loaded module for an alternative, e.g. to change between compiler-specific sets of modules, use:


$ module switch <old module> <new module>

e.g.


$ module switch intel pgi 

will switch to using the PGI compiler from the currently loaded environment which uses the Intel compiler.

Basic compilation using the Intel compiler

This is a simple exercise to introduce you to the compilation of C/C++ and Fortran 77/90 programs. The example code can be found in ~/GS/compiling . The Intel compiler is loaded by default when you log in to ARC2. If you switched to the PGI compiler in the last section, you can switch back to the Intel compiler with the module switch command:


$ module switch pgi intel

C

The C code can be found in the file hello.c . To compile it using the Intel compiler issue the command:


$ icc -o hello hello.c

This will produce an output file called hello . To run it, issue the command:


$ ./hello

It should produce the following output:


You have successfully compiled and run a C program!

C++

Example C++ code can be found in the file greetings.cc . To compile it, issue the command:


$ icpc -o greetings greetings.cc

This will produce an output file called greetings. To run it, issue the command:


$ ./greetings

Fortran 90/77:

A Fortran 90 example is included in the file easy.f90. Compile and run this by issuing the commands:

 
$ ifort -o easy easy.f90
$ ./easy

A Fortran 77 version can be found in the file simple.f. Invoke the Fortran 77 compiler using:

 
$ ifort -o simple simple.f 

Basic compilation using the Portland Group compilers

To set the PGI compiler up you must first switch your environment using the module command:


$ module switch intel pgi

The PGI compiler can now be used to compile the above programs using: pgcc for C, pgCC for C++, pgf90 for Fortran 90 and pgf77 for Fortran 77. Repeat the exercise above with the PGI compiler:

i.e. to compile the C code:

 
$ pgcc -o hello hello.c 

to compile the C++ code:


$ pgCC -o greetings greetings.cc 

to compile the Fortran 90 code:


$ pgf90 -o easy easy.f90 

and for Fortran 77:


$ pgf77 -o simple simple.f 

To switch your environment back to Intel, issue the command:


$ module switch pgi intel

Basic compilation using the GNU compilers

The GNU compilers (gcc for C, g++ for C++, g77 for Fortran 77 and gfortran for Fortran 90) are also available on the system. Although the operating system's native version of the GNU compilers is normally included in your PATH, it is best to load the module with the latest version of the compiler so that any corresponding libraries are available in your environment:


$ module switch intel gnu/4.8.1 

Then, to compile the C code:


$ gcc -o hello hello.c 

to compile the C++ code:


$ g++ -o greetings greetings.cc 

to compile the Fortran 77 code:


$ g77 -o simple simple.f 

and for Fortran 90:


$ gfortran -o easy easy.f90

Compiler Flags

In this exercise, different compiler flags are introduced and the performance of a simple (matrix * vector) code is analysed. The code can be found in ~/GS/*/flags where * represents the language of your choice, i.e. C or Fortran.

The code performs a (matrix * vector) operation in three different ways: looping over matrix columns in the innermost loop, looping over matrix rows in the innermost loop, or using the BLAS library routine DGEMV.

Please work with either the C or Fortran codes depending upon which compiler you will use the most.

Note about numerical libraries

There are several numerical libraries installed on the system. Currently there are four: Intel's Math Kernel Library (MKL), AMD's Core Math Library (ACML), the Automatically Tuned Linear Algebra Software (ATLAS) and the original Netlib reference implementation. All of these libraries, aside from Netlib, are optimised to run on the available hardware. To load a specific library, e.g. MKL:


$ module load mkl

to switch to another version of the libraries, e.g. ACML:


$ module switch mkl acml

Initial Fortran compilation

Intel compiler

As a first step, simply compile the code as in the first exercise, i.e.:


$ ifort -o matmul matmul.f90 

This will give an error that the BLAS DGEMV routine cannot be found. Several versions of this library are installed on the system. After loading one of them (see the note about numerical libraries above), you can correctly link to it using the customised environment variable $ARC_LINALG_FFLAGS .

For example to load the MKL library use:


$ module load mkl

and to compile the code use:


$ ifort -o matmul matmul.f90 $ARC_LINALG_FFLAGS

You can switch to another version of the numerical libraries and use the same compile line.

In each case once the program has successfully compiled, run the executable by typing its name on the command line, comparing the execution times:


$ ./matmul 

The program will print out the size of the problem, the memory used and the timing (in seconds) of each of the sections. A calculation of performance, expressed in megaflops, is also printed.

To obtain global timing information, the time command can be used when running the code:


$ time ./matmul 

You can switch to another version of the numerical libraries, use the same compile line and run the result in the same way.

At this stage, how does the performance of the different ways the calculation is performed compare? How does
this change with problem size?

Portland Group compiler

This exercise can be repeated using the PGI compiler. First change the modules loaded to switch to using this
compiler:


$ module switch intel pgi 

You can then compile the code, while linking to the loaded numerical library of your choice, using:


$ pgf90 -o matmul matmul.f90 $ARC_LINALG_FFLAGS

Run the executable by typing the file name on the command line:


$ ./matmul

Initial C compilation

Intel compiler

As an initial step, simply compile the code as in the first exercise, i.e:


$ icc -o matmul matmul.c 

This will give an error that the BLAS DGEMV routine cannot be found. Several versions of the BLAS library are installed on the system. After loading one of them (see the note about numerical libraries above), you can correctly link to it using the customised environment variable $ARC_LINALG_CFLAGS .

For example to load the MKL library use:


$ module load mkl

and to compile the code use:


$ icc -o matmul matmul.c $ARC_LINALG_CFLAGS

Once the program has successfully compiled, run the executable by typing its name on the command line, comparing the execution time:


$ ./matmul 

The program will print out the size of the problem, the memory used and the timing (in seconds) of each of the sections. A calculation of performance, expressed in megaflops, is also printed.

To obtain global timing information, the time command can be used when running the code:


$ time ./matmul 

At this stage, how does the performance of the different ways the calculation is performed compare? How does this change with problem size?

PGI compiler

This exercise can be repeated using the PGI compiler. First change the modules loaded to switch to using this
compiler:


$ module switch intel pgi

As before, use one of the available libraries; e.g. to use MKL, if you have not already loaded it, issue the command:


$ module load mkl

and to compile the code use:


$ pgcc -o matmul matmul.c $ARC_LINALG_CFLAGS

Run the executable by typing the file name on the command line:


$ ./matmul

Optimisation

Until now, we have not asked the compiler to optimise the code at all. There are many options you can experiment with in order to get the most out of your code. A small subset of these is introduced here for the Intel and PGI compilers:

Intel compilers

By default the Intel compiler uses the -O2 optimisation flag. This can be turned off with the -O0 flag, for no optimisations, e.g. for Fortran:


$ ifort -O0 -o matmul matmul.f90 $ARC_LINALG_FFLAGS

or for C:


$ icc -O0 -o matmul matmul.c $ARC_LINALG_CFLAGS

Now substitute the more aggressive -O3 flag. How does the runtime of the code differ with these options?

The -fast flag enables a combination of optimisation options which in general improve the runtime of code. There are also two architecture-specific optimisation options: -xSSE4.2 , which produces code specifically optimised for the current architecture, and -axSSE4.2 , which produces both the specialised code and generic code that will run on other processors. Experiment with these flags to see how the performance of the code alters.

Portland Group compilers

For the PGI compiler add the -O flag for default optimisations, e.g. for Fortran:


$ pgf90 -O -o matmul matmul.f90 $ARC_LINALG_FFLAGS 

or for C:


$ pgcc -O -o matmul matmul.c  $ARC_LINALG_CFLAGS

Now increase the optimisation level to 3 with -O3 and observe how this affects the performance. There is also a -fastsse flag, which enables instructions specific to the current architecture. Experiment with these options to see how performance is affected.