Compiling and Running Codes

The aim of this practical tutorial is to ensure that users can compile and run different types of programs on ARC2.

The first part of the tutorial involves compiling and running a set of simple hello world type programs.

The second set of exercises involves compilation and execution of a simple matrix-vector multiplication code using different compiler options, introducing different optimisation levels and the -fast macro. This code is also linked to the Basic Linear Algebra Subprograms (BLAS) library.

You can download the exercises as a zip file, practicals.zip, or as a tarred and gzipped file, practicals.tar.gz (right-click on the link and save the file, then open the folder that the file is in). These files were also used in the web page Using Linux: the Basics.
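Once the file has been copied to ARC2, the archive can be unpacked with the standard tools, for example:

    unzip practicals.zip          # for the zip file
    tar -xzf practicals.tar.gz    # for the tarred and gzipped file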

The example code is provided in both C and Fortran; please choose the language you are more comfortable with.

Note about modules

The module command is installed so that several compilers and their corresponding libraries can co-exist on the system. In addition, there are numerous software applications (see this list of applications) available to load via the module command. When a module is loaded or unloaded, the user environment is altered so that the desired software can be used. To check which modules are currently loaded, the command:
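    module list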

can be issued at any time. To see the complete list of modules, together with a brief description of each, use:
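    module whatis    # prints each available module with its one-line description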

to view a list of all available modules, with no descriptions use:
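    module avail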

Modules can be loaded with:
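    module load <module_name>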

unloaded with:
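    module unload <module_name>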

To switch between the currently loaded collection of modules and an alternative set of compiler-specific modules,
use:
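    module switch <loaded_module> <new_module>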

e.g.
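    module switch intel pgi    # module names are illustrative; check module avail for the exact names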

This will switch from the currently loaded environment, which uses the Intel compiler, to the PGI compiler.

Basic compilation using the Intel compiler

This is a simple exercise to introduce you to the compilation of C/C++ and Fortran 77/90 programs. The example code can be found in ~/GS/compiling. The Intel compiler is loaded by default when you log in to ARC2. If you switched to the PGI compiler in the last section, you can switch back to the Intel compiler with the module switch command:
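    module switch pgi intel    # module names are illustrative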

C

The C code can be found in the file hello.c. To compile it using the Intel compiler, issue the command:
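    icc hello.c -o hello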

This will produce an output file called hello. To run it, issue the command:
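    ./hello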

It should produce a short hello-world greeting as output.

C++

Example C++ code can be found in the file greetings.cc. To compile it, issue the command:
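    icpc greetings.cc -o greetings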

This will produce an output file called greetings. To run it, issue the command:
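    ./greetings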

Fortran 90/77

A Fortran 90 example is included in the file easy.f90. Compile and run this by issuing the commands:
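    ifort easy.f90 -o easy    # the output name easy is illustrative; any name passed to -o will do
    ./easy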

A Fortran 77 version can be found in the file simple.f. Invoke the Fortran 77 compiler using:
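    ifort simple.f -o simple    # ifort also compiles fixed-form Fortran 77 source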

Basic compilation using the Portland Group compilers

To set the PGI compiler up you must first switch your environment using the module command:
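    module switch intel pgi    # module names are illustrative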

The PGI compiler can now be used to compile the above programs using: pgcc for C, pgCC for C++, pgf90 for Fortran 90 and pgf77 for Fortran 77. Repeat the exercise above with the PGI compiler:

i.e. to compile the C code:
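    pgcc hello.c -o hello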

to compile the C++ code:
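    pgCC greetings.cc -o greetings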

to compile the Fortran 90 code:
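    pgf90 easy.f90 -o easy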

and for Fortran 77:
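    pgf77 simple.f -o simple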

To switch your environment back to Intel, issue the command:
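    module switch pgi intel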

Basic compilation using the GNU compilers

The GNU compilers (gcc for C, g++ for C++, g77 for Fortran 77 and gfortran for Fortran 90) are also available on the system. Although the operating system's native version of the GNU compilers is normally included in your PATH, it is best to load the module with the latest version of this compiler so that any corresponding libraries are available in your environment:
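    module load gnu    # the module name is a guess; check module avail for the exact GNU compiler module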

Then, to compile the C code:
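    gcc hello.c -o hello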

to compile the C++ code:
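    g++ greetings.cc -o greetings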

to compile the Fortran 77 code:
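    g77 simple.f -o simple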

and for Fortran 90:
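    gfortran easy.f90 -o easy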

Compiler Flags

In this exercise, different compiler flags are introduced and the performance of a simple (matrix * vector) code is analysed. The code can be found in ~/GS/*/flags where * represents the language of your choice, i.e. C or Fortran.

The code performs a (matrix * vector) operation in three different ways: looping over columns in the innermost loop, looping over matrix rows in the innermost loop, or using the BLAS library routine DGEMV.
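As a rough sketch only (this is not the actual exercise source, and the names below are made up for illustration), the first two variants differ in which index the innermost loop runs over, and therefore in how they walk through memory; the third variant simply hands the whole operation to DGEMV. In C, with the matrix stored row-major, the two loop orderings look something like:

    /* y = A*x for an n x n matrix A stored row-major; illustrative sketch only */
    void matvec_inner_cols(int n, const double *A, const double *x, double *y)
    {
        /* innermost loop over the column index j: walks along a row of A */
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i * n + j] * x[j];
            y[i] = sum;
        }
    }

    void matvec_inner_rows(int n, const double *A, const double *x, double *y)
    {
        /* innermost loop over the row index i: strides down a column of A */
        for (int i = 0; i < n; i++)
            y[i] = 0.0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                y[i] += A[i * n + j] * x[j];
    }

The exercise code times each of these variants against the equivalent call to DGEMV from whichever BLAS library is linked in.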

Please work with either the C or Fortran codes depending upon which compiler you will use the most.

Note about numerical libraries

There are several versions of numerical libraries installed on the system. Currently there are four: Intel's Math Kernel Library (MKL), AMD's Core Math Library (ACML), the Automatically Tuned Linear Algebra Software (ATLAS) and the original Netlib. All of these libraries, aside from Netlib, are optimised to run on the available hardware. To load a specific library, e.g. MKL:
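    module load mkl    # library module names are illustrative; check module avail for the exact names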

to switch to another version of the libraries, e.g. ACML:
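    module switch mkl acml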

Initial Fortran compilation

Intel compiler

As a first step, simply compile the code as in the first exercise, i.e.:
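    ifort matvec.f90 -o matvec    # matvec.f90 stands in for the Fortran source file in the flags directory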

This will give an error that the BLAS routine DGEMV cannot be found. Several versions of this library are installed on the system. After loading one of these libraries (see the note about numerical libraries above), you can link to it correctly using the customised environment variable $ARC_LINALG_FFLAGS.

For example to load the MKL library use:
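    module load mkl    # as before, the module name may differ on your system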

and to compile the code use:
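    ifort matvec.f90 $ARC_LINALG_FFLAGS -o matvec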

You can switch to another version of the numerical libraries and use the same compile line.

In each case, once the program has successfully compiled, run the executable by typing its name on the command line and compare the execution times:
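    ./matvec    # or whatever name you gave to -o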

The program will print out the size of the problem, memory size used and timing (in seconds) of each of the
sections. A calculation of performance, expressed in megaflops is also printed.

To obtain global timing information, the time command can be used when running the code:
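    time ./matvec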

You can switch to another version of the numerical libraries, use the same compile line, and execute the result in the same way.

At this stage, how does the performance of the different ways the calculation is performed compare? How does
this change with problem size?

Portland group compiler

This exercise can be repeated using the PGI compiler. First change the modules loaded to switch to using this
compiler:
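    module switch intel pgi    # module names as in the earlier examples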

You can then compile the code, while linking to the loaded numerical library of your choice, using:
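    pgf90 matvec.f90 $ARC_LINALG_FFLAGS -o matvec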

Run the executable by typing the file name on the command line:
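    ./matvec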

Initial C compilation

Intel compiler

As an initial step, simply compile the code as in the first exercise, i.e.:
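    icc matvec.c -o matvec    # matvec.c stands in for the C source file in the flags directory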

This will give an error that the BLAS routine DGEMV cannot be found. Several versions of the BLAS library are installed on the system. After loading one of these libraries (see the note about numerical libraries above), you can link to it correctly using the customised environment variable $ARC_LINALG_CFLAGS.

For example to load the MKL library use:
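    module load mkl    # as before, the module name may differ on your system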

and to compile the code use:
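    icc matvec.c $ARC_LINALG_CFLAGS -o matvec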

Once the program has successfully compiled, run the executable by typing its name on the command line, comparing the execution time:
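    ./matvec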

The program will print out the size of the problem, memory size used and timing (in seconds) of each of the sections. A calculation of performance, expressed in megaflops is also printed.

To obtain global timing information, the time command can be used when running the code:
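    time ./matvec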

At this stage, how does the performance of the different ways the calculation is performed compare? How does this change with problem size?

PGI compiler

This exercise can be repeated using the PGI compiler. First change the modules loaded to switch to using this
compiler:
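    module switch intel pgi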

As before, use one of the available libraries; e.g. to use MKL, if you have not already loaded it, issue the command:
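    module load mkl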

and to compile the code use:
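    pgcc matvec.c $ARC_LINALG_CFLAGS -o matvec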

Run the executable by typing the file name on the command line:
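    ./matvec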

Optimisation

Until now, we have not allowed the compiler to optimise the code at all. There are many options which can be experimented with in order to get the most out of your code. A small subset of these is introduced here for the Intel and PGI compilers:

Intel compilers

By default the Intel compiler uses the -O2 optimisation flag. This can be turned off by using the -O0 flag, for no optimisation, e.g. for Fortran:
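    ifort -O0 matvec.f90 $ARC_LINALG_FFLAGS -o matvec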

or for C:
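    icc -O0 matvec.c $ARC_LINALG_CFLAGS -o matvec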

Now substitute the more aggressive -O3 optimisation level. How does the runtime of the code differ with these options?

The -fast flag includes a combination of optimisation options which in general improve the runtime of code. There are two architecture-specific optimisation options available: -xSSE4.2, which produces code specifically optimised for the current architecture, and -axSSE4.2, which produces both the specialised code and generic code that will run on other processors. Experiment with these flags to see how the performance of the code alters.
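For example, using the same placeholder file names as above, the Fortran version could be rebuilt with either of:

    ifort -O3 -xSSE4.2 matvec.f90 $ARC_LINALG_FFLAGS -o matvec
    ifort -O3 -axSSE4.2 matvec.f90 $ARC_LINALG_FFLAGS -o matvec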

Portland Group compilers

For the PGI compiler, add the -O flag for default optimisations, e.g. for Fortran:
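    pgf90 -O matvec.f90 $ARC_LINALG_FFLAGS -o matvec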

or for C:
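    pgcc -O matvec.c $ARC_LINALG_CFLAGS -o matvec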

Now increase the optimisation level to 3 (-O3) and observe how this affects the performance. There is also a -fastsse flag which allows instructions specific to the current architecture to be included. Experiment with these options to see how performance is affected.
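For example, for the Fortran version (placeholder file names as before):

    pgf90 -fastsse matvec.f90 $ARC_LINALG_FFLAGS -o matvec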