Using Linux: The Basics

Introduction

This page is a tutorial on the basics of Linux, intended to help new users get the most out of the ARC systems. It assumes readers have very little prior knowledge, so it starts at a very basic level and builds up. It is designed both to help those who are completely new to Linux and to fill gaps for those who already know a little. Feel free to skip sections of the tutorial that do not seem useful.

The materials here are tailored for the ARC systems and borrow some ideas from the Software Carpentry course called The Unix Shell. If you wish to explore the topics introduced on this page in more detail, please book onto one of our courses or work through the Software Carpentry courses that are available online.

What is a Computer?

A computer has 4 basic functions:

  • It runs programs
  • It stores data
  • It communicates with other computers
  • It communicates with people

Operating Systems

If you are not sure what an operating system is, you probably use a Windows PC and this is the first time you have used a supercomputer.

The operating system is the most important program that runs on a computer. It controls and co-ordinates the computer’s activities – taking inputs from the keyboard, mouse and network and passing output data to the screen and network while also working with and sharing memory, disc and other resources between applications and users of the system.

The operating system sits between the hardware (the physical parts of the computer) and the user, along with all the applications (software) that the user requires. Many diagrams of this relationship look like an onion.

Diagram of the Operating System
Diagram to show the relationship between the hardware, the operating system, shells, applications and users.

The Command Line

There are 2 types of operating system. The type that most people are familiar with aims to be user friendly and so has a sophisticated graphical user interface (GUI). Windows is this type of operating system.

The other type is designed for security and efficiency and requires people to interact with it by typing at the keyboard. The text that is typed into the computer has a very specific syntax and orders the computer to do very specific tasks; these pieces of text are called commands. Commands are typically only one line long and so this way of communicating with a computer is termed “through the command line”. Linux is this type of operating system.

Just about all supercomputers use linux as their operating system and the ARC systems are no exception.

Some Basic Terminology

The place where the commands are typed is called the shell, terminal or window. The shell is an area where you can interactively type text into the computer. Typically the shell has a black background and green or white text. The colour scheme can be changed and I tend to do this.

The linux shell is also known as a terminal or window.

The text in the shell reads from left to right. On the far left is the prompt which is provided by the computer and it can consist of just one character e.g., $ but often it has other information. In the image of the shell above the prompt includes my user name so I have blurred that for this tutorial.

To the right of the prompt you, the user, type in your commands. A command consists of a short word such as “ls”. The words are short to reduce the number of letters that need to be typed. The “ls” command lists the files in a directory/folder. This command can be used just as “ls” in its bare or vanilla form. Extra functionality can be added by adding flags and arguments after the command. The command is only executed once the user ends the line by pressing the RETURN (or ENTER) key.

Linux 'ls' command
The linux ls command: 1 is the prompt, 2 is the command, 3 is the flag and 4 is the argument.

Do not worry about the purpose of the “ls” command as this is covered later in this tutorial.

Conventions for the Documentation of Commands and Code

There are conventions used when documenting the use of linux commands. I will describe these here so that I can use them on the rest of this page (these conventions are used throughout this web site).

The commands and code are given a special font. I used that font in the last sentence on the words “commands and code”. Blocks of code are presented as a separate block of text that uses the special font for code and has a coloured background when viewed in the browser (although not always when printed). An example is given below.

If the documentation is presenting commands that are typed at the command line then the start of each line is given a standard prompt, which is the ” $ ” character. The command line example given above is a screen grab of a shell; however, it is repeated below in standard documentation style.

The General Format of Linux Commands

Linux commands have the general format: command [options] [item]

N.B. The options can also be called the flag while the item can also be called the argument.

Some general information on commands:

  • Commands are case sensitive, so for instance ls is a valid command which lists the contents of a directory, while LS is not a recognised command.
  • Command options usually have a long and short form, for example the commands:
    • ls -a and ls --all will both list all files in a directory.
  • Command options can be combined or listed separately. For instance:
    • ls -al is equivalent to ls -a -l . The -l option signifies long listing.
  • Usually the command item is given last. Often this is a file or directory name, for example:
    • ls -l GS and not ls GS -l
  • To find out the details of a specific command you can use the manual pages via the command man [command] . (If a man page is not available try the help page, see the next bullet point.) For instance:
    • man man will display basic information about the online manual pages. Press q on the keyboard to quit.
    • man ls will display the manual page for the command that lists directory contents. Press q on the keyboard to quit.
  • Further information about a specific command can often be accessed through the help option [command] --help . For instance:
    • ls --help will display the help information on the ls command.
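
The points about options above can be tried out safely with a short sketch like the one below. It works in a throwaway directory, and the file names visible.txt and .hidden.txt are made up for the illustration. The $ prompt is left off so that the lines can be pasted straight into a shell.

```shell
# Work in a throwaway directory so none of your own files are touched.
demo=$(mktemp -d)
cd "$demo"
touch visible.txt .hidden.txt

ls          # bare form: hidden "dot" files are not shown
ls -a       # -a also shows .hidden.txt (long form on GNU systems: ls --all)
ls -a -l    # options listed separately ...
ls -al      # ... or combined: the output is the same

# Keep both listings so they can be compared.
separate=$(ls -a -l)
combined=$(ls -al)
```

Comparing the two saved listings (for example with a test such as [ "$separate" = "$combined" ]) is one way to convince yourself that combining options does not change the result.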

Organizing Your Data – Directory/Folder Structures

A number of conventions exist to help people organise data on computers. All data is stored in files and these are grouped into directories (or folders) – this terminology uses visual metaphors for how data was stored on paper in the 1970s. Many of these visual metaphors are used in computing even though many of the objects referenced in the metaphor no longer exist. If you are interested you can read more on visual metaphors here.

Linux tools to organise data use terms similar to stationery
Tools to organise data on Linux systems rely on old fashioned visual metaphors to do with stationery.

It is easier to find data if it is stored in a systematic way that uses directories to group similar files together.

Conventions of the Linux File System

To make it easy to use and manage computers that use the Linux operating system there are set conventions about where different types of data and software are stored as well as conventions about where a user is directed to when they login.

The Linux Filesystem Simplified

The top directory on any computer is termed the root directory and is the one that all other directories are in. The top or root directory is provided by the operating system and is sometimes simply called the root. The directories that come directly off the root are well known and used by the system administrator. As a normal user on a large system such as the ARC systems you will not be able to write to these directories.

Your Home Area and Your /nobackup Area

When you as a user of a computer log in you will not be directed to root but to a directory that is just for you. This is called your user area or your $HOME directory. On the ARC systems the directory that you see when you log in has the same name as your University-wide userid. The directory that your $HOME directory sits in is often called home or Users . This is not the case for the ARC systems: they have so many users that more than one directory is needed to hold all the user accounts, and these directories map to 2 different servers that handle logins. You will notice this later in the tutorial when you log in.

User Areas In the Linux Filesystem are grouped in one directory
User Areas In the Linux Filesystem.

When you are using a Linux system it is important to know where you are. This means you need to know the path. The absolute path is the list of directories, starting from the root, that leads to your current location. Any data or software on the system also has a path.

Diagram showing that the Directory Tree is Upside Down.
The Directory Tree is Upside Down.

Sometimes the absolute path may be very long and so there are also relative paths. The relative path is the list of directories between two directories.

Diagram to show the Relative Path.
The Relative path is the path between files or directories.

Relative paths are useful because they can be much shorter than an absolute path, making them easier for a human to read. They also keep working on collections of files that have the same structure but sit on different machines, so they are useful for particular software development and maintenance tasks. Relative paths tend to start with ./ which means your current directory, or with ../ which means the directory one level up from your current directory.
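
As a rough sketch of the difference, the lines below reach the same directory once by a relative path and once by an absolute path. The directory names project, code and data are hypothetical and are created in a throwaway location; they are not the directories shown in the diagram.

```shell
top=$(mktemp -d)            # stands in for some directory on the system
mkdir -p "$top/project/code" "$top/project/data"

cd "$top/project/code"      # absolute path: written out in full from the root, /
pwd

cd ../data                  # relative path: up one level, then down into data
relative_result=$(pwd)

cd "$top/project/data"      # the same place, reached by its absolute path
absolute_result=$(pwd)

echo "$relative_result"
echo "$absolute_result"
```

The relative form is much shorter, and it would still work if the whole project directory were copied to another machine.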

The relative path in the diagram can be written as:

while the absolute path would be:

/nobackup

The ARC systems have a directory called /nobackup. This directory has an especially fast connection to the compute nodes, so if your program reads data from and writes data to this directory your jobs are likely to run much faster. We recommend that you create a directory in the /nobackup directory and set your program to read and write to that. Most people create a directory that has the name of their userid. When you create a directory in /nobackup others cannot read your data unless you set its permissions so that they can. There is more about permissions later in this tutorial. Please see our advice on /nobackup.

Login

We are now ready to log in so that we can look at the file systems, create directories and do some other simple exercises. Rather than repeat information that is elsewhere on the web site, it is best if you follow the login instructions for your type of local computer. Log in with X-Windowing as we will need that in the exercises that are coming up. You do not need to know the details of X-windowing but you need to know of its existence, as it enables important display functionality when you remotely log in to any supercomputer: X-windowing must be enabled whenever you wish to use a graphical user interface. Instructions on how to transfer files from each type of local computer are also included in these web pages. You will need to follow those instructions soon.

Moving Around the Linux File System

Each user has a home directory and the location of this directory is its path. Type the command pwd and the system will print the path of the directory you are in which, straight after login, is your home directory. Don’t forget that in this documentation the $ character acts as the prompt because you are at the command line; you do not type it. The letters pwd stand for print working directory.

Type the command mkdir hello and you have just created a directory called hello in your home directory. You can see this directory by typing the list command ls .

The list of files in your home directory should include the directory that you just created called “hello”. Your home directory is also known as $HOME or ~ . Type the ls command with each of these names for your home directory as the input to the command. That means you type ls $HOME and then ls ~ . Confirm to yourself that you see the same list of files with each of these commands.

All files and directories on a computer have a path. The path to a location from the root is its absolute path, while the path from one file or directory to another is a relative path. If you type cd hello followed by pwd you will see that the path has changed, because you are now in a different directory.

We now wish to remove the hello directory so we can continue with the tutorial. We cannot delete a directory we are in, so first move back up with cd ../ and then remove the directory with rmdir hello .

N.B. The letters ../ mean go up a directory (towards the root) while the letters ./ mean the current directory.

The location of all files and directories on a computer can be drawn as a tree. The convention is to draw the tree upside down so that its root (the main trunk of the entire file system) is at the top of the diagram. This means you move “down” the tree towards the leaves and “up” the tree towards the root which is the opposite direction to a real tree.
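
The steps in this section can be strung together as one short session. This is only a sketch: it runs in a scratch directory that stands in for your home directory, so that nothing in your real home area is changed, and the $ prompt is left off so the lines can be pasted as they are.

```shell
cd "$(mktemp -d)"   # a scratch directory standing in for your home directory
start=$(pwd)        # pwd prints the current (working) directory

mkdir hello         # create a directory called hello
ls                  # the listing now includes hello

cd hello            # move down the tree into hello
pwd                 # the path now ends in /hello
inside=$(pwd)

cd ../              # move back up one level, towards the root
rmdir hello         # rmdir only removes empty directories
ls                  # hello is gone
```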

File Transfer

Now you are on the system it is useful to transfer data on and off it. There are some data files that you can download here so that you can do some very simple exercises.

Compressed Files and File Archives (.zip and .tar.gz)

Transferring data onto and off a supercomputer can be slow. One of the best ways to reduce the time it takes is to reduce the size of the file/files. Compressing the files is a good way to do that. The software that compresses files can often also group files into one collection which is called an archive.

The standard compression and archive file for Windows is known as a zip file and has the .zip file extension. If you use a Windows machine and wish to work through the examples on this page download the practicals.zip file. (right click on the link and save the file, then open the folder that the file is in)

The standard archive file format for linux is known as a tar file and has the .tar file extension. Tar files do not have to include compression and the user can choose the type of compression so that it provides optimal compression on their data. One of the most commonly used compression formats is the gz compression format and its file extension is .gz . If you use a linux machine and wish to work through the examples on this page download the practicals.tar.gz file. (right click on the link and save the file, then open the folder that the file is in)

Unpacking the Practical Exercises – Windows Format (.zip)

To unpack the practical exercises file enter unzip practicals.zip at the command line.

The practical exercises should be unpacked into the directory $HOME/GS , where $HOME is your home directory on ARC2.

Unpacking the Practical Exercises – linux Format (.tar.gz)

You will notice that the practical exercises file you have transferred to ARC2 has two file extensions: tar and gz. This means that it is a compressed (the gz bit) archive (the tar bit; tar is short for ‘tape archive’, as all the files have been combined into a single archive).
To unpack the practical files enter tar -zxvf practicals.tar.gz at the command line.

You will note that the tar instruction has been passed four parameters: zxvf

  • z = uncompress the archive with gzip
  • x = extract the files from the archive
  • v = verbose output (so we know what is happening)
  • f = the name of the archive file follows

The practical exercises should be unpacked into the directory $HOME/GS , where $HOME is your home directory on ARC2.
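
If you would like to see the tar options at work before touching the real practicals file, the sketch below builds a small compressed archive and unpacks it again. The archive name practicals_demo.tar.gz and the file notes.txt are made up for the illustration.

```shell
cd "$(mktemp -d)"
mkdir GS
echo "example data" > GS/notes.txt

# c = create, z = compress with gzip, v = verbose, f = archive file name follows
tar -czvf practicals_demo.tar.gz GS

rm -r GS    # remove the original so that we can watch it come back

# x = extract, z = uncompress with gzip, v = verbose, f = archive file name follows
tar -xzvf practicals_demo.tar.gz

restored=$(cat GS/notes.txt)
echo "$restored"
```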

Some Simple File Handling Exercises

To do these exercises you will need to use some more Linux commands and a text editor. Before we start the exercises I have given some information on some of the common file handling commands along with information on text editors.

Some Useful Linux Commands

Command  Description
ls       List the contents of a directory.
more     Print to screen the content of a file.
head     Print to screen the top section of a file.
tail     Print to screen the bottom section of a file.
mkdir    Create a directory.
rm       Delete a file or directory.
cp       Copy a file or directory.
mv       Move a file or directory.
zip      Create a zip file; an archive file that is compressed.
unzip    Unpack a zip file; an archive file that is compressed.
tar      Create or unpack a tar file; an archive file that is not compressed unless a compression option is used.
gzip     Compress or uncompress a file.
grep     Search for a text string in a file.
find     Search for a file with a particular name or property such as the date it was changed.

There is a great online resource that dissects any shell command you type in and displays help text for each piece of the command:

http://explainshell.com/

Text Editing

As you can see, text is the fundamental interface that we have with a Linux computer. It is important to be able to write text and save it in files. This is done with text editors. There are several text editors on the ARC systems and they fall into 2 main types. The first type are in-shell editors and these include vi and nano . The second type are windowed text editors and these include gedit and emacs.

All text editors are launched by typing the name of the editor. If you wish to edit a particular file then you can open it once you have launched the editor or you can type the name of the file after the name of the editor.

In-Shell Text Editors – X-windows are not required

This type of text editor appears in the shell and stops you from using the shell for other purposes while you edit your files. Interaction with the editor is via quick-key commands. These editors are very simple and only require a very basic login that does not handle graphical user interfaces. System administrators like these because they work in almost any state the system is in.

vi quick-key commands are available here:

http://www.shortcutworld.com/en/linux/vi.html

http://www.nano-editor.org/dist/v2.5/nano.html

http://vim.sourceforge.net/docs.php

Windowed Text Editors – X-windows are required

This type of text editor pops up in another window and has a graphical user interface. It is sensible to type an ” & ” character at the end of a command that launches this type of editor. The ” & ” character makes a command run in the background and this means that you can carry on using your shell while you have the text editor open.

http://www.shortcutworld.com/en/linux/Emacs_23.2.1.html

http://www.shortcutworld.com/en/linux/gedit_2.3.html

https://www.gnu.org/software/emacs/

File Handling Exercises

Now that you have transferred and unpacked the practical exercises we are ready to take a look at the files that we have on our system. If you type ls you will see that there is now a directory called GS. At this time we do not know what is in this directory. The directory contains Fortran, C and C++ source code. If you only use installed software you probably do not need to use source code; if this is the case do not worry, these are just files that we can use for the exercises.

I like writing code in C so I am interested in finding if there are any files in the GS directory that have the .c file extension. I can do that with the linux find command.
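
A search of this kind might look like the sketch below. It does not use the real GS directory: GS_demo and the files inside it are hypothetical stand-ins created just for the illustration.

```shell
cd "$(mktemp -d)"
mkdir -p GS_demo/c GS_demo/fortran
touch GS_demo/c/hello.c GS_demo/c/README GS_demo/fortran/hello.f90

# Search GS_demo and everything below it for names that end in .c.
# The quotes stop the shell from expanding the * before find sees it.
find GS_demo -name "*.c"

found=$(find GS_demo -name "*.c")
```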

The * is called the wild card and it can be used to stand in for any other text or number sequence that may be part of a file or directory name. We now have a list of all the files that have a .c file extension which means we know the names and paths of all the C source code. Source code is text so we can look into the file.

This is not a long file so we can see the whole file printed out in the shell. Let's have a closer look at this file.

We have moved into the directory that holds the hello.c file. The ls -la command lists all the files in the directory along with some other information. In the column to the far left of this list is what we call the permissions. This is a series of 10 characters. The first character of this series is “d” if this is a directory and “-” if it is a file. The other characters tell us who can read, write and execute each file or directory (you must have execute permission on a directory to be able to move into it or reach the files inside it). There are 3 types of user who may be able to read, write or execute each file or directory: the user (the owner of the file, usually its creator), the group (a collection of users defined by the system administrators) and all (everyone else on the system). The rightmost column of the list is the file name and next to that is the date and time that the file was last modified.

You should see that all the files in this directory you have read and write permissions for (the “rw” characters) while the directories you can also execute (the “x” character).
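
To make the layout of the permissions column more concrete, here is a small sketch that picks out individual characters from it. The names results and notes.txt are invented, and the chmod line is only there so that the owner's permissions are the same on every system.

```shell
cd "$(mktemp -d)"
mkdir results
touch notes.txt

ls -la    # long listing of everything, including hidden files

# The first character of each line is the type: d for a directory, - for a file.
dir_type=$(ls -ld results | cut -c1)
file_type=$(ls -l notes.txt | cut -c1)

# Characters 2 to 4 are the owner's permissions: r, w and x, or - where one is absent.
chmod u=rw notes.txt
owner_perms=$(ls -l notes.txt | cut -c2-4)

echo "$dir_type $file_type $owner_perms"
```

The next three characters describe the group and the last three describe everyone else, in the same r, w, x order.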

Now we are in the same directory as this C program we will compile it. This means we will turn it from a text format which is human readable into a machine readable format that can be executed or run.

Now when we list the files in the directory we can see a new one. It is called hello, it has different permissions from the rest of the files and its date is the time you ran the first command (the one that started with icc ). In this file's permissions there is an “x” which means we can now execute this program.

Text is printed to the screen and the text looks very similar to the text we saw when we typed the more command. We can alter this text so it says something more meaningful. To do this we need to open the file using a text editor.

I am going to edit the text so that it says hello to me on the line before it prints the rest of the message. This will change the line that reads as:

to:

The gedit text editor has the hello.c file open.
The gedit text editor has the hello.c file open and it is edited.

I am called Jo so it says “Hello Jo” but feel free to use your own name. We need to compile this again. We could use the same command that we did before but that would write over the file we already have. Instead we will use the command:

If we went away for a cup of tea we may forget when we returned which file we had edited. We could use the ls command and look at the time the file was created or we could use the grep command and look for a key word that we had edited into the file.

The -i flag does a search that is case-insensitive (a search that cannot tell uppercase from lowercase). This search gives a number of files in this directory. If we get rid of the -i flag and type the word “Hello” rather than “hello” we will get a more exact match.
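
The difference between the two searches can be seen in a small sketch. The files edited.c and original.c below are invented stand-ins, not the tutorial files, and each holds a single line of text.

```shell
cd "$(mktemp -d)"
echo 'puts("Hello Jo");'    > edited.c
echo 'puts("hello world");' > original.c

grep -i "hello" *.c    # case-insensitive: lines from both files match
grep "Hello" *.c       # case-sensitive: only the line in edited.c matches

# -l prints just the names of the files that contain a match.
loose=$(grep -il "hello" *.c)
exact=$(grep -l "Hello" *.c)
```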

This is the end of these exercises so now we can go back to our $HOME directory, delete the tutorial files and exit the system.

This time we have used the rm -rf command rather than the rmdir command. This is because rmdir only removes empty directories, so we would need to delete the content of each directory with the rm command before we could use rmdir, and this would be time consuming. Many people do not like the rm -rf command and try to use it as little as possible, because in Linux there is no trash from which wrongly deleted files can be recovered. Choose your commands with care. More information on commands can be found on the web. The Stack Overflow web site is popular but a simple web search is often helpful.

Time Saving Tips

This section provides information on some techniques that can be used while typing at a linux shell to save time.

Job Control Codes – How to Stop a command

There are a small number of special key combinations and commands that allow you to interact with a running command. As a learner you will most commonly use these to stop a command that you have not fully understood or that is running too long. This is not a full list of control codes but gives some of the most useful to a learner. If you wish to know more you can search the web for control codes.

Control Code  Description
Ctrl-C        Interrupts and cancels a running command.
Ctrl-Z        Suspends a command but does not cancel it; the command is paused until you resume it. Type fg to continue it in the foreground or bg to let it continue in the background. Type the jobs command to see what jobs you have suspended or running in the background.
exit          Log out and close the session.

Tab for Auto-completion

Auto-completion, or word completion, is when the shell predicts the rest of a word that a user is typing. Once the user has started to type the name of a file or directory at the shell prompt they can press the tab key to get the system to complete the name. If there are multiple possibilities then the system completes as much as it can and, in bash, pressing the tab key again lists the options. This can be useful when typing in a long file name or identifying a long path to a file.

Combining Commands with Pipes – with a Search Example

The pipe character, | , is used to combine a number of Linux commands into one by passing the output of each command to the next as its input.

I often use a pipe when I wish to find particular files. The grep command searches a file for a particular string (a string is a piece of text, usually written in quotes).

If I want to find all the lines that do not have that string I would type:

I can combine them with pipes. The resulting script reads from left to right, the output of each section being passed to the next.
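
As an illustration only, the sketch below searches a made-up file called menu.txt: first for a string, then for the lines without it, and finally with the two searches joined by a pipe.

```shell
cd "$(mktemp -d)"
printf 'apple pie\nbanana split\napple crumble\ncherry tart\n' > menu.txt

grep "apple" menu.txt       # lines that contain the string
grep -v "apple" menu.txt    # -v inverts the match: lines that do not contain it

# Left to right: keep the apple lines, then drop any of those that mention pie.
grep "apple" menu.txt | grep -v "pie"

result=$(grep "apple" menu.txt | grep -v "pie")
```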

This is good but I have to know which file to look in. The find command is good for finding files with particular strings in their names or with other matching properties such as their permissions.

The grep command can also be combined with the find command which gives some powerful options for finding files and data. A pipe can be used to do this:
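
Two possible ways of joining find and grep are sketched here, using invented files under a directory called src. In the first, grep filters the list of names that find prints; in the second, the xargs command (not otherwise covered in this tutorial) hands those names to grep so that it searches inside the files.

```shell
cd "$(mktemp -d)"
mkdir -p src/old
echo 'puts("Hello Jo");' > src/hello_jo.c
echo 'puts("goodbye");'  > src/old/bye.c

# Filter the list of paths that find prints: grep here searches the names.
find . -name "*.c" | grep "hello"

# Search inside the files that find lists: xargs turns the incoming paths
# into arguments for grep, and -l prints only the names of matching files.
# (Paths that contain spaces need extra care with xargs.)
find . -name "*.c" | xargs grep -l "Hello"

by_name=$(find . -name "*.c" | grep "hello")
by_content=$(find . -name "*.c" | xargs grep -l "Hello")
```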

Regular Expressions

A regular expression is a sequence of characters that defines a search pattern. In this tutorial we only use the * which is a wild card; when used in a file name on the command line it stands for zero or more characters of any type, i.e. all letters of the alphabet (upper and lower case) as well as all numeric characters and punctuation marks. There is more than the wild card that can be used to identify a search pattern, for example the ? stands for exactly one character of any type. Expressions such as [AB] mean a single character that is either A or B, while [1-9] means a single digit from 1 to 9. Strictly speaking, these file name patterns are known as shell wildcards (or globs), and the regular expressions understood by tools such as grep are a related but richer notation with slightly different rules. If you have to search text then you should look into these expressions: the term that covers them is Regular Expressions and many resources can be found online.
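
The file name patterns described above can be tried out with echo, which simply prints whatever names the shell matched. The four file names in this sketch are made up for the purpose.

```shell
cd "$(mktemp -d)"
touch A1.txt B2.txt C3.txt AB10.txt

echo *.txt         # * matches zero or more characters: all four files
echo ?1.txt        # ? matches exactly one character
echo [AB]?.txt     # [AB] matches a single A or a single B
echo *[1-3].txt    # [1-3] matches one digit in the range 1 to 3

one=$(echo ?1.txt)
set_match=$(echo [AB]?.txt)
range_match=$(echo *[1-3].txt)
```

Note that AB10.txt is matched only by the first pattern: it has too many characters before .txt for the second and third, and its last digit is outside the range in the fourth.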

Back Arrow for History

Each time a command is typed into a shell it is stored in a history log; typically the log stores the last 50 commands, although the limit is configurable. The up and down arrow keys allow you to step through the previous commands, while the left and right arrow keys allow you to move along the text of a command and so edit it.

Redirection to Read and Write to and from Files

The > operator redirects the output of a command into a file, replacing anything already in the file, while the >> operator appends the output to the end of the file. The < operator reads data in from a file. You can use redirection to write your history into a file so that you do not lose the order of the commands you have run as you continue to create new entries in your history log.

You can open the file myhistory in a text editor and delete the unwanted commands until you have a list of the commands that you need to do a particular task in linux.
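
Here is a minimal sketch of redirection at work, using an invented file called notes.txt. The history command itself is only shown in a comment because it prints nothing useful outside an interactive shell.

```shell
cd "$(mktemp -d)"

echo "first line"  > notes.txt     # > creates the file, or overwrites it
echo "second line" >> notes.txt    # >> appends to the end of the file

line_count=$(wc -l < notes.txt)    # < feeds the file to the command as its input
contents=$(cat notes.txt)

# In an interactive shell the same idea saves your command history, e.g.
#   history > myhistory
echo "$contents"
```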

Simple Scripts

Scripting in Linux is a very large topic and you can write scripts in a variety of languages, including Python, Perl and the language of your shell, i.e. bash. This section provides some very simple information about Linux scripts. There are many online resources that you can use if you wish to create your own scripts.

Linux scripts allow you to put a series of commands into one text document that can be run by just typing the name of that script. People often start writing scripts for complex tasks that they are regularly required to do. The commands for a script can easily be identified from your history log or from a series of commands that you have piped together.

Here is an example of a very simple script:

If you write this script on a linux machine you will have to change the permissions on the file so that it will execute before you can run it.

The command below will run your script if you are in the same directory as the script.
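
Putting those three steps together, a sketch might look like this. The script name greet.sh and its contents are made up for the illustration; the first line of the script names the program that should run it (here /bin/sh, though /bin/bash is also common).

```shell
cd "$(mktemp -d)"

# Write a short script into a file (a text editor would do the same job).
cat > greet.sh << 'EOF'
#!/bin/sh
echo "Hello from $(basename "$0")"
echo "Today is $(date +%A)"
EOF

ls -l greet.sh        # no x in the permissions yet
chmod u+x greet.sh    # give yourself (the owner) execute permission
./greet.sh            # ./ tells the shell to look in the current directory

first_line=$(./greet.sh | head -n 1)
```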

Your work on the ARC systems will be run on the compute nodes. The compute nodes are very busy because many people are working on the system, and this means that your job will have to queue while it waits for a suitable space on the system. The queue is handled by the scheduler and requires you to write a special script called a job submission script. Your first job submission script is unlikely to accurately predict the computational resources your job will need. It is best to request too many resources initially, as otherwise your job may fail, and then gradually refine the resources you request so that your priority in the queue is as high as possible. We have a number of web pages that guide you through writing a job submission script. It is best to start with the top level one called The Job Submission System: SGE.