Advanced SGE: qsub & qrsh Usage


Two commands are used to run jobs on the ARC systems.

The first, qsub, submits a job to the queue. The job waits in a queue managed by the scheduler until the resources it needs become available. Jobs are given a priority that depends on your previous use of the system: the more you and your faculty have used the system, the longer the wait. This is also called batch mode.

The second, qrsh, requests resources on a compute node for immediate interactive use. The scheduler also manages these requests, so a small request is more likely to be granted. If the request cannot be met within a few minutes, it is refused. This type of job is useful if you need to work interactively with a GUI or want to debug your code.

Both commands take very similar options.

qsub – Running a Batch Queue Job on Compute Nodes

The most straightforward way to set queue submission options is to add them to a batch submission script.
To specify options that are interpreted by the queuing system, the relevant lines must begin with the sequence #$.

As an example, to run an 8-hour job across 64 InfiniBand cores, exporting all environment variables and using the current working directory, a suitable script would be:

#$ -l h_rt=8:00:00
#$ -pe ib 64 
#$ -cwd -V 
mpirun myprogram

The job is then submitted with:

$ qsub script_file_name
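
As a minimal sketch of the whole workflow, the script above could be saved to a file and then submitted. The file name myjob.sh is an assumption made for illustration, and the qsub call is guarded so that the snippet only prints the command on machines where SGE is not installed.

```shell
#!/bin/sh
# Write the example batch script to a file (myjob.sh is a hypothetical name).
cat > myjob.sh <<'EOF'
#$ -l h_rt=8:00:00
#$ -pe ib 64
#$ -cwd -V
mpirun myprogram
EOF

# Submit only if qsub is available; otherwise show what would be run.
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.sh
else
    echo "qsub myjob.sh"
fi
```

Lines beginning with #$ are ordinary comments to the shell, so only the queuing system acts on them.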

These options can also be set as parameters to the qsub command.

The general format for qsub is:

$ qsub [options] script_file_name [--script-args] 

where script_file_name is a file containing the commands to be executed by the batch request.

So for this example:

$ qsub -cwd -V -pe ib 64 -l h_rt=8:00:00 script_file_name
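
Command-line options can be convenient when the same script is submitted with different resource requests. The sketch below builds the qsub command from shell variables and prints it. The function name build_qsub and the script name myjob.sh are hypothetical, and nothing is submitted until the printed command is actually run.

```shell
#!/bin/sh
# build_qsub CORES HOURS SCRIPT: print a qsub command line using the
# options from the example above. Hypothetical helper; it only prints.
build_qsub() {
    cores=$1
    hours=$2
    script=$3
    echo "qsub -cwd -V -pe ib ${cores} -l h_rt=${hours}:00:00 ${script}"
}

# Print the command for the 64-core, 8-hour example.
build_qsub 64 8 myjob.sh
```

Printing the command first makes it easy to check the request before submitting it.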

qrsh – Running an Interactive Job on Compute Nodes

The general command to run an interactive session through the batch queues with the qrsh command is as follows:

$ qrsh [options] <application/programme_name> [--script-args] 

where application/programme_name is the name of the application executable or programme executable to be run.

Alternatively, using the qrsh command with no application defined will start a login session on the compute nodes. Please note that this method should only be used for serial jobs. So using:

$ qrsh [options]

will start a remote login session. Please note that using qrsh in this way will not export your current environment variables, so you will need to load any modules you need within the session.

Please note that the -cwd and -V options are not valid if you are not specifying an application to run in the qrsh session.

Running Interactive MPI jobs

For development and testing purposes, it might be convenient to run MPI jobs in interactive mode and have them launch fairly quickly. In this case it is advisable to use the -l placement=scatter option, which will most likely get the job to launch quickly at the expense of performance. For instance, to run mympiprogram for 30 minutes on 4 cores from the current directory and using the current environment, use:

$ qrsh -V -cwd -l h_rt=0:30:0,h_vmem=1G,placement=scatter -pe ib 4 mpirun ./mympiprogram
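
For repeated short tests it can help to compute the h_rt value rather than type it by hand. The following is a small sketch under that assumption: minutes_to_hrt is a hypothetical helper, not part of SGE, and the qrsh command is printed rather than run.

```shell
#!/bin/sh
# minutes_to_hrt MINUTES: format a duration in minutes as h:mm:ss,
# the form used by -l h_rt. Hypothetical helper for illustration.
minutes_to_hrt() {
    printf '%d:%02d:00\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

# Print (rather than run) the interactive MPI request for a 30-minute test.
echo "qrsh -V -cwd -l h_rt=$(minutes_to_hrt 30),h_vmem=1G,placement=scatter -pe ib 4 mpirun ./mympiprogram"
```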

Running interactive GPU sessions

If you want to use a GPU interactively instead of submitting a job to the GPU nodes, you need to request a coprocessor and specify your shell as part of your qrsh command, for example:

$ qrsh -l h_rt=2:0:0,coproc_k80=1 -pty y bash

This will give you a bash session on a K80 GPU node for two hours, using one half of the resources on that GPU node.
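
Once the session has started, nvidia-smi -L can be used to confirm which cards have been allocated. The sketch below wraps that check in a hypothetical function, count_gpus, which falls back to 0 when nvidia-smi is not installed (for example on a login node).

```shell
#!/bin/sh
# count_gpus: print the number of GPUs listed by nvidia-smi -L.
# Prints 0 when nvidia-smi is not available on this machine.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Each device is listed on a line starting with "GPU <index>:".
        nvidia-smi -L | grep -c '^GPU ' || true
    else
        echo 0
    fi
}

echo "GPUs visible: $(count_gpus)"
```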


Commonly used options are given in the table below which is divided into sections for similar functionality:


| Option | ARC3 | ARC2 | MARC1 | Description | Default |

Duration in Time
| -l h_rt=hh:mm:ss | Y | Y | Y | The wall clock time (amount of real time needed by the job). This parameter must be specified; omitting it will result in an error message. | Required |

OpenMP or Serial Large Memory
| -l h_vmem=memory | Y | Y | Y | Sets the limit of virtual memory required (per process for parallel jobs). If this is not given, it is assumed to be 1GB/process. If you require more than 1GB/process, you must specify this flag, e.g. -l h_vmem=12G will request 12GB of memory. For a shared memory job, the number of processes requested multiplied by the virtual memory requested per process must be equal to or less than the memory available in a single node. | 1G |
| -pe smp np | Y | Y | Y | Specifies the shared memory parallel environment for parallel programs using OpenMP. np is the number of cores to be used by the parallel job. The maximum number of cores that can be requested for shared memory jobs is limited by the number of cores available in a single node. | |
| -pe ib np | Y | Y | Y | Specifies the parallel environment for parallel programs using MPI. np is the number of cores to be used by the parallel job. | |
| -l nodes=x[,ppn=y][,tpp=z] | Y | Y | Y | Specifies a job for parallel programs using MPI. Assigns whole compute nodes. x is the number of nodes, y is the number of processes per node, z is the number of threads per process. | |
| -l np=x[,ppn=y][,tpp=z] | Y | Y | Y | Specifies a job for parallel programs using MPI. Assigns whole compute nodes. x is the number of processes, y is the number of processes per node, z is the number of threads per process. | |

Node and Co-processor (GPU)
| -l node_type=24core-128G | Y | | | Specifies the type of node to be used. There are 3 types of node: a standard node with the flag 24core-128G; the high memory node with the flag 24core-768G; and the GPGPU node with the flag 128core-128G-2K80. | 24core-128G |
| -l coproc= | Y | | | Sets the CPU, memory and node_type. | 128core-128G-2K80 |
| nvidia-smi -L | Y | | | Confirms what cards you have been allocated. | |

Task Array Functionality
| -hold_jid prevjob | Y | Y | Y | Hold the job until the previous job (prevjob) has completed. Useful for chaining runs together or resuming runs from a restart file. | |
| -l placement=type | Y | Y | Y | Choose optimal for launching a process topology which minimises the number of InfiniBand switch hops used in the calculation, minimising latency. Choose scatter for running processes anywhere on the system without topology considerations. | good |
| -t start-stop | Y | Y | Y | Produce an array of sub-tasks (loop) from start to stop, giving the $SGE_TASK_ID variable to identify the individual sub-tasks. | |

Utility Functionality
| -help | Y | Y | Y | Prints a list of options. | |
| -cwd | Y | Y | Y | Execute the job from the current working directory; output files are sent to the directory from which the job was submitted, not to the user's home directory. | Recommended |
| -V | Y | Y | Y | Export all current environment variables to all spawned processes. Necessary for the current module environment to be transferred to the SGE shell. | Recommended |
| -m be | Y | Y | Y | Send mail at the beginning and at the end of the job to the owner. | |
| -M | Y | Y | Y | Specify mail address for the -m option. | |
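
As an illustration of task arrays, the sketch below uses the standard SGE -t option to request three sub-tasks and uses $SGE_TASK_ID to select one input per sub-task. The file inputs.txt, its contents and the function pick_input are assumptions made for this example. SGE_TASK_ID falls back to 1 so that the script can also be tried outside the queue.

```shell
#!/bin/sh
#$ -l h_rt=1:00:00
#$ -cwd -V
#$ -t 1-3

# pick_input LIST: print line number $SGE_TASK_ID of the file LIST.
# SGE sets SGE_TASK_ID for each sub-task; default to 1 for local testing.
pick_input() {
    sed -n "${SGE_TASK_ID:-1}p" "$1"
}

# inputs.txt is a hypothetical list of input files, one per line.
# It is created here only so that the sketch runs on its own.
printf 'a.dat\nb.dat\nc.dat\n' > inputs.txt

echo "Task ${SGE_TASK_ID:-1} would process: $(pick_input inputs.txt)"
```

Each sub-task runs the same script with a different value of $SGE_TASK_ID, so in this sketch sub-task 2 selects the second line of inputs.txt.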