Advanced SGE: Local Storage on Compute Nodes

Each ARC3 compute node contains a limited amount of dedicated storage, and its capacity and performance differ between node types. As this storage is local to the compute node, it can offer more predictable performance than writing to the high performance filesystem, /nobackup, which is shared between all compute nodes.

On our previous clusters, such storage is available for use by jobs – mostly under /scratch and via the directory named by the $TMPDIR environment variable, but also a small amount under /tmp. However, jobs cannot reserve the local space they need, and so can fail unexpectedly if another job running on the same node is also writing to local disk.

On ARC3, a job that wishes to use the compute node's local disk should request it explicitly. This facility is likely only to be useful if you currently use /scratch on our other clusters (if that is the case, please try it and report back your thoughts).

ARC2 and MARC1

Summary:

  • Use of /tmp should be avoided, as it is relatively small and space usage is uncontrolled.
  • /scratch is big, but space usage is uncontrolled. Files inside it are deleted 96 hours after last use, so that you have time to move the contents to somewhere else if you need to.
  • $TMPDIR within a job refers to a per-job directory within the big /scratch directory; this per-job directory is deleted at the end of the job (a typical usage pattern is sketched below).
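
A minimal sketch of this pattern on ARC2 and MARC1 is shown below; the program name, input file and /nobackup paths are placeholders to adapt to your own job:

    #!/bin/bash
    # Run from the current working directory and export the submission environment
    #$ -cwd
    #$ -V
    # Request one hour of runtime (standard SGE h_rt resource)
    #$ -l h_rt=1:00:00

    # Stage input data into the node-local, per-job directory
    cp /nobackup/$USER/input.dat $TMPDIR/

    # Work on local disk rather than the shared filesystem
    cd $TMPDIR
    $HOME/bin/my_program input.dat > output.dat    # placeholder executable

    # $TMPDIR is deleted when the job ends, so copy results back first
    cp output.dat /nobackup/$USER/results/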

ARC3

Summary:

  • Use of /tmp should be avoided, as it is relatively small and space usage is uncontrolled.
  • Use of /scratch should be avoided, as this is now the same storage as /tmp.
  • $TMPDIR within a job refers to a per-job directory which has been assigned dedicated storage for the job (see the sketch below).
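
On ARC3 the per-job directory only has dedicated storage if the job asks for it, so a job script needs an explicit disk request before relying on $TMPDIR. A minimal sketch (the size is illustrative; see the Usage section below for the full set of options):

    #!/bin/bash
    #$ -cwd
    #$ -V
    #$ -l h_rt=1:00:00
    # Reserve 10G of dedicated node-local disk per slot; $TMPDIR then points at it
    #$ -l disk=10G

    cd $TMPDIR
    # ... run your work here, copying anything you need to keep elsewhere before the job ends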

The specification of the compute nodes on ARC3 and their local disk characteristics:

Guide to the Nodes on ARC3
Node Type                           | Number of nodes | Memory | Local Disk Capacity | Local Disk Type
Standard ( -l 24core-128G )         | 165             | 128GB  | 100GB               | ssd
High Memory ( -l 24core-768G )      | 2               | 768GB  | 800GB               | hdd
GPGPU (each with 2 NVIDIA K80 GPUs) | 2               | 128GB  | 800GB               | hdd

Notes:

  • ssd refers to a solid state disk, which has excellent performance characteristics for reading and writing.
  • hdd refers to a mechanical hard disk with a spinning platter, which has good performance characteristics for large sequential reads or writes only.

Usage

Usage for qsub Local Disk Requests on ARC3

-l disk=<bytes>
    Sets the limit of the local disk available under $TMPDIR (per slot).
    Default: 1G per slot.
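
For example, to reserve 20G of local disk for each slot of a job (the job script name is a placeholder):

    # Request 20G of local disk per slot, available under $TMPDIR
    qsub -l disk=20G myjob.sh

    # Or as an embedded directive inside the job script itself
    #$ -l disk=20G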

-l disk_type=*|ssd|hdd
    Specifies the type of local disk that files under $TMPDIR will be written to. There are three options:
      1. * – the default, which specifies any type of disk
      2. ssd – which explicitly specifies solid state disk
      3. hdd – which explicitly specifies hard disk drive
    Reading and writing will be much quicker on a solid state drive, so this option should be used if your job performs lots of read and write operations.
    Default: *
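
For example, to ask for 50G per slot explicitly on solid state storage (which, per the table above, is only fitted to the standard nodes); the job script name is a placeholder:

    # Request 50G of local disk per slot and insist that it is solid state
    qsub -l disk=50G -l disk_type=ssd myjob.sh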

-l disk_out=<directory>
    Specifies the directory to which the contents of $TMPDIR will be copied at the end of a job. As $TMPDIR is deleted at the end of the job, this is useful if you need to keep its contents, for example if you are checkpointing.
    The files will be copied to the directory <directory>/<job_id>.<task_id>
    (where task_id = 1 if the job is not part of a task array).
    Please note that, for distributed parallel jobs, only the $TMPDIR on the compute node where the job script runs is saved.
    Default: no copy.
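
For example, to keep the contents of $TMPDIR once the job finishes (the destination directory under /nobackup is a placeholder; the shell expands $USER before qsub sees it):

    # At the end of the job, copy $TMPDIR to /nobackup/$USER/tmpdir_saves/<job_id>.<task_id>
    qsub -l disk=10G -l disk_out=/nobackup/$USER/tmpdir_saves myjob.sh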

-l disk_usejobname
    When used in conjunction with the -l disk_out=<directory> option, the directory that $TMPDIR is copied to at the end of the job is changed to <directory>/<job_name>, where <job_name> can be specified with the -N <job_name> option.
    Default: false.
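
For example, combining this with -l disk_out and a job name set via -N (paths and names are placeholders):

    # At the end of the job, copy $TMPDIR to /nobackup/$USER/checkpoints/my_sim
    qsub -N my_sim -l disk=10G -l disk_out=/nobackup/$USER/checkpoints -l disk_usejobname myjob.sh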