ARC1

The ARC1 service closes on May 31st 2017.

Operating system

ARC1 is a Linux-based HPC service, using the CentOS 5 operating system.
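For reference, the release and architecture in use can be confirmed from a login node with standard Linux commands; the exact output will depend on the installed CentOS 5 minor release.

    # Report the distribution release and machine architecture
    cat /etc/redhat-release
    uname -m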

Hardware

ARC1 consists of Sun Microsystems x86-64 based servers and storage, with the addition of a later phase of HP hardware. The system is separated into a high-density component geared towards computation and a low-density portion providing mostly infrastructure.

Components:
Compute
- Sun X6275 blade: each blade houses two Nehalem servers (nodes). Each server is dual-socket with quad-core Intel X5560 (2.8GHz) processors, 12GB of DDR3 1333MHz memory, a 24GB flash disk and QDR ConnectX InfiniBand. Quantity: 118 blades (236 servers, 1888 cores).
- Sun X4440 SMPs: quad-socket, quad-core AMD 8384 (2.7GHz) processors; 128GB of DDR2 666MHz memory per server; 780GB /scratch hard disk (6x146GB striped); QDR InfiniBand. Quantity: 4 servers (64 cores).
- HP blade: each blade houses two Westmere servers (nodes). Each server is dual-socket with hex-core Intel X5650 (2.66GHz) processors, 24GB of DDR3 1066MHz memory, a 120GB hard disk and QDR ConnectX InfiniBand. Quantity: 8 blades (16 servers, 192 cores).

Storage
- Lustre: two fail-over pairs of 'snowbird' storage servers delivering ~3.2GB/s via the InfiniBand network to ~100TB of usable storage on /nobackup.
- Sun X4540: 48TB (raw) storage configured with ZFS for snapshots and mirroring of data across the two servers.
- Sun X4170 data movers: mount the Lustre filesystem and re-export it to the campus network. Quantity: 2.

Network
- M2-72 InfiniBand switches: provide a full-Clos and a 2:1 blocking core network to the compute blades, with access to infrastructure (e.g. Lustre storage) at the edge. Quantity: 6.
- Gigabit Ethernet: management and general networks facilitating system boot; all user traffic is carried by the InfiniBand network. Quantity: 13.

Login
- Sun X4170: dual-socket, quad-core Intel E5550 (2.53GHz) processors; 24GB of DDR3 1066MHz memory per server. Quantity: 4.

Linux Management
- Sun X4150: fail-over pair of dual-socket, quad-core Intel L5420 (2.5GHz) servers with 16GB of memory, attached to a shared 3.6TB disk array. Quantity: 2.

Linux Scheduler
- Sun X4150: fail-over pair of dual-socket, quad-core Intel L5420 (2.5GHz) servers with 16GB of memory, attached to a shared 3.6TB disk array. Quantity: 2.

The system also includes additional compute nodes purchased by individual groups and dedicated to their use.

A schematic of the rack layout is shown below (click for larger version):

ARC1 rack layout

Network topology

All user-facing systems (login, compute, data movers) are connected to the InfiniBand network, which carries all user data. This is a layered network: the latency of communication depends upon the number of (36-port) switch hops required to route between the source and destination devices. The diagram below shows a full Clos network capable of supporting 1536 CPU cores; this is the network topology in place for the two fully populated high-density racks.

A schematic of the ARC1 Full Clos network is shown below (click for larger version):

ARC1 Full Clos network

Each server has a quad-data-rate (QDR) 4X connection which can send and receive data at 2.6GB/s. Each 12X uplink to the core aggregates three 4X links and can therefore transfer data at approximately 8GB/s (3 x 2.6GB/s ≈ 7.8GB/s).
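If useful, the link rate can be checked from a node with the standard InfiniBand diagnostic tools; the sketch below assumes the infiniband-diags package is installed on the node being examined.

    # Show the state and rate of the local InfiniBand port;
    # a QDR 4X link reports "Rate: 40" (40Gb/s signalling,
    # of which around 2.6GB/s is achievable for user data)
    ibstat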

The latency between servers, including a single switch hop, is around 1.8 microseconds, and each additional switch hop introduces around 60ns of latency. As the 72-port switches are internally a tiered set of six 36-port switches, traversing one adds 300ns (5 hops) to the latency.

By default, jobs are dispatched so as to span the minimum number of switch hops for the particular size of job. This is controlled through the following parameter (an example job script follows the table):

Attribute               Comments
-l placement=optimal    Minimises the number of switch hops (default)
-l placement=scatter    Ignores topology and runs anywhere, potentially introducing more latency than necessary to all communications
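As an illustration, the placement request can be set from a batch job script. The sketch below uses Sun Grid Engine style directives; the parallel environment name ('ib'), the core count and the runtime are illustrative assumptions rather than ARC1-specific recommendations.

    #!/bin/bash
    # Request the default, topology-aware placement explicitly
    #$ -l placement=optimal
    # Illustrative resource requests: the parallel environment name 'ib',
    # the 64 cores and the one-hour runtime are assumptions, not
    # ARC1-specific values
    #$ -pe ib 64
    #$ -l h_rt=1:00:00
    #$ -cwd

    mpirun ./my_mpi_program

Replacing placement=optimal with placement=scatter relaxes the topology constraint, as described in the table above.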

The partially filled rack has a 2:1 blocking factor. This is similar to the full-Clos topology above, except that only one uplink (rather than two) connects each shelf to the core M2-72 switches.

Jobs which span more than one shelf are placed preferentially on the non-blocking InfiniBand island, while smaller jobs are directed towards the blocking island.

Lustre file system

A large amount of infrastructure is dedicated to the Lustre parallel filesystem, which is mounted on /nobackup. This is accessed over InfiniBand and is configured to deliver ~3.2GB/s from a 100TB filesystem. It is possible to tune the filesystem in a more extreme (or more conservative) manner; however, this configuration achieves a sensible compromise between data integrity and performance.
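For jobs with particular I/O patterns, striping on /nobackup can be inspected and adjusted per directory with the standard Lustre client tool, lfs. The sketch below is illustrative only: the directory path and the stripe count of 4 are assumptions, and the system defaults are appropriate for most workloads.

    # Show the current striping of a directory (or file) on /nobackup
    lfs getstripe /nobackup/$USER

    # Stripe new files created under this directory across 4 OSTs
    # (an illustrative value: large parallel I/O may benefit from more
    # stripes, many small files from fewer)
    lfs setstripe -c 4 /nobackup/$USER/large_io_dir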