The ARC1 service closes on May 31st 2017.
ARC1 consists of Sun Microsystems x86-64 based servers and storage, together with a later phase of HP hardware. The system is separated into a high-density component geared towards computation and a low-density portion providing mostly infrastructure:
| Role | Hardware | Description | Quantity |
|---|---|---|---|
| Compute | Sun X6275 blade | Each blade houses two Nehalem servers (nodes), each a dual-socket server with quad-core Intel X5560 (2.8GHz) processors, 12GB of DDR3 1333MHz memory, a 24GB flash disk and QDR ConnectX InfiniBand | 118 blades; 236 servers; 1888 cores |
| Compute | X4440 SMP | Quad-socket, quad-core AMD 8384 (2.7GHz) processors; 128GB of DDR2 666MHz memory per server; 780GB /scratch hard disk (6×146GB striped); QDR InfiniBand | 4 servers; 64 cores |
| Compute | HP blade | Each blade houses two Westmere servers (nodes), each a dual-socket server with hex-core Intel X5650 (2.66GHz) processors, 24GB of DDR3 1066MHz memory, a 120GB hard disk and QDR ConnectX InfiniBand | 8 blades; 16 servers; 192 cores |
| Storage | Lustre | Two 'snowbird' fail-over pairs delivering 3.2GB/s over the InfiniBand network to ~100TB of usable storage on /nobackup | 3.2GB/s; 100TB |
| Storage | X4540 | 48TB (raw) storage configured with ZFS for snapshots and mirroring of data across the two servers | 48TB |
| Storage | X4170 | Data movers mounting the Lustre filesystem and re-exporting it to the campus network | 2 |
| Network | M2-72 IB | Provides a full Clos and a 2:1 blocking core network to the compute blades, with access to infrastructure (e.g. Lustre storage) on the edge | 6 |
| Network | Gigabit | Management and general networks facilitating system boot; all user traffic is carried by the InfiniBand network | 13 |
| Login | X4170 | Dual-socket, quad-core Intel E5550 (2.53GHz) processors; 24GB of DDR3 1066MHz memory per server | 4 |
| Linux management | X4150 | Fail-over pair of dual-socket, quad-core Intel L5420 (2.5GHz) servers with 16GB memory; attaches to a shared 3.6TB disk array | 2 |
| Linux scheduler | X4150 | Fail-over pair of dual-socket, quad-core Intel L5420 (2.5GHz) servers with 16GB memory; attaches to a shared 3.6TB disk array | 2 |
The system also includes additional compute nodes purchased by individual groups and dedicated to their use.
A schematic of the rack layout is shown below:
All user-facing systems (login, compute, data movers) are connected to the InfiniBand network, which carries all user data. This is a layered network: the latency of communication depends on the number of (36-port) switch hops required to route between the source and destination devices. The diagram below shows a full Clos network capable of supporting 1536 CPU cores; this is the topology in place for the two fully populated high-density racks.
A schematic of the ARC1 full Clos network is shown below:
Each server has a quad-data-rate (QDR) 4X connection which can send and receive data at 2.6GB/s. Each 12X (three 4X) uplink to the core can therefore transfer data at close to 8GB/s.
The latency between servers, including a single switch hop, is around 1.8 microseconds. Each additional switch hop introduces around 60ns of latency. As each 72-port switch is internally a tiered set of six 36-port switches, a path through one adds 300ns (5 hops) to the latency.
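These figures allow a back-of-envelope latency estimate for any path. The sketch below assumes the 1.8-microsecond figure already covers one switch hop and that each further hop adds 60ns; the hop count is illustrative:

```shell
# Estimate server-to-server latency from the figures quoted above:
# ~1.8us (1800ns) including one 36-port switch hop, plus ~60ns for
# each additional hop. A tiered 72-port core switch counts as 5 hops.
base_ns=1800      # one-hop server-to-server latency, in nanoseconds
per_hop_ns=60
core_hops=5       # hops through a tiered M2-72 core switch
extra_ns=$(( (core_hops - 1) * per_hop_ns ))
echo "estimated latency via core: $(( base_ns + extra_ns )) ns"   # 2040 ns
```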
By default, jobs will be dispatched to span a minimum number of switch hops for the particular size of job. This is controlled through the parameter:
| Option | Behaviour |
|---|---|
| `-l placement=optimal` | Minimises the number of switch hops (default) |
| `-l placement=scatter` | Ignores topology and runs anywhere, potentially introducing more latency than necessary to all communications |
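For illustration, the placement request sits alongside the other scheduler directives in a batch script. This is a sketch in SGE style; the parallel environment name, core count and executable are assumptions for the example, not ARC1 documentation:

```shell
#!/bin/bash
# Illustrative SGE-style batch script; directives other than
# '-l placement' are assumptions for the purpose of the example.
#$ -cwd
#$ -pe ib 64                # hypothetical parallel environment, 64 cores
#$ -l placement=optimal     # keep the job within a minimal set of switch hops
# For latency-tolerant work, 'scatter' lets the scheduler place it anywhere:
##$ -l placement=scatter
mpirun ./my_mpi_app         # hypothetical MPI executable
```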
The partially filled rack has a 2:1 blocking factor. This is similar to the full Clos topology above, except that only one uplink (rather than two) connects each shelf to the core M2-72 switches.
Jobs spanning more than one shelf are preferentially placed on the non-blocking InfiniBand island, while smaller jobs are directed towards the blocking island.
Lustre file system
A large amount of infrastructure is dedicated to the Lustre parallel filesystem, which is mounted on /nobackup. It is accessed over InfiniBand and is configured to deliver ~3.2GB/s from a 100TB filesystem. The filesystem could be tuned more aggressively (or more conservatively); this configuration strikes a sensible compromise between data integrity and performance.
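A quick way to sanity-check single-stream write performance is a synchronous `dd` run. This is only a rough sketch: a single stream cannot reach the ~3.2GB/s aggregate figure, and proper benchmarking of a parallel filesystem needs a multi-client tool such as IOR. `TARGET_DIR` is an assumption for the example and defaults to `/tmp`; on ARC1 it would be a directory under /nobackup:

```shell
# Write 64MB synchronously and let dd report the achieved rate.
# TARGET_DIR defaults to /tmp; point it at /nobackup on the real system.
TARGET_DIR=${TARGET_DIR:-/tmp}
testfile="$TARGET_DIR/ddtest.$$"
dd if=/dev/zero of="$testfile" bs=1M count=64 conv=fsync 2>&1 | tail -n 1
rm -f "$testfile"
```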