The ARC1 service closes on May 31st 2017.
Operating system
ARC1 is a Linux-based HPC service, using the CentOS 5 operating system.
Hardware
ARC1 consists of Sun Microsystems x86-64 based servers and storage, with the addition of a later phase of HP hardware. The racks are separated into a high-density component geared towards computation and a low-density portion providing mostly infrastructure:
Purpose | Item | Description | Quantity |
---|---|---|---|
Compute | Sun X6725 blade | Each blade houses two Nehalem servers (nodes). Each server is dual-socket with quad-core Intel X5560 (2.8GHz) processors; 12GB of DDR3 1333MHz memory per server, plus a 24GB flash disk and QDR Connect-X infiniband | 118 (blades); 236 (servers); 1888 (cores) |
Compute | X4440 SMPs | Quad-socket quad-core AMD 8384 (2.7GHz) processors; 128GB of DDR2 666MHz memory per server; 780GB /scratch hard disk (6x146GB striped); QDR infiniband | 4 (servers); 64 (cores) |
Compute | HP blade | Each blade houses two Westmere servers (nodes). Each server is dual-socket with hex-core Intel X5650 (2.66GHz) processors; 24GB of DDR3 1066MHz memory per server; 120GB hard disk and QDR Connect-X infiniband | 8 (blades); 16 (servers); 192 (cores) |
Storage | Lustre | Two 'snowbird' fail-over pairs delivering 3.2GB/s via the infiniband network to ~100TB usable storage on /nobackup | 3.2GB/s; 100TB |
Storage | X4540 | 48TB (raw) storage configured with ZFS for snapshots and mirroring of data across the two servers | 48TB |
Storage | X4170 | Data movers mount the Lustre filesystem and re-export it to the campus network | 2 |
Network | M2-72 IB | Provide a full Clos and 2:1 blocking core network to compute blades, and access to infrastructure (e.g. Lustre storage) on the edge | 6 |
Network | Gigabit Ethernet | Management and general networks facilitating system boot. All user traffic is carried by the infiniband network | 13 |
Login | X4170 | Dual-socket quad-core Intel E5550 (2.53GHz) processors; 24GB of DDR3 1066MHz memory per server | 4 |
Linux management | X4150 | Failover pair of dual-socket quad-core Intel L5420 (2.5GHz) processor servers with 16GB memory. Attaches to a shared 3.6TB disk array | 2 |
Linux scheduler | X4150 | Failover pair of dual-socket quad-core Intel L5420 (2.5GHz) processor servers with 16GB memory. Attaches to a shared 3.6TB disk array | 2 |
The system also includes additional compute nodes purchased by individual groups and dedicated to their use.
A schematic of the rack layout is shown below:

Network topology
All user-facing systems (login, compute, datamovers) are connected to the infiniband network and use it to transfer all user data. This is a layered network, with the latency of communication dependent upon the number of (36-port) switch hops required to route between the source and destination devices. The diagram below shows a full Clos network capable of supporting 1536 CPU cores. This is the network topology in place for the two fully populated high-density racks.
A schematic of the ARC1 full Clos network is shown below:

Each server has a quad-data-rate (QDR) 4X connection, which can send and receive data at 2.6GB/s. Each 12X (three 4X) uplink to the core can therefore transfer data at approximately 8GB/s.
The latency between servers, including a single switch hop, is around 1.8 microseconds. Each additional switch hop introduces around 60ns of latency. As the 72-port switches are internally a tiered set of six 36-port switches, they add 300ns (5 hops) to the latency.
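As a rough illustration of how these figures combine, the snippet below estimates end-to-end latency from a hop count; the hop count used is an assumption for the sake of example, not a measured value for any particular route.

```bash
# Rough latency estimate using the figures quoted above: 1.8 us between
# servers including one switch hop, plus ~60 ns per additional hop.
# The extra hop count here is an illustrative assumption.
extra_hops=5   # e.g. a path that also crosses a core 72-port switch
awk -v h="$extra_hops" 'BEGIN { printf "estimated latency: %.2f microseconds\n", 1.8 + h * 0.06 }'
```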
By default, jobs are dispatched to span the minimum number of switch hops for the given job size. This is controlled through the following scheduler parameter (see the example job script after the table):
Attribute | Comments |
---|---|
-l placement=optimal | Minimises the number of switch hops (default) |
-l placement=scatter | Ignores topology and runs anywhere, potentially introducing more latency than necessary to all communications |
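As a minimal sketch, a placement request might appear in a batch job script along the following lines; the runtime, parallel environment name, core count and executable are illustrative assumptions rather than ARC1-specific values.

```bash
#!/bin/bash
# Illustrative SGE-style job script; resource names and values are
# examples only and should be checked against the ARC1 documentation.
#$ -cwd -V
#$ -l h_rt=1:00:00           # example runtime request
#$ -pe ib 64                 # example parallel environment and core count
#$ -l placement=optimal      # minimise switch hops (the default)
## -l placement=scatter      # alternative: ignore topology and run anywhere

mpirun ./my_mpi_program      # hypothetical executable name
```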
The partially filled rack has a 2:1 blocking factor. This is similar to the full Clos topology above; however, only one uplink (rather than two) connects each shelf to the core M2-72 switches.
Jobs which span more than one shelf are placed preferentially on the non-blocking IB island, while smaller jobs are directed towards the blocking island.
Lustre file system
A large amount of infrastructure is dedicated to the Lustre parallel filesystem, which is mounted on /nobackup. This is accessed over infiniband and is configured to deliver ~3.2GB/s from a 100TB filesystem. It is possible to tune the filesystem more aggressively (or more conservatively); however, this configuration achieves a sensible compromise between data integrity and performance.
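For illustration, users can inspect (and, where appropriate, adjust) Lustre striping on their own /nobackup directories with the standard lfs tool; the paths and stripe count below are example values, and the system defaults are normally the sensible compromise described above.

```bash
# Show the current stripe settings for an existing file or directory
# (path is an example).
lfs getstripe /nobackup/$USER/mydata

# Stripe new files created in this directory across 4 object storage
# targets (stripe count is an example value).
lfs setstripe -c 4 /nobackup/$USER/large_io
```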