
Aire Progress Update - February

Patricia Ternes

Category: Aire, HPC, News

Since our last Aire Progress Update in January, we have made significant strides in developing and stabilising the system. A new group of testers joined the project, providing valuable feedback that helped refine configurations and improve overall usability. Additionally, the cooling issues that previously affected hardware availability have been resolved, and all hardware is now fully operational.

Aire Launch Event

On 6th February 2025, we officially celebrated the installation of Aire with a launch event attended by Prof. Yu (DVC & Provost). This marked a significant milestone in our HPC journey, bringing together researchers and stakeholders to recognise the importance of this new infrastructure. You can read about the day in "An Aire of change in our research infrastructure".

Since the launch, the Aire community has been steadily growing, along with an increasing number of configuration requests as more users begin working with the system.

 

Key Updates

Slurm Configuration

We have adjusted the Slurm scheduler to better balance resource utilisation. Our initial limits were baseline values chosen for the usage patterns we expected to be most common. However, because the system is still new, we have observed that some users are hitting queue restrictions while many nodes sit idle. This highlighted the need for a more dynamic approach to job scheduling.

To address this, we are temporarily increasing the maximum number of GPUs per job from 6 to 24. We will continue to monitor the system, and once the queue for the GPU partition begins to grow, we will rebalance the limit accordingly.
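As a rough illustration of what the raised limit means for a job submission, the sketch below builds a Slurm batch script that requests a total number of GPUs and rejects requests above the temporary ceiling of 24. The `--gpus` directive is a standard Slurm option for the total GPU count of a job, but the helper name, the job name, and the idea of checking the limit before submission are assumptions for illustration. This post does not state which partition names or GPU options Aire expects, so those should be taken from the Aire documentation.

```python
MAX_GPUS_PER_JOB = 24  # temporary per-job ceiling stated in this update (previously 6)


def build_gpu_job_script(command: str, gpus: int, job_name: str = "gpu-test") -> str:
    """Return the text of a Slurm batch script requesting `gpus` GPUs in total.

    Hypothetical helper: only the 24-GPU ceiling comes from this update.
    """
    if not 1 <= gpus <= MAX_GPUS_PER_JOB:
        raise ValueError(
            f"gpus must be between 1 and {MAX_GPUS_PER_JOB}, got {gpus}"
        )
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Total GPUs for the whole job; a site may also require a partition option.
        f"#SBATCH --gpus={gpus}",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "hostname" is a placeholder workload, not a recommended test.
    print(build_gpu_job_script("hostname", gpus=8))
```

The generated text could be saved to a file and submitted with `sbatch`. Checking the limit locally only avoids a rejected submission; the scheduler remains the authority on what is allowed.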

 

Multifactor Authentication (MFA)

As part of our ongoing efforts to improve system security, we are introducing multifactor authentication (MFA). The solution is currently in an experimental phase and is available on login4. Users can test it by SSHing directly to this node. Over the coming weeks, we will roll out MFA across all login nodes.

 

Software Modules

Software availability is one of the most crucial aspects of an HPC system. Given Aire’s multi-purpose nature, our team has been working diligently to provide a broad range of applications via modules. Over the past week, we have installed:

  • java/jdk-21.0.6
  • spack/0.23
  • julia/1.11.3
  • pixi/0.41.4
  • ansys/2024R2
  • LS-DYNA/R14.1.1
  • crest/3.0.2
  • rclone/1.69.1
  • stata/18
  • xtb/6.7.1

We are currently working on adding:

  • A new R installation to support user-installed libraries
  • Intel oneAPI suite
  • CASTEP
  • MATLAB
  • ORCA
  • NetCDF-Fortran

We encourage all users to test these installations and provide feedback. While we rigorously test all software before deployment, software on an HPC system can be used in many different ways, and we may not always be aware of specific dependencies or edge cases. If you need software that is not yet available, please contact us, and we will do our best to provide it as soon as possible.
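For users who want to try the newly installed modules from a script, the following sketch turns module identifiers of the form `name/version` into `module load` lines. It assumes the usual Environment Modules or Lmod style `module load` command. The helper names are hypothetical, and the list only mirrors a few of the modules named in this update.

```python
# A few of the modules listed in this update, written as name/version.
NEW_MODULES = ["java/jdk-21.0.6", "julia/1.11.3", "rclone/1.69.1", "xtb/6.7.1"]


def split_module(identifier: str) -> tuple[str, str]:
    """Split 'name/version' into (name, version), rejecting malformed input."""
    name, sep, version = identifier.partition("/")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name/version', got {identifier!r}")
    return name, version


def module_load_lines(identifiers: list[str]) -> list[str]:
    """Return one 'module load' line per valid identifier, in the given order."""
    lines = []
    for identifier in identifiers:
        split_module(identifier)  # validates the format before emitting the line
        lines.append(f"module load {identifier}")
    return lines


if __name__ == "__main__":
    print("\n".join(module_load_lines(NEW_MODULES)))
```

The resulting lines can be pasted near the top of a job script. Pinning the full `name/version` keeps a workflow reproducible if newer versions are installed later.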

 

New Storage Quota Approach

The Lustre scratch storage on Aire is now subject to a default 1TB quota. This limit is in place to prevent accidental misuse and ensure fair access to resources. However, if you require additional quota, you can submit a ticket explaining your requirements and the duration for which you need extra space. The more specific your request, the easier it will be for us to accommodate your needs while balancing system-wide storage allocation.

For example, requesting 10TB for 3 months is easier to accommodate than requesting 2TB for 5 years. Once the need for additional quota has passed, the standard 1TB limit will be reinstated. Users can request quota adjustments as often as necessary. This model ensures that everyone has the opportunity to run large-scale simulations while maintaining fair resource distribution.
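One way to see why the short, large request is easier to accommodate is to compare the total storage commitment over time. The sketch below uses terabyte-months as a rough measure. This metric and the function name are illustrative assumptions, not the team's actual allocation policy.

```python
def terabyte_months(quota_tb: float, months: int) -> float:
    """Storage commitment as quota size multiplied by duration (illustrative metric)."""
    if quota_tb < 0 or months < 0:
        raise ValueError("quota and duration must be non-negative")
    return quota_tb * months


short_large = terabyte_months(10, 3)      # 10 TB for 3 months
long_small = terabyte_months(2, 5 * 12)   # 2 TB for 5 years

print(short_large, long_small)  # prints: 30 120
```

By this measure the 10TB request commits 30 terabyte-months while the 2TB request commits 120, four times as much, and it also ties up space for far longer.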

 

ARC4 Status

If you are an ARC4 user, you may have noticed system issues this week. The primary concern has been the critical state of ARC4’s Lustre storage (/nobackup), which has been failing intermittently. If you rely on ARC4 for your work, we strongly encourage you to migrate to Aire.

Aire provides not only a more reliable workflow but also significantly faster processing speeds. Additionally, queue times on Aire are currently much shorter, meaning that while there may be an initial time investment in migrating, you will quickly see the benefits in performance and efficiency.

See more about Aire in our Documentation, including information about how to request an account.

 

We appreciate everyone’s continued support in testing and refining Aire. If you have any questions, require additional resources, or would like to provide feedback, please get in touch with the team or post in the Teams Channel. We will continue to keep you updated.

 

If you are interested in more information about Aire, please read our previous blog posts.

Author

Patricia Ternes

Research Software Engineer Manager