
First look at PyTorch on Aire

John Hodrien, Patricia Ternes

How does Aire’s cutting-edge hardware stack up? Let me give you a sneak peek: three NVIDIA L40S GPUs on Aire outperformed four V100 GPUs on ARC4 by 29% in sequences per second while using less power! That’s a significant jump, especially given this was just a quick test to get a feel for Aire’s capabilities.

This experiment wasn’t about pushing the limits or showcasing optimal configurations. Instead, it was a simple “kick of the tyres” to check that Aire’s GPUs were functional and that multi-GPU workloads were running smoothly. I picked a straightforward PyTorch benchmark, ran it on both systems and let the results speak for themselves.

Here’s what I found:

  • Aire’s three L40S GPUs delivered 300 sequences per second, compared to ARC4’s four V100 GPUs achieving 232 sequences per second.
  • Even under load, Aire’s GPU node stayed cooler and consumed less power, demonstrating impressive efficiency.
  • The job setup and execution were seamless, a testament to Aire’s well-integrated GPU nodes and updated system design.

If that’s got you excited, let’s dive into the details of how the test was set up and what we learned from this first glance at Aire’s GPU capabilities!


PyTorch Benchmark

For the experiment, I selected a straightforward, easy-to-run multi-GPU benchmark from the PyTorch Benchmarks repository, which could run on both Aire and ARC4.


Aire Experiment

Setting it up was straightforward:

module add miniforge

# Create a conda environment with CUDA-enabled PyTorch from the official channels
conda create -n pytorchbenchmark pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
conda activate pytorchbenchmark

# Fetch the benchmark suite and install its remaining dependencies
git clone https://github.com/aime-team/pytorch-benchmarks
cd pytorch-benchmarks
pip3 install -r requirements.txt
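
Before submitting anything, it's worth a quick sanity check that the new environment can actually see the GPUs (a minimal check; run it on a GPU node or in an interactive GPU session, as login nodes typically have no GPUs):

# Should print True and the number of visible GPUs
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"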

After that, I experimented with various benchmark arguments, focusing on configurations that filled the GPU memory to give the hardware a bit of a workout.
Below is the job submission script I used to run a benchmark on Aire:
#!/bin/bash

# Ignore the shell's environment
#SBATCH --export=NONE

# Run on a single node, with all 24 cores, and three GPUs, for up to an hour
#SBATCH -N 1 -c 24 -t 1:0:0 -p gpu --gres gpu:3

# Load the environment and run the benchmark across all three GPUs
module add miniforge
conda activate pytorchbenchmark
python3 main.py --num_gpus 3 --model bert-large-uncased --data_name squad --global_batch_size 180 -amp --compile
With that, I submitted the job (sbatch submit.sh) and waited to see what the results looked like (taking a mid-run output snippet):
Epoch [1 / 10], Step [490 / 493], Loss: 4.9542, Sequences per second: 300.8
GPU-ID: 0, Temperature: 59 °C, Fan speed: 0%, GPU usage: 100%, Memory used: [44.8/ 45.0] GB
GPU-ID: 1, Temperature: 60 °C, Fan speed: 0%, GPU usage: 100%, Memory used: [44.8/ 45.0] GB
GPU-ID: 2, Temperature: 61 °C, Fan speed: 0%, GPU usage: 100%, Memory used: [44.8/ 45.0] GB
The GPUs maintained a stable temperature and fully utilised their memory and compute capacity, delivering just over 300 sequences per second.
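
To keep an eye on a run like this, the usual Slurm tooling is enough (a minimal sketch; by default Slurm writes job output to slurm-<jobid>.out in the submission directory, so substitute your own job ID):

squeue -u $USER              # check the job's state in the queue
tail -f slurm-<jobid>.out    # follow the benchmark output as it is written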


Comparing Aire with ARC4

To understand how Aire’s GPUs compare with ARC4, I ran a similar test on ARC4 using its older V100 GPUs, adjusting the batch size (from 180 to 150) to fit their smaller memory:

python3 main.py --num_gpus 4 --model bert-large-uncased --data_name squad --global_batch_size 150 -amp --compile
Output snippet from ARC4:
Epoch [1 / 10], Step [590 / 599], Loss: 4.9348, Sequences per second: 232.2
GPU-ID: 0, Temperature: 72 °C, Fan speed: 0%, GPU usage: 99%, Memory used: [30.3/ 32.0] GB
GPU-ID: 1, Temperature: 61 °C, Fan speed: 0%, GPU usage: 99%, Memory used: [30.3/ 32.0] GB
GPU-ID: 2, Temperature: 60 °C, Fan speed: 0%, GPU usage: 99%, Memory used: [30.3/ 32.0] GB
GPU-ID: 3, Temperature: 69 °C, Fan speed: 0%, GPU usage: 98%, Memory used: [30.3/ 32.0] GB
The ARC4 GPUs ran slightly warmer on two of the cards but remained within acceptable limits. With a slightly smaller batch size, ARC4 achieved 232 sequences per second.


Power Consumption

For both Aire and ARC4, I used nvidia-smi to monitor power consumption during the runs. This tool provides real-time information on GPU usage, including power draw, temperatures, and memory usage. Aire’s L40S GPUs peaked at 1050W in total, compared to a 1200W peak for ARC4’s V100 GPUs.
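
For reference, a query along these lines polls the key figures every few seconds (a minimal sketch using nvidia-smi's query interface; run it on the GPU node alongside the job):

# Report per-GPU power draw, temperature, utilisation, and memory every 5 seconds
nvidia-smi --query-gpu=index,power.draw,temperature.gpu,utilization.gpu,memory.used --format=csv -l 5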


Highlights

  • Aire’s three NVIDIA L40S GPUs outperformed ARC4’s four V100 GPUs, delivering a 29% improvement in sequences per second while consuming less power. The L40S GPUs have a peak power draw of 1050W compared to the V100’s 1200W.
  • The EPYC CPUs in Aire’s GPU nodes also consume less power than ARC4’s, further improving performance per watt (see the quick calculation after this list). Future tests may explore additional power-saving strategies to enhance efficiency further.
  • The L40S GPU node demonstrated excellent stability and temperature management under load, with a design that minimises power use while maintaining performance.
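
As a back-of-the-envelope check on that performance-per-watt claim, dividing throughput by peak power gives the following (a rough sketch; peak power is a crude proxy, and a fair comparison would average power across the whole run):

# Sequences per second per watt, from the figures above
echo "scale=3; 300.8 / 1050" | bc    # Aire L40S node: ~0.286
echo "scale=3; 232.2 / 1200" | bc    # ARC4 V100 node: ~0.193

On that crude measure, the L40S node delivers roughly 48% more throughput per watt.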


Conclusion: Aire’s Performance, Efficiency, and Sustainability

These results showcase Aire’s exciting potential to revolutionise GPU-accelerated computing for research in Leeds. In this simple test, Aire’s NVIDIA L40S GPU node delivered a 29% performance improvement over ARC4’s V100 GPU node, even while using fewer GPUs. It achieved this while consuming less power and generating less heat, making Aire not just faster but also more efficient.

This efficiency is a step towards greater sustainability in research computing. By delivering more performance per watt and reducing cooling demands, Aire helps minimise the environmental footprint of computational research. As workloads grow and energy consumption becomes an increasingly pressing concern, systems like Aire demonstrate how advanced technology can balance performance and sustainability.

Aire’s GPU nodes are already proving to be a powerful and efficient platform for research. We’re excited to continue unlocking its full potential and exploring how it can drive innovation while supporting a more sustainable future in high-performance computing. Stay tuned!

Authors

John Hodrien

Research Software Engineer

Patricia Ternes

Research Software Engineer Manager