In this workshop, we will give an overview of a number of data science tools and techniques and explore how High Performance Computing (HPC) can provide the computational power to analyse large volumes of data rapidly and efficiently.
Although HPC clusters have traditionally been used for modelling and simulation in science and engineering, they are equally well suited to research problems centred on the analysis of data.
The workshop is open to researchers from all disciplines, but examples will be drawn from the humanities and social sciences.
In this hands-on workshop we will cover:
- What are data analytics, big data and data science?
- Using R for High Performance Data Analytics
- Analysing streaming data from IoT devices and social media
- Databases: Using SQL, NoSQL and graph databases on HPC
- Parallelising data analysis with MapReduce and Apache Spark
- Moving between desktop, HPC and the Cloud
This workshop is aimed at people who already have some programming experience, ideally in R or Python. Attendees are not expected to have any prior experience of High Performance Computing or the Linux command line.
This is NOT an introductory programming course. If you need to learn how to program, please attend course SWD 1a: Introduction to Python programming or SWD 1b: Introduction to R programming.
This workshop usually runs once each academic year.
Audience: all research staff and students.
Booking for this course is through the IT Training Unit.