Booking for this course is through the IT Training Unit. Click here to book
In this workshop, we will give an overview of a number of data science tools and techniques and explore how HPC can provide the computational power to rapidly and efficiently analyse large volumes of data. Although HPC clusters have traditionally been used for modelling and simulation in Science and Engineering, they are equally suited to research problems that focus on the analysis of data. The workshop is open to researchers from all disciplines, but examples will be drawn from the humanities and social sciences.
At the end of the workshop, attendees will be able to:
What are data analytics, big data and data science?
Using R for High Performance Data Analytics
Analysing streaming data from IoT devices and social media
Databases: Using SQL, NoSQL and graph databases on HPC
Parallelising data analysis with MapReduce and Apache Spark
Moving between desktop, HPC and the Cloud
This workshop is aimed at people who already have some programming experience, ideally in R or Python. Attendees are not expected to have any prior experience of High Performance Computing or the Linux command line. This is not an introductory programming course. If you need to learn how to program, please attend either SWD1a: Introduction to Python programming or SWD1b: Introduction to R programming.
This workshop usually runs once each academic year.
If you would like a bespoke version of this course run in your department then please contact us.
Research postgraduate students and above; teaching, lecturing and research staff.