🎄12 Days of HPC 2021

Understanding little angel warriors and demons in our body using the power of NGS

Blog post number 19 in our 12 days of HPC series from Faculty of Medicine and Health!

During the month of December we’re featuring blog posts from researchers from across the University of Leeds showcasing the fantastic work they do using our High Performance Computing system. Follow us @RC_at_Leeds to keep up to date with our 12 days of HPC blog series.

What’s your name?

Dr Suparna Mitra

What department do you work in?

Faculty of Medicine and Health

What research question are you trying to answer?

Metagenomics is a rapidly advancing field within microbial sciences, providing unique insight into the diversity of microbial communities from a plethora of sources, from air, soil, and marine sites to animals and humans. The study of bacterial communities using metatranscriptomics and metagenomics therefore enables a progressively better understanding of the microbial interactions implicated in medicine, ecology, agriculture, biotechnology, and other fields. The fast-moving field of personalised medicine and therapeutics is already having a positive impact on clinical care for some diseases, but much development is still needed before we are in a position to benefit early from these advances. I have worked on several medical projects. For example, in a gut microbiome study I employed different next generation sequencing (NGS) technologies and led the data analyses to study intestinal microbial communities at high resolution in samples from patients undergoing weight loss therapy through surgery or diet. In another project I examined atherosclerotic plaque samples to compare microbial community diversity between atherosclerosis patients and a control group. The aim of my research is to develop better analyses of metagenomic and metatranscriptomic samples using bioinformatics and statistical tools. Recently I have been working on multiple projects exploring different study designs, data analyses, case-control comparisons, and treatment conditions and effects using an ‘in vitro human gut model’. My work mainly focuses on the gut microbiome but is not limited to it, as I have collaborations in environmental projects too.

What tools or technologies do you use in your research? (Programming languages, packages, APIs)

My research involves database searches for similarity matches. I use multiple databases such as NCBI, RDP, and SILVA for bacteria, and the UNITE database for rDNA ITS-based identification of eukaryotes. I often use multiple tools, such as traditional BLAST, RAPSearch, and the more recent DIAMOND, for mapping metagenomic sequences to the databases. For 16S annotation I use QIIME, and for metagenomic shotgun sequences I use MEGAN. For many of the analyses and plots I use R.

How does HPC help your research?

Metagenomic samples can be quite big, often 30-50GB for one sample, and the databases are also quite big. For example, NCBI nr is the comprehensive database of non-identical protein sequences compiled by the National Center for Biotechnology Information. The January 2021 version contained 338 million protein sequences in a 158GB FASTA file, and the database has grown since then. In metagenomics we compare each short sequence read or assembled read to all of the NCBI nr data, so the scale of the problem is easy to appreciate. Analysing and comparing such samples on a local computer would take several months, whereas on HPC it can be done quickly and efficiently.
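To give a feel for the numbers, here is a rough back-of-envelope sketch in Python. The database size and the sample size come from the figures quoted in this answer; the bytes-per-read value is purely an assumption for illustration (roughly a 150 bp FASTQ record with header and quality string), and the function name is hypothetical. It counts how many read/protein pairs an exhaustive all-against-all comparison would imply.

```python
# Back-of-envelope estimate of the search space for one metagenomic sample.
# Only NR_SEQUENCES and SAMPLE_BYTES come from the text; BYTES_PER_READ is
# an illustrative ASSUMPTION, not a measured value.

NR_SEQUENCES = 338_000_000   # NCBI nr, January 2021 version (from the text)
SAMPLE_BYTES = 30 * 10**9    # lower end of the 30-50GB range (from the text)
BYTES_PER_READ = 350         # ASSUMPTION: ~150 bp FASTQ record incl. header and qualities


def naive_comparisons(sample_bytes: int, bytes_per_read: int, db_sequences: int) -> int:
    """Number of read/database-sequence pairs in an exhaustive comparison."""
    reads = sample_bytes // bytes_per_read  # approximate number of reads in the sample
    return reads * db_sequences


pairs = naive_comparisons(SAMPLE_BYTES, BYTES_PER_READ, NR_SEQUENCES)
print(f"{pairs:.2e} read/protein pairs in a naive all-against-all comparison")
```

Under these assumptions the count is on the order of 10^16 pairs for a single sample. In practice, aligners such as BLAST and DIAMOND use indexing and seeding so they do not literally align every pair, but the estimate shows why this kind of work is spread across many cores on an HPC system.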

What is the potential impact of your research?

My research focuses on the development and use of new methods for analysing and correlating samples, using appropriate ecological and statistical models, for a better understanding of specific diseases and individual cases. Since early in my research career I have been working on methods development to address this need. My statistical and bioinformatics background helps in better understanding bacterial involvement in disease pathways and targeted therapies.

In your personal opinion what’s the coolest thing about your research?

A microbiome is an aggregate of microbes that live in and on a particular niche or environment, usually consisting of bacteria, but also archaea, fungi, viruses, and small protists. There are trillions and trillions of microbes living all over our body, but the biggest populations reside in the gut. In a healthy person they live together in the right balance for optimal gut performance. Dysbiosis can arise from a reduction in microbial diversity combined with the loss of beneficial bacteria. Coming from a mathematical background (statistics and bioinformatics) and working for several years in medical science has given me a unique skill set and the experience to explore these little angel warriors and demons in our body using the power of NGS. Understanding them can help us fight complex diseases and save lives… this is very exciting to me.
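One common way to put a number on "microbial diversity" is the Shannon index, H = -Σ p_i ln(p_i), where p_i is the proportion of the community belonging to taxon i. The Python sketch below is only an illustration of that idea, not the analysis used in the studies mentioned here; the two abundance profiles are made-up counts and the function name is hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# HYPOTHETICAL read counts per taxon, invented purely for illustration.
balanced_gut = [25, 25, 25, 25]   # four taxa in even proportions
dominated_gut = [97, 1, 1, 1]     # one taxon dominates the community

print(f"balanced:  H = {shannon_diversity(balanced_gut):.3f}")
print(f"dominated: H = {shannon_diversity(dominated_gut):.3f}")
```

The evenly balanced profile gives the higher index (ln 4 for four equally abundant taxa), while the profile dominated by a single taxon scores much lower, which is the kind of drop in diversity associated with dysbiosis.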

In your opinion, what is the ultimate Christmas song?

Jingle Bells Jingle Bells


I will be open and excited for collaborations.

A network analysis constructed using Goodall’s index demonstrated clustering of the anti-CCP positive population and the healthy comparators. Each terminal node, represented by a different shape, indicates an individual’s gut microbiome. Rheumatology (Oxford), Volume 60, Issue 7, July 2021, Pages 3380–3387, https://doi.org/10.1093/rheumatology/keaa792