Microbiome 2.0 - or what to do when you have no hits

The cow rumen is an under-characterised environment, with few rumen microbial genomes publicly available. What is the solution? Microbiome 2.0 will be characterised by building your own reference database, assembled from your own data!

Mick Watson
Feb 28, 2018

The paper in Nature Communications is here:

It's not every day that one of your post-docs changes the entire direction of a research project, but this happened to me last year during one of our lab meetings!  However, we'll get to that bit in a wee while; let's start with a bit of context.

My group has an excellent and ongoing research collaboration with Scotland's Rural College (SRUC) - we study ruminants, and specifically rumen microbiomes.  Why should you be interested in them?  Well, the food they produce has influenced the evolution of humans and our society for thousands of years, so you probably should pay attention!

Cattle and other milk-producing ruminants such as sheep, goats and buffalo have provided us with essential nutrition for over 10,000 years and today are responsible for feeding billions of people worldwide.  In addition to milk, cattle also provide us with beef, the third most consumed meat on the planet today.  Some may not agree with the farming of animals or the consumption of animal products; however, there can be no denying their impact either today or throughout history.

These fascinating animals eat plants, digest the complex mixture of carbohydrates, proteins, oils and fats within and produce a lot of muscle protein and milk.  This is an incredibly complex process and ruminants are very, very good at it.  In actual fact, it is the rumen microbiome that is responsible for much of it.  The rumen is essentially a large anaerobic fermentation chamber, and estimates of the number of microbial species therein range from 1000s to many tens of thousands.  

We've been studying the rumen microbiome for some time - we have used rumen metagenomics data to look for novel enzymes of interest to biotechnology companies; we have linked metagenomic enzyme abundance with methane emissions; we have established robust biomarkers for methane emissions; we have studied the role of genetics in rumen microbial gene abundance, leading to our winning the PLOS Genetics Research Prize in 2017; and we have even linked diets to the abundance of antimicrobial resistance genes.

However, what none of those studies mentioned was that they used only between 2% and 10% of the available data.  Why?  The rest simply didn't hit anything in the public databases.

In 2013, I attended a seminar by a colleague of mine, Chris Quince.  He was presenting early results from CONCOCT, a tool for binning metagenomic contigs into genomes.  The results presented were from a mock community, and I thought: "that would never work on ruminants, the data are simply too complex".  Fast forward to 2017 and, funded by BBSRC, I asked Rob Stewart (first author of our recent paper and post-doctoral researcher in my lab) to see if he could make metagenomic binning techniques work on our data (we chose an alternative, MetaBAT).  Bear in mind, this wasn't the main focus of our research - this was a side project, a look-see, a mild interest to see whether it would work or not.

From that moment on, everything changed.

At first, Rob presented 220 high-quality microbial genomes, assembled from just 42 samples.  This may not sound like a lot, but so few rumen microbes have been sequenced that it had the potential to be truly transformational to the field.  Rob and I set about working full time on these data.  Over time, and in response to reviewers' comments, the 220 grew to 850, and we additionally brought in some genomes assembled using Phase Genomics' Hi-C technology.  The final number was 913 genomes.  913 genomes!

By this point all of our activity was focused on figuring out what on earth these things were.  This wasn't easy, and we have released MAGpy, which is our pipeline for metagenome-assembled genome (MAG) annotation.  Still, hundreds of these genomes are only taxonomically characterised at the level of family.  Percentage identity at the protein level can be as low as 35% when compared to public databases, even when including cultured rumen microbes.  This is why we say the rumen is an under-characterised environment - so much novelty!

The amazing result, the good news, is that now, in our own samples, we can classify between 30% and 80% of our data (a huge improvement on 2-10%).
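To make "classify" concrete, here is a minimal Python sketch of how a classification rate like this could be computed: count the reads that hit something in the reference and divide by the total. The function name, the read IDs and the genome names are all invented for illustration; this is not our actual pipeline, and the toy numbers are not our data.

```python
def classified_fraction(assignments):
    """Return the fraction of reads that hit something in the reference.

    `assignments` maps a read ID to the name of the reference genome it
    hit, or to None when the read hit nothing. (Hypothetical input format.)
    """
    if not assignments:
        return 0.0
    hits = sum(1 for ref in assignments.values() if ref is not None)
    return hits / len(assignments)


# Toy data, invented for illustration: the same five reads searched
# against a public database only, and against public + our own genomes.
public_only = {
    "r1": "Prevotella_sp", "r2": None, "r3": None, "r4": None, "r5": None,
}
with_own_mags = {
    "r1": "Prevotella_sp", "r2": "MAG_0012", "r3": "MAG_0107",
    "r4": None, "r5": "MAG_0012",
}

print(f"public only:   {classified_fraction(public_only):.0%}")
print(f"with own MAGs: {classified_fraction(with_own_mags):.0%}")
```

The point of the sketch is simply that the reads stay the same; only the reference changes, and with it the share of data you can actually use.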

This is only the beginning.  The impact of this dataset on the rumen microbiome field, we hope, will be huge.  But it's also an example of how we can relaunch microbiome research.  Can we be part of a movement to do just that?

Microbiome 2.0, I think, will be characterised by the following paradigms: 

  • We will track genomes, not names - it doesn't matter what a microbe is called, it matters what it does (i.e. which genes and pathways are encoded in its genome)
  • We will characterise the function of environments, not the taxonomic structure.  It is far more interesting to look at the metabolic pathways encoded in a microbiome, rather than simply list the species.
  • We will build our own reference databases!  This can be done for 16S as well as metagenomics.  No longer do we have to rely on incomplete reference databases!
  • We will follow best practice every step of the way 

So, that's the story behind the paper: from a simple "I wonder..." to a change in direction for our project, and the "launch" of microbiome 2.0.  Don't you just love science?

Mick Watson

Chair in bioinformatics and computational biology, The Roslin Institute

I am interested in what large datasets tell us about biological function, and how we can correlate patterns in big data with phenotypes of interest in farm animal health, disease and productivity.

  • We have been studying ruminant microbiomes using whole-genome shotgun sequencing data since 2011 and are currently running possibly one of the biggest cattle gut microbiome projects ever carried out: whole-genome shotgun metagenomics of over 300 cattle, richly phenotyped for feed-conversion ratio and methane emissions (segregated by diet and breed). This is in collaboration with SRUC and Aberdeen.
  • Funded by BBSRC and TSB, we are mining rumen gut metagenome data for novel enzymes in relation to biocatalysis.
  • We are also interested in the chicken gut microbiome and are associating changes in the gut flora with diet and response to vaccines.
  • Genomics and the consequences of genome variation are also of interest. We work with animal breeders to help them understand the possible consequences of variation in their herds, and what that variation can tell us about gene function. We have worked on improving the pig genome and assembled Sscrofa 11.1, a PacBio+nanopore+Illumina pig genome assembly to be released in 2017.
  • We have an active interest in sequencing technologies, and have released poRe, one of the first tools to work with data from Oxford Nanopore's MinION.
  • We also have an interest in bioinformatics methods and pipelines, and have produced improved pipelines for both metagenomic analysis and RNA-Seq analysis. In particular, we focus on precision in RNA-Seq, and attempts to identify uncertainty in terms of genes that we cannot measure accurately.
  • I am the programme leader of programme 5 of the Centre for Tropical Livestock Genetics and Health (CTLGH), a Gates-funded centre focused on genomic improvement of African livestock.
  • We work with Prof Venu Nair at The Pirbright Institute on avian oncogenic viruses, using mRNA-Seq, microRNA-Seq and ChIP-Seq to understand how these viruses subvert the host cell machinery.
  • We work with Alain Kohl, Esther Schnettler and Isabelle Dietrich on analysis of viRNA and piRNA responses in insect vectors of viral diseases, using our software viRome (10.1093/bioinformatics/btt297).
  • We work with David Griffiths on ovine pulmonary adenocarcinoma, again using next-generation sequencing to understand the virus and how it interacts with the host.


Andrew Jermy 18 days ago

Great post Mick (and great paper), thanks for sharing here!

Pandeng Wang 16 days ago


Thanks for sharing your experience!

Mick Watson 16 days ago

I'd just like to thank super-admin Ross Houston for spell checking this post!

Rahul Bodkhe 12 days ago

Great work! I'll be following your group.