Microbiome 2.0 - or what to do when you have no hits
The cow rumen is an under-characterised environment, with few rumen microbial genomes publicly available. What is the solution? Microbiome 2.0 will be characterised by building your own reference database, assembled from your own data!
The paper in Nature Communications is here: http://go.nature.com/2FBRMIH
It's not every day that one of your post-docs changes the entire direction of a research project, but this happened to me last year during one of our lab meetings! We'll get to that bit in a wee while, though; let's start with a bit of context...
My group has an excellent and ongoing research collaboration with Scotland's Rural College (SRUC) - we study ruminants, and specifically rumen microbiomes. Why should you be interested in them? Well, the food they produce has influenced the evolution of humans and our society for thousands of years, so you probably should pay attention!
Cattle and other milk-producing ruminants such as sheep, goats and buffalo have provided us with essential nutrition for over 10,000 years and today are responsible for feeding billions of people worldwide. In addition to milk, cattle also provide us with beef, the third most consumed meat on the planet today. Some may not agree with the farming of animals or the consumption of animal products; however, there can be no denying their impact either today or throughout history.
These fascinating animals eat plants, digest the complex mixture of carbohydrates, proteins, oils and fats within them, and turn it into large amounts of muscle protein and milk. This is an incredibly complex process and ruminants are very, very good at it. In fact, much of the work is done by the rumen microbiome. The rumen is essentially a large anaerobic fermentation chamber, and estimates of the number of microbial species it contains range from thousands to many tens of thousands.
We've been studying the rumen microbiome for some time - we have used rumen metagenomics data to look for novel enzymes of interest to biotechnology companies; we have linked metagenomic enzyme abundance with methane emissions; established robust biomarkers for methane emissions; studied the role of genetics in rumen microbial gene abundance, leading to our winning the PLOS Genetics Research Prize in 2017; and we have even linked diets to the abundance of antimicrobial resistance genes.
However, what none of those studies mentioned was that they used only 2-10% of the available data. Why? The rest simply didn't hit anything in the public databases.
In 2013, I attended a seminar by a colleague of mine, Chris Quince. He was presenting early results from CONCOCT, a tool for binning metagenomic assemblies into individual genomes. The results presented were from a mock community, and I thought: "that would never work on ruminants, the data are simply too complex". Fast forward to 2017 and, funded by BBSRC, I asked Rob Stewart (first author of our recent paper and post-doctoral researcher in my lab) to see if he could make metagenomic binning techniques work on our data (we chose an alternative, MetaBAT). Bear in mind, this wasn't the main focus of our research; it was a side project, a look-see, a mild interest to see whether it would work or not.
From that moment on, everything changed.
At first, Rob presented 220 high-quality microbial genomes, assembled from just 42 samples. This may not sound like a lot, but so few rumen microbes have been sequenced that it had the potential to be truly transformational to the field. Rob and I set about working full time on these data. Over time, and in response to reviewers' comments, the 220 grew to 850, and we additionally brought in some genomes assembled using Phase Genomics' Hi-C technology. The final number was 913 genomes. 913 genomes!
By this point all of our activity was focused on figuring out what on earth these things were. This wasn't easy, and we have released MAGpy, our pipeline for metagenome-assembled genome (MAG) annotation. Still, hundreds of these genomes are only taxonomically characterised at the level of family. Percentage identity at the protein level can be as low as 35% when compared to public databases, even when including cultured rumen microbes. This is why we say the rumen is an uncharacterised environment: so much novelty!
The amazing result, the good news, is that now, in our own samples, we can classify 30-80% of our data (a huge improvement on 2-10%).
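For readers wondering what "classify X% of our data" means in practice, here is a minimal sketch of one common way to measure it: count the distinct reads that have at least one hit when searched against a reference database. It assumes the search produced BLAST/DIAMOND-style tabular output (`outfmt 6`, where the first column is the read ID). The function name `fraction_classified` and the toy data are hypothetical and illustrative only; this is not the pipeline used in the paper.

```python
def fraction_classified(total_reads, hit_lines):
    """Return the fraction of reads with at least one hit.

    Assumes (hypothetically) BLAST/DIAMOND tabular output, one hit per line,
    with the query (read) ID in the first tab-separated column.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # A read can hit many references, so collect distinct read IDs only.
    hit_ids = {line.split("\t", 1)[0] for line in hit_lines if line.strip()}
    return len(hit_ids) / total_reads


# Toy example (invented data): the same 10 reads searched against two databases.
public_hits = ["read1\tpublicGene7\t35.2"]
own_mag_hits = ["read1\tMAG12_g3\t98.0", "read1\tMAG40_g1\t71.5",
                "read2\tMAG3_g9\t99.1", "read5\tMAG7_g2\t96.4"]
print(fraction_classified(10, public_hits))   # 0.1
print(fraction_classified(10, own_mag_hits))  # 0.3
```

Deduplicating by read ID matters: without it, a read hitting several genomes would be counted multiple times and inflate the classified fraction.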
This is only the beginning. The impact of this dataset on the rumen microbiome field, we hope, will be huge. But it's also an example of how we can relaunch microbiome research. Can we be part of a movement to do just that?
Microbiome 2.0, I think, will be characterised by the following paradigms:
- We will track genomes, not names - it doesn't matter what a microbe is called, it matters what it does (i.e. which genes and pathways are encoded in its genome)
- We will characterise the function of environments, not the taxonomic structure. It is far more interesting to look at the metabolic pathways encoded in a microbiome, rather than simply list the species.
- We will build our own reference databases! This can be done for 16S as well as metagenomics. No longer do we have to rely on incomplete reference databases!
- We will follow best practice every step of the way
So, that's the story behind the paper; from a simple "I wonder..." to a change in direction for our project, and the "launch" of microbiome 2.0. Don't you just love science?