The number of studies investigating the microbiome and its role in human health and disease has exploded in recent years. However, one question has remained despite these efforts: how much of the human gut microbiota diversity do we know?
When I started my first postdoctoral position in Rob Finn’s Computational Metagenomics group at the EMBL-EBI, I had the opportunity to collaborate with Trevor Lawley’s team at the Sanger Institute. For the past several years, the Lawley team has focused on isolating and culturing bacteria found in our intestinal microbiota, specifically those previously deemed “unculturable”. The two teams were working together to validate the newest culture collection of gut bugs and to determine how comprehensive this reference was compared to existing public datasets, such as the Human Microbiome Project (HMP). To find out, we reconstructed bacterial genomes from ~12,000 publicly available human gut metagenomic datasets and compared the resulting genomes to the Lawley collection and the HMP. We were immediately intrigued by the results: while most of the reconstructed genomes matched a cultured organism, over 30% of the data did not match any previously known genome. This sparked the beginning of the present study, which aimed to address two main questions: i) how many gut bacterial species are still unexplored and ii) why haven’t we been able to find them?
Metagenomics is a challenging and constantly evolving field: new analysis methods are being developed at an incredible pace, and choosing the right tool for the job can be daunting. In this work, we used assembly and binning methods to generate what are known as metagenome-assembled genomes (MAGs), that is, genomes reconstructed entirely from metagenomic data. Working with MAGs brings many challenges: genomes can be very fragmented, or sequences from related, yet distinct, species can be mistakenly grouped into the same MAG. For this reason, we had to apply strict thresholds to contamination estimates (based on single-copy marker genes) and test how reproducible the genomes were across different assembly/binning methods. MAGs will never be a substitute for pure isolate genomes, but they provide the next best thing.
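To make the idea of marker-gene-based quality control more concrete, here is a minimal sketch of how completeness and contamination can be estimated from single-copy marker genes and then used as a filter. It is a simplification for illustration, not the pipeline used in the study: dedicated tools such as CheckM rely on lineage-specific, collocated marker sets. The function names and the default thresholds (90% completeness, 5% contamination) are assumptions chosen for the example, not values taken from this work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MagQuality:
    completeness: float   # percent of expected marker genes found at least once
    contamination: float  # percent of expected marker genes found in extra copies


def estimate_quality(marker_hits, marker_set):
    """Estimate MAG quality from single-copy marker genes (simplified).

    marker_hits: iterable of marker gene IDs detected in the MAG,
                 with one entry per detected copy.
    marker_set:  set of marker gene IDs expected exactly once per genome.
    """
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    present = len(counts)                               # markers seen at least once
    extra_copies = sum(n - 1 for n in counts.values())  # copies beyond the first
    total = len(marker_set)
    return MagQuality(
        completeness=100.0 * present / total,
        contamination=100.0 * extra_copies / total,
    )


def passes_filter(quality, min_completeness=90.0, max_contamination=5.0):
    """Hypothetical thresholds; adjust them to the standard you follow."""
    return (quality.completeness >= min_completeness
            and quality.contamination <= max_contamination)


if __name__ == "__main__":
    markers = {f"m{i}" for i in range(1, 11)}
    # Nine of ten markers present, one of them duplicated.
    hits = [f"m{i}" for i in range(1, 10)] + ["m1"]
    q = estimate_quality(hits, markers)
    print(q, passes_filter(q))
```

A marker gene that appears more than once suggests that sequences from more than one organism ended up in the same bin, which is why the contamination estimate counts extra copies rather than missing ones.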
By generating >90,000 MAGs from publicly available datasets, we found that almost 2,000 candidate species did not match any cultured reference genome. What defines a bacterial species is still an ongoing area of investigation (and we might never have a clear answer), but we used recent, independent studies to define genome-based species boundaries. Given that the most recent collections of cultured genomes contain 500-600 gut bacterial species, we essentially found over three times the number of species that have been cultured and sequenced to date.
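As a rough illustration of what "not matching any cultured reference genome" can mean in practice, the sketch below flags genomes whose best similarity to a reference falls under a species-level cutoff, and then reproduces the back-of-the-envelope comparison above. The 95% average nucleotide identity (ANI) cutoff is a widely used operational convention and is an assumption here; the text does not state the exact boundary used. The function names and the input dictionary are hypothetical.

```python
def find_unmatched(best_ani_to_reference, ani_threshold=95.0):
    """Return IDs of genomes with no species-level match to a reference.

    best_ani_to_reference: dict mapping genome ID to its highest ANI (percent)
    against any cultured reference genome, or None if no hit was found.
    ani_threshold: assumed species boundary, not a value from the study.
    """
    return sorted(
        genome_id
        for genome_id, ani in best_ani_to_reference.items()
        if ani is None or ani < ani_threshold
    )


def fold_range(candidates, cultured_low, cultured_high):
    """Ratio of candidate species to cultured species, as (lowest, highest)."""
    return candidates / cultured_high, candidates / cultured_low


if __name__ == "__main__":
    toy_hits = {"mag_a": 98.7, "mag_b": 91.2, "mag_c": None, "mag_d": 95.0}
    print(find_unmatched(toy_hits))  # genomes below the assumed cutoff

    # Approximate figures from the text: ~2,000 candidates vs 500-600 cultured.
    low, high = fold_range(2000, 500, 600)
    print(f"{low:.1f}x to {high:.1f}x")
```

With roughly 2,000 candidate species against 500-600 cultured ones, the ratio falls between about 3.3 and 4, which is consistent with "over three times".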
But what makes our candidate species different? Two main results gave us some clues. First, we found that within each bacterial phylum, the uncultured species had a lower abundance of genes involved in antioxidant activity. We think this indicates that these unknown species may be particularly sensitive to ambient oxygen, so any slight exposure (which could happen, for instance, during fecal sampling) could limit their viability for culturing. Second, they appear to be less prevalent and less abundant in the most well-studied populations (i.e. Europe and North America). This made sense, as species that are more prevalent and abundant are more easily discovered and cultured in the laboratory. However, one important finding from this analysis was that specific populations from Africa and South America (representing individuals with non-Western lifestyles) contained a higher abundance of some of these species. Given that these populations are extremely undersampled compared to their European and North American counterparts, it really makes us wonder: how much is still out there to be discovered?
This was a very large undertaking, making use of thousands of gut samples across 75 different studies. Therefore, it is important to acknowledge and give special thanks to all of the original authors of those studies who took the time and effort to generate all these data and, most importantly, make them publicly available.
We are witnessing an important turning point for human microbiome research: the past month has seen three additional landmark studies for the field. Our dataset and results provide a great complement to the work of Pasolli and colleagues, published in Cell, a massive-scale analysis of thousands of human metagenomes, while another study from Liang Xiao’s team accompanied Trevor Lawley’s new culture collection in a back-to-back publication in Nature Biotechnology. These are potentially transformative datasets that will allow us to move beyond simply observing which microbes are present, to developing new hypotheses for further functional validation and exploring novel therapeutic targets for modulating the gut microbiota for health.