The use of metagenomic methods for finding and recovering microbial genomes (termed metagenome-assembled genomes, or MAGs) has increased exponentially in recent years. Last year (2019), three large-scale studies unveiled thousands of new microbial species from the human gut microbiome by reconstructing tens of thousands of MAGs (Pasolli et al. 2019; Almeida et al. 2019; Nayfach et al. 2019). Our Microbiome Informatics team at the European Bioinformatics Institute (EMBL-EBI) led one of these studies, and we were honoured to have our work recognized alongside the other two papers as one of Nature’s Milestones in Human Microbiota Research. Coincidentally, 2019 also saw the publication of three new bacterial culture collections from the human gut microbiome (Forster et al. 2019; Zou et al. 2019; Poyet et al. 2019), which together provided thousands of high-quality reference genomes of newly cultured bacteria.
The release of all these datasets represented an important landmark for the human microbiome field. However, because these studies were published at around the same time, further work was needed to compare and contrast their individual results. For the MAG studies in particular, we wondered: how much do these datasets overlap, and how reproducible are the resulting MAGs?
We therefore used this opportunity to compile all human gut microbiome genomes (MAGs and isolates) available as of March 2019 into the Unified Human Gastrointestinal Genome (UHGG) and Protein (UHGP) catalogs. Together these comprise more than 280,000 genomes and 625 million proteins from the human gut microbiome, providing the most complete view of its global diversity thus far. Clustering and analysis of these datasets revealed 4,644 gut microbiome species that are currently known and sequenced. Importantly, despite recent culturing efforts, more than 70% of these species still lack any cultured representative. This means that thousands of human gut microbiome species have yet to be grown under laboratory conditions, so their biological roles cannot yet be tested experimentally. In the interim, MAGs offer an alternative way to obtain useful functional and taxonomic information about these uncharacterized species.
Nevertheless, there is an ongoing debate about the quality and reliability of MAGs. MAGs often contain contaminating sequences incorrectly binned from other, related organisms, and they may represent highly incomplete or fragmented genomes. However, we found that, in a common set of samples, more than 90% of the species identified in each of the three original MAG studies were detected in at least one of the other two MAG datasets. These results were very encouraging: even though they used different strategies for assembly, binning and MAG refinement, the three teams independently obtained a consistent set of microbiome species. Depending on the diversity of the samples they are derived from, MAGs may represent population genomes rather than individual strains, so these results suggest that the different analysis approaches captured very similar species populations. This does not mean that MAGs are perfect, nor that they should be considered a substitute for pure isolate genomes. Still, future improvements in sequencing and analysis methods will make it easier to detect and remove contaminating sequences, and will mitigate some of the general issues associated with MAGs. For instance, long-read sequencing technologies have recently been shown to enable the recovery of single-contig bacterial chromosomes directly from metagenomic datasets. We therefore expect that, in the coming years, a combination of improved experimental techniques and metagenome assembly methods will further enhance our ability to uncover the hidden diversity of many microbial environments.
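To make the overlap figure more concrete, here is a minimal Python sketch of how such a cross-study comparison could be computed. It assumes each study's MAGs have already been assigned to species clusters (for instance by an ANI-based clustering step, which this sketch does not perform), so each study reduces to a set of species labels. The function name `cross_study_overlap` and the study and species labels below are hypothetical illustrations, not the actual pipeline or data behind the UHGG analysis.

```python
from typing import Dict, Set


def cross_study_overlap(species_by_study: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each study, return the fraction of its species that were also
    detected in at least one of the other studies.

    Assumes species labels are already harmonized across studies
    (hypothetical input; clustering is not performed here).
    """
    result: Dict[str, float] = {}
    for study, species in species_by_study.items():
        # Union of all species reported by every *other* study
        others = set().union(
            *(s for name, s in species_by_study.items() if name != study)
        )
        if not species:
            # Avoid division by zero for a study with no species
            result[study] = 0.0
            continue
        result[study] = len(species & others) / len(species)
    return result


if __name__ == "__main__":
    # Hypothetical toy data: three studies, species labels already harmonized
    studies = {
        "A": {"sp1", "sp2", "sp3", "sp4"},
        "B": {"sp1", "sp2", "sp3", "sp5"},
        "C": {"sp2", "sp4", "sp5", "sp6"},
    }
    print(cross_study_overlap(studies))  # {'A': 1.0, 'B': 1.0, 'C': 0.75}
```

In this toy example, only `sp6` is unique to study C, so 75% of C's species are recovered elsewhere. Using "at least one other study" as the criterion (rather than all of them) mirrors the comparison described above and is tolerant of each pipeline missing some species that the others found.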
For the human microbiome field, this resource offers an unprecedented genome map of the currently known species of the human gut microbiome. Alongside the genome and protein catalogs, we also generated individual pan-genomes for each of these species, providing new insights into the strain diversity and population dynamics of the human gut microbiome. We have made this collection available through MGnify, where it can be accessed either interactively through our website or programmatically through our FTP site and API. This comprehensive, freely available resource will open new and exciting opportunities to improve our understanding of the role of the microbiome in health and disease.