Seeing metagenomic data in 3D
Identifying antibiotic resistance genes in metagenomic data is challenging because these genes may differ substantially from the known genes in databases. Here, we propose a method based on protein structure.
From the French countryside to the intestinal rainforest
In late 2013, I started my post-doctoral fellowship at the MetaGenoPolis unit on the INRA campus at Jouy-en-Josas, a small countryside village 15 km southwest of Paris. I was funded under the EvoTAR ("evolution and transfer of antibiotic resistance") FP7 program of the European Union. Among my objectives was to identify antibiotic resistance genes (ARGs) in a 3.9-million-gene catalogue built from the metagenomic sequencing of 396 European individuals [1]. I had been working for some years on antibiotic resistance in intestinal pathogens such as Enterobacteriaceae, and to say that I was excited about what I might find would be an understatement. Some ARGs acquired by bacterial pathogens had spread very successfully over the previous two decades, yet for most of them the original host was unknown. I arrived at MetaGenoPolis like a butterfly hunter setting foot in the Amazonian rainforest for the first time.
Where are those ARGs???
After a short introduction to bioinformatics and some preliminary tests, I was dismayed by the results: did this intestinal microbiota really contain only a few dozen known ARGs? And clearly not the trendiest ones. Moreover, three papers examining ARGs in similar catalogues were published around that time [2–4], and a careful reading led to the same conclusion: the intestinal microbiota harbours only a subset of known ARGs. Yet a couple of functional metagenomic studies [5,6] had shown that other ARGs were present, and that they differed considerably from those in the databases. I was facing a dilemma with two options. I could relax the thresholds of my search tools to recover distantly related ARGs, but this would certainly bring many false positives. Conversely, conservative thresholds would yield as few ARGs as the other studies had found.
Meeting the right fellows helps
I was lucky to meet two talented colleagues at MetaGenoPolis: Amine Ghozlane, a bioinformatician, and Julien Tap, a biologist with extensive experience in microbiota analysis. During a previous post-doctoral fellowship, Amine had worked on homology modelling, that is, predicting the three-dimensional structure of a protein from its amino acid sequence and a reference structure previously determined by crystallography. At some point during our discussions, we wondered whether we could compare proteins in 3D rather than relying on methods based on linear nucleotide or amino acid sequences (such as BLAST or methods based on hidden Markov models). Indeed, we found that two proteins from the same family (e.g. two beta-lactamases) could have very different amino acid sequences while sharing superimposable structures. This 3D path was therefore worth trying.
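For readers wondering how "superimposable" can be made precise: one common measure (given here only as an illustration; this post does not state that PCM uses this exact criterion) is the root-mean-square deviation between the N pairs of aligned residue positions, typically Cα atoms, of two structures after the best rigid-body superposition. The symbols X, Y, R, t and N below are notation introduced for this sketch.

$$
\mathrm{RMSD}(X, Y) = \min_{R \in \mathrm{SO}(3),\; \mathbf{t} \in \mathbb{R}^{3}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
$$

Here x_i and y_i are the coordinates of the i-th pair of aligned residues. For any rotation R, the optimal translation brings the two centroids together, and the optimal rotation can be computed in closed form with the Kabsch algorithm. Two proteins with low sequence identity can still give a small RMSD, which is the intuition behind comparing proteins in 3D rather than as flat sequences.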
Applying high throughput homology modelling to metagenomic data
It took quite some time to develop a method that identifies proteins sharing a similar structure at a decent throughput, but it eventually worked out, and we named it PCM, for pairwise comparative modelling. Applying PCM to the 3.9-million-protein catalogue (proteins now, rather than nucleic acid sequences) required solving many problems, one of which was finding computational resources that were not available locally. Predicting the position of every atom in each protein is no simple computational task, but with the generous help of several bioinformatics institutes across southern and western France, we eventually predicted the presence of 6,095 ARGs in the human gut, with each individual carrying an average of 1,300 ARGs. The rest of the story can be read in the paper. We would also like to stress that PCM is not limited to finding antibiotic resistance determinants: it is a generic method for identifying proteins that may be only distantly related to known proteins. All the code and the workflow have been made available, and we now hope that PCM will find its way into other researchers' hands.
1. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
2. Forslund, K. et al. Country-specific antibiotic use practices impact the human gut resistome. Genome Res. 23, 1163–1169 (2013).
3. Hu, Y. et al. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nat. Commun. 4, 2151 (2013).
4. Ghosh, T. S., Gupta, S. S., Nair, G. B. & Mande, S. S. In silico analysis of antibiotic resistance genes in the gut microflora of individuals from diverse geographies and age-groups. PLoS One 8, e83823 (2013).
5. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009).
6. Moore, A. M. et al. Pediatric fecal microbiota harbor diverse and novel antibiotic resistance genes. PLoS One 8, e78822 (2013).