An advantage of working so long in one place is that one can observe the comings and goings of the lab through several turnovers of graduate students, postdocs, and their accompanying projects. For as long as I can remember, there seem to have been one or more glycopeptide antibiotic (GPA) projects going on. These usually involved enzymatic modification to make novel GPAs, characterization of GPA biosynthetic gene clusters (BGCs), or development of strategies for finding novel antibiotics using GPAs as a model. For other projects, we were slowly sequencing strains of interest in our soil isolate collection, while the number of genome sequences in public databases kept growing.
Some of our soil strains had even been isolated by selection with the GPA vancomycin. We had previously shown that selecting for GPA-resistant isolates enriches for strains capable of producing GPAs, reasoning that self-resistance is a prerequisite for producing an antibiotic. That same project also used GPA fingerprint primers to amplify genes characteristic of GPA precursor biosynthesis and tailoring (HpgT for HPG and DPG biosynthesis, P450 monooxygenases for oxidative crosslinking, and HalI for halogenation). We ran a phylogenetic analysis on the concatenated alignments of these sequences to smooth over the noisy signal from each individual gene. The resulting tree was, in a sense, representative of the GPA BGCs sharing these sequences: a description of the evolution of this family of BGCs. It pointed us to the BGCs most unlike those of known GPA producers and led us to discover pekiskomycin¹. The longest branch in that phylogeny, however, did not belong to the two pekiskomycin strains.
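To make the concatenation step concrete, here is a minimal sketch, not the pipeline we actually used. It assumes each per-gene alignment is a dictionary mapping strain name to an aligned sequence of equal length. A strain that lacks a gene gets gap characters for that block, which is one common convention among several. Because each alignment is keyed by strain, this representation can hold only one copy of each gene per strain.

```python
from typing import Dict, List

def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix row per strain.

    Assumption: each alignment maps strain -> aligned sequence, all rows of
    one alignment having equal length. Strains missing a gene are padded
    with '-' for that gene's block.
    """
    strains = sorted(set().union(*(aln.keys() for aln in alignments)))
    rows: Dict[str, List[str]] = {s: [] for s in strains}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one alignment must be the same length")
        width = widths.pop()
        for s in strains:
            # A missing gene (e.g. an absent P450) becomes an all-gap block.
            rows[s].append(aln.get(s, "-" * width))
    return {s: "".join(blocks) for s, blocks in rows.items()}
```

For example, `concatenate_alignments([{"A": "MKV-", "B": "MRVL"}, {"A": "GG", "C": "GA"}])` gives `{"A": "MKV-GG", "B": "MRVL--", "C": "----GA"}`. The strain names and sequences here are toy values.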
That branch belonged to Streptomyces sp. WAC 01438. This strain was in fact missing one of the two fingerprint P450 sequences, and the other P450 we amplified from it was very dissimilar to those of the other GPA BGCs, which contributed to its long branch in the BGC tree. I realized that if we wanted to recognize antibiotics different from those we already knew, we would need to include more strains that didn't perfectly fit the GPA mold. I also considered the worst case, in which we included sequences totally dissimilar to those of GPA BGCs, and how this approach could fail or lead us astray.
The known GPA BGCs vary in their gene content, and this variation is ultimately reflected in the diversity of the molecules they produce. I used the fingerprint sequences to identify a total of 71 putative GPA BGCs. I concluded that if we wanted to understand the evolution of these molecules phylogenetically, we could not simply concatenate conserved fingerprint sequences, for two reasons: 1) not every BGC contains every biosynthetic element, and 2) some BGCs contain multiple copies of these sequences, and it is not clear which copies ought to be aligned with each other. Each family of aligned sequences has its own evolutionary story, so I reasoned that the history of GPA BGC evolution was contained in the entire set of these trees.
This scenario of BGC genes evolving within a set of species reminded me of parasites evolving within their host lineages. Phylogenetic reconciliation was conceived to model exactly this. More recently it has been applied to study how duplications, transfers, and losses of domains change domain composition during eukaryotic protein evolution. Reconciliation takes a reference phylogeny, usually the species tree, and models how the parasite (or gene) phylogenies evolve with respect to it. To keep the model of gene or domain transfer tractable, a time-consistency constraint is imposed on transfers. If a dated phylogeny is provided, transfers are allowed only between contemporaneous branches. This rules out an exponential number of hypothetical transfer events between species branches that never coexisted.
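The time-consistency rule can be illustrated with a small sketch, assuming each branch of the dated species tree is described by the ages of its upper (parent) and lower (child) ends, measured as time before present. Two branches can exchange a gene only if those age intervals overlap. The `Branch` type and function names are hypothetical and are not taken from any reconciliation program.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Branch:
    name: str
    parent_age: float  # age of the branch's older (upper) end, time before present
    child_age: float   # age of the branch's younger (lower) end; 0.0 for extant leaves

def contemporaneous(a: Branch, b: Branch) -> bool:
    """True if the two branches' time intervals overlap by more than a single point."""
    return max(a.child_age, b.child_age) < min(a.parent_age, b.parent_age)

def allowed_transfers(branches: List[Branch]) -> List[Tuple[str, str]]:
    """All (donor, recipient) pairs permitted under the time-consistency constraint."""
    return [
        (d.name, r.name)
        for d in branches
        for r in branches
        if d.name != r.name and contemporaneous(d, r)
    ]

# Toy dated tree: root at age 100 splits into X and Y; X splits at age 40 into X1 and X2.
X = Branch("X", 100.0, 40.0)
Y = Branch("Y", 100.0, 0.0)
X1 = Branch("X1", 40.0, 0.0)
X2 = Branch("X2", 40.0, 0.0)
```

In this toy tree a transfer from X to Y is allowed, but a transfer from X to X1 is not, because X1 only begins once X ends. Reconciliation software that works with dated trees often discretizes time into slices instead of comparing intervals pairwise, but the constraint it enforces is the same idea.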
I reasoned that I might be able to date specific events in GPA evolution by providing a fully dated bacterial species tree and looking for events that pin the nodes of each BGC gene/domain family tree to these dated nodes. In this way the bacterial species tree represents both time (when during bacterial evolution the events labelled on the gene trees happened) and space (in which lineage they happened). Fortunately, the signal in the gene tree topologies is fairly consistent. It points to a small number of well-supported species nodes for the origin of the GPA BGCs, specifically those with the traditional mechanism of action of these antibiotics, which we have called the true GPAs.
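One simple way to picture how consistent gene-tree signal points to a few species nodes is to tally where each family's origin lands. This sketch assumes a reconciliation has already assigned each gene/domain family's origin to one branch of the dated species tree. That branch's age interval then bounds when the origin happened (time), and the branch itself says in which lineage it happened (space). The function, the family and branch names, and the ages in the example are all hypothetical toy values, not results.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (older age, younger age) of a species-tree branch

def summarize_origins(
    origin_by_family: Dict[str, str],
    branch_interval: Dict[str, Interval],
) -> List[Tuple[str, int, Interval]]:
    """Rank species branches by how many families place their origin there.

    origin_by_family: family name -> species branch assigned as its origin.
    branch_interval: species branch -> its age interval on the dated tree.
    Returns (branch, number of families, interval), most-supported first.
    """
    counts = Counter(origin_by_family.values())
    return [(branch, n, branch_interval[branch]) for branch, n in counts.most_common()]
```

With toy input `{"fam1": "n12", "fam2": "n12", "fam3": "n12", "fam4": "n7"}` and intervals `{"n12": (10.0, 8.0), "n7": (12.0, 10.0)}`, the function returns `[("n12", 3, (10.0, 8.0)), ("n7", 1, (12.0, 10.0))]`. Here branch `n12` is the best-supported origin.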
A more practical upshot of this work is that we are now in a better position to define which sets of BGCs are distinct from, yet sister to, the GPA BGCs. These BGCs have distinct origins in both time and space within this species phylogeny. They are nevertheless composed of the same elements as the GPAs, which makes them fertile ground for bioprospecting.
1. Thaker, M. N. et al. Identifying producers of antibacterial compounds by screening for antibiotic resistance. Nat Biotechnol 31, 922-927, doi:10.1038/nbt.2685 (2013).