Genomic analysis of bacterial communities can be tricky, particularly in environments with many uncharacterized species. Let’s say you are interested in a specific species because you repeatedly find its 16S rRNA sequence in your amplicon datasets, and you want to learn more about its capabilities and its role in the environment. But this species somehow does not like the culturing medium you are using or the conditions you try to grow it in. So you decide to take a culture-independent approach and sequence a metagenome. You get some genome bins, but you are unsure whether they represent the species you are interested in, because they lack the 16S rRNA gene sequence that would link them to your amplicon data.
This was the situation we found ourselves in when we began our project. Our research group studies the succession of bacteria thriving during and after phytoplankton blooms in the North Sea. We make extensive use of fluorescence in situ hybridization (FISH), as it allows us to determine the relative abundances of bacterial clades based on their 16S rRNA sequences. Using FISH, we recurrently detected a flavobacterial clade named “Vis6”, which resisted laboratory cultivation. With the 16S rRNA-based probe at hand and a flow cytometer with cell sorter down the corridor, we thought about combining these two methods into a targeted genomics approach. The pipeline we had in mind introduces a 16S rRNA-based FISH signal into the cells of a species of interest within a bacterial community. These cells are then sorted based on that FISH signal by fluorescence-activated cell sorting (FACS), resulting in an enrichment of very limited diversity (ideally at the species level) for subsequent genome sequencing. Genome annotation then provides clues about the capabilities and functions of the targeted species in the given environment.
So far, so good, but we were facing two major challenges: (1) a very bright FISH signal was needed to detect and sort the targeted cells by flow cytometry, and (2) sufficient unimpaired DNA was required for high-quality genome sequencing. Our lab at the Max Planck Institute for Marine Microbiology had the expertise in FISH methods, and we teamed up with DOE’s Joint Genome Institute (JGI), which brought the sequencing expertise, in particular for sequencing from little starting material. Our work involved a lot of troubleshooting with laboratory cultures. By testing and validating the most suitable cell fixation approach and assessing the optimal number of sorted cells, we eventually developed a FISH&FACS protocol that provided high FISH signal intensities without impairing sequencing quality.
With this optimized pipeline at hand, we sampled seawater from the North Sea and hybridized it with our Vis6-specific FISH probe. We sequenced the sorted cell enrichments, and the resulting mini-metagenomes of reduced diversity yielded good-quality metagenome-assembled genomes (MAGs). Because the MAGs encoded 16S rRNA gene sequences, and with further support from the FISH probes used, we were able to assign a taxonomic identity to them. Gene annotations of these MAGs revealed that Vis6 is a putative polysaccharide and protein degrader.
We see the main application of our pipeline in filling a gap that metagenomics and single-cell genomics leave in the field of culture-independent species descriptions, which makes it a valuable addition to the toolkit of cultivation-independent genomics. Metagenomics is challenged by highly complex samples, and the resulting bins often lack the 16S rRNA gene as a taxonomic marker. Single-cell genomes most often do have this taxonomic link, but they are generally hampered by low genome completeness. Both methods retrieve highly abundant species more often than species of lower abundance. With a targeted genomic approach, such as the FISH&FACS pipeline we describe in our paper, specific groups or species can be genomically interrogated, opening a window into the rare biosphere, even in highly complex samples.