The most abundant human-associated virus no longer an orphan

Over the last few years, microbiology in general and virology in particular have entered a new era, the age of metagenomics.

Go to the profile of Eugene Koonin
Nov 13, 2017

The paper in Nature Microbiology is here:    

It has been a very quiet revolution but a revolution nonetheless. Metagenomics foregoes the customary step of growing a microbe or a virus in a lab culture in favor of direct sequencing of the totality of the DNA or RNA from a given environment, sometimes after fractionation, e.g., into bacterial and viral fractions. This brute force approach immensely expands the scope of biological diversity that is accessible to researchers because the genomes of organisms that refuse to grow in the lab are now within reach. The impact of metagenomics is not simply quantitative: over and over again, metagenomics yields discoveries with major biological implications and, importantly, defines new experimental directions. Metagenomics has already become the primary route to new virus discovery, and in a striking acceptance of the sea change, the International Committee on Taxonomy of Viruses has recently adopted new rules that allow the formal recognition of a new virus species or a higher taxon from metagenomic sequence analysis alone. It looks like this decision all but “officially” ushers in the new era.

The discovery of crAssphage (so named after cross-Assembly) is arguably one of the most striking feats of metagenomics to date. The nearly 100 kilobase genome of this phage was assembled from multiple human gut microbiomes (hence cross-assembly) in the laboratory of Rob Edwards at San Diego State University. The paper, published in Nature Communications in 2014, reported that the crAssphage was the most abundant virus associated with humans – represented in a substantial majority of the individual gut microbiomes, and astonishingly, accounting for up to 90% of the DNA reads in the virus-enriched fraction in many of them. This publication was a true sensation and a shock at the same time. Indeed, it showed that, before metagenomics came of age, we had been completely blind to one of the key components of our microbiomes. In retrospect, the principal reason seems clear: the likely hosts of crAssphage are bacteria of the phylum Bacteroidetes, major players in the microbiome that also defy attempts at culturing, but were pinned down by metagenomic approaches a few years before the crAssphage. Thus, both some of the most common microbial denizens of our intestines and their equally ubiquitous viruses comprise dark matter on which at this time only metagenomics can shed light.

As if the discovery of the most abundant human-associated virus was not enough, the crAssphage brought about another shocking surprise. The genome of the crAssphage effectively “looked like nothing in the world”. For most of the genes, no homologs were detectable, and even when some were identified, they provided few clues to the biology of the phage (one protein implicated in virus-host interaction and pointing to Bacteroidetes as the likely host was an exception). No links to other phages could be established, and even the structural proteins of the crAssphage particle remained elusive.

When the genome of a potentially interesting and important organism comes out uninformative, experimenters are justifiably exasperated, but specialists in computational genomics see an opportunity. For a few days after the first crAssphage publication, Mart Krupovic, who was visiting my lab at the time, and I peered into crAssphage proteins using all the analysis tools available to us. The result of the effort, though, was sheer disappointment. We detected a few additional homologies, but these shed little light on the phage's evolutionary relationships or reproduction strategy. With so many other genomes to deal with, the crAssphage was set aside – for 3 years as it turned out.

The new chapter in the crAssphage saga started with Rob Edwards’ seminar at the NCBI in April 2017, which he was invited to present by Anca Segall, at the time a sabbatical visitor in my lab. After listening to Rob’s exciting talk, the temptation to go back to the genome was irresistible, and this time, the stars aligned right! Within a day, Natalya Yutin identified the capsid protein of the crAssphage, and after about 8 weeks of intense computational analysis, we had fairly complete genomic maps for an expansive group of crAssphage-related bacteriophages. The paper describing these findings, “Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut” by Natalya Yutin, Kira S. Makarova, Ayal B. Gussow, Mart Krupovic, Anca Segall, Robert A. Edwards, and Eugene V. Koonin, was submitted to Nature Microbiology without delay and is coming out today.


A few of the crAss-like phages have been studied by traditional methods of virology but, in a now familiar pattern, the great majority come from various metagenomes, both human-associated and environmental. For all these phages, we predicted, with good confidence, the main structural proteins as well as those involved in replication and expression. Examination of the proteins implicated in tail formation indicates that crAssphage and its relatives should be included in the family Podoviridae, phages with short, stubby tails. However, an overhaul of the phage taxonomy is imminent, and we expect that the crAss-like phages will become a new family because they are so different from all other known phages. We hope and believe that the information contained in this paper provides a roadmap for experimental study of these undoubtedly important viruses in many labs.

Apart from the immediate importance of crAss-like phages, there is a general lesson here. Along with other recent studies, this work shows that nowadays, thanks to the explosive growth of metagenomic databases, when a new virus is discovered, chances are that it will become the prototype of a new family and that analysis of its gene sequences will predict interesting and tractable new biology. However, to take advantage of the metagenomic treasure trove, creative use of the most powerful available sequence analysis tools is essential. Simply put, you have a good chance to see wonders if you know where and how to look.



Eugene Koonin

Senior Investigator, NIH
