The viruses that were supposed to infect us but perhaps didn’t know it
Smacoviruses are found in animal faeces and are related to other minimalist eukaryotic viruses, but their hosts had never been identified. In the related paper, however, we report CRISPR spacers in a faecal archaeon that presumably come from human-associated smacoviruses. Consequently, smacoviruses could be the smallest known viruses to infect a prokaryotic cell.
Nowadays, it is hard to find a microbiologist who doesn't know what Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are. CRISPR arrays consist of short repeats separated by spacers. The spacers come from precursor sequences (protospacers) in invading genetic elements and are usually involved in the recognition stage of an immune response. When I started working with CRISPR, that name didn't even exist. For some, it was hard to believe that spacers came from invading genetic elements. And for most, it was unthinkable that, as we were proposing, they were part of a prokaryotic immune system. Perhaps I was too trusting and naïve. Despite having independently come up with the idea of CRISPR immunity, I went through a horrible experience that cost me most of the credit for my work and ideas.
After I finally defended my thesis, I decided to keep honing my CRISPR code. Two of the aspects I found interesting were predicting viral hosts and increasing search sensitivity to find the unknown origins of spacers. While working in the group of Prof. Francisco Rodriguez Valera (Universidad Miguel Hernandez, Spain), I met Dr. Felipe Hernandes Coutinho, who uses different in silico approaches, including CRISPR spacers, to assign hosts to prokaryotic viruses. I let him use my program to find CRISPR arrays so that he could extract their spacers. In return, he gave me access to his collection of spacers and viral sequences, as well as the metadata specifying the taxonomy of the viral hosts. While assessing the effects of different parameters on deducing the origin of spacers, I was surprised by the percentage of spacers that assigned a prokaryotic host to a eukaryotic virus. It was not that the proportion was especially high, but it seemed larger than would be expected by chance.
I then took a closer look at those cases, searching for unambiguous associations in which more than one spacer from the same organism matched sequences in the same virus. At first, there seemed to be many prokaryotic species with spacers matching eukaryotic viruses, mainly in Mycoplasma, which made a lot of sense. But it turned out that some spacers had repetitive sequences that produced many non-significant hits. I therefore filtered out low-complexity sequences. Most of the associations disappeared, but the relationship between smacoviruses and Candidatus Methanomassiliicoccus intestinalis remained.
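The post does not say which low-complexity filter was applied. Purely as an illustration, the sketch below flags spacers whose overlapping k-mer composition has low Shannon entropy; the function names, k = 3 and the 3.0-bit cutoff are hypothetical choices, not the settings used in the paper. Established filters such as DUST implement more refined versions of the same idea.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer composition of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0  # sequences shorter than k carry no k-mer information
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in Counter(kmers).values())


def is_low_complexity(seq: str, k: int = 3, min_entropy: float = 3.0) -> bool:
    """Hypothetical rule: flag a spacer if its k-mer entropy is below min_entropy."""
    return kmer_entropy(seq, k) < min_entropy


def filter_spacers(spacers: dict[str, str], k: int = 3, min_entropy: float = 3.0) -> dict[str, str]:
    """Keep only spacers that pass the low-complexity test."""
    return {name: s for name, s in spacers.items()
            if not is_low_complexity(s, k, min_entropy)}
```

A perfect "AT" repeat scores exactly 1 bit (only the 3-mers ATA and TAT, in equal numbers), whereas a 30-nt spacer with almost no repeated 3-mers approaches log2(28), about 4.8 bits, so only the former would be discarded before searching for matches.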
The more I looked into the relationship, the more surprised I was. At first it involved a single virus, and that virus had humans listed as its host. The virus had been sequenced from the faeces of children with unexplained gastroenteritis, which hinted at a possible role in pathogenesis. However, infection of human cells had never been observed. Initially I detected four matching spacers, but by examining related smacoviruses I was able to find more than 20. That number made it possible to recognize a Protospacer Adjacent Motif (PAM), the short conserved sequence next to protospacers that many CRISPR systems require to recognize their targets, which supported the matches.
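Recognizing a PAM essentially means lining up the sequences that flank each protospacer and looking for conserved positions. The sketch below is a minimal illustration under stated assumptions: protospacer coordinates (0-based, end-exclusive, with strand) are already known from the spacer matches, genomes are treated as linear strings (smacovirus genomes are circular, so matches near the ends would need wrap-around handling), and the helper names and the 75% consensus cutoff are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flanks(genome: str, start: int, end: int, strand: str, width: int = 5) -> tuple[str, str]:
    """Return (upstream, downstream) flanks of the protospacer genome[start:end],
    both read in the same orientation as the protospacer itself."""
    if strand == "+":
        return genome[max(0, start - width):start], genome[end:end + width]
    # Minus strand: the 5' flank lies after `end` on the given strand, so both
    # flanks are taken from the opposite sides and reverse-complemented.
    return revcomp(genome[end:end + width]), revcomp(genome[max(0, start - width):start])


def consensus(seqs: list[str], min_freq: float = 0.75) -> str:
    """Per-position consensus of equal-length flanks; weakly conserved positions become 'N'."""
    width = len(seqs[0])
    same = [s for s in seqs if len(s) == width]  # drop flanks truncated at genome ends
    out = []
    for i in range(width):
        base, n = Counter(s[i] for s in same).most_common(1)[0]
        out.append(base if n / len(same) >= min_freq else "N")
    return "".join(out)
```

Upstream and downstream flanks would be summarized separately. A motif that appears consistently on one side of many independent protospacers is hard to explain by chance, which is why a larger number of matches makes a PAM easier to recognize.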
I must admit that the term smacovirus was completely new to me. It turned out that Smacoviridae is a family of uncultured viruses belonging to a larger group known as CRESS-DNA (Circular Rep-Encoding Single-Stranded DNA) viruses. CRESS-DNA viruses make up a significant part of single-stranded DNA (ssDNA) viruses and share a homologous rolling-circle replication initiator protein (Rep). All known hosts of CRESS-DNA viruses are eukaryotes. CRESS-DNA viruses include several families, such as Geminiviridae and Circoviridae, which contain well-known pathogens of plants and animals, respectively. If you want to know more, there is a good recent review (ref. 1) of “eukaryotic” CRESS DNA viruses, including smacoviruses.
For a biological entity to be considered a virus, it must at least be able to replicate and to build a capsid to package its genome. Indeed, some eukaryotic viruses, including many CRESS-DNA viruses, have such a minimal structure. Smacoviruses encode only these minimal components of a virus: the replication protein and a capsid protein.
Viruses that infect prokaryotes usually need specific machinery to enter and exit the host cell, and they normally encode additional sets of genes for other purposes, including host recognition. That is why it is so surprising that a virus with only two genes could get inside an archaeon.
The link would be even more convincing if some sequence properties of the host, unrelated to CRISPR, could also be found in the viruses, since that could imply an adaptation of smacoviruses to the predicted host. One peculiarity of Ca. M. intestinalis and some other methanogens is the use of the amber stop codon (“UAG”) to encode the amino acid pyrrolysine. As a result, UAG is used less often to terminate translation. Could that property also be found in smacoviruses? I looked at the proportion of each termination codon in a collection of smacoviral coding sequences and, indeed, “UAG” is similarly under-represented.
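The stop-codon comparison is easy to reproduce in outline. The sketch assumes each coding sequence is a DNA string in 5' to 3' orientation that ends with its stop codon (so UAG appears as TAG); sequences whose length is not a multiple of three are skipped, and the function name is hypothetical. Interpreting the result still requires a baseline, for instance the same count in coding sequences from other CRESS-DNA viruses.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")  # DNA equivalents of UAA, UAG, UGA


def stop_codon_usage(cds_list: list[str]) -> dict[str, float]:
    """Fraction of each stop codon among the terminal codons of complete CDSs."""
    counts: Counter = Counter()
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) < 3 or len(cds) % 3:
            continue  # skip partial or out-of-frame sequences
        last = cds[-3:]
        if last in STOP_CODONS:
            counts[last] += 1
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0) for codon in STOP_CODONS}
```

For example, four complete CDSs ending in TAA, TGA, TAA and TAG give 0.5 for TAA and 0.25 each for TAG and TGA; a TAG fraction well below that of a comparable reference set is the kind of signal described above.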
It was already known that a related methanogen, Candidatus Methanomethylophilus alvus, has a possible, although imperfect, match to a smacovirus. When it was too late to include it in the paper, I also found another match to a smacovirus in an assembly contig of another genome from the same genus (Candidatus Methanomethylophilus sp. UBA78). It is therefore possible that the relationship between Smacoviridae and their archaeal hosts extends beyond Ca. M. intestinalis to other species of the order Methanomassiliicoccales.
If you find this story interesting and want to know more details, I invite you to read the related paper.
1. Zhao, L., Rosario, K., Breitbart, M. & Duffy, S. Eukaryotic Circular Rep-Encoding Single-Stranded DNA (CRESS DNA) Viruses: Ubiquitous Viruses With Small Genomes and a Diverse Host Range. Adv. Virus Res. 103, 71–133 (2019).