The story started with my fascination with RNA molecules, which not only serve as intermediates that convert the genetic code from DNA to protein, but also carry out diverse cellular functions themselves. Knowledge of the sequence composition of an RNA transcript, from its 5' extremity to its 3' extremity, is a prerequisite for understanding its function. Next-generation RNA sequencing (RNA-seq) has revolutionized the way researchers conduct transcript profiling. However, although various flavors of RNA-seq have been developed to map RNA 5' ends or 3' ends, methods capable of simultaneously determining transcription start sites (TSS) and transcription termination sites (TTS) across a whole transcriptome are still a rarity. This deficiency is particularly notable for prokaryotic transcripts, which lack the poly(A) tails commonly exploited for analyzing eukaryotic RNA.
My laboratory at the Rockefeller University sought to fill this gap. Our strategy is simple: by converting linear RNA molecules into circles, we should be able to bring the two termini into the same sequencing read and therefore determine their nucleotide composition concurrently. It was easier said than done. After testing many reagents and experimental conditions, Xiangwu Ju, a postdoctoral fellow in the lab, finally found a protocol that faithfully recovers the 5' and 3' ends of RNA of varying lengths. We named our method SEnd-seq, standing for Simultaneous 5' and 3' End Sequencing, and first applied it to the model bacterium Escherichia coli. Satisfyingly, our method not only reproduced previously annotated TSS and TTS, but also unveiled a large number of novel sites, many of which we subsequently validated. The updated E. coli transcriptome structure is surprisingly complex: many genes adopt different start sites depending on the bacterial growth condition, and the choice of TSS often affects where transcription is terminated, implying crosstalk between the two ends.
Speaking of transcription termination, there are two well-known mechanisms that inactivate the RNA-making machinery, RNA polymerase (RNAP), in bacteria: one caused by secondary structures formed in the nascent RNA and the other mediated by the RNAP-dislodging motor protein Rho. SEnd-seq expands the catalogue of both types of terminators. Strikingly, our dataset also revealed another termination pattern that cannot be fully explained by either of these two mechanisms: two highly expressed transcripts emanating from oppositely oriented start sites often converge in the middle at a well-defined site, producing an overlapping, bidirectional TTS. This phenomenon can be found between a pair of convergent genes, or between a protein-coding gene and a non-coding RNA.
After ruling out a few alternative explanations, we posited that such bidirectional termination might be caused by physical conflicts between the converging RNAPs, akin to two trains knocked off the rails by a head-on collision. To gather evidence for this model, we performed in vitro transcription and in vivo genome editing experiments. When we eliminated transcription activity from one direction, RNAP from the other direction displayed a much higher tendency to run across the TTS, suggesting that convergent transcription is required for efficient bidirectional termination.
This kind of transcriptional interference has been reported before, but it has generally been regarded as a nuisance that represses the output of the affected genes. We show in the current work that such interference can be harnessed to prevent spurious RNAP readthrough and precisely shape transcript boundaries. Its impact is widespread in E. coli, affecting about one third of its protein-coding genes. In the future we plan to use SEnd-seq to study the transcriptomes of other organisms to see if this termination mechanism is conserved. The next time we investigate how a train is halted, we may need to look no further than a crowded track.