Dude, where's my spacer?
Of accidental introductions and integrations: the story behind Neo-CRISPR Genesis
A link to the paper published in Nature Microbiology is here: http://go.nature.com/2DLWaDv
I joined George Church’s lab in late 2014 as a postdoc with a few different project ideas floating around in my head. When I discussed them with George, most were met with a lukewarm reception, polite but not enthusiastic – not a good sign. I moped around for a bit trying to convince myself George must not know a good idea when he hears one (yeah right…). Not long after that, I came across a few papers about CRISPR adaptation buried in a stack of random junk other people in the lab had printed out and left behind on a communal desk. This made me quite excited. See, since early in grad school I had been fascinated by, and closely following, the outstanding questions surrounding the adaptation phase of CRISPR-Cas immunity. At that time, very little was known about the immunization process relative to how much was understood about the interference stage of CRISPR-Cas defense (e.g., Cas9). I quickly tracked down the fellow postdoc who had left the papers there (Seth Shipman). We chatted about some stuff he was working on and a few new ideas as well. So long story short, we teamed up and soon took the world by storm with the concept of ‘molecular recordings’ in CRISPR arrays. Yada yada, a spacer state space larger than the number of atoms in the observable universe and a few movie frames inside E. coli later, and now we’re just waiting for the Oscars to call.
But now to this paper. One of the things that frustrated us from the very beginning of our experiments with Cas1-Cas2 in E. coli was the relatively low rate at which spacer integration occurred. As we were trying to build the system into a molecular recorder, we wanted the efficiency to be as high as possible to enable single-cell recordings. However, in a typical experiment, only ~10-20% of cells would undergo array expansion. We had lots of pet theories about why this might be the case, some of which we could test (and rule out), and some of which we couldn’t. One theory I particularly liked was the idea that the Cas1-Cas2 proteins were putting spacers into the genome at a much higher frequency than we were measuring, but that we just weren’t detecting them because they were going into random sites outside of the CRISPR array (i.e., off-target). This concept is of course analogous to Cas9 off-target activity, which is a very well explored phenomenon. However, no one had yet elucidated the genome-wide specificity of Cas1-Cas2. Biology is never perfect (that’s fundamental to evolution), so I knew that Cas1-Cas2 had to be integrating spacers into the genome outside the CRISPR array at some frequency. The question, then, was how best to detect them?
We had recently come up with a method of coaxing E. coli to integrate synthetic spacers by electroporating the cells with short duplexed oligos. It occurred to me that if these oligos were getting integrated into the E. coli genome off-target, they would be relatively easy to find using deep whole-genome sequencing. So that’s what I did. And, a bit to my surprise, we did actually find a few sequencing reads that we could clearly tell were off-target spacer integrations. But we wanted more! However, deep sequencing is kind of expensive, so we came up with a way of enriching for off-target integration events (Spacer-Seq), and ended up identifying nearly 700 unique off-target integration sites throughout the genome. This made us wonder whether there was any evidence for off-target integrations in any native genomic contexts. And if so, perhaps this was a mechanism of forming new CRISPR arrays. Lo and behold, after combing CRISPR databases and some papers detailing CRISPR systems of diverse microbes, we found a couple of instances of CRISPR arrays that may have actually started as random off-target integration events. We call the phenomenon “Neo-CRISPR Genesis.”
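For the computationally curious, here is roughly what "looking for the oligo in the reads" could look like. This is only a toy sketch, not the pipeline from the paper: the function names, the `array_repeat` check, and all of the sequences below are made up for illustration. The idea is simply that the synthetic spacer has a known sequence, so any read containing it (in either orientation) is a hit, and whatever flanks it tells you whether it landed next to a CRISPR repeat or somewhere else in the genome.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spacer_reads(reads, spacer, array_repeat):
    """Sort reads containing the known spacer into on-target and
    candidate off-target groups.

    Assumption for this sketch: a read is "on-target" if the spacer sits
    directly next to the (hypothetical) CRISPR repeat sequence; anything
    else is only a *candidate* off-target integration.
    """
    on_target, off_target = [], []
    for read in reads:
        # Check both orientations, since a read can come from either strand.
        for oriented in (read, reverse_complement(read)):
            pos = oriented.find(spacer)
            if pos == -1:
                continue
            upstream = oriented[:pos]
            downstream = oriented[pos + len(spacer):]
            if upstream.endswith(array_repeat) or downstream.startswith(array_repeat):
                on_target.append(read)
            else:
                # Keep the flanks so they can be mapped back to the genome.
                off_target.append((read, upstream, downstream))
            break  # do not count the same read twice
    return on_target, off_target


# Toy data: none of these are real E. coli sequences.
spacer = "GATTACAGATTACA"
repeat = "CCGGTT"
reads = [
    "AAACCGGTT" + spacer + "CCGGTTAAA",                  # next to a repeat
    "TTTGGGAAA" + spacer + "TTTAGGCAT",                  # somewhere else
    "ACGTACGTACGTACGTACGT",                              # no spacer at all
    reverse_complement("TTGACG" + spacer + "GCATGC"),    # opposite strand
]

on_target, off_target = find_spacer_reads(reads, spacer, repeat)
print(len(on_target), "on-target;", len(off_target), "candidate off-target")
```

In real data you would also have to tolerate sequencing errors and partial spacers at read ends, and then align the flanking sequences to the genome to get actual integration coordinates. Exact string matching is just the simplest way to show the logic.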
So, a pretty cool story I think, but in the end it still didn’t explain the low efficiency issue we were trying to address, as the on-target integration rate was still much higher than the off-target rates we found. I guess we’ll leave that to another experiment, another day. I need to go get fitted for a tux.