Author manuscript; available in PMC: 2025 Nov 29.
Published in final edited form as: Structure. 2025 Oct 9;33(12):2058–2070.e6. doi: 10.1016/j.str.2025.09.007

Structures reveal how the Cas1–2/3 integrase captures, delivers, and integrates foreign DNA into CRISPR loci

William S Henriques 1,4, Jarrett Bowman 1, Laina N Hall 1,5, Colin C Gauvin 2, Hui Wei 3, Huihui Kuang 3, Christina M Zimanyi 3, Edward T Eng 3, Andrew Santiago-Frangos 1,6, Blake Wiedenheft 1,7,*
PMCID: PMC12661925  NIHMSID: NIHMS2112186  PMID: 41072406

Abstract

SUMMARY

Cas1 and Cas2 are the hallmark proteins of prokaryotic adaptive immunity. However, these two proteins are often fused to other proteins, and the functional consequences of these fusions remain poorly understood. Here we purify and determine structures of Cas1 and the Cas2/3 fusion protein from Pseudomonas aeruginosa at distinct stages of CRISPR adaptation. Collectively, these structures reveal a prominent, positively charged channel on one face of the integration complex that captures short fragments of foreign DNA. Foreign DNA binding triggers conformational changes in Cas2/3 that expose new DNA binding surfaces necessary for homing the DNA-bound integrase to specific CRISPR loci. The length of the foreign DNA substrate determines whether Cas1–2/3 docks completely onto the CRISPR repeat to catalyze the two sequential transesterification reactions required for integration. Together, these structures clarify how the Cas1–2/3 proteins orchestrate foreign DNA capture, site-specific delivery, and integration of new DNA into the bacterial genome.

In brief

Bacteria defend themselves against viruses by storing pieces of viral DNA in their genomes. Henriques et al. use cryogenic electron microscopy to determine how the Cas1–2/3 integrase captures, transports, and inserts foreign DNA precisely into CRISPR arrays, providing new insight into the fundamental process of adaptive immunity in microbes.

Graphical Abstract

[Graphical abstract image not reproduced]

INTRODUCTION

Adaptive immunity requires genomic recombination to generate molecular memories of pathogens. In both V(D)J recombination in eukaryotes and CRISPR-Cas adaptive immunity in prokaryotes, virus-derived transposases have been independently domesticated to fight foreign invaders.1–6 However, the two mechanisms differ significantly. In jawed vertebrates, the RAG recombinase splices together genomic DNA segments (the V, D, and J segments), but these adaptive rearrangements are not heritable.1,7–9 In contrast, prokaryotic Cas integrases capture short fragments of foreign DNA (protospacers) and insert them into CRISPR loci, creating an updated and heritable record of circulating viral predators.10–13

Cas1 is a striking example of a selfish transposase gene repurposed for host defense.3,14 CRISPR-Cas systems, nearly all of which contain Cas1, are found in ~90% of archaea and ~40% of bacteria.3,14–16 Rather than mobilizing a selfish genetic element, Cas1 now protects the cellular host by supplying the CRISPR interference machinery with updated guide sequences derived from selfish genetic elements.17–21 During the adaptation stage of CRISPR-Cas adaptive immunity, Cas1 and Cas2 assemble into a heterohexameric complex that captures fragments of foreign DNA and integrates them into the CRISPR array.6,10,17,18,21–26 These fragments often contain a 2–5 bp protospacer adjacent motif (PAM) that distinguishes self from non-self DNA and directs integration into the CRISPR array in the correct orientation.27 In some CRISPR systems, dedicated Cas proteins recognize and trim the PAM before integration, while in other systems this function is performed by non-Cas host proteins such as DnaQ exonucleases.28–33 However, the mechanism of PAM trimming remains unknown in many CRISPR systems that lack Cas4, the canonical PAM-trimming protein.22,32,33

Cas3 is a nuclease/helicase found in all type I CRISPR systems.16,34 It generates foreign DNA fragments by degrading foreign DNA identified by the CRISPR RNA-guided surveillance complex (i.e., Cascade or the Csy complex).23,35–40 Cas3 functionally couples the interference and adaptation stages of CRISPR immunity by generating double-stranded DNA (dsDNA) fragments enriched in PAMs,22,34,41,42 which serve as substrates for integration.20,41,42 This link between generating (Cas3) and integrating (Cas1 and Cas2) fragments of foreign DNA is further highlighted in type I-F CRISPR-Cas systems, where Cas3 is genetically fused to the C-terminus of Cas2.16 Cas1 and Cas2/3 assemble into a 2-fold symmetric, propeller-shaped heterohexamer in which Cas1 represses the Cas2/3 nuclease activity through electrostatic interactions at a single interface.20,41 However, the Cas1–2/3 complex is anticipated to adopt distinct conformational states that support its diverse roles in target degradation, foreign DNA capture, site-specific DNA delivery, and foreign DNA integration.20,41,43

Here we determine cryo-electron microscopy (cryo-EM) structures of the type I-F Cas1–2/3 integration complex from Pseudomonas aeruginosa in multiple conformational states. These structures explain how Cas1–2/3 recruits short fragments of foreign DNA to a positively charged channel on one face of the complex. Foreign DNA binding to this face functions as an allosteric switch that triggers conformational rearrangements, exposing DNA binding sites previously occluded by Cas1 and Cas3. Cas3 sterically regulates the integrase but does not trim the PAM. Nonetheless, PAM trimming is an essential prerequisite for positioning foreign DNA in the Cas1 active site and completion of the second transesterification reaction. Taken together, these cryo-EM structures illustrate how the Cas1–2/3 complex orchestrates foreign DNA capture, site-specific delivery, and integration of new DNA into the bacterial genome.

RESULTS

Cas1–2/3 forms a DNA capture complex

To determine how the Cas1–2/3 integrase captures fragments of foreign DNA, we set out to determine the first high-resolution structure of a Cas1–2/3 complex in the absence of nucleic acid. Cas1 and Cas2/3 proteins from the PA14 strain of Pseudomonas aeruginosa were co-expressed in Escherichia coli BL21 cells. An amino-terminal Strep tag on Cas1 pulls down untagged Cas2/3 in a stable complex that elutes from a size exclusion column as a single peak with an estimated molecular weight of ~380 kDa, consistent with a heterohexameric assembly of four Cas1 subunits and two Cas2/3 subunits (Figures 1A and S1).20,41 Purified Cas1–2/3 was concentrated to 12 μM, applied to cryo-EM grids, blotted, and plunge-frozen in liquid ethane. Grids were imaged on a 300 kV cryo-electron microscope equipped with an energy filter, and 3,944 movies were collected from grid squares with thin ice and a dense particle distribution. After preliminary motion correction, CTF correction, blob picking, and 2D classification, ab initio reconstruction and non-uniform refinement were used to reconstruct an initial volume that served as a template for particle picking. A final set of 303,250 template-picked particles from 3,293 accepted movies yielded a 3.3 Å resolution structure of the complex (Figures 1B, 1C, and S1). To build an atomic model, an AlphaFold3-predicted structure of the complex was fit into the map using ChimeraX, followed by iterative refinement in ISOLDE, Phenix, and Coot.44–46 This approach yielded an atomic model covering 96% of Cas1–2/3 residues, with 3,138 unambiguously positioned side chains (Table 1).
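As a rough consistency check on the SEC-based stoichiometry, the estimated mass can be compared with the summed monomer masses of candidate Cas1:Cas2/3 assemblies. This is an illustrative sketch only: the ~36 kDa Cas1 monomer mass is an assumed approximate value that is not given in the text, the ~121 kDa Cas2/3 mass simply adds the ~12 kDa Cas2 and ~109 kDa Cas3 figures quoted elsewhere in this section, and `rank_stoichiometries` is a hypothetical helper.

```python
from itertools import product

CAS1_KDA = 36.0     # ASSUMPTION: approximate Cas1 monomer mass; not stated in the text
CAS2_3_KDA = 121.0  # ~12 kDa Cas2 + ~109 kDa Cas3, as quoted in the text


def rank_stoichiometries(observed_kda, max_cas1=6, max_cas23=3):
    """Hypothetical helper: return (n_cas1, n_cas2_3, mass_kda) candidates,
    sorted by how closely the summed monomer mass matches observed_kda."""
    candidates = [
        (n1, n23, n1 * CAS1_KDA + n23 * CAS2_3_KDA)
        for n1, n23 in product(range(1, max_cas1 + 1), range(1, max_cas23 + 1))
    ]
    return sorted(candidates, key=lambda c: abs(c[2] - observed_kda))


best = rank_stoichiometries(380.0)[0]
print(best)  # (4, 2, 386.0)
```

Under these assumptions the 4:2 assembly (~386 kDa) is the closest match to the ~380 kDa estimate. SEC mass estimates depend on particle shape, so an arithmetic check like this can support, but not establish, a stoichiometry.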

Figure 1. One side of the Cas1–2/3 integrase has a positively charged foreign DNA-binding channel.


(A) Schematic of the type I-F CRISPR locus (purple) and cas genes from Pseudomonas aeruginosa. Asterisks (*) indicate nuclease domains.

(B) A 3.3 Å resolution cryo-EM structure of the Cas1–2/3 heterohexamer reveals a propeller-shaped complex. One face of the complex binds foreign DNA; the other binds the CRISPR repeat. Asterisks (*) indicate the Cas1 subunits where transesterification will occur. The density map is displayed at 80% transparency (threshold = 0.109), with the atomic model docked inside and colored by subunit.

(C) A large patch of basic residues is present on one face of the complex. This patch forms a channel 24 Å wide and 75 Å long. The atomic model is colored by electrostatic potential (kcal/(mol·e) at 298 K).

Table 1.

Refinement statistics for all models and maps

Figure | Figure 1 | Figure 3 | Figure 4 | Figure 4 | Figures 4 and 5 | Figures 4 and 5 | Figures 4 and 5
PDB | 9P11 | 9P1D | NA | NA | NA | NA | NA
EMDB | EMD-71091 | EMD-71097 | EMD-71112 | EMD-71101 | EMD-71114 | EMD-71115 | EMD-71116

Data collection and processing

Nominal magnification | 81,000 | 96,000 | 36,000 | 36,000 | 36,000 | 36,000 | 36,000
Voltage (kV) | 300 | 300 | 200 | 200 | 200 | 200 | 200
Electron exposure (e⁻/Å²) | 52.51 | 51.78 | 65 | 65 | 65 | 65 | 65
Defocus range (μm) | 0.8–2.5 | 0.9–2.8 | 0.5–2.5 | 0.5–2.5 | 0.5–2.5 | 0.5–2.5 | 0.5–2.5
Pixel size (Å) | 1.0691 | 0.833 | 0.576 or 1.152 | 0.576 or 1.152 | 1.104 | 1.152 | 1.152
Number of frames | 40 | 40 | 50 | 50 | 50 | 50 | 50
Total/accepted micrographs | 3,944/3,293 | 5,403/2,332 | 10,396/6,389 | 10,396/6,389 | 7,666/5,462 | 7,666/5,462 | 7,666/5,462
Symmetry imposed | C2 | C1 | C1 | C1 | C1 | C1 | C1
Initial particle images (no.) | 1,716,883 | 1,083,100 | 1,495,601 | 1,495,601 | 719,626 | 719,626 | 719,626
Final particle images (no.) | 303,250 | 28,665 | 116,378 | 155,981 | 121,485 | 131,056 | 93,957
Map resolution (Å) | 3.5 | 3.31 | 3.9 | 3.99 | 3.92 | 3.89 | 3.79
FSC threshold | 0.143 | 0.143 | 0.143 | 0.143 | 0.143 | 0.143 | 0.143
Map resolution range (Å) | 2.8–5.1 | 1.77–41.78 | | | | |

Refinement | Figure 1 (PDB 9P11) | Figure 3 (PDB 9P1D)

Initial model used | AlphaFold 3 | AlphaFold 3
Model resolution (Å) | 3.3 | 3.3
FSC threshold | 0.143 | 0.143
Map sharpening B factor (Å²) | NA | NA

Model composition

Non-hydrogen atoms | 24,446 (0 hydrogens) | 34,725 (16,472 hydrogens)
Protein residues | 3,340 | 2,322
Nucleotides | 0 | 59

B factors (Å², min/max/mean)

Protein | 11.63/121.16/51.46 | 4.68/236.79/92.51
Nucleotides | NA | 0/261.75/87.85

RMSDs

Bond lengths (Å) (# > 4σ) | 0.003 (0) | 0.005 (0)
Bond angles (°) (# > 4σ) | 0.585 (12) | 0.561 (2)

Validation

MolProbity score | 1.56 | 1.61
Clashscore | 5.78 | 5.88
Poor rotamers (%) | 0.14 | 0.39

Ramachandran plot (%)

Favored | 96.31 | 95.88
Allowed | 3.39 | 4.12
Disallowed | 0.3 | 0

Cas1–2/3 is shaped like a four-bladed propeller. The Cas1 homodimers and Cas3 domains form the blades around the central Cas2 homodimer (Figure 1B). In type I-F systems, the small Cas2 subunits (~12 kDa) are uniquely fused to the large Cas3 protein (~109 kDa) by a linker (residues 95–104) that is not resolved in the density due to its flexibility. To understand whether fusion of the large Cas3 effector to the C-terminus of Cas2 alters the core integrase architecture, we compared the Cas1–2/3 structure to previously determined structures of Cas1–2 alone (Figure S2).26,47 All Cas2 proteins described to date share the same N-terminal ferredoxin-like fold (βαββαβ topology) and a fifth antiparallel β-strand that completes the homodimer interface.26,47–49 This fifth strand is provided by the same molecule in type I-E systems and by the opposing Cas2 molecule in other CRISPR systems.47,50 Whether the fifth strand is provided in cis or in trans correlates with the number of contacts with Cas1: two in type I-E systems (cis) and one in type I-A systems (trans) (Figure S2; Table S1).51,52 As in the I-E system, the I-F Cas2 provides the fifth antiparallel β-strand in cis and interacts with two Cas1 molecules, creating a molecular ruler that measures foreign DNA of similar length in type I-E and type I-F systems.19,26 Thus, the C-terminal fusion of Cas3 to Cas2 does not fundamentally alter the core Cas1-Cas2 integrase architecture (Figure S2).

Previous work established that four residues in each Cas2/3 molecule coordinate the phosphate backbone of double-stranded B-form foreign DNA during integration.43 These four Cas2 residues (K11, R18, T93, and C94) form the edges of a 24 Å-wide, positively charged channel, which we designate the “foreign DNA binding face” (Figures 1B, 1C, and S3). Cas1 histidine residues (H25) act as wedges at either end of the channel (Figures 1C and S3). These histidines split dsDNA and direct the 3′ ssDNA ends down the electrostatic funnel toward the Cas1 active site.43 The “foreign DNA binding channel” is 75 Å long and 24 Å wide, dimensions that can accommodate 22 base pairs of B-form DNA, and it is the only face of the Cas2 homodimer accessible for DNA binding (Figure 1). The opposing “CRISPR repeat binding face” of Cas2 is obscured by the Cas1 homodimers (Figure 1B). Previous biochemical analysis revealed that a K11D charge-swap mutation in Cas2 prevents integration of foreign DNA into the CRISPR array.43 Thus, the cryo-EM structure of Cas1–2/3 in the absence of DNA reveals that the solvent-accessible face of Cas2 forms a positively charged foreign DNA binding channel.
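The claim that a ~75 Å by ~24 Å channel fits 22 bp of B-form DNA follows from simple helix geometry. This sketch assumes textbook B-DNA values (a rise of ~3.4 Å per base pair and a duplex diameter of ~20 Å); both constants are general assumptions, not measurements from this structure, and the helper names are hypothetical.

```python
B_DNA_RISE_A = 3.4       # ASSUMPTION: textbook B-form rise per base pair (Å)
B_DNA_DIAMETER_A = 20.0  # ASSUMPTION: textbook B-form duplex diameter (Å)


def bform_length_a(n_bp):
    """Approximate contour length (Å) of an ideal, straight B-form duplex."""
    return n_bp * B_DNA_RISE_A


def max_bp_in_channel(channel_length_a, channel_width_a):
    """Largest whole number of base pairs whose straight B-form duplex fits the channel."""
    if channel_width_a < B_DNA_DIAMETER_A:
        return 0  # duplex too wide to sit in the channel at all
    return int(channel_length_a // B_DNA_RISE_A)


print(round(bform_length_a(22), 1))   # 74.8
print(max_bp_in_channel(75.0, 24.0))  # 22
```

The ~4 Å of spare width (24 Å versus ~20 Å) leaves room for side chains lining the channel to contact the phosphate backbone.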

To understand whether the residues that bind and guide DNA to the Cas1 active sites are conserved, we calculated conservation scores for type I-F Cas1 and Cas2/3 proteins from the NCBI non-redundant database (see STAR Methods). This analysis revealed a pattern of conservation across the foreign DNA binding face of Cas1–2/3 and low conservation on other solvent-exposed surfaces (Figure S3). Three of the four DNA binding residues on Cas2 (K11, R18, and T93) are conserved, as are the positively charged residues in the electrostatic funnel that leads to the Cas1 active site (R72, R76, R293, and R259) (Figure S3). To understand whether the propeller assembly itself is a conserved feature of the Cas1–2/3 complex, we used AlphaFold3 to predict the structures of four distinct Cas1–2/3 complexes. All predicted structures preserved the four-bladed propeller architecture observed in the experimentally determined structure (Figure S3). These comparisons indicate that the foreign DNA binding channel and propeller assembly are conserved structural features of type I-F CRISPR-Cas integrases.
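The scoring method is not described in this section (details are in STAR Methods). As one common way to score per-position conservation, the sketch below computes a normalized Shannon-entropy score for each column of a multiple sequence alignment. The choice of entropy, the gap handling, and the toy alignment are all assumptions for illustration, not the authors' pipeline.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Normalized conservation of one alignment column: 1 - H / log2(alphabet_size).

    1.0 means invariant; values near 0 mean highly variable. Gaps ('-') are
    ignored here, which is an assumption (tools differ in how they treat gaps).
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_profile(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy alignment of hypothetical sequences (not real Cas1 or Cas2/3 data).
toy = ["KRTC", "KRTA", "KRSG", "KRTW"]
print([round(s, 2) for s in conservation_profile(toy)])  # [1.0, 1.0, 0.81, 0.54]
```

Mapping such per-column scores onto surface residues is what turns an alignment into a conservation-colored surface like the one described above.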

Cas3 regulation by a conserved gate and latch

Cas3 is a histidine/aspartate (HD) nuclease and a superfamily 2 (SF2) helicase that degrades targeted foreign DNA during type I CRISPR interference.34,40 Previous studies show that Cas1 inhibits the nuclease activity of Cas3, whereas recruitment of Cas1–2/3 to target-bound Cascade relieves Cas3 inhibition and leads to rapid DNA degradation.35,36,41,53 The cryo-EM structure of Cas1–2/3 reveals that the RecA1 and RecA2 helicase domains stack to form a solvent-accessible cleft for ATP binding and hydrolysis, while the HD nuclease domain associates with the RecA1 domain (Figures 2A and 2B).35–37,54 In the absence of Cascade, a loop from RecA1 (residues 483–511) covers the HD active site like a closed gate (Figure 2B). At the top of this loop, R148 and K127 of the HD domain form hydrogen bonds with G498 and E500 (Figure 2C). These hydrogen bonds hold the gate over weak, unmodeled density, which we attribute to heterogeneous occupancy of the two divalent metal cations in the HD active site that are known to be critical for HD nuclease function (Figure 2C).35,40,55 An alignment of Cas2/3 fusion sequences (n = 685) revealed a well-conserved motif (GSES, residues 498–501) that appears to function as a latch (Figure 2D). Notably, when Cas2/3 associates with Cascade, the single-stranded R-loop displaces the latch and the gate flips open, creating a direct path to the HD active site where ssDNA cleavage occurs (Figure 2E).56 The RecA1 gate hinges at a motif conserved across SF2 helicases (motif 1b), including Cas3s, suggesting that this hinge may be functionally conserved in SF2 helicases.34 Auto-inhibition of the Cas3 HD nuclease by insertions in the RecA1 helicase domain has been described in the type I-A system of Pyrococcus furiosus,57 and a similar gate-like feature is displaced upon ssDNA binding in crystal structures of Cas3 from type I-E systems.35,58 Although the gate-like insertion in RecA1 is structurally conserved, its amino acid sequence differs across systems. Collectively, this work and prior observations suggest that RecA1 insertion sequences in Cas3 gate access to the HD nuclease, which may help prevent off-target DNA degradation by Cas3.
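Hydrogen bonds such as those latching the gate are commonly shortlisted with a donor-acceptor distance cutoff before any finer geometric checks. The sketch below applies only a distance criterion, with an assumed 3.5 Å cutoff, to made-up coordinates; dedicated tools such as the ChimeraX `hbonds` command also apply angle criteria, so this is a simplified illustration rather than the analysis used here. The function name and atom labels are hypothetical.

```python
import math

HBOND_CUTOFF_A = 3.5  # ASSUMPTION: typical donor-acceptor distance cutoff (Å); no angle test


def candidate_hbonds(donors, acceptors, cutoff=HBOND_CUTOFF_A):
    """Return (donor, acceptor, distance) tuples within cutoff, shortest first.

    donors, acceptors: dicts mapping an atom label to (x, y, z) coordinates in Å.
    """
    hits = []
    for d_label, d_xyz in donors.items():
        for a_label, a_xyz in acceptors.items():
            dist = math.dist(d_xyz, a_xyz)  # Euclidean distance (Python 3.8+)
            if dist <= cutoff:
                hits.append((d_label, a_label, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up coordinates for illustration; not taken from the deposited model.
donors = {"donor_1": (0.0, 0.0, 0.0), "donor_2": (10.0, 0.0, 0.0)}
acceptors = {
    "acceptor_1": (2.9, 0.0, 0.0),
    "acceptor_2": (10.0, 3.1, 0.0),
    "acceptor_3": (20.0, 0.0, 0.0),
}
print(candidate_hbonds(donors, acceptors))
# [('donor_1', 'acceptor_1', 2.9), ('donor_2', 'acceptor_2', 3.1)]
```

The per-pair distances returned here correspond to the kind of labels shown on hydrogen bonds in structure figures.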

Figure 2. A conserved gate-like feature controls access to the Cas3 HD nuclease active site.


(A) Domain architecture of the Cas2/3 fusion.

(B) A RecA1 insertion (residues 483–511, colored orange) protrudes into the HD nuclease active site of Cas3. One Cas2/3 molecule is colored by domain, and all other molecules are displayed at 90% transparency. The yellow ATP label indicates the ATP binding site between RecA1 and RecA2.

(C) Residues of the HD domain (K127, R148, and D315) interact with the gate residues (G498, E500, and S501), latching the gate over the HD active site, where two divalent cations are positioned (gray circles, labeled 2+). Hydrogen bonds are shown in turquoise, with the distance between hydrogen-bonded atoms indicated.

(D) The RecA1 gate and latch contain two sequence motifs that are conserved in type I-F systems: motif 1b at the glycine hinge (G479) and the latch motif (GSES, residues 498–501). WebLogo built from a sequence alignment of n = 685 Cas2/3 sequences.

(E) The R-loop formed by Cascade during RNA-guided strand invasion displaces the HD gate during interference. A structure of Cas2/3 associated with Cascade (PDB: 8K24) reveals that the HD gate is flipped out of the active site, rotating about the glycine hinge (motif 1b). Single-stranded DNA (ssDNA) travels along the path previously occluded by the gate to the HD active site, indicated with an asterisk (*).

Cas3 blocks leader binding sites of Cas2

Previous work has shown that Cas2 is essential for bending the CRISPR leader during foreign DNA integration in type I-F systems and that residues R55 and N56 are critical for this interaction.43 During integration, R55 intercalates between deoxyribose sugars and N56 stabilizes the phosphate backbone at the conserved type I-F inverted repeat motifs.19,43 Based on these interactions, we named the face of Cas2 containing these residues the “leader binding site.” In the absence of DNA, the leader binding site is masked by a network of hydrogen bonds and salt bridges between residues in Cas2 (R54, R55, K50, and E71) and the RecA1 domain of Cas3 (N373, D472, D473, T517, E519, D520, R525, and R529) (Figures 1, S3, and S4). Residues R55 and R54 of Cas2 form the most extensive hydrogen-bond networks with the RecA1 domain (Figure S4). Both the cryo-EM map and AlphaFold predictions reveal that the position of Cas3 over the leader binding sites is further stabilized by an extensive network of hydrogen bonds with Cas1 dimers on opposite ends of the complex (Figure S4). Collectively, these interactions hold Cas3 over the leader binding sites on Cas2.

Foreign DNA capture exposes DNA binding sites

The propeller-shaped conformation of Cas1–2/3 alone differs drastically from the conformation of Cas1–2/3 bound to a synthetic integration intermediate (Figures 1 and 3).43 To determine how the Cas1–2/3 complex positions short DNA fragments for PAM trimming and integration, we incubated Cas1–2/3 with a 32 bp fragment of foreign DNA. This DNA fragment contained 22 bp of dsDNA and 5 nt splayed single-stranded ends, with two additional 3′ GG PAM nucleotides added to one strand (Figure S5). The DNA-bound Cas1–2/3 complex eluted from a size exclusion column with an estimated molecular weight of 415 kDa, which is consistent with a 1:1 stoichiometry of Cas1–2/3 and foreign DNA (Figure S5). A cryo-EM dataset of 5,552 movies was collected and processed to determine a 3.31 Å reconstruction of this DNA-bound Cas1–2/3 complex (Figures 3 and S5; Table 1). Compared with data collected on the Cas1–2/3 complex without DNA, this dataset showed notable conformational heterogeneity and asymmetry. However, no classes corresponding to the DNA-free complex were observed during processing, indicating that foreign DNA stably and uniformly associates with Cas1–2/3. The primary source of heterogeneity was the position of Cas3 relative to Cas1–2 (Figure S5). This heterogeneity was not a result of particle-picking bias, because blob picking followed by cryoSPARC template picking, crYOLO, Topaz, or cryoSPARC’s DeepPicker all yielded particle stacks of similar size and heterogeneity.59–61 An AlphaFold3 prediction of the DNA-bound Cas1–2/3 complex positioned Cas3 in two different orientations relative to Cas1–2, providing further evidence that DNA binding to the foreign DNA binding face destabilizes Cas3 positioning (Figure S6).

Figure 3. Foreign DNA triggers a conformational rearrangement that exposes additional DNA binding sites.


(A) Atomic model colored by subunit docked into ~3.3 Å density map from a non-uniform refinement shown at 80% transparency (threshold = 0.178). One Cas3 (shown as an outline) is conformationally flexible.

(B) The linker between C94 and D105 in Cas2/3 is flexible and unresolved in the absence of foreign DNA (shown as a dashed line in cyan).

(C) Foreign DNA binding extends and orders the 10 amino acid linker between Cas2 (first residue: C94) and Cas3 (last residue: D105), changing the position of Cas3 relative to Cas1–2.

(D) Foreign DNA binding exposes two leader binding sites and a basic channel for the CRISPR repeat. The leader binding sites sit at the base of a 35 Å channel, measured from D388 on Cas3a (blue dot) to G236 on Cas1b* (brown dot). The length of the repeat channel was measured from H271 on Cas1a* to H271 on Cas1b* (~100 Å, positioned beneath the 3′ label). The width of the CRISPR repeat binding channel was measured from R169 on Cas1a to R169 on Cas1b (beige dots). The atomic model is colored by electrostatic potential (kcal/(mol·e) at 298 K).

To build an atomic model of the complex, AlphaFold3 predictions of the Cas3 domain and of the Cas1–2 hexamer bound to dsDNA were independently rigid-body fit into the density in ChimeraX, followed by jiggle fitting in Coot.45 This model was used as a starting point for iterative refinement in ISOLDE, Phenix, and Coot,44–46 yielding a model in which 66% of main-chain residues are unambiguously built (Table S1). This model captures both Cas1 homodimers, the Cas2 homodimer, and foreign DNA, but only one of the two Cas3 lobes (Figure 3A). Because Cas2 and Cas3 are fused, the presence of Cas2 and absence of one Cas3 in the cryo-EM map indicate that this Cas3 is averaged out during processing because it adopts multiple conformations. Thus, this structure reveals that DNA binding triggers release of Cas3 from the leader binding sites of Cas2. One Cas3 domain is stabilized in a new position against a Cas1 dimer, while the opposing Cas3 rotates freely.

The DNA fragment preferentially binds to the foreign DNA binding face of Cas1–2/3 (Figure 3). DNA binding triggers conformational changes that expose three additional DNA binding sites involved in integration (Figure 3). The individual domain architectures of Cas2 and Cas3 remain largely unchanged after binding foreign DNA (root-mean-square deviation [RMSD] <1 Å across 1,066 unpruned Cα pairs), but the positioning between domains changes dramatically. Upon foreign DNA binding, the linker between the Cas2 and Cas3 domain (residues 95–104) rigidifies, extends, and rotates to reposition Cas3 (Figures 3B and 3C). This rearrangement breaks all the previous Cas3 interactions with both Cas1 homodimers and Cas2, and establishes new interactions between Cas3 and a single Cas1 homodimer (Figures 1, 3, and S4). The new Cas3 position creates a 35 Å wide positively charged channel between Cas3 and Cas1 that exposes the CRISPR leader binding sites on both sides of the Cas2 homodimer (Figure 3D, white circles). The Cas1 dimers each rotate out and down, which opens a 26 Å wide positively charged channel that we designate the “CRISPR repeat channel” based on previous work (Figure 3D, white diamonds).17,33,43 The Cas2 homodimer forms the floor of this channel. The transesterification sites of the active Cas1 subunits are 100 Å apart at either end of the CRISPR repeat channel, while the inactive Cas1 subunits form the walls of the channel (Figure 3). The domain arrangements of Cas1–2/3 bound to foreign DNA are nearly identical to the domain arrangements of the previously determined type I-F integration complex (PDB: 8FLJ),43 revealing that foreign DNA binding is sufficient to position the Cas1–2/3 complex for integration.
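Domain-level comparisons such as the <1 Å RMSD above are typically computed after an optimal rigid-body superposition of paired Cα atoms. Below is a minimal NumPy sketch of the Kabsch algorithm for that calculation. It is a generic implementation for illustration, not the software used for this analysis, and the coordinates in the demo are synthetic.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a reflection so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    aligned = Pc @ R.T  # rotate P onto Q (row-vector convention)
    return float(np.sqrt(np.mean(np.sum((aligned - Qc) ** 2, axis=1))))


# Synthetic demo: Q is P rotated 30 degrees about z and translated, so RMSD ~ 0.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3)) * 10
theta = np.radians(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(P, Q) < 1e-6)  # True
```

Structure viewers usually report more than one number here. The ChimeraX `matchmaker` command, for example, reports RMSD over a pruned subset of atom pairs and across all pairs; an "unpruned" value corresponds to computing over every paired Cα as in this sketch.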

DNA positioned for PAM trimming and integration

Despite the presence of three additional exposed, positively charged DNA binding sites, nucleic acid density is observed only on the foreign DNA binding face (Figures 3 and S5). The absence of nucleic acid density at the other DNA binding sites suggests that recruitment of nucleic acid to the leader binding sites and repeat channel requires sequence- or structure-specific interactions that coordinate assembly of the integration complex at the CRISPR locus (Figures 3 and S5). As in the previously determined structure of the integration complex,43 residues K11 and R18 of Cas2 position 22 bp of the double-stranded DNA in the foreign DNA binding channel through hydrogen bonds with the phosphate backbone (Figure S6). The H25 wedge splits the duplex and routes the 3′ strand down a positively charged channel leading to the Cas1 active site (Figure S6), where the 3′ ends of foreign DNA are positioned for transesterification. This positioning is further stabilized by hydrogen bonds between T93 and C94 of Cas2/3 and the foreign DNA (Figure S6). The residues involved in positioning the foreign DNA are conserved in type I-F systems, highlighting their functional importance (Figure S6).

The foreign DNA substrate contained an additional 3′ GG dinucleotide PAM on one end (Figure S5). The PAM is critical for directional insertion of foreign DNA into the CRISPR array, but it must be removed before or during DNA integration.29,30,33,43 Yet the mechanism of PAM trimming is unknown in type I-F CRISPR systems. We hypothesized that Cas3 might participate in PAM trimming because it contains a nucleolytic HD domain and undergoes substantial rearrangement relative to Cas2 and Cas1 upon foreign DNA binding. However, after foreign DNA binds the Cas1–2/3 complex, the Cas1 transesterification sites are positioned ~80 and ~100 Å from the HD domain (Figures 3 and S6). Although Cas3 is both an endonuclease capable of nicking ssDNA and a 3′-to-5′ exonuclease,40 its position in the Cas1–2/3 complex precludes a role in PAM trimming without additional conformational changes.

DNA-induced structural rearrangements expose the faces of Cas1 and Cas2 that have been shown to interact with PAM-trimming exonucleases in other CRISPR systems (Figure S6).31,33 To understand whether Cas3 would sterically block access to the PAM, we docked a previously determined structure of Cas1-Cas2-Cas4 bound to foreign DNA onto the atomic model of the Cas1–2/3 complex bound to foreign DNA (Figure S6). Although the I-F system does not contain a Cas4 protein, the role of Cas4 in PAM trimming makes it a structural proxy for the unidentified nuclease responsible for PAM trimming in type I-F systems. Docking reveals that the 35 Å channel between Cas1 and Cas3 accommodates Cas4 without steric clashes between Cas3 and Cas4 (Figure S6), suggesting that foreign DNA-induced conformational changes may also facilitate recruitment of an unidentified PAM-trimming nuclease by exposing a conserved face of Cas1 that positions the 3′ PAM for trimming (Figure S6). Weak density in a global refinement of the foreign DNA-bound Cas1–2/3 complex at 3.3 Å resolution suggests that the PAM is positioned on the Cas3-stabilized side of the complex. To unambiguously assign PAM nucleotide positions, each Cas1 dimer was refined separately using local masks. Surprisingly, the two Cas1 dimer volumes did not differ significantly, suggesting that the DNA substrate did not lock the PAM in a specific orientation at either Cas1 active site. However, the DNA-induced asymmetry suggests that the PAM contributes to asymmetric stabilization of the complex.
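A docking-based steric check like the Cas4 superposition can be approximated by counting atom pairs from two models that come closer than a cutoff. The sketch below uses a brute-force NumPy distance matrix and an assumed 2.0 Å heavy-atom cutoff; production tools (for example, the ChimeraX `clashes` command) use van der Waals overlap criteria instead, so treat this as a simplified illustration. The function name and coordinates are hypothetical.

```python
import numpy as np

CLASH_CUTOFF_A = 2.0  # ASSUMPTION: simplified heavy-atom distance cutoff (Å)


def count_clashes(coords_a, coords_b, cutoff=CLASH_CUTOFF_A):
    """Count atom pairs (one atom from each model) closer than cutoff Å.

    coords_a: (N, 3) array-like; coords_b: (M, 3) array-like.
    Brute force: builds the full (N, M) distance matrix in memory.
    """
    a = np.asarray(coords_a, dtype=float)[:, None, :]
    b = np.asarray(coords_b, dtype=float)[None, :, :]
    dists = np.linalg.norm(a - b, axis=-1)  # (N, M) pairwise distances
    return int(np.count_nonzero(dists < cutoff))


# Synthetic example: one atom of model B sits 1 Å from an atom of model A.
model_a = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
model_b = [[1.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
print(count_clashes(model_a, model_b))  # 1
```

For whole complexes with tens of thousands of atoms, a KD-tree (for example, SciPy's `cKDTree`) avoids building the full distance matrix.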

The PAM alters integration states

The Cas1–2/3 integrase is a C2 symmetric heterohexamer (Figure 1).17,18,30,43 This symmetric integrase must home in on the CRISPR leader in an asymmetric orientation that ensures that the spacer is integrated into the CRISPR array in a direction that results in transcription of a CRISPR RNA complementary to an invading DNA sequence adjacent to the PAM.17,43 Integration host factor (IHF) is critical for introducing folds in the CRISPR leader recognized by Cas1–2/3. Previous in vitro biochemical experiments performed using purified components of the type I-F system revealed that the PAM stalls integration after the first transesterification reaction.17,43 To understand if the PAM functions to orient the integrase onto the CRISPR array, we determined the structures of Cas1–2/3 integrating foreign DNA with and without a PAM into the CRISPR array (Figures 4, S7, and S8). DNA substrates with or without a PAM were incubated with purified Cas1–2/3 and then mixed with purified IHF and 200 bp of the leader, along with the first repeat and spacer sequence.

Figure 4. Foreign DNA length affects complex stability and sequential transesterification.


(A–C) Purified Cas1–2/3 protein was mixed with 32 bp (No PAM) fragments of foreign DNA or 34 bp PAM-containing DNA, and then mixed with the IHF-bent CRISPR leader. The integration complex was purified by size exclusion chromatography (SEC), and imaged by cryo-EM. The first and second transesterification sites are labeled with an asterisk (*). Protein subunits with variable density between samples or between integration stages are outlined in solid colors (orange = no PAM, red = PAM). Missing subunits due to conformational flexibility are displayed as gray shadows with dashed outlines. The CRISPR repeat face is displayed for all volumes. The foreign DNA and CRISPR leader are mostly obscured at this viewing angle but are present in all volumes (see Figures S7 and S8).

Density maps from non-uniform refinements of the integration complex at three stages of integration in both conditions are displayed (threshold = 0.103).

(A) The pre-integration genome association complex without a PAM (n = 121,485 particles, left) or with a PAM (n = 82,751 particles, right). The CRISPR leader is labeled, but largely obscured at this viewing angle.

(B) The integration complex after the first transesterification without a PAM (n = 97,437 particles, left) or with a PAM (n = 94,204 particles, right), with the leader-repeat junction positioned in the first transesterification site (CRISPR repeat labeled and colored purple).

(C) In the absence of a PAM, density corresponding to the CRISPR repeat is positioned in the active site of the Cas1 subunit responsible for the second transesterification (n = 93,957 particles, left), and the CRISPR repeat stretches across the channel. No complex corresponding to the second transesterification reaction is observed in the presence of a PAM (right).

This mixture was incubated at 35°C for 25 min and then subjected to size exclusion chromatography (Figures S7 and S8). Samples with and without the PAM were used for cryo-EM, and five distinct mid-resolution maps of the integration complex were refined: two from the PAM-containing dataset (3.9 and 4 Å) and three from the no-PAM dataset (3.9, 3.9, and 3.8 Å) (Figures 4, S7I, S7J, and S8I–S8K; Table 1). In all volumes, Cas1–2/3 associates with the CRISPR leader, which forms a U-shaped bend over the foreign DNA bound to the foreign DNA binding face of the complex (Figures S7I, S7J, and S8I–S8K). The straight pillars of the “U” bind Cas1–2/3 at the genome binding sites (Figures 3 and 4A). In all volumes, the shorter foreign DNA fragments bind to the foreign DNA binding face of Cas1–2/3, with no evidence of the longer CRISPR leader fragments bound at this location, providing structural confirmation that Cas1–2/3 has a length preference on the foreign DNA binding face.20 Cas1–2/3 induces a second fold in the CRISPR leader to position the leader-repeat junction in the first transesterification site (site 1) (Figures 4B, S7, and S8). As observed in the delivery complex, the integration complex is asymmetrically stabilized, with density for one Cas3 averaged out due to conformational flexibility in all volumes (Figures 3, 4, S7, and S8).

The presence of the PAM introduced two notable differences in the integration complex. First, the second transesterification reaction site (site 2) is resolved in the presence of a PAM due to stabilization of the corresponding Cas1 dimer (Figures 4A and 4B, red outline). Second, when a PAM is present, the repeat fails to extend all the way through the repeat channel (Figure 4C). In the absence of a PAM, the repeat stretches through the repeat channel to the second transesterification site (Figure 4C). When the repeat channel is fully occupied, the Cas1 dimer housing the second transesterification site appears stabilized, along with the previously unresolved Cas3b domain (Figure 4C, orange outline). On the opposite side of the complex, the previously stabilized Cas3 is averaged out, presumably due to conformational heterogeneity (Figures 4B and 4C). These structures are consistent with previous biochemical data showing that the presence of a PAM blocks the second transesterification reaction but not the first.43 The PAM does not alter initial integration complex assembly but blocks the formation of structural states required for the second transesterification reaction to occur.

Full integration distorts the repeat

For the second transesterification reaction to occur, the Cas1–2/3 complex must dock the first Cas1 transesterification site at the leader-repeat junction (site 1) and the second Cas1 transesterification site (site 2) at the spacer-repeat junction.19,43 To ensure complete duplication of the repeat on either side of the newly integrated foreign DNA, both the leader-repeat junction and the spacer-repeat junction must be correctly positioned in the Cas1 transesterification sites. To determine whether the repeat sequence forms secondary structures that facilitate this positioning, AlphaFold3 was used to predict the structure of the repeat sequence alone. The 28 bp repeat adopts a linear B-form conformation that is ~7 Å too short to stretch from site 1 to site 2, suggesting that the repeat must be elongated during integration (Figure 5A). To understand whether and how the repeat is distorted, the linear B-form AlphaFold3 prediction was docked into the repeat density from the only class containing density along the full repeat channel (Figures 4C and 5A). Although conformational heterogeneity limits the resolution of the DNA and precludes construction of an atomic model of this state (Figure 5B), the shape of the density highlights several important features of the CRISPR repeat. Leaving the first transesterification site, the repeat adopts a pseudo-B-form helical conformation for ~25 Å (~6–7 bp) (Figures 5B–5D). A kink in the density marks the beginning of a distorted region in which the helical pitch of B-form DNA is bent into the narrow central region of the channel (Figures 5B–5D). A second kink marks a transition back to pseudo-B-form DNA as the repeat approaches site 2 (Figures 5C and 5D). These two kinks change the trajectory of the CRISPR repeat in the repeat channel, bending the DNA ~28° over the top of the Cas2 dimer (Figure 5B). The repeat distortion also introduces a ~3° shift toward the second transesterification site (Figure 5B).
Notably, positively charged residues that line the CRISPR repeat channel are conserved in type I-F systems, suggesting that these residues, which are positioned to distort the repeat, are functionally important (Figure S9).
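To give a sense of scale for the ~7 Å shortfall, the back-of-envelope calculation below estimates the straight B-form length of the 28 bp repeat and the extra span needed to reach site 2. It is a sketch under stated assumptions: a canonical helical rise of 3.38 Å/bp (an assumed textbook value, not a number from this study) and the simplification that the ~7 Å gap adds directly to the axial length. In the density, the extra reach is accommodated by the kinked, distorted region described above rather than by uniform stretching, so the "uniform stretch" figure is only a reference point.

```python
# Back-of-envelope geometry for the 28 bp repeat (assumptions noted inline).

RISE_PER_BP = 3.38  # Å per bp; assumed canonical B-DNA helical rise


def bform_length(n_bp: int, rise: float = RISE_PER_BP) -> float:
    """Approximate axial length (Å) of straight B-form DNA of n_bp base pairs."""
    return n_bp * rise


REPEAT_BP = 28
SHORTFALL = 7.0  # Å; the ~7 Å gap between the straight repeat model and site 2

straight_length = bform_length(REPEAT_BP)          # length if the repeat stayed B-form
required_length = straight_length + SHORTFALL      # span needed to reach site 2
fractional_stretch = SHORTFALL / straight_length   # extension if spread uniformly
extra_per_bp = SHORTFALL / REPEAT_BP               # Å of extra length per bp

print(f"straight B-form: {straight_length:.1f} Å")
print(f"required span:   {required_length:.1f} Å")
print(f"uniform stretch: {fractional_stretch:.1%} ({extra_per_bp:.2f} Å/bp)")
```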

Figure 5. The repeat channel distorts the CRISPR repeat.


(A) An AlphaFold3 prediction of the 28 bp repeat (cyan) from the PA14 strain of P. aeruginosa was rigid-body fit into the B-form section of the density map projecting from site 1 (colored purple). The experimental density for the repeat is twisted 3.5° off-center and bent ~27.5° down into the second transesterification site (site 2) compared to the AlphaFold3 prediction. Cas1 active sites where transesterification occurs are indicated with an asterisk (*), and the density is displayed at threshold = 0.171. B-form regions of repeat density are colored purple, and distorted regions are colored pink.

(B) Local resolution estimation of the repeat-containing volume (n = 27,573 particles) displayed at threshold 0.171.

(C) The repeat is distorted in the middle of the repeat channel as it extends from the first transesterification site (site 1) to the second site (site 2). The distorted region is displayed in pink, while the B-form region of the repeat is displayed in purple. Cas1 active sites are indicated with an asterisk (*), and the density is displayed at threshold = 0.171 and colored by protein subunit.

(D) Side view of the distorted repeat (threshold = 0.171, protein subunits that obscure the repeat at this view are erased for clarity). Two kinks in the density flank the distorted region, and approximately align with conserved purine-pyrimidine steps (indicated by dashed arrows pointing to E). Cas1 active sites are indicated with an asterisk (*).

(E) A logo plot showing the conservation of the type I-F CRISPR repeat across 2,804 aligned type I-F repeat sequences (28–32 bp). Letter height corresponds to bit-score from 0 (not conserved) to 2 (conserved). Inverted repeats are indicated with gray boxes and black arrows indicate direction of inversion.

(F) There are five conserved dinucleotide purine (Pu) to pyrimidine (Py) steps in the type I-F CRISPR repeat at positions 1–2, 5–6, 8–9, 17–18, and 20–21, indicated by black boxes.
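The conservation scale in panel E (0 to 2 bits) follows the usual sequence-logo definition: for DNA, the information content of a column is log2(4) minus the Shannon entropy of the observed base frequencies. The sketch below computes that per-column value for a toy alignment. It is an illustration, not the WebLogo implementation: it ignores gaps and ambiguity codes and omits any small-sample correction that WebLogo may apply, and the four toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_bits(column: str) -> float:
    """Information content (bits) of one DNA alignment column.

    R = log2(4) - H, where H is the Shannon entropy of the observed A/C/G/T
    frequencies. Gaps and ambiguous characters are ignored, and no small-sample
    correction is applied.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    n = len(bases)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(bases).values())
    return 2.0 - entropy


def logo_heights(aligned: list[str]) -> list[float]:
    """Per-position information content for equal-length aligned sequences."""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [column_bits("".join(seq[i] for seq in aligned)) for i in range(width)]


# Toy alignment (hypothetical sequences, not the 2,804 type I-F repeats).
toy = ["GTTC", "GTTA", "GTAC", "GTTC"]
print([round(h, 3) for h in logo_heights(toy)])
```

Fully conserved columns reach the 2-bit ceiling, while a 3:1 split drops to roughly 1.19 bits, which is why a single variable position stands out clearly in a logo built from thousands of repeats.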

CRISPR repeat sequences differ between CRISPR-Cas subtypes, and efficient adaptation depends on the repeat sequence.16,62 Low-resolution structural data suggest that the type I-E repeat forms a single bend in the center of the CRISPR repeat binding channel.17,33,62 Swapping the type I-F CRISPR repeat for the I-E CRISPR repeat does not impact the first transesterification reaction but reduces the efficiency of the second, despite similar sequence lengths.43 To determine whether conserved sequence features of the I-F repeat might explain why swapping the repeat sequences reduces or outright abolishes the second transesterification reaction, 2,804 type I-F repeats were aligned and plotted as a logo (Figure 5E). This analysis revealed that the 28 bp type I-F repeat is well conserved, including four purine-pyrimidine steps located in the inverted repeats (Figure 5F). The positions of the DNA distortions observed in the repeat channel align well with these conserved purine-pyrimidine steps, which are particularly susceptible to DNA kinking (Figures 5C–5F).63 Conserved purine-pyrimidine steps in the closely related type I-E system differ in both dinucleotide identity and position within the repeat (Figure S9). Together with previous biochemical and mutagenesis studies,43 these findings suggest that the repeat sequence undergoes subtype-specific distortions that further enhance the specificity of the CRISPR integrase, ensuring that the second transesterification reaction occurs only when the integrase docks onto the correct repeat.
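Locating purine-pyrimidine (5′-RY-3′) steps in a single repeat is a simple scan, sketched below. Two assumptions apply: the 28-nt segment GTTCACTGCC… found in the leader-repeat-spacer oligonucleotide is taken to be the PA14 repeat, and positions are counted 1-based from its 5′ end as in Figure 5F. Because the complement of an RY step is also an RY step, scanning one strand is enough. On this one sequence the scan reports more steps than the five conserved ones. The conserved set in Figure 5F comes from the 2,804-sequence alignment, not from any single repeat.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def pu_py_steps(seq: str) -> list[tuple[int, int]]:
    """1-based (i, i+1) positions where a purine is followed by a pyrimidine (5'->3').

    A 5'-RY-3' step reads as RY on the complementary strand as well, so scanning
    one strand is sufficient.
    """
    s = seq.upper()
    return [
        (i + 1, i + 2)
        for i in range(len(s) - 1)
        if s[i] in PURINES and s[i + 1] in PYRIMIDINES
    ]


# Assumption: this 28-nt segment of the leader-repeat-spacer oligo is the PA14 repeat.
PA14_REPEAT = "GTTCACTGCCGTGTAGGCAGCTAAGAAA"
CONSERVED_STEPS = [(1, 2), (5, 6), (8, 9), (17, 18), (20, 21)]  # from Figure 5F

steps = pu_py_steps(PA14_REPEAT)
extra = [s for s in steps if s not in CONSERVED_STEPS]
print("all Pu-Py steps:", steps)
print("conserved steps present:", all(s in steps for s in CONSERVED_STEPS))
print("PA14-specific (not in the conserved set):", extra)
```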

DISCUSSION

Previous studies of CRISPR integrases have shown that genome folding is necessary for site-specific integration of foreign DNA.17,43 However, the mechanisms of integrase activation and regulation remain poorly understood. Here we determine seven structures of the type I-F CRISPR integrase that collectively explain how auto-inhibition and allosteric activation regulate the CRISPR integrase. Cas1–2/3 forms a heterohexameric complex with a positively charged face that binds short fragments of foreign DNA (Figure 1). In this DNA capture state, the HD domain of Cas3 is blocked by a loop from the RecA1 domain (Figure 2). Binding to short fragments of DNA triggers conformational rearrangements that expose additional DNA interaction sites (Figure 3). Full integration of foreign DNA via sequential Cas1-mediated transesterification reactions on either side of the complex occurs only in the absence of a PAM (Figure 4). During full integration, the CRISPR repeat channel distorts the B-form repeat to direct the repeat DNA into the second transesterification site (Figure 5). Thus, these seven volumes and two atomic models provide stepwise snapshots of CRISPR adaptation.

These snapshots identify a key structural checkpoint during CRISPR adaptation (Figures 1 and 3). Cas3 blocks Cas2 association with the folded CRISPR leader (Figure 1). After foreign DNA binding, Cas3 is displaced to expose two CRISPR leader binding sites on either side of Cas2 (Figure 3). DNA binding motifs upstream of CRISPR arrays19,64 correlate with the observed diversity of CRISPR adaptation strategies in bacteria and archaea.17,19,33,43,64–66 This observation suggests that recognition of specific genome structures by the integrase is a conserved feature of CRISPR adaptation. Our study shows that foreign DNA allosterically regulates Cas1–2/3 integrase conformation and is sufficient to expose the DNA binding sites necessary for homing the integrase to the CRISPR locus.

Previous work has shown that nuclease/helicase-competent Cas3 is involved in both naive and primed acquisition of new foreign DNA sequences into the CRISPR array.21,67,68 This study reveals a direct structural role for Cas3 in regulating new sequence acquisition in the type I-F system: Cas3 directly gates access to key DNA binding sites on the type I-F integrase (Figures 1 and 2). To our knowledge, interactions between Cas1 and Cas3 have only been tested in the type I-F system. However, Cas1–2 complexes have been shown to attenuate Cas3 nuclease activity and facilitate long-range translocation, revealing that Cas1–2 integrases interact with Cas3 even in the absence of genetic fusion.38 To determine whether Cas3 might regulate integrases beyond the I-F system, we used AlphaFold3 to predict structures of the Cas1–2 heterohexamer from the type I-E system of E. coli in the presence of two Cas3 molecules, both with and without foreign DNA fragments (Figure S10). AlphaFold3 consistently predicted nearly identical Cas1-Cas3 interactions across five different predicted structures (Figure S10). In all cases, Cas3 obstructs the face of Cas1 that is required to interact with IHF and the CRISPR leader during integration in the type I-E system (Figure S10).17 These predictions suggest that Cas3 and Cas1 may function together to regulate adaptation in other type I CRISPR systems. Our study did not consider the effects of helicase or nuclease mutations on gating the DNA binding sites of the Cas1–2/3 integrase, nor did we attempt to biochemically characterize Cas1-Cas3 interactions outside of the type I-F system. However, together these observations guide future work aimed at understanding the functional and physical link between Cas3 and the integrase in type I CRISPR systems.38

RESOURCE AVAILABILITY

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Blake Wiedenheft (bwiedenheft@gmail.com).

Materials availability

This study did not generate new unique reagents.

Data and code availability

Electron density maps are publicly available as of the date of publication at EMDB: EMD-71091 (Cas1–2/3), EMDB: EMD-71097 (Cas1–2/3 and foreign DNA), EMDB: EMD-71112 (integration complex with PAM, CRISPR-leader associated), EMDB: EMD-71101 (integration complex with PAM, first transesterification), EMDB: EMD-71114 (integration complex without the PAM, CRISPR-leader associated), EMDB: EMD-71115 (integration complex without the PAM, first transesterification), and EMDB: EMD-71116 (integration complex without the PAM, full repeat). Atomic models are publicly available as of the date of publication at the Protein Data Bank under accession codes PDB: 9P11 (Cas1–2/3) and PDB: 9P1D (Cas1–2/3 and foreign DNA). Raw movies from each data collection are available at EMPIAR: EMPIAR-12916 (Cas1–2/3), EMPIAR: EMPIAR-12917 (Cas1–2/3 and foreign DNA), EMPIAR: EMPIAR-12928 (integration complex with PAM), and EMPIAR: EMPIAR-12922 (integration complex without the PAM). Alignments and Newick files used for conservation and phylogenetic analyses are available as supplementary data. This paper does not report original code. Any additional information required to re-process the data reported in this paper is available from the lead contact upon request.

STAR★METHODS

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS

E. coli BL21(DE3) (New England Biolabs) was transformed with Addgene plasmid #89240 to express Cas1–2/3, or with Addgene plasmids #149384 and #149385 to co-express the IHF subunits used in this study.

METHOD DETAILS

Cas1–2/3 and IHF purification

P. aeruginosa Cas1–2/3 heterohexamer complexes were purified as previously described.19 Briefly, StrepII-tagged Cas1 (GenBank: WP_003139228.1) and untagged Cas2/3 (GenBank: WP_003139227.1), encoded on a single plasmid (Addgene #89240), were overexpressed in E. coli BL21(DE3) (New England Biolabs). Cell pellets were lysed by sonication in Cas1–2/3 lysis buffer (50 mM HEPES pH 7.5, 500 mM KCl, 10% glycerol, 1 mM DTT) supplemented with 0.3x Halt Protease Inhibitor Cocktail (Thermo Fisher Scientific) at 4°C. Lysate was clarified by two rounds of centrifugation at 10,000 × g for 15 minutes at 4°C. StrepII-tagged Cas1-Cas2/3 complexes were affinity purified on StrepTrap HP resin (GE Healthcare) and eluted with Cas1–2/3 lysis buffer containing 3 mM desthiobiotin (Sigma-Aldrich). The eluate was concentrated at 4°C (Corning Spin-X concentrators) before purification over a Superdex 200 HiLoad 16/600 size-exclusion column (Cytiva) equilibrated in 10 mM HEPES pH 7.5, 500 mM potassium glutamate, and 10% glycerol.

P. aeruginosa IHF heterodimer was purified as previously described.19 Briefly, 6xHis-tagged IHFα and StrepII-tagged IHFβ were co-expressed in E. coli BL21(DE3) (Addgene #149384, #149385). Cell pellets were lysed by sonication in IHF lysis buffer (25 mM HEPES-NaOH pH 7.5, 500 mM NaCl, 10 mM imidazole, 1 mM TCEP, 5% glycerol) supplemented with 0.3x Halt Protease Inhibitor Cocktail (Thermo Fisher Scientific) at 4°C. Lysate was clarified by two rounds of centrifugation at 10,000 × g for 15 minutes at 4°C. His-tagged IHF was captured on HisTrap HP resin (Cytiva) and eluted with 500 mM imidazole. Affinity tags were cleaved using PreScission protease, and the PreScission protease and remaining 6xHis-IHFα were removed by affinity chromatography on HisTrap HP resin (Cytiva). Untagged IHF heterodimer was then further purified on Heparin Sepharose (Cytiva) and eluted with a linear gradient to a buffer containing 2 M NaCl. Fractions containing IHF heterodimer were concentrated and further purified by size-exclusion chromatography (SEC) on a Superdex 75 column (Cytiva) equilibrated in IHF buffer (25 mM HEPES-NaOH pH 7.5, 200 mM NaCl, 5% glycerol).

Cryo-EM of the Cas1–2/3 capture complex

Frozen aliquots of purified Cas1–2/3 complex (12.8 μM) were shipped on dry ice to the National Center for CryoEM Access and Training (NCCAT) and the Simons Electron Microscopy Center located at the New York Structural Biology Center (NYSBC). 3 μL of sample was applied to glow-discharged Quantifoil R0.6/1 Cu 300 mesh grids (Quantifoil Micro Tools GmbH) and blotted for 2 seconds at 10°C and 100% humidity using a Leica EM-GP2 (Leica Microsystems) prior to plunge freezing in liquid ethane. The dataset was collected on a Titan Krios (Thermo Fisher Scientific) at NYSBC operating at 300 kV with a Gatan K3 direct electron detector and a GIF-Quantum energy filter with a 20 eV slit width at a nominal magnification of 81,000x. 40-frame movies were collected with a 50 ms frame time and a dose rate of 25.26 e−/Å²/s for a total dose of 52.51 e−/Å², with a pixel size of 1.0691 Å/px over a defocus range of −0.8 to −2.5 μm. Holes were identified and targeted, and automated data collection was carried out using Leginon.70 Preliminary data processing was done on the fly in cryoSPARC Live.61

Conservation analysis of Cas1 and Cas2/3

We used two complementary approaches to measure conservation of residues in Cas1 and Cas2/3 proteins. To map conserved residues onto the structure, we used CasFinder to identify unique occurrences of type I-F CRISPR systems in NCBI’s database of complete bacterial and archaeal genomes.81 Complete genomes from NCBI were downloaded on July 11, 2023, and open reading frames were annotated using Prodigal.73 These open reading frames were then queried for cas genes using MacsyFinder v1.0.575 with the following parameters: “macsyfinder --sequence-db <open_reading_frames> --db-type gembase -d <CRISPR_subtype_definitions> -p <HMM_profiles> -w 50 -vv all”. HMM profiles and classification definitions (CRISPR_subtype_definitions) used in MacsyFinder were obtained from a version of CasFinder v3.1.081 edited to include definitions for CAST systems (available at https://github.com/macsy-models/CasFinder). Type I-F Cas2/3 sequences (any sequence annotated with “cas3f”, “cas3”, or “cas3HD”; n=1922) and Cas1 sequences (n=1838) were extracted from the database of Prodigal-annotated open reading frames, and identical sequences were removed with CD-hit (-c 1).73,76 The remaining 739 (Cas1) and 970 (Cas2/3) sequences were aligned in MAFFT77 (mafft --genafpair --maxiterate 1000), and poorly aligned sequences were removed using MaxAlign (flags: -v=1, -a).82 The resulting alignments of 679 (Cas1) and 745 (Cas2/3) sequences were submitted to ConSurf83 with the Cas1–2/3 atomic model to calculate conservation scores. PA14 Cas1 and Cas2/3 sequences (from genome GenBank: NC_008463) were associated with the submitted atomic model. Conservation scores were mapped onto atomic models using ChimeraX (Figures S3, S6, and S9).71
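The redundancy-removal step (CD-hit at 100% identity) can be approximated in a few lines for readers who want to inspect which open reading frames collapse together. The sketch below is a simplification, not a reimplementation of CD-hit: it only merges exact duplicates (case-insensitive, ignoring the trailing "*" that Prodigal writes for stop codons), whereas CD-hit at -c 1, as we understand its defaults, can also fold a shorter sequence into a longer one when it matches over its full length. The FASTA records in the example are hypothetical.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_exact_duplicates(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive).

    A trailing '*' (stop codon in Prodigal translations) is ignored when comparing.
    """
    seen: set[str] = set()
    kept = []
    for header, seq in records:
        key = seq.upper().rstrip("*")
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


# Hypothetical example input.
example = """>orf_1
MKRLLA*
>orf_2
MKRLLA
>orf_3
MKRLLV*
"""
unique = drop_exact_duplicates(read_fasta(example))
print([h for h, _ in unique])
```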

To determine whether the global architecture of the Cas1–2/3 complex is conserved, the Cas2/3 sequence from Pseudomonas aeruginosa strain UCBPP-PA14 (100% identical to multi-species accession GenBank: WP_003139227.1) was used to query the nr_70_Mar12 database using the ProtBLAST/PSI-BLAST tool on the MPI Bioinformatics Toolkit server (scoring matrix: BLOSUM62; E-value cutoff for reporting: 1e-3; E-value cutoff for inclusion: 1e-6; max target hits: 1000).84 The resulting alignment of 1000 sequences was downloaded, and 170 sequences missing Cas2 or Cas3 domains were manually removed. MaxAlign was used to remove 145 additional poorly aligned sequences (flags: -v=1, -a).82 Gappy columns were removed using trimAl (flags: -gt 0.7) to generate an alignment of Cas2/3 spanning 1071 positions.85 This alignment was used to build a phylogenetic tree using FastTree (flags: -pseudo, -wag, -gamma), which was visualized in R using ggtree (Figure S3F).78,80
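The trimAl gap threshold of 0.7 keeps only alignment columns in which at least 70% of sequences carry a residue. The function below is a minimal sketch of that idea for small alignments, not trimAl itself: edge-case behavior (for example, whether a column sitting exactly at the threshold is retained) and trimAl's other trimming modes are not reproduced, and the four toy sequences are hypothetical.

```python
def trim_gappy_columns(aligned: list[str], min_present: float = 0.7,
                       gap_chars: str = "-.") -> list[str]:
    """Drop alignment columns in which fewer than `min_present` of sequences have a residue.

    Intended to mimic the idea behind trimAl's -gt (gap threshold) option; edge-case
    behaviour (e.g. >= versus >) may differ from trimAl itself.
    """
    n = len(aligned)
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i for i in range(width)
        if sum(seq[i] not in gap_chars for seq in aligned) / n >= min_present
    ]
    return ["".join(seq[i] for i in keep) for seq in aligned]


# Hypothetical toy alignment: column 3 is gapped in half the sequences and is removed.
toy = ["AC-E", "AC-E", "A-DE", "ACDE"]
print(trim_gappy_columns(toy))
```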

Purification of the delivery complex

Foreign DNA fragments (32–34 bp) were ordered from IDT as 100 nmol ssDNA oligonucleotides and resuspended to a working concentration of 100 μM in 1x TE buffer. To generate dsDNA foreign DNA fragments, 2 nmol of ssDNA was suspended in 0.5X hybridization buffer (20 mM Tris-HCl pH 7.5, 220 mM K-glutamate, 5 mM EDTA, 1 mM TCEP, 2% glycerol) and incubated at 95°C for 5 minutes, followed by 25°C for 10 minutes. Samples were then denatured at 100°C for 5 minutes and slow-cooled to a final temperature of 25°C at a rate of 6°C per 5 minutes over about an hour.
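When programming a thermocycler for this annealing ramp, it helps to check that the stated rate and end points agree. The sketch below assumes the ramp is run as discrete holds (drop 6°C, hold 5 minutes) rather than as a continuous ramp; that programming style is an assumption, not a detail from the protocol. A 75°C drop at 6°C per 5 minutes takes 62.5 minutes, consistent with "about an hour".

```python
def ramp_setpoints(start_c: float, end_c: float, step_c: float) -> list[float]:
    """Descending temperature setpoints from start_c to end_c in steps of step_c."""
    points = []
    t = start_c
    while t > end_c:
        points.append(t)
        t -= step_c
    points.append(end_c)  # finish exactly at the target temperature
    return points


def ramp_minutes(start_c: float, end_c: float, step_c: float, minutes_per_step: float) -> float:
    """Duration of a linear ramp of step_c degrees every minutes_per_step minutes."""
    return (start_c - end_c) / step_c * minutes_per_step


setpoints = ramp_setpoints(100, 25, 6)
print(setpoints)
print(f"ramp duration: {ramp_minutes(100, 25, 6, 5):.1f} min")
```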

A 1:1 mixture of purified Cas1–2/3 and dsDNA was prepared by mixing 2 nmol of purified Cas1–2/3 (thawed on ice, then warmed to 25°C) with 2 nmol of annealed dsDNA containing a single 3′ GG extension. This mixture was incubated at 25°C for 20 minutes, then centrifuged at 22,000 × g for 20 minutes at 4°C to pellet any aggregate. The supernatant was collected in a 1 mL syringe and injected onto a Superdex 200 10/300 column (Cytiva) equilibrated with 20 mM Tris-HCl pH 7.5, 200 mM monopotassium glutamate, 5 mM EDTA, 1 mM TCEP, 2% glycerol. Fractions (0.25 mL) were individually concentrated to ~20 μL using 100 kDa molecular weight cutoff concentrators at 4°C and 15,000 × g. All fractions were analyzed on SDS-PAGE gels (15% resolving) to determine protein composition and on 8% urea-PAGE gels to determine nucleic acid composition. The fifth SEC fraction contained all DNAs and proteins of interest and was further analyzed by cryo-EM.

Cryo-EM of the Cas1–2/3/foreign DNA complex

Frozen aliquots of purified Cas1–2/3 delivery complex bound to foreign DNA (45.4 μM) were shipped on dry ice to NCCAT at NYSBC. The sample was thawed and diluted 1:10 in SEC buffer to a concentration of 4.54 μM. 3 μL of diluted sample was applied to glow-discharged Quantifoil R1.2/1.3 Cu 300 mesh grids (Quantifoil Micro Tools GmbH), blotted for 4.5 seconds at force and 4°C, 100% humidity, and plunge frozen in liquid ethane using a Vitrobot (Mk. IV, Thermo Fisher Scientific). The dataset was collected on a Titan Krios (Thermo Fisher Scientific) at NYSBC operating at 300 kV with a Falcon 4EC direct electron detector at a nominal magnification of 96,000x. 40-frame movies were collected with a 150 ms frame time and a dose rate of 8.63 e−/Å²/s for a total dose of 51.78 e−/Å², with a pixel size of 0.833 Å/px over a defocus range of −0.9 to −2.8 μm. Holes were identified and targeted, and automated data collection was carried out using Leginon.70 Preliminary data processing was done on the fly in cryoSPARC Live.61
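The exposure parameters above are linked by simple arithmetic: total dose equals dose rate multiplied by the number of frames and the frame time. The sketch below applies that relationship to the values reported for the foreign DNA (delivery) complex dataset, where 40 frames at 150 ms give a 6.0 s exposure and 8.63 e−/Å²/s × 6.0 s reproduces the stated 51.78 e−/Å². The function names are illustrative and do not correspond to any acquisition software.

```python
def total_dose(dose_rate: float, n_frames: int, frame_time_s: float) -> float:
    """Total exposure (e-/Å^2) = dose rate (e-/Å^2/s) x frames x frame time (s)."""
    return dose_rate * n_frames * frame_time_s


def dose_per_frame(dose_rate: float, frame_time_s: float) -> float:
    """Exposure delivered in a single movie frame (e-/Å^2)."""
    return dose_rate * frame_time_s


# Parameters reported for the Cas1-2/3/foreign DNA dataset.
rate, frames, frame_time = 8.63, 40, 0.150
print(f"exposure time: {frames * frame_time:.1f} s")
print(f"total dose:    {total_dose(rate, frames, frame_time):.2f} e-/Å^2")
print(f"per frame:     {dose_per_frame(rate, frame_time):.3f} e-/Å^2")
```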

Purification of the integration complex

To assemble a 200 bp fragment containing the CRISPR leader, first repeat, and spacer, 4 nmol of 200 nt ssDNA fragments were ordered from IDT and resuspended to 200 μM in a modified hybridization buffer (20 mM HEPES pH 7.5, 200 mM K-glutamate, 2% glycerol, 1 mM TCEP). The ssDNA fragments were combined in a 1:1 ratio to generate 100 μM dsDNA, heated to 100°C for 5 minutes, and then slow-cooled to 25°C over the course of an hour. Annealed oligos were stored at −80°C.
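Before annealing, it is worth confirming that the two 200 nt strands are exact reverse complements. The sketch below runs that check on the leader-repeat-spacer sequences listed in the Key Resources Table. It also locates the repeat, under the assumption that the 28-nt segment GTTCACTGCC… is the PA14 type I-F repeat within this construct, and then treats everything 5′ of it as leader and everything 3′ of it as spacer. That partition is our assumption, not an annotation from the original table.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Copied from the leader-repeat-spacer oligonucleotide entries in the Key Resources Table.
STRAND_1 = (
    "CATCTAGATC" "CATGGACCCTTTTTTCGGACGATTTCT" "TACGCCCTTATAAATCAGCAAGTTACG"
    "AGACCTCGAAAAAAGAGGGTTTCTGG" "CGGGAAAAACTCGGTATTTCTTTTTCC"
    "TTCAAATGGTTATAGGTTTTCGGAGCT" "AGTTCACTGCCGTGTAGGCAGCTAAG"
    "AAAATCAGCCGGACGTTGTAGTAGTC" "GAGC"
)
STRAND_2 = (
    "GCTCGACTA" "CTACAACGTCCGGCTGATTTTCTTAGC" "TGCCTACACGGCAGTGAACTAGCTCC"
    "GAAAACCTATAACCATTTGAAGGAAA" "AAGAAATACCGAGTTTTTCCCGCCA"
    "GAAACCCTCTTTTTTCGAGGTCTCG" "TAACTTGCTGATTTATAAGGGCGTA"
    "AGAAATCGTCCGAAAAAAGGGTCC" "ATGGATCTAGATG"
)
# Assumption: this 28-nt sequence is the PA14 type I-F repeat within the construct.
REPEAT = "GTTCACTGCCGTGTAGGCAGCTAAGAAA"

print("lengths:", len(STRAND_1), len(STRAND_2))
print("fully complementary:", reverse_complement(STRAND_1) == STRAND_2)
start = STRAND_1.find(REPEAT)
print(f"leader: {start} nt, repeat: {len(REPEAT)} nt, "
      f"spacer: {len(STRAND_1) - start - len(REPEAT)} nt")
```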

3.75 nmol of purified IHF (at 1234 μM) was diluted to a final volume of 125 μL (30 μM) in SEC buffer (20 mM HEPES pH 7.5, 200 mM K-glutamate, 5 mM MnCl2, 7.5 mM spermidine, 2% glycerol, 1 mM TCEP) at 25°C to avoid precipitation of IHF. 1.5 nmol of annealed CRISPR fragment was diluted to a final volume of 95 μL (15 μM) in the same SEC buffer. The pre-diluted IHF and CRISPR solutions were briefly warmed to 25°C, mixed (220 μL final volume), and incubated at 25°C for 10 minutes.
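The amounts, volumes, and concentrations in this step follow from the identity 1 μM = 1 pmol/μL. The sketch below applies it to the values above: how much IHF stock to pipette, the concentrations after pre-dilution, the concentrations once the two solutions are combined in 220 μL, and the resulting 2.5:1 IHF:CRISPR molar ratio. The CRISPR fragment works out to about 15.8 μM after pre-dilution, close to the ~15 μM stated. The helper names are illustrative.

```python
def stock_volume_ul(amount_nmol: float, stock_um: float) -> float:
    """Volume (µL) of a stock at stock_um (µM) that contains amount_nmol.

    1 µM = 1 pmol/µL, so V(µL) = 1000 * amount(nmol) / C(µM).
    """
    return 1000.0 * amount_nmol / stock_um


def concentration_um(amount_nmol: float, volume_ul: float) -> float:
    """Concentration (µM) of amount_nmol dissolved in volume_ul µL."""
    return 1000.0 * amount_nmol / volume_ul


ihf_nmol, crispr_nmol = 3.75, 1.5
print(f"IHF stock to pipette:   {stock_volume_ul(ihf_nmol, 1234):.2f} µL")
print(f"IHF after dilution:     {concentration_um(ihf_nmol, 125):.1f} µM")
print(f"CRISPR after dilution:  {concentration_um(crispr_nmol, 95):.1f} µM")
# After combining the two pre-diluted solutions (220 µL total):
print(f"IHF in mix:             {concentration_um(ihf_nmol, 220):.1f} µM")
print(f"CRISPR in mix:          {concentration_um(crispr_nmol, 220):.1f} µM")
print(f"IHF:CRISPR molar ratio: {ihf_nmol / crispr_nmol:.1f}")
```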

1.5 nmoles of purified Cas1–2/3 was mixed with 1.875 nmoles of foreign DNA fragment and 1.4 μL of 1 M MnCl2 and 2.1 μL of 1 M spermidine in pH neutralized buffer and incubated for 10 minutes at 25°C.

The CRISPR-IHF reaction was mixed with the Cas1–2/3-foreign DNA reaction and incubated for 20 minutes at 35°C. Precipitates were removed by centrifugation at 20,000 × g and 4°C prior to size exclusion. A Superose 6 10/300 column (Cytiva) was equilibrated in SEC buffer (20 mM HEPES pH 7.5, 200 mM K-glutamate, 5 mM MnCl2, 7.5 mM spermidine, 2% glycerol, 1 mM TCEP) at 4°C. 0.5 mL fractions were collected, analyzed on SDS-PAGE (15%) and urea-PAGE (8%) gels, and concentrated to 10 μL (~4 μM for PAM, ~3 μM for no PAM).

Cryo-EM of the Cas1–2/3 integration complex

Purified integration complex with either a 34 bp dsDNA fragment that included a ‘GG’ PAM or a 32 bp dsDNA fragment with no PAM was diluted to ~1 μM in SEC buffer lacking glycerol (20 mM HEPES pH 7.5, 200 mM K-glutamate, 5 mM MnCl2, 7.5 mM spermidine, 2% glycerol, 1 mM TCEP). 3 μL of diluted integration complex was applied to Quantifoil Au 300 R2/1 + nm continuous C grids (Quantifoil Micro Tools GmbH) that were glow discharged for 45 seconds with a 10 second hold (easi-Glow, Pelco). The grids were then blotted for 5 seconds without incubation, with a blot force of “5”, at 100% humidity and 4°C, followed by plunge freezing using a Vitrobot (Mk. IV, Thermo Fisher Scientific). The datasets for both the trimmed (no PAM) and PAM-containing integration complexes were collected on Montana State University’s Talos Arctica transmission electron microscope (Thermo Fisher Scientific), with a field emission gun operating at an accelerating voltage of 200 kV under parallel illumination conditions. Movies were acquired using a Gatan K3 direct electron detector operated in electron counting mode, targeting a total electron exposure of 65.67 e−/Å² over 50 frames (5.992 second exposure, 0.12 second frame time). The SerialEM data collection software was used to collect micrographs at 36,000x nominal magnification (1.152 Å/pixel at the specimen level) over a defocus range of −0.5 μm to −2.0 μm.69 Stage movement was used to target the center of four 2.0 μm holes for focusing, and image shift was used to acquire high-magnification images in the center of each of the holes. Preliminary data processing was done on the fly in cryoSPARC Live.

Cryo-EM image processing

Movies were motion- and CTF-corrected in cryoSPARC using patch motion and patch CTF correction.61 For details, see Figure S1 (capture complex), Figure S7 (delivery complex), Figure S10 (integration complex with PAM-containing foreign DNA), and Figure S11 (integration complex with foreign DNA, no PAM). In brief, initial rounds of blob picking (capture/delivery) or template picking (integration complexes) were used to generate 3D volumes from the dataset for a second round of template picking. Template-picked particles were subjected to multiple rounds of multi-class ab initio reconstruction and heterogeneous refinement to identify the major 3D classes of the complex. Particle stacks corresponding to individual complexes were further refined through iterative multi-class ab initio reconstruction and heterogeneous refinement, local refinement, 3D classification, and 3D variability analysis in cryoSPARC.

Model building and validation

AlphaFold3 was used to generate a preliminary model of the capture complex.72 However, AlphaFold3’s predicted positioning of Cas3 does not agree well with the cryo-EM density maps for the capture and integration complexes, so in these instances individual protein molecules from AlphaFold3 were sequentially docked into the density. Protein and DNA segments were individually rigid-body fit into the EM density map using the “fit <model> inMap <map>” command in UCSF ChimeraX.71 Isolde was initially used to morph the rigid-body-fit initial model into the density map, followed by real-space refinement with morphing (no secondary structure restraints, ignore symmetry conflicts) in Phenix.44,46 The resulting map and model were re-opened in Isolde/ChimeraX, and any gross conformational issues were corrected with the Isolde simulation running. After a second round of real-space refinement (no morphing, no secondary structure restraints), problem areas were inspected in Coot and restrained to ideal geometry using secondary structure and Geman-McClure distance restraints generated in ProSMART.86 MolProbity and the PDB validation server (https://validate-rcsb-1.wwpdb.org/) were used to identify problem regions, which were subsequently corrected in Coot.87,88 Contacts and hydrogen bonds between residues were identified in ChimeraX v1.9 using the “contacts” and “hbonds” commands, respectively, with default parameters.

Conservation analysis of CRISPR repeats

Complete bacterial (n=39,277) and archaeal (n=556) genomes and chromosomes were downloaded from the NCBI RefSeq Assembly database (accessed on July 11, 2023). CRISPR loci within 93,671 genomic and plasmid sequences were identified using the default parameters in CRISPRDetect v3,74 which resulted in 37,477 high-confidence CRISPR loci predictions (array quality score > 3). Type I-F and type I-E repeat sequences were extracted from the default output file of CRISPR repeat sequences and aligned in MAFFT.77 Logo plots were generated using Weblogo.79

Search for Cas4 homologues for PAM trimming

In an attempt to identify a remote Cas4 homologue responsible for PAM trimming, we used seven different Cas4 profile HMMs downloaded from CasFinder81 to query Prodigal73 translations of the PA14 genome (Genbank: GCF_045689255.1).

QUANTIFICATION AND STATISTICAL ANALYSIS

Cryo-EM data collection and refinement statistics are summarized in Table 1.

Supplementary Material

MMC1
MMC2

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.str.2025.09.007.

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER

Bacterial and virus strains

E. coli BL21 (DE3) New England Biolabs C2527H

Chemicals, peptides, and recombinant proteins

HEPES (Fine White Crystals/Molecular Biology) Fisher Scientific BP310–500
Potassium Chloride (99+%) Fisher Scientific AC424090010
Glycerol Fisher Scientific G33–500
DTT (dithiothreitol) ThermoFisher Scientific R0861
Halt Protease Inhibitor Cocktail (100x) ThermoFisher Scientific 78429
d-Desthiobiotin Sigma Aldrich D1411–500MG
RPI L-Glutamic Acid Potassium Salt Monohydrate Fisher Scientific 50–136-8278
Sodium Chloride (Crystalline/Certified ACS) Fisher Scientific S271–500
Imidazole, 99% Fisher Scientific AC122021000
Tris-(2-Carboxyethyl)phosphine, Hydrochloride (TCEP) ThermoFisher Scientific T2556
Ethylenediaminetetraacetic acid (EDTA) Sigma Aldrich E6758
Manganese (II) Chloride Sigma Aldrich 244589
Spermidine Sigma Aldrich 8558–1G
PreScission protease Cytiva 27084301

Deposited data

Cas1–2/3 complex This study EMPIAR: EMPIAR-12916
EMDB: EMD-71091
PDB: 9P11
Cas1–2/3 complex + foreign DNA This study EMPIAR: EMPIAR-12917
EMDB: EMD-71097
PDB: 9P1D
Cas1–2/3 complex + foreign DNA (PAM) + CRISPR leader/repeat/spacer This study EMPIAR: EMPIAR-12928
EMDB: EMD-71112
EMDB: EMD-7110
Cas1–2/3 complex + foreign DNA (No PAM) + CRISPR leader/repeat/spacer This study EMPIAR: EMPIAR-12922
EMDB: EMD-71114
EMDB: EMD-71115
EMDB: EMD-71116

Oligonucleotides

Figure 3: Foreign DNA strand 1 (PAM – 34 bp) (IDT): 5′-AAAAACCTGGACTACTACAACCTTCGCTTTTTGG-3′ This study N/A
Figure 3: Foreign DNA strand 2 (No PAM – 32 bp) (IDT): 5′-TTTTTGCGAAGGTTGTAGTAGTCCAGGAAAAA-3′ This study N/A
Figures 4 and 5: CRISPR leader-repeat-spacer strand 1 (IDT): 5′-CATCTAGATCCATGGACCCTTTTTTCGGACGATTTCTTACGCCCTTATAAATCAGCAAGTTACGAGACCTCGAAAAAAGAGGGTTTCTGGCGGGAAAAACTCGGTATTTCTTTTTCCTTCAAATGGTTATAGGTTTTCGGAGCTAGTTCACTGCCGTGTAGGCAGCTAAGAAAATCAGCCGGACGTTGTAGTAGTCGAGC-3′ This study N/A
Figures 4 and 5: CRISPR leader-repeat-spacer strand 2 (IDT): 5′-GCTCGACTACTACAACGTCCGGCTGATTTTCTTAGCTGCCTACACGGCAGTGAACTAGCTCCGAAAACCTATAACCATTTGAAGGAAAAAGAAATACCGAGTTTTTCCCGCCAGAAACCCTCTTTTTTCGAGGTCTCGTAACTTGCTGATTTATAAGGGCGTAAGAAATCGTCCGAAAAAAGGGTCCATGGATCTAGATG-3′ This study N/A
Figure 4: Foreign DNA strand 1 (PAM – 34 bp) (IDT): 5′-TACATGCTCTAGCAAAACGACTTGCACAACGAGG-3′ This study N/A
Figure 4: Foreign DNA strand 2 (PAM – 34 bp) (IDT): 5′-CCATTAAGTGCAAGTCGTTTTGCTAGAGCTACAT-3′ This study N/A
Figures 4 and 5: Foreign DNA strand 1 (no PAM) (IDT): 5′-TACATGCTCTAGCAAAACGACTTGCACAACGA-3′ This study N/A
Figures 4 and 5: Foreign DNA strand 2 (no PAM) (IDT): 5′-ATTAAGTGCAAGTCGTTTTGCTAGAGCTACAT-3′ This study N/A

Recombinant DNA

Cas1–2/3 expression plasmid (StrepII-Cas1) Addgene #89240; RRID: Addgene_89240
6xHis-tagged IHFα Addgene #149384; RRID: Addgene_149384
StrepII-tagged IHFβ Addgene #149385; RRID: Addgene_149385

Software and algorithms

SerialEM (>3.8) Mastronarde69 https://bio3d.colorado.edu/SerialEM/
Leginon (>3.4) Suloway et al.70 https://emg.nysbc.org//projects/leginon/wiki/Leginon_Homepage
cryoSPARC (>= v4) Punjani et al.61 https://cryosparc.com/
ChimeraX (>1.8) Pettersen et al.71 https://www.rbvi.ucsf.edu/chimerax/
AlphaFold3 Abramson et al.72 https://alphafoldserver.com/
Isolde (1.9) Croll44 https://tristanic.github.io/isolde/
Coot (0.9.6) Emsley et al.45 https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
Phenix (1.20.1–4487) Liebschner et al.46 https://phenix-online.org/
Prodigal (2.6.3) Hyatt et al.73 https://github.com/hyattpd/Prodigal
CRISPRDetect (v3) Biswas et al.74 https://doi.org/10.1186/s12864-016-2627-0
MacsyFinder (1.0.5) Abby et al.75 https://github.com/gem-pasteur/macsyfinder
CD-hit (4.8.1) Li et al.76 https://sites.google.com/view/cd-hit/home
MAFFT (7.520) Katoh et al.77 https://mafft.cbrc.jp/alignment/server/index.html
Fasttree (2.0) Price et al.78 https://morgannprice.github.io/fasttree/
Weblogo (2.8.2) Schneider et al.79 https://weblogo.berkeley.edu/
ggtree (3.1.4) Yu et al.80 https://github.com/YuLab-SMU/ggtree

Other

Superose 6 10/300 SEC column Cytiva 17517201
Superdex 200 HiLoad 16/600 SEC column Cytiva 28989335
StrepTrap HP affinity column (5 mL) Cytiva 28-9075-47
Heparin Sepharose 6 Fast Flow affinity resin Cytiva 17-0998-01
Quantifoil R0.6/1 Cu 300 mesh grids Quantifoil NA
Quantifoil R1.2/1.3 Cu 300 mesh grids Quantifoil NA
Quantifoil R2/1 + 2 nm continuous C Quantifoil NA
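As a consistency check on the oligonucleotide entries in the table, the following minimal Python sketch (an illustration only, not the authors' method; this paper reports no original code) tests whether the two 200-nt leader-repeat-spacer strands listed for Figures 4 and 5 are exact reverse complements of each other. The helper names `reverse_complement` and `mismatch_positions` are hypothetical, and the check assumes that the sequences are transcribed exactly as listed (5′ to 3′, unmodified A/C/G/T) and that the two strands are aligned end to end.

```python
# Minimal complementarity check for the leader-repeat-spacer oligos (hypothetical helpers).

# Translation table that swaps each base for its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences written 5'->3'. The string pieces mirror the line wrapping of the
# original table; Python joins adjacent literals, so no spaces are introduced.
STRAND_1 = (
    "CATCTAGATC"
    "CATGGACCCTTTTTTCGGACGATTTCT"
    "TACGCCCTTATAAATCAGCAAGTTACG"
    "AGACCTCGAAAAAAGAGGGTTTCTGG"
    "CGGGAAAAACTCGGTATTTCTTTTTCC"
    "TTCAAATGGTTATAGGTTTTCGGAGCT"
    "AGTTCACTGCCGTGTAGGCAGCTAAG"
    "AAAATCAGCCGGACGTTGTAGTAGTC"
    "GAGC"
)
STRAND_2 = (
    "GCTCGACTA"
    "CTACAACGTCCGGCTGATTTTCTTAGC"
    "TGCCTACACGGCAGTGAACTAGCTCC"
    "GAAAACCTATAACCATTTGAAGGAAA"
    "AAGAAATACCGAGTTTTTCCCGCCA"
    "GAAACCCTCTTTTTTCGAGGTCTCG"
    "TAACTTGCTGATTTATAAGGGCGTA"
    "AGAAATCGTCCGAAAAAAGGGTCC"
    "ATGGATCTAGATG"
)


def mismatch_positions(top: str, bottom: str) -> list[int]:
    """Return 0-based positions along `top` (5'->3') that `bottom` fails to pair
    with, assuming the two strands are aligned end to end (a blunt duplex)."""
    if len(top) != len(bottom):
        raise ValueError("end-to-end comparison requires equal-length strands")
    partner = reverse_complement(bottom)  # bottom strand read in the top strand's frame
    return [i for i, (a, b) in enumerate(zip(top, partner)) if a != b]


if __name__ == "__main__":
    print(len(STRAND_1), len(STRAND_2))
    print(mismatch_positions(STRAND_1, STRAND_2))  # an empty list means every position pairs
```

Running the script prints `200 200` followed by `[]`, which is consistent with the two strands forming a fully complementary 200-bp duplex. The end-to-end alignment is an assumption that holds for this pair. It should not be applied automatically to the shorter foreign-DNA oligos, because the table does not state how those strands are meant to register.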

Highlights.

  • A positively charged channel on the Cas1–2/3 complex captures fragments of DNA

  • A loop in the RecA1 domain controls access to the Cas3 nuclease active site

  • Foreign DNA binding allosterically regulates access to additional DNA binding sites

  • Distortion of the CRISPR repeat sequence licenses complete foreign DNA integration

ACKNOWLEDGMENTS

We thank members of the Wiedenheft lab, and particularly Nathaniel Burman, Royce Wilkinson, and Murat Buyukyoruk, for their invaluable input and discussions. Research in the Wiedenheft laboratory is supported by the National Institutes of Health (R35GM134867), the M. J. Murdock Charitable Trust, and the Montana Agricultural Experimental Station. B.W. is the endowed chair of plant science at Montana State University (RRID: SCR_000979). A.S.-F. is supported by the National Institutes of Health (K99GM147842 and R00GM147842) and by the Postdoctoral Enrichment Program Award from the Burroughs Wellcome Fund (G-1021106.01). Some of the microscopy was performed using resources provided by the National Center for CryoEM Access and Training (NCCAT) and the Simons Electron Microscopy Center located at the New York Structural Biology Center, supported by the NIH Common Fund Transformative High Resolution Cryo-Electron Microscopy program (U24 GM129539 and NIGMS R24 GM154192) and by grants from the Simons Foundation (SF349247) and the NY State Assembly. Funding for the Montana State University cryo-EM Core Facility (RRID:SCR_026324) was contributed by the National Science Foundation (DBI-1828765), the M. J. Murdock Charitable Trust, the National Institute of General Medical Sciences (P30GM140963), and the MSU Office of Research, Economic Development and Graduate Education. Computational efforts were performed on the Tempest High Performance Computing System, operated and supported by University Information Technology Research Cyberinfrastructure (RRID:SCR_026229) at Montana State University.

Footnotes

DECLARATION OF INTERESTS

B.W. is the founder of SurGene LLC. B.W. and A.S.-F. are inventors on patent applications related to CRISPR-Cas systems and applications thereof.

REFERENCES

  • 1. Kapitonov VV, and Koonin EV (2015). Evolution of the RAG1-RAG2 locus: both proteins came from the same transposon. Biol. Direct 10, 20. 10.1186/s13062-015-0055-8.
  • 2. Martin EC, Le Targa L, Tsakou-Ngouafo L, Fan T-P, Lin C-Y, Xiao J, Huang Z, Yuan S, Xu A, Su Y-H, et al. (2023). Insights into RAG Evolution from the Identification of “Missing Link” Family A RAGL Transposons. Mol. Biol. Evol. 40, msad232. 10.1093/molbev/msad232.
  • 3. Krupovic M, Makarova KS, Forterre P, Prangishvili D, and Koonin EV (2014). Casposons: a new superfamily of self-synthesizing DNA transposons at the origin of prokaryotic CRISPR-Cas immunity. BMC Biol. 12, 36. 10.1186/1741-7007-12-36.
  • 4. Fugmann SD, Messier C, Novack LA, Cameron RA, and Rast JP (2006). An ancient evolutionary origin of the Rag1/2 gene locus. Proc. Natl. Acad. Sci. USA 103, 3728–3733. 10.1073/pnas.0509720103.
  • 5. Carmona LM, and Schatz DG (2017). New insights into the evolutionary origins of the recombination-activating gene proteins and V(D)J recombination. FEBS J. 284, 1590–1605. 10.1111/febs.13990.
  • 6. Béguin P, Charpin N, Koonin EV, Forterre P, and Krupovic M (2016). Casposon integration shows strong target site preference and recapitulates protospacer integration by CRISPR-Cas systems. Nucleic Acids Res. 44, 10367–10376. 10.1093/nar/gkw821.
  • 7. Little AJ, Corbett E, Ortega F, and Schatz DG (2013). Cooperative recruitment of HMGB1 during V(D)J recombination through interactions with RAG1 and DNA. Nucleic Acids Res. 41, 3289–3301. 10.1093/nar/gks1461.
  • 8. Liu C, Zhang Y, Liu CC, and Schatz DG (2022). Structural insights into the evolution of the RAG recombinase. Nat. Rev. Immunol. 22, 353–370. 10.1038/s41577-021-00628-6.
  • 9. Chi H, Pepper M, and Thomas PG (2024). Principles and therapeutic applications of adaptive immunity. Cell 187, 2052–2078. 10.1016/j.cell.2024.03.037.
  • 10. Yosef I, Goren MG, and Qimron U (2012). Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576. 10.1093/nar/gks216.
  • 11. Lee H, and Sashital DG (2022). Creating memories: molecular mechanisms of CRISPR adaptation. Trends Biochem. Sci. 47, 464–476. 10.1016/j.tibs.2022.02.004.
  • 12. Siezen RJ, Starrenburg MJC, Boekhorst J, Renckens B, Molenaar D, and van Hylckama Vlieg JET (2008). Genome-Scale Genotype-Phenotype Matching of Two Lactococcus lactis Isolates from Plants Identifies Mechanisms of Adaptation to the Plant Niche. Appl. Environ. Microbiol. 74, 424–436. 10.1128/AEM.01850-07.
  • 13. Sternberg SH, Richter H, Charpentier E, and Qimron U (2016). Adaptation in CRISPR-Cas Systems. Mol. Cell 61, 797–808. 10.1016/j.molcel.2016.01.030.
  • 14. Krupovic M, Béguin P, and Koonin EV (2017). Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery. Curr. Opin. Microbiol. 38, 36–43. 10.1016/j.mib.2017.04.004.
  • 15. Koonin EV, Makarova KS, Wolf YI, and Krupovic M (2020). Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat. Rev. Genet. 21, 119–131. 10.1038/s41576-019-0172-9.
  • 16. Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ, Charpentier E, Cheng D, Haft DH, Horvath P, et al. (2020). Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 18, 67–83. 10.1038/s41579-019-0299-x.
  • 17. Wright AV, Liu J-J, Knott GJ, Doxzen KW, Nogales E, and Doudna JA (2017). Structures of the CRISPR genome integration complex. Science 357, 1113–1118. 10.1126/science.aao0679.
  • 18. Nuñez JK, Lee ASY, Engelman A, and Doudna JA (2015). Integrase-mediated spacer acquisition during CRISPR–Cas adaptive immunity. Nature 519, 193–198. 10.1038/nature14237.
  • 19. Santiago-Frangos A, Buyukyoruk M, Wiegand T, Krishna P, and Wiedenheft B (2021). Distribution and phasing of sequence motifs that facilitate CRISPR adaptation. Curr. Biol. 31, 3515–3524.e6. 10.1016/j.cub.2021.05.068.
  • 20. Fagerlund RD, Wilkinson ME, Klykov O, Barendregt A, Pearce FG, Kieper SN, Maxwell HWR, Capolupo A, Heck AJR, Krause KL, et al. (2017). Spacer capture and integration by a type I-F Cas1–Cas2–3 CRISPR adaptation complex. Proc. Natl. Acad. Sci. USA 114, E5122–E5128. 10.1073/pnas.1618421114.
  • 21. Wiegand T, Semenova E, Shiriaeva A, Fedorov I, Datsenko K, Severinov K, and Wiedenheft B (2020). Reproducible Antigen Recognition by the Type I-F CRISPR-Cas System. CRISPR J. 3, 378–387. 10.1089/crispr.2020.0069.
  • 22. Kim S, Loeff L, Colombo S, Jergic S, Brouns SJJ, and Joo C (2020). Selective loading and processing of prespacers for precise CRISPR adaptation. Nature 579, 141–145. 10.1038/s41586-020-2018-1.
  • 23. Künne T, Kieper SN, Bannenberg JW, Vogel AIM, Miellet WR, Klein M, Depken M, Suarez-Diez M, and Brouns SJJ (2016). Cas3-Derived Target DNA Degradation Fragments Fuel Primed CRISPR Adaptation. Mol. Cell 63, 852–864. 10.1016/j.molcel.2016.07.011.
  • 24. Bernheim A, and Sorek R (2020). The pan-immune system of bacteria: antiviral defence as a community resource. Nat. Rev. Microbiol. 18, 113–119. 10.1038/s41579-019-0278-2.
  • 25. Nuñez JK, Harrington LB, Kranzusch PJ, Engelman AN, and Doudna JA (2015). Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538. 10.1038/nature15760.
  • 26. Nuñez JK, Kranzusch PJ, Noeske J, Wright AV, Davies CW, and Doudna JA (2014). Cas1–Cas2 complex formation mediates spacer acquisition during CRISPR–Cas adaptive immunity. Nat. Struct. Mol. Biol. 21, 528–534. 10.1038/nsmb.2820.
  • 27. Mojica FJM, Díez-Villaseñor C, García-Martínez J, and Almendros C (2009). Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155, 733–740. 10.1099/mic.0.023960-0.
  • 28. Kieper SN, Almendros C, Behler J, McKenzie RE, Nobrega FL, Haagsma AC, Vink JNA, Hess WR, and Brouns SJJ (2018). Cas4 Facilitates PAM-Compatible Spacer Selection during CRISPR Adaptation. Cell Rep. 22, 3377–3384. 10.1016/j.celrep.2018.02.103.
  • 29. Lee H, Zhou Y, Taylor DW, and Sashital DG (2018). Cas4-Dependent Prespacer Processing Ensures High-Fidelity Programming of CRISPR Arrays. Mol. Cell 70, 48–59.e5. 10.1016/j.molcel.2018.03.003.
  • 30. Lee H, Dhingra Y, and Sashital DG (2019). The Cas4-Cas1-Cas2 complex mediates precise prespacer processing during CRISPR adaptation. eLife 8, e44248. 10.7554/eLife.44248.
  • 31. Dhingra Y, Suresh SK, Juneja P, and Sashital DG (2022). PAM binding ensures orientational integration during Cas4-Cas1-Cas2-mediated CRISPR adaptation. Mol. Cell 82, 4353–4367.e6. 10.1016/j.molcel.2022.09.030.
  • 32. Drabavicius G, Sinkunas T, Silanskas A, Gasiunas G, Venclovas Č, and Siksnys V (2018). DnaQ exonuclease-like domain of Cas2 promotes spacer integration in a type I-E CRISPR-Cas system. EMBO Rep. 19, e45543. 10.15252/embr.201745543.
  • 33. Wang JY, Tuck OT, Skopintsev P, Soczek KM, Li G, Al-Shayeb B, Zhou J, and Doudna JA (2023). Genome expansion by a CRISPR trimmer-integrase. Nature 618, 855–861. 10.1038/s41586-023-06178-2.
  • 34. Jackson RN, Lavin M, Carter J, and Wiedenheft B (2014). Fitting CRISPR-associated Cas3 into the Helicase Family Tree. Curr. Opin. Struct. Biol. 24, 106–114. 10.1016/j.sbi.2014.01.001.
  • 35. Huo Y, Nam KH, Ding F, Lee H, Wu L, Xiao Y, Farchione MD Jr., Zhou S, Rajashankar K, Kurinov I, et al. (2014). Structures of CRISPR Cas3 offer mechanistic insights into Cascade-activated DNA unwinding and degradation. Nat. Struct. Mol. Biol. 21, 771–777. 10.1038/nsmb.2875.
  • 36. Xiao Y, Luo M, Dolan AE, Liao M, and Ke A (2018). Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science 361, eaat0839. 10.1126/science.aat0839.
  • 37. Kim DY, Lee SY, Ha HJ, and Park HH (2024). Structural basis of Cas3 activation in type I-C CRISPR-Cas system. Nucleic Acids Res. 52, 10563–10574. 10.1093/nar/gkae723.
  • 38. Redding S, Sternberg SH, Marshall M, Gibb B, Bhat P, Guegler CK, Wiedenheft B, Doudna JA, and Greene EC (2015). Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System. Cell 163, 854–865. 10.1016/j.cell.2015.10.003.
  • 39. Mulepati S, and Bailey S (2013). In Vitro Reconstitution of an Escherichia coli RNA-guided Immune System Reveals Unidirectional, ATP-dependent Degradation of DNA Target. J. Biol. Chem. 288, 22184–22192. 10.1074/jbc.M113.472233.
  • 40. Sinkunas T, Gasiunas G, Fremaux C, Barrangou R, Horvath P, and Siksnys V (2011). Cas3 is a single-stranded DNA nuclease and ATP-dependent helicase in the CRISPR/Cas immune system. EMBO J. 30, 1335–1342. 10.1038/emboj.2011.41.
  • 41. Rollins MF, Chowdhury S, Carter J, Golden SM, Wilkinson RA, Bondy-Denomy J, Lander GC, and Wiedenheft B (2017). Cas1 and the Csy complex are opposing regulators of Cas2/3 nuclease activity. Proc. Natl. Acad. Sci. USA 114, E5113–E5121. 10.1073/pnas.1616395114.
  • 42. Makarova KS, Haft DH, Barrangou R, Brouns SJJ, Charpentier E, Horvath P, Moineau S, Mojica FJM, Wolf YI, Yakunin AF, et al. (2011). Evolution and classification of the CRISPR–Cas systems. Nat. Rev. Microbiol. 9, 467–477. 10.1038/nrmicro2577.
  • 43. Santiago-Frangos A, Henriques WS, Wiegand T, Gauvin CC, Buyukyoruk M, Graham AB, Wilkinson RA, Triem L, Neselu K, Eng ET, et al. (2023). Structure reveals why genome folding is necessary for site-specific integration of foreign DNA into CRISPR arrays. Nat. Struct. Mol. Biol. 30, 1675–1685. 10.1038/s41594-023-01097-2.
  • 44. Croll TI (2018). ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr. D Struct. Biol. 74, 519–530. 10.1107/S2059798318002425.
  • 45. Emsley P, Lohkamp B, Scott WG, and Cowtan K (2010). Features and development of Coot. Acta Crystallogr. D 66, 486–501. 10.1107/S0907444910007493.
  • 46. Liebschner D, Afonine PV, Baker ML, Bunkóczi G, Chen VB, Croll TI, Hintze B, Hung L-W, Jain S, McCoy AJ, et al. (2019). Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D 75, 861–877. 10.1107/S2059798319011471.
  • 47. Tang D, Li H, Wu C, Jia T, He H, Yao S, Yu Y, and Chen Q (2021). A distinct structure of Cas1–Cas2 complex provides insights into the mechanism for the longer spacer acquisition in Pyrococcus furiosus. Int. J. Biol. Macromol. 183, 379–386. 10.1016/j.ijbiomac.2021.04.074.
  • 48. Ka D, Hong S, Jeong U, Jeong M, Suh N, Suh J-Y, and Bae E (2017). Structural and dynamic insights into the role of conformational switching in the nuclease activity of the Xanthomonas albilineans Cas2 in CRISPR-mediated adaptive immunity. Struct. Dyn. 4, 054701. 10.1063/1.4984052.
  • 49. Topuzlu E, and Lawrence CM (2016). Recognition of a pseudo-symmetric RNA tetranucleotide by Csx3, a new member of the CRISPR associated Rossmann fold superfamily. RNA Biol. 13, 254–257. 10.1080/15476286.2015.1130209.
  • 50. Ka D, Jang DM, Han BW, and Bae E (2018). Molecular organization of the type II-A CRISPR adaptation module and its interaction with Cas9 via Csn2. Nucleic Acids Res. 46, 9805–9815. 10.1093/nar/gky702.
  • 51. Wiedenheft B, Zhou K, Jinek M, Coyle SM, Ma W, and Doudna JA (2009). Structural Basis for DNase Activity of a Conserved Protein Implicated in CRISPR-Mediated Genome Defense. Structure 17, 904–912. 10.1016/j.str.2009.03.019.
  • 52. Wilkinson ME, Nakatani Y, Staals RHJ, Kieper SN, Opel-Reading HK, McKenzie RE, Fineran PC, and Krause KL (2016). Structural plasticity and in vivo activity of Cas1 from the type I-F CRISPR–Cas system. Biochem. J. 473, 1063–1072. 10.1042/BCJ20160078.
  • 53. Rollins MF, Chowdhury S, Carter J, Golden SM, Miettinen HM, Santiago-Frangos A, Faith D, Lawrence CM, Lander GC, and Wiedenheft B (2019). Structure reveals mechanism of CRISPR RNA-guided nuclease recruitment and anti-CRISPR viral mimicry. Mol. Cell 74, 132–142.e5. 10.1016/j.molcel.2019.02.001.
  • 54. Wang X, Yao D, Xu J-G, Li A-R, Xu J, Fu P, Zhou Y, and Zhu Y (2016). Structural basis of Cas3 inhibition by the bacteriophage protein AcrF3. Nat. Struct. Mol. Biol. 23, 868–870. 10.1038/nsmb.3269.
  • 55. Beloglazova N, Petit P, Flick R, Brown G, Savchenko A, and Yakunin AF (2011). Structure and activity of the Cas3 HD nuclease MJ0384, an effector enzyme of the CRISPR interference. EMBO J. 30, 4616–4627. 10.1038/emboj.2011.377.
  • 56. Zhang L, Wang H, Zeng J, Cao X, Gao Z, Liu Z, Li F, Wang J, Zhang Y, Yang M, and Feng Y (2024). Cas1 mediates the interference stage in a phage-encoded CRISPR–Cas system. Nat. Chem. Biol. 20, 1471–1481. 10.1038/s41589-024-01659-5.
  • 57. Hu C, Ni D, Nam KH, Majumdar S, McLean J, Stahlberg H, Terns MP, and Ke A (2022). Allosteric control of type I-A CRISPR-Cas3 complexes and establishment as effective nucleic acid detection and human genome editing tools. Mol. Cell 82, 2754–2768.e5. 10.1016/j.molcel.2022.06.007.
  • 58. Gong B, Shin M, Sun J, Jung C-H, Bolt EL, van der Oost J, and Kim J-S (2014). Molecular insights into DNA interference by CRISPR-associated nuclease-helicase Cas3. Proc. Natl. Acad. Sci. USA 111, 16359–16364. 10.1073/pnas.1410806111.
  • 59. Bepler T, Morin A, Rapp M, Brasch J, Shapiro L, Noble AJ, and Berger B (2019). Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat. Methods 16, 1153–1160. 10.1038/s41592-019-0575-8.
  • 60. Wagner T, Merino F, Stabrin M, Moriya T, Antoni C, Apelbaum A, Hagel P, Sitsel O, Raisch T, Prumbaum D, et al. (2019). SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218. 10.1038/s42003-019-0437-z.
  • 61. Punjani A, Rubinstein JL, Fleet DJ, and Brubaker MA (2017). cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296. 10.1038/nmeth.4169.
  • 62. Goren MG, Doron S, Globus R, Amitai G, Sorek R, and Qimron U (2016). Repeat Size Determination by Two Molecular Rulers in the Type I-E CRISPR Array. Cell Rep. 16, 2811–2818. 10.1016/j.celrep.2016.08.043.
  • 63. Dickerson RE (1998). DNA bending: The prevalence of kinkiness and the virtues of normality. Nucleic Acids Res. 26, 1906–1926. 10.1093/nar/26.8.1906.
  • 64. Buyukyoruk M, Krishna P, Santiago-Frangos A, and Wiedenheft B (2025). Discovery of Diverse CRISPR Leader Motifs, Putative Functions, and Applications for Enhanced CRISPR Detection and Subtype Annotation. CRISPR J. 8, 137–148. 10.1089/crispr.2024.0093.
  • 65. Watts EA, Garrett SC, Catchpole RJ, Clark LM, Sanders TJ, Marshall CJ, Wenck BR, Vickerman RL, Santangelo TJ, Fuchs R, et al. (2023). Histones direct site-specific CRISPR spacer acquisition in model archaeon. Nat. Microbiol. 8, 1682–1694. 10.1038/s41564-023-01446-3.
  • 66. Wang JY, Hoel CM, Al-Shayeb B, Banfield JF, Brohawn SG, and Doudna JA (2021). Structural coordination between active sites of a CRISPR reverse transcriptase-integrase complex. Nat. Commun. 12, 2571. 10.1038/s41467-021-22900-y.
  • 67. Dillard KE, Brown MW, Johnson NV, Xiao Y, Dolan A, Hernandez E, Dahlhauser SD, Kim Y, Myler LR, Anslyn EV, et al. (2018). Assembly and Translocation of a CRISPR-Cas Primed Acquisition Complex. Cell 175, 934–946.e15. 10.1016/j.cell.2018.09.039.
  • 68. Vorontsova D, Datsenko KA, Medvedeva S, Bondy-Denomy J, Savitskaya EE, Pougach K, Logacheva M, Wiedenheft B, Davidson AR, Severinov K, and Semenova E (2015). Foreign DNA acquisition by the I-F CRISPR–Cas system requires all components of the interference machinery. Nucleic Acids Res. 43, 10848–10860. 10.1093/nar/gkv1261.
  • 69. Mastronarde DN (2005). Automated electron microscope tomography using robust prediction of specimen movements. J. Struct. Biol. 152, 36–51. 10.1016/j.jsb.2005.07.007.
  • 70. Suloway C, Pulokas J, Fellmann D, Cheng A, Guerra F, Quispe J, Stagg S, Potter CS, and Carragher B (2005). Automated molecular microscopy: The new Leginon system. J. Struct. Biol. 151, 41–60. 10.1016/j.jsb.2005.03.010.
  • 71. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, and Ferrin TE (2021). UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82. 10.1002/pro.3943.
  • 72. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500. 10.1038/s41586-024-07487-w.
  • 73. Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, and Hauser LJ (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinf. 11, 119. 10.1186/1471-2105-11-119.
  • 74. Biswas A, Staals RHJ, Morales SE, Fineran PC, and Brown CM (2016). CRISPRDetect: A flexible algorithm to define CRISPR arrays. BMC Genom. 17, 356. 10.1186/s12864-016-2627-0.
  • 75. Abby SS, Néron B, Ménager H, Touchon M, and Rocha EPC (2014). MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems. PLoS One 9, e110726. 10.1371/journal.pone.0110726.
  • 76. Li W, and Godzik A (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659. 10.1093/bioinformatics/btl158.
  • 77. Katoh K, and Standley DM (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780. 10.1093/molbev/mst010.
  • 78. Price MN, Dehal PS, and Arkin AP (2010). FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One 5, e9490. 10.1371/journal.pone.0009490.
  • 79. Schneider TD, and Stephens RM (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100. 10.1093/nar/18.20.6097.
  • 80. Yu G, Smith DK, Zhu H, Guan Y, and Lam TT-Y (2017). ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36. 10.1111/2041-210X.12628.
  • 81. Couvin D, Bernheim A, Toffano-Nioche C, Touchon M, Michalik J, Néron B, Rocha EPC, Vergnaud G, Gautheret D, and Pourcel C (2018). CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 46, W246–W251. 10.1093/nar/gky425.
  • 82. Gouveia-Oliveira R, Sackett PW, and Pedersen AG (2007). MaxAlign: maximizing usable data in an alignment. BMC Bioinf. 8, 312. 10.1186/1471-2105-8-312.
  • 83. Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, and Ben-Tal N (2016). ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350. 10.1093/nar/gkw408.
  • 84. Alva V, Nam S-Z, Söding J, and Lupas AN (2016). The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res. 44, W410–W415. 10.1093/nar/gkw348.
  • 85. Capella-Gutiérrez S, Silla-Martínez JM, and Gabaldón T (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. 10.1093/bioinformatics/btp348.
  • 86. Nicholls RA, Fischer M, McNicholas S, and Murshudov GN (2014). Conformation-independent structural comparison of macromolecules with ProSMART. Acta Crystallogr. D 70, 2487–2499. 10.1107/S1399004714016241.
  • 87. Emsley P (2017). Tools for ligand validation in Coot. Acta Crystallogr. D Struct. Biol. 73, 203–210. 10.1107/S2059798317003382.
  • 88. Williams CJ, Headd JJ, Moriarty NW, Prisant MG, Videau LL, Deis LN, Verma V, Keedy DA, Hintze BJ, Chen VB, et al. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315. 10.1002/pro.3330.

Associated Data


Supplementary Materials

MMC1
MMC2

Data Availability Statement

Cryo-EM density maps are publicly available as of the date of publication at EMDB: EMD-71091 (Cas1–2/3), EMDB: EMD-71097 (Cas1–2/3 and foreign DNA), EMDB: EMD-71112 (integration complex with the PAM, CRISPR-leader associated), EMDB: EMD-71101 (integration complex with the PAM, first transesterification), EMDB: EMD-71114 (integration complex without the PAM, CRISPR-leader associated), EMDB: EMD-71115 (integration complex without the PAM, first transesterification), and EMDB: EMD-71116 (integration complex without the PAM, full repeat). Atomic models are publicly available as of the date of publication at the Protein Data Bank under accession codes PDB: 9P11 (Cas1–2/3) and PDB: 9P1D (Cas1–2/3 and foreign DNA). Raw movies from each data collection are available at EMPIAR: EMPIAR-12916 (Cas1–2/3), EMPIAR: EMPIAR-12917 (Cas1–2/3 and foreign DNA), EMPIAR: EMPIAR-12928 (integration complex with the PAM), and EMPIAR: EMPIAR-12922 (integration complex without the PAM). Alignments and Newick files used for conservation and phylogenetic analyses are available as supplementary data. This paper does not report original code. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

RESOURCES