ABSTRACT
Site-specific recombinases (integrases) can mediate the horizontal transfer of genomic islands. The ability to integrate large DNA sequences into target sites is very important for genetic engineering in prokaryotic and eukaryotic cells. Here, we characterized an unprecedented catalogue of 530 tyrosine-type integrases by examining genes potentially encoding tyrosine integrases in bacterial genomic islands. The phylogeny of putative tyrosine integrases revealed that these integrases form an evolutionary clade that is distinct from those already known and are affiliated with novel integrase groups. We systematically searched for candidate integrase genes, and their integration activities were validated in a bacterial model. We verified the integration functions of six representative novel integrases by using a two-plasmid integration system consisting of a donor plasmid carrying the integrase gene and attP site and a recipient plasmid harboring an attB site in recA-deficient Escherichia coli. Further quantitative reverse transcription-PCR (qRT-PCR) assays validated that the six selected integrases can be expressed with their native promoters in E. coli. The attP region reductions showed that the extent of attP sites of integrases is approximately 200 bp for integration capacity. In addition, mutational analysis showed that the conserved tyrosine at the C terminus is essential for catalysis, confirming that these candidate proteins belong to the tyrosine-type recombinase superfamily, i.e., tyrosine integrases. This study revealed that the novel integrases from bacterial genomic islands have site-specific recombination functions, which is of physiological significance for their genomic islands in bacterial chromosomes. More importantly, our discovery expands the toolbox for genetic engineering, especially for efficient integration activity.
IMPORTANCE Site-specific recombinases, or integrases, integrate large DNA fragments with high specificity, a capability that is urgently needed for gene editing. However, the known integrases are not sufficient to support multiple integrations. In this work, we discovered an array of integrases through bioinformatics analysis of bacterial genomes. Phylogenetic and functional assays revealed that these new integrases are tyrosine-type integrases and can carry out site-specific recombination. Moreover, the extent of the attP regions and the catalytic sites were characterized. Our study provides a methodology for the discovery of novel integrases and expands the toolbox for genetic engineering in bacteria.
KEYWORDS: integrase, recombinase, site-specific recombination
INTRODUCTION
Integrases are site-specific recombinases that catalyze recombination between specific attachment (att) sites, independent of external energy and the gain or loss of nucleotides (1–3). Integrases are often present in genomic islands (GIs) or bacterial chromosomes (4–6). They can prompt the integration and excision of GIs into and from bacterial chromosomes, leading to horizontal gene transfer (5, 7). Integrases also play roles in plasmid copy number control and chromosome segregation in the context of cell division (2). Most integrases belong to one of two recombinase superfamilies, referred to as tyrosine and serine integrases (or recombinases), respectively, depending on whether catalysis is mediated by a tyrosine or a serine residue (8–10). Serine recombinases encompass small serine recombinases, resolvase, serine transposases, and large serine recombinases (11). Serine recombinases cleave DNA strands, producing double-stranded breaks, which is distinct from tyrosine recombinases (11).
Tyrosine recombinases are divided into two classes, complex unidirectional and simple bidirectional integrases (1). Complex unidirectional integrases depend on accessory proteins to conduct integration or excision reactions; simple bidirectional integrases do not require accessory proteins to carry out recombination (2, 12, 13). Accessory proteins are more precisely called accessory DNA bending proteins, also termed recombination directionality factors (RDFs), the most common of which include integration host factor (IHF), Xis, and Fis (3, 13). IHF and Fis are host-related factors encoded in the chromosomes of host strains, whereas Xis is encoded on lambda prophages or GIs. Lambda integrase requires the accessory protein Xis and the host factor IHF to perform site-specific recombination. In contrast, Cre and Flp require only the integrase itself to conduct integration or excision reactions. In addition, many tyrosine integrases need RDFs (Vis, AlpA, and Hef) to conduct site-specific recombination (14–17).
Tyrosine integrases generally have seven canonical conserved amino acid residues thought to play a catalytic role in recombination reactions: Arg (RI), Glu/Asp (E/DII), Lys (KIII), His (HIV), Arg (RV), His (HVI), and Tyr (YVII) (18). Tyrosine is the most important characteristic residue; it can directly attack the phosphodiester bonds of target DNA (att sites), forming a covalent 3′ phosphotyrosine high-energy intermediate (3, 8). In general, the catalytic region of tyrosine integrases is located in the C-terminal domain, while the binding region is located in the N-terminal domain. The N-terminal domains are frequently variable, while C-terminal domains are relatively conserved among different tyrosine integrase families. The seven typical conserved amino acid residues are all in the C-terminal domains (3, 8, 18, 19).
Tyrosine integrases have been used for genetic engineering in prokaryotic and eukaryotic cells. Lambda, Flp, XerCD, and Cre are commonly applied for gene integration and knockout in bacteria (3, 8, 20, 21). In addition, Cre has been used for constructing knock-in rat lines (22) and mouse models of colorectal cancer (23) and for neuron-specific genome modification in the adult rat brain and regulation of gene expression in neurons (24) in concert with CRISPR-Cas9. Tyrosine integrases generally do not generate DNA double-stranded breaks or activate host repair systems, depending on themselves (Cre, Flp) or requiring accessory proteins (lambda family) to perform integration reactions. Hence, tyrosine integrases do not give rise to more host chromosomal mutations (off-target effect) than the CRISPR-Cas9 system and may be relatively safer genetic engineering tools for therapeutic applications (2).
GIs are widespread in the chromosomes of bacteria (4). Due to their mobilization capacities, mediated by integrases through site-specific recombination, GIs are also considered important mobile genetic elements (MGEs) (25, 26). They also serve as a reservoir of integrase resources. With the accumulation of sequenced bacterial genomes in the NCBI database, many potential integrase resources residing in GIs await discovery and will likely be uncovered by bioinformatics analyses. In the present study, we examined new tyrosine-type integrases through comparison of a known tyrosine integrase with complete bacterial genome sequences. We identified three new groups of tyrosine-type integrases from diverse GIs and characterized their site-specific recombination functions by construction of a two-plasmid integration system in recA-deficient Escherichia coli. This study showed that these tyrosine integrases mediate the integration reactions between att sites through site-specific recombination, independent of other GI-related factors. These novel tyrosine integrases potentially expand the toolbox for gene editing.
RESULTS
Identification of novel integrase groups.
Our previous study revealed that the integrase from Shigella flexneri 51575/2a mediates the movement of GIsul2 and the cr2-sul2 unit (ISCR2 element) (27). The integrase is a putative tyrosine site-specific recombinase and is located in GIs (28, 29). To discover more integrase homologues or new integrases, we used the integrase protein (GenBank accession no. AAP17879 [30]) as a reference sequence to search the NCBI complete bacterial genome database. Through bioinformatics analysis, we identified 530 protein sequences that shared more than 30% and less than 80% amino acid identity with the reference integrase and less than 80% identity with one another. Based on the levels of identity with the reported integrase and their locations in chromosomes, the 530 protein sequences were divided into three groups (see Table S1 in the supplemental material): group A, with 45.0 to 74.9% identity; group B, with 38.6 to 40.2% identity; and group C, with 30.0 to 37.5% identity. The group A integrase coding regions were downstream of guaA encoding GMP synthase, group B integrase coding regions were downstream of mnmE encoding GTPase, and group C integrase coding regions were downstream of tRNA genes. These putative integrases were commonly distributed in Gammaproteobacteria, including important clinical pathogens such as Salmonella enterica, Klebsiella pneumoniae, Pseudomonas aeruginosa, Escherichia coli, and Acinetobacter baumannii.
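The identity-based part of this grouping can be illustrated with a short sketch. It uses only the identity ranges and flanking genes stated above; the function name and the inclusive treatment of the range boundaries are assumptions made for illustration, and the study additionally relied on chromosomal location, which this sketch records but does not check.

```python
# Identity ranges (percent identity with the reference integrase) reported
# in the text for groups A to C. Inclusive bounds are an assumption.
GROUP_RANGES = {
    "A": (45.0, 74.9),
    "B": (38.6, 40.2),
    "C": (30.0, 37.5),
}

# Gene located upstream of the integrase coding region for each group,
# as stated in the text.
UPSTREAM_GENE = {"A": "guaA", "B": "mnmE", "C": "tRNA"}


def classify_by_identity(identity_pct):
    """Return 'A', 'B', or 'C' for a percent identity, or None if the value
    falls outside all three reported ranges (hypothetical helper)."""
    for group, (low, high) in GROUP_RANGES.items():
        if low <= identity_pct <= high:
            return group
    return None


if __name__ == "__main__":
    for pct in (74.9, 39.0, 30.0, 42.0):
        group = classify_by_identity(pct)
        print(pct, group, UPSTREAM_GENE.get(group))
```

Values that fall between the reported ranges (for example, 42.0%) are left unassigned rather than forced into a group.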
To determine the phylogenetic relations between these putative and known tyrosine integrases (collected from reference 1), we created a phylogenetic tree of known tyrosine integrase sequences, with 40 selected putative integrase sequences located in GIs from groups A to C (Fig. S1). The evolutionary tree showed that all putative integrases were distinct and clustered separately from known tyrosine integrases and formed a highly supported monophyletic clade (99% bootstrap confidence). The lambda family was relatively close to these potential integrases. Therefore, another evolutionary tree was established using the lambda family integrases and the 40 putative integrases (Fig. 1). The putative tyrosine-type integrases displayed 100% bootstrap confidence with the lambda integrases in this cladogram. Based on their evolutionary relationship with known integrases, these putative proteins appear to represent novel families of integrases. Moreover, they were classified into three groups (A to C) located in different clusters (Fig. 1), consistent with the classification based on the identity and chromosomal locations (Table S1). The bootstrap confidences were also high for the three clusters (>94%) (Fig. 1).
FIG 1.
Phylogenetic tree of lambda integrases and new tyrosine integrase families. The tree was constructed based on the neighbor-joining method. Arabic numbers on each node indicate the bootstrap probabilities or percentages after 1,000 replicates. Only percentages above 70% are shown. The strain names and corresponding accession numbers are shown on the branches. Small solid green circles indicate representative integrases. The GenBank accession numbers of the six analyzed integrases from group A (S. enterica serovar Djakarta S-1087), group B (P. aeruginosa PA1088 and K. pneumoniae Kp52.145), group C (A. pasteurianus NBRC 101655, S. enterica Enteritidis CFSAN033543, and Y. pseudotuberculosis YPIII) were APY53871, AOX30331, CDO16820, BAU38274, ARQ55935, and AJJ59886, respectively.
The putative integrases showed site-specific recombination activity in recA-deficient E. coli.
We constructed a two-plasmid integration system to determine the functions of the new integrases. The donor plasmids contained an integrase gene and its attP region, i.e., a hybrid product of attL and attR sites derived from GIs through site-specific recombination. The attP regions also include native promoter sequences of integrases existing in GIs. We selected six putative integrases from groups A to C (Fig. 1). These integrases were located at 5′ ends of different GIs, showing consistency with the direction of GIs in chromosomes (Fig. S2). We synthesized the six new integrase genes and their attP fragments (472 to 656 bp). Six attP regions and their locations associated with integrases are displayed in Fig. S3A to C. The backbone of the donor plasmid was derived from pKD46 (ampicillin resistance marker), a temperature-sensitive plasmid that can be eliminated by increasing the temperature to 37°C (31). The recipient plasmids, bearing a kanamycin resistance marker, were composed of pKF18k-2 (TaKaRa) and the relevant attB regions. The attB sites were derived from an uninterrupted region of the 3′ end of guaA, mnmE, and tRNA genes for groups A to C, respectively. The corresponding six attB regions (624 to 811 bp) were synthesized and ligated into pKF18k-2. Six attB regions and their locations are displayed in Fig. S3A to C.
The six donor plasmids were designated pKDAIntSe (Salmonella enterica), pKDBIntPa (Pseudomonas aeruginosa), pKDBIntKp (Klebsiella pneumoniae), pKDCIntAp (Acetobacter pasteurianus), pKDCIntSe (Salmonella enterica), and pKDCIntYp (Yersinia pseudotuberculosis) (Table 1). The respective recipient plasmids were named pKFAattBSe, pKFBattBPa, pKFBattBKp, pKFCattBAp, pKFCattBSe, and pKFCattBYp (Table 1). The integration experiments were conducted using a two-plasmid system (pKDInt and pKFattB) in a recA-deficient E. coli strain (see Materials and Methods for details). Then, six randomly selected integration plasmids for each assay were extracted and used as the templates for the restriction enzyme and PCR analyses. We analyzed the integration products using XbaI, which cuts only the recipient plasmid pKFattB. Restriction analysis of every representative integration product with restriction endonuclease XbaI revealed strong bands of 7.6 to 7.8 kb on agarose gels, corresponding to the expected sum of donor and recipient plasmids (Table 1; Fig. 2a). Furthermore, we used junction primer pairs (one in the donor plasmid and the other in the recipient plasmid) pKD-F/M13-47 and pKF-F/Int-R to detect the site-specific recombination reactions (Fig. 2c) and obtained target bands of 728 to 870 bp and 641 to 876 bp, respectively (Fig. 2b; Table 1). The sequencing of PCR products also validated the formation of junction structures. Therefore, the putative integrases could mediate their respective integration reactions.
TABLE 1.
Sizes of donor plasmids, recipient plasmids, and cointegrates and their PCR detections
| Plasmid type or cointegrateᵃ | Name | Size (bp) |
|---|---|---|
| Donor plasmids | pKDAIntSe | 4,931 |
| | pKDBIntPa | 5,119 |
| | pKDBIntKp | 5,053 |
| | pKDCIntAp | 4,846 |
| | pKDCIntSe | 4,979 |
| | pKDCIntYp | 4,915 |
| Recipient plasmids | pKFAattBSe | 2,843 |
| | pKFBattBPa | 2,650 |
| | pKFBattBKp | 2,681 |
| | pKFCattBAp | 2,734 |
| | pKFCattBSe | 2,662 |
| | pKFCattBYp | 2,725 |
| Cointegrates | pKFDASe | 7,774 |
| | pKFDBPa | 7,769 |
| | pKFDBKp | 7,734 |
| | pKFDCAp | 7,580 |
| | pKFDCSe | 7,641 |
| | pKFDCYp | 7,640 |
| PCR assays | pKD-F/M13-47 + pKF-F/Int-Rᵇ | 870/715 |
| | | 741/641 |
| | | 728/846 |
| | | 739/812 |
| | | 730/741 |
| | | 736/876 |

ᵃ Cointegrates, the integration product through site-specific recombination of donor plasmids with recipient plasmids. PCR assays, detection of the junction regions (attL/attR) formed by the recombination of attP with attB sites.
ᵇ Includes the corresponding primers of Int-R for the 6 different integrase genes.
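Because integration fuses one donor plasmid with one recipient plasmid without gain or loss of nucleotides, each cointegrate size in Table 1 should equal the sum of its donor and recipient sizes. The following sketch checks that arithmetic using only the sizes listed in the table; the variable and function names are illustrative.

```python
# (donor, donor bp, recipient, recipient bp, cointegrate, reported bp),
# all values taken from Table 1.
PLASMID_SETS = [
    ("pKDAIntSe", 4931, "pKFAattBSe", 2843, "pKFDASe", 7774),
    ("pKDBIntPa", 5119, "pKFBattBPa", 2650, "pKFDBPa", 7769),
    ("pKDBIntKp", 5053, "pKFBattBKp", 2681, "pKFDBKp", 7734),
    ("pKDCIntAp", 4846, "pKFCattBAp", 2734, "pKFDCAp", 7580),
    ("pKDCIntSe", 4979, "pKFCattBSe", 2662, "pKFDCSe", 7641),
    ("pKDCIntYp", 4915, "pKFCattBYp", 2725, "pKFDCYp", 7640),
]


def expected_cointegrate_bp(donor_bp, recipient_bp):
    """Conservative site-specific recombination: sizes simply add."""
    return donor_bp + recipient_bp


# Expected size for each cointegrate, keyed by cointegrate name.
expected = {
    name: expected_cointegrate_bp(d_bp, r_bp)
    for _, d_bp, _, r_bp, name, _ in PLASMID_SETS
}
# Reported size for each cointegrate, keyed by cointegrate name.
reported = {name: bp for *_, name, bp in PLASMID_SETS}

if __name__ == "__main__":
    for name in expected:
        print(name, expected[name], reported[name], expected[name] == reported[name])
```

All six sums agree with the reported cointegrate sizes and span 7,580 to 7,774 bp, in line with the 7.6- to 7.8-kb XbaI bands described in the text.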
FIG 2.
Site-specific recombination functions for the new tyrosine integrase families. (a and b) Agarose gels show XbaI restriction analysis (a) and PCR verification (b) of cointegrate formation mediated by six tyrosine integrases from groups A to C. For XbaI digestion, D indicates the donor plasmid (pKDCIntYp) without XbaI site, and R indicates the recipient plasmid (pKFCattBYp) containing the XbaI site in panel a. The recipient plasmid was cut by XbaI, obtaining the size of 2,725 bp (Table 1). Bands at 2.7 to 2.8 kb correspond to the original recipient plasmid copies. For every PCR assay, the left and right lanes are the PCR products amplified by pKD-F/M13-47 and pKF-F/Int-R, respectively. Lane CK, control without templates. Lane M, Trans15K DNA marker (a) and M5 DL2000 plus DNA marker (b). (c) Simplified schematic representations of the two-plasmid integration experiments. Donor plasmids (pKDInt) carry integrase genes and their corresponding attP regions. Recipient plasmids (pKFattB) carry their respective attB regions. Through site-specific recombination, cointegrates (pKFD) form. The left junction (attL) and right junction (attR) were formed. PCR primers for detection of the two junctions were pKD-F/M13-47 and pKF-F/Int-R, respectively.
Group B IntPa (B-IntPa) and IntKp (B-IntKp) share 44% identity and have similar conserved domains. The core attP and attB sites of B-IntPa showed 41% identity with those of B-IntKp, respectively. The core attP site of B-IntKp also showed 41% identity with the attB site of B-IntPa. To examine whether B-IntKp can mediate integration of its donor plasmid (pKDBIntKp) into the recipient plasmid (pKFBattBPa) of B-IntPa, we performed a crossover integration reaction. After the integration experiments, we extracted recombinant plasmids from six randomly selected colonies. Restriction analysis with XbaI yielded a strong single band of 7.7 kb on agarose gels (Fig. 3a), indicative of the integration of pKDBIntKp into pKFBattBPa. Using the same primer pairs as the pKFDBKp cointegrate, bands of 706 and 837 bp were obtained (Fig. 3b), indicative of the crossover integration. Sequencing of the newly formed attL and attR regions indicated that a 4-bp ATCG is the core exchange site between the attP and attB sites of B-IntKp and B-IntPa, respectively (Fig. 3c).
FIG 3.
The donor plasmid pKDBIntKp can integrate into the recipient plasmid pKFBattBPa mediated by the integrase from K. pneumoniae 52.145. (a and b) Agarose gels show XbaI restriction analysis (a) and PCR detection (b) of the cointegrate formations. D, donor plasmid pKDBIntKp without XbaI site. R, recipient plasmid pKFBattBPa containing XbaI site. The recipient was digested using XbaI, obtaining the size of 2,650 bp (Table 1). Lanes CK, control without template but containing the corresponding primers. Bands at 2.7 kb correspond to original or residual recipient plasmid copies. Lanes M, Trans15K DNA marker (a) and M5 DL15000 DNA marker (b). (c) The alignment maps show the core att site between attP of B-IntKp and attB of B-IntPa. Black highlighting indicates the identical nucleotides for three att sites. Light blue highlighting indicates the identical nucleotides between two att sites.
Integration frequencies of putative integrases and their expression levels.
To determine the efficacies of putative integrases with their cognate att sites under the control of respective native promoters, we examined the integration frequencies for the above seven integration events. The integration or recombination frequency is the rate of cointegrate formation between donor and recipient plasmids, which depends on the catalytic capacity of the integrases and their expression levels. The integration frequencies of six putative integrases were as follows (ordered from high to low): B-IntKp > B-IntPa > C-IntAp > A-IntSe > C-IntYp > C-IntSe (Table 2). The frequency data ranges were consistent with the recombination frequencies of pathogenicity island integrases from E. coli 536 with a two-plasmid system (32).
TABLE 2.
New tyrosine integrase-mediated integration frequencyᵃ
| Expt | Integration frequency for indicated tyrosine integrase | | | | | | |
|---|---|---|---|---|---|---|---|
| | A-IntSe | B-IntPa | B-IntKp | C-IntAp | C-IntSe | C-IntYp | IntKpPa |
| Integration | 7.6 × 10⁻² ± 4.0 × 10⁻³ | 2.9 × 10⁻¹ ± 3.2 × 10⁻² | 4.2 × 10⁻¹ ± 3.9 × 10⁻² | 1.4 × 10⁻¹ ± 7.4 × 10⁻² | 5.5 × 10⁻⁵ ± 2.3 × 10⁻⁵ | 5.5 × 10⁻³ ± 8.2 × 10⁻⁴ | 4.1 × 10⁻¹ ± 3.7 × 10⁻² |
| Control | 6.6 × 10⁻⁵ ± 9.8 × 10⁻⁶ | 0 | 0 | 9.0 × 10⁻⁸ ± 1.8 × 10⁻⁸ | 5.7 × 10⁻⁶ ± 5.2 × 10⁻⁶ | 5.4 × 10⁻⁶ ± 4.0 × 10⁻⁶ | NA |

ᵃ These integration experiments were performed three times independently, and the values shown are means ± SDs. NA, not applied. “Control” represents the negative control, containing only the donor plasmids without recipient plasmids.
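As a worked check on Table 2, the mean integration frequencies of the six cognate integrase/att pairs can be ranked and the spread between the most and least efficient integrase computed. The sketch below uses only the table means (standard deviations are ignored); the variable names are illustrative.

```python
# Mean integration frequencies for the six cognate pairs (Table 2).
INTEGRATION_FREQ = {
    "A-IntSe": 7.6e-2,
    "B-IntPa": 2.9e-1,
    "B-IntKp": 4.2e-1,
    "C-IntAp": 1.4e-1,
    "C-IntSe": 5.5e-5,
    "C-IntYp": 5.5e-3,
}

# Integrases ordered from highest to lowest mean frequency.
ranked = sorted(INTEGRATION_FREQ, key=INTEGRATION_FREQ.get, reverse=True)

# Fold difference between the most and least efficient integrase.
fold_difference = INTEGRATION_FREQ[ranked[0]] / INTEGRATION_FREQ[ranked[-1]]

if __name__ == "__main__":
    print(" > ".join(ranked))
    print(f"{ranked[0]} vs {ranked[-1]}: {fold_difference:.0f}-fold")
```

The ratio of the B-IntKp mean to the C-IntSe mean is on the order of 10⁴.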
To investigate the possible nonspecific recombination between attP sites and the chromosome of the host strain (E. coli Top10), we conducted negative-control experiments using only donor plasmids. The frequency of colonies grown at 37°C ranged from 0 to (6.6 ± 0.98) × 10⁻⁵ (Table 2). We selected three colonies (one for C-Ap and two for C-Se) for whole-genome sequencing. For C-Ap, its donor plasmid, pKDCIntAp, existed as an independent scaffold with 100% coverage and identity; for C-Se, pKDCIntSe also existed as an independent scaffold with 100% coverage and identity (Table S2). The genomic sequence analysis indicated that the donor plasmids of C-Ap and C-Se did not insert into the chromosome of E. coli Top10. The persistence of donor plasmids may be due to leaky replication of the temperature-sensitive replicon carried by donor plasmids during growth at 37°C or 42°C (33, 34). The differences in colony-forming ability in the negative-control experiment are probably related to the respective donor plasmids containing different integrases. Our results showed that B-IntKp had the highest efficiency in terms of mediating site-specific integration in the recA-deficient E. coli strain. Interestingly, the integration frequency of B-IntKp with attB of B-IntPa was (4.1 ± 0.37) × 10⁻¹ (Table 2), almost identical to that of B-IntKp [(4.2 ± 0.39) × 10⁻¹] with its cognate attB. This indicates that B-IntKp can mediate the integration reaction despite substantial sequence differences between att sites. In contrast, the integration frequency of C-IntSe was the lowest (approximately 10⁴-fold lower than that of B-IntKp), suggestive of a low integration capacity for this integrase.
We used the putative native promoters of the integrases from their corresponding genomic islands, rather than known promoters, to express the integrases in E. coli. Native promoters may be more appropriate for integrase expression and can also indicate whether these integrases are active in GIs. To detect the mRNA expression levels of the six integrases with distinct promoters during integration, we performed qRT-PCR experiments. The six integrases showed different expression levels, as follows, from high to low: B-IntPa > B-IntKp > C-IntYp > A-IntSe > C-IntAp > C-IntSe (Fig. 4). This is likely attributable to the different promoter regions of these integrases. The results also further validated that the integrases play roles in integration. To compare the integration frequencies at equivalent expression levels, we normalized the six frequencies by dividing each frequency by the corresponding relative expression level. This showed that C-IntAp has the highest relative integration frequency among the six integrases and that the frequency of C-IntSe is still the lowest (Fig. S4).
FIG 4.

mRNA expression levels for selected integrases. The expression levels of six selected integrases were determined by qRT-PCR analysis. With B-IntKp as the control, the fold changes in expression levels are shown at the top of the error bars. All data were determined in triplicate, and the results are means ± SDs.
The length of attP sites for integration function.
The donor plasmids described above carried the relevant attP regions along with the integrase genes. The attP regions were formed by recombination of attL and attR sites from GIs. We synthesized six putative attP regions ranging from 472 to 656 bp in size. The required attP site was generally composed of two motifs with partial inverted repeat symmetry and a central crossover sequence (2). Considering the large attP regions, we reduced the attP regions in donor plasmids. Three donor plasmids, pKDAIntSe, pKDBIntKp, and pKDCIntAp, were used as the templates to amplify the reduced attP regions with corresponding primers. As the 3′ end of the attP region overlapped the upstream promoter area of the integrase gene coding region (Fig. S3; Fig. 5a), we performed sequence reduction in the 5′ end of the attP regions. Primer pair F1/R was used for obtaining the 253/385-bp attP regions (A-IntSe/B-IntKp); F2/R was used for obtaining the 303/436/260-bp attP regions (A-IntSe/B-IntKp/C-IntAp) (Fig. 5a).
FIG 5.
Extent of attP sites for integration. (a) Schematic of shortened attP regions; “core” indicates the core att site or crossover sequence, and “int CDS” indicates the coding sequence of the int gene. The thick black line indicates the pKD46 backbone sequence. F1/R and F2/R were used to amplify reduced attP regions: attPSe1087-F1/pKD-R for 253 bp of attPSe1087, attPKp-F1/pKD-R for 385 bp of attPKp, attPSe1087-F2/pKD-R for 303 bp of attPSe1087, attPKp-F2/pKD-R for 436 bp of attPKp, and attPAp-F2/pKD-R for 260 bp of attPAp. “deleted” indicates that the regions were deleted by the above-mentioned primers. (b) XbaI digestion of cointegrate (7,562 bp) formed through recombination of pKDAIntSe harboring reduced attP (200 bp) and pKFAattBSe. Lane D, donor plasmid pKDAIntSe with a shortened attP region, not containing XbaI site. Lane R, recipient plasmid pKFAattBSe (2.8 kb; Table 1) with XbaI site, which was cut by XbaI enzyme. (c) XbaI digestion of cointegrate (7,514 bp) formed by recombination of pKDBIntKp harboring reduced attP (200 bp) and pKFBattBKp. Bands at 2.7 kb correspond to original or residual recipient plasmid copies. Lane D, donor plasmid pKDBIntKp with a shortened attP region, not containing XbaI site. Lane R, recipient plasmid pKFBattBKp (2.7 kb; Table 1) with XbaI site, which was cut by XbaI enzyme. Lane M, Trans15K DNA marker.
When the integration experiments were performed with the shortened attP donor plasmids and their respective recipient plasmids, the attP reduction (F1/R) did not result in integration reactions. However, the less extensive attP reduction (F2/R) resulted in integration reactions for pKDAIntSe (Fig. 5b) and pKDBIntKp (Fig. 5c) but not for pKDCIntAp. The integration frequencies for the three truncated attP sites were (7.2 ± 3.3) × 10⁻², (3.2 ± 0.48) × 10⁻¹, and (1.1 ± 0.82) × 10⁻⁷, respectively (Table 3). The attP reductions for A-IntSe and B-IntKp did not significantly change the integration frequency compared to their respective original attP regions, but the frequency of C-IntAp decreased almost to the negative-control level (Tables 2 and 3). On the other hand, the expression levels of A-IntSe and B-IntKp with their reduced attP regions exhibited significant changes (Fig. 6b and d). Interestingly, the C-IntAp expression with its shortened attP did not differ markedly from that with the original attP region (Fig. 6e and f). This suggested that the attP of C-IntAp requires more than 260 bp for integration. For A-IntSe and B-IntKp, the 303-bp and 436-bp attP regions, respectively, are sufficient for integration with their cognate attB. Considering the relative frequencies of the three integrases with their reduced attP regions, A-IntSe and C-IntAp were consistent with their respective original frequencies (Fig. S5; Tables 2 and 3). The relative frequency of B-IntKp1 was higher than its native frequency (Fig. S5; Tables 2 and 3).
TABLE 3.
Integration frequency for integrases with shortened attP sitesᵃ
| Integrase | Integration frequency |
|---|---|
| A-IntSe1 | 7.2 × 10⁻² ± 3.3 × 10⁻² |
| A-IntSe2 | 0 |
| B-IntKp1 | 3.2 × 10⁻¹ ± 4.8 × 10⁻² |
| B-IntKp2 | 0 |
| C-IntAp1 | 1.1 × 10⁻⁷ ± 8.2 × 10⁻⁸ |

ᵃ Integration experiments were performed three times independently, and the values shown are means ± SDs. A-IntSe1, B-IntKp1, and C-IntAp1 represent the deletion at the 5′ end of their respective attP regions. A-IntSe2 and B-IntKp2 represent the deletion at the 3′ end of their respective attP regions based on A-IntSe1 and B-IntKp1. The deletion details are shown in Fig. S3 and Fig. 6a, c, and e.
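To make the comparison between shortened and full-length attP regions concrete, the sketch below divides each 5′-truncated mean frequency (Table 3) by the corresponding full-length mean (Table 2). It is a simple ratio of reported means, not the statistical test used in the study, and the names are illustrative.

```python
# Mean integration frequencies with full-length attP regions (Table 2).
FULL_LENGTH = {"A-IntSe": 7.6e-2, "B-IntKp": 4.2e-1, "C-IntAp": 1.4e-1}

# Mean integration frequencies with 5'-truncated attP regions (Table 3:
# A-IntSe1, B-IntKp1, and C-IntAp1).
TRUNCATED_5P = {"A-IntSe": 7.2e-2, "B-IntKp": 3.2e-1, "C-IntAp": 1.1e-7}

# Negative-control mean for C-IntAp (Table 2), for comparison.
C_INTAP_CONTROL = 9.0e-8

# Fraction of the full-length frequency retained after truncation.
retained = {name: TRUNCATED_5P[name] / FULL_LENGTH[name] for name in FULL_LENGTH}

if __name__ == "__main__":
    for name, fraction in retained.items():
        print(f"{name}: {fraction:.2e} of the full-length frequency")
    print("C-IntAp truncated / control:", TRUNCATED_5P["C-IntAp"] / C_INTAP_CONTROL)
```

By these ratios, A-IntSe and B-IntKp retain most of their activity, whereas the truncated C-IntAp frequency is within roughly 1.2-fold of its negative control.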
FIG 6.
Effect of reduced attP regions on the mRNA expression levels of integrases. (a, c, and e) attP region structure of A-IntSe, B-IntKp, and C-IntAp, respectively. A-, B-, and C-Int1 represent the deletion of the 5′ part of the corresponding attP regions. A- and B-Int2 represent the deletion of the 3′ part of the corresponding attP regions on the basis of A- and B-Int1. The black line indicates the attP region of the donor plasmid, and the dashed line indicates the deletion area. The green line indicates the core site of attP regions, the gray line represents the IHF binding site, and the blue line represents the predicted promoter structure. The numbers below the lines indicate the sequence locations, and Δ1 and Δ2 indicate the first and second deletion regions, respectively. The inverted triangle represents the initiation codon of integrases. (b, d, and f) mRNA expression levels of A-IntSe, B-IntKp, and C-IntAp integrases, respectively. All data were determined in triplicate, and the results are displayed as means ± SDs. A statistical analysis was performed using a one-way ANOVA with GraphPad for the data. ns, not significant; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001.
To further reduce the 3′ end of attP regions, we predicted the promoter sequences of integrases by using the SoftBerry BPROM program (35). The nonpromoter sequences at the 3′ end of attP of A-IntSe and B-IntKp were removed (Fig. 6a and c; Fig. S3). Unexpectedly, the integration frequencies for A-IntSe2 and B-IntKp2 were both zero, and their expression levels significantly decreased (Table 3; Fig. 6b and d). This suggested that the deleted sequences located at the 3′ end of attP regions may be involved in the required att sites or potential promoter regions outside of the prediction.
Analysis of accessory proteins correlated with integrase-mediated recombination.
To find related accessory proteins, we compared small proteins annotated in the GIs with Xis and known RDFs. No potential accessory proteins were found in the selected GIs. On the other hand, we searched for IHF and Fis homologues in the native host genomes of the integrases and found putative host-encoded factors (Tables S3 and S4). The IHF and Fis from K. pneumoniae, S. enterica, and Y. pseudotuberculosis have more than 90% amino acid identity with their respective counterparts from E. coli, suggesting that the three native hosts contain host-related factors similar to IHF and Fis of E. coli. The potential IHFα and IHFβ of P. aeruginosa share 74% and 85% identity, respectively, with those of E. coli, and the potential IHFα and IHFβ of A. pasteurianus display 47% and 36% identity, respectively, with those of E. coli, suggesting that the two native hosts may contain IHFs distinct from those of E. coli. However, neither P. aeruginosa nor A. pasteurianus carries relevant Fis homologues. Our analysis showed that these native hosts of the integrases carry IHF or homologues of E. coli IHF but not Xis or RDFs, suggesting that Xis or RDFs are likely not required for the recombination.
To determine whether IHF is required for the recombination reaction, we searched for potential IHF binding sites with a 13-bp consensus sequence (5′-WATCAANNNNTTR-3′) by using the online FIMO program (36, 37). Putative IHF motifs were predicted in attP regions of A-IntSe, B-IntKp, C-IntSe, and C-IntYp with a P value of less than 0.0001, except in B-IntPa and C-IntAp (Fig. S3). This is in line with the presence/absence of IHF in corresponding native host species of integrases, suggesting that A-IntSe, B-IntKp, C-IntSe, and C-IntYp require IHF but that B-IntPa and C-IntAp may not. In contrast, only one putative IHF binding motif was predicted in the attB region of IntYp (Fig. S3), showing that IHFs mainly bind attP and not attB sites (38).
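The 13-bp IHF consensus can be illustrated with a simple exact-match scan. Note that this is a swapped-in simplification: FIMO scores motif occurrences statistically and reports P values, whereas the sketch below only expands the IUPAC codes (W = A/T, R = A/G, N = any base) into a regular expression and reports exact matches on both strands. The function names and the toy sequence are hypothetical.

```python
import re

# IUPAC codes needed for the consensus 5'-WATCAANNNNTTR-3'.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "N": "[ACGT]",
}
IHF_CONSENSUS = "WATCAANNNNTTR"


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ihf_sites(seq):
    """Return sorted (start, strand, match) tuples for exact consensus
    matches; start is 0-based on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + consensus_to_regex(IHF_CONSENSUS) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(IHF_CONSENSUS)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy_attP = "GGGAATCAACCCCTTGCCC"  # hypothetical sequence, not from the study
    print(find_ihf_sites(toy_attP))
```

An exact-match scan of this kind is stricter than a scored motif search and would miss degenerate sites, so it should be read only as an illustration of the consensus itself.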
Validation of conserved residues and catalytic active sites of the new integrases.
Because lambda integrase is one of the best-studied recombinases, we used it as the reference when performing conserved residue analysis. We aligned the six putative integrases with lambda tyrosine integrase (GenBank accession no. NP_040609.1) (Fig. S6; Table 4). The results showed that the three groups of integrases all contained seven canonical conserved active-site residues: R, E, K, H, R, H, and Y. The second conserved amino acid of the putative integrases was E rather than the D found in lambda integrase, while the other conserved residues were the same as those of lambda integrase. These observations indicate that group A to C integrases belong to the typical tyrosine integrase superfamily. In addition, the positions of conserved residues are closer in integrases from the same group but farther apart in integrases from different groups (Table 4), suggesting the evolutionary conservation of integrases in the same group.
TABLE 4.
Analysis of the consensus conserved residues of lambda and putative tyrosine integrases
| Tyrosine-type integrase | Conserved residueᵇ | | | | | | |
|---|---|---|---|---|---|---|---|
| | RIᵃ | E/dII | KIII | HIV | RV | H/wVI | YVII |
| Lambda Int | R212 | D215 | K239 | H308 | R311 | H333 | Y342 |
| A-IntSe | R238 | E241 | K265 | H335 | R338 | H361 | Y370 |
| B-IntPa | R238 | E241 | K265 | H330 | R333 | H356 | Y365 |
| B-IntKp | R238 | E241 | K265 | H328 | R331 | H354 | Y363 |
| C-IntAp | R240 | E243 | K265 | H327 | R330 | H354 | Y364 |
| C-IntSe | R242 | E245 | K281 | H349 | R352 | H376 | Y386 |
| C-IntYp | R241 | E244 | K268 | H330 | R333 | H356 | Y366 |
ᵃ The consensus sequence for the conserved residues of tyrosine-type integrases.
ᵇ Bold amino acid residues indicate the loss of integration.
To verify the catalytic tyrosine of the six integrases, we mutated the tyrosine to alanine, i.e., Y370A of A-IntSe, Y365A of B-IntPa, Y363A of B-IntKp, Y364A of C-IntAp, Y386A of C-IntSe, and Y366A of C-IntYp. As described above, we randomly selected six colonies and performed XbaI digestion analysis for every integration reaction; none of the tyrosine-mutant donor plasmids integrated into their respective recipient plasmids. Together, the phylogeny, conserved active-site analysis, and tyrosine mutagenesis indicate that these tyrosine residues are catalytically active and necessary for integration. Using InterPro and PROSITE protein domain analysis (39, 40), we also identified three common amino acid residues in the putative integrases of all three groups: RI, HIV, and RV. These three conserved residues also correspond to catalytic signature motifs (18). Given the high degree of conservation of active-site residues among the six integrases, B-IntKp was chosen as a representative and used as the template for mutagenesis. The R238, H328, and R331 residues of B-IntKp were mutated to alanine (R238A, H328A, and R331A, respectively). XbaI restriction analysis showed that R331A of B-IntKp led to a loss of integration ability, similar to the Y363A mutation, indicating that R331 is also essential for the activity of this integrase. However, the R238A and H328A mutants of B-IntKp still yielded recombinant plasmids (Fig. S7), indicating that these two residues are not strictly essential for integration. We further calculated the integration frequencies for the four B-IntKp mutants. The integration frequencies of the Y363A and R331A mutants were zero, and the R238A and H328A mutants showed decreased frequencies compared to the wild type (Table 5), which again indicates that the Y363 and R331 residues are vital, whereas the R238 and H328 residues may not be strictly required for the integration reaction.
TABLE 5.
Integration frequency for mutant integrases of B-IntKpᵃ
| Integrase | Integration frequency |
|---|---|
| Wild type | 4.2 × 10⁻¹ ± 3.9 × 10⁻² |
| Y363A mutant | 0 |
| R238A mutant | 5.7 × 10⁻⁵ ± 1.4 × 10⁻⁵ |
| R331A mutant | 0 |
| H328A mutant | 7.1 × 10⁻⁶ ± 2.1 × 10⁻⁶ |
ᵃ The integration experiments were performed three times independently, and the values shown are means ± SDs. Y363A, R238A, R331A, and H328A are the mutants of B-IntKp integrase.
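To put the residual activity of the R238A and H328A mutants in context, the mean frequencies in Table 5 can be converted to fold reductions relative to the wild type. The short Python sketch below does only this arithmetic; the helper name `fold_reduction` is illustrative, the SDs are ignored, and a frequency of zero is reported as "no integration detected" rather than as an infinite fold change.

```python
# Mean integration frequencies copied from Table 5 (SDs are not used here).
WILD_TYPE = 4.2e-1
MUTANTS = {
    "Y363A": 0.0,
    "R238A": 5.7e-5,
    "R331A": 0.0,
    "H328A": 7.1e-6,
}


def fold_reduction(wild_type, mutant):
    """Return wild_type / mutant, or None when the mutant frequency is zero."""
    if mutant == 0:
        return None
    return wild_type / mutant


for name, freq in MUTANTS.items():
    fold = fold_reduction(WILD_TYPE, freq)
    if fold is None:
        print(f"{name}: no integration detected")
    else:
        print(f"{name}: about {fold:,.0f}-fold lower than wild type")
```

On these means, the two mutants that still integrate do so at frequencies several thousand-fold or more below that of the wild type.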
DISCUSSION
The present study showed that B-IntKp may integrate into the attB site of B-IntPa at a higher rate than B-IntPa itself does. This suggests that the B-IntKp integrase can not only cross-integrate into the attB of B-IntPa but may also do so more efficiently than B-IntPa itself. Further investigations are required to characterize the relationships between integrases and their cognate att sites. Within the same group of integrases, identifying the optimal matches between integrases and att sites will be important for elucidating the underlying mechanisms and for practical applications. Such cross talk may have played an important role in genome evolution and horizontal gene transfer (41). We used E. coli as the host to test the functionality of the integrases, although they were derived from different host species. Therefore, the low integration frequencies observed may have been due to incompatibility with E. coli as a host, and the native hosts of the integrases may be crucial for their integration capacity.
We detected the mRNA expression of the selected integrases under the control of their native promoters in E. coli. The mRNA levels of the integrases did not correlate with their integration frequencies. For example, the mRNA level of B-IntPa with its cognate promoter was significantly higher, by 7.6-fold, than that of B-IntKp, but the integration frequency of B-IntPa was lower than that of B-IntKp (0.29 ± 3.2 × 10⁻² versus 0.42 ± 3.9 × 10⁻²); the mRNA levels of C-IntAp and C-IntSe were almost identical (0.059 versus 0.051), but their integration frequencies were significantly different (0.14 ± 7.4 × 10⁻² versus 0.00006 ± 2.3 × 10⁻⁵), suggesting that mRNA expression is not the only determinant of recombination ability. This may be explained by the intrinsic capacity of each integrase to catalyze recombination: different integrases may have different recombination capacities owing to differences in protein structure. The mRNA measurements also validated that the selected integrases can be expressed in the heterologous model host, E. coli.
The length of the attP sites for A-IntSe and B-IntKp was approximately 200 bp, consistent with the sizes generally reported to date (range, 30 to 200 bp); for C-IntAp, the attP may require a length of more than 200 bp. A sufficiently long site could enhance the sequence specificity of recombination reactions within a given genome (2). Long attP regions are important for tyrosine recombinases, especially for lambda family integrases. The attP length for lambda integrase is 250 bp (42, 43), and the minimal attP sequence for the integrase of another lambdoid phage (Φ24B) is between 350 and 427 bp (44). These long attP sites are essential for binding their cognate integrases or accessory proteins (6, 45, 46). The attPλ consists of a core binding site (COC′) and two integrase binding sites (P and P′) (47). Xis, the key determinant of directionality, is essential for the excisive reaction of the λ pathway (48, 49). Fis, a host-encoded accessory protein, can stimulate excisive recombination in the λ pathway (50). These two factors bind the P arm region of attPλ. IHF, another host factor, is essential for integrative and excisive recombination (51, 52). IHF binds both the P and P′ arm regions of attPλ. These small accessory proteins, including RDFs, play architectural roles in the recombination reactions (1).
In our study, Xis and other RDFs were not discovered in selected GIs. Fis was not present in any of the selected host chromosomes. With respect to the above-mentioned factors, IHF was thought to be an indispensable host-encoded accessory protein for lambda integrase-mediated recombination. Using the FIMO program, we found the presence of IHF binding sites in attP regions of integrases, except in B-IntPa and C-IntAp. We speculated that the two integrases may not need IHF for recombination. Due to the existence of IHF homologues in their native host strains, it is also possible that other types of IHF or similar functional small proteins may be associated with recombination. For lambda phage Φ24B, its integrase does not require IHF to drive the integration (44). The pleolipovirus SNJ2 integrase requires two small accessory proteins, Orf2 and Orf3, but not IHF for integration (1). PAI-II536 recombination does not require IHF, but PAI-III536 excision and integration can be enhanced and inhibited, respectively, by IHF (32).
For the attP of B-IntKp, deletion of the sequence at its 5′ end significantly decreased the expression of the integrase (Fig. 6c and d). An IHF binding site is located in this region (Fig. S3B), suggesting that IHF can enhance the expression of B-IntKp, consistent with a previous report (3). However, the integration frequency after removal of the IHF binding site from attP did not decrease significantly compared with that of the original attP region (Tables 2 and 3). Interestingly, deletion of the sequence at the 3′ end of this attP also led to a marked decrease in B-IntKp expression and a resultant loss of integration (Fig. 6c and d; Table 3). This deleted region did not contain an IHF binding site, suggesting that it may be associated with potential promoters or other factors. For the attP of A-IntSe, removing the 5′-end sequence boosted the expression of A-IntSe (Fig. 6a and b), implying that this region may participate in the mediation of integration. These studies and our results indicate that, besides the Cre, Flp, and SSV-type integrases, IHF may also be nonessential for some tyrosine-type integrases (such as A-IntSe and B-IntKp). Other non-IHF small proteins may also play similar roles in recombination. Future research should focus on the discovery of other types of IHF or functionally similar small proteins and on the multifunctionality of IHF.
The utility of integrases is limited by their restricted ability to bind to predefined target DNA (i.e., att sites). Two recent studies used prime editing to efficiently introduce the relevant att sequences into the target sites (53, 54). Then, in combination with Bxb1 integrase, the system was shown to be able to integrate large DNA fragments into human cells. This methodology will facilitate the application of integrases in gene editing. Bxb1 belongs to the serine integrase family. Owing to the double-stranded break ability of serine integrases, there may be some limitations (off-target effect) for genome editing in the future, especially in animal or human cells. Our study considerably increases the number of tyrosine integrases available for gene editing. By using these diverse integrases, it will be possible to integrate multiple DNA fragments into different target sites within a genome. Although this is a preliminary study, it is important due to the discovery of a large number of integrases. With the rapid increase in the number of sequenced genomes, bioinformatics strategies such as those applied herein will facilitate the discovery of novel integrases, including serine and tyrosine integrases, to meet the requirements for gene editing in the future.
MATERIALS AND METHODS
Bioinformatics analysis and phylogenetic tree construction.
To identify new tyrosine integrases, we used the reported tyrosine integrase (GenBank accession no. AAP17879) from Shigella flexneri 2a 2457T GIguaA as the query sequence against a collection of complete bacterial genome sequences from the NCBI database as of December 2019. The analysis was performed using TBLASTN, and cutoff values for identity and coverage were set at 30% and 90%, respectively. The obtained protein sequences were further filtered, and redundant sequences with 80% identity to each other were removed using the cd-hit-est program (55). Finally, nonredundant protein sequences were identified as putative tyrosine integrases.
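The identity and coverage cutoffs described above can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the example values are hypothetical and are not taken from the study; coverage is assumed here to mean the aligned fraction of the query protein, which is one common definition but may differ from the one used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One TBLASTN hit (hypothetical record; field names are illustrative)."""
    subject_id: str
    percent_identity: float  # percent identical residues in the alignment
    query_start: int         # 1-based, inclusive
    query_end: int           # 1-based, inclusive
    query_length: int        # length of the query protein


def query_coverage(hit: Hit) -> float:
    """Percentage of the query covered by the alignment (assumed definition)."""
    aligned = hit.query_end - hit.query_start + 1
    return 100.0 * aligned / hit.query_length


def passes_cutoffs(hit: Hit, min_identity: float = 30.0,
                   min_coverage: float = 90.0) -> bool:
    """Apply the 30% identity and 90% coverage thresholds."""
    return (hit.percent_identity >= min_identity
            and query_coverage(hit) >= min_coverage)


# Made-up hits, for illustration only.
hits = [
    Hit("genome_A", 45.2, 5, 400, 410),    # passes both cutoffs
    Hit("genome_B", 28.0, 1, 410, 410),    # identity below 30%
    Hit("genome_C", 60.0, 100, 300, 410),  # coverage below 90%
]
kept = [h for h in hits if passes_cutoffs(h)]
print([h.subject_id for h in kept])
```

Removal of redundant sequences would follow as a separate clustering step and is not shown here.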
Putative tyrosine integrase sequences were extracted from the relevant genomes. Known tyrosine integrase sequences, including those for FLP, Cre, phiCh1, BJ1, IntG, and the unidentified phages Int, Shufflon recombinase, DAI, Lambda, SSV1, pTN3, pNOB8, SNJ2, IntC, IntI, XerA, and XerC/D families, were obtained from a study by Wang et al. (1). These known amino acid sequences were downloaded from GenBank. The corresponding accession numbers are shown in Fig. S1 in the supplemental material and in Fig. 1. The protein sequences were aligned using CLUSTALW with the default parameters. Then, an evolutionary tree was constructed using the neighbor-joining method in MEGA6 with the default parameters, except that the partial deletion parameter was set to 80% and 1,000 bootstrap replicates were used.
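As a rough illustration of the partial deletion setting, the sketch below removes alignment columns in which fewer than 80% of the sequences carry a residue. Reading the 80% parameter as a site coverage cutoff is an assumption, the function name and toy alignment are hypothetical, and the code is not the MEGA6 implementation.

```python
def partial_deletion(alignment, site_coverage=0.80, gap_chars="-?"):
    """Keep only columns whose fraction of non-gap residues is >= site_coverage."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    kept_columns = []
    for col in range(length):
        present = sum(1 for seq in alignment if seq[col] not in gap_chars)
        if present / n_seqs >= site_coverage:
            kept_columns.append(col)
    return ["".join(seq[c] for c in kept_columns) for seq in alignment]


# Toy alignment (made up): column 3 is mostly gaps, column 5 has one gap.
toy = [
    "MK-RLT",
    "MKARLT",
    "MK-RIT",
    "MK-R-T",
    "MKGRLT",
]
filtered = partial_deletion(toy)
print(filtered)
```

In this toy case the third column (2 of 5 residues present) is dropped, whereas the fifth column (4 of 5 present) is retained because it meets the cutoff exactly.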
Construction of a two-plasmid integration system.
The donor plasmids were composed of the pKD46 backbone, the attP region, and the integrase gene with its upstream sequence. The pKD46 backbone was amplified by F1/R1 using pKD46 plasmid DNA as the template. The partial attP region was amplified by F2/R2 with synthesized DNA (Sangon Biotech) as the template. The integrase gene, along with its upstream sequence (including the partial attP region), was amplified by F3/R3 with synthesized DNA as the template. The corresponding three pairs of primers for construction of the six donor plasmids are listed in Table 6. The three PCR products for each construct were assembled using a NEBuilder hi-fi DNA assembly cloning kit, yielding the donor plasmids pKDAIntSe, pKDBIntPa, pKDBIntKp, pKDCIntAp, pKDCIntSe, and pKDCIntYp.
TABLE 6.
Primers used for constructing donor plasmids
| Primer | Sequence (5′–3′) |
|---|---|
| Kp-F1 | CGTTGTTATTATTGCAAATATAGTTGCCATGGGTATGGACAGTTTT |
| Kp-R1 | GCAATATAGATTAAATATCTAAAATAGAAGCTATTTAAAAAATGCACCGGGG |
| Kp-F2 | CCCCGGTGCATTTTTTAAATAGCTTCTATTTTAGATATTTAATCTATATTGC |
| Kp-R2 | CTTATTTACCGATACAGAAGCTGGAAAATCTAGCTAACTC |
| Kp-F3 | GAGTTAGCTAGATTTTCCAGCTTCTGTATCGGTAAATAAG |
| Kp-R3 | AAAACTGTCCATACCCATGGCAACTATATTTGCAATAATAACAACG |
| Yp-F1 | CGATATGCGGTACTTACAACCCATGGGTATGGACAGTTTTC |
| Yp-R1 | GTTCCATCAATCGGTTAGAGTATTTAAAAAATGCACCGGGG |
| Yp-F2 | CCCCGGTGCATTTTTTAAATACTCTAACCGATTGATGGAAC |
| Yp-R2 | GAAATGGTACGCCCTACAGGATTCAAATTAACACAATAAATGATTG |
| Yp-F3 | CAATCATTTATTGTGTTAATTTGAATCCTGTAGGGCGTACCATTT |
| Yp-R3 | GAAAACTGTCCATACCCATGGGTTGTAAGTACCGCATATCG |
| Se1087-F1 | GCGCTTTTCTGCACAATAACCCATGGGTATGGACAGTTTT |
| Se1087-R1 | CCTTCAGCGTGGTTAAACTCTATTTAAAAAATGCACCGGGG |
| Se1087-F2 | CCCCGGTGCATTTTTTAAATAGAGTTTAACCACGCTGAAGG |
| Se1087-R2 | TTCCCACTCAATGGTTTAGGAAC |
| Se1087-F3 | CTAAACCATTGAGTGGGAATGATTTACAGC |
| Se1087-R3 | AAAACTGTCCATACCCATGGGTTATTGTGCAGAAAAGCGC |
| Pa-F1 | GTGATGCCGGGGTGAGATGCCATGGGTATGGACAGTTTT |
| Pa-R1 | CGACGATGATTATCAGATACAGATATCGCCAGCGTCGCAC |
| Pa-F2 | ACATCACTTCCCGATGCAGAAGCGGCATTTATCCCGCATGTC |
| Pa-R2 | AAAACTGTCCATACCCATGGCATCTCACCCCGGCATCAC |
| Pa-F3 | GTGCGACGCTGGCGATATCTGTATCTGATAATCATCGTCG |
| Pa-R3 | AAATGCCGCTTCTGCATCGGGAAGTGATGTGGGGTCCATG |
| Se-F1 | CAGCAAGCTGCTTCGTTAACCATGGGTATGGACAGTTTT |
| Se-R1 | CGTTAACGTCAATGTAATGAACTATTTAAAAAATGCACCGGGG |
| Se-F2 | CCCCGGTGCATTTTTTAAATAGTTCATTACATTGACGTTAACG |
| Se-R2 | TGGCTCCTCTGACTGGACTCGAACTTCATCTGTAGCT |
| Se-F3 | AGCTACAGATGAAGTTCGAGTCCAGTCAGAGGAGCCA |
| Se-R3 | AAAACTGTCCATACCCATGGTTAACGAAGCAGCTTGCTG |
| Ap-F1 | TGATGCCAATCAGTGCGATCCATGGGTATGGACAGTTTTC |
| Ap-R1 | CTGTTCGAAAACCTTAATGTTAGTATTTAAAAAATGCACCGGGG |
| Ap-F2 | CCCCGGTGCATTTTTTAAATACTAACATTAAGGTTTTCGAACAG |
| Ap-R2 | CGTAGGGAGATGGAGGTCTGGGCG |
| Ap-F3 | CGCCCAGACCTCCATCTCCCTACG |
| Ap-R3 | GAAAACTGTCCATACCCATGGATCGCACTGATTGGCATCA |
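Several primer pairs in Table 6 are exact reverse complements of each other, which gives adjacent PCR fragments the shared terminal sequence that overlap-based assembly relies on. The sketch below checks this for two Kp junctions using sequences copied from Table 6; the function names are illustrative and the check is not part of the published protocol.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_junction_pair(primer_a: str, primer_b: str) -> bool:
    """True when primer_b is the exact reverse complement of primer_a."""
    return reverse_complement(primer_a) == primer_b


# Primer sequences copied from Table 6.
primers = {
    "Kp-F1": "CGTTGTTATTATTGCAAATATAGTTGCCATGGGTATGGACAGTTTT",
    "Kp-R3": "AAAACTGTCCATACCCATGGCAACTATATTTGCAATAATAACAACG",
    "Kp-R2": "CTTATTTACCGATACAGAAGCTGGAAAATCTAGCTAACTC",
    "Kp-F3": "GAGTTAGCTAGATTTTCCAGCTTCTGTATCGGTAAATAAG",
}

for left, right in [("Kp-F1", "Kp-R3"), ("Kp-R2", "Kp-F3")]:
    print(left, right, is_junction_pair(primers[left], primers[right]))
```

Both pairs are exact reverse complements, so the backbone/integrase junction (F1/R3) and the attP/integrase junction (R2/F3) each share one full primer length of sequence.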
The recipient plasmids were constructed from the pKF18k-2 and attB regions. The pKF18k-2 plasmid was digested with XbaI and HindIII. The attB region for each integrase was derived from the relevant strain without the GI. These strains were provisionally designated blank strains and included E. coli DH5α, Pseudomonas aeruginosa VA-134, Klebsiella pneumoniae KPNIH30, Acetobacter pasteurianus 386B, Salmonella enterica subsp. enterica serovar Blegdam S-1824, and Yersinia aleksiciae 159. These attB regions were synthesized (Sangon Biotech) as the templates for XbaI and HindIII double enzyme digestion, as described above. The two fragments were ligated using NEB T4 DNA ligase, yielding recipient plasmids pKFAattBSe, pKFBattBPa, pKFBattBKp, pKFCattBAp, pKFCattBSe, and pKFCattBYp.
Integration experiments and integration frequency.
The donor and recipient plasmids were cotransformed into the recA-deficient E. coli Top10 strain and cultured on LB agar plates containing ampicillin (100 μg/mL) and kanamycin (50 μg/mL) at 30°C; the colonies were then screened. The clones were cultured in liquid LB containing ampicillin (100 μg/mL) and kanamycin (50 μg/mL) at 30°C for 24 h. The cultures were then diluted and incubated on agar LB plates containing ampicillin (100 μg/mL) and kanamycin (50 μg/mL) at 37°C to identify colonies bearing recombinant plasmids. We randomly selected six colonies for culture at 37°C and extracted the plasmid DNA as a template for restriction analysis.
For all strains bearing the donor and recipient plasmids, three individual replicates were performed from the beginning of the integration experiments in liquid LB at 30°C for 24 h. Then, four agar plates for each dilution gradient of one replicate were cultured at 37°C and 30°C. Ampicillin (100 μg/mL) and kanamycin (50 μg/mL) were present in the medium for selection throughout the integration experiments. The frequency was calculated as the number of CFU at 37°C divided by the number of CFU at 30°C after the integration experiments.
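The frequency calculation described above can be sketched in a few lines of Python. The CFU values are made up, the function names are illustrative, and averaging over the four plates per dilution and correcting for the dilution factor are assumed to have been done already; the use of the sample standard deviation is also an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def integration_frequency(cfu_37c: float, cfu_30c: float) -> float:
    """CFU counted at 37°C divided by CFU counted at 30°C."""
    if cfu_30c <= 0:
        raise ValueError("CFU at 30°C must be positive")
    return cfu_37c / cfu_30c


def summarize(replicates):
    """Return (mean, sample SD) of the frequencies of independent replicates."""
    freqs = [integration_frequency(c37, c30) for c37, c30 in replicates]
    return mean(freqs), stdev(freqs)


# Made-up CFU/mL values for three replicates: (CFU at 37°C, CFU at 30°C).
replicates = [(4.0e7, 1.0e8), (4.5e7, 1.0e8), (3.8e7, 1.0e8)]
freq_mean, freq_sd = summarize(replicates)
print(f"{freq_mean:.2e} ± {freq_sd:.2e}")
```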
Determination of nonspecific recombination of integrases.
E. coli Top10 harboring only the donor plasmid was cultured at 30°C for 24 h and then diluted and incubated at 37°C. Three colonies (1 for C-Ap and 2 for C-Se) from the negative-control experiments were selected. The genomic DNA was extracted from the C-Ap1, C-Se2, and C-Se3 colonies by using the TIANamp bacterial DNA kit (Tiangen, China). The three samples were sequenced using the Illumina HiSeq 2000 platform. Sequencing reads were de novo assembled by using SPAdes software (56).
Detection of the mRNA levels of integrases.
The mRNA expression levels of six selected integrases were examined by qRT-PCR. Briefly, bacterial cultures in liquid LB were incubated at 30°C to an optical density at 600 nm (OD600) of 1.0. Cultures were harvested by centrifugation. Total RNAs were extracted using a FastPure cell/tissue total RNA isolation kit V2 (Nanjing Vazyme Biotech Co., Ltd., Nanjing, China). Reverse transcription with 1 μg of RNA as the template in a 20-μL reaction volume was performed using HiScript II Q RT supermix (Nanjing Vazyme Biotech Co., Ltd.) according to the supplied instructions. The amplification was performed using a real-time PCR instrument with ChamQ universal SYBR qPCR master mix (Nanjing Vazyme Biotech Co., Ltd.). Relative abundances were determined using the relative standard curve 2^(−ΔΔCT) method with purA as the reference gene for normalization of the total RNA levels.
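For readers unfamiliar with the 2^(−ΔΔCT) calculation, a minimal Python sketch follows. The Ct values are made up, the function and argument names are illustrative, the choice of calibrator sample is hypothetical, and the formula assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target, ct_reference: Ct values measured in the sample of interest.
    ct_target_cal, ct_reference_cal: Ct values measured in the calibrator sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values; purA plays the role of the reference gene.
fold = relative_expression(ct_target=22.0, ct_reference=18.0,
                           ct_target_cal=25.0, ct_reference_cal=18.0)
print(fold)  # 8.0
```

Here the target crosses the threshold three cycles earlier in the sample than in the calibrator after normalization, which corresponds to an 8-fold higher relative abundance.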
Statistical analyses were completed with GraphPad Prism 8 software. Relevant values are displayed as the mean ± standard deviation (SD) of results from independent experiments with three replicates. To obtain all pairwise comparisons, a one- or two-way analysis of variance (ANOVA) with a posttest was used for determining significance (ns, not significant; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001).
Shortening of attP regions.
For attP reduction in pKDAIntSe, primers attPSe1087-F1/R and attPSe1087-F2/pKD-R yielded attP of 100 and 200 bp, respectively. For attP reduction in pKDBIntKp, primers attPKp-F1/pKD-R and attPKp-F2/pKD-R yielded attP of 100 and 200 bp, respectively. The attP reduction at its 5′ end was performed using primers attPAp-P1/P2 with pKDCIntAp as the template. The primers used are listed in Table 7. The attP reduction donor plasmids were used for integration experiments with the respective recipient plasmids pKFAattBSe, pKFBattBKp, and pKFCattBAp. The integrations were verified by enzyme restriction analyses and PCR determinations.
TABLE 7.
Primers used for shortening attP regions and constructing mutant donor plasmids
| Primer | Sequence (5′–3′) |
|---|---|
| Shortening attP regions | |
| attPSe1087-F1 | TGTCATAAAATTCCGTGATA |
| attPSe1087-F2 | CAAATTTCATCATTTTGACGGT |
| attPKp-F1 | AAGGCAATGTATCGCTGATA |
| attPKp-F2 | AAGCAGTATCCTGCTTTTGTG |
| pKD-R | TATTTAAAAAATGCACCGGGGC |
| attPAp-F2 | AACTTATGGGGGTATATTAGG |
| Mutations of integrases | |
| mKp-F | TTCGCGGCGTTGCCAACCGTGCGGAGTACCTGAATCA |
| mKp-R | CGCACGGTTGGCAACGCCGCGAACGCCGCCAATCTTAT |
| mYp-F | TGGCACAGCCAACCATGCGCAGTATCTGGAGCAAC |
| mYp-R | CGCATGGTTGGCTGTGCCACGAATGGTATTCT |
| mSe1087-F | CGTGCGGTGGCCAACAAAGCGGAGTATGCCCGGCA |
| mSe1087-R | CTTTGTTGGCCACCGCACGCACGCCCTTCTGCT |
| mPa-F | CCGGTTGGCGACGCCACGCACGCCGCCGATG |
| mPa-R | GGCGTCGCCAACCGGGCGGAGTATGCCGAGCAGC |
| mAp-F | AGGCCGCCGCTAACCGGGCTGAGTATCGGGAAA |
| mAp-R | GCCCGGTTGCAGGCGGCCTCAACCTGGTTGCGT |
| mKpHis328-F | TGTTATTGCTGACTTCCGGCGTACTGCCAGCACAT |
| mKpHis328-R | CCGGAAGTCAGCAATAACAAAATCCCTGACGTTC |
| mKpArg331-F | TGACTTCGCGCGTACTGCCAGCACATTGCTACAT |
| mKpArg331-R | GGCAGTACGCGCGAAGTCATGAATAACAAAATCC |
| mKpArg238-F | TGCATGGTGGCTAAATCGGAGATGATTGAAGCGA |
| mKpArg238-R | CGATTTAGCCACCATGCAGATGATCAGTAAGTGCA |
IHF binding site prediction for att regions.
FIMO online version 5.4.1 (https://meme-suite.org/meme/tools/fimo) was used for finding individual motif occurrences. The input motif was the consensus sequence (5′-WATCAANNNNTTR-3′) of IHF binding sites. The input sequences were different attP or attB regions. The FIMO program then scanned the input sequences for matches to motif occurrences with a P value of less than 0.0001.
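FIMO scores candidate sites against a motif model and reports P values. As a simpler stand-in that only conveys what the consensus means, the sketch below expands the IUPAC codes in 5′-WATCAANNNNTTR-3′ into a regular expression and reports exact matches on both strands. The function names and test sequence are made up, and exact matching is not equivalent to FIMO scoring.

```python
import re

# IUPAC codes needed for the consensus (W = A/T, R = A/G, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str = "WATCAANNNNTTR"):
    """Return sorted (start, strand, site) tuples; start is 1-based on the input strand."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlapping hits
    seq = seq.upper()
    n, k = len(seq), len(consensus)
    hits = [(m.start() + 1, "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand to input-strand coordinates.
        hits.append((n - (m.start() + k) + 1, "-", m.group(1)))
    return sorted(hits)


# Made-up sequence containing one site on the forward strand.
print(find_sites("GGCAATCAAGCTGTTGCCG"))
# [(4, '+', 'AATCAAGCTGTTG')]
```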
Mutagenesis of integrases.
The mutant donor plasmids were constructed using a Mut Express II fast mutagenesis kit (Vazyme Biotech Co., Ltd.). Briefly, with the wild-type donor plasmid pKDBIntKp as the template, a mutant donor plasmid carrying the Y363A mutation was generated by amplification using the mKp-F/mKp-R primer pair. Mutant donor plasmids for the other positions and integrases were obtained by the same strategy. The relevant primers are listed in Table 7.
Data availability.
The three assembled scaffold sequences from next-generation sequencing have been deposited in the National Microbiology Data Center (NMDC; https://nmdc.cn/en) under BioProject no. NMDC10018286, including genome accession no. NMDC60046094 (C-Ap1), NMDC60046095 (C-Se2), and NMDC60046096 (C-Se3).
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (grant no. 32070075) and the National Key Research and Development Program of China (grant no. 2022YFC2303900).
We declare that there are no conflicts of interest.
Footnotes
Supplemental material is available online only.
Contributor Information
Jie Feng, Email: fengj@im.ac.cn.
Isaac Cann, University of Illinois Urbana-Champaign.
REFERENCES
- 1. Wang J, Liu Y, Liu Y, Du K, Xu S, Wang Y, Krupovic M, Chen X. 2018. A novel family of tyrosine integrases encoded by the temperate pleolipovirus SNJ2. Nucleic Acids Res 46:2521–2536. doi:10.1093/nar/gky005.
- 2. Meinke G, Bohm A, Hauber J, Pisabarro MT, Buchholz F. 2016. Cre recombinase and other tyrosine recombinases. Chem Rev 116:12785–12820. doi:10.1021/acs.chemrev.6b00077.
- 3. Landy A. 2015. The λ integrase site-specific recombination pathway. Microbiol Spectr 3:MDNA3-0051-2014. doi:10.1128/microbiolspec.MDNA3-0051-2014.
- 4. Juhas M, van der Meer JR, Gaillard M, Harding RM, Hood DW, Crook DW. 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev 33:376–393. doi:10.1111/j.1574-6976.2008.00136.x.
- 5. Antonenka U, Nölting C, Heesemann J, Rakin A. 2005. Horizontal transfer of Yersinia high-pathogenicity island by the conjugative RP4 attB target-presenting shuttle plasmid. Mol Microbiol 57:727–734. doi:10.1111/j.1365-2958.2005.04722.x.
- 6. Groth AC, Calos MP. 2004. Phage integrases: biology and applications. J Mol Biol 335:667–678. doi:10.1016/j.jmb.2003.09.082.
- 7. Burrus V, Pavlovic G, Decaris B, Guédon G. 2002. Conjugative transposons: the tip of the iceberg. Mol Microbiol 46:601–610. doi:10.1046/j.1365-2958.2002.03191.x.
- 8. Grindley ND, Whiteson KL, Rice PA. 2006. Mechanisms of site-specific recombination. Annu Rev Biochem 75:567–605. doi:10.1146/annurev.biochem.73.011303.073908.
- 9. Nunes-Düby SE, Kwon HJ, Tirumalai RS, Ellenberger T, Landy A. 1998. Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26:391–406. doi:10.1093/nar/26.2.391.
- 10. Smith MC, Thorpe HM. 2002. Diversity in the serine recombinases. Mol Microbiol 44:299–307. doi:10.1046/j.1365-2958.2002.02891.x.
- 11. Stark WM. 2014. The serine recombinases. Microbiol Spectr 2:MDNA3-0046-2014. doi:10.1128/microbiolspec.MDNA3-0046-2014.
- 12. Lewis JA, Hatfull GF. 2001. Control of directionality in integrase-mediated recombination: examination of recombination directionality factors (RDFs) including Xis and Cox proteins. Nucleic Acids Res 29:2205–2216. doi:10.1093/nar/29.11.2205.
- 13. Jayaram M, Ma CH, Kachroo AH, Rowley PA, Guga P, Fan HF, Voziyanov Y. 2015. An overview of tyrosine site-specific recombination: from an flp perspective. Microbiol Spectr 3:MDNA3-0021-2014. doi:10.1128/microbiolspec.MDNA3-0021-2014.
- 14. Polo S, Sturniolo T, Dehó G, Ghisotti D. 1996. Identification of a phage-coded DNA-binding protein that regulates transcription from late promoters in bacteriophage P4. J Mol Biol 257:745–755. doi:10.1006/jmbi.1996.0199.
- 15. Calì S, Spoldi E, Piazzolla D, Dodd IB, Forti F, Dehò G, Ghisotti D. 2004. Bacteriophage P4 Vis protein is needed for prophage excision. Virology 322:82–92. doi:10.1016/j.virol.2004.01.016.
- 16. Kirby JE, Trempy JE, Gottesman S. 1994. Excision of a P4-like cryptic prophage leads to Alp protease expression in Escherichia coli. J Bacteriol 176:2068–2081. doi:10.1128/jb.176.7.2068-2081.1994.
- 17. Lesic B, Bach S, Ghigo JM, Dobrindt U, Hacker J, Carniel E. 2004. Excision of the high-pathogenicity island of Yersinia pseudotuberculosis requires the combined actions of its cognate integrase and Hef, a new recombination directionality factor. Mol Microbiol 52:1337–1348. doi:10.1111/j.1365-2958.2004.04073.x.
- 18. Gibb B, Gupta K, Ghosh K, Sharp R, Chen J, Van Duyne GD. 2010. Requirements for catalysis in the Cre recombinase active site. Nucleic Acids Res 38:5817–5832. doi:10.1093/nar/gkq384.
- 19. Krogh BO, Shuman S. 2000. Catalytic mechanism of DNA topoisomerase IB. Mol Cell 5:1035–1041. doi:10.1016/s1097-2765(00)80268-3.
- 20. Lin DL, Traglia GM, Baker R, Sherratt DJ, Ramirez MS, Tolmasky ME. 2020. Functional analysis of the Acinetobacter baumannii XerC and XerD site-specific recombinases: potential role in dissemination of resistance genes. Antibiotics (Basel) 9:405. doi:10.3390/antibiotics9070405.
- 21. Hirano N, Muroi T, Takahashi H, Haruki M. 2011. Site-specific recombinases as tools for heterologous gene integration. Appl Microbiol Biotechnol 92:227–239. doi:10.1007/s00253-011-3519-5.
- 22. Ma Y, Zhang L, Huang X. 2017. Building Cre knockin rat lines using CRISPR/Cas9. Methods Mol Biol 1642:37–52. doi:10.1007/978-1-4939-7169-5_3.
- 23. Roper J, Tammela T, Akkad A, Almeqdadi M, Santos SB, Jacks T, Yilmaz ÖH. 2018. Colonoscopy-based colorectal cancer modeling in mice with CRISPR-Cas9 genome editing and organoid transplantation. Nat Protoc 13:217–234. doi:10.1038/nprot.2017.136.
- 24. Carullo NVN, Hinds JE, Revanna JS, Tuscher JJ, Bauman AJ, Day JJ. 2021. A Cre-dependent CRISPR/dCas9 system for gene expression regulation in neurons. eNeuro 8:ENEURO.0188-21.2021. doi:10.1523/ENEURO.0188-21.2021.
- 25. Farrugia DN, Elbourne LD, Mabbutt BC, Paulsen IT. 2015. A novel family of integrases associated with prophages and genomic islands integrated within the tRNA-dihydrouridine synthase A (dusA) gene. Nucleic Acids Res 43:4547–4557. doi:10.1093/nar/gkv337.
- 26. Rocha EPC, Bikard D. 2022. Microbial defenses against mobile genetic elements and viruses: who defends whom from what? PLoS Biol 20:e3001514. doi:10.1371/journal.pbio.3001514.
- 27. Zhang G, Cui Q, Li J, Guo R, Leclercq SO, Du L, Tang N, Song Y, Wang C, Zhao F, Feng J. 2022. The integrase of genomic island GIsul2 mediates the mobilization of GIsul2 and ISCR-related element CR2-sul2 unit through site-specific recombination. Front Microbiol 13:905865. doi:10.3389/fmicb.2022.905865.
- 28. Nigro SJ, Hall RM. 2011. GIsul2, a genomic island carrying the sul2 sulphonamide resistance gene and the small mobile element CR2 found in the Enterobacter cloacae subspecies cloacae type strain ATCC 13047 from 1890, Shigella flexneri ATCC 700930 from 1954 and Acinetobacter baumannii ATCC 17978 from 1951. J Antimicrob Chemother 66:2175–2176. doi:10.1093/jac/dkr230.
- 29. Song L, Pan Y, Chen S, Zhang X. 2012. Structural characteristics of genomic islands associated with GMP synthases as integration hotspot among sequenced microbial genomes. Comput Biol Chem 36:62–70. doi:10.1016/j.compbiolchem.2012.01.001.
- 30. Wei J, Goldberg MB, Burland V, Venkatesan MM, Deng W, Fournier G, Mayhew GF, Plunkett G, 3rd, Rose DJ, Darling A, Mau B, Perna NT, Payne SM, Runyen-Janecky LJ, Zhou S, Schwartz DC, Blattner FR. 2003. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect Immun 71:2775–2786. doi:10.1128/IAI.71.5.2775-2786.2003.
- 31. Datsenko KA, Wanner BL. 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97:6640–6645. doi:10.1073/pnas.120163297.
- 32. Wilde C, Mazel D, Hochhut B, Middendorf B, Le Roux F, Carniel E, Dobrindt U, Hacker J. 2008. Delineation of the recombination sites necessary for integration of pathogenicity islands II and III into the Escherichia coli 536 chromosome. Mol Microbiol 68:139–151. doi:10.1111/j.1365-2958.2008.06145.x.
- 33. Hashimoto-Gotoh T, Franklin FC, Nordheim A, Timmis KN. 1981. Specific-purpose plasmid cloning vectors. I. Low copy number, temperature-sensitive, mobilization-defective pSC101-derived containment vectors. Gene 16:227–235. doi:10.1016/0378-1119(81)90079-2.
- 34. Hashimoto-Gotoh T, Yamaguchi M, Yasojima K, Tsujimura A, Wakabayashi Y, Watanabe Y. 2000. A set of temperature sensitive-replication/-segregation and temperature resistant plasmid vectors with different copy numbers and in an isogenic background (chloramphenicol, kanamycin, lacZ, repA, par, polA). Gene 241:185–191. doi:10.1016/S0378-1119(99)00434-5.
- 35. Solovyev V, Salamov A. 2011. Automatic annotation of microbial genomes and metagenomic sequences, p 61–78. In Li RW (ed), Metagenomics and its applications in agriculture, biomedicine and environmental studies. Nova Science Publishers, Hauppage, NY.
- 36. Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27:1017–1018. doi:10.1093/bioinformatics/btr064.
- 37. Chen S, Hu M, Hu A, Xue Y, Wang S, Liu F, Li C, Zhou X, Zhou J. 2022. The integration host factor regulates multiple virulence pathways in bacterial pathogen Dickeya zeae MS2. Mol Plant Pathol 23:1487–1507. doi:10.1111/mpp.13244.
- 38. Biswas T, Aihara H, Radman-Livaja M, Filman D, Landy A, Ellenberger T. 2005. A structural basis for allosteric control of DNA recombination by lambda integrase. Nature 435:1059–1066. doi:10.1038/nature03657.
- 39. Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. 2021. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49:D344–D354. doi:10.1093/nar/gkaa977.
- 40. Sigrist CJ, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L, Xenarios I. 2013. New and continuing developments at PROSITE. Nucleic Acids Res 41:D344–D347. doi:10.1093/nar/gks1067.
- 41. Manson JM, Gilmore MS. 2006. Pathogenicity island integrase cross-talk: a potential new tool for virulence modulation. Mol Microbiol 61:555–559. doi:10.1111/j.1365-2958.2006.05262.x.
- 42. Hsu PL, Ross W, Landy A. 1980. The lambda phage att site: functional limits and interaction with Int protein. Nature 285:85–91. doi:10.1038/285085a0.
- 43. Mizuuchi M, Mizuuchi K. 1980. Integrative recombination of bacteriophage lambda: extent of the DNA sequence involved in attachment site function. Proc Natl Acad Sci USA 77:3220–3224. doi:10.1073/pnas.77.6.3220.
- 44. Mohaisen MR, McCarthy AJ, Adriaenssens EM, Allison HE. 2020. The site-specific recombination system of the Escherichia coli bacteriophage Φ24B. Front Microbiol 11:578056. doi:10.3389/fmicb.2020.578056.
- 45. Mizuuchi M, Mizuuchi K. 1985. The extent of DNA sequence required for a functional bacterial attachment site of phage lambda. Nucleic Acids Res 13:1193–1208. doi:10.1093/nar/13.4.1193.
- 46. Fogg PC, Rigden DJ, Saunders JR, McCarthy AJ, Allison HE. 2011. Characterization of the relationship between integrase, excisionase and antirepressor activities associated with a superinfecting Shiga toxin encoding bacteriophage. Nucleic Acids Res 39:2116–2129. doi:10.1093/nar/gkq923.
- 47. Tong W, Warren D, Seah NE, Laxmikanthan G, Van Duyne GD, Landy A. 2014. Mapping the λ integrase bridges in the nucleoprotein Holliday junction intermediates of viral integrative and excisive recombination. Proc Natl Acad Sci USA 111:12366–12371. doi:10.1073/pnas.1413007111.
- 48.Abremski K, Gottesman S. 1982. Purification of the bacteriophage lambda xis gene product required for lambda excisive recombination. J Biol Chem 257:9658–9662. 10.1016/S0021-9258(18)34123-1. [DOI] [PubMed] [Google Scholar]
- 49.Nash HA. 1975. Integrative recombination of bacteriophage lambda DNA in vitro. Proc Natl Acad Sci USA 72:1072–1076. 10.1073/pnas.72.3.1072. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Thompson JF, Moitoso de Vargas L, Koch C, Kahmann R, Landy A. 1987. Cellular factors couple recombination with growth phase: characterization of a new component in the lambda site-specific recombination pathway. Cell 50:901–908. 10.1016/0092-8674(87)90516-2. [DOI] [PubMed] [Google Scholar]
- 51.Miller HI, Kikuchi A, Nash HA, Weisberg RA, Friedman DI. 1979. Site-specific recombination of bacteriophage lambda: the role of host gene products. Cold Spring Harb Symp Quant Biol 43:1121–1126. 10.1101/sqb.1979.043.01.125. [DOI] [PubMed] [Google Scholar]
- 52.Miller HI, Kirk M, Echols H. 1981. SOS induction and autoregulation of the himA gene for site-specific recombination in Escherichia coli. Proc Natl Acad Sci USA 78:6754–6758. 10.1073/pnas.78.11.6754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Yarnall MTN, Ioannidi EI, Schmitt-Ulms C, Krajeski RN, Lim J, Villiger L, Zhou W, Jiang K, Garushyants SK, Roberts N, Zhang L, Vakulskas CA, Walker JA, 2nd, Kadina AP, Zepeda AE, Holden K, Ma H, Xie J, Gao G, Foquet L, Bial G, Donnelly SK, Miyata Y, Radiloff DR, Henderson JM, Ujita A, Abudayyeh OO, Gootenberg JS. 2022. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat Biotechnol 10.1038/s41587-022-01527-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Anzalone AV, Gao XD, Podracky CJ, Nelson AT, Koblan LW, Raguram A, Levy JM, Mercer JAM, Liu DR. 2022. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat Biotechnol 40:731–740. 10.1038/s41587-021-01133-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
- 56.Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Supplementary Materials
Supplemental material. aem.01738-22-s0001.xlsx, XLSX file, 0.3 MB
Supplemental material. aem.01738-22-s0002.xlsx, XLSX file, 0.01 MB
Supplemental material. aem.01738-22-s0003.pdf, PDF file, 1.5 MB
Data Availability Statement
The three assembled scaffold sequences from next-generation sequencing have been deposited in the National Microbiology Data Center (NMDC; https://nmdc.cn/en) under BioProject no. NMDC10018286, with genome accession numbers NMDC60046094 (C-Ap1), NMDC60046095 (C-Se2), and NMDC60046096 (C-Se3).





