Structural Determinants of Sleeping Beauty Transposase Activity

György Abrusán; Stephen R Yant; András Szilágyi; Joseph A Marsh; Lajos Mátés; Zsuzsanna Izsvák; Orsolya Barabás; Zoltán Ivics

doi:10.1038/mt.2016.110

2016 Jul 12;24(8):1369–1377. doi: 10.1038/mt.2016.110


PMCID: PMC5010145 EMSID: EMS68530 PMID: 27401040

Abstract

Transposases are important tools in genome engineering, and there is considerable interest in engineering more efficient ones. Here, we seek to understand the factors determining their activity using the Sleeping Beauty transposase. Recent work suggests that protein coevolutionary information can be used to classify groups of physically connected, coevolving residues into elements called “sectors”, which have proven useful for understanding the folding, allosteric interactions, and enzymatic activity of proteins. Using extensive mutagenesis data, protein modeling and analysis of folding energies, we show that (i) The Sleeping Beauty transposase contains two sectors, which span across conserved domains, and are enriched in DNA-binding residues, indicating that the DNA binding and endonuclease functions of the transposase coevolve; (ii) Sector residues are highly sensitive to mutations, and most mutations of these residues strongly reduce transposition rate; (iii) Mutations with a strong effect on free energy of folding in the DDE domain of the transposase significantly reduce transposition rate. (iv) Mutations that influence DNA and protein-protein interactions generally reduce transposition rate, although most hyperactive mutants are also located on the protein surface, including residues with protein-protein interactions. This suggests that hyperactivity results from the modification of protein interactions, rather than the stabilization of protein fold.

Introduction

Recent findings identified a structural organization of protein domains that is distinct from their known hierarchical organization into secondary and tertiary structural elements. These structures, termed “sectors”,¹ form physically connected networks of coevolving residues within proteins, and span across secondary structural elements. Sectors are identified using multiple alignments with a procedure called Statistical Coupling Analysis (SCA), which uses the covariance matrix of amino acid variability at different positions of the alignment, and their conservation.¹ It has been noticed that the residues that show correlated evolution in the alignments have a block structure in the SCA matrix: they can be partitioned into clusters of residues, which show correlated evolution within the cluster, but are essentially uncorrelated with residues of other clusters. These groups of coevolving residues were termed “sectors”, in analogy to financial sectors.^1,2 Several important biological properties of proteins are determined by sectors: although they typically make up only 10–30% of the residues of a protein, they were shown to significantly contribute to the specification of protein folds,³ allosteric communication in proteins,⁴ and evolution of novel functions.⁵ Since it is possible to engineer functional artificial protein folds based purely on sector information,⁶ or modify their functions using sector residues⁵ (at least in small domains), sectors are of considerable importance also for protein engineering. However, most work to date on the architecture, functions and importance of sectors has focused on relatively few single-domain proteins, often with only a single sector,^1,4,5,7 and the number of studies with multidomain and multisector proteins is low.^1,8 Thus, it is unclear to what degree the current findings can be generalized, and whether sectors are of similar importance in more complex multidomain structures as in small proteins.²

Most DNA transposons contain a single gene encoding the transposase protein, which is flanked by terminal inverted repeats (TIRs). Transposons “jump” by a cut-and-paste mechanism, during which the transposase moves the sequence flanked by TIRs to a new genomic location. Since transposases require only the TIRs, and any sequence flanked by TIRs can be moved by externally supplied transposases, they can be used for gene transfer.⁹ In consequence, transposons are popular tools that are widely used for genome engineering, including cancer gene identification by insertional mutagenesis,¹⁰ germline transgenesis,¹¹ somatic gene transfer for gene therapy,⁹ or cellular reprogramming.¹² Their primary advantage over viral vectors for gene therapy is that they have considerably fewer side effects, including low immunogenicity and genotoxicity, while, at least for some applications, they provide stable transgene expression levels with efficiency matching viral vectors.⁹ Several transposon systems are currently applied as genome-engineering tools, including the piggyBac, Tol2, and Sleeping Beauty transposons.^{13,14,15,16,17,18} The first DNA transposon tool capable of gene transfer in vertebrates was Sleeping Beauty (SB), which was reconstructed from extinct Tc1/mariner transposons in fish.¹⁹ Sleeping Beauty, and especially its hyperactive variant,²⁰ is still one of the most widely used transposon tools, and it is the only transposon vector currently in human clinical trials.^21,22

In this work, using our extensive mutagenesis data available for the Sleeping Beauty transposase, we investigate the structural elements that are the most sensitive to mutations, with particular emphasis on protein sectors. We show that sectors are enriched in DNA-binding residues and are highly sensitive to mutations, which cannot be explained by positional conservation. In addition, our analysis suggests that hyperactivity results from the modification of protein-protein interactions, rather than improved protein folding. Wild-type transposases are not optimal for practical use, because they evolved to transpose at relatively low frequency, as high transposition rates harm their host. As a consequence, modifying their activity or insertion patterns through point mutations is of considerable practical importance, and our results may aid their optimization by identifying mutations that are likely to result in transposases with reduced transposition rate.

Results

Determination of the tertiary structure of SB transposase and protein core

The amino acid sequence of the Sleeping Beauty transposase was obtained from Ivics et al.¹⁹ Experimentally determined protein structures are available for the DDE domain of the transposase²³ and the N-terminal HTH motif of the DNA-binding domain,²⁴ but not for the entire transposase. Thus we predicted the tertiary structure of Sleeping Beauty with the I-TASSER molecular modeling platform,^25,26 which uses threading and also ab-initio modeling for structure prediction. Additionally, we used the coordinates of the existing experimental structures (see above) as constraints (Supplementary Figure S1a). Due to the availability of high-quality templates, a high-quality structure prediction was possible: the estimated template modeling (TM) score²⁷ of the predicted tertiary structure with an experimentally determined structure is 0.86 (± 0.1). Models of this quality can be successfully used in mutagenesis studies and stability analyses.²⁸ The most similar structure in the Protein Data Bank (PDB, http://www.rcsb.org) to the predicted structure (supplementary SB.pdb file) is the Mos1 transposase,²⁹ which was also the highest ranking template used by I-TASSER (see Supplementary Figure S1b,c for structural alignments between the Mos1 transposase and the predicted structure, and Supplementary Figure S2 for a Ramachandran plot of the predicted SB transposase using PROCHECK³⁰). Transposases typically function in a dimeric^29,31 or tetrameric enzyme complex^32,33 (and the N-terminal domain of SB was reported to be able to form tetramers in vitro³²), but the high structural and mechanistic similarity of the monomer to Mos1 strongly suggests that the active core unit of the complex is a dimer very similar to the one seen for Mos1. (Nevertheless, tetramers may exist and may even be the functional state, for example during assembly.)
Thus the monomer produced by I-TASSER was used to build a dimer, using the Mos1 (3HOT) transposase as a template. DNA nucleotides were replaced with Chimera³⁴ to match the inverted repeats of SB; next, the SB transposases were superposed over the Mos1 dimer (3HOT) with TMalign,³⁵ followed by correction of clashes and minimization. Severe atomic overlaps (e.g., rings penetrated by other groups) in the initial complex model were manually corrected (Supplementary Figure S1d). The model was then subjected to energy minimization in vacuo by the steepest descent algorithm in GROMACS5 (ref. 36) using the CHARMM27 force field. The minimization converged to machine precision with no remaining overlaps between atoms. Visualizations of the protein structures were made with Chimera.

As buried residues in proteins are known to be less tolerant of mutations than exposed residues,³⁷ we determined relative solvent accessibility of each residue of the structure with DSSP³⁸ (see Methods and Supplementary Table S1). The 75 residues with relative solvent accessibility <= 0.1 were assumed to form the protein core. Residues that take part in protein-protein interactions were determined using the difference in solvent accessible surface areas of the monomeric and dimeric form of SB: all residues that have different solvent accessible surface areas in the dimer and monomer were assumed to take part in protein-protein interactions. DNA-protein interactions were determined with the SNAP tool of the 3DNA package.³⁹
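The two residue classifications described above can be illustrated with a minimal Python sketch. It assumes that per-residue relative solvent accessibility and per-residue solvent accessible surface areas have already been extracted (for example from DSSP output) into plain dictionaries; the function names, the dictionary layout and the small numerical tolerance are hypothetical and are not part of the original analysis.

```python
# Minimal sketch; inputs are hypothetical dictionaries keyed by residue number.
CORE_RSA_CUTOFF = 0.1  # residues with RSA <= 0.1 are treated as protein core


def core_residues(rsa_by_residue, cutoff=CORE_RSA_CUTOFF):
    """Return sorted residue numbers whose relative solvent accessibility
    is at or below the cutoff."""
    return sorted(r for r, rsa in rsa_by_residue.items() if rsa <= cutoff)


def interface_residues(sasa_monomer, sasa_dimer, tol=1e-6):
    """Return sorted residue numbers whose solvent accessible surface area
    differs between the monomer and the dimer.

    The tolerance `tol` is an assumption added to absorb rounding noise.
    """
    return sorted(
        r for r in sasa_monomer
        if r in sasa_dimer and abs(sasa_monomer[r] - sasa_dimer[r]) > tol
    )
```

For example, `core_residues({1: 0.05, 2: 0.1, 3: 0.5})` keeps residues 1 and 2, and a residue is reported as an interface residue only if its surface area changes upon dimer formation.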

Identification of sectors of the SB transposase

To identify sectors in SB, multiple alignments were made with three different state-of-the-art tools: muscle,⁴⁰ probcons,⁴¹ and mafft⁴² (see Methods). Using the three alignments, statistical coupling analyses were performed to identify protein sectors, with the method described by Halabi et al.,¹ using a modified MATLAB script provided by the same study. SCA tests whether the conservation of an amino acid at any position in the sequence alignment is correlated with the conservation of any other residue of the protein,⁴ i.e., it identifies residues that coevolve. First, it builds a weighted correlation matrix of coevolving amino acids for all residues in the alignment (Figure 1a), and this matrix is subsequently cleaned of statistical noise with a randomization method.¹ The analysis of eigenvalue spectra identified three significant eigenvalues for all three alignments (after the exclusion of the largest one), indicating that there might be up to three sectors in the protein. However, after examining residue weights along eigenvectors 2–4, we could identify only two sectors along eigenvector 2 (see Supplementary Figure S3) that had similar residue compositions irrespective of the alignment used (Supplementary Tables S3 and S4). Due to the different spatial pattern of the residue weights of the three alignments, attempts to identify a third sector resulted in a poorly defined sector, which had different residues depending on the alignment used, and was also strongly correlated with the other two sectors. In consequence, we use only the two sectors that could be consistently defined in all three alignments, which together contain 72–78 residues, depending on the alignment. The cleaned SCA matrices of all three alignments show that the two sectors are essentially independent (Figure 1b), i.e., the correlations between the residues of a sector are much stronger than the correlations between sectors.

**Identification of sectors and conserved domains in the *Sleeping Beauty* (SB) transposase**. (a) Statistical Coupling Analysis (SCA) matrix for the muscle alignment of 289 homologous sequences present in RepBase (+SB). The matrix represents correlations between amino acid frequencies at each position of the alignment, *i.e.*, residue pairs that coevolve. (b) Cleaned SCA matrices for three alignments made with *muscle*, *probcons*, and *mafft* aligners, containing the residues of the two sectors. Residues within sectors show correlated evolution, while there is almost no correlation between sectors. (c) The transposase contains three Pfam conserved domains; two HTH domains with DNA binding functions, and a DDE domain with endonuclease activity. (d) The distribution of conservation scores (D) across the sequence. (**e,f**) The location of the two sectors identified with the *muscle* alignment in the tertiary structure of the SB transposase. The sectors are located across secondary structure elements, and are less compact than the ones reported so far, possibly due to the low sequence similarity in the alignments. Both sectors have residues in multiple conserved domains; most notably sector 2, which has residues in all three Pfam domains of the protein. (g) The location of the conserved residues (D > 0.5, muscle alignment, see also **Supplementary Table S2**) of the transposase. (h) The residues of the protein core. All residues with relative solvent accessibility below 0.1 are highlighted with red.

The location of the sectors in the transposase structure is somewhat different from the pattern observed in smaller proteins¹ (Figure 1e,f). Residues of both sectors are located in more than one conserved domain, and in the case of the second sector, residues are present in all three Pfam⁴³ domains of the transposase (Figure 1c,f), indicating that the division into conserved domains does not strictly correspond with the units of the protein that actually coevolve. Sectors (but also conserved residues) are enriched in DNA-binding residues: their fraction is 29%, as opposed to the 17% observed for the entire protein (P < 0.05 for all three alignments, tests of proportions), but there is no significant difference between the two sectors. The residues of sectors are physically less tightly connected than in most small proteins examined so far, which may arise from the low sequence conservation of the alignments: inaccuracies in the alignments due to the low sequence similarity result in noise, which reduces correlations among residues, and in consequence SCA may fail to detect certain residues as sector residues. To a lesser degree, minor inaccuracies in the transposon sequences themselves may contribute to such noise, as many transposon sequences (including Sleeping Beauty) are reconstructions of extinct repeats.

The dependence of transposition rate on sectors, protein core, and conservation

To examine the effect of different residues and protein regions on transposition rate, we used transposition rate measurements of 286 SB mutants, which represent a compilation of all Sleeping Beauty point mutations known to us and also unpublished mutants (see Methods). The distribution of the 286 point mutations is approximately uniform across the SB transposase sequence (Figure 2a); however, their amino acid distribution is not, as the majority of mutants were alanine replacements (Supplementary Figure S4). In general, the transposition rates of mutants vary significantly, from completely inactivating the transposase to significantly increasing the transposition rate (Figure 2a). The location of the residues in the protein structure has a large influence on their effect: mutations in protein sectors, conserved residues (D > 0.5, see Methods) and the protein core result in a significantly larger reduction in transposition rate in comparison with the residues that do not belong to any of these groups (Figure 2b, both sectors, conserved residues and the core are significantly different from other residues, P << 0.05, pairwise comparisons with Mann-Whitney U-tests).

**Effect of residue location on the transposition rate of SB mutants.** (a) The location of the mutations along the SB transposase sequence, and their effect on transposition rate. The 286 mutants are distributed approximately evenly across the sequence; the majority of mutants reduce transposition rate (< 100% of SB). None of the Pfam conserved domains show a clear difference from the rest of the sequence. (b) The effect of sectors, conserved residues and protein core on transposition rate (median, box: 25–75%, whiskers: 10–90%). Mutants in both sectors, conserved residues and residues of the protein core have significantly lower transposition rates than other residues, irrespective of the aligner used (P < 0.05, Mann-Whitney U-tests).

Sectors represent an extension of the traditional concept of conservation, and there is significant overlap between residues that are part of a sector and also have high positional conservation (D > 0.5). Recently, it has been questioned whether the effect of sectors on transposition rate is independent from the effect of conservation.² To test this, we split sector and conserved residues into three groups: sector residues with low conservation (D < 0.5), sector residues with high conservation, and conserved residues that are not part of a sector. The comparison of these groups with residues that are neither part of sectors, nor the protein core, and are also not conserved (“other” residues, Figure 3) indicates that the effect of sectors on transposition rate is not simply due to positional conservation, as the three groups are significantly different from the “other” residues (P < 0.05 for all comparisons except “conserved only” of the mafft alignment, Fisher post-hoc tests, analysis of variance on log transformed transposition rates), and there is no significant difference between nonconserved sector residues and conserved nonsector residues (P > 0.05 in all three alignments, Fisher post-hoc tests, analysis of variance).

**The effect of sectors on transposition rate is not a by-product of positional conservation**. Sector and conserved residues were split into three groups: sector residues with low positional conservation (D < 0.5), sector residues with high positional conservation, and conserved residues that are not part of any sector. Transposition rates of mutants (median, box: 25–75%, whiskers: 10–90%) in all three groups are significantly different from mutants in other residues (P << 0.05 for all comparisons, Fisher post-hoc tests, analysis of variance on log transformed transposition rates), and there is no significant difference between the mutants of nonconserved sector and conserved but nonsector residues (P > 0.05 in all three alignments, Fisher post-hoc tests).

The effect of mutations on protein stability

Most proteins can function only in a narrow range of folding energies,⁴⁴ as unstable proteins may not fold properly and very stable ones may be too rigid to perform their functions. Mutating a residue in a protein can have significant effects on its overall stability (ΔG, the free energy of unfolding) and function, thus we tested whether the differences in transposition rate between the sectors, conserved residues and core of the protein and other residues are caused by their effect on protein stability, measured as the difference of the predicted folding energy (ΔΔG) between the wild type SB transposase and the mutants. The analysis shows that mutations in sector, conserved and core residues usually have a destabilizing effect on the structure (ΔΔG > 0, P << 0.05, t-tests; Figure 4a).

**The effect of mutations on the change of the free energy of unfolding (median, box: 25–75%, whiskers: 10–90%)**. (a) Mutations in sectors and the core are significantly more destabilizing (ΔΔG > 0) than mutants of other residues (P < 0.05 for all comparisons, t-tests). (b) The monomer of SB. The flexible N-terminal arm of the protein containing the HTH domains (residues 1–120) is indicated with white, the globular part (residues 121–340), which contains the DDE domain, with gray. (c) In the flexible arm the effect of mutations on ΔΔG is not correlated with transposition rate (P = 0.95). (d) In the globular region we find a significant negative correlation (P << 0.001, R = −0.51) between ΔΔG and transposition rate.

Although three conserved domains were identified in the SB sequence, an analysis of the flexibility of the structure with the PiSQRD tool⁴⁵ and also recent analyses of the Mos1 and SB transposase^24,46 indicate that the structure can be split into two large regions: the relatively flexible N-terminal part of the protein containing the DNA binding HTH-domains (residues 1–120), and a rigid, globular region (residues 121–340) containing the DDE domain (Figure 4b and Figure 1c). Mutations have different effects on folding energies in these two regions; while we detected a clear negative correlation (P << 0.001, R = −0.51) between transposition rate and ΔΔG (Figure 4d) in the globular part of the protein, there is no such relationship (P = 0.95, R = 0.0049) in the N-terminal region containing the HTH domains (Figure 4c).

Next, we tested whether mutants in the two regions have different effects on the transposition rate of SB, and we found that the two regions are markedly different. In the flexible part of the protein, mutants of sector, conserved and core residues do not differ significantly from the remaining residues (P > 0.05 for all comparisons, Mann-Whitney U-tests, Figure 5a), while in the region containing the DDE domain there is a highly significant difference (P << 0.001 for all comparisons, Mann-Whitney U-tests, Figure 5b). Additionally, in the DDE domain, 50% of the mutants of “other” residues have higher transposition rates than the wild type.

**The effect of residue location on transposition rate, in the two regions of the protein (median, box: 25–75%, whiskers: 10–90%)**. (a) In the HTH-region, mutants of sectors, conserved or buried residues are not significantly different from the remaining mutants (P > 0.05 for all comparisons, Mann-Whitney U-tests). (b) In the DDE domain the differences are highly significant (P < 0.05 for all comparisons, Mann-Whitney U-tests), even after correcting for the different effects of free energy of folding (c). Note that in the DDE domain 50% of mutants of “other” residues are characterized by higher activity than the wild type SB.

As the location of mutations has a significant effect on the free energy of folding, and in the DDE domain ΔΔG is correlated with transposition rates (Figure 4d), we tested whether the effect of sectors remains significant if we remove the effect of ΔΔG on transposition rate, i.e., we adjust all rates to ΔΔG = 0. The results show that the corrected transposition rates are still highly significantly different from other residues (P < 0.05 for all comparisons, Mann-Whitney U-tests, Figure 5c), thus the biological effect of sectors and conserved residues cannot be explained with their effect on ΔΔG alone.
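The source does not spell out how the rates were adjusted to ΔΔG = 0. One plausible reading, sketched below under that assumption, is to fit a straight line to the log-transformed transposition rates as a function of ΔΔG and then subtract the fitted ΔΔG contribution from each mutant. The function names and the choice of ordinary least squares on log rates are illustrative assumptions, not the authors' documented procedure.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept (pure Python)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjust_rates_to_zero_ddg(ddg, rates):
    """Remove the fitted linear effect of ddG from log transposition rates.

    Assumption: rates are strictly positive (e.g. percent of wild type);
    completely inactive mutants would need separate handling.
    """
    log_rates = [math.log(r) for r in rates]
    slope, _ = fit_line(ddg, log_rates)
    # Shift every mutant along the fitted line to ddG = 0.
    return [math.exp(lr - slope * d) for d, lr in zip(ddg, log_rates)]
```

With this kind of correction, group comparisons (such as the Mann-Whitney U-tests reported above) can be repeated on the adjusted rates, so that any remaining difference between groups is not attributable to the fitted ΔΔG trend.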

The effect of protein-protein and protein-DNA interactions on transposition rate

Transposases typically form protein complexes during transposition, and recent studies on mariner transposases related to Sleeping Beauty (Hsmar1 and Mos1) indicate that mutants that disrupt allosteric communication within the transposase dimer are characterized by increased activity.^47,48,49 In particular, almost all mutants of the conserved WVPHEL motif (except P and E) of Hsmar1 transposase were hyperactive,⁴⁸ most likely due to lowering the kinetic barrier to synapsis.^50,51 Our findings suggest that the mechanism that causes hyperactivity in SB may be comparable to Hsmar1, and probably involves the modification (or disruption) of protein-protein interactions (although the WVPHEL motif is not conserved in the SB transposase). This hypothesis is also consistent with the observation that the relationship between transposase concentration and SB activity is similar to Hsmar1 (ref. 51). We tested whether mutants of residues taking part in protein-protein and DNA-protein interactions have different transposition rates than other residues at the protein surface. In general, when outliers are excluded, mutants of both protein and DNA-interacting residues have significantly lower transposition rates than other residues at the surface (Figure 6, P < 0.05, Kruskal-Wallis test). However, all but two of the hyperactive mutants (with 300% or higher activity) are located at the protein surface, and none are present in the core of the DDE domain. Of the 12 hyperactive surface mutations, four are in the protein-protein interfaces of the dimer (including the most active mutant), and none are in DNA-protein interfaces (see Figure 7 and supplementary Chimera visualization). Since the SB transposase can probably also form tetramers during transposition,³² there are probably more residues that take part in protein-protein interactions, suggesting that modification of interactions might be a key factor responsible for hyperactivity.

**The effect of DNA-protein and protein-protein interactions on transposition rate**. When outliers are excluded, mutants of residues interacting with DNA (“DNA”) and the other SB chain (“PPI”) have significantly lower transposition rates (P < 0.05, Mann-Whitney U-test) than other residues located at the surface (RSA > 0.1). Surprisingly, 12 of the 14 hyperactive mutants (outliers, 300+% activity) are also located at the protein surface, and none in the DNA binding regions, suggesting that the modification of protein-protein interactions might be responsible for their dramatically increased activity. (Outliers with identical transposition rates were shifted by 10%, for visibility.)

**The location of the 14 hyperactive mutations in the SB dimer.** Yellow residues represent mutants in protein-protein interfaces, red residues other mutants. As SB probably also forms a tetramer in certain phases of transposition, the number of residues taking part in protein-protein interactions is probably higher. (See also the supplementary “hyperactive.py” Chimera file.)

Discussion

We performed an analysis of protein sectors in a relatively large, multidomain protein with a complex tertiary and quaternary structure, and attempted to predict the effect of mutations on transposition rate, based on their location and effect on protein stability. Although sector identification depends on the alignment used, we could identify two sectors in the SB transposase, regardless of the alignment method. Most previous studies focused on smaller, single-domain proteins,^1,7,52 and one study⁸ identified a sector that spans two domains; our analysis indicates that sectors can span multiple conserved domains of a protein (Figure 1e,f), and, in the case of SB, are enriched in DNA binding residues. There may be at least two explanations for the observation that sector residues are present in more than one domain. First, in some stages of transposition these residues may be in physical contact. Second, since sector identification is a purely statistical procedure that searches for coevolving residues in the entire protein sequence, in the case of two-domain (or multidomain) proteins where both domains are necessary for the protein to function, coevolution between the domains is highly likely, and sectors that are confined to a single domain are probably present only in domains that are essentially independent.

A significant effect of mutations on transposition rate could be demonstrated in sectors, the protein core, conserved residues, and the protein-protein and protein-DNA interfaces: mutating these residues typically resulted in transposases with low transposition rates. Recently, Teşileanu et al. suggested that depending on the method used for sector identification, any biological effect of the first sector may be the consequence of sequence conservation alone.² Since we used the method of Halabi et al.¹ for sector identification, which does not use the first eigenvector of the SCA correlation matrix, their concerns do not apply to our results. However, as a significant fraction of sector residues are conserved, we also analyzed sector and conserved residues independently, and show that mutations of nonconserved sector residues have a similar effect on transposition rate as mutations of conserved but nonsector residues (Figure 3); thus the biological functions of sectors cannot be explained with conservation alone.

In comparison with smaller proteins,⁵ the influence of sectors on protein function appears to be more complicated in SB, and to depend on the tertiary structure. In the globular part of the protein, we could detect a clear effect of sectors, conservation and core on transposition rate, even when the effect of the free energy of folding was excluded (Figure 5). However, in the flexible part of the protein containing the HTH-domains, we found no effect of sectors, nor a correlation between transposition rate and ΔΔG (Figure 4), which indicates that further studies are needed to evaluate the importance of sectors in nonglobular (including disordered and coiled coil) proteins.

While we did not find a “recipe” for making hyperactive mutants of SB, our analysis allows prioritizing residues for targeted mutagenesis. Half of the mutants of DDE-domain residues that are not part of sectors, conserved residues or the protein core have increased transpositional activity. In addition, 12 out of the 14 hyperactive mutants (mutants with at least 3× increased activity compared to the wild type) are located at the protein surface, and 4 of them are in the protein-protein interfaces of the dimer, suggesting that, similarly to Hsmar1, the disruption of self-regulating protein-protein interactions may be an important factor in generating hyperactive mutants. In contrast, no hyperactive mutants are present in DNA-protein interfaces or in the buried residues (core) of the DDE domain. Since mutations of these regions typically strongly reduce the rate of transposition, this suggests that although SB is a reconstructed sequence and is most likely inaccurate to some degree, both DNA binding and folding are close to optimal.

Materials and Methods

Identification of SB homologs and making of multiple alignments. Transposase sequences homologous to Sleeping Beauty were identified in the 6-frame translated RepBase database (v17.12),⁵³ the main database of eukaryotic transposable elements, using the jackhmmer tool of the HMMER 3.0 package,⁵⁴ with bit score cutoff 27. We excluded from the hits all matches that show homology only to a short fragment of SB, and kept only those hits that span at least from residues 50 to 290 of the SB transposase, thus covering more than 70% of the sequence. Next, to remove sequences with high similarity (>90%), the homologous sequences were clustered with uclust.⁵⁵ The determination of protein sectors depends on multiple sequence alignments, but in the case of SB the average pairwise sequence similarity between the homologous sequences is low (19%), and in this low range of sequence similarity only approximately 50–80% of the residues can be aligned correctly with current methods.⁵⁶ This means that the choice of the aligner may influence the results significantly (i.e., the determination of sector and conserved residues), and to estimate the biases introduced by different alignment methods, we used three different alignment tools: muscle,⁴⁰ probcons,⁴¹ and mafft.⁴² After aligning the sequences, the alignments were trimmed to the 340 residues of SB transposase, i.e., we removed all columns with gaps in the SB sequence. All three alignments are available for download as Supplementary Data.
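The coverage filter and the trimming step described above can be sketched as follows. This is only an illustration: it assumes hit coordinates are expressed in SB residue numbering and that the alignment is held as a dictionary of equal-length gapped strings; the function names and the reference key "SB" are hypothetical.

```python
def spans_required_region(hit_start, hit_end, start=50, end=290):
    """True if a hit covers at least residues `start`..`end` of SB
    (coordinates assumed to be in SB residue numbering)."""
    return hit_start <= start and hit_end >= end


def trim_to_reference(alignment, ref_id="SB", gap="-"):
    """Drop every alignment column in which the reference sequence has a gap,
    so that the trimmed alignment has one column per reference residue."""
    ref = alignment[ref_id]
    keep = [i for i, ch in enumerate(ref) if ch != gap]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }
```

For instance, trimming `{"SB": "MG-KS", "hit1": "MAAK-"}` removes the third column (a gap in SB) and yields sequences of length four; applied to the real alignments, the same operation would leave the 340 columns corresponding to the SB transposase.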

Determination of conservation and SCA calculations. Conservation (D, Kullback-Leibler entropy) at any given position of the sequence was defined as the divergence of the observed frequency from the background frequency of the most frequent residue at the position, and was calculated with the following equation: D = f ln(f/q) + (1 − f) ln((1 − f)/(1 − q)),¹ where f denotes the frequency of the amino acid at a given position of the sequence, and q represents its background frequency. We used the same background frequencies as in ref. 1, and conserved residues were defined as residues with D > 0.5. For both SCA and D calculations, we excluded from the alignments all positions where the frequency of gaps was higher than 30%. SCA calculations (calculating the correlation matrix, spectral cleaning, randomization of the alignments) were performed with a modified MATLAB script provided by Halabi et al.¹ Sectors were determined by a visual examination of residue weights of eigenvectors 2–4 (see Halabi et al. for details): sector 1 was defined as residues with weights < −0.05 along eigenvector 2, and sector 2 as residues with weights > 0.05 along eigenvector 2 (see Supplementary Figure S3).
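As a worked illustration of the conservation measure and the sector thresholds given above, the following Python sketch evaluates D for a single position and assigns positions to sectors from their weights along eigenvector 2. The function names and the input format (a dictionary of position to weight) are assumptions made for the example; the actual SCA calculations were performed with the MATLAB script mentioned above.

```python
import math


def conservation_D(f, q):
    """Kullback-Leibler conservation D = f ln(f/q) + (1-f) ln((1-f)/(1-q)).

    f: observed frequency of the most frequent amino acid at a position.
    q: background frequency of that amino acid (must satisfy 0 < q < 1).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("background frequency q must be in (0, 1)")
    if not 0.0 <= f <= 1.0:
        raise ValueError("frequency f must be in [0, 1]")
    d = 0.0
    if f > 0.0:  # the term f ln(f/q) tends to 0 as f -> 0
        d += f * math.log(f / q)
    if f < 1.0:  # the term (1-f) ln(...) tends to 0 as f -> 1
        d += (1.0 - f) * math.log((1.0 - f) / (1.0 - q))
    return d


def assign_sectors(weights_ev2, cutoff=0.05):
    """Split positions into two sectors by their weight along eigenvector 2:
    sector 1 has weights < -cutoff, sector 2 has weights > cutoff."""
    sector1 = sorted(p for p, w in weights_ev2.items() if w < -cutoff)
    sector2 = sorted(p for p, w in weights_ev2.items() if w > cutoff)
    return sector1, sector2
```

A position whose most frequent residue occurs exactly at its background frequency gives D = 0, while a fully conserved position with q = 0.05 gives D = ln 20 ≈ 3.0, well above the D > 0.5 threshold used here for conserved residues.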

Construction of SB mutants and determining their transposition rate. The mutants were partly obtained from published studies,^{20,57,58,59,60} and partly (~80 mutants) represent unpublished material. Site-directed mutagenesis of the transposase gene was performed by polymerase chain reaction following the QuikChange (Stratagene) principle. The mutants were tested against the corresponding wild-type SB transposase in cell-based transposition assays, as originally described by Ivics et al.¹⁹

Stability calculations and in-silico mutagenesis. The free energy of unfolding (ΔG) of the SB transposase, and its change upon mutation, were calculated with the FoldX tool (version 4),^{61,62} using the predicted structure of the SB complex. First, the structure produced by I-TASSER was optimized with the RepairPDB function to correct torsion angles, van der Waals clashes and total energies. Next, we calculated the effect of the mutations on the ΔG of the structure for the 286 mutants. The difference in ΔG between the “wild type” SB and its mutants is given as ΔΔG; positive values indicate destabilizing mutations, and negative values indicate stabilizing mutations.
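The sign convention for ΔΔG can be made concrete with a short sketch. It assumes that ΔΔG is computed as the mutant value minus the wild-type value, which is consistent with positive values being destabilizing; the text does not spell out the direction of the subtraction. The energies below are made-up numbers, not FoldX output, and the function names are hypothetical.

```python
def ddg(dg_mutant, dg_wild_type):
    """Change in ΔG upon mutation (assumed: mutant minus wild type)."""
    return dg_mutant - dg_wild_type


def classify(ddg_value):
    """Label a mutation by the sign of its ΔΔG, following the convention in the text."""
    if ddg_value > 0:
        return "destabilizing"
    if ddg_value < 0:
        return "stabilizing"
    return "neutral"


# Made-up energies in kcal/mol, for illustration only.
dg_wt = -12.5
toy_mutants = {"mutantA": -10.0, "mutantB": -13.0}

labels = {name: classify(ddg(dg, dg_wt)) for name, dg in toy_mutants.items()}
```

In practice a tolerance around zero is often applied before calling a mutation stabilizing or destabilizing, since small predicted energy differences fall within the error of the method; no such cutoff is stated in this section.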

SUPPLEMENTARY MATERIAL
Figure S1. Structural characteristics of the Sleeping Beauty transposase.
Figure S2. Ramachandran plot of the Sleeping Beauty transposase.
Figure S3. Residue weights along eigenvectors 2-3 of the SCA correlation matrix.
Figure S4. The amino acid distribution of the mutations.
Table S1. Relative solvent accessibility of SB transposase residues.
Table S2. Conservation (D) of residues in the three alignments.
Table S3. Sector residues based on the muscle, probcons and mafft alignments.
Table S4. The number of shared residues between sectors based on different alignments.
Supplementary Data

Acknowledgments

We thank Mark A. Kay for support, and the referees for useful comments and suggestions. G.A. was supported by the Hungarian Scientific Research Fund (OTKA grant PD83571) and the Medical Research Council (grant MR/M02122X/1 to J.M.). A.S. was supported by the OTKA grant K105415. Z.I. was supported by the European Union (EU FP5 JUMPY and EU FP6 INTHER grants), and also by grants from the Volkswagen Stiftung and the Bundesministerium für Bildung und Forschung (NGFN-2). J.A.M. was supported by the Medical Research Council (MR/M02122X/1). O.B. was supported by the EMBL.

References

1. Halabi, N, Rivoire, O, Leibler, S and Ranganathan, R (2009). Protein sectors: evolutionary units of three-dimensional structure. Cell 138: 774–786.
2. Teşileanu, T, Colwell, LJ and Leibler, S (2015). Protein sectors: statistical coupling analysis versus conservation. PLoS Comput Biol 11: e1004091.
3. Socolich, M, Lockless, SW, Russ, WP, Lee, H, Gardner, KH and Ranganathan, R (2005). Evolutionary information for specifying a protein fold. Nature 437: 512–518.
4. Süel, GM, Lockless, SW, Wall, MA and Ranganathan, R (2003). Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol 10: 59–69.
5. McLaughlin, RN Jr, Poelwijk, FJ, Raman, A, Gosal, WS and Ranganathan, R (2012). The spatial architecture of protein function and adaptation. Nature 491: 138–142.
6. Russ, WP, Lowery, DM, Mishra, P, Yaffe, MB and Ranganathan, R (2005). Natural-like function in artificial WW domains. Nature 437: 579–583.
7. Reynolds, KA, McLaughlin, RN and Ranganathan, R (2011). Hot spots for allosteric regulation on protein surfaces. Cell 147: 1564–1575.
8. Smock, RG, Rivoire, O, Russ, WP, Swain, JF, Leibler, S, Ranganathan, R et al. (2010). An interdomain sector mediating allostery in Hsp70 molecular chaperones. Mol Syst Biol 6: 414.
9. Ivics, Z and Izsvák, Z (2011). Nonviral gene delivery with the sleeping beauty transposon system. Hum Gene Ther 22: 1043–1051.
10. Mann, MB, Jenkins, NA, Copeland, NG and Mann, KM (2014). Sleeping Beauty mutagenesis: exploiting forward genetic screens for cancer gene discovery. Curr Opin Genet Dev 24: 16–22.
11. Ammar, I, Izsvák, Z and Ivics, Z (2012). The Sleeping Beauty transposon toolbox. Methods Mol Biol 859: 229–240.
12. Grabundzija, I, Wang, J, Sebe, A, Erdei, Z, Kajdi, R, Devaraj, A et al. (2013). Sleeping Beauty transposon-based system for cellular reprogramming and targeted gene insertion in induced pluripotent stem cells. Nucleic Acids Res 41: 1829–1847.
13. Grabundzija, I, Irgang, M, Mátés, L, Belay, E, Matrai, J, Gogol-Döring, A et al. (2010). Comparative analysis of transposable element vector systems in human cells. Mol Ther 18: 1200–1209.
14. Abe, G, Suster, ML and Kawakami, K (2011). Tol2-mediated transgenesis, gene trapping, enhancer trapping, and the Gal4-UAS system. Methods Cell Biol 104: 23–49.
15. Kawakami, K (2007). Tol2: a versatile gene transfer vector in vertebrates. Genome Biol 8 Suppl 1: S7.
16. Di Matteo, M, Mátrai, J, Belay, E, Firdissa, T, Vandendriessche, T and Chuah, MK (2012). PiggyBac toolbox. Methods Mol Biol 859: 241–254.
17. Li, X, Burnight, ER, Cooney, AL, Malani, N, Brady, T, Sander, JD et al. (2013). piggyBac transposase tools for genome engineering. Proc Natl Acad Sci USA 110: E2279–E2287.
18. Yusa, K, Zhou, L, Li, MA, Bradley, A and Craig, NL (2011). A hyperactive piggyBac transposase for mammalian applications. Proc Natl Acad Sci USA 108: 1531–1536.
19. Ivics, Z, Hackett, PB, Plasterk, RH and Izsvák, Z (1997). Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell 91: 501–510.
20. Mátés, L, Chuah, MK, Belay, E, Jerchow, B, Manoj, N, Acosta-Sanchez, A et al. (2009). Molecular evolution of a novel hyperactive Sleeping Beauty transposase enables robust stable gene transfer in vertebrates. Nat Genet 41: 753–761.
21. Guerrero, AD, Moyes, JS and Cooper, LJ (2014). The human application of gene therapy to re-program T-cell specificity using chimeric antigen receptors. Chin J Cancer 33: 421–433.
22. Singh, H, Huls, H, Kebriaei, P and Cooper, LJ (2014). A new approach to gene therapy using Sleeping Beauty to genetically modify clinical-grade T cells to target CD19. Immunol Rev 257: 181–190.
23. Voigt, F, Wiedemann, L, Zuliani, C, Querques, I, Sebe, A, Mátés, L et al. (2016). Sleeping Beauty transposase structure allows rational design of hyperactive variants for genetic engineering. Nat Commun 7: 11126.
24. Carpentier, CE, Schreifels, JM, Aronovich, EL, Carlson, DF, Hackett, PB and Nesmelova, IV (2014). NMR structural analysis of Sleeping Beauty transposase binding to DNA. Protein Sci 23: 23–33.
25. Zhang, Y (2008). I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9: 40.
26. Roy, A, Kucukural, A and Zhang, Y (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 5: 725–738.
27. Zhang, Y and Skolnick, J (2004). Scoring function for automated assessment of protein structure template quality. Proteins 57: 702–710.
28. Zhang, Y (2009). Protein structure prediction: when is it useful? Curr Opin Struct Biol 19: 145–155.
29. Richardson, JM, Colloms, SD, Finnegan, DJ and Walkinshaw, MD (2009). Molecular architecture of the Mos1 paired-end complex: the structural basis of DNA transposition in a eukaryote. Cell 138: 1096–1108.
30. Laskowski, RA, MacArthur, MW, Moss, DS and Thornton, JM (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26: 283–291.
31. Nesmelova, IV and Hackett, PB (2010). DDE transposases: Structural similarity and diversity. Adv Drug Deliv Rev 62: 1187–1195.
32. Izsvák, Z, Khare, D, Behlke, J, Heinemann, U, Plasterk, RH and Ivics, Z (2002). Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J Biol Chem 277: 34581–34588.
33. Montaño, SP, Pigli, YZ and Rice, PA (2012). The μ transpososome structure sheds light on DDE recombinase evolution. Nature 491: 413–417.
34. Pettersen, EF, Goddard, TD, Huang, CC, Couch, GS, Greenblatt, DM, Meng, EC et al. (2004). UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem 25: 1605–1612.
35. Zhang, Y and Skolnick, J (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33: 2302–2309.
36. Pronk, S, Páll, S, Schulz, R, Larsson, P, Bjelkmar, P, Apostolov, R et al. (2013). GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29: 845–854.
37. Bowie, JU, Reidhaar-Olson, JF, Lim, WA and Sauer, RT (1990). Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247: 1306–1310.
38. Touw, WG, Baakman, C, Black, J, te Beek, TA, Krieger, E, Joosten, RP et al. (2015). A series of PDB-related databanks for everyday needs. Nucleic Acids Res 43(Database issue): D364–D368.
39. Lu, XJ and Olson, WK (2008). 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat Protoc 3: 1213–1227.
40. Edgar, RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797.
41. Do, CB, Mahabhashyam, MS, Brudno, M and Batzoglou, S (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 15: 330–340.
42. Katoh, K and Standley, DM (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772–780.
43. Finn, RD, Mistry, J, Tate, J, Coggill, P, Heger, A, Pollington, JE et al. (2010). The Pfam protein families database. Nucleic Acids Res 38(Database issue): D211–D222.
44. DePristo, MA, Weinreich, DM and Hartl, DL (2005). Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet 6: 678–687.
45. Aleksiev, T, Potestio, R, Pontiggia, F, Cozzini, S and Micheletti, C (2009). PiSQRD: a web server for decomposing proteins into quasi-rigid dynamical domains. Bioinformatics 25: 2743–2744.
46. Cuypers, MG, Trubitsyna, M, Callow, P, Forsyth, VT and Richardson, JM (2013). Solution conformations of early intermediates in Mos1 transposition. Nucleic Acids Res 41: 2020–2033.
47. Claeys Bouuaert, C, Walker, N, Liu, D and Chalmers, R (2014). Crosstalk between transposase subunits during cleavage of the mariner transposon. Nucleic Acids Res 42: 5799–5808.
48. Liu, D and Chalmers, R (2014). Hyperactive mariner transposons are created by mutations that disrupt allosterism and increase the rate of transposon end synapsis. Nucleic Acids Res 42: 2637–2645.
49. Dornan, J, Grey, H and Richardson, JM (2015). Structural role of the flanking DNA in mariner transposon excision. Nucleic Acids Res 43: 2424–2432.
50. Claeys Bouuaert, C, Liu, D and Chalmers, R (2011). A simple topological filter in a eukaryotic transposon as a mechanism to suppress genome instability. Mol Cell Biol 31: 317–327.
51. Claeys Bouuaert, C, Lipkow, K, Andrews, SS, Liu, D and Chalmers, R (2013). The autoregulation of a eukaryotic DNA transposon. Elife 2: e00668.
52. Lockless, SW and Ranganathan, R (1999). Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286: 295–299.
53. Jurka, J, Kapitonov, VV, Pavlicek, A, Klonowski, P, Kohany, O and Walichiewicz, J (2005). Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110: 462–467.
54. Eddy, SR (2009). A new generation of homology search tools based on probabilistic inference. Genome Inform 23: 205–211.
55. Edgar, RC (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.
56. Nuin, PA, Wang, Z and Tillier, ER (2006). The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7: 471.
57. Geurts, AM, Yang, Y, Clark, KJ, Liu, G, Cui, Z, Dupuy, AJ et al. (2003). Gene transfer into genomes of human cells by the sleeping beauty transposon system. Mol Ther 8: 108–117.
58. Zayed, H, Izsvák, Z, Walisko, O and Ivics, Z (2004). Development of hyperactive sleeping beauty transposon vectors by mutational analysis. Mol Ther 9: 292–304.
59. Yant, SR, Park, J, Huang, Y, Mikkelsen, JG and Kay, MA (2004). Mutational analysis of the N-terminal DNA-binding domain of sleeping beauty transposase: critical residues for DNA binding and hyperactivity in mammalian cells. Mol Cell Biol 24: 9239–9247.
60. Baus, J, Liu, L, Heggestad, AD, Sanz, S and Fletcher, BS (2005). Hyperactive transposase mutants of the Sleeping Beauty transposon. Mol Ther 12: 1148–1156.
61. Guerois, R, Nielsen, JE and Serrano, L (2002). Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 320: 369–387.
62. Schymkowitz, JW, Rousseau, F, Martins, IC, Ferkinghoff-Borg, J, Stricher, F and Serrano, L (2005). Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA 102: 10147–10152.
