Abstract
Using all-atom replica-exchange molecular dynamics simulations, we mapped the mechanisms of binding of the nuclear localization signal (NLS) sequence from Venezuelan equine encephalitis virus (VEEV) capsid protein to importin-α (impα) transport protein. Our objective was to identify the VEEV NLS sequence fragment that confers native, experimentally resolved binding to impα as well as to study associated binding energetics and conformational ensembles. The two selected VEEV NLS peptide fragments, KKPK and KKPKKE, show strikingly different binding mechanisms. The minNLS peptide KKPK binds non-natively and nonspecifically by adopting five diverse conformational clusters with low similarity to the x-ray structure 3VE6 of NLS-impα complex. Despite the prevalence of non-native interactions, the minNLS peptide still largely binds to the impα major NLS binding site. In contrast, the coreNLS peptide KKPKKE binds specifically and natively, adopting a largely homogeneous binding ensemble with a dominant, highly native-like conformational cluster. The coreNLS peptide retains most of native binding interactions, including π-cation contacts and a tryptophan cage. While KKPK binding is governed by a complex multistate free energy landscape featuring transitions between multiple binding poses, the coreNLS peptide free energy map is simple, exhibiting a single dominant native-like bound basin. We argue that the origin of the coreNLS peptide binding specificity is several electrostatic interactions formed by the two C-terminal amino acids, Lys10 and Glu11, with impα. The coreNLS sequence is then sufficient for native binding, but none of the amino acids flanking minNLS, including Lys10 and Glu11, are strictly necessary for the native pose. Our analyses indicate that the VEEV coreNLS sequence is virtually unique among human and viral proteins interacting with impα making it a potential target for VEEV-specific inhibitors.
Significance
Venezuelan equine encephalitis virus (VEEV) is an infectious pathogen with epidemic potential. Its capsid protein contains a nuclear localization signal (NLS) that binds to the nuclear transport protein importin-α. Capsid binding to importin-α triggers assembly of a complex that blocks nuclear traffic. Because the molecular mechanism governing VEEV NLS binding to importin-α is unknown, we performed all-atom replica-exchange molecular dynamics simulations mapping VEEV NLS interactions with importin-α. We show that the minNLS KKPK fragment binds non-natively, whereas the coreNLS fragment KKPKKE exhibits native binding, reproducing the x-ray structure. The VEEV coreNLS sequence is virtually unique among human and viral proteins interacting with importin-α. Binding conformational ensembles, free energy landscapes, and bioinformatics data suggest that it may serve as a target for VEEV-specific inhibitors.
Introduction
Venezuelan equine encephalitis virus (VEEV) is a highly infectious pathogen presenting a threat for epidemic outbreaks (1,2,3). A typical VEEV infection results in flu-like symptoms, but severe neurological disorders including encephalitis may ensue in 14% of cases, and 1% can be fatal. Despite its epidemic potential, no FDA-approved vaccines or antivirals have been developed (2). Previous studies have found that VEEV capsid protein is a virulence factor suppressing the host innate immune response (2). After infecting a host cell, VEEV interferes with nucleocytoplasmic trafficking by physically blocking the nuclear pore complex (NPC). The N-terminus of VEEV capsid protein harbors an NLS sequence, which binds to the nuclear transport protein importin-α (impα), which, together with its partner protein importin-β, delivers cargo to the cell nucleus via the NPC (4,5). In addition, the VEEV capsid contains a nuclear export signal sequence, which, via binding to the nuclear export protein CRM1, facilitates trafficking host proteins back into the cytoplasm. Because the VEEV capsid has both NLS and nuclear export signal, it assembles a tetrameric complex with impα, importin-β, and CRM1, which clogs the NPC channel, interfering with host nucleocytoplasmic traffic (5). Formation of this tetrameric complex is critical for VEEV infection, and consequently the binding of VEEV NLS to impα constitutes a key ingredient of VEEV pathogenesis (5,6).
After years of intensive research, many classes of NLS have been identified, including classical mono- and bipartite variants (7,8,9). The consensus sequence for monopartite NLS, which is also employed by the VEEV capsid, is given by the stretch of mostly basic amino acids K-K/R-X-K/R, where K and R represent lysine and arginine, and X is a variable amino acid. To aid comparing NLSs from different proteins, the four NLS positions are referred to as P2-P5. These four NLS amino acids fit the binding cavities in the major NLS binding site on the impα surface within the Armadillo tandem repeats 2 to 4 (8). The basis of molecular recognition of the NLS sequence by impα thus lies in tight binding of NLS to the major impα binding site with the estimated free energies approaching kcal/mol (10,11). Careful mutagenesis analysis has suggested that the contribution of P2 to the binding free energy is dominant, while the contributions of P3 to P5 range from 25% to two-thirds of the P2 contribution, with the share of P4 being the smallest (8,10). Several flanking amino acids, P1 and possibly P6 and P7, also contribute to NLS binding, apparently by conferring a specific NLS bound pose (12).
The VEEV NLS sequence AKKPKKE (P1-P7) obeys a classic K-K/R-X-K/R pattern placing Lys, Pro, and Lys at positions P3-P5. The numbering of VEEV NLS derives from the resolved x-ray structure of the complex formed by the extended VEEV 12-mer NLS sequence EGPSAKKPKKEA with the mouse impα protein (PDB: 3VE6 (13)). This structure shows that Lys6 at position P2 forms an electrostatic contact with the negatively charged impα amino acid Asp122, while Lys7 (P3) and/or Lys9 (P5) reside in tryptophan cages and establish π-cation interactions with their indole rings. In recent years, several studies have exploited the structure of impα complexed with the VEEV NLS to develop small-molecule inhibitors blocking the binding of VEEV NLS to impα (14,15). In particular, in silico structure-based drug design utilizing the 3VE6 structure was successfully used to select inhibitors of the impα-VEEV NLS interactions. This effort identified the lead compound 1111684, which specifically targets VEEV NLS binding to the impα major binding site. However, central to these efforts is accurate information about the NLS binding ensemble, which is lacking in static PDB structures. All-atom molecular dynamics simulations can potentially circumvent this limitation by mapping the full scope of the NLS binding ensemble.
In this paper, we performed all-atom replica-exchange molecular dynamics simulations with solute tempering (REST) to study the mechanism governing VEEV NLS binding to impα. Our objective was to determine the VEEV NLS sequence fragment, which engenders specific binding to impα, and to probe its energetic contributions. To this end, we considered two VEEV NLS peptide fragments, KKPK and KKPKKE, spanning positions P2-P5 and P2-P7, respectively. To validate our computational predictions, we used the 3VE6 structure, which is expected to be reproduced by the REST simulations. We found that the two peptides exhibit strikingly different binding mechanisms. The short KKPK, which is termed the minNLS peptide (14), binds non-natively and nonspecifically by adopting a multitude of poses distinct from the one seen in the 3VE6 complex. However, the minNLS peptide still largely targets the major NLS binding site. Its longer counterpart, KKPKKE or the coreNLS peptide, binds specifically and natively by reproducing the 3VE6 complex structure. We determined the energetic factors involved in the specificity of coreNLS peptide binding and mapped the free energy landscapes governing binding of min- and coreNLS peptides to impα. Implications of our work for designing VEEV inhibitors are discussed.
Materials and methods
All-atom explicit solvent model
Two simulation systems were considered. Each included a peptide derived from the VEEV capsid NLS sequence (5) and a truncated, 211-residue impα protein, taken from PDB: 3VE6. The two peptides studied are the minNLS fragment, KKPK, and its extended counterpart, the coreNLS sequence KKPKKE. These peptides represent the NLS sequence positions P2-P5 and P2-P7, respectively, and bind to the four principal binding cavities on the impα surface (8). Both peptides had neutral acetylated and amidated caps. The mouse protein structure 3VE6 was used, because it is the only structure we are aware of that includes impα complexed with the NLS of VEEV. The mouse impα is highly similar to the human analog (PDB: 3FEY (16)), both in terms of sequence similarity (98% overall and 100% within the minNLS binding site) and fitted root mean-square deviations (RMSD) (0.69 Å) as reported by the RCSB PDB pairwise structure alignment tool (17). The truncation of impα at residue 211 served to reduce the computational load while retaining all binding interactions between impα and VEEV NLS in 3VE6. Because monomeric impα has low conformational stability without its self-inhibitory domain (18), soft harmonic restraints (19) were applied to the Cα atoms outside of the confining binding sphere (see below). This allowed for minor fluctuations in the structure while keeping it otherwise consistent with 3VE6. The native binding poses of the peptides are shown in Fig. 1.
Figure 1.
The native binding poses adopted by the minNLS KKPK and coreNLS KKPKKE peptides in PDB: 3VE6 structure are shown in (a) and (b), respectively. Peptide residues are distinguished by colors. Impα amino acids constituting the minNLS in (a) and coreNLS in (b) binding sites are colored in yellow and shown in van der Waals representation, except for three tryptophans shown in licorice. The impα amino acids from min- and coreNLS binding sites in one-letter code are L34 (K10), S35 (K9,K10), R36 (K10), E37 (K10,E11), P40 (K10), F68 (K9), W72 (K9,E11), T75 (K7), N76 (P8,K9,K10), A78 (K6), S79 (K6,K7,P8), G80 (K6), T81 (K6), S82 (K6), T85 (K6), Q111 (K9), W114 (K7,P8,K9), N118 (K6,K7), D122 (K6), N158 (K7), W161 (K6,K7). In this list parentheses identify bound KKPK or KKPKKE amino acids. The impα amino acids bound exclusively by K10 and E11 constitute the extension of the minNLS binding site to the coreNLS binding site. Non-minNLS or non-coreNLS impα amino acids are in gray. To see this figure in color, go online.
The protein and peptides were modeled using the all-atom CHARMM36m force field (20), and the CHARMM-modified TIP3P water model was used (21,22). The minNLS system contained 7715 water molecules, 22 chloride ions, and 24 sodium ions, while the coreNLS system differed only in the count of water molecules (7703). The NaCl salt concentration was 150 mM. The overall charge of each system was neutral. In total, the minNLS system contained 26,485 atoms, and the coreNLS system had 26,486. In both cases, the initial unit cell dimensions were roughly 58 × 58 × 77 Å.
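As a rough consistency check on the ion counts, the number of ion pairs expected at 150 mM can be estimated from the initial cell dimensions. The sketch below assumes that the entire cell volume is available to solvent, which overestimates the count because the protein occupies part of the cell; the function name is illustrative and not part of the simulation setup.

```python
AVOGADRO = 6.02214076e23          # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def expected_ion_pairs(molarity, box_angstrom):
    """Estimate the number of salt ion pairs in a rectangular cell.

    molarity: salt concentration in mol/L.
    box_angstrom: (x, y, z) cell edge lengths in Å.
    Assumes the full cell volume is solvent (an upper-bound estimate).
    """
    x, y, z = box_angstrom
    volume_liters = x * y * z * LITERS_PER_CUBIC_ANGSTROM
    return molarity * AVOGADRO * volume_liters


pairs = expected_ion_pairs(0.150, (58.0, 58.0, 77.0))
print(f"Estimated ion pairs at 150 mM: {pairs:.1f}")
```

Under these assumptions the estimate is close to the 22 chloride and 24 sodium ions reported above.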
Replica-exchange simulations
Peptide binding was sampled using isobaric-isothermal REST molecular dynamics (23). Since the REST formalism is documented elsewhere (23,24), only a brief outline is provided below. In all, eight replicas were distributed geometrically over temperatures $T_m$ ($m = 0, \ldots, 7$) from $T_0 = 310$ K to $T_7$. Exchanges between the replicas $r$ and $r+1$ simulated at the temperatures $m$ and $m+1$ take place with the probability $\omega = \min[1, e^{-\Delta}]$, where $\Delta = \beta_m [H_m(X_{r+1}) - H_m(X_r)] + \beta_{m+1} [H_{m+1}(X_r) - H_{m+1}(X_{r+1})]$, $\beta_m = (RT_m)^{-1}$, $H$ is the enthalpy, $X$ defines system coordinates, and $R$ is the gas constant. Solvent-solvent and solute-solvent interactions at a temperature $T_m$ were scaled using the factors $\beta_0/\beta_m$ and $(\beta_0/\beta_m)^{1/2}$, respectively. This scaling excludes solvent-solvent energy contributions from $\omega$ and reduces the number of replicas while preserving a wide temperature range and exchange rates. Thus, a peptide was tempered as “hot” solute, peptide-solvent interactions were partially tempered, whereas the rest of the system, including water and impα, was treated as “cold” solvent. Replica exchanges were attempted every 2 ps, with a success rate of about 0.34 for the minNLS and 0.26 for the coreNLS.
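The geometric temperature ladder and the exchange test can be illustrated with a minimal sketch. It assumes the standard Metropolis form of the replica-exchange criterion and uses hypothetical inputs: the replica count of eight follows from the sampling totals reported in this section, while the upper temperature of 700 K is an assumption borrowed from the randomization step and is not confirmed as the top of the ladder. Function and argument names are illustrative.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Return temperatures with a constant ratio between neighbours."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** m for m in range(n_replicas)]


def exchange_probability(t_m, t_m1, h_m_xr, h_m_xr1, h_m1_xr, h_m1_xr1):
    """Metropolis probability of swapping two neighbouring replicas.

    h_a_xb is the enthalpy (kcal/mol) of configuration xb evaluated
    with the scaled Hamiltonian of temperature index a.
    """
    beta_m = 1.0 / (GAS_CONSTANT * t_m)
    beta_m1 = 1.0 / (GAS_CONSTANT * t_m1)
    delta = beta_m * (h_m_xr1 - h_m_xr) + beta_m1 * (h_m1_xr - h_m1_xr1)
    if delta <= 0.0:
        return 1.0  # favourable swaps are always accepted
    return math.exp(-delta)


# Assumed ladder: 8 replicas between 310 K and a hypothetical 700 K.
ladder = geometric_ladder(310.0, 700.0, 8)
print([round(t, 1) for t in ladder])
```

A geometric ladder keeps the ratio of neighbouring temperatures constant, which tends to even out exchange rates along the ladder.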
REST simulations were performed using NAMD (25) with a 1 fs integration step and periodic boundary conditions. The SHAKE algorithm was used to constrain covalent bonds associated with hydrogen atoms. Electrostatic interactions were computed using Ewald summations, and van der Waals interactions were smoothly switched off from 8 to 12 Å. Underdamped Langevin dynamics with a damping coefficient ps was used to control temperature, and the Nosé-Hoover Langevin piston method with piston period and decay of 200 and 100 fs, respectively, was used to set pressure at 1 atm. The x, y, and z dimensions were coupled. The peptide center of mass was confined to a repulsive sphere of radius Å, centered around a point slightly offset from the center of the minNLS binding site in the 3VE6 structure. The repulsive walls were implemented using soft harmonic potential. The sphere encompasses the entire min- or coreNLS binding site and the surrounding amino acids. The confining sphere effectively increased the peptide concentration without introducing interpeptide interaction.
In all, we produced four REST trajectories per system. Initial peptide structures in the trajectories were prepared as follows. A peptide within the confining sphere was simulated for 25 ns at 700 K using REST energy scaling to randomize its conformations. From these simulations, a diverse set of structures was selected, ranging from bound to partially or fully unbound, to serve as initial structures for each replica. Additional 1 ns equilibration simulations were then performed for each of the replicas at their respective REST temperatures applying REST scaling. At each temperature, structures at 700, 800, 900 ps, and 1 ns were taken as initial conditions for the four trajectories. As such, replicas and trajectories featured unique and diverse initial conditions. For example, the average RMSD between a pair of initial structures for coreNLS simulations is 9.2 Å. Each replica in a trajectory was simulated for 200 ns; therefore, the total sampling across all temperatures and trajectories amounted to 6.4 μs, or 0.8 μs per temperature for each peptide. REST performance and convergence are analyzed in the supporting material, including establishing equilibration times for each peptide. For min- and coreNLS peptides we excluded the initial 140 and 40 ns of sampling per temperature and trajectory, respectively. Thus, the total equilibrium sampling at 310 K used for analysis was 240 ns for the minNLS and 640 ns for the coreNLS.
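The sampling totals quoted above follow from simple bookkeeping, reproduced in the short sketch below. All numbers come from this section; the variable and function names are illustrative.

```python
N_TRAJECTORIES = 4       # REST trajectories per system
NS_PER_REPLICA = 200     # simulated time per replica and trajectory, ns
TOTAL_SAMPLING_US = 6.4  # total sampling per peptide, microseconds

# Number of replicas implied by the reported total sampling.
n_replicas = round(TOTAL_SAMPLING_US * 1000 / (N_TRAJECTORIES * NS_PER_REPLICA))

# Sampling accumulated at each temperature across the four trajectories, ns.
per_temperature_ns = N_TRAJECTORIES * NS_PER_REPLICA


def equilibrium_sampling_ns(discarded_ns):
    """Equilibrated sampling at one temperature after discarding the
    initial part of every trajectory."""
    return N_TRAJECTORIES * (NS_PER_REPLICA - discarded_ns)


print(n_replicas, per_temperature_ns)
print(equilibrium_sampling_ns(140), equilibrium_sampling_ns(40))
```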
Computation of structural probes
To assess binding interactions, peptides were compared with the native structure by considering residue-specific interactions, or contacts. To detect a contact between amino acids, we computed the minimal separation between any pair of their heavy atoms. If these atoms were separated by less than 4.5 Å, a contact between the protein and peptide amino acids was considered formed. Using this definition, for each peptide residue j we computed the fraction of native contacts formed by j with impα in the 3VE6 structure and retained in REST simulations. Thus, a fraction of 1 implies that j always forms all its native interactions. We also computed the fraction of non-native contacts among all contacts formed by j in the REST sampling. Similarly, a contact vector reports the probabilities for impα amino acids i to bind the peptide. For analysis we selected the top 10 impα amino acids with the highest binding probabilities. A peptide is considered bound to impα if it forms at least one contact with impα amino acids. The impα minNLS binding site is composed of impα amino acids forming contacts with the minNLS peptide in the 3VE6 structure (Fig. 1). The coreNLS binding site is defined in a similar way. Hydrogen bonding was measured via Visual Molecular Dynamics (VMD) (26), using a donor (D)-acceptor (A) cutoff distance of 3.5 Å and a minimum DHA angle of 135°. A peptide lysine side chain occupies a tryptophan cage if 1) the lysine contacts both tryptophans and 2) the distance between the line connecting the tryptophan side chain centers of mass and the lysine side chain center of mass is less than 2 Å. We assumed that a π-cation interaction occurs if 1) the distance between the Trp indole ring center of mass and the lysine nitrogen atom is less than 6.5 Å and 2) the angle between the indole ring normal and the line connecting the indole center of mass and the nitrogen is less than 50° or more than 130°. This definition follows from the analysis of π-cation energetics in the CHARMM force field performed by Reuter and co-workers (27).
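The contact definition and the native and non-native fractions can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes that heavy-atom coordinates are available as NumPy arrays and that the contacts of a peptide residue in each frame are stored as sets of impα residue identifiers. Pooling non-native contacts over all frames is one possible reading of the definition above.

```python
import numpy as np

CONTACT_CUTOFF = 4.5  # Å, minimal heavy-atom separation defining a contact


def in_contact(res_a, res_b, cutoff=CONTACT_CUTOFF):
    """True if any pair of heavy atoms is closer than the cutoff.

    res_a, res_b: arrays of shape (n_atoms, 3) with heavy-atom coordinates.
    """
    diff = np.asarray(res_a)[:, None, :] - np.asarray(res_b)[None, :, :]
    return bool(np.sqrt((diff ** 2).sum(axis=-1)).min() < cutoff)


def native_fraction(frame_contacts, native_contacts):
    """Average fraction of native contacts retained over all frames.

    frame_contacts: list of sets of impα residue ids contacted per frame.
    native_contacts: set of impα residue ids contacted in the reference.
    """
    if not native_contacts:
        return 0.0
    per_frame = [len(c & native_contacts) / len(native_contacts)
                 for c in frame_contacts]
    return float(np.mean(per_frame))


def nonnative_fraction(frame_contacts, native_contacts):
    """Fraction of all formed contacts that are absent in the reference."""
    formed = sum(len(c) for c in frame_contacts)
    nonnative = sum(len(c - native_contacts) for c in frame_contacts)
    return nonnative / formed if formed else 0.0
```

For example, a residue whose reference contacts are {122, 161} and which contacts {122, 200} in every frame would have a native fraction of 0.5 and a non-native fraction of 0.5.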
All structural probes are reported as averages computed after equilibration at 310 K. To compute standard errors, we considered each REST trajectory as a sample. However, because low free energy states were unevenly distributed over trajectories, we computed the associated errors by dividing the equilibrated data into four equal samples along the simulation timeline.
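Splitting a time series into four equal samples to obtain a standard error can be sketched as below. This is a generic block-averaging illustration under the assumption that the standard error is the sample standard deviation of the block means divided by the square root of the number of blocks; the function name is hypothetical.

```python
import numpy as np


def block_standard_error(series, n_blocks=4):
    """Return (mean of block means, standard error) for a time series.

    The series is cut into n_blocks consecutive blocks of equal length;
    trailing points that do not fill a block are dropped.
    """
    series = np.asarray(series, dtype=float)
    usable = len(series) - len(series) % n_blocks
    block_means = series[:usable].reshape(n_blocks, -1).mean(axis=1)
    stderr = block_means.std(ddof=1) / np.sqrt(n_blocks)
    return float(block_means.mean()), float(stderr)


mean, err = block_standard_error([1, 1, 2, 2, 3, 3, 4, 4])
print(f"{mean:.2f} +/- {err:.2f}")
```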
Conformational clustering
We used the method of Daura et al. to perform density-based conformational clustering of the peptides (28). Specifically, before peptide clustering, the impα structures with bound peptides were aligned based on minimal RMSD of impα side chains from the minNLS binding site. Following protein alignment, the RMSDs between peptide poses were computed. Note that this procedure does not include peptide alignment and, besides clustering, it was also applied to the computations of RMSD distributions. Overall, we selected 10,000 protein-aligned poses for each peptide, sampled periodically from equilibrated portions of REST simulations. Peptide clusters were defined using the RMSD cutoff of Å (29), and only the top populated clusters capturing at least % of all ligand binding poses were retained for analysis. To compute the root mean-square fluctuations of peptide amino acid j, we averaged the RMSD of heavy atoms of j after impα alignment.
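The Daura-style clustering step can be sketched as a greedy procedure on a precomputed pairwise RMSD matrix: the pose with the most neighbours within the cutoff becomes a cluster center, the center and its neighbours are removed, and the search repeats on the remaining poses. The sketch below is a minimal illustration with a hypothetical cutoff passed in by the caller; it is not the implementation used in the study.

```python
import numpy as np


def daura_clusters(rmsd, cutoff):
    """Greedy neighbour-count clustering on a square RMSD matrix.

    Returns a list of (center_index, member_indices) tuples ordered
    from the most to the least populated cluster.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    neighbors = rmsd <= cutoff           # each pose neighbours itself
    active = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while active.any():
        # Count neighbours among poses that are still unassigned.
        counts = (neighbors & active[None, :]).sum(axis=1)
        counts[~active] = -1             # assigned poses cannot be centers
        center = int(counts.argmax())
        members = np.flatnonzero(neighbors[center] & active)
        clusters.append((center, members.tolist()))
        active[members] = False
    return clusters


# Toy example: one-dimensional "poses" with |x_i - x_j| as the distance.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 9.0])
print(daura_clusters(np.abs(x[:, None] - x[None, :]), cutoff=0.5))
```

Cluster occupancies then follow as the number of members divided by the total number of poses.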
Results and discussion
Binding of minNLS peptide KKPK to impα
Using REST simulations, we investigated the conformational ensemble sampled by the minNLS peptide KKPK upon binding to impα. When the KKPK fragment comprising the NLS positions P2-P5 is incorporated into a 12-mer VEEV NLS peptide, it adopts a tightly bound native pose, as shown in Fig. 1 a. In the 3VE6 structure the KKPK fragment forms 24 native contacts with impα amino acids. Lys9 (P5) resides in the cage formed by the side chains of Trp72 and Trp114 (see methods). Trp114 and Trp161 do not form a well-defined cage, but Lys7 (P3) forms contacts with both of their side chains. In addition, there is an electrostatic contact between Lys6 and impα Asp122 in 3VE6. In our REST simulations at 310 K the minNLS peptide binds to impα with the probability of , implying that the simulations virtually always sample bound peptide states. To investigate KKPK binding poses, we evaluated the homogeneity of the KKPK binding ensemble by computing the probability distribution of RMSD values computed between all pairs of KKPK structures. The plot of this distribution in Fig. 2 a reveals a broad, approximately unimodal distribution peaking at Å, with a standard deviation of 3.4 Å. This distribution suggests a diverse, heterogeneous bound ensemble.
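The pairwise RMSD distribution used to gauge ensemble homogeneity can be sketched as follows. The example assumes that peptide coordinates have already been placed in a common frame by aligning the protein binding site, so no peptide superposition is performed; the function name and array layout are illustrative. For thousands of poses the full difference array becomes large, so a production version would process pairs in chunks.

```python
import numpy as np


def pairwise_rmsd_values(poses):
    """RMSD for every unique pair of poses, without peptide alignment.

    poses: array of shape (n_poses, n_atoms, 3), already expressed in
    the protein-aligned frame. Returns a 1D array of n*(n-1)/2 values.
    """
    poses = np.asarray(poses, dtype=float)
    diff = poses[:, None, :, :] - poses[None, :, :, :]
    rmsd = np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))
    i, j = np.triu_indices(len(poses), k=1)
    return rmsd[i, j]


# A histogram of these values approximates the probability distribution.
rng = np.random.default_rng(0)
values = pairwise_rmsd_values(rng.normal(size=(50, 30, 3)))
hist, edges = np.histogram(values, bins=20, density=True)
```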
Figure 2.
Probability distributions of RMSD values computed between all pairs of bound KKPK (a) or KKPKKE (b) peptides at 310 K. The approximately unimodal distribution for minNLS KKPK peaking at Å suggests a heterogeneous ensemble of bound poses. In contrast, the distribution for the coreNLS peptide KKPKKE is bimodal, driven by the appearance of a well-populated native-like bound cluster and a low-populated second cluster with lower native content.
To further characterize the minNLS peptide binding, we computed the probabilities of forming contacts between KKPK and impα amino acids i and presented in Table 1 the top 10 impα amino acids most frequently involved in KKPK binding (referred to as top 10 binding amino acids). Note that the purpose of Table 1 is to identify the KKPK binding site on impα and the nature of binding interactions, whereas the peptide binding poses are probed by the subsequent cluster analysis. Out of 10 impα amino acids, 8 belong to the impα minNLS binding site, 5 are polar, and another 3 are anionic. The top binding amino acid is Trp161, bound with the probability 0.96. (As shown below, the high Trp161 binding probability results from native and non-native KKPK binding.) Note that Asp122, considered the anchoring impα amino acid for binding NLS (8), is only at the eighth position. Nevertheless, the KKPK peptide forms, on average, electrostatic contacts between its lysines and anionic impα residues. We also computed the list of the top 10 impα amino acids most frequently forming hydrogen bonds with KKPK. Seven of them appear among the top 10 binding amino acids in Table 1. In fact, the average number of hydrogen bonds between KKPK and impα is or 1.1 per KKPK residue. Among the 14 most stable hydrogen bonds occurring with probability , only 1 involves the KKPK backbone. These findings indicate that hydrogen bonding between lysine side chains and impα constitutes a major binding factor. Strikingly, the two amino acids most frequently engaged in hydrogen bonding with KKPK are Asp200 and Glu196 ( and 0.55). These two do not belong to the minNLS binding site and are also the only two non-minNLS amino acids among the top 10 binding amino acids in Table 1. Thus, non-native binding is driven by electrostatics and hydrogen bonding. Overall, these observations reinforce our conclusion that KKPK binding to impα is primarily controlled by polar interactions, salt bridges, and hydrogen bonding. The outcome is not surprising given the highly charged state of KKPK.
Table 1.
Top binding amino acids with the strongest affinities toward the NLS peptides
Rank | KKPK amino acid | Binding probability | KKPKKE amino acid | Binding probability |
---|---|---|---|---|
1 | **Trp161** | 0.96 ± 0.00 | **Trp161** | 0.99 ± 0.00 |
2 | **Ser79** | 0.83 ± 0.01 | **Ser79** | 0.97 ± 0.00 |
3 | **Asn118** | 0.77 ± 0.03 | **Asn118** | 0.97 ± 0.00 |
4 | Asp200 | 0.73 ± 0.01 | **Gly80** | 0.95 ± 0.00 |
5 | **Gly80** | 0.69 ± 0.03 | **Ala78** | 0.95 ± 0.00 |
6 | **Ala78** | 0.68 ± 0.03 | **Asp122** | 0.95 ± 0.00 |
7 | **Thr85** | 0.68 ± 0.03 | **Thr85** | 0.95 ± 0.00 |
8 | **Asp122** | 0.67 ± 0.04 | **Trp114** | 0.92 ± 0.00 |
9 | **Trp114** | 0.66 ± 0.02 | **Thr81** | 0.86 ± 0.00 |
10 | Glu196 | 0.60 ± 0.01 | **Trp72** | 0.86 ± 0.00 |
Amino acids in bold belong to minNLS or coreNLS binding sites.
If KKPK primarily interacts with the minNLS binding site, why does it form the broad heterogeneous binding ensemble captured in Fig. 2 a? To address this question, we use Table 2, which breaks down the binding interactions with respect to individual KKPK amino acids. On average, a KKPK amino acid retains only the fraction of native interactions formed by the amino acid in the 3VE6 structure (see materials and methods). The average fraction of non-native interactions per amino acid is , and it generally increases from the N- to the C-terminus, reaching 0.86 for Lys9 (P5). Moreover, the average fraction of contacts extending beyond the minNLS binding site is 0.31. Consequently, the N-terminal Lys6 (P2) marginally retains native contacts (a fraction of 0.51 for Lys6 in Table 2), but they are largely lost at the C-terminal Lys9 (0.11 for Lys9). If we measure the binding affinity of peptide amino acids j by the total number of contacts they form, then Table 2 demonstrates that the N-terminal Lys6 has the strongest binding affinity, that of Lys9 is 40% weaker, and the affinity is particularly suppressed at Lys7 and Pro8. Thus, KKPK amino acids form mostly non-native interactions, while still being mostly confined to the minNLS binding site, and the N-terminal Lys6 (P2) reveals the strongest binding affinity.
Table 2.
Binding interactions formed by the NLS peptides
Quantity | Lys6 | Lys7 | Pro8 | Lys9 | Lys10 | Glu11 |
---|---|---|---|---|---|---|
KKPK | | | | | | |
Fraction of native contacts^a | 0.51 ± 0.07 | 0.24 ± 0.04 | 0.22 ± 0.04 | 0.11 ± 0.02 | | |
Fraction of non-native contacts^b | 0.40 ± 0.03 | 0.27 ± 0.09 | 0.65 ± 0.12 | 0.86 ± 0.04 | | |
Number of contacts^c | 7.6 ± 0.68 | 1.9 ± 0.18 | 1.8 ± 0.43 | 4.6 ± 0.53 | | |
KKPKKE | | | | | | |
Fraction of native contacts^a | 0.86 ± 0.01 | 0.67 ± 0.01 | 0.75 ± 0.01 | 0.65 ± 0.00 | 0.21 ± 0.00 | 0.36 ± 0.00 |
Fraction of non-native contacts^b | 0.26 ± 0.00 | 0.03 ± 0.01 | 0.11 ± 0.01 | 0.24 ± 0.01 | 0.55 ± 0.08 | 0.75 ± 0.02 |
Number of contacts^c | 10.6 ± 0.09 | 4.1 ± 0.16 | 2.6 ± 0.17 | 5.1 ± 0.23 | 2.8 ± 0.27 | 2.9 ± 0.04 |
^a Average fraction of retained native contacts, i.e., those formed by amino acid j in the 3VE6 structure and also observed in the REST simulations.
^b Average fraction of non-native contacts formed by amino acid j. Amino acid j forms a non-native contact if the contact is absent in the 3VE6 structure.
^c Average number of binding contacts formed by the peptide amino acid j.
In the native pose seen in the 3VE6 structure the side chain of Lys9 (P5) resides in the tryptophan cage Trp72-Trp114. To determine if the minNLS peptide populates this cage upon binding, we computed the probability for the Lys9 side chain to occupy the volume between Trp72 and Trp114 side chains as described in materials and methods. We found that the respective and is 0.0 for any other lysine and Trp cage combination, suggesting that none of the KKPK lysine side chains occupies tryptophan cages, natively or non-natively. Furthermore, using the definition given in methods we determined that, in the native 3VE6 pose, the charged Lys amino groups establish three π-cation interactions with tryptophan indole rings (8). Specifically, Lys9 (P5) interacts with Trp72 and Trp114, whereas Lys7 (P3) interacts with Trp161. In the REST simulations of KKPK binding to impα, the largest probability of π-cation interactions is 0.14, observed between Lys7 and Trp161. Thus, π-cation interactions play a negligible role in KKPK binding to impα. Our data also imply that caging Lys between Trp side chains and forming π-cation interactions do not compensate for the entropic loss of localizing KKPK in the native pose. Taken together, we conclude that the minNLS peptide does not maintain a native pose and instead samples a multitude of non-native binding poses with generally minor native content per amino acid.
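The geometric π-cation criterion from the methods (distance below 6.5 Å and an angle to the ring normal below 50° or above 130°) can be written as a short test. The sketch assumes that the indole ring center of mass, a ring normal vector, and the lysine nitrogen position have already been extracted from a frame; the function and argument names are illustrative.

```python
import numpy as np


def is_pi_cation(ring_center, ring_normal, lys_nitrogen,
                 max_distance=6.5, cone_angle=50.0):
    """Apply the distance and angle criteria for a Lys-Trp contact.

    ring_center: indole ring center of mass (Å).
    ring_normal: vector perpendicular to the indole ring plane.
    lys_nitrogen: position of the lysine side chain nitrogen (Å).
    """
    vec = np.asarray(lys_nitrogen, float) - np.asarray(ring_center, float)
    distance = np.linalg.norm(vec)
    if distance == 0.0 or distance >= max_distance:
        return False
    normal = np.asarray(ring_normal, float)
    cos_angle = np.dot(vec, normal) / (distance * np.linalg.norm(normal))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # The normal has no preferred sign, so both faces of the ring count.
    return bool(angle < cone_angle or angle > 180.0 - cone_angle)
```

A nitrogen placed 4 Å directly above or below the ring satisfies the criterion, whereas one lying 4 Å away in the ring plane does not.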
If the KKPK peptide does not bind natively, what is the distribution of its alternative binding poses? To answer this question, we performed clustering of KKPK conformations bound to impα as described in methods. The top 5 clusters denoted as CL1-CL5 are listed in Table 3 and together comprise about 30% of bound structures. The fraction of structures assigned to the most populated CL1 is 0.08, but is reduced to 0.04 in CL5. Thus, none of the clusters is dominant, confirming a highly heterogeneous bound ensemble. The centroids of the clusters are displayed in Fig. 3. It is noteworthy that the minimum pairwise RMSD between these centroids is 6.8 Å, underscoring their structural diversity. Also, none of the centroids exhibits a native pose with the RMSD from the 3VE6 structure, Å. Importantly, the primary purpose of clustering is to compare the distributions of binding poses adopted by min- and coreNLS peptides. From this perspective, as long as the RMSD cutoff is applied consistently to both peptides, it affords a comparison of their binding poses. Since many of the KKPK binding poses are not included in the top-ranking clusters, there are no high-density regions among them that would translate into physically relevant low free energy states.
Table 3.
Populated clusters in the bound ensembles of KKPK and KKPKKE peptides
Rank | Occupancy probability^a | RMSD^b (Å) |
---|---|---|
KKPK | | |
1 | 0.08 ± 0.03 | 6.5 ± 0.1 |
2 | 0.07 ± 0.00 | 11.2 ± 0.1 |
3 | 0.05 ± 0.01 | 10.2 ± 0.1 |
4 | 0.05 ± 0.01 | 8.7 ± 0.3 |
5 | 0.04 ± 0.01 | 3.6 ± 0.1 |
KKPKKE | | |
1 | 0.67 ± 0.01 | 2.5 ± 0.1 |
2 | 0.08 ± 0.01 | 7.4 ± 0.0 |
^a The occupancy probability, i.e., the fraction of peptide structures included in a cluster.
^b The RMSD measured between the cluster centroid and the 3VE6 structure.
Figure 3.
(a) The centroids of the top 5 populated clusters in the minNLS KKPK bound ensemble. (b) The centroids of the top 2 populated clusters in the coreNLS KKPKKE bound ensemble. In both panels, peptides are shown in orange with the N-terminus marked with the red sphere. Impα amino acids constituting min- or coreNLS binding sites are in yellow. Impα amino acids outside these binding sites but binding the peptide are in pink, while any other impα amino acids are in gray. A different version of this figure is presented in Fig. S8, which compares the cluster centroids with the 3VE6 native pose. To see this figure in color, go online.
The analysis of the five centroids describes the KKPK binding conformational ensemble. The most populated CL1 in Fig. 3 a has a large RMSD of 6.5 Å from the native pose (Table 3) and forms 17 side chain contacts retaining the fraction of of native interactions. Lys6 forms almost all native interactions, but other amino acids form few, particularly Lys9, which adopts a completely non-native pose outside of the native tryptophan cage. In all, a fraction of 0.29 of CL1 centroid interactions is established with non-minNLS amino acids. Thus, only the N-terminus of KKPK is natively anchored, while the rest of the peptide exhibits a non-native pose and forms interactions beyond the minNLS binding site. The second cluster CL2 has a large RMSD of 11.2 Å from the native pose and is linked to impα via 19 contacts. Strikingly, CL2 forms a single native contact, resulting in a negligible fraction of retained native interactions. Notably, more than half of the contacts, a fraction of 0.53, are associated with non-minNLS binding. Therefore, the binding pose of the CL2 centroid is highly non-native and partially shifted away from the minNLS binding site. The third cluster CL3 centroid has an RMSD of 10.2 Å and forms 18 contacts with impα. Similar to CL2, it retains few native contacts. Furthermore, a fraction of 0.33 of contacts is formed with the non-minNLS amino acids. Interestingly, the Lys6 side chain binds non-natively to Trp114 and Trp161. Thus, like CL2, this cluster bears little similarity to the native pose. The CL4 centroid has an RMSD of 8.7 Å, forming 13 contacts with impα, and the fraction of retained native contacts is merely 0.13. A fraction of 0.31 of bound interactions is non-minNLS. A notable feature of CL4 is the weak binding of the N-terminus, as most binding interactions are centered at the C-terminal Lys9. In particular, Lys9 binds to Trp161 (Table 1). Thus, this centroid also adopts a non-native pose. Finally, CL5 has a lower RMSD of 3.6 Å (Table 3), forms 26 binding contacts, and the fraction of retained native contacts increases to . Each amino acid reproduces more than half of the respective native contacts, particularly Lys6. The fraction of non-minNLS contacts is 0.23. Unique to this cluster is Lys7 natively binding to both Trp114 and Trp161, although Lys9 does not reside in its native cage. Thus, CL5 is the most native of all five clusters. Taken together, out of the top 5 clusters only the last, least populated one partially adopts the native pose, whereas all others fail to retain the majority of native interactions. In summary, cluster analysis confirms a high heterogeneity of the KKPK binding ensemble with little native content.
It is informative to compare the top 10 binding amino acids with the results of alanine mutagenesis, which identified impα amino acids with the highest affinity to VEEV minNLS (14). From the docking simulations in that study, the top 10 amino acids with the strongest contribution to the NLS peptide binding ( kcal/mol) are, in descending order, Asp122, Trp114, Trp161, Asn118, Asn76, Asp200, Thr85, Asn158, Trp72, and Gln111. In all, six amino acids from this list (Asp122, Trp114, Trp161, Asn118, Asp200, and Thr85), including the top 4, are also present among the top 10 binding amino acids in Table 1. This result may appear surprising given that the docking simulations considered KKPK in the 3VE6 native pose, whereas in our REST simulations KKPK adopts a manifold of binding poses with little similarity to the 3VE6 structure. However, Table 1 shows that most KKPK interactions are still confined to the minNLS binding site, underscoring that these impα amino acids, while important for KKPK binding, do not uniquely confer its pose.
To explore the free energy landscape of KKPK binding, we present in Fig. 4 a the free energy landscape , where is the probability of the bound state with the numbers of native and non-native contacts and , respectively. Fig. S9 a presents the mapping of the top 5 clusters CL1-CL5 onto the four free energy basins in Fig. 4 a. The mapping indicates that the top 5 clusters generally fit the most thermodynamically viable bound states of the KKPK peptide, supporting the clustering protocol. Consequently, we use clusters to refer to free energy states. The basin free energies are given in Table 4, demonstrating, as expected from the cluster analysis, that CL1 has the lowest G. The “unaccounted” free energy basin in Fig. 4 a with negligible native content is populated by a multitude of clusters with occupancy of less than 0.01. Using the landscape and minimum free energy paths between the states, we reconstruct the KKPK navigation through its bound ensemble as CL5 ↔ CL1 ↔ CL4 + CL3 ↔ CL2. When KKPK resides in the most stable state CL1, it may occasionally cross a high free energy barrier of kcal/mol to enter the native-like CL5. More likely, however, KKPK transits from CL1 into CL3 + CL4, crossing the barrier kcal/mol. By crossing a subsequent barrier of 1.6 kcal/mol, the peptide reaches the most non-native CL2. Thus, the free energy landscape in Fig. 4 a suggests that the bound KKPK samples a one-dimensional pathway, along which it trades off native and non-native binding interactions. In summary, the minNLS peptide, which represents a highly conserved motif in the NLS sequence (8), does not retain its native binding pose, forming less than a third of native interactions.
Figure 4.
(a) The two-dimensional free energy landscape presented as a function of the numbers of native and non-native contacts depicts minNLS KKPK peptide binding to impα. (b) The analogous landscape is computed for the coreNLS peptide KKPKKE. The contour lines have increments of 0.5 kcal/mol. Bound clusters from Table 3 are explicitly matched in Fig. S9, a and b with the low free energy states from Table 4. The scales on the right color code free energy. To see this figure in color, go online.
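The construction of such a two-dimensional landscape can be illustrated with a minimal sketch. It assumes the conventional Boltzmann inversion, G = -RT ln P, applied to the joint histogram of native and non-native contact counts at 310 K. The exact definition and symbols used for Fig. 4 are not reproduced here, and the function name, the toy data, and the choice of shifting the minimum to zero are illustrative assumptions rather than the protocol used in this work.

```python
import math
from collections import Counter

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def free_energy_landscape(samples, temperature=310.0):
    """Return {(n_native, n_nonnative): G in kcal/mol}, with min(G) shifted to 0.

    samples: iterable of (n_native, n_nonnative) pairs, one per bound structure.
    Assumes G = -RT ln P, where P is the normalized histogram of the pairs.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    rt = R_KCAL * temperature
    g = {state: -rt * math.log(n / total) for state, n in counts.items()}
    g_min = min(g.values())
    return {state: value - g_min for state, value in g.items()}


# Hypothetical toy data: contact counts for 100 bound structures.
toy_samples = [(8, 10)] * 60 + [(2, 16)] * 30 + [(14, 6)] * 10
landscape = free_energy_landscape(toy_samples)

# List states from the most to the least stable.
for state in sorted(landscape, key=landscape.get):
    print(state, round(landscape[state], 2))
```

In this toy case a state that is half as populated as the most stable one lies RT ln 2 above it, which is roughly 0.4 kcal/mol at 310 K.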
Table 4.
The low free energy bound states
State k | Fraction of peptides (a) | G (b) (kcal/mol) | Transition k → l (c) | Free energy barrier (kcal/mol) |
---|---|---|---|---|
KKPK | ||||
CL1 | 0.36 ± 0.01 | 0.0 ± 0.0 | CL1 → CL5 | |
 | | | CL1 → CL3+CL4 | |
CL2 | 0.07 ± 0.00 | 0.5 ± 0.1 | CL2 → CL3+CL4 | |
CL3+CL4 | 0.13 ± 0.04 | 0.3 ± 0.0 | CL3+CL4 → CL1 | |
 | | | CL3+CL4 → CL2 | |
CL5 | 0.06 ± 0.02 | 0.5 ± 0.1 | CL5 → CL1 | |
KKPKKE | ||||
CL1 | 0.67 ± 0.01 | 0.0 ± 0.0 | CL1 → CL2 | |
CL2 | 0.18 ± 0.03 | 0.4 ± 0.1 | CL2 → CL1 | |
The bold font is used to distinguish the free energy states.
Fraction of peptides observed in a state k. To compute we included all the structures in k with the free energies , where is the minimum free energy in k and is the free energy of transition state along the minimum free energy path out of k.
To compute the free energy of k, , we integrated within the interval . For all states kcal/mol.
Transition from states k to l crosses the free energy barrier .
Binding of coreNLS peptide KKPKKE to impα
If the minNLS peptide fails to adopt the NLS binding pose, would its longer counterpart, the coreNLS peptide KKPKKE (P2-P7), alter the binding ensemble? To answer this question, we studied the binding of KKPKKE to impα. In the 3VE6 structure in Fig. 1 the coreNLS peptide forms an additional eight side-chain contacts with impα compared with KKPK, bringing the total to 32. Lys10 forms six interactions, particularly, a salt bridge with Glu37, while Glu11 is engaged in two contacts and forms no salt bridges. In the REST simulations at 310 K the coreNLS peptide binds to impα with the probability of , indicating that the simulations sample exclusively bound states. We first assess the KKPKKE binding ensemble using the probability distribution of RMSD values computed between all pairs of bound peptides. The bimodal plot of in Fig. 2 b strikingly differs from the KKPK unimodal distribution. The major peak in occurs at Å, while a minor peak occurs at 7.6 Å. This distribution suggests a largely homogeneous ensemble of closely related bound poses augmented by a distinct, but poorly populated alternative pose.
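The pairwise RMSD distribution used here to gauge ensemble heterogeneity can be sketched as follows. This is a minimal illustration that assumes the bound structures have already been superimposed on impα, so that the peptide RMSD is computed without further fitting; the function names, bin width, and toy coordinates are hypothetical and not taken from the Methods.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) points, no refitting."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def pairwise_rmsd_histogram(frames, bin_width=1.0):
    """Normalized histogram {bin_start: probability} over all pairs of frames."""
    values = [rmsd(a, b) for a, b in combinations(frames, 2)]
    hist = {}
    for v in values:
        start = bin_width * math.floor(v / bin_width)
        hist[start] = hist.get(start, 0) + 1
    return {k: n / len(values) for k, n in sorted(hist.items())}


# Hypothetical toy ensemble: three closely related poses and one distant pose.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 8.0, 0.0), (1.0, 8.0, 0.0)],
]
distribution = pairwise_rmsd_histogram(toy_frames)
for bin_start, prob in distribution.items():
    print(bin_start, round(prob, 3))
```

A distribution with one peak at small RMSD and a second at large RMSD, as in this toy case, is the signature of a dominant family of poses plus a distinct alternative pose.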
The top 10 impα amino acids most frequently involved in coreNLS binding are given in Table 1. All top binding amino acids belong to the coreNLS binding site, and 8 out of 10 exhibit , including Trp161 and Trp114, indicating that KKPKKE is almost always bound to them. This observation is consistent with KKPKKE maintaining its native pose. Among the top binding amino acids, eight are polar or anionic. Since KKPK and KKPKKE share eight top binding amino acids and the top 3 are identical, the peptides bind to the same impα location. On average, the number of favorable electrostatic contacts between charged KKPKKE and impα amino acids is , which is about 70% more than for the minNLS peptide. The list of top 10 impα amino acids involved in hydrogen bonding with KKPKKE also includes mostly (namely, eight) coreNLS impα amino acids, with particularly strong bonds occurring with polar Asn118 and Asn76. Six of them are also among the top 10 binding amino acids in Table 1. In fact, the average number of hydrogen bonds between KKPKKE and impα is or 1.6 per peptide residue, which is about 50% more than for KKPK. Thus, hydrogen bonding is even more important for KKPKKE binding than for KKPK, and both peptides utilize mostly polar and electrostatic binding interactions.
To directly check if KKPKKE retains its native binding pose, we analyzed the native binding interactions formed by peptide amino acids j in Table 2. On average, the fraction of native interactions retained by a coreNLS amino acid is , which is more than twice the corresponding KKPK value. Moreover, the average fraction of non-native interactions per amino acid is , which is almost twofold lower than for KKPK. The average fraction of contacts extending beyond the impα coreNLS site is 0.19. Importantly, there is a significant variation in the native and non-native interactions along the coreNLS sequence. For four N-terminal amino acids, Lys6 to Lys9, , but for the C-terminal Lys10 and Glu11 drops to 0.21 and 0.36. Concomitantly, the non-native fraction increases from to 0.55 and 0.75, respectively. These non-native interactions are primarily related to the stable hydrophobic contact between Lys10 and Trp72 (observed with probability 0.63) and stable salt bridges between Glu11 and Lys32 (0.72) or Arg31 (0.64). Thus, the first four amino acids in KKPKKE retain their native interactions, whereas the last two acquire mostly non-native contacts. This observation contrasts with KKPK binding, in which only Lys6 marginally retains native interactions (Table 2). The binding affinities of KKPKKE amino acids are measured by the total number of binding interactions. According to Table 2, the N-terminal Lys6 forms the largest number of binding contacts . Next are Lys7 and Lys9, each with less than half as many . The binding of the C-terminal Lys10 and Glu11 is about threefold weaker than that of Lys6. Thus, the coreNLS peptide primarily binds to impα via N-terminal Lys6 and, to a lesser degree, via Lys9. In the native 3VE6 pose, Lys9 and Glu11 form an intrapeptide salt bridge. However, in the REST simulations this interaction is disrupted, occurring with a probability of 0.02.
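The bookkeeping behind native and non-native fractions can be illustrated with a short sketch. The definitions below are assumptions, since the formal ones belong to the Methods: the retained native fraction is taken as the share of reference contacts that are present in a given structure, and the non-native fraction as the share of observed contacts absent from the reference. The contact pairs are a small illustrative subset built from interactions named in the text, not the full 3VE6 contact list.

```python
def contact_fractions(native, observed):
    """Return (retained_native_fraction, non_native_fraction).

    native:   contacts present in the reference (native) pose
    observed: contacts present in a simulated structure
    Each contact is a (peptide_residue, receptor_residue) pair.
    """
    native, observed = set(native), set(observed)
    retained = len(native & observed) / len(native)
    non_native = len(observed - native) / len(observed) if observed else 0.0
    return retained, non_native


# Illustrative subset of native contacts mentioned in the text.
native_subset = {
    ("Lys7", "Trp161"),
    ("Lys9", "Trp72"),
    ("Lys9", "Trp114"),
    ("Lys10", "Glu37"),
}

# Hypothetical simulated structure: three native contacts survive, and the
# C-terminus forms the non-native contacts discussed above.
observed_example = {
    ("Lys7", "Trp161"),
    ("Lys9", "Trp72"),
    ("Lys9", "Trp114"),
    ("Lys10", "Trp72"),
    ("Glu11", "Lys32"),
    ("Glu11", "Arg31"),
}

retained, non_native = contact_fractions(native_subset, observed_example)
print(retained, non_native)
```

Restricting both sets to the contacts of a single peptide amino acid j gives the per-residue version of the same quantities.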
Since Lys9 preserves about two-thirds of the native interactions in Table 2, one may expect that its side chain is housed in the native Trp cage. To verify this expectation, we computed the probabilities for the Lys9 side chain to fit into the Trp72 and Trp114 cages. As expected, this probability is high (), while the probability for Lys6 or Lys7 to occur in the same cage is . Thus, Lys9 retains its native position. We then analyzed the contribution of π-cation interactions to KKPKKE binding. The native π-cation interaction between Lys7 and Trp161 is formed with probability . There are even stronger native π-cation interactions of Lys9 with Trp72 and Trp114, for which and 0.76. Importantly, all non-native π-cation interactions, with the exception of the Lys7-Trp114 pair, have negligible probabilities. Hence, the three π-cation contacts present in the 3VE6 structure are largely reproduced in our REST simulations. These findings are in stark contrast with the binding of the minNLS peptide to impα, which neither populates Trp cages nor establishes any π-cation interactions. In summary, in contrast to the minNLS peptide, the coreNLS peptide generally maintains the native pose, and this conclusion agrees well with the major peak in at Å in Fig. 2 b.
It is instructive to connect the bound KKPKKE peptide ensemble with the B-factors, B, reported for the 3VE6 structure. A recent study has estimated the maximum B-factors, , associated with the crystalline solid state (30). For the resolution of 2.8 Å reported for 3VE6 Å. It is striking that the minimum B for Glu11 is 81 Å² and only two heavy atoms in Lys10 have their B-factors . Moreover, none of the heavy atoms in Lys6-Lys9 have their B exceeding . To provide a more direct comparison, we computed the average RMSFs of heavy atoms in min- and coreNLS amino acids j and compared them with 3VE6 B-factors averaged over the same atoms. Fig. 5 shows that and are not correlated for minNLS (correlation coefficient ), but exhibit a strong correlation for coreNLS (). Furthermore, the averaged B-factors for Lys10 and Glu11 are well above , suggesting their disordered state in the x-ray structure. Finally, Fig. 5 distinctly demonstrates that minNLS values for Lys6-Lys9 are noticeably larger than for the coreNLS peptide. Indeed, their respective average RMSFs are 8.2 and 3.5 Å, i.e., the minNLS structural fluctuations exceed those in coreNLS by more than twofold. This analysis agrees with the two previous conclusions. First, the four N-terminal amino acids in KKPKKE maintain a largely native pose, whereas the two C-terminal amino acids form mostly non-native contacts. Second, while the minNLS peptide binds to impα non-natively without maintaining a specific pose, the coreNLS peptide retains the native bound structure, which is consistent with 3VE6. In addition, the consistency of B-factors computed in silico and experimentally, as well as the good agreement between the binding poses of the coreNLS peptide adopted in our REST simulations and in the native 3VE6 structure, support the accuracy of the CHARMM36m force field.
Figure 5.
The scatterplots probing the correlation between the averaged root mean-square fluctuations of heavy atoms in peptide amino acids j and the B-factors averaged over the same atoms in amino acids j. The B-factors are extracted from PDB: 3VE6 structure. Filled and open circles refer to min- and coreNLS peptides, respectively. The black line indicates linear regression fit to coreNLS data, whereas the dashed line marks , the boundary of the crystalline state (30). The plots suggest, respectively, nonspecific and native binding of KKPK and KKPKKE peptides.
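The comparison between simulated fluctuations and crystallographic B-factors can be sketched in a few lines. The conversion below uses the standard isotropic relation B = (8π²/3)·RMSF², which is a textbook identity and not necessarily the conversion applied in this work; the Pearson coefficient is computed from its definition. All numerical inputs are hypothetical placeholders, not values from Fig. 5.

```python
import math


def rmsf_to_bfactor(rmsf):
    """Isotropic B-factor (Å^2) implied by a 3D RMSF (Å): B = 8*pi^2/3 * RMSF^2."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf ** 2


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-residue averages for a six-residue peptide.
toy_rmsf = [1.0, 1.5, 2.0, 2.5, 4.0, 5.0]              # Å, from simulation
toy_bfactor = [30.0, 40.0, 45.0, 55.0, 85.0, 100.0]    # Å^2, from a PDB file

print("correlation:", round(pearson(toy_rmsf, toy_bfactor), 2))
print("B implied by RMSF of 1 Å:", round(rmsf_to_bfactor(1.0), 1))
```

Because B scales with the square of the RMSF, a linear correlation between the two quantities is a qualitative check of consistency rather than an exact test.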
Binding of the coreNLS peptide to impα has been previously studied using the Glide docking program (14). It was found that the amino acids Lys6 through Lys9 fit the 3VE6 pose of the 12-mer NLS peptide well, but the positions of Lys10 and Glu11 become non-native. Moreover, according to Glide, KKPK binds to impα with a score of −9.6 kcal/mol, which is higher (less favorable) than −11.8 kcal/mol for the coreNLS. Finally, a recent molecular dynamics study probed the backbone fluctuations in the SV40 NLS peptide PAKKRKV (31). The Arg130Pro mutant affecting position P4, with the sequence PAKKPKV, resembles VEEV NLS peptides and exhibits suppressed fluctuations at the N-terminus (about 2 Å), which increase twofold to more than 4 Å for the last two C-terminal residues. This outcome is consistent with our data for min- and coreNLS VEEV peptides. Furthermore, qualitatively similar fluctuation profiles were seen for other SV40 mutants replacing Arg with Gly, Ala, Val, or Met. Taking into account that, beyond positions P2, P3, and P5, the sequences of SV40 and VEEV NLS peptides differ, one may suggest that in general the disorder in the bound NLS is reduced at the N-terminus but grows toward the C-terminus.
To provide further analysis of the binding ensemble, we clustered KKPKKE conformations bound to impα as described in Methods. Our computations revealed only two populated clusters, CL1 and CL2, given in Table 3, which together capture 75% of bound structures. The cluster centroids are displayed in Fig. 3 b. The dominant CL1 comprises 67% of all structures and is highly native, with the RMSD of its centroid from the 3VE6 pose being only 2.5 Å. Indeed, the CL1 centroid forms 31 side-chain contacts retaining the fraction of of native interactions. Lys6 through Lys9 retain almost all native interactions (), and their fraction of non-native interactions does not exceed 0.27. In contrast, for Lys10 , whereas Glu11 establishes largely non-native interactions (). Thus, in line with the analysis above, Lys6 through Lys9 are natively bound, but the C-terminal Lys10 and Glu11 adopt a significant fraction of non-native, including non-coreNLS, interactions. CL1 makes up the dominant peak in Fig. 2 b. The second cluster, CL2, includes only 8% of the bound ensemble, and its structures differ from 3VE6 by 7.4 Å. Since the RMSD between the centroids of CL1 and CL2 is 7.5 Å, the two clusters are distinct (as seen in Fig. 3 b). The CL2 centroid is linked to impα via 30 contacts with the fraction of retained native contacts . Although Lys6 and Lys7 preserve at least two-thirds of native interactions, Lys10 and Glu11 form none. For Pro8 and Lys9, is 0.33 and 0.50, respectively. Thus, CL2 is less native-like, and its C-terminus is shifted away from the native pose. Nevertheless, the appearance of a dominant cluster with a low native RMSD implies a mostly homogeneous, native-like binding ensemble sampled by the coreNLS peptide.
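For readers unfamiliar with RMSD-based clustering, the idea of extracting populated clusters and their centroids can be sketched with a neighbor-counting scheme in the spirit of Daura et al. (28). This is an assumption made for illustration: the actual protocol and cutoff are those given in Methods and may differ, and the one-dimensional toy "poses" below merely stand in for a pairwise RMSD matrix.

```python
def daura_clusters(dist, cutoff):
    """Greedy neighbor-count clustering on a symmetric distance matrix.

    Repeatedly picks the structure with the most neighbors within `cutoff`
    (ties go to the lowest index), reports it as a centroid together with its
    neighbors, and removes them from the pool.
    Returns a list of (centroid_index, sorted_member_indices).
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best, best_neighbors = None, []
        for i in sorted(remaining):
            neighbors = [j for j in remaining if dist[i][j] <= cutoff]
            if len(neighbors) > len(best_neighbors):
                best, best_neighbors = i, neighbors
        clusters.append((best, sorted(best_neighbors)))
        remaining -= set(best_neighbors)
    return clusters


# Hypothetical toy data: positions on a line stand in for bound poses, and
# their absolute differences stand in for pairwise RMSD values (Å).
positions = [0.0, 0.5, 1.0, 7.0, 7.4, 15.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]

clusters = daura_clusters(toy_dist, cutoff=1.0)
for centroid, members in clusters:
    print(centroid, members, len(members) / len(positions))
```

The printed fractions play the role of cluster populations such as the 67% and 8% reported for CL1 and CL2.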
Following the approach taken for KKPK, we explored the free energy of KKPKKE binding using , where and are the numbers of native and non-native binding contacts formed by the coreNLS peptide. in Fig. 4 b reveals two basins (states), which perfectly match the two clusters CL1 and CL2 (see Fig. S9 b). Therefore, CL1 and CL2 represent the thermodynamically stable bound states. The free energies of the two states are given in Table 4. CL1 has the lowest free energy G, separated from the second state CL2 by a gap of 0.4 kcal/mol. To reach CL2 from CL1, KKPKKE must cross the free energy barrier kcal/mol, which is reduced to 2.2 kcal/mol if the peptide transits in the opposite direction. Thus, the free energy landscape in Fig. 4 b demonstrates that the bound coreNLS peptide primarily samples the highly native low free energy state CL1 and only transiently visits the metastable state CL2 with reduced native content. We then conclude that the coreNLS peptide, which adds two amino acids to the C-terminus of the highly conserved NLS motif KKPK (P2-P5) (8), largely preserves the native binding pose of the 12-mer VEEV NLS peptide.
Comparison of min- and coreNLS peptide binding and broader outlook
The min- and coreNLS peptide binding to impα can be succinctly compared as follows. First, both peptides primarily bind to the impα major NLS binding site and exhibit the strongest binding affinity to the same three impα amino acids. Second, the binding affinity is skewed in both peptides toward the N-terminus, and native or non-native interactions primarily occur at the N- or C-termini, respectively. Nonetheless, these similarities coexist with key differences in their binding behavior. First and foremost, the minNLS peptide has a diverse binding conformational ensemble featuring multiple sparsely populated binding poses. The coreNLS peptide, in contrast, adopts a largely homogeneous binding ensemble with a dominant, highly native cluster capturing two-thirds of binding poses. Second, KKPK forms mostly non-native binding interactions, does not retain native π-cation contacts, and its Lys9 does not reside in the native tryptophan cage. In contrast, the coreNLS peptide binds to impα largely natively, retaining most of the native binding interactions, including π-cation contacts and its tryptophan cage. Third, KKPK binding is governed by a complex multistate free energy landscape reflecting peptide interconversion through multiple binding poses. In contrast, the free energy landscape for the coreNLS peptide binding is simple, featuring a single dominant native-like basin.
What is, then, the source of such divergent binding scenarios exhibited by min- and coreNLS peptides? An apparent reason is the two C-terminal amino acids, Lys10 and Glu11, which bind relatively weakly to impα, each forming less than half of the interactions attributed, on average, to Lys6, Lys7, or Lys9. Specifically, Lys10 marginally retains the native salt bridge with impα Glu37, while Glu11 establishes two new stable electrostatic interactions with impα Lys32 and Arg31. Those C-terminal interactions enable locking the coreNLS peptide into the native pose. Thus, the sequence KKPKKE is sufficient to lock the NLS peptide into the native pose, while the other six, primarily N-terminal amino acids in the 12-mer VEEV NLS peptide from the 3VE6 structure, are not necessary for the native pose. This conclusion does not preclude that the N-terminal amino acids might also be sufficient for native binding. Importantly, we also performed binding REST simulations of the KKPKK peptide and found that, as with the minNLS peptide, it does not form the native pose (see supporting material). Fig. S10 shows a broad probability distribution of RMSD values computed between all pairs of bound KKPKK structures. Thus, the coreNLS peptide, and not KKPKK or KKPK, is sufficient for native binding to impα.
Is the minNLS KKPK sequence present in other proteins? In Table S1 we show that there are seven human proteins containing KKPK and predicted to interact with impα. Among them is TAF8, and its 9-mer NLS peptide PVKKPKIRR is complexed with impα in PDB: 4WV6 (32). The TAF8 KKPK fragment adopts a pose almost identical to that in 3VE6 (the respective RMSD = 0.9 Å). Based on our findings we predict that the KKPK fragment is not sufficient to confer native binding specificity of TAF8 to impα. Furthermore, because the TAF8 sequence flanking KKPK is distinct from VEEV NLS, it follows then that different amino acids surrounding KKPK can enforce its native binding. Thus, even though KE amino acids at the VEEV P6 and P7 positions are sufficient to induce the minNLS peptide native pose, none of the amino acids surrounding minNLS, including Lys10 and Glu11, in the proteins from Table S1, are strictly necessary for native binding. We also show in supporting material that there are 40 viral proteins harboring the KKPK fragment and 28 of those are expected to function as NLS. Taken together, we surmise that KKPK is not unique to VEEV and is shared among several human and many viral proteins expected to interact with impα. If so, our simulations offer the first step in characterizing the physicochemical mechanisms of NLS binding to impα and nucleocytoplasmic trafficking applicable to many human or virus proteins. However, our findings may not be directly applicable to bipartite NLS sequences, which critically depend on binding to the impα minor binding site (33). Moreover, our conclusion about KKPK nonspecific binding to impα may not be applicable to other variants of the conserved motif K-K/R-X-K/R.
Given the importance of the coreNLS sequence in enforcing native binding, we evaluated in the supporting material the number of proteins predicted to interact with impα isoform KPNA2 and harbor the KKPKKE fragment. We found none. Expanding the search to other impα isoforms identified one additional protein, RPL5, predicted to interact with impα isoform KPNA6, which has a highly homologous major NLS binding site (34). Aside from VEEV, there are no coreNLS sequences among viral proteins. Thus, KKPKKE interactions with impα are exceedingly rare and only found for VEEV and human protein RPL5. The therapeutic implication of our work is then that the development of VEEV antivirals should concentrate on the rare and distinct interface between KKPKKE and impα, although additional efforts must be devoted to understanding potential off-target interactions affecting RPL5.
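The sequence searches described in the last two paragraphs amount to scanning protein sequences for an exact fragment or for the conserved motif K-K/R-X-K/R. A minimal sketch is given below; it is not the pipeline used for Table S1, and the function names are hypothetical. The three test peptides are the sequences quoted in the text, and in practice full-length sequences from a protein database would be scanned instead.

```python
import re

# Conserved motif K-K/R-X-K/R; the lookahead allows overlapping matches.
CLASSICAL_MOTIF = re.compile(r"(?=(K[KR].[KR]))")


def find_fragment(sequence, fragment):
    """Return 1-based start positions of every occurrence of `fragment`."""
    positions = []
    start = sequence.find(fragment)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(fragment, start + 1)
    return positions


def find_motif(sequence):
    """Return (1-based position, matched 4-mer) for each K-K/R-X-K/R hit."""
    return [(m.start() + 1, m.group(1)) for m in CLASSICAL_MOTIF.finditer(sequence)]


# Peptide sequences quoted in the text.
peptides = {
    "VEEV coreNLS": "KKPKKE",
    "TAF8 NLS": "PVKKPKIRR",
    "SV40 NLS": "PAKKRKV",
}

for name, seq in peptides.items():
    print(name, find_fragment(seq, "KKPK"), find_fragment(seq, "KKPKKE"), find_motif(seq))
```

The minNLS fragment is found in both the VEEV and TAF8 peptides, whereas the full coreNLS fragment is found only in the VEEV peptide, mirroring the rarity of KKPKKE noted above.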
Does the VEEV NLS demonstrate the strongest binding affinity to impα among NLS sequences? The single-site SV40 mutations at position P4 (Arg130 corresponding to our Pro8) performed by Smith et al. (31) offer a potential answer. Their study showed that all hydrophobic mutants replacing Arg with Gly, Ala, Val, Pro, or Met reduce the binding affinity of NLS to impα by at least about 10-fold. Thus, it appears that the VEEV NLS featuring Pro at position P4 is not fully optimized for binding to impα. This conjecture is further supported by the previous structural and energetic analysis of NLS binding to the major impα binding site (12). It was argued that the best NLS sequence at positions P2-P5 is KRRK. If so, the VEEV NLS is indeed not fully optimized for binding to impα, as it features Lys at P3 and Pro at P4. Lys7 cannot form the contacts with anionic Glu196 that would be established by Arg at P3. Similarly, Pro at P4 cannot form the electrostatic interactions that would be available for Arg at this position. The suboptimal sequence composition of VEEV NLS for binding to impα is consistent with our data showing that the KKPK minNLS peptide binds nonspecifically, failing to adopt a native pose in the NLS major site.
Finally, exploration of NLS sequences may have implications for biotechnology. Gene editing technologies such as CRISPR-Cas9, TALEN, and ZFN routinely exploit NLS sequences to aid the passage to the cell nucleus (35,36). A common NLS choice is the one from SV40 (with minNLS sequence KKRK), although recently the c-Myc NLS sequence (KRVK) was found to improve the efficiency of gene editing in a CRISPR-Cas12a system (37). One may hypothesize that NLS sequences involved in gene editing technologies should be optimized not solely for binding to impα, but also for release after trafficking into the nucleus. This assumption would explain the better performance of KRVK over KKRK, because KRVK compromises binding of the P4 NLS position to impα. Based on this rationale, gene editing utilizing the VEEV NLS sequence KKPK is also worth exploring, as it may further balance NLS binding affinity and release from impα.
Conclusions
In summary, our all-atom REST simulations have mapped the mechanisms governing VEEV NLS binding to impα. Our objective was to identify the VEEV NLS sequence fragment that confers specific binding to impα as well as to study the associated binding energetics and conformational ensembles. The two VEEV NLS peptide fragments selected for the study, KKPK and KKPKKE, show strikingly different binding mechanisms. We showed that the minNLS peptide KKPK binds non-natively and nonspecifically by adopting five diverse conformational clusters, of which only one, the least populated, shows similarity to the 3VE6 impα-NLS complex. However, despite the prevalence of non-native interactions, the minNLS peptide still largely binds to the impα major NLS binding site. In contrast, the coreNLS peptide KKPKKE binds specifically and natively, adopting a largely homogeneous binding ensemble with a dominant, highly native-like conformational cluster. Moreover, the coreNLS peptide retains most native binding interactions, including π-cation contacts and the tryptophan cage. While KKPK binding is governed by a complex multistate free energy landscape featuring transitions between multiple binding poses, the coreNLS peptide free energy map is simple, exhibiting a single dominant native-like bound basin. An apparent origin of the coreNLS peptide binding specificity is the two C-terminal amino acids, Lys10 and Glu11. Although binding relatively weakly, they form three electrostatic contacts with impα that lock the coreNLS peptide into the native pose. Thus, the coreNLS sequence is sufficient to lock the NLS peptide into the native pose, but none of the amino acids flanking minNLS in different proteins are necessary for the native pose. Our findings suggest that the VEEV coreNLS sequence, which is almost unique among human and viral proteins interacting with impα, may serve as a target for VEEV-specific inhibitors.
Author contributions
D.K.K. designed the research. B.M.D. carried out all simulations. B.M.D., C.L., and D.K.K. analyzed the data. B.M.D., C.L., and D.K.K. wrote the article. A.O., X.E.L., K.W.F., M.P., and K.K.-H. reviewed the article.
Acknowledgments
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award no. R01AI143817 (to D.K.K., M.P., and K.K.-H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Declaration of interests
The authors declare no competing interests.
Editor: Yuji Sugita.
Footnotes
Supporting material can be found online at https://doi.org/10.1016/j.bpj.2023.07.024.
References
- 1.Zacks M.A., Paessler S. Encephalitic alphaviruses. Vet. Microbiol. 2010;140:281–286. doi: 10.1016/j.vetmic.2009.08.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Lundberg L., Carey B., Kehn-Hall K. Venezuelan equine encephalitis virus capsid—the clever caper. Viruses. 2017;9:279. doi: 10.3390/v9100279. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Aguilar P.V., Estrada-Franco J.G., et al. Weaver S.C. Endemic Venezuelan equine encephalitis in the Americas: Hidden under the dengue umbrella. Future Virol. 2011;6:721–740. doi: 10.2217/FVL.11.5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Atasheva S., Garmashova N., et al. Frolova E. Venezuelan equine encephalitis virus capsid protein inhibits nuclear import in mammalian but not in mosquito cells. J. Virol. 2008;82:4028–4041. doi: 10.1128/JVI.02330-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Atasheva S., Fish A., et al. Frolova E.I. Venezuelan equine encephalitis virus capsid protein forms a tetrameric complex with CRM1 and importin α/β that obstructs nuclear pore complex function. J. Virol. 2010;84:4158–4171. doi: 10.1128/JVI.02554-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Atasheva S., Kim D.Y., et al. Frolov I. Venezuelan equine encephalitis virus variants lacking transcription inhibitory functions demonstrate highly attenuated phenotype. J. Virol. 2015;89:71–82. doi: 10.1128/JVI.02252-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Marfori M., Mynott A., et al. Kobe B. Molecular basis for specificity of nuclear import and prediction of nuclear localization. Biochim. Biophys. Acta. 2011;1813:1562–1577. doi: 10.1016/j.bbamcr.2010.10.013. [DOI] [PubMed] [Google Scholar]
- 8.Christie M., Chang C.-W., et al. Kobe B. Structural biology and regulation of protein import into the nucleus. J. Mol. Biol. 2016;428:2060–2090. doi: 10.1016/j.jmb.2015.10.023. [DOI] [PubMed] [Google Scholar]
- 9.Kosugi S., Hasebe M., et al. Yanagawa H. Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J. Biol. Chem. 2009;284:478–485. doi: 10.1074/jbc.M807017200. [DOI] [PubMed] [Google Scholar]
- 10.Hodel M.R., Corbett A.H., Hodel A.E. Dissection of a nuclear localization signal. J. Biol. Chem. 2001;276:1317–1325. doi: 10.1074/jbc.M008522200. [DOI] [PubMed] [Google Scholar]
- 11.Wirthmueller1 L., Roth C., et al. Wiermer M. Hop-on hop-off: Importin-α-guided tours to the nucleus in innate immune signaling. Front. Plant Sci. 2013;4:149. doi: 10.3389/fpls.2013.00149. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Fontes M.R.M., Teh T., et al. Kobe B. Structural Basis for the Specificity of Bipartite Nuclear Localization Sequence Binding by Importin-α. J. Biol. Chem. 2003;278:27981–27987. doi: 10.1074/jbc.M303275200. [DOI] [PubMed] [Google Scholar]
- 13.Fan F. 2012. Crystal Structure Analysis of Venezuelan Equine Encephalitis Virus Capsid Protein NLS and Importin Alpha. [Google Scholar]
- 14.Shechter S., Thomas D.R., et al. Jans D.A. Novel inhibitors targeting Venezuelan equine encephalitis virus capsid protein identified using in silico structure-based-drug-design. Sci. Rep. 2017;7 doi: 10.1038/s41598-017-17672-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Thomas D.R., Lundberg L., et al. Jans D.A. Identification of novel antivirals inhibiting recognition of Venezuelan equine encephalitis virus capsid protein by the importin α/β1 heterodimer through high-throughput screening. Antivir. Res. 2018;151:8–19. doi: 10.1016/j.antiviral.2018.01.007. [DOI] [PubMed] [Google Scholar]
- 16.Dias S.M.G., Wilson K.F., et al. Cerione R.A. The molecular basis for the regulation of the cap-binding complex by the importins. Nat. Struct. Mol. Biol. 2009;16:930–937. doi: 10.1038/nsmb.1649. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Prlić A., Bliven S., et al. Bourne P.E. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics. 2010;26:2983–2985. doi: 10.1093/bioinformatics/btq572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Diaz-Garcia C., Hornos F., et al. Neira J.L. Human importin α3 and its N-terminal truncated form, without the importin-β-binding domain, are oligomeric species with a low conformational stability in solution. BBA General Subjects. 2020;1864 doi: 10.1016/j.bbagen.2020.129609. [DOI] [PubMed] [Google Scholar]
- 19.Pang X., Zhou H.X. Design Rules for Selective Binding of Nuclear Localization Signals to Minor Site of Importin α. PLoS One. 2014;9 doi: 10.1371/journal.pone.0091025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Huang J., Rauscher S., et al. MacKerell A.D., Jr. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14:71–73. doi: 10.1038/nmeth.4067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Jorgensen W.L., Chandrasekhar J., et al. Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935. [Google Scholar]
- 22.MacKerell A.D., Bashford D., et al. Karplus M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B. 1998;102:3586–3616. doi: 10.1021/jp973084f. [DOI] [PubMed] [Google Scholar]
- 23.Wang L., Friesner R.A., Berne B.J. Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (REST2) J. Phys. Chem. B. 2011;115:9431–9438. doi: 10.1021/jp204407d. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Smith A.K., Lockhart C., Klimov D.K. Does Replica Exchange with Solute Tempering efficiently sample Aβ peptide conformational ensembles? J. Chem. Theor. Comput. 2016;12:5201–5214. doi: 10.1021/acs.jctc.6b00660.
- 25. Phillips J.C., Hardy D.J., et al. Tajkhorshid E. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 2020;153 doi: 10.1063/5.0014475.
- 26. Humphrey W., Dalke A., Schulten K. VMD: Visual molecular dynamics. J. Mol. Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
- 27. Khan H.M., MacKerell A.D., Jr., et al. Reuter N. Cation-π Interactions between Methylated Ammonium Groups and Tryptophan in the CHARMM36 Additive Force Field. J. Chem. Theor. Comput. 2019;15:7–12. doi: 10.1021/acs.jctc.8b00839.
- 28. Daura X., Gademann K., et al. Mark A.E. Peptide Folding: When Simulation Meets Experiment. Angew. Chem. Int. Ed. 1999;38:236–240.
- 29. Castro-Alvarez A., Costa A.M., Vilarrasa J. The Performance of Several Docking Programs at Reproducing Protein–Macrolide-Like Crystal Structures. Molecules. 2017;22:136. doi: 10.3390/molecules22010136.
- 30. Carugo O. How large B-factors can be in protein crystal structures. BMC Bioinf. 2018;19:61. doi: 10.1186/s12859-018-2083-8.
- 31. Smith K.M., Di Antonio V., et al. Alvisi G. Contribution of the residue at position 4 within classical nuclear localization signals to modulating interaction with importins and nuclear targeting. BBA Mol Cell Res. 2018;1865:1114–1129. doi: 10.1016/j.bbamcr.2018.05.006.
- 32. Trowitzsch S., Viola C., et al. Berger I. Cytoplasmic TAF2–TAF8–TAF10 complex provides evidence for nuclear holo–TFIID assembly from preformed submodules. Nat. Commun. 2015;6:6011. doi: 10.1038/ncomms7011.
- 33. Doll S.G., Meshkin H., et al. Cingolani G. Recognition of the TDP-43 nuclear localization signal by importin α1/β. Cell Rep. 2022;39 doi: 10.1016/j.celrep.2022.111007.
- 34. Tsimbalyuk S., Forwood J.K. 2022. Importin Alpha 7 Delta IBB (KPNA6)
- 35. Wang H.-X., Li M., et al. Leong K.W. CRISPR/Cas9-Based Genome Editing for Disease Modeling and Therapy: Challenges and Opportunities for Nonviral Delivery. Chem. Rev. 2017;117:9874–9906. doi: 10.1021/acs.chemrev.6b00799.
- 36. Zagoskin A.A., Zakharova M.V., Nagornykh M.O. Structural Elements of DNA and RNA Eukaryotic Expression Vectors for In Vitro and In Vivo Genome Editor Delivery. Mol. Biol. 2022;56:950–962. doi: 10.1134/S0026893322060218.
- 37. Gier R.A., Budinich K.A., et al. Shi J. High-performance CRISPR-Cas12a genome editing for combinatorial genetic screening. Nat. Commun. 2020;11:3455. doi: 10.1038/s41467-020-17209-1.