Structural prerequisites for CRM1-dependent nuclear export signaling peptides: accessibility, adapting conformation, and the stability at the binding site

Yoonji Lee; Jimin Pei; Jordan M Baumhardt; Yuh Min Chook; Nick V Grishin

doi:10.1038/s41598-019-43004-0

Sci Rep. 2019 Apr 29;9:6627. doi: 10.1038/s41598-019-43004-0

PMCID: PMC6488578 PMID: 31036839

Abstract

Nuclear export signal (NES) motifs function as essential regulators of the subcellular location of proteins by interacting with the major nuclear exporter protein, CRM1. Prediction of NES motifs is of great interest in many areas of research, including cancer, but currently available methods, which are mostly sequence-based, have suffered from high false positive rates because NES consensus patterns are quite commonly observed in protein sequences. Therefore, a feature that can distinguish real NES motifs from false positives is desirable for improving prediction power, but finding one is quite challenging when only the sequence is used. Here, we provide a comprehensive table for the validated cargo proteins, containing the location of the NES consensus patterns together with disorder propensity plots, known protein domain information, and predicted secondary structures. It could be useful for determining the most plausible NES region in the context of the whole protein sequence, and it suggests that some of the annotated regions may be non-binders. In addition, using the currently available crystal structures of CRM1 bound to various classes of NES peptides, we adopted, for the first time, structure-based prediction of NES motifs bound to the binding groove of CRM1. Combining sequence-based and structure-based predictions, we suggest a novel and more straightforward approach to identify CRM1-binding NES sequences by analysis of their structural prerequisites and energetic evaluation of their stability at the CRM1 binding site.

Subject terms: Computational biophysics, Protein structure predictions, Computational models, Molecular modelling

Introduction

Active transport between the nucleus and cytoplasm is an essential regulatory mechanism for many cellular proteins. As a major nuclear export factor, chromosome maintenance protein 1 (CRM1; also known as exportin-1, XPO1) mediates nuclear export of hundreds of distinct cargo proteins by recognizing short sequence motifs called nuclear export signals (NES)^1–3. CRM1 shuttles between the nucleus and the cytoplasm, binds cargo molecules at high RanGTP levels inside the nucleus, traverses the nuclear pore complex (NPC) as a ternary cargo–CRM1–RanGTP complex, and releases cargo into the cytoplasm upon hydrolysis of the Ran-bound GTP⁴. Since spatial re-localization of oncoproteins and tumor suppressor proteins is important in cancer cells, understanding NES motifs can support basic research on this process and can also aid the discovery of anticancer agents⁵.

Classical NES motifs in the early studies were described as a cluster of hydrophobic residues, mostly leucines (hence also called Leu-rich NES), within a 10–15 residue-long sequence motif^1,6,7. Many years of research on various export cargoes and randomization-and-selection screens showed that more residue types, such as Ile, Val, Met, and Phe, are also allowed at the hydrophobic positions of CRM1-dependent NES motifs^8,9. These hydrophobic residues (Φ) are spaced with various patterns following the consensus Φ1-(x)_2–3-Φ2-(x)_2–3-Φ3-x-Φ4, where x denotes any amino acid. Later, structural studies of CRM1 bound to NES peptides revealed another hydrophobic pocket in CRM1 that can bind one more hydrophobic amino acid (Φ0)^10,11. This site is less restricted to hydrophobic residues than the others. To date, 11 consensus patterns have been defined by a peptide library-based study⁹ and structural analyses of CRM1-NES complexes^11–14. They consist of four to five hydrophobic residues (Φ0-Φ4; generally L, I, V, M, and F) that bind to the corresponding hydrophobic pockets (P0-P4) in CRM1. Based on the pattern of these Φ positions and the spacer sequences, the NES motifs are classified as class 1a, 1b, 1c, 1d, 2, 3, and 4. Additionally, some peptides bind in the opposite (−) direction relative to these classes, so that their Φ3-Φ4 positions bind to P0-P1 (class 1-reverse)¹³. To date, X-ray crystal structures of CRM1 bound to NES peptides of the 1a, 1b, 1c, 2, 3, 4, and 1a-reverse classes have been solved. Depending on the class, the NES peptides adopt distinct backbone conformations when binding to the central portion of the hydrophobic groove of CRM1. A one-turn helix in the middle is remarkably conserved among all classes and maintains a hydrogen bond with a Lys residue (Lys568) in human CRM1¹⁴.
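The spacing rule of the core consensus can be illustrated with a regular expression. The following Python sketch is a deliberate simplification and not the pattern set used in this study: it assumes that every Φ position is one of L, I, V, M, or F, ignores Φ0 and all class-specific spacer rules, and uses the hypothetical names PHI, CORE_NES, and find_core_nes. The two test sequences are the PKI peptide and its L42A/L45A double mutant from Table 1.

```python
import re

# Simplified core consensus: Φ1-(x)2-3-Φ2-(x)2-3-Φ3-x-Φ4.
# Assumption: Φ is restricted to L, I, V, M, F; Φ0 and class-specific
# spacer restrictions are not modeled here.
PHI = "[LIVMF]"
# A lookahead is used so that overlapping candidate segments are all reported.
CORE_NES = re.compile(f"(?=({PHI}.{{2,3}}{PHI}.{{2,3}}{PHI}.{PHI}))")


def find_core_nes(sequence):
    """Return (0-based start, matched segment) for every start position that fits."""
    return [(m.start(), m.group(1)) for m in CORE_NES.finditer(sequence)]


print(find_core_nes("NSNELALKLAGLDINK"))  # PKI peptide: [(4, 'LALKLAGLDI')]
print(find_core_nes("NSNELALKAAGADINK"))  # L42A/L45A mutant: []
```

A pattern this permissive matches many segments in a typical protein, which is the false positive problem discussed in the text.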

Modeling short motifs or patterns like NES is a major research area in bioinformatics. Since NES motifs are essential regulators of the subcellular location of proteins in relation to cancer, cell cycle, cell differentiation and other important aspects of molecular biology, prediction of NES motifs is of great interest but remains a challenge. To date, more than 300 experimentally identified protein cargoes are recorded in databases such as validNESs¹⁵ and NESdb¹⁶, and over 1000 putative CRM1 cargoes were identified in a recent proteomics study¹⁷. Based on the ever-growing repertoire of CRM1 cargo proteins, many attempts have been made to employ machine learning approaches to decide whether a given sequence has a CRM1-dependent NES motif. Several computational tools, such as NetNES⁸, NESsential¹⁸, NESmapper¹⁹, LocNES²⁰, Wregex²¹, and NoLogo²², have been developed to predict NES motifs. Most of them are sequence-based predictors that rely on consensus pattern matching and calculation of biophysical properties such as disorder propensity, secondary structure components, and solvent accessibility. To capture the diversity of NES sequences, the consensus patterns were generally applied in the form of regular expressions or position-specific scoring matrices (PSSMs). Unfortunately, NES patterns are quite commonly observed in a large portion of the proteome, so prediction based on these consensus patterns results in a high false positive rate. Since a functional NES needs to be solvent-exposed and not buried in a globular fold, Kırlı et al. applied these criteria and pattern matching to identify NES motifs in a set of validated, new CRM1 cargoes and found that functional NES motifs still could not be identified in a significant portion of them¹⁷. Moreover, sequences of functional NES motifs appear to be more diverse than previously appreciated. A large portion of experimentally defined NES regions does not match the current consensus patterns¹⁷. To reduce the high false positive rate, other biophysical features such as disorder propensity, secondary structure components, and evolutionary conservation were incorporated into machine learning algorithms like support vector machines (SVM) or neural networks^8,20. However, the false positive rates remain high. In addition to the ever-expanding NES patterns, which produce many false positives when used in NES prediction, the limited information about direct CRM1 binding of the annotated NES regions hinders the development of accurate predictors from available data sets. Therefore, predicting NES motifs using only protein sequence information seems to have limitations, and combining it with structure-based prediction could be a new strategy to distinguish NES motifs from false positives.

In this study, using validated cargo protein sequences in NESdb and validNES, we provide a comprehensive look-up table that contains the location of the NES consensus patterns together with disorder propensity plots, conserved domain information, and predicted secondary structure. This information could be useful for determining the most plausible NES region in the context of the whole protein sequence and for identifying annotated NES regions that may be non-binders. In addition, for the first time, we adopted structure-based prediction of NES sequences bound to the NES binding groove of CRM1, using multiple crystal structures of CRM1-NES peptide complexes as templates. For several experimentally validated NES peptides and false positives, we calculated the relative binding energy of the sequence segments at the CRM1 binding pocket, and the reliability of these binding energies was validated against experimental binding affinities. Combining sequence-based and structure-based predictions, we suggest a novel and more straightforward approach to identify NES sequences that bind directly to CRM1.

Results and Discussion

Deducing NES consensus pattern-matching sequences in candidate cargo proteins

Using the validated cargo protein sequences in NESdb and validNES (which have Leptomycin B (LMB)-sensitivity data as evidence of CRM1-dependency), we extracted the NES consensus pattern-matching sequence segments based on a modified version of the Kosugi consensus^16,20, as summarized in Fig. 1. All possible consensus patterns are recorded and prioritized by the empirical class priority (see Methods for details). Based on these criteria, 4226 consensus-matching segments were extracted for 318 cargo protein sequences. Among them, 463 segments were treated as candidate NES motifs because they occur in regions that overlap with experimental evidence, and 3763 were treated as false positives (FPs). The experimental NES regions of 54 cargo proteins do not match the current consensus and are not considered in this study. Also excluded are four cargo proteins with no reported NES regions and five cargoes with long reported NES regions (>25 residues) that do not have specific residues annotated. Among the consensus patterns, class 1a is the most abundant (41%), as expected. Notably, class 1a is observed more than twice as often in the candidate NES sequences as in the false positive sequences. Classes 1c, 2, and 3 follow with 14–15%, class 1a-reverse accounts for 8.6%, and classes 1b, 1d, 4, and 1c-reverse are quite rare (Fig. S1).

NES consensus patterns used in this study. For the hydrophobic positions, Φ_1–4 are Leu, Ile, Val, Met, or Phe; Thr or Ala is allowed at one of the Φ₁ and Φ₂ positions. Φ₀ is not restricted to hydrophobic amino acids. In the reverse classes, the criteria are applied in the opposite direction, and one of Φ₀ or Φ₁ should be Leu, Phe, or Met. The spacer residues (x) can be any amino acid, with several exceptions. The spacers in Φ₂[X]_nΦ₃XΦ₄ (or Φ₀XΦ₁[X]_nΦ₂ in reverse classes) may not be Pro or Trp. For class 4, at least one of the spacer residues in Φ₃XXXΦ₄ should be Pro to make a turn (as observed in the X-ray crystal structure of the CRM1-X11L2 peptide complex).

A comprehensive look-up table of NES patterns in NES cargo proteins

For an NES motif to be accessible for CRM1 binding, it should not be located in a compactly folded protein domain. The NES motif may be located at the N-terminus, at the C-terminus, or within an unstructured region of an export cargo¹¹. Therefore, for precise prediction of export signals, it is crucial to consider the location of the motifs with respect to protein domains and disordered regions. For all possible NES consensus patterns of the cargo proteins that we extracted, we analyzed the relationship with the ordered/disordered regions, known domains, and predicted secondary structures of the proteins, and we provide a comprehensive online table. For a given full protein sequence, we plotted the disorder propensity, the location of the known domains, the predicted secondary structures, and all possible NES consensus regions (Fig. 2). For a given entry, the information annotated in NESdb or validNES, such as evidence of CRM1-dependency, mutation data, and functional sequences or sites, is listed together. The locations of all NES consensus-matching segments are marked together with the experimentally validated regions (Fig. 2A, the bottom of the plot). The reference databases (NESdb, validNES, and UniProt), a protein visualization tool (ProViz)²³, and a structure and model database (SWISS-MODEL repository)²⁴ are linked for user convenience, and a filter for easy look-up is also provided. This table could be useful for determining the most likely NES region in the context of a whole protein sequence. The online table is accessible via: http://prodata.swmed.edu/nes_pattern_location/.

Location of the NES consensus patterns in Snurportin-1. (A) Disorder propensity, conserved domain information, predicted secondary structure, and the location of the consensus patterns are plotted together. The defined ordered region (by the cutoff value of 0.1; gray dashed line) is represented by the sky-blue box at the top. The regions of the conserved domains annotated in SMART, Pfam, NCBI-curated, and CDD are marked in the middle. The predicted secondary structures (SS) are colored red, black, and blue for α-helix, coil, and β-strand, respectively. The gradient of the color corresponds to the confidence level of the prediction. For the NES regions, experimentally validated regions are displayed in blue (with mutation data annotated in NESdb) and cyan (annotated as a functional sequence in NESdb or as a site in validNES). All the consensus pattern-matching segments are shown at the bottom. Segments that are not in the ordered regions and have no β-strand prediction in the middle are highlighted in yellow. The red boxes are the pattern-matching segments overlapping with experimental evidence. (B) The crystal structure of the CRM1-SNUPN complex (PDB ID: 3GB8)¹². SNUPN is displayed as a cartoon, and the validated NES motif, Snurportin1 domain, and Snurportin-1_C domain are colored red, green, and orange, respectively. CRM1 is represented by a white surface. (C) The list of the pattern-matching sequences in SNUPN. In the ‘candidates’ column, NES candidates and false positives are annotated with “cand” and “fp,” respectively. If the segment is located in the disordered or boundary region, it is flagged with “_D”; if it is in the ordered region, it is flagged with “_O.” If the segment’s β-strand content is over 0.5, it is flagged with “_beta.” In the ‘sequence’ column, hydrophobic positions are colored in red, and the positions with experimental evidence are marked with ‘*’ (mutation) and ‘+’ (functional sequence in NESdb or sites in validNES). The values in ‘diso,’ ‘spotd,’ and ‘iup’ are the average disorder propensities for the segment calculated by DISOPRED3, SPOT-Disorder, and IUPred2A, respectively. The locations with respect to disordered/ordered regions or conserved domains are listed in the ‘loc_DISO’ and ‘loc_CDD’ columns. ‘beta’ is the β-strand content in the middle of the segment.

NES candidates in the disordered or ordered regions

Even if a sequence motif fits the NES consensus, a motif located deep in a globular fold can hardly bind to CRM1 unless the region unfolds. In some cases unfolding and binding may be possible, but we assume that such cases are very limited. Also, short linear interaction motifs like NES motifs have been proposed to be locally disordered to facilitate dynamic interactions with their binding partners, and NES prediction algorithms have used disorder context to help distinguish correct NES motifs from false predictions^18,20. However, NES motifs do not necessarily have to be located in a fully disordered region. Indeed, we observed that some NES candidates are located in fully disordered regions, whereas others are located next to ordered regions, in “boundary” regions. Therefore, we employed the disorder propensity as a pre-filter to remove only the segments located in “highly” ordered regions.

Various computational tools have been developed for analyzing potential intrinsic disorder of protein sequences and have been quite successful owing to the clear association between disorder propensity and sequence features such as low complexity or high aromatic composition. We utilized DISOPRED3²⁵ and SPOT-Disorder²⁶, which use alignment-based profiles of homologous sequences to detect disordered regions, and IUPred2A²⁷, which is much faster since it does not rely on sequence alignment. For some proteins, the predicted disordered regions differ considerably between programs. In order to define ordered and buried regions with high confidence, we applied a strict cutoff value (~0.1) to decide the order/disorder border (note that most programs use a cutoff of ~0.5 for disordered regions). If the disorder propensities of a residue predicted by both DISOPRED3 and SPOT-Disorder are below 0.1, the residue is defined as being in a highly ordered region (the values predicted by IUPred2A are also recorded for reference).
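The two-predictor cutoff rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline used here: the per-residue scores are taken to be plain lists already parsed from the DISOPRED3 and SPOT-Disorder outputs, the function names are hypothetical, and the rule that a segment with a mixture of ordered and non-ordered residues counts as "boundary" is our simplification of the location classes.

```python
ORDER_CUTOFF = 0.1  # strict cutoff quoted in the text


def highly_ordered_mask(disopred, spot):
    """True where BOTH predictors score a residue below the cutoff."""
    if len(disopred) != len(spot):
        raise ValueError("score lists must have the same length")
    return [d < ORDER_CUTOFF and s < ORDER_CUTOFF for d, s in zip(disopred, spot)]


def segment_location(mask, start, end):
    """Classify residues mask[start:end] (0-based, end exclusive).

    Assumption: all ordered -> 'ORD', none ordered -> 'DISO',
    a mixture -> 'boundary'.
    """
    window = mask[start:end]
    if not window:
        raise ValueError("empty segment")
    if all(window):
        return "ORD"
    if not any(window):
        return "DISO"
    return "boundary"


# Toy scores for a five-residue stretch (hypothetical values).
mask = highly_ordered_mask([0.05, 0.02, 0.30, 0.80, 0.90],
                           [0.03, 0.20, 0.05, 0.70, 0.95])
print(mask, segment_location(mask, 0, 3))
```

Requiring both predictors to agree keeps the "highly ordered" label conservative, so a segment is pre-filtered only when there is little doubt that it is buried.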

As shown in Fig. 3A, 55% of the NES candidate motifs are located in the disordered region, and 37% are found in the boundary region between the ordered and disordered parts. Only 8% of the NES candidate motifs are located in the highly ordered region. Among the 361 candidate motifs, 37 segments (from 20 cargo proteins) are located in the highly ordered region and may therefore be less accessible for CRM1 binding. For example, HDAC1 (UniProt ID: Q13547) has a reported NES motif supported by mutation data (L158A/L161A/L164A) for nuclear export²⁸. This region fits classes 1c, 2, or 3, but it is located in the highly ordered region. The crystal structure of HDAC1 (PDB ID: 4BKX) shows that this segment is buried in the globular domain and seems unlikely to be accessible to CRM1 (Fig. 4A). Note that in its homolog HDAC5, the candidate NES motif (₁₀₈₁EEAETVSAMALLSVGA₁₀₉₆, class 1a) is located in the disordered region after the conserved Hist_deacetyl domain and was found to bind CRM1 directly. The corresponding region (after the Hist_deacetyl domain) in HDAC1 (₃₅₈YLEKIKQRLFENLRMLP₃₇₄, class 1c) could also be considered a possible NES motif of HDAC1. Table S1 lists the NES candidate motifs located in the highly ordered region, and Fig. 4A,C shows examples of these segments in the available 3D structures.

Location of the candidate NES and false positive sequences. (A) Location with respect to the disordered or ordered regions. DISO: located in the disordered region; boundary: located at the end of the highly ordered region; ORD: located in the highly ordered region. (B) Location with respect to the known domains annotated in CDD. MID: located in the middle of the domain; boundary: located at the end of the domain; small: located in the small domain (<50 residues); NA: located in the region with no annotated information.

Examples of possible non-binders to CRM1. (A) Segments located in the ordered globular fold. (B) Segments with β-strands in the middle. (C) Segments that are located in the ordered region and have β-strands in the middle. The hydrophobic residues are colored in red or green in the sequences and displayed as sticks in the structures.

Among the false positives, 19% of the segments are located in the highly ordered region, a larger percentage than for the candidate NES motifs (note that far fewer segments fall in the ordered region than in the disordered region because we use a stringent cutoff to define the ordered region). The false positives in the disordered and boundary regions account for 31% and 51%, respectively.

CDD domains and NES locations

To analyze the location of the candidate NES motifs with respect to conserved regions, we extracted conserved domain information for the cargo protein sequences from four databases, i.e., SMART, Pfam, NCBI-curated, and the Conserved Domain Database (CDD). As shown in Fig. 3B, only 33% of the candidate NES regions are located in the middle of the CDD domains, and 40% are in the boundary region. It seems that NES regions are not necessarily located within protein domains. Rather, the known domains are often considered to form folding units, which can mask possible motifs from binding other proteins. Among the false positives, more than half are located in the middle of known domains. This may be because hydrophobic residues are commonly located in the protein core or in domains.

Secondary structure components of the NES peptides

Crystal structures of CRM1-bound NES peptides have been solved for classes 1a, 1a-reverse, 1b, 1c, 2, 3, and 4. They show distinct backbone conformations that match their hydrophobic positions to the corresponding hydrophobic pockets in CRM1. Structural analysis, as well as secondary structure prediction of NES motifs, suggests that most NES motifs adopt α-helical or helix-to-extended conformations^12–14. Class 1d is also expected to have a helix-strand conformation, and the other reverse (−) classes are likely the reverse of their (+) counterparts¹⁴. The common feature of the backbone conformations among the classes is one turn of helix in the region from Φ2 to Φ3¹⁴.

In our analysis of the 361 candidate motifs, 36 segments (from 23 cargoes) have a β-strand conformation in the middle (the β-strand content of the middle part is >50%) (Table S2). Among them, 11 segments were confirmed to form β-strands in the available X-ray or solution structures. For example, NPM has two reported NES regions, but both are predicted to form β-strands in the middle of the segments. As shown in Fig. 4B, the two segments are both β-strands located in the middle of the jelly-roll fold. Indeed, both regions were also reported to be quite weak binders of CRM1²⁹, and the sequence of residues 42–61 failed to bind CRM1 in a GST-pulldown assay (Chook Lab, unpublished results; annotated in NESdb). The candidate NES region in TDP-43 is also located in β-strands within a folded globular RRM domain, and it was recently shown to be a non-binder of CRM1; rather, TDP-43 is exported by passive diffusion³⁰. For six segments, there is no experimentally determined structure, but homology models show β-strands for the segments. For 17 segments, no structural information is available. For two segments, the conformations in the modeled structures (with sequence identities of 79% and 98%, respectively) are helical, reflecting the limitations of secondary structure prediction.
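The β-strand criterion can be expressed as a fraction computed over the central part of a predicted secondary structure string. The Python sketch below is illustrative only. It assumes a three-state string in which 'E' marks β-strand, and it defines the "middle" by trimming a fixed number of flanking residues from each end; the exact definition of the middle part used in this study is not reproduced here, and the function name and the default flank of 3 are hypothetical.

```python
def middle_beta_content(ss, flank=3):
    """Fraction of 'E' (β-strand) states in the middle of a segment.

    Assumption: the middle is the segment with `flank` residues trimmed from
    each end; segments too short to trim are scored over their full length.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    middle = ss[flank:len(ss) - flank] if len(ss) > 2 * flank else ss
    return middle.count("E") / len(middle)


# Toy predictions (hypothetical): a strand-centered and a helix-centered segment.
for ss in ("CCCEEEEEECCC", "HHHHHHHHEECC"):
    content = middle_beta_content(ss)
    print(ss, round(content, 2), "beta" if content > 0.5 else "ok")
```

Scoring only the middle avoids penalizing motifs whose termini run into a neighboring strand while the central turn, which is the part expected to be helical, remains strand-free.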

Evaluation of the stability of the NES peptides at the CRM1 binding groove based on structure modeling

Recent structural work on CRM1 complexed with various cargo sequences has expanded the possible consensus patterns^13,14. Also, the NES-binding site in RanGTP-bound CRM1 is found to be quite rigid, and peptides display CRM1-dependent NES activity only if their backbone conformations can place a sufficient number of hydrophobic residues into the binding groove of CRM1¹¹. The conformation that a peptide adopts can be analyzed efficiently by structure-based modeling methods, so applying structural information can make NES prediction more accurate.

Using the reported NES peptides with experimental binding affinities^14,31 as a benchmarking set (Table 1), we evaluated the binding energy (E_bind) for a given peptide sequence at the CRM1 groove (see Methods for details). The binding energy can be regarded as the relative stability of the protein (CRM1)-peptide (NES) complex structure compared to the protein alone and the free peptide. The lower the binding energy, the more likely the peptide segment is to bind CRM1. Multiple crystal structures of CRM1-NES peptide complexes (super PKI and MVM-NS2 for class 1a; FMRP-1b for class 1b; SNUPN for class 1c; FMRP and SMAD4 for class 2; HIV-Rev for class2-rev type; X11L2 for class 4; and CPEB4 for class 1a-reverse; class 1a templates can be used to fit class 3 NES peptides) were utilized as templates. The model generation and energy calculation process is summarized in Fig. 5A.
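As a reading aid, the definition of the binding energy and the choice among templates can be written out explicitly. The Python sketch below assumes the common form E_bind = E_complex − (E_CRM1 + E_peptide), which is our reading of the description above and not a statement of the exact scoring function (see Methods). The function names and all numerical values are hypothetical.

```python
def binding_energy(e_complex, e_receptor, e_peptide):
    """E_bind as complex energy minus the energies of the separated partners.

    Assumption: all three energies come from the same scoring function.
    """
    return e_complex - (e_receptor + e_peptide)


def best_template(e_bind_by_template):
    """Return (template, E_bind) for the lowest, i.e. most favorable, E_bind."""
    if not e_bind_by_template:
        raise ValueError("no templates scored")
    return min(e_bind_by_template.items(), key=lambda item: item[1])


# Hypothetical scores for one peptide modeled on three class templates.
scores = {"1a": -25.0, "2": -31.5, "1c": -18.0}
print(best_template(scores))
print(binding_energy(-1050.0, -1000.0, -20.0))
```

Taking the minimum over templates is what allows a single segment that matches several consensus classes to be assigned to the class in which it is modeled as most stable.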

Table 1.

Peptide sequences of the validated NES motifs or false positives used in the structure-based modeling.

Protein	Class	NES sequence	K_D (nM)^§	ref.
MVM NS2	1a	₇₇STVDEMTKKFGTLTIHD₉₃	2	³¹
^*super PKI	1a	₃₄NLNELALKLAGLDINK₄₉	4	³¹
PKI	1a	₃₄NSNELALKLAGLDINK₄₉	34	³¹
ADAR1	1a	₁₂₁RGVDCLSSHFQELSIYQ₁₃₇	69	³¹
MEK1	1a	₂₈TNLEALQKKLEELELDE₄₄	70	³¹
Pax	1a	₂₆₄RELDELMASLSDFKFMA₂₈₀	700	³¹
^*CPEB4-R	1a	₃₉₅RMIDILSSELSHMDFTR₃₇₉	710	³¹
NPMmutA	1a	₂₇₈MTDQEAIQDLCLAVEEVSLRK₂₉₈	790	³¹
HDAC5	1a	₁₀₈₁EAETVSAMALLSVG₁₀₉₅	1600	³¹
p73	1a	₃₆₄NFEILMKLKESLELMELVP₃₈₂	2000	³¹
^*hRio2-R	1a	₄₀₅GKIEELAQNFETMEFSR₃₈₉	2600	³¹
Stradα	1a	₄₁₃GIFGLVTNLEELEVD₄₂₇	10300	³¹
^*FMRP-1b	1b	YLKEVDQLRALERLQID	3000	¹⁴
SNUPN	1c	₁MEELSQALASSFSVSQDLNS₂₀	12500	³¹
HPV E7	1c	₇₃HVDIRTLEDLLMGTLGIVC₉₁	34000	³¹
HIV Rev	2	₇₃LQLPPLERLTLDC₈₅	1180	³¹
FMRP	2	₄₂₄LKEVDQLRLERLQID₄₃₈	2000	³¹
SMAD4	2	₁₃₄ERVVSPGIDLSGLTLQ₁₄₉	4600	³¹
mDia2	3	₁₁₅₇SVPEVEALLARLRAL₁₁₇₁	1600	³¹
CDC7	3	₄₅₆QDLRKLCERLRGMDSSTP₄₇₃	20000	³¹
X11L2	4	₅₅SSLQELVQQFEALPGDLV₇₂	1500	³¹
CPEB4	1a-R	₃₇₉RTFDMHSLESSLIDIMR₃₉₅	800	³¹
hRio2	1a-R	₃₈₉RSFEMTEFNQALEEIKG₄₀₅	2800	³¹
^*PKImut1 (I47A)	—	₃₄NSNELALKLAGLDANK₄₉	150000	³¹
^*PKImut2 (L42A/L45A)	—	₃₄NSNELALKAAGADINK₄₉	900000	³¹
^†APC	1a	₁₆₃AQLQNLTKRIDSLPL₁₇₄	(−)	³³
^‡Cyclin D1	1a	₂₈₁VDLACTPTDVRDVDI₂₉₅	(−)	—
APRIL	1b	₁₀₆LEPLKKLECLKSLDL₁₂₀	(−)	³³
^‡hTERT	1c	₉₆₅KAGRNMRRKLFGVLRLKC₉₈₂	(−)	—
DcpS	1c	₁₃₆TEKHLQKYLRQDLRL₁₅₀	(−)	³³
Cdk5	2	₁₃₃LINRNGELKLADFGL₁₄₇	(−)	³³
^†FGF1	2	₁₃₈THYGQKAILFLPLPV₁₅₂	(−)	³³
COMMD1	3	₁₇₁ILKTLSEVEESISTL₁₈₅	(−)	²⁰
DEAF1	1a-R	₄₅₂SWLYLEEMVNSLLNTAQQ₄₆₉	(−)	¹³
SGN5	1a-R	₂₂₁YALEVSYFKSSLDRKLL₂₃₈	(−)	¹³
^†COMMD1–2	1a-R	₁₆₄DEVKVNQILKTLSEVEES₁₈₁	(−)	¹³
^†ELF3	1a-R	₁₁₁RLVFGPLGDQLHAQLR₁₂₆	(−)	¹³

^*Engineered or mutated (underscored residues are the ones inserted or mutated).

^†Do not fit the consensus in Fig. 1 (because of a Pro residue or because they lack a bulky residue at Φ3/4 in class 1a-R).

^‡Unpublished data.

^§(−) means no binding determined by pull-down binding assay.

Structure-based prediction of the stability of the CRM1-NES peptide complex. (A) CRM1-NES peptide complex model generation and E_bind calculation procedure. (B) Generated models of the CRM1-NES peptide complex structures with the lowest E_bind. Class 1a peptides are displayed with CRM1 (in white) at the top. The hydrophobic (Φ) residues of these NES peptides (shown as sticks) occupy the corresponding hydrophobic pockets (P0-P4) in CRM1. Peptides of other classes are shown at the bottom with the hydrophobic residues shown as sticks.

The final model structures showed that all classes were predicted well, with their Φ residues bound to the corresponding hydrophobic pockets (Fig. 5B). The calculated E_bind selected the right template for each class, and it can be used to find the most plausible class when multiple consensus patterns are found in one segment. The calculated E_bind values correlated quite well with the experimental K_D values (Fig. 6, left; R²~0.63; Pearson’s r~0.79 with p = 2e-6). However, for the two PKI mutant peptides, which have extremely low binding affinities, the E_bind scores are not clearly distinguishable from those of weak binders such as SNUPN, SMAD4, and HPV-E7. For the PKI double mutant peptide, we found a large cavity at the binding interface with CRM1 (Fig. S3A), but this feature, which is clearly detrimental to binding, is not well reflected in the modeling process or energy calculation. To penalize the interface cavity of the complex structure, the residue solvent accessibility (RSA) of key interface residues (Fig. S3B) is calculated using the NACCESS program³² and treated as another scoring term. The RSA-corrected E_bind score (E_bind^RSA) is obtained as E_bind^RSA = E_bind + w∙RSA, where w is the weight for the RSA term and is optimized to maximize the correlation (Fig. 6, middle). E_bind^RSA gave an improved correlation (Fig. 6, right; R²~0.73; Pearson’s r~0.86 with p = 5e-8).
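The RSA correction and the choice of its weight amount to a one-parameter scan. The Python sketch below shows the idea with a plain Pearson correlation and a grid of candidate weights. It is a sketch under assumptions: the function names are hypothetical, the toy numbers are invented so that the optimum is known in advance, and the actual optimization procedure and data are those described in Methods and Fig. 6.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rsa_corrected(e_bind, rsa, w):
    """E_bind^RSA = E_bind + w * RSA, applied peptide by peptide."""
    return [e + w * r for e, r in zip(e_bind, rsa)]


def best_weight(e_bind, rsa, ln_kd, weights):
    """Weight from `weights` that maximizes R^2 against ln(K_D)."""
    return max(weights,
               key=lambda w: pearson_r(rsa_corrected(e_bind, rsa, w), ln_kd) ** 2)


# Toy data (hypothetical): ln K_D constructed as E_bind + 0.5 * RSA.
e_bind = [-10.0, -8.0, -6.0, -4.0]
rsa = [1.0, 5.0, 2.0, 8.0]
ln_kd = [e + 0.5 * r for e, r in zip(e_bind, rsa)]
print(best_weight(e_bind, rsa, ln_kd, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Because the weight is tuned on the same peptides that are used to report the correlation, the improvement in R² should be read as a fit to this benchmark set.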

Correlation between the binding energies and the experimental K_D values. The binding scores are averaged over five independent runs (<E_bind>^5runs; <E_bind^RSA>^5runs for the RSA-corrected values) and compared to the natural logarithm of the K_D values (lnK_D). The CRM1 binders with K_D values are shown as filled markers with error bars indicating the standard deviation over the five runs. The false positives are shown as orange empty markers. In the middle, the relationship between R² and the weight for RSA during the E_bind correction is shown. A weight of 0.35 was applied for the calculation of E_bind^RSA.

For comparison, several false positive sequences that fit the NES consensus but are experimentally validated as non-binders (determined by pull-down binding assay)^13,33 were subjected to modeling with the same procedure. Interestingly, these false positives showed significantly higher E_bind scores, reflecting their low binding affinities at the CRM1 binding groove. Notably, peptides such as COMMD1 (₁₆₄DEVKVNQILKTLSEVEES₁₈₁) and ELF3 (₁₁₁RLVFGPLGDQLHAQLR₁₂₆) were not fitted to the right template (i.e., the lowest E_bind complex is not the class 1a-R structure). This suggests that these sequences could be energetically unstable when their backbone conformations are arranged to place their hydrophobic residues into the CRM1 hydrophobic pockets. For the false positive peptides fitted to the right template (Fig. 7), the backbone conformation and the Φ residues may appear quite similar to those of the true positives; however, they showed inferior binding energies. In some cases, such as Cyclin D1 (Fig. 7A, middle) or FGF1 (Fig. 7C, right), the backbone conformation does not seem to be well maintained when the Φ side chains are presented into the pockets.

Comparison of the structural models and binding energies of the CRM1-binding NES motifs (blue) and false positive sequences (orange). CRM1 structure is colored in white. The hydrophobic residues are colored in red in the sequences and displayed as sticks in the structures. The spacer residues are represented as lines.

We expect that the merit of this structure-based, energy-based method is its ability to discriminate true positives from false positives with similar sequence patterns by analyzing energetic differences at the CRM1 binding site via full-atom modeling. This atomic-level energetic analysis cannot be deduced from the sequence alone. In this respect, our method suggests a novel approach to finding CRM1-binding NES motifs. We cannot ignore the fact that the interaction between CRM1 and a whole cargo protein can involve more than the CRM1-NES peptide interaction¹⁰; however, it is extremely difficult to account for extra contacts between CRM1 and the whole structure of a cargo, which may differ for each cargo. Based on our previous result showing that the strength of the CRM1-NES peptide interaction correlates with nuclear export activity³¹, we assume that predicting the energy between CRM1 and the NES peptide is a practical strategy.

To evaluate the performance, we compared our results to those of sequence-based methods, i.e., NetNES⁸, NESmapper¹⁹, and LocNES²⁰ (Figs S4–S20). Using the whole sequences of 17 proteins in Table 1, we extracted 19 positive cases (regions annotated as NES motifs in the NESdb or validNES database with mutational evidence) and 341 negative cases (non-NES regions with consensus pattern-matching). As shown in Table S3, the E_bind score performs the same as LocNES in terms of recall (both predict 17 true positives out of 19 experimentally verified NES cases). On the other hand, E_bind outperforms LocNES in terms of specificity and false positive rate: E_bind produced 23 false positives, while LocNES produced nearly twice as many (40 cases). NetNES showed better specificity (true negative rate (TNR): 0.988) than our method (TNR: 0.933). However, its recall (sensitivity or true positive rate (TPR): 0.474) was much lower than that of our method (TPR: 0.895). Our method thus compares well with these available methods. It effectively decreases false positives while maintaining a high recall rate, showing the best performance with respect to the balance of precision and recall (F₁ score) and effectiveness (diagnostic odds ratio, DOR).
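The quoted rates follow directly from the confusion counts given above. The Python sketch below recomputes them with the standard definitions, using the counts stated in the text for E_bind (19 positives, 341 negatives, 17 true positives, 23 false positives) and for LocNES (17 true positives, 40 false positives). The function name is hypothetical, and the F₁ and DOR values it prints are derived from these counts rather than copied from Table S3.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from the four cells of a confusion matrix."""
    tpr = tp / (tp + fn)                      # recall / sensitivity
    tnr = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    dor = (tp * tn) / (fp * fn)               # diagnostic odds ratio
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1, "dor": dor}


POSITIVES, NEGATIVES = 19, 341  # counts stated in the text

for name, tp, fp in (("E_bind", 17, 23), ("LocNES", 17, 40)):
    m = binary_metrics(tp=tp, fn=POSITIVES - tp, fp=fp, tn=NEGATIVES - fp)
    print(name, {key: round(value, 3) for key, value in m.items()})
```

For E_bind this reproduces the TPR of 0.895 and the TNR of 0.933 quoted above. With equal recall, the difference between E_bind and LocNES in F₁ and DOR comes entirely from the number of false positives.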

Possibility of non-binders to CRM1 among the NES-annotated regions

Databases such as validNES¹⁵ and NESdb¹⁶ provide valuable information for NES research; however, defining CRM1-dependent NES regions is still a difficult task. The expanding set of NES patterns results in many false positives. Also, the lack of information showing direct CRM1 binding for many annotated NES regions prevents the development of accurate predictors from the available data sets. Most published experimental studies focused on showing that a protein is an export cargo, either by deleting the whole region encompassing a candidate NES or by mutating all the suspected hydrophobic residue positions. These perturbations are drastic: they may affect structural stability and cause defects in functions other than CRM1 binding and nuclear export. Therefore, one should interpret the experimental data carefully when identifying the CRM1-binding NES location, and it is always possible that regions annotated as experimentally validated are not in fact functional NES motifs. Indeed, some of the annotated NES regions were found in buried (highly ordered) protein domains (Fig. 4A,C). Others can form β-strands in the middle of the segment (Fig. 4B,C), which would be rare in real NES sequences. Candidate segments that form β-strands and are located in an ordered region are observed in three cargoes: FAK (₉₁RSEEVHWLHVDMGVSS₁₀₆), MoKA (₁₉₀KIQTLHLVGVNVPE₂₀₃), and Sirt1 (₄₂₃DEVDLLIVIGSSLKVRP₄₃₉). We suggest that these segments are highly likely to be non-binders to CRM1 unless they unfold or change their conformations under specific conditions. Some cargo proteins might instead be exported following other events, such as binding to an NES-containing adaptor protein.

Even if a segment fits the NES consensus and also satisfies the location criteria, these criteria are still not enough to locate the real NES segments in the whole protein sequence (see the yellow highlighted segments in the online table). We therefore applied the E_bind calculation to all possible segments of the natural cargo proteins listed in Table 1. If a segment cannot form an energetically stable complex at CRM1’s NES binding groove, it is likely a non-binder to CRM1. As shown in Fig. 8, the NES candidates tend to have lower E_bind scores than the other, false positive segments. Among the seventeen cases, eleven have the NES candidate motif with the lowest E_bind, and four have the NES region with the second lowest E_bind, where the difference between the lowest and second lowest scores is usually marginal (less than 2). Although the data set used in the structure-based modeling is quite small, the resulting binding energy values can discriminate between CRM1 binders and false positives. This structure-based prediction method can be used as one of the features for finding real CRM1-dependent NES peptides in a pool of numerous false positive sequences.

Figure 8. Distinguishing CRM1-binding NES motifs from false positives by E_bind. Locations of NES consensus patterns and their binding energies in (A) Snurportin-1 (O95149), (B) MEK1 (Q05116) and (C) FMRP (Q06787). The description of the plots is the same as in Fig. 2. The calculated E_bind scores for the important segments (pattern-matching segments that are not located in a highly ordered region and do not have a β-strand conformation in the middle; highlighted in yellow) are displayed together. The E_bind scores of the candidate NES motifs are underlined and marked in red. The classes of the consensus patterns are given in parentheses.

Conclusion

In summary, we analyzed the structural prerequisites for CRM1-dependent NES motifs, i.e., accessibility (by locating disordered/ordered regions), adapting conformation (by predicting secondary structures), and stability at the binding site (by applying structure-based modeling to calculate binding energies). The comprehensive table, which includes all the possible consensus patterns together with the disorder propensity plot, conserved domain information, and predicted secondary structures, provides valuable information for determining or correcting the most probable NES regions.

Building on the currently resolved crystal structures of CRM1 bound to NES peptides of diverse classes, we modeled CRM1-NES peptide complex structures and calculated the stability of the NES peptides at the CRM1 binding groove. The resulting binding energies correlate well with the experimental binding affinities, and they allow us to distinguish real NES motifs from false positives even though both match NES consensus patterns. Moreover, we do not rely on the pattern of the input sequence; rather, we use the energy function to select the most energetically favorable class template. Therefore, if multiple patterns exist in one peptide segment, this energy calculation can serve as a tool to predict the peptide’s conformation when it binds to CRM1. Although the method can still be improved, this study provides a starting point for predicting NES motifs by combining sequence-based and structure-based approaches. Because our method relies on template-based modeling, it is difficult to adequately model NES motifs of classes other than those of the templates. Since newly discovered NES motifs often deviate from the established consensus patterns, more structural information is needed, not only to understand new consensus patterns and the NES-CRM1 binding mechanism but also to predict NES motifs more accurately.

Methods

Extraction of the NES consensus sequences

For the cargo proteins whose CRM1 dependency is supported by LMB-sensitivity data annotated in NESdb¹⁶ and validNES¹⁵, the NES consensus-matching sequence segments were extracted using a modified version of the Kosugi consensus^16,20 (Fig. 2): Φ1-X₁₋₃-Φ2-[^PW]₂-Φ3-[^PW]-Φ4; Φ1-X₂₋₃-Φ2-[^PW]₃-Φ3-[^PW]-Φ4; or Φ1-X₂-Φ2-X[^PW]₂-Φ3-[^PW]₂-Φ4 (subscripts give the number of residues, e.g., X₁₋₃ is one to three residues; [^PW] is any of the 20 amino acids except Pro and Trp; Ala or Thr can be used only once at Φ1 or Φ2; X stands for any amino acid). If one segment, or several segments in a similar region (difference between the two segments’ starting residue numbers <5), can be fitted to multiple patterns, all the possible patterns are recorded but prioritized based on the following observations: (i) the class 1a pattern is the most frequently observed class in the validated NES sets, suggesting that it interacts more preferentially with CRM1 than other classes^9,16,22; (ii) in the current NES databases, class 3 sequences are as prevalent as NES motifs of classes 1c and 2¹³; (iii) classes 1b and 1d can be found in only a few NES sequences, and the majority of the class 1d sequences overlap with the class 1a pattern in the validated NES sets^9,13; and (iv) the reverse (−) orientations of classes 3 and 4 appear to lack β-strands to hydrogen bond with the Lys residue and may not be ideal NES motifs¹⁴. This empirical class priority is defined as follows: (i) class 1a with five Φs (c1a-5) as priority 1; (ii) class 1a with four Φs (c1a-4) and classes 1a-R, 2, 3, and 4 as priority 2; (iii) classes 1a/1c with Thr or Ala in one of their Φ1 or Φ2 positions as priority 3; (iv) classes 1b, 1d, 1c-reverse, and classes 2/3 with Thr or Ala in one of their Φ1 or Φ2 positions as priority 4; and (v) classes 1b/1d with Thr or Ala in one of their Φ1 or Φ2 positions as priority 5. The extracted regions span from one residue before Φ0 to two residues after Φ4 (or are shorter if located at the protein C- or N-terminus).
If the Φ2-Φ4 portion of the extracted region overlaps with experimental evidence (annotated as “mutations that affect nuclear export,” “mutations that affect CRM1 binding,” or “functional export signal” in NESdb, or annotated as “sites” in validNES), it is considered a candidate NES. If not, it is deemed a false positive.
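The three consensus templates above can be scanned with ordinary regular expressions. The following is a minimal sketch, not the authors’ implementation: the hydrophobic set used for Φ (`[LIVFM]`), the labels `P1` to `P3`, the function names, and the test string are assumptions for illustration, and the rule that allows a single Ala or Thr at Φ1 or Φ2 is omitted, as are the class priorities.

```python
import re

PHI = "[LIVFM]"    # ASSUMED hydrophobic set for Φ; the exact set is not given in this section
NOT_PW = "[^PW]"   # any residue except Pro and Trp
ANY = "."          # X: any residue


def build_patterns():
    """Expand each template into fixed-length regexes, one per spacer length."""
    patterns = []
    # Template 1: Φ1-X(1..3)-Φ2-[^PW]2-Φ3-[^PW]-Φ4
    for n in (1, 2, 3):
        patterns.append((f"P1:X{n}", f"{PHI}{ANY * n}{PHI}{NOT_PW * 2}{PHI}{NOT_PW}{PHI}"))
    # Template 2: Φ1-X(2..3)-Φ2-[^PW]3-Φ3-[^PW]-Φ4
    for n in (2, 3):
        patterns.append((f"P2:X{n}", f"{PHI}{ANY * n}{PHI}{NOT_PW * 3}{PHI}{NOT_PW}{PHI}"))
    # Template 3: Φ1-X2-Φ2-X[^PW]2-Φ3-[^PW]2-Φ4
    patterns.append(("P3:X2", f"{PHI}{ANY * 2}{PHI}{ANY}{NOT_PW * 2}{PHI}{NOT_PW * 2}{PHI}"))
    return [(name, re.compile(rx)) for name, rx in patterns]


PATTERNS = build_patterns()


def find_consensus(seq):
    """Return (1-based start, pattern label, matched segment) for every match.

    Every start position is tested against every pattern, so overlapping
    matches and multiple patterns at the same position are all reported.
    """
    hits = []
    for i in range(len(seq)):
        for name, rx in PATTERNS:
            m = rx.match(seq, i)
            if m:
                hits.append((i + 1, name, m.group()))
    return hits


# Illustrative leucine-rich test string (hypothetical input, not taken from the paper's tables)
for hit in find_consensus("NSNELALKLAGLDINK"):
    print(hit)
```

Expanding the variable spacers into separate fixed-length expressions keeps every alternative visible; a single expression with a `{1,3}` quantifier would report only one spacer length per start position.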

Calculation of disorder propensity and definition of ordered regions

The disorder propensity of the cargo protein sequences is calculated using three different programs: DISOPRED3²⁵, SPOT-disorder²⁶, and IUPred2A²⁷. For the DISOPRED3 and SPOT-disorder calculations, which are based on multiple sequence alignments, the uniref90_2015_01³⁴ database is used to find homologs during the PSI-BLAST search³⁵. To define ordered regions with high confidence, we applied a strict cutoff value (~0.1) for the order/disorder border (note that the default cutoff for disordered regions in these three programs is ~0.5). If a residue’s disorder propensities predicted by both DISOPRED3 and SPOT-disorder are below 0.1, the residue is defined as ordered (“O”). If not, the residue is recorded as potentially disordered (“D”). The values predicted by IUPred2A are also recorded for reference. The location of a sequence segment is determined by scanning the proportion of “D” or “O” in the segment and its flanking residues (20 residues on each side) (Fig. S2A). If the proportion of “O” is more than 90% for the segment and flanking regions, the location of the segment (loc_DISO) is defined as an ordered region (“ORD”). If the proportion of “D” is more than 90%, the location is defined as a disordered region (“DISO”). The other segments are considered to be located in the “boundary” region. Segments in the boundary region can be found at the ends of ordered regions, or they can lie within ordered regions where some portion (>10%) has a higher disorder propensity than the cutoff value.
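The labeling rule can be sketched in a few lines. This is an illustrative reading of the procedure rather than the authors’ code: it takes a window that is more than 90% “D” as disordered (DISO) and one that is more than 90% “O” as ordered (ORD), and the function names, the 0-based inclusive indexing, and the truncation of the flanks at the sequence ends are assumptions.

```python
def residue_states(disopred, spot, cutoff=0.1):
    """Label each residue 'O' only if BOTH predictors fall below the cutoff, else 'D'."""
    return "".join(
        "O" if d < cutoff and s < cutoff else "D"
        for d, s in zip(disopred, spot)
    )


def segment_location(states, start, end, flank=20, threshold=0.9):
    """Classify a segment (0-based, inclusive start/end) as 'ORD', 'DISO' or 'boundary'.

    The window is the segment plus `flank` residues on each side,
    truncated at the sequence termini (an assumption of this sketch).
    """
    lo = max(0, start - flank)
    hi = min(len(states), end + 1 + flank)
    window = states[lo:hi]
    frac_d = window.count("D") / len(window)
    if frac_d > threshold:
        return "DISO"
    if 1.0 - frac_d > threshold:
        return "ORD"
    return "boundary"


# Toy per-residue scores (hypothetical values): ordered first half, disordered second half
disopred = [0.02] * 30 + [0.60] * 30
spot = [0.05] * 30 + [0.40] * 30
states = residue_states(disopred, spot)
print(segment_location(states, 25, 34))   # segment straddling the order/disorder border
```

Requiring both predictors to fall below the strict cutoff makes the “O” label conservative, so a segment is called ordered only when the evidence is strong.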

Extraction of the conserved domain information of the cargo proteins

Conserved domain information for the cargo protein sequences was extracted with the Batch CD-search tool³⁶. Four databases, i.e., CDD (cdd v3.16), NCBI_Curated (cdd_ncbi v3.16), Pfam (oasis_pfam v3.16), and SMART (oasis_smart v3.16), were searched with an expect value threshold of 0.01. The results were retrieved in Concise mode.

Prediction of secondary structure

Secondary structures of the cargo protein sequences are predicted by PSIPRED Version 3.21³⁷. During the PSI-BLAST search³⁵ to find homologs, the uniref90_2015_01³⁴ database is used. In the online table, the confidence level of the prediction is indicated by a color gradient from dark (high confidence) to light (low confidence).

Relative binding energy (E_bind) prediction

Ten crystal structures of CRM1 bound to various NES peptides, including MVM-NS2 (PDB ID: 6CIT³¹), super PKI (unpublished data), FMRP-1b (5UWO¹⁴), SNUPN (3GB8¹²), FMRP (5UWJ¹⁴), SMAD4 (5UWU¹⁴), HIV-Rev (3NBZ¹¹), X11L2 (5UWS¹⁴), and CPEB4 (5DIF¹³), were used as templates. For the CRM1 part, we extracted residues 479 to 655 (scCRM1 numbering) to reduce the computation time. For potential NES peptides, the positions from Φ0−1 to Φ4+2 were modeled (or a shorter segment if the sequence used in the experimental K_D measurement is shorter). A given peptide sequence is fitted to the backbone coordinates of every template structure. Using the Rosetta backrub module³⁸, the backbone conformations of the fitted NES peptide and the surrounding helices in CRM1 are sampled to generate 50 models (50,000 backrub Monte Carlo trials/steps were run for each model). Among them, five complex structures with the lowest energy are selected and then optimized by the Rosetta relax module^39,40, which searches the local conformational space around the starting structure. The relaxation was carried out 50 times for each model (i.e., the total number of models for a given peptide sequence is 10 × 50 = 500 models) with the ‘-use_input_sc -ex1 -ex2’ flags for a more rigorous search. The backrub-modeled backbone conformation was constrained during the relaxation by applying the ‘-constrain_relax_to_start_coords’ flag. Structures of the CRM1 protein itself and the free peptide are also modeled separately with the same process. The all-atom energy function REF15 in Rosetta v.3.9 was used for all calculations.

The binding energy (E_bind) is calculated as E_complex − E_protein − E_peptide. The values for E_complex, E_protein, and E_peptide are the averages of the lowest 10 energy values among the 500 models. For E_peptide, we used the lowest E_peptide among all the different backbone-fitted models. Among the various template-fitted models, the one with the lowest E_bind score is selected. The E_bind scores were corrected with a solvent accessibility term calculated by the NACCESS v.2.1.1 program³², which calculates the atomic accessible surface defined by rolling a probe of a given size around a van der Waals surface. To penalize the cavity at the interface of CRM1 and low-affinity binders (such as the PKI double mutant), the relative solvent accessibility (RSA) values for the hydrophobic residues at the interface (Fig. S3) were extracted and added to the E_bind scores with an optimized weight.
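The energy bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions: the per-template dictionaries of model energies, the function names, and the toy numbers are hypothetical; CRM1-alone energies are assumed to be available for each template; and the solvent accessibility correction is left out because its weight is not given here.

```python
from statistics import mean


def mean_lowest(energies, k=10):
    """Average of the k lowest energies among all models of one species."""
    return mean(sorted(energies)[:k])


def select_template(complex_by_template, protein_by_template, peptide_by_template, k=10):
    """Return (best template, E_bind) with E_bind = E_complex - E_protein - E_peptide.

    E_peptide is the lowest averaged free-peptide energy over all
    backbone-fitted models; the SASA-based correction is NOT applied.
    """
    e_peptide = min(mean_lowest(e, k) for e in peptide_by_template.values())
    scores = {}
    for template, e_complex in complex_by_template.items():
        e_protein = mean_lowest(protein_by_template[template], k)
        scores[template] = mean_lowest(e_complex, k) - e_protein - e_peptide
    best = min(scores, key=scores.get)   # most favorable (lowest) E_bind
    return best, scores[best]


# Toy energies for two hypothetical templates, three models each (k=2 for brevity)
complex_e = {"t1": [-100.0, -98.0, -50.0], "t2": [-90.0, -88.0, -10.0]}
protein_e = {"t1": [-60.0, -58.0, 0.0], "t2": [-60.0, -58.0, 0.0]}
peptide_e = {"t1": [-10.0, -8.0, 5.0], "t2": [-12.0, -10.0, 5.0]}
print(select_template(complex_e, protein_e, peptide_e, k=2))
```

Averaging the lowest-energy models rather than taking the single minimum reduces the sensitivity of E_bind to one outlier model.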

Acknowledgements

This work is funded by the Cancer Prevention Research Institute of Texas (CPRIT) Grants RP170170 (N.V.G. and Y.M.C.) and RP180410 (Y.M.C.), the National Institutes of Health Grant (GM127390 to N.V.G.) and Welch Foundation Grants (I-1532 to Y.M.C. and I-1505 to N.V.G.). The authors acknowledge the Texas Advanced Computing Center (TACC; http://www.tacc.utexas.edu) at The University of Texas at Austin for providing HPC resources.

Author Contributions

N.V.G. conceived the presented idea and designed the research. Y.L. and J.P. developed the theory, performed the simulations, and analyzed the data. J.M.B. and Y.M.C. performed the experimental validation of the binding affinities and provided the structural data. Y.L. wrote the manuscript. Y.L., J.P., J.M.B., Y.M.C. and N.V.G. contributed to the interpretation of the results and revised the manuscript. N.V.G. supervised the entire study.

Data Availability

The datasets generated during and/or analyzed during the current study are included in this published article and available via: http://prodata.swmed.edu/nes_pattern_location/.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information accompanies this paper at 10.1038/s41598-019-43004-0.

References

1. Fornerod M, Ohno M, Yoshida M, Mattaj IW. CRM1 is an export receptor for leucine-rich nuclear export signals. Cell. 1997;90:1051–1060. doi: 10.1016/S0092-8674(00)80371-2.
2. Fukuda M, et al. CRM1 is responsible for intracellular transport mediated by the nuclear export signal. Nature. 1997;390:308–311. doi: 10.1038/36894.
3. Ossareh-Nazari B, Bachelerie F, Dargemont C. Evidence for a role of CRM1 in signal-mediated nuclear protein export. Science. 1997;278:141–144. doi: 10.1126/science.278.5335.141.
4. Dickmanns A, Monecke T, Ficner R. Structural Basis of Targeting the Exportin CRM1 in Cancer. Cells. 2015;4:538–568. doi: 10.3390/cells4030538.
5. Kau TR, Way JC, Silver PA. Nuclear transport and cancer: From mechanism to intervention. Nat Rev Cancer. 2004;4:106–117. doi: 10.1038/nrc1274.
6. Fischer U, Huber J, Boelens WC, Mattaj IW, Luhrmann R. The HIV-1 Rev Activation Domain Is a Nuclear Export Signal That Accesses an Export Pathway Used by Specific Cellular RNAs. Cell. 1995;82:475–483. doi: 10.1016/0092-8674(95)90436-0.
7. Wen W, Meinkoth JL, Tsien RY, Taylor SS. Identification of a Signal for Rapid Export of Proteins from the Nucleus. Cell. 1995;82:463–473. doi: 10.1016/0092-8674(95)90435-2.
8. la Cour T, et al. Analysis and prediction of leucine-rich nuclear export signals. Protein Eng Des Sel. 2004;17:527–536. doi: 10.1093/protein/gzh062.
9. Kosugi S, Hasebe M, Tomita M, Yanagawa H. Nuclear Export Signal Consensus Sequences Defined Using a Localization-Based Yeast Selection System. Traffic. 2008;9:2053–2062. doi: 10.1111/j.1600-0854.2008.00825.x.
10. Monecke T, et al. Crystal Structure of the Nuclear Export Receptor CRM1 in Complex with Snurportin1 and RanGTP. Science. 2009;324:1087–1091. doi: 10.1126/science.1173388.
11. Guttler T, et al. NES consensus redefined by structures of PKI-type and Rev-type nuclear export signals bound to CRM1. Nat Struct Mol Biol. 2010;17:1367–U1229. doi: 10.1038/nsmb.1931.
12. Dong XH, et al. Structural basis for leucine-rich nuclear export signal recognition by CRM1. Nature. 2009;458:1136–U1171. doi: 10.1038/nature07975.
13. Fung HYJ, Fu SC, Brautigam CA, Chook YM. Structural determinants of nuclear export signal orientation in binding to exportin CRM1. eLife. 2015;4:e10034. doi: 10.7554/eLife.10034.
14. Fung HYJ, Fu SC, Chook YM. Nuclear export receptor CRM1 recognizes diverse conformations in nuclear export signals. eLife. 2017;6:e23961. doi: 10.7554/eLife.23961.
15. Fu SC, Huang HC, Horton P, Juan HF. ValidNESs: a database of validated leucine-rich nuclear export signals. Nucleic Acids Res. 2013;41:D338–D343. doi: 10.1093/nar/gks936.
16. Xu DR, Grishin NV, Chook YM. NESdb: a database of NES-containing CRM1 cargoes. Mol Biol Cell. 2012;23:3673–3676. doi: 10.1091/mbc.E12-01-0045.
17. Kirli K, et al. A deep proteomics perspective on CRM1-mediated nuclear export and nucleocytoplasmic partitioning. eLife. 2015;4:e11466. doi: 10.7554/eLife.11466.
18. Fu SC, Imai K, Horton P. Prediction of leucine-rich nuclear export signal containing proteins with NESsential. Nucleic Acids Res. 2011;39:e111. doi: 10.1093/nar/gkr493.
19. Kosugi S, Yanagawa H, Terauchi R, Tabata S. NESmapper: Accurate Prediction of Leucine-Rich Nuclear Export Signals Using Activity-Based Profiles. PLoS Comput Biol. 2014;10:e1003841. doi: 10.1371/journal.pcbi.1003841.
20. Xu DR, et al. LocNES: a computational tool for locating classical NESs in CRM1 cargo proteins. Bioinformatics. 2015;31:1357–1365. doi: 10.1093/bioinformatics/btu826.
21. Prieto G, Fullaondo A, Rodriguez JA. Prediction of nuclear export signals using weighted regular expressions (Wregex). Bioinformatics. 2014;30:1220–1227. doi: 10.1093/bioinformatics/btu016.
22. Liku ME, Legere EA, Moses AM. NoLogo: a new statistical model highlights the diversity and suggests new classes of Crm1-dependent nuclear export signals. BMC Bioinformatics. 2018;19:65. doi: 10.1186/s12859-018-2076-7.
23. Jehl P, Manguy J, Shields DC, Higgins DG, Davey NE. ProViz-a web-based visualization tool to investigate the functional and evolutionary features of protein sequences. Nucleic Acids Res. 2016;44:W11–W15. doi: 10.1093/nar/gkw265.
24. Bienert S, et al. The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res. 2017;45:D313–D319. doi: 10.1093/nar/gkw1132.
25. Jones DT, Cozzetto D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics. 2015;31:857–863. doi: 10.1093/bioinformatics/btu744.
26. Hanson J, Yang YD, Paliwal K, Zhou YQ. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics. 2017;33:685–692. doi: 10.1093/bioinformatics/btw678.
27. Meszaros B, Erdos G, Dosztanyi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018;46:W329–W337. doi: 10.1093/nar/gky384.
28. Kim JY, et al. HDAC1 nuclear export induced by pathological conditions is essential for the onset of axonal damage. Nat Neurosci. 2010;13:180–U163. doi: 10.1038/nn.2471.
29. Bolli N, et al. Born to be exported: COOH-terminal nuclear export signals of different strength ensure cytoplasmic accumulation of nucleophosmin leukemic mutants. Cancer Res. 2007;67:6230–6237. doi: 10.1158/0008-5472.Can-07-0273.
30. Pinarbasi ES, et al. Active nuclear import and passive nuclear export are the primary determinants of TDP-43 localization. Sci Rep. 2018;8:7083. doi: 10.1038/s41598-018-25008-4.
31. Fu SC, Fung HYJ, Cagatay T, Baumhardt J, Chook YM. Correlation of CRM1-NES affinity with nuclear export activity. Mol Biol Cell. 2018;29:2037–2044. doi: 10.1091/mbc.E18-02-0096.
32. ‘NACCESS’, computer program. (Department of Biochemistry and Molecular Biology, University College, London, 1993).
33. Xu DR, Farmer A, Collett G, Grishin NV, Chook YM. Sequence and structural analyses of nuclear export signals in the NESdb database. Mol Biol Cell. 2012;23:3677–3693. doi: 10.1091/mbc.E12-01-0046.
34. Suzek BE, et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31:926–932. doi: 10.1093/bioinformatics/btu739.
35. Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
36. Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004;32:W327–W331. doi: 10.1093/nar/gkh454.
37. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292:195–202. doi: 10.1006/jmbi.1999.3091.
38. Smith CA, Kortemme T. Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J Mol Biol. 2008;380:742–756. doi: 10.1016/j.jmb.2008.05.023.
39. Nivon LG, Moretti R, Baker D. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLoS One. 2013;8:e59004. doi: 10.1371/journal.pone.0059004.
40. Conway P, Tyka MD, DiMaio F, Konerding DE, Baker D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 2014;23:47–55. doi: 10.1002/pro.2389.


