Biophysical Journal. 2016 May 10;110(9):1943–1956. doi: 10.1016/j.bpj.2016.04.009

A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein

Wei Liu 1, Jingfeng Zhang 1, Jing-Song Fan 1, Giancarlo Tria 2, Gerhard Grüber 2, Daiwen Yang 1
PMCID: PMC4939551  PMID: 27166803

Abstract

Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein.

Introduction

Many multidomain proteins undergo domain motions. Such motions result in the coexistence of multiple conformers with different populations, that is, a structure ensemble in solution. The motions often play key roles in protein functions (1, 2, 3, 4) such as catalysis, regulatory activity, and cellular locomotion. Recently, intensive efforts have been made to investigate the less populated conformers and structure ensembles using various experimental methods (5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17). The paramagnetic relaxation enhancement (PRE) technique (18) has proven to be a powerful method for studying conformational ensembles (19, 20, 21). In PRE experiments, the detectable distance (r) between a paramagnetic center located at a spin-label and an observed proton can reach up to ∼35 Å, providing long-distance information for structure determination. In addition, because the PRE is proportional to $r^{-6}$, lowly populated (<10%) conformations can contribute significantly to the experimental observables (22), provided that proper spin-labeling sites are chosen. In terms of structure, however, interpretation of PRE data, which are the weighted averages of physical quantities over all conformers truly existing in a dynamic system, is challenging due to inherent protein dynamics and the extra flexibility of spin-labels.

Currently, a number of methods have been proposed to calculate a structure ensemble from experimental data such as PREs. The strategies utilized in these methods can be classified into direct and indirect strategies (23, 24). The direct strategy is to calculate a structure ensemble from PREs and other possible restraints by simulated annealing (provided by Xplor-NIH) (20, 21, 25). The indirect strategy is to create a pool of structures and then search for an ensemble of candidates from the pool (26, 27, 28, 29, 30, 31). Both strategies optimize the agreement between experimental PREs and backcalculated PREs from the derived ensemble. In general, the agreement improves gradually with the increase of ensemble size (number of structures) and often reaches a plateau at relatively large ensemble sizes. In fact, the exact ensemble size is uncertain and the use of a large number of conformers to represent the ensemble is often inevitable, which may lead to ambiguous interpretations of the structure-function relationship of a multidomain protein. Furthermore, overfitting often occurs when too many variables (or protein structures) are used to interpret a limited number of observables. The overfitting may result in some false structures, which can mislead our understanding of the structure-function relationship.

Here, we present a new (to our knowledge) method, denoted as orthogonal-parameters-based ensemble optimization (OPEO), which aims to use a minimal ensemble size to interpret the PREs of a di-domain protein. In this method, individual domain structures were assumed to be known and rigid; spin-label and protein conformations were defined by three and six orthogonal parameters, respectively. First, the ensemble representation of each spin-label was solved by fitting conformer populations and orthogonal parameters against intradomain PREs. Subsequently, the protein structure ensemble was determined from interdomain PREs. To prevent overfitting caused by excessive protein ensemble members, a new (to our knowledge) criterion was established. As demonstrated on a reference ensemble (24) with a predefined protein conformer number, conformations, and populations, the method and criterion proposed here can produce correct ensembles that agree very well with the input reference ensemble. We also applied the method to determine the structure ensemble of the di-domain of a poly (U) binding protein (Pub1p) in Saccharomyces cerevisiae.

Pub1p is known as an important regulator of cellular mRNA decay (32, 33). It modulates mRNA stability and turnover by blocking different degradation pathways and responds to various stresses such as glucose starvation, heat shock, arsenite, sodium azide, and high ethanol levels (34, 35, 36, 37). Pub1p is a multidomain protein containing two tandem RNA recognition motif (RRM) domains in the N-terminal region, one RRM domain in the C-terminal region, and a conserved methionine- and asparagine-rich domain between RRMs 2 and 3. Previous studies indicate that individual RRM domains bind to poly(U) with similar affinity, but the two tandem RRM domains connected by a 10-residue linker, here denoted as PubRRM12, have significantly higher affinity than the individual domains. The crystal structure of PubRRM12 has been reported in Li et al. (38), indicating that the two RRM domains do not interact. To understand the molecular mechanism of RRM-mediated RNA/DNA recognition, we solved the individual domain structures of PubRRM12 by nuclear magnetic resonance (NMR) and then used the method developed here to determine the structure ensemble. We showed that the tandem di-domain exists in four conformers in solution, and that none of them is in a fully open conformation that can readily interact with single-stranded RNA.

Materials and Methods

Protein expression and purification

Recombinant PubRRM12 (residues E71–K240), RRM1 (residues E71–S154), and RRM2 (residues Q155–K240) were cloned into a pETM vector and transformed into the BL21(DE3) strain. 15N-labeled (15N,13C-labeled) proteins were expressed overnight in M9 minimal medium containing 1 g/L 15N NH4Cl (1 g/L 15N NH4Cl and 2 g/L 13C-labeled glucose) at 20°C. The proteins were purified using Ni-NTA beads followed by cleavage of the His-tag with thrombin and further purification with a gel filtration column Superdex 75 (GE Healthcare Life Sciences, Marlborough, MA). The final buffer used contained 20 mM NaCl and 20 mM sodium phosphate at pH 6.5.

Spin-labeling

The wild-type PubRRM12 contains no cysteine residues. Single-cysteine mutants were prepared by site-directed mutagenesis at residues M107, H123, N148, S190, and N218. 15N-labeled mutants were expressed and purified using the same protocol as for the wild-type PubRRM12. Just before spin-labeling, the purified proteins were treated for 4 h with 5 mM DTT (dithiothreitol) at room temperature. DTT was then removed from the sample using a gel filtration column Superdex 75 (GE Healthcare Life Sciences). The eluted protein was immediately incubated with a 10-fold molar excess of MTSL overnight at 4°C. The free MTSL was removed using a gel filtration column again. The eluted protein was concentrated to ∼0.2 mM in a buffer of 20 mM NaCl and 20 mM sodium phosphate at pH 6.5, and then used for NMR experiments.

NMR spectroscopy and structure determination of individual domains

All NMR experiments were performed on an Avance 800 spectrometer (Bruker, Billerica, MA) equipped with a cryo-probe at 25°C. To determine the structure, two-dimensional (2D) HSQC, three-dimensional HNCA, HNCOCA, MQ-(H)CCH-TOCSY (39), and four-dimensional (4D) NOESY (40, 41) were recorded on a 13C,15N-labeled PubRRM12 sample at a protein concentration of 1.0 mM in a buffer with 90% H2O, 10% D2O, 20 mM NaCl, and 20 mM sodium phosphate (pH 6.5). 15N relaxation rates R1 and R2 and heteronuclear 15N nuclear Overhauser effects (NOEs) were measured on 15N-labeled samples with ∼0.2 mM protein. Generalized order parameter (S2) and localized correlation time (τloc) were extracted on a per-residue basis from R1, R2, and NOE data using a simple method based on the Lipari-Szabo model, as described previously in Yang et al. (42). This method can be used to obtain dynamics parameters for nonspherical proteins. To examine if the two domains interact with each other, 2D 1H-15N HSQC spectra of RRM1, RRM2, and PubRRM12 were acquired on 15N-labeled samples (∼2 mg/mL) in the same buffer. PRE data were obtained from measurements of 1HN transverse relaxation rates (R2) of spin-labeled and unlabeled proteins using a two time-point method with a relaxation delay of 4 ms between the two time points (43).

NMR spectra were processed using NMRPipe (44) and analyzed using SPARKY. Backbone and side-chain resonance assignments were achieved using the 4D NOESY-based strategy described previously in Xu et al. (40). Unambiguous NOEs were obtained from three subspectra: 13C,15N-edited; 13C,13C-edited; and 15N,15N-edited 4D NOESY. Ambiguous NOEs were further assigned during iterated structure calculation and refinement. Distance constraints were obtained from the NOEs assigned, while dihedral angle restraints of φ and ψ were calculated with TALOS+ (45) using the assigned chemical shifts of Cα, Cβ, N, Hα, and HN. One-hundred conformers were calculated with Xplor-NIH (46, 47) using the standard simulated annealing method. Twenty conformers with the lowest target function values were selected for analysis.

Chemical shift perturbation

The combined chemical shift difference of an amide in two samples was calculated by (48):

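$$\Delta\delta = \left[ \left( \Delta\delta_{NH}^{2} + \Delta\delta_{N}^{2}/25 \right) / 2 \right]^{0.5},$$

where $\Delta\delta_{NH}$ ($\Delta\delta_{N}$) is the 1HN (15N) chemical shift difference between PubRRM12 and RRM1 or between PubRRM12 and RRM2.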
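As a minimal sketch of the formula above, assuming the shift differences are supplied in ppm, the combined perturbation and the significance cutoff used in the Figure 4 caption (Δδ > Δδav + SD) could be computed as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption; the text does not state which estimator was used.

```python
import math
import statistics


def combined_shift(d_nh, d_n):
    """Combined amide shift difference: [(d_NH^2 + d_N^2 / 25) / 2]^0.5."""
    return math.sqrt((d_nh ** 2 + d_n ** 2 / 25.0) / 2.0)


def significant_residues(deltas):
    """Return indices whose combined shift exceeds mean + SD.

    Assumption: SD is the population standard deviation over all
    available residues.
    """
    cutoff = statistics.mean(deltas) + statistics.pstdev(deltas)
    return [i for i, d in enumerate(deltas) if d > cutoff]
```

The division of the 15N term by 25 corresponds to scaling the 15N shift difference by 1/5 before combining it with the 1HN shift difference.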

Synthetic PREs

To test performance of the method proposed here, we synthesized intradomain and interdomain PREs based on a reference ensemble method (24) then employed the synthetic PREs to calculate a structural ensemble, and finally compared the calculated and reference ensembles. The reference ensemble of PubRRM12 was generated in two steps by assuming that each domain adopts the lowest energy NMR conformation and is rigid, while the linker is fully flexible. First, a pool of protein structures was created by randomly rotating φ- and ψ-angles of the linker residues without steric clashes between any residues. Second, three structures with significant differences were arbitrarily selected as the structure ensemble members, and the populations of the three conformers were assigned as 0.6, 0.2, and 0.2, respectively. Due to its flexibility, MTSL can adopt multiple conformations. An ensemble of MTSL conformers was generated by random dynamic simulation using Xplor-NIH (46, 47) while fixing the protein backbone. The MTSL ensemble used here consisted of 3, 20, or 100 conformers with equal populations. Five sets of PRE data were synthesized from five MTSL-labeled variants by assuming the labeling sites at respective residues M107, H123, N148, S190, and N218. For each set of the data, the PRE of each amide proton was calculated using Eqs. 2 and 3 based on the generated spin-label conformers and protein structure ensemble. Coordinates of the free electron in MTSL were assumed the same as those of the nitroxide oxygen. The apparent correlation time (τcapp) used in Eq. 2 was set to 6 ns for all the electron-proton vectors in all the spin-labeled variants.

To account for the uncertainty in distances derived from PREs that is caused by PRE measurement errors and unknown protein and spin-label dynamics, random Gaussian noise was added to each synthetic PRE value. The random noise included a constant error of 3 s−1 that accounted for the measurement errors for both spin-labeled and unlabeled samples and a proportional error of 30% of the PRE value. Introduction of the proportional error is based on the fact that the larger the PRE value, the larger its measurement error (in absolute value) and the larger the PRE uncertainty caused by the unknown dynamics. Eighty percent of the PREs were randomly selected for ensemble calculations, and the remaining 20% were used for cross validation.
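The noise model and the working/free split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the constant and proportional errors are drawn as two independent zero-mean Gaussian terms and added, which is one plausible reading of the text, and the function names are hypothetical.

```python
import random


def add_noise(pre_values, const_err=3.0, prop_err=0.30, rng=None):
    """Add Gaussian noise to synthetic PREs (values in s^-1).

    Assumption: the 3 s^-1 constant error and the 30% proportional error
    are independent Gaussian terms that are simply summed.
    """
    rng = rng or random.Random()
    return [g + rng.gauss(0.0, const_err) + rng.gauss(0.0, prop_err * g)
            for g in pre_values]


def split_work_free(n_items, free_fraction=0.2, rng=None):
    """Randomly split indices into a working set and a cross-validation set."""
    rng = rng or random.Random()
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_free = round(n_items * free_fraction)
    return sorted(indices[n_free:]), sorted(indices[:n_free])
```

Passing a seeded `random.Random` instance makes a given noise realization reproducible, which is convenient when comparing ensembles calculated from several noise sets.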

Small angle x-ray scattering data collection and analysis

Small angle x-ray scattering (SAXS) data were measured with a NanoStar SAXS instrument (Bruker) equipped with a Metal-Jet x-ray source (Excillum, Karlsruhe, Germany) and VÅNTEC 2000-detector system (Bruker) as described previously in Tay et al. (49). SAXS experiments were carried out at 15°C in a series of protein concentrations ranging from 0.5 to 2.0 mg/mL, for a sample volume of 40 μL. The data were collected with six frames at 5 min intervals, and no radiation damage was detected by comparing these frames. The scattering of the buffer was subtracted from the scattering of the sample, and all the scattering data were normalized by the concentration as well as the incoming intensity.

All the data were processed using the program package PRIMUS (50). Quantitative assessment of the protein flexibility was done using the ensemble optimization method (EOM) 2.0 (51). In EOM 2.0, a pool of 10,000 independent models is created first by randomly varying the domain linker conformations. Afterwards, a genetic algorithm (GA) is used to select ensembles with varying numbers of conformers by calculating the average theoretical profile and fitting it to the experimental SAXS data. For each PubRRM12, the GA was repeated 100 independent times and the ensemble with the lowest discrepancy value (χ2) was reported as the best solution out of 100 final ensembles.

Computational strategy

To simplify the complexity caused by spin-label flexibility and protein domain dynamics, a two-step calculation method (25) with a rigid body model is implemented. In all calculations, individual protein domains are assumed to be rigid; the dynamics of a spin-label is represented by a limited number of conformers (m); and the spin-label in each conformer is considered as a single point whose three orthogonal parameters or coordinates (xk, yk, zk) represent the location of the free electron in the label. In the first step, for the ith spin-label with a given ensemble size m, the values xk, yk, and zk, and population (pk) (k = 1, 2, …, m; ∑pk = 1) are determined from its intradomain PREs by minimizing the Q factor using the conventional GA (52). The optimal ensemble size is determined from the dependence of the Q factor on m. The Q factor for the ith spin-labeled variant is given by:

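$$Q_i = \sqrt{\frac{\sum_j \{\Gamma_2^{\mathrm{obs}}(j) - \Gamma_2^{\mathrm{cal}}(j)\}^2}{\sum_j \{\Gamma_2^{\mathrm{obs}}(j)\}^2}}, \tag{1}$$

where $\Gamma_2^{\mathrm{obs}}(j)$ ($\Gamma_2^{\mathrm{cal}}(j)$) is the experimental/synthetic (calculated) intradomain PRE of nuclear spin j; the sum extends over the spins located in the same domain as the spin-label anchoring residue; and $\Gamma_2^{\mathrm{cal}}$ is given by:

$$\Gamma_2^{\mathrm{cal}}(j) = K \langle r_j^{-6} \rangle \left[ 4\tau_c^{\mathrm{app}} + \frac{3\tau_c^{\mathrm{app}}}{1 + (\omega_H \tau_c^{\mathrm{app}})^2} \right], \tag{2}$$

where $K = 1.23 \times 10^{-44}$ m$^6$/s$^2$ and $\omega_H$ is the proton Larmor frequency. The value $\langle r_j^{-6} \rangle$ is an ensemble average and given by:

$$\langle r_j^{-6} \rangle = \sum_{k=1}^{m} p_k r_{jk}^{-6}, \tag{3}$$

where $r_{jk} = [(x_j - x_k)^2 + (y_j - y_k)^2 + (z_j - z_k)^2]^{0.5}$, which is the distance between spin j and the free electron in the kth conformer; and $x_j$, $y_j$, and $z_j$ are the coordinates of spin j. In Eq. 2, $\tau_c^{\mathrm{app}}$ is the apparent correlation time and given by $\tau_c S^2$, where $\tau_c$ and $S^2$ are the correlation time and generalized order parameter of an electron-proton vector, respectively. For MTSL, whose electron relaxation time is much longer than the protein overall rotational time ($\tau_r$) (21), $\tau_c = \tau_r$. The value $\tau_c^{\mathrm{app}}$ is assumed identical for all the electron-proton vectors.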

For the spins with Γ2obs values >80 s−1 or with undetectable HSQC peaks after spin-labeling, their Γ2obs values are considered as 80 s−1 in the Q-factor calculation. Similarly, when Γ2cal > 80 s−1, Γ2cal is treated as 80 s−1.
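Eqs. 1–3 and the 80 s−1 cap can be put together in a short sketch. It assumes coordinates in ångström (converted to meters to match the units of K), an 800 MHz proton frequency as the default for ωH (the spectrometer named in the NMR section), and the root-normalized form of the Q factor; the function names are hypothetical.

```python
import math

K = 1.23e-44                          # m^6 s^-2, constant in Eq. 2
OMEGA_H_800 = 2 * math.pi * 800e6     # rad/s, assumed 800 MHz proton frequency
PRE_CAP = 80.0                        # s^-1, ceiling applied to obs and cal PREs


def pre_calc(spin_xyz, label_xyz, populations, tau_app=6e-9,
             omega_h=OMEGA_H_800):
    """Back-calculate a PRE (s^-1) from Eqs. 2 and 3.

    spin_xyz: (x, y, z) of the proton in angstrom.
    label_xyz: list of (x, y, z) electron positions, one per conformer.
    populations: conformer populations, summing to 1.
    """
    r6_avg = 0.0
    for xyz, p in zip(label_xyz, populations):
        r_m = math.dist(spin_xyz, xyz) * 1e-10   # angstrom -> meter
        r6_avg += p * r_m ** -6                  # Eq. 3
    spectral = 4 * tau_app + 3 * tau_app / (1 + (omega_h * tau_app) ** 2)
    return K * r6_avg * spectral                 # Eq. 2


def q_factor(obs, cal, cap=PRE_CAP):
    """Q factor of Eq. 1 with both data sets capped at `cap`.

    An undetectable peak can be passed as float('inf'); it is capped too.
    """
    obs_c = [min(o, cap) for o in obs]
    cal_c = [min(c, cap) for c in cal]
    num = sum((o - c) ** 2 for o, c in zip(obs_c, cal_c))
    den = sum(o ** 2 for o in obs_c)
    return math.sqrt(num / den)
```

With these defaults, a single label conformer 20 Å from a proton gives a PRE of roughly 4.6 s−1, which illustrates how steeply the $r^{-6}$ dependence drops off toward the ∼35 Å detection limit mentioned in the Introduction.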

For each spin-labeled variant, the τcapp together with spin-label positions is estimated initially by minimizing the Q factor defined by Eq. 1 (53). Normally, the τcapp values for different variants do not vary significantly. So a uniform τcapp is assumed for all the variants and its value is set as the average of the estimated τcapp values for all the variants. Subsequently, the ensemble of each spin-label is determined using the uniform τcapp.

After obtaining the spin-label ensembles, we determine the number of protein conformers (n), relative separation and orientation of two domains, and population (P) of each conformer in the second step. The separation of the two domains is defined by three spherical coordinates (rd, θ, and φ) where rd is the distance between the two domain centers, and θ and φ are the polar and azimuthal angles, respectively, while the relative orientation is expressed by three Euler angles (α, β, and γ). Interdomain PREs of spins located in a domain different from the spin-label anchoring domain are used to calculate n, rdj, θj, φj, αj, βj, γj, and Pj, where j = 1, 2, …, n. Several sets of PRE data obtained from different spin-labeled variants are often used for protein structure ensemble determination. So, an overall Q factor (Qall) is used for minimization:
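To make the six-parameter description concrete, the sketch below places a rigid domain 2 relative to domain 1 from (rd, θ, φ, α, β, γ). Several choices here are assumptions not specified in the text: domain 1 is centered at the origin, the input coordinates of domain 2 are centered on its own center, angles are in radians, and the Euler angles follow a ZYZ convention. The function names are hypothetical.

```python
import math


def euler_zyz(alpha, beta, gamma):
    """Rotation matrix R = Rz(alpha) * Ry(beta) * Rz(gamma) (assumed ZYZ)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]


def place_domain(coords, rd, theta, phi, alpha, beta, gamma):
    """Rotate centered domain-2 coordinates, then move the domain center
    to spherical position (rd, theta, phi) relative to domain 1."""
    rot = euler_zyz(alpha, beta, gamma)
    shift = (rd * math.sin(theta) * math.cos(phi),
             rd * math.sin(theta) * math.sin(phi),
             rd * math.cos(theta))
    placed = []
    for p in coords:
        rotated = [sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)]
        placed.append(tuple(r + s for r, s in zip(rotated, shift)))
    return placed
```

Because the three spherical coordinates set only the position of the domain center and the three Euler angles set only its orientation, each conformer is fully described by six independent numbers plus its population.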

$$Q_{\mathrm{all}} = \sqrt{\frac{\sum_i k_i Q_i^2}{\sum_i k_i}}, \tag{4}$$

where Qi is the Q factor for the ith spin-labeled variant, and ki is the number of PREs used in the calculation of Qi. Spins with Γ2obs values <10 s−1 are not counted in the calculation of ki but included in the calculation of Qi because they are more error-prone.
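A small sketch of the weighting in Eq. 4, assuming the root-mean-square form in which each variant's squared Q factor is weighted by its PRE count. The helper that counts ki follows the rule stated above (PREs below 10 s−1 are not counted); both function names are hypothetical.

```python
import math


def count_k(obs_pres, threshold=10.0):
    """Number of PREs counted toward k_i: those at or above the threshold."""
    return sum(1 for g in obs_pres if g >= threshold)


def q_all(q_values, k_values):
    """Overall Q factor: sqrt(sum(k_i * Q_i^2) / sum(k_i))."""
    weighted = sum(k * q * q for q, k in zip(q_values, k_values))
    return math.sqrt(weighted / sum(k_values))
```

When all variants have the same Q factor, the overall value equals that Q factor regardless of the weights, which is a quick sanity check on the normalization.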

Besides the agreement between experimental/synthetic and calculated Γ2 values, the spatial domain-domain conflict (Pclash) and the restraint imposed by the linker between the two domains (Plinker) need to be considered in structure calculations. The final Q factor used for a candidate ensemble is given by:

Q=Qall+Pclash+Plinker. (5)

The dummy residue method is used to evaluate the domain-domain conflict (54). The value Pclash = 100 if the distance between any two Cα atoms in two different domains is smaller than a preset distance limit (6 Å in this study), otherwise Pclash = 0. The value Plinker = 100 if the distance between the Cα atoms in the C-terminus of domain 1 and N-terminus of domain 2 is larger than a limit (23 Å in this study), otherwise Plinker = 0. To cross validate the calculated protein structure ensemble, a small portion of interdomain PREs (10–20%), which is randomly chosen, is not used directly in the optimization. The Q factor calculated from this portion of PREs is denoted as Qfree.
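The two penalty terms and Eq. 5 can be sketched as below, using the 6 Å and 23 Å limits quoted above. Representing each domain as a list of Cα coordinates in ångström, and the function names, are assumptions for illustration; for an ensemble, the penalties would presumably be evaluated per conformer.

```python
import math


def clash_penalty(ca_domain1, ca_domain2, limit=6.0):
    """100 if any interdomain C-alpha pair is closer than `limit`, else 0."""
    for a in ca_domain1:
        for b in ca_domain2:
            if math.dist(a, b) < limit:
                return 100.0
    return 0.0


def linker_penalty(ca_cterm_domain1, ca_nterm_domain2, limit=23.0):
    """100 if the linker end-to-end C-alpha distance exceeds `limit`, else 0."""
    return 100.0 if math.dist(ca_cterm_domain1, ca_nterm_domain2) > limit else 0.0


def final_q(q_all, ca_domain1, ca_domain2, ca_cterm_domain1, ca_nterm_domain2):
    """Eq. 5: Q = Q_all + P_clash + P_linker for one conformer."""
    return (q_all
            + clash_penalty(ca_domain1, ca_domain2)
            + linker_penalty(ca_cterm_domain1, ca_nterm_domain2))
```

Because Q factors of acceptable fits are well below 1, a penalty of 100 effectively removes any candidate with a steric clash or an overstretched linker from contention.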

For a given protein ensemble size n, the values rdj, θj, φj, αj, βj, γj, and Pj (j = 1, 2, …, n, ∑Pj = 1) can be optimized by the GA. However, the traditional GA becomes very time-consuming when n is large. To reduce computation time, we developed a progressive narrowing GA protocol (PNGA) (Fig. 1). In the first round of optimization, all the variables are allowed to vary in the entire possible ranges. The Q factor obtained from the optimized ensemble is recorded as the best Q factor (Qbest), and all parameters corresponding to this ensemble are recorded as the best variable values. In the next round of optimization, the range of each variable is reduced by a small fraction (Fred, which is an adjustable parameter and in a range of 1–5%) and the center of the variable range is at the best value obtained in the previous round. If the Q factor obtained from this round of optimization is smaller than Qbest and passes cross validation (Qfree< 1.4Q), the new result will become Qbest; the best variable values will be changed accordingly, the centers of all variable ranges will be reset, and all variable ranges will be reduced further by Fred for the next round of optimization. Otherwise, all parameters will remain unchanged, but the ranges will be reduced by another Fred for the next round of optimization. This optimization process is repeated until the Q-factor value converges. The entire calculation process can be repeated several times to avoid local minimum traps. Because the structures are calculated through orthogonal-parameter-based ensemble optimization, our computational strategy is denoted as OPEO.
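The progressive narrowing loop described above can be sketched as follows. This is not the authors' implementation: the inner genetic algorithm is replaced by plain uniform random sampling inside the current window, a fixed number of rounds stands in for the convergence test, and the names `pnga_sketch`, `objective`, and `q_free` are hypothetical. The acceptance rule (lower Q and Qfree < 1.4Q) and the shrinking of every range by Fred each round follow the text.

```python
import random


def pnga_sketch(objective, bounds, rounds=80, samples=200, f_red=0.05,
                q_free=None, seed=0):
    """Progressive narrowing search over box-bounded variables.

    objective: callable mapping a list of variables to Q.
    bounds: list of (low, high) pairs giving the full allowed ranges.
    q_free: optional callable giving the cross-validation Q for a candidate.
    """
    rng = random.Random(seed)
    centers = [(lo + hi) / 2 for lo, hi in bounds]
    widths = [hi - lo for lo, hi in bounds]      # round 0 spans the full range
    best_x, best_q = None, float("inf")
    for round_idx in range(rounds):
        # Inner search (stand-in for one GA run): best of `samples` draws.
        cand_x, cand_q = None, float("inf")
        for _ in range(samples):
            x = [min(hi, max(lo, rng.uniform(c - w / 2, c + w / 2)))
                 for c, w, (lo, hi) in zip(centers, widths, bounds)]
            q = objective(x)
            if q < cand_q:
                cand_x, cand_q = x, q
        passes = q_free is None or q_free(cand_x) < 1.4 * cand_q
        # The first round always sets the reference; later rounds must
        # improve Q and pass cross validation to recenter the ranges.
        if round_idx == 0 or (cand_q < best_q and passes):
            best_x, best_q = cand_x, cand_q
            centers = list(best_x)
        # Ranges shrink by f_red every round, accepted or not.
        widths = [w * (1 - f_red) for w in widths]
    return best_x, best_q
```

Shrinking the ranges even after a rejected round is what distinguishes this scheme from simply restarting the optimizer: the search effort is concentrated progressively around the best solution found so far, at the cost of a higher risk of local-minimum trapping, which is why the text recommends repeating the whole calculation several times.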

Figure 1. Flowchart describing the PNGA protocol.

Results

Validation of method OPEO with error-free synthetic data

The PREs used in this section were synthesized using a reference ensemble that contained three protein conformers (Fig. S1 in the Supporting Material) in which each spin-label adopted three conformations with equal populations. The synthetic PREs were assumed to have no errors. Using the synthetic intradomain PREs for a variant spin-labeled at M107, we optimized the positions of the three spin-label conformers by fixing τcapp at 6 ns. This procedure was repeated for the other four variants spin-labeled at respective sites H123, N148, S190, and N218. The derived positions were very similar to the input ones with differences in a range of 0.3–1.0 Å (Table S1) when the population of each spin-label conformer was fixed at the input value (1/3) during the optimization. The differences in spin-label positions became larger (1.0–2.4 Å) when the population was not fixed but used as a fitting parameter. We found that the Q factor became smaller than 0.01 and was relatively insensitive to the change of spin-label positions once the positions were very close to the true positions. So, the true spin-label positions were difficult to identify. To overcome this drawback, development of other minimization target functions will be necessary in further studies.

After the spin-label positions in the five variants were determined, protein structure ensembles were calculated with OPEO from interdomain PREs of all the variants by gradually increasing the ensemble size (n) (from 1 to 4). The Q and Qfree factors reached nearly a plateau when the protein ensemble size was three (Fig. 2, a and b). The optimized protein ensembles resembled very closely the reference one with root mean-square deviation (RMSD) values of 1.7 and 2.4 Å for fixed and unfixed spin-label conformer populations, respectively (Fig. 2 c). Although the Q-factor value decreased further with the increase of the ensemble size, the RMSD values between the calculated and reference ensembles became larger when n > 3 (Fig. 2, a–c), indicating occurrence of overfitting. The structure difference between the input and calculated ensembles with three structure members should be caused by deviations of the calculated spin-label positions from the input reference ones. When the spin-label positions were assumed the same as the input ones, the RMSD was nearly zero (Fig. 2 c). Therefore, a correct structure ensemble can be determined from PRE data using the method proposed here.

Figure 2. Dependences of Q (a), Qfree (b), RMSDav (c), and O (d) factors on protein ensemble size, which were derived from synthetic PREs. RMSDav is the average RMSD between the calculated and reference ensembles ($\mathrm{RMSD}_{av} = \sum_i P_i \times \mathrm{RMSD}_i$, where $P_i$ is the population of the calculated ith protein conformer and $\mathrm{RMSD}_i$ is the RMSD value between the calculated ith conformer and its closest reference/input conformer). Equation 6 defines the O factor, which is an overfitting indicator. The results for the cases with fixed and unfixed spin-label populations are indicated by ○ and ▵, respectively. The results for the case where the spin-label conformations used in the protein ensemble calculation are identical to the reference ones are shown by ▪. In this particular case, the calculated third- and fourth-ensemble members have nearly identical structures.

Number of pseudo spin-label conformers

Due to spin-label flexibility, the exact spin-label conformer number and conformations are unknown. Even if such information is available, it is very difficult, if not impossible, to handle computationally a large number of spin-label conformations when a GA is used to calculate a structure ensemble from a limited number of PREs. Is it good enough to use a few pseudo conformers to represent a group of spin-label conformers? To address this question, we assessed the influence of the pseudo spin-label conformer number on the calculated protein structure ensemble using a reference ensemble. The reference ensemble used here was similar to the one used above, but each spin-label in one protein conformer was assumed to have 100 conformers with equal populations (Fig. S1). First, for each spin-label variant we determined the spin-label positions and populations from intradomain PREs through Q-factor minimization by assuming that the spin-label was represented by 1, 3, 5, 7, and 9 pseudo conformers, respectively. The Q factors decreased sharply when the number of the pseudo conformers increased from 1 to 3. Further reduction was insignificant with the increase of the number from 3 to 9. A previous work showed that a five-conformer ensemble represents the dynamic MTSL better than a single conformer (55), which is consistent with our result. Subsequently, protein structure ensembles were calculated with OPEO by fixing the protein ensemble size at three (n = 3).

When the pseudo spin-label conformer numbers were 1–9, the RMSD values between the reference and calculated protein ensembles were 4.5–6 Å and did not display an obvious trend in the absence of PRE errors (Fig. S2 a). In addition, the total population differences were in a range of 0.15–0.3 (Fig. S2 b), although the synthetic PREs matched the backcalculated PREs extremely well (Fig. S3) with a Q factor of ∼0.09. To evaluate the effects of PRE errors on the structure and population differences, three groups of PRE data with different random errors were used to calculate structure ensembles. The ensembles obtained varied, and were also slightly different from those calculated from the data without errors (Fig. S2, a–c), although the backcalculated PREs matched the synthetic PREs quite well (Figs. S4–S6) with Q-factor values ranging from 0.19 to 0.26 (Fig. S2 c). The result indicates that the calculated ensemble is influenced not only by the magnitude of the errors but also by the distribution of the errors among different residues. It is noteworthy that the magnitudes of the three sets of errors were identical and were larger than the upper limit of potential errors in most cases. Surprisingly, the errors did not necessarily cause deterioration in the overall structural quality (Fig. S2, a and b).

To examine if the structure difference is influenced by the number of spin-label conformers existing in a reference ensemble, we prepared another reference ensemble in which each spin-label in a protein conformer had 20 random conformations and the protein ensemble was the same as the one used above. Similarly, the RMSD values between the reference and calculated protein ensembles fluctuated with the pseudo-conformer numbers (1–9) in a range of 3.5–5.5 Å, slightly smaller than those shown in Fig. S2 a. Our results suggest that the deviations in structures and populations should be caused mainly by using a small number of pseudo conformers to represent a large number of spin-label conformers and by the mutual compensation of conformer structures and populations in terms of PREs as further discussed below.

Taking into account the computation time, which increases with the number of pseudo spin-label conformers, as well as the differences in structure and population, we propose using three pseudo conformers to represent the effective positions of a dynamic MTSL. A previous study on protein/DNA complexes with dT-EDTA-Mn2+ incorporated in the DNA has also suggested that a three-conformer model is generally sufficient to represent the spin-label in accurate backcalculation of PRE data (56). According to the results obtained here, the RMSD between the calculated and reference protein ensembles is <6.5 Å but >2 Å, and is insensitive to PRE errors, when three pseudo MTSL conformers are assumed. At this level of structural accuracy, one should not interpret detailed structures at the residue level, but instead focus on the overall structural states of a multidomain protein, such as the open and closed states.

Efficiency of progressive narrowing genetic algorithm

We proposed a PNGA for structure ensemble calculation to reduce computation time. Fig. 3 shows the improvement in comparison with the conventional GA. When the conventional GA was employed to calculate ensembles with four members using a population size of 1200 for 60 replicas, the Q factor always oscillated at a high level. Using a population size of 12,000, the Q factor oscillated at a lower level. When the PNGA was used with a population size of 1200 and a Fred value of 5%, the Q factor reached its lowest value before 20 cycles and completely converged after 40 cycles. The converged Q value was even slightly lower than the minimal value obtained by the conventional GA with a 10-times larger population size. Therefore, PNGA is significantly more efficient (∼20 times faster) than the conventional GA.

Figure 3. Comparison of results obtained using PNGA with a population size of 1200 and a progressive narrowing factor (Fred) of 5% (▪), and conventional GA with population sizes of 1200 (○) and 12,000 (▵).

Overfitting

We examined overfitting phenomena using reference ensembles, and observed two types of overfitting. When the ensemble size was set at the correct value, the calculated structure ensemble might fit very well to the PRE data used in the calculation, but did not fit well to the unused PRE data, i.e., the Q factor was small but Qfree was large. This type of overfitting can be easily ruled out by checking both Q and Qfree values. Another type of overfitting occurred when excessive ensemble members were used, that is, when the ensemble size was larger than the true size. This type of overfitting is often difficult to identify using Q and Qfree because they decrease with the increase of the ensemble size without very obvious minima (Figs. 2 and S7). The overfitting might be evidenced by the increase of RMSD between the calculated and reference ensembles. In practice, the RMSD is not available because the true structure ensemble is unknown for a real system. To avoid this type of overfitting, we have to establish a new criterion.

One feature of the second type overfitting is that unnecessary ensemble members do not improve the Q value as much as the necessary ones do. Thus, we would expect a turning point in the plot of Q factor against the ensemble size when overfitting occurred. In practice, however, the turning point may not be obvious (Figs. 2 and S7). Accordingly, it is more reasonable to consider both the decreasing speed of Q factor against the ensemble size and the Q values than to consider only the Q values in identifying overfitting. Thus an O factor is defined as:

$$O = \frac{2(Q_{n-2} - Q_{n-1})}{Q_{n-2} - Q_n} + Q_n, \tag{6}$$

where $Q_{n-2}$, $Q_{n-1}$, and $Q_n$ are the Q values for ensemble sizes of n−2, n−1, and n; n ≥ 2; and $Q_0 = 1$. The first term is the ratio of the Q-factor reduction rate from size n−2 to n−1 to that from size n−2 to n. Normally, it ranges from 1 when the nth ensemble member improves the Q factor as much as the (n−1)th member (i.e., $Q_{n-2} - Q_{n-1} = Q_{n-1} - Q_n$) to 2 when the nth member has no improvement to the Q factor (i.e., $Q_n = Q_{n-1}$). A sharp increase in the ratio is expected when an unnecessary ensemble member is introduced or overfitting occurs. Before occurrence of the overfitting, the ratio may fluctuate with the increase of the ensemble size. It is possible to observe a turning point in a plot of the ratio against the ensemble size even when the Q factor is still relatively large. To eliminate these turning points before overfitting, the second term in Eq. 6 is introduced because the Q factor decreases with the increase of the ensemble size. According to our results obtained from the reference ensembles, the turning points in the O factor plots were much more obvious than those in the Q-factor plots (Figs. 2 d and S7 d). Therefore, the correct ensemble size can be identified from the turning point in an O-factor plot.
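The O factor of Eq. 6 is simple to evaluate once Q has been obtained for consecutive ensemble sizes. The sketch below assumes the Q values are stored in a list indexed by ensemble size, with the entry for size 0 set to 1 as stated above; the function name and the example numbers are hypothetical.

```python
def o_factor(q_by_size, n):
    """O factor for ensemble size n (n >= 2).

    q_by_size[k] is the Q factor for ensemble size k; q_by_size[0] must be 1.
    Raises ZeroDivisionError if Q did not change between sizes n-2 and n.
    """
    if n < 2:
        raise ValueError("the O factor is defined for n >= 2")
    q_nm2, q_nm1, q_n = q_by_size[n - 2], q_by_size[n - 1], q_by_size[n]
    return 2 * (q_nm2 - q_nm1) / (q_nm2 - q_n) + q_n


# Illustrative Q values only: steady improvement up to size 2, none at size 3.
q_values = [1.0, 0.6, 0.2, 0.2]
o_by_size = {n: o_factor(q_values, n) for n in range(2, len(q_values))}
```

In this made-up series the first term equals 1 at size 2, where the second member helps as much as the first, and jumps to 2 at size 3, where the third member adds nothing, so the plot of O against ensemble size turns sharply at the last useful size.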

Individual domain structures in solution

Using triple resonance NMR experimental data (Table S2), three-dimensional structures of individual PubRRM12 domains were solved (Fig. 4 a). However, domain-domain orientation could not be determined due to the absence of NOEs between the two domains. Each domain exhibits a canonical βαββαβ fold (57). The structure of each domain obtained here is very similar to the crystal structure solved previously in Li et al. (38) with pairwise backbone RMSD values <1 Å.

Figure 4.

PubRRM12 structure and chemical shift perturbations (Δδ) caused by domain-domain interaction. (a) Ribbon representation with highlighted residues (red) that displayed significant chemical shift perturbations (i.e., Δδ > Δδav + SD). (b) Combined chemical shift differences between PubRRM12 and RRM1 and between PubRRM12 and RRM2. The blue-dashed line represents the average Δδ value over all available residues (Δδav), while the red-dashed line denotes the value of Δδav + SD, where SD is the standard deviation of Δδ values for all available residues. To see this figure in color, go online.

Domain-domain Interaction

To determine whether there are weak interactions between the two domains, the amide chemical shifts of isolated RRM1 and RRM2 were compared with those of intact PubRRM12 at low protein concentrations (∼2 mg/mL). Interestingly, some residues located far away in sequence from the linker displayed significant chemical shift differences between the intact di-domain and the isolated individual domains (Fig. 4 b). As shown below, PubRRM12 existed as a monomer at a protein concentration of ∼2 mg/mL. Therefore, the observed chemical shift differences are caused by weak domain-domain interactions rather than weak oligomerization or aggregation.
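The significance criterion used in Fig. 4 (Δδ > Δδav + SD) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `significant_residues` and the example values are hypothetical, and the population standard deviation is assumed because the text does not say whether the population or sample form was used.

```python
from statistics import mean, pstdev


def significant_residues(delta_by_residue):
    """Flag residues whose combined shift difference exceeds mean + SD.

    delta_by_residue maps a residue number to its combined chemical shift
    difference (ppm). Returns (threshold, sorted list of flagged residues).
    """
    values = list(delta_by_residue.values())
    # Assumption: population SD over all available residues.
    threshold = mean(values) + pstdev(values)
    flagged = sorted(res for res, d in delta_by_residue.items() if d > threshold)
    return threshold, flagged


# Hypothetical shift differences (ppm), for illustration only.
example = {80: 0.01, 81: 0.02, 82: 0.01, 83: 0.09, 84: 0.02}
threshold, flagged = significant_residues(example)
print(f"threshold = {threshold:.3f} ppm, flagged residues = {flagged}")
```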

Structure ensemble of PubRRM12

Five PubRRM12 mutants, each with one MTSL attached at residue M107, H123, N148, S190, or N218, were used in PRE data collection. In total, 548 PRE restraints collected from the five spin-labeled mutants were used in the structure ensemble calculation (Fig. S8), excluding the domain-domain linker region, a loop in RRM2 (195–202), and the termini, which are relatively flexible. Ten percent of the PREs for each mutant were randomly chosen for cross validation, and the rest were used in the structure calculation. For each spin-label site, an ensemble of three MTSL conformers was used to represent the dynamics of MTSL. The τcapp values optimized from the intradomain PREs ranged from 5.3 to 6.6 ns for the five variants. A uniform τcapp of 6 ns, the average value over all the variants, was used to calculate MTSL positions and protein structures. The Q, Qfree, and O factors against the ensemble size are shown in Fig. 5. Overfitting occurred when the ensemble size reached 5, as indicated by a sharp turning point at an ensemble size of 4 (Fig. 5 b). Thus, the structure ensemble of PubRRM12 could be represented by four members (Fig. 6) with populations of 43.6% (E1), 6.9% (E2), 43.2% (E3), and 6.3% (E4). The calculated structure ensemble fits the experimental PREs very well (Fig. S8), indicating that E1–E4 reflect the domain dynamics of PubRRM12 in solution.
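The per-mutant cross-validation split described above (10% of the PREs of each mutant set aside for Qfree) could be organized as in the following Python sketch. It is only an illustration: the function name `split_restraints`, the use of a seeded random generator, and the restraint counts per site are assumptions, since the text gives only the total of 548 restraints.

```python
import random


def split_restraints(restraints_by_site, free_fraction=0.10, seed=0):
    """Split the restraints of each spin-label site into working and free sets.

    restraints_by_site maps a site label to a list of restraints.
    Returns (working, free), two dicts with the same keys.
    """
    rng = random.Random(seed)  # seeded so that the split is reproducible
    working, free = {}, {}
    for site, restraints in restraints_by_site.items():
        n_free = round(free_fraction * len(restraints))
        free_indices = set(rng.sample(range(len(restraints)), n_free))
        free[site] = [r for i, r in enumerate(restraints) if i in free_indices]
        working[site] = [r for i, r in enumerate(restraints) if i not in free_indices]
    return working, free


# Hypothetical restraint lists: integers stand in for per-residue PRE values.
sites = {"M107": list(range(110)), "H123": list(range(100)), "N148": list(range(120))}
working, free = split_restraints(sites)
for site in sites:
    print(site, len(working[site]), "working,", len(free[site]), "free")
```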

Figure 5.

Dependences of Q (▪) and Qfree (○) (a) and O (b) factors on the ensemble size of PubRRM12, which were derived from experimental PREs.

Figure 6.

Ribbon diagrams of four ensemble members (E1, E2, E3, and E4) of PubRRM12. E1, E2, E3, and E4 are shown in magenta, yellow, green, and orange. RRM1 domain structures are superimposed.

According to the structures, both E1 and E3 are stabilized mainly by hydrophobic interactions involving two large and protrusive hydrophobic patches: one on sheet β1–β4 in RRM1 and the other on β5–β8 in RRM2 (Fig. 7). In addition, electrostatic interaction may contribute to the stability of E3 because the positive patch in RRM1 can interact with the negative patch in RRM2. The hydrophobic patch on β1–β4 is also important for the stability of E2 because it interacts with the protrusive hydrophobic patch formed by β5 and α4. Different from E1–E3, E4 is driven mainly by electrostatic interactions through the negative patch in RRM2 and the positive patch in RRM1. The domain-domain interaction interfaces in E1–E4 are consistent with the chemical shift perturbation regions, which are mainly located in the two β-sheets (Fig. 4). Although four distinct conformers coexist in solution, only one set of NMR signals was observed, indicating that the conformers are in fast exchange on the chemical shift timescale.

Figure 7.

Domain-domain interaction sites highlighted by solid circles. The interaction regions found in structure ensembles E1, E2, E3, and E4 are indicated by labels E1, E2, E3, and E4, respectively. (a and d) Electrostatic potential surfaces of RRM1 and RRM2 calculated by using DELPHI. Electrostatic potential is colored from blue (positive charge) to red (negative charge). (b and e) Hydrophobic surfaces of RRM1 and RRM2. Hydrophobic residues (Phe, Trp, Tyr, Leu, Ile, Val, Met, Ala, Pro, and Gly) are colored in yellow, while hydrophilic residues (Thr, Ser, Lys, His, Glu, Gln, Asn, Asp, and Arg) are in gray. (c and f) Ribbon diagrams of RRM1 and RRM2 domains showing the orientation of each domain in the surface representations.

Validation of the calculated ensemble

To investigate whether the two domains rotate independently or cooperatively in solution, we conducted 15N relaxation experiments. The localized correlation times and order parameters derived from the relaxation data are shown in Fig. S9. Except for a few residues located in the linker, the C- and N-termini, and the long loop, the observed correlation times are mainly in the range of 8–10 ns, which is close to, but shorter than, the rotational correlation time of a globular ∼19 kDa protein (∼11.5 ns) (58). If the two domains (each comprising ∼80 residues) rotated independently in solution, the correlation times would be similar to the overall tumbling times of the individual domains (∼5.6 ns). Therefore, the two domains in the di-domain protein rotate neither independently nor fully coherently, indicating the existence of domain dynamics or relative domain motions due to weak domain-domain interactions.

As a complementary tool to the NMR studies described above, solution x-ray scattering experiments were performed on PubRRM12 at protein concentrations of 0.5, 1.2, and 2.0 mg/mL. Extrapolation to theoretical infinite dilution was used for analysis (Fig. 8 a; Table S3). The Guinier plots at low angles for the different concentrations appeared linear, confirming good data quality and monodispersity of PubRRM12 with no indication of protein aggregation (Fig. 8 a, inset). The molecular mass estimated from the 0.5 and 2.0 mg/mL scattering data was 16 ± 3 kDa (Table S3), indicating that the protein is monomeric at the concentrations studied. From the Guinier approximation, a radius of gyration (Rg) of 18.66 ± 0.53 Å was derived. According to the structures of E1–E4, N148 is close to the domain-domain interface. To assess whether the MTSL attached at N148 interferes with the domain-domain arrangement, we collected SAXS data on the spin-labeled variant. The SAXS profiles for the spin-labeled sample and wild-type PubRRM12 were nearly identical (Fig. S10), indicating that the MTSL at N148 does not change the structure ensemble.
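The Guinier analysis mentioned above rests on the standard approximation I(q) ≈ I(0)·exp(−q²Rg²/3) at low angles, so that ln I(q) is linear in q² with slope −Rg²/3. The Python sketch below fits that line by ordinary least squares. It is not the analysis pipeline used in this work; the function name `guinier_fit`, the q range, and the noise-free synthetic curve are illustrative assumptions.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) against q^2; returns (Rg, I0).

    q is in 1/Angstrom; intensity is in arbitrary positive units.
    The points supplied should lie in the Guinier region (roughly q*Rg < 1.3).
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative slope: data are not Guinier-like")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free curve generated with Rg = 18.66 Angstrom (illustration only).
q_values = [0.010 + 0.002 * i for i in range(30)]
curve = [math.exp(-(qi * 18.66) ** 2 / 3.0) for qi in q_values]
rg, i0 = guinier_fit(q_values, curve)
print(f"Rg = {rg:.2f} Angstrom, I(0) = {i0:.3f}")
```

With real data, upward or downward curvature of ln I(q) versus q² in this region is the usual sign of aggregation or interparticle repulsion, which is why the linearity reported above is taken as evidence of monodispersity.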

Figure 8.

Solution x-ray scattering studies of PubRRM12. (a) Small angle x-ray scattering patterns (○). (Inset) Guinier plot shows linearity for PubRRM12 at the highest concentration of 2.0 mg/mL. (b) Normalized Kratky plot of PubRRM12 (solid green circles) compared to the compact globular lysozyme with a peak (—; solid blue circles). (c) Comparison of the Rg distributions (random pool, gray; selected ensemble, red). The Rg value of E1–E4 derived from NMR are indicated by dashed lines. (d) Comparison of experimental scattering data (o) and computed curve (solid line) by combining theoretical scattering intensities from NMR-derived structures E1–E4. To see this figure in color, go online.

The scattering pattern of PubRRM12 exhibits a broad bell-shaped profile shifted toward the right with respect to that of the standard globular protein lysozyme (Fig. 8 b), indicating the existence of multiple conformers that are in dynamic equilibrium (59). To characterize the dynamic behavior of PubRRM12, the ensemble optimization method (51) was applied. The solution selected by EOM 2.0 was an ensemble of four structures with a discrepancy of χ2 = 0.225. The Rg distribution for the ensembles that each fit the SAXS data well is much narrower than that for a random pool of structures (Fig. 8 c). The Rg values of structures E1–E4 are all located inside the red area (Fig. 8 c), indicating that the structures derived from the experimental PREs agree with the SAXS data. Despite the overall agreement, the apparent (or weighted average) Rg value for the NMR-derived ensemble is ∼2 Å smaller than the Rg measured by SAXS (18.66 Å), suggesting that the NMR model is slightly more compact. This may be caused by the difference in protein concentration (∼4 mg/mL for PRE experiments versus ≤2 mg/mL for SAXS). A concentration-dependent shift from a more extended to a more compact conformer at higher concentrations is a common phenomenon for two-domain proteins. In addition, it is possible that there exists a small fraction of extended conformers, which give rise to negligible interdomain PREs and cannot be detected by PRE-based methods but contribute significantly to the Rg measured by SAXS. The SAXS profile computed by combining theoretical scattering intensities from structures E1–E4 is in good agreement with the experimental data (Fig. 8 d), further supporting that the structure ensemble of E1–E4 is a good representation of the true ensemble in solution.
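The apparent Rg of an ensemble can be estimated from the populations and the Rg values of its members. The Python sketch below shows two common conventions, because the text does not state which one was used: a simple population-weighted mean, and the root-mean-square average that a Guinier fit reports for a mixture of species of equal mass. The populations are those given for E1–E4; the individual Rg values are hypothetical placeholders, not values from this work, and the function names are illustrative.

```python
import math


def weighted_mean_rg(populations, rg_values):
    """Population-weighted arithmetic mean of Rg."""
    total = sum(populations)
    return sum(p * rg for p, rg in zip(populations, rg_values)) / total


def guinier_apparent_rg(populations, rg_values):
    """sqrt(sum p_i * Rg_i^2): the Rg seen by a Guinier fit for equal-mass species."""
    total = sum(populations)
    return math.sqrt(sum(p * rg * rg for p, rg in zip(populations, rg_values)) / total)


populations = [43.6, 6.9, 43.2, 6.3]  # E1-E4 populations (%) from the text
placeholder_rg = [16.0, 17.0, 16.5, 18.0]  # HYPOTHETICAL Rg values in Angstrom

print(f"weighted mean Rg  = {weighted_mean_rg(populations, placeholder_rg):.2f} Angstrom")
print(f"Guinier-type Rg   = {guinier_apparent_rg(populations, placeholder_rg):.2f} Angstrom")
```

Because the root-mean-square average weights extended conformers more heavily, even a small unobserved population of extended structures raises the SAXS-derived Rg, in line with the explanation offered above.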

To investigate the driving forces for the calculated structure ensemble, we acquired PRE data at different salt concentrations using a mutant with a spin-label attached at N148. As expected, the intradomain PREs (E71–S154) were not influenced by ionic strength (Fig. 9). The residues that displayed significant PRE differences between 0 and 200 mM NaCl were concentrated in a region from the C-terminal end of β7 to the middle of α4. If a structure is stabilized mainly by electrostatic interactions, it will be altered by salt because electrostatic interactions weaken with the increase of ionic strength. If a structure is stabilized by hydrophobic interactions, it will not be affected by salt. According to the NMR-derived ensemble, the spin-label at N148 in RRM1 is close to the region from the end of β7 to the middle of α4 in structure E4 and is also close to the center of the RRM2 β-sheet (β5–β8) in structure E1. If our structure ensemble is correct in geometry and stabilization force, the residues from the end of β7 to the middle of α4 will have larger PREs at lower salt, while those in the RRM2 β-sheet will have similar PREs at higher and lower salt concentrations. This prediction agrees very well with our experimental observation, supporting that the calculated ensemble represents the true structures of PubRRM12 in solution.

Figure 9.

PRE data of PubRRM12 with a MTSL at N148 recorded at 0 (▪) and 200 mM (○) NaCl.

Discussion

The method proposed here determines the structure ensemble of a di-domain protein by rotating and shifting one domain relative to the other, so that only six orthogonal parameters are needed to fully define the structure of each protein conformer. In this method, the linker’s conformation is neglected because the function of a multidomain protein is often related to the domain-domain orientation and separation rather than the linker conformation. Previous rigid-body-based methods (20, 21, 22, 25, 26, 47, 60) solve structure ensembles by focusing on structures of the linker that connects two rigid domains. The linker’s backbone has at least 2n degrees of freedom for each conformer, where n is the number of residues in the linker. For an ensemble with m conformers, the total number of degrees of freedom for OPEO (6m+m-1) is much smaller than that for other methods (2nm+m-1) when n > 6. Hence our method should be much more efficient in identifying the optimal structure ensemble than previous methods, but it cannot be used to determine the conformations of the linker. To obtain realistic linker conformations and to include energy minimization of each conformer, we will develop a more sophisticated PNGA-based protocol.
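The parameter counts quoted above can be made concrete with a short Python sketch. It simply evaluates the two expressions from the text, 6m + (m − 1) and 2nm + (m − 1), where the m − 1 term counts the independent populations (they sum to one). The function names are hypothetical; the example uses the four-member ensemble and the 10-residue linker reported for PubRRM12.

```python
def dof_rigid_body(m):
    """Six rigid-body parameters per conformer plus m - 1 independent populations."""
    return 6 * m + (m - 1)


def dof_linker(m, n):
    """At least two backbone torsions per linker residue, plus m - 1 populations."""
    return 2 * n * m + (m - 1)


m_conformers = 4      # ensemble size found for PubRRM12
linker_residues = 10  # linker length of PubRRM12 given in the text

print(dof_rigid_body(m_conformers))               # 27
print(dof_linker(m_conformers, linker_residues))  # 83
```

For this case the rigid-body description needs roughly one-third of the variables required by a linker-torsion description, and the gap widens linearly with linker length.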

Structure ensemble determination from PREs is often achieved by optimizing the agreement between experimental and back-calculated PREs (or Q factor). Because the calculated PREs depend on the number of conformers, conformer structures, and populations, increasing the calculated PRE values of a cluster of protons can be achieved by reducing distances of a spin-label to the protons in a particular conformer (changing structure of a conformer), increasing the conformer population, or adding one more conformer to the existing ensemble. Thus, a small Q-factor value can be achieved by identifying an optimal ensemble with correct conformer number, structures, and populations or by finding a larger ensemble in which one or more true conformers are missing and some false conformers exist. If an optimization procedure is not efficient enough to find the correct ensemble, the second type of overfitting will occur easily. Our results demonstrate that the overfitting can be avoided with the method OPEO. This is accomplished by using a minimal number of fitting parameters or variables to define a structure ensemble and by employing a more efficient optimization protocol to identify the optimal ensemble.

Here we propose to use a three-pseudo-conformer ensemble to approximately represent the multiple conformations of a flexible MTSL. With this approximation, the synthetic PREs can match the back-calculated PREs extremely well with Q factors as small as ∼0.1, but the calculated structure ensembles still deviate from the input reference ensemble by as much as 6.5 Å even in the absence of PRE uncertainties. In a real system, PRE uncertainties exist due to measurement errors, protein dynamics, unaccounted spin-label dynamics, anisotropic tumbling of the protein, incomplete spin-labeling, and spin-label reduction. To assess the effect of such potential uncertainties on the quality of calculated structure ensembles, we assume that the uncertainties can be accounted for by adding two random errors to each synthetic PRE: one absolute error (3 s−1, equivalent to a ∼4% measurement error in 1H R2 for the diamagnetic sample) and one proportional error of 30% of the PRE value. The order parameter (S2), which reflects the mobility of an electron-HN vector, includes two contributions: one from the vector orientation fluctuation caused mainly by rotational motions and the other from the vector length fluctuation caused by domain-domain translational motions and spin-label dynamics. Iwahara et al. (56) found that S2 correlates with the effective vector length and increases from 0.68 to 0.85 as the length increases from 18 to 25 Å. Because the electron-HN vectors used in our calculations are long (>13 Å), the variation in S2 should be relatively small. For PubRRM12, the localized correlation times (τr) derived from different amides were in the range of 8–10 ns (Fig. S9) because of anisotropic tumbling and domain dynamics, so the variation in τcapp (τrS2) can be as large as ±20%. If incomplete spin-labeling occurs, 95% labeling will result in underestimation of PRE values by ∼5%. MTSL reduction is very slow, and its effect can be neglected if the PRE data are collected within one day using freshly prepared samples. Taking all of these potential errors together, the uncertainties in PREs are still smaller than the random noise added to the synthetic data (±30%). As shown above, the quality of the ensemble derived from PREs is insensitive to PRE uncertainties, provided that the uncertainties are <30%. Therefore, the complications caused by spin-label and protein dynamics as well as anisotropic tumbling can be ignored if the required structural quality is relatively low (>6.5 Å in RMSD). If a system is more dynamic and more anisotropic than PubRRM12, the quality of the ensemble obtained with the approach proposed here will be lower.
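The error model applied to the synthetic PREs (an absolute error of 3 s−1 plus a proportional error of 30%) can be sketched as follows. The text specifies only the magnitudes of the two errors, so drawing each error uniformly within its bound is an assumption of this sketch, as are the function name `add_pre_noise` and the example rates.

```python
import random


def add_pre_noise(pre_values, abs_error=3.0, rel_error=0.30, seed=0):
    """Return PRE rates (s^-1) perturbed by an absolute and a proportional error.

    Assumption: each error is drawn uniformly within +/- its bound; the
    distribution actually used for the synthetic data is not stated in the text.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    noisy = []
    for pre in pre_values:
        absolute = rng.uniform(-abs_error, abs_error)
        proportional = rng.uniform(-rel_error, rel_error) * pre
        noisy.append(pre + absolute + proportional)
    return noisy


# Hypothetical noise-free PRE rates (s^-1), for illustration only.
clean = [5.0, 20.0, 60.0]
for before, after in zip(clean, add_pre_noise(clean)):
    print(f"{before:6.1f} -> {after:6.1f}")
```

Note that small PREs are dominated by the absolute term, so a perturbed value can become negative in this sketch; how such cases are handled is a separate choice that the text does not describe.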

To achieve higher structural quality, rigid spin-labels such as unnatural amino acids (61) and MTSL analogs (55) should be used to avoid the spin-label ensemble approximation. For rigid spin-labels, the anisotropic tumbling effect can be incorporated easily by modifying Eq. 2. When a large spin-label, either rigid or flexible, is used, its interference with domain-domain interactions should be avoided. If the spin-label is located in a domain-domain interface, the spin-label tag may change domain-domain interactions and result in artificial structural members in the calculated ensemble. In this case, SAXS can be used to examine whether any interference exists, provided the interference is significant enough to change the domain-domain separation. In addition, the incomplete spin-labeling effect should be corrected by measuring the extent of labeling by mass spectrometry. The labeling extent can also be estimated from the 2D NMR spectra used for PRE measurements when it is <90%. It is noteworthy that the correction is not needed for the spins whose chemical shifts in the labeled species are different from those in the unlabeled species.

The structure ensemble of Pub1p is significantly different from that of its homolog U2AF65, which has been reported previously in Mackereth et al. (19) and Huang et al. (60). Although the individual domain structures of U2AF65 and PubRRM12 are very similar, their amino-acid sequences (identity of 27.6% and similarity of 35.3%) are very different (Fig. S11). Examining the electrostatic potential of PubRRM12 (Fig. 7) and U2AF65 (Fig. 5 in Huang et al. (60)), we see that the RRM2 surface of PubRRM12 is totally negative, but that of U2AF65 contains both positively and negatively charged patches. The interactions between a negatively charged region (D206 and E207) in RRM1 and a positively charged region (K286, K328, and R334) in RRM2 of U2AF65 are no longer available for PubRRM12. Moreover, U2AF65 has a much longer linker (32 amino acids) than PubRRM12 (10 amino acids), implying that U2AF65 can sample many more conformations than PubRRM12. Therefore, the differences in the electrostatic potential and linker length contribute to the very different structure ensembles of the two homologous proteins.

The mechanism of a single RRM domain binding to RNA/DNA is well understood (57). The four β-strands form a plastic platform for nucleic acid binding. The N- and C-terminal regions, together with loops, can enhance the binding affinity. To bind to longer single-stranded RNA/DNA, two or more RRM domains are required to form a larger binding platform. However, the molecular mechanism of recognition of RNA/DNA by tandem RRM domains is still not well understood. RNA-free U2AF65 was initially found to exist in two distinct conformers in solution: one open state corresponding to the RNA-bound conformation, and one closed state in which the RNA-binding surface of RRM1 is partially blocked by the two helices of RRM2 (19). Based on the structures, a conformational selection mechanism coupled with a population shift of the two states has been proposed for recognition of polypyrimidine tract RNA by U2AF65. Nevertheless, a minor induced-fit mechanism could not be ruled out (19). A more recent study on the same U2AF65 has shown that U2AF65 exists in a large number of conformers (60). In the ensemble, a significant portion resembles the previously proposed closed state, a small portion resembles the open state, and many other conformers differ from the open and closed states. This result still underscores the possible contribution of the conformational selection mechanism to RNA recognition by U2AF65.

In all of the four PubRRM12 conformers obtained here, the RNA-binding platform of RRM1 is nearly fully occupied by RRM2 and the linker. The binding platform of RRM2 is partially blocked by RRM1 in structures E1–E3, while it is unblocked in E4. None of the conformers adopts a fully open conformation to readily interact with single-stranded RNA/DNA. Very likely, Pub1p binds to a DNA or RNA through an induced-fit mechanism. First, a part of the RNA/DNA binds to the RRM2 domain in E4. In the meantime, RRM1 changes the orientation to open its binding platform. Subsequently, the RRM1 domain in the open conformation binds to the second part of the RNA/DNA. Once E4 completes the binding to the nucleic acid, other ensemble members can shift to E4.

Conclusions

A multidomain protein with weak domain-domain interactions often adopts more than one structure in solution. Determining all of the structures is the key to understanding how the protein functions, but it remains challenging. The overfitting problem suffered by most methods may hinder elucidation of the structure-function relationship because false and true structures cannot be discriminated. As demonstrated on reference ensembles with predefined structures, the method proposed here overcomes the problem by enhancing optimization efficiency via the use of a minimal number of parameters to define structures and a more efficient optimization protocol, as well as by establishing a new overfitting indicator, the O factor.

MTSL is widely used as a spin-label in structure determination, but it is flexible and can adopt multiple conformations. No matter how complicated its dynamics are, the positions of an MTSL can be represented by a small number of pseudo conformers in structure ensemble determination from PRE data. When the required accuracy of a structural ensemble is not high (∼6.5 Å), the use of three pseudo conformers is a good choice because increasing the number of pseudo spin-label conformers does not significantly improve the quality of PRE-derived ensembles but greatly increases the computational time.

PubRRM12 exists in four conformers in solution that are in fast exchange in the NMR time regime. Individual conformers can be stabilized by hydrophobic, electrostatic, or both hydrophobic and electrostatic interactions. Because none of the conformers adopts an open conformation to readily interact with single-stranded RNA/DNA, PubRRM12 very likely uses an induced fit mechanism to recognize RNA or DNA.

Author Contributions

D.Y. and G.G. designed the research; W.L., J.Z., J.-S.F., and G.T. performed the research; and W.L. and D.Y. wrote the article.

Acknowledgments

The authors thank Dr. Chun Tang and Dr. Haydyn Mertens for helpful discussions and Saw Wuan Geok for preparing Fig. 8.

This research was supported by a grant from Singapore Ministry of Education (Academic Research Fund Tier 3, No. MOE2012-T3-1-008).

Editor: H. Jane Dyson.

Footnotes

Jingfeng Zhang’s present address is State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Institute of Physics and Mathematics, The Chinese Academy of Sciences, Wuhan, China.

Eleven figures and three tables are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(16)30171-0.

Supporting Material

Document S1. Figs. S1–S11 and Tables S1–S3
Document S2. Article plus Supporting Material

References

  • 1.Mackereth C.D., Sattler M. Dynamics in multi-domain protein recognition of RNA. Curr. Opin. Struct. Biol. 2012;22:287–296. doi: 10.1016/j.sbi.2012.03.013. [DOI] [PubMed] [Google Scholar]
  • 2.Tzeng S.R., Kalodimos C.G. Protein dynamics and allostery: an NMR view. Curr. Opin. Struct. Biol. 2011;21:62–67. doi: 10.1016/j.sbi.2010.10.007. [DOI] [PubMed] [Google Scholar]
  • 3.Smock R.G., Gierasch L.M. Sending signals dynamically. Science. 2009;324:198–203. doi: 10.1126/science.1169377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Bahar I., Chennubhotla C., Tobi D. Intrinsic dynamics of enzymes in the unbound state and relation to allosteric regulation. Curr. Opin. Struct. Biol. 2007;17:633–640. doi: 10.1016/j.sbi.2007.09.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Baldwin A.J., Kay L.E. NMR spectroscopy brings invisible protein states into focus. Nat. Chem. Biol. 2009;5:808–814. doi: 10.1038/nchembio.238. [DOI] [PubMed] [Google Scholar]
  • 6.Baxter N.J., Hosszu L.L., Williamson M.P. Characterisation of low free-energy excited states of folded proteins. J. Mol. Biol. 1998;284:1625–1639. doi: 10.1006/jmbi.1998.2265. [DOI] [PubMed] [Google Scholar]
  • 7.Burnley B.T., Afonine P.V., Gros P. Modelling dynamics in protein crystal structures by ensemble refinement. eLife. 2012;1:e00311. doi: 10.7554/eLife.00311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Ryabov Y.E., Fushman D. A model of interdomain mobility in a multidomain protein. J. Am. Chem. Soc. 2007;129:3315–3327. doi: 10.1021/ja067667r. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Schuler B., Müller-Späth S., Nettels D. Application of confocal single-molecule FRET to intrinsically disordered proteins. Methods Mol. Biol. 2012;896:21–45. doi: 10.1007/978-1-4614-3704-8_2. [DOI] [PubMed] [Google Scholar]
  • 10.Sekhar A., Kay L.E. NMR paves the way for atomic level descriptions of sparsely populated, transiently formed biomolecular conformers. Proc. Natl. Acad. Sci. USA. 2013;110:12867–12874. doi: 10.1073/pnas.1305688110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Zhou Y., Yang D. 13Cα CEST experiment on uniformly 13C-labeled proteins. J. Biomol. NMR. 2015;61:89–94. doi: 10.1007/s10858-014-9888-1. [DOI] [PubMed] [Google Scholar]
  • 12.Volkov A.N., Ubbink M., van Nuland N.A. Mapping the encounter state of a transient protein complex by PRE NMR spectroscopy. J. Biomol. NMR. 2010;48:225–236. doi: 10.1007/s10858-010-9452-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ye Y., Blaser G., Komander D. Ubiquitin chain conformation regulates recognition and activity of interacting proteins. Nature. 2012;492:266–270. doi: 10.1038/nature11722. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Lim J., Xiao T., Yang D. An off-pathway folding intermediate of an acyl carrier protein domain coexists with the folded and unfolded states under native conditions. Angew. Chem. Int. Ed. Engl. 2014;53:2358–2361. doi: 10.1002/anie.201308512. [DOI] [PubMed] [Google Scholar]
  • 15.Long D., Yang D. Millisecond timescale dynamics of human liver fatty acid binding protein: testing of its relevance to the ligand entry process. Biophys. J. 2010;98:3054–3061. doi: 10.1016/j.bpj.2010.03.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Long D., Liu M., Yang D. Accurately probing slow motions on millisecond timescales with a robust NMR relaxation experiment. J. Am. Chem. Soc. 2008;130:2432–2433. doi: 10.1021/ja710477h. [DOI] [PubMed] [Google Scholar]
  • 17.Jiang B., Yu B., Yang D. A 15N CPMG relaxation dispersion experiment more resistant to resonance offset and pulse imperfection. J. Magn. Reson. 2015;257:1–7. doi: 10.1016/j.jmr.2015.05.003. [DOI] [PubMed] [Google Scholar]
  • 18.Kosen P.A. Spin labeling of proteins. Methods Enzymol. 1989;177:86–121. doi: 10.1016/0076-6879(89)77007-5. [DOI] [PubMed] [Google Scholar]
  • 19.Mackereth C.D., Madl T., Sattler M. Multi-domain conformational selection underlies pre-mRNA splicing regulation by U2AF. Nature. 2011;475:408–411. doi: 10.1038/nature10171. [DOI] [PubMed] [Google Scholar]
  • 20.Tang C., Iwahara J., Clore G.M. Visualization of transient encounter complexes in protein-protein association. Nature. 2006;444:383–386. doi: 10.1038/nature05201. [DOI] [PubMed] [Google Scholar]
  • 21.Tang C., Schwieters C.D., Clore G.M. Open-to-closed transition in apo maltose-binding protein observed by paramagnetic NMR. Nature. 2007;449:1078–1082. doi: 10.1038/nature06232. [DOI] [PubMed] [Google Scholar]
  • 22.Clore G.M., Iwahara J. Theory, practice, and applications of paramagnetic relaxation enhancement for the characterization of transient low-population states of biological macromolecules and their complexes. Chem. Rev. 2009;109:4108–4139. doi: 10.1021/cr900033p. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Anthis N.J., Clore G.M. Visualizing transient dark states by NMR spectroscopy. Q. Rev. Biophys. 2015;48:35–116. doi: 10.1017/S0033583514000122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Fisher C.K., Stultz C.M. Constructing ensembles for intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2011;21:426–431. doi: 10.1016/j.sbi.2011.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Anthis N.J., Doucleff M., Clore G.M. Transient, sparsely populated compact states of apo and calcium-loaded calmodulin probed by paramagnetic relaxation enhancement: interplay of conformational selection and induced fit. J. Am. Chem. Soc. 2011;133:18966–18974. doi: 10.1021/ja2082813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Berlin K., Castañeda C.A., Fushman D. Recovering a representative conformational ensemble from underdetermined macromolecular structural data. J. Am. Chem. Soc. 2013;135:16595–16609. doi: 10.1021/ja4083717. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Bernadó P., Mylonas E., Svergun D.I. Structural characterization of flexible proteins using small-angle x-ray scattering. J. Am. Chem. Soc. 2007;129:5656–5664. doi: 10.1021/ja069124n. [DOI] [PubMed] [Google Scholar]
  • 28.Bertini I., Giachetti A., Svergun D.I. Conformational space of flexible biological macromolecules from average data. J. Am. Chem. Soc. 2010;132:13553–13558. doi: 10.1021/ja1063923. [DOI] [PubMed] [Google Scholar]
  • 29.Choy W.Y., Forman-Kay J.D. Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J. Mol. Biol. 2001;308:1011–1032. doi: 10.1006/jmbi.2001.4750. [DOI] [PubMed] [Google Scholar]
  • 30. Nodet G., Salmon L., Blackledge M. Quantitative description of backbone conformational sampling of unfolded proteins at amino acid resolution from NMR residual dipolar couplings. J. Am. Chem. Soc. 2009;131:17908–17918. doi: 10.1021/ja9069024.
  • 31. Pelikan M., Hura G.L., Hammel M. Structure and flexibility within proteins as identified through small angle x-ray scattering. Gen. Physiol. Biophys. 2009;28:174–189. doi: 10.4149/gpb_2009_02_174.
  • 32. Anderson J.T., Paddy M.R., Swanson M.S. PUB1 is a major nuclear and cytoplasmic polyadenylated RNA-binding protein in Saccharomyces cerevisiae. Mol. Cell. Biol. 1993;13:6102–6113. doi: 10.1128/mcb.13.10.6102.
  • 33. Matunis M.J., Matunis E.L., Dreyfuss G. PUB1: a major yeast poly(A)+ RNA-binding protein. Mol. Cell. Biol. 1993;13:6114–6123. doi: 10.1128/mcb.13.10.6114.
  • 34. Kedersha N.L., Gupta M., Anderson P. RNA-binding proteins TIA-1 and TIAR link the phosphorylation of eIF-2 α to the assembly of mammalian stress granules. J. Cell Biol. 1999;147:1431–1442. doi: 10.1083/jcb.147.7.1431.
  • 35. Vasudevan S., Garneau N., Peltz S.W. p38 mitogen-activated protein kinase/Hog1p regulates translation of the AU-rich-element-bearing MFA2 transcript. Mol. Cell. Biol. 2005;25:9753–9763. doi: 10.1128/MCB.25.22.9753-9763.2005.
  • 36. Buchan J.R., Muhlrad D., Parker R. P bodies promote stress granule assembly in Saccharomyces cerevisiae. J. Cell Biol. 2008;183:441–455. doi: 10.1083/jcb.200807043.
  • 37. Melamed D., Pnueli L., Arava Y. Yeast translational response to high salinity: global analysis reveals regulation at multiple levels. RNA. 2008;14:1337–1351. doi: 10.1261/rna.864908.
  • 38. Li H., Shi H., Teng M. Crystal structure of the two N-terminal RRM domains of Pub1 and the poly(U)-binding properties of Pub1. J. Struct. Biol. 2010;171:291–297. doi: 10.1016/j.jsb.2010.04.014.
  • 39. Yang D., Zheng Y., Wyss D.F. Sequence-specific assignments of methyl groups in high-molecular weight proteins. J. Am. Chem. Soc. 2004;126:3710–3711. doi: 10.1021/ja039102q.
  • 40. Xu Y., Zheng Y., Yang D. A new strategy for structure determination of large proteins in solution without deuteration. Nat. Methods. 2006;3:931–937. doi: 10.1038/nmeth938.
  • 41. Xu Y., Long D., Yang D. Rapid data collection for protein structure determination by NMR spectroscopy. J. Am. Chem. Soc. 2007;129:7722–7723. doi: 10.1021/ja071442e.
  • 42. Yang D., Mok Y.K., Kay L.E. Contributions to protein entropy and heat capacity from bond vector motions measured by NMR spin relaxation. J. Mol. Biol. 1997;272:790–804. doi: 10.1006/jmbi.1997.1285.
  • 43. Iwahara J., Tang C., Clore G.M. Practical aspects of 1H transverse paramagnetic relaxation enhancement measurements on macromolecules. J. Magn. Reson. 2007;184:185–195. doi: 10.1016/j.jmr.2006.10.003.
  • 44. Delaglio F., Grzesiek S., Bax A. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR. 1995;6:277–293. doi: 10.1007/BF00197809.
  • 45. Shen Y., Delaglio F., Bax A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR. 2009;44:213–223. doi: 10.1007/s10858-009-9333-z.
  • 46. Schwieters C.D., Kuszewski J.J., Clore G.M. The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson. 2003;160:65–73. doi: 10.1016/s1090-7807(02)00014-9.
  • 47. Schwieters C.D., Kuszewski J.J., Clore G.M. Using Xplor-NIH for NMR molecular structure determination. Prog. Nucl. Magn. Reson. Spectrosc. 2006;48:47–62.
  • 48. Fan J.S., Cheng Z., Yang D. Solution and crystal structures of mRNA exporter Dbp5p and its interaction with nucleotides. J. Mol. Biol. 2009;388:1–10. doi: 10.1016/j.jmb.2009.03.004.
  • 49. Tay M.Y., Saw W.G., Vasudevan S.G. The C-terminal 50 amino acid residues of dengue NS3 protein are important for NS3-NS5 interaction and viral replication. J. Biol. Chem. 2015;290:2379–2394. doi: 10.1074/jbc.M114.607341.
  • 50. Konarev P.V., Petoukhov M.V., Svergun D.I. ATSAS 2.1, a program package for small-angle scattering data analysis. J. Appl. Cryst. 2006;39:277–286. doi: 10.1107/S0021889812007662.
  • 51. Tria G., Mertens H.D.T., Svergun D.I. Advanced ensemble modelling of flexible macromolecules using x-ray solution scattering. IUCrJ. 2015;2:207–217. doi: 10.1107/S205225251500202X.
  • 52. Haupt R.L. An introduction to genetic algorithms for electromagnetics. IEEE Ant. Prop. Mag. 1995;37:7–15.
  • 53. Simon B., Madl T., Sattler M. An efficient protocol for NMR-spectroscopy-based structure determination of protein complexes in solution. Angew. Chem. Int. Ed. Engl. 2010;49:1967–1970. doi: 10.1002/anie.200906147.
  • 54. Svergun D.I., Petoukhov M.V., Koch M.H. Determination of domain structure of proteins from x-ray solution scattering. Biophys. J. 2001;80:2946–2953. doi: 10.1016/S0006-3495(01)76260-1.
  • 55. Fawzi N.L., Fleissner M.R., Clore G.M. A rigid disulfide-linked nitroxide side chain simplifies the quantitative analysis of PRE data. J. Biomol. NMR. 2011;51:105–114. doi: 10.1007/s10858-011-9545-x.
  • 56. Iwahara J., Schwieters C.D., Clore G.M. Ensemble approach for NMR structure refinement against 1H paramagnetic relaxation enhancement data arising from a flexible paramagnetic group attached to a macromolecule. J. Am. Chem. Soc. 2004;126:5879–5896. doi: 10.1021/ja031580d.
  • 57. Maris C., Dominguez C., Allain F.H.T. The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J. 2005;272:2118–2131. doi: 10.1111/j.1742-4658.2005.04653.x.
  • 58. Rossi P., Swapna G.V., Montelione G.T. A microscale protein NMR sample screening pipeline. J. Biomol. NMR. 2010;46:11–22. doi: 10.1007/s10858-009-9386-z.
  • 59. Durand D., Vivès C., Fieschi F. NADPH oxidase activator p67(phox) behaves in solution as a multidomain protein with semi-flexible linkers. J. Struct. Biol. 2010;169:45–53. doi: 10.1016/j.jsb.2009.08.009.
  • 60. Huang J.R., Warner L.R., Blackledge M. Transient electrostatic interactions dominate the conformational equilibrium sampled by multidomain splicing factor U2AF65: a combined NMR and SAXS study. J. Am. Chem. Soc. 2014;136:7068–7076. doi: 10.1021/ja502030n.
  • 61. Zhang W.H., Otting G., Jackson C.J. Protein engineering with unnatural amino acids. Curr. Opin. Struct. Biol. 2013;23:581–587. doi: 10.1016/j.sbi.2013.06.009.

Associated Data

Supplementary Materials

Document S1. Figs. S1–S11 and Tables S1–S3
Document S2. Article plus Supporting Material

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society