Scientific Reports. 2019 Feb 7;9:1588. doi: 10.1038/s41598-018-36992-y

PFDB: A standardized protein folding database with temperature correction

Balachandran Manavalan 1, Kunihiro Kuwajima 1,2,3, Jooyoung Lee 1
PMCID: PMC6367381  PMID: 30733462

Abstract

We constructed a standardized protein folding kinetics database (PFDB) in which the logarithmic rate constants of all listed proteins are calculated at the standard temperature (25 °C). A temperature correction based on the Eyring–Kramers equation was introduced for proteins whose folding kinetics were originally measured at temperatures other than 25 °C. We verified the temperature correction by comparing the logarithmic rate constants predicted and experimentally observed at 25 °C for 14 different proteins, and the results demonstrated an improvement in the quality of the database. PFDB consists of 141 (89 two-state and 52 non-two-state) single-domain globular proteins, the largest number of entries among the currently available databases of protein folding kinetics. PFDB is thus intended to be used as a standard for developing and testing future predictive and theoretical studies of protein folding. PFDB can be accessed from the following link: http://lee.kias.re.kr/~bala/PFDB.

Introduction

Protein folding is one of the most difficult problems in biophysics and molecular biology. Due to the accumulation of over half a century’s experimental data on reversible folding-unfolding mechanisms1,2, at least 16 protein folding kinetics datasets have been reported3–19. However, there are many problems in these datasets, including variations in temperatures (from 5 °C to 75 °C) used in kinetic folding experiments, redundant data entries, and inadequate reported data. A more complete dataset of protein folding kinetics with corrections for the above problems is thus required, and once we have such a dataset, it will be very useful for developing and testing future predictive and theoretical studies of protein folding.

Here, we thus carefully examined the existing protein folding datasets and introduced the necessary corrections. Among the available datasets, ACPro19 and the dataset by Garbuzynskiy et al.17 (hereinafter referred to as the Garbuzynskiy dataset) were the most recent and contained the largest and most up-to-date sets of entries. Therefore, we utilized these two datasets in the current study to construct a new database called PFDB. Furthermore, we added new protein data to PFDB from our own collection based on an extensive literature search, which resulted in 141 globular proteins in our dataset, the largest number among the currently available protein folding datasets.

In this study, we also developed a new temperature correction method for the proteins whose kinetic folding and unfolding experiments had been carried out at a temperature different from the standard temperature (25 °C). Our temperature correction method is based on the Eyring–Kramers equation20, and the logarithmic rate constants of folding and unfolding, ln(kf) and ln(ku), respectively, at 25 °C are provided for all proteins in PFDB. The present study is the first to introduce temperature corrections into a protein folding dataset, and we show that the introduction of the temperature correction has improved the quality of the database. PFDB is thus currently the most up-to-date database of protein folding kinetics, and hence it can be used as a standard for developing future predictive and theoretical studies of protein folding.

Results and Discussions

Database construction and descriptions

We first combined the two most recent datasets of protein folding, the ACPro and Garbuzynskiy datasets, to construct the combined dataset (hereafter called “the AG dataset”) in which redundant or inappropriate entries were filtered out. We excluded the proteins containing disulfide linkages or covalently bound prosthetic groups, because the presence of these linkages or groups can significantly affect the folding kinetics. Small polypeptides with fewer than 34 residues were also excluded. We carefully examined each entry in the AG dataset: if no updated folding kinetics data were available for a protein, we included the entry as is in PFDB; otherwise, we replaced it with the updated data. Furthermore, we added the data of 33 new proteins to PFDB from our own collection based on an extensive literature search, resulting in 141 globular proteins (89 two-state (2S) and 52 non-two-state (N2S) proteins) in our dataset (see Methods for details of the database construction).

Our dataset lists the following items: (i) the protein short name with a reference to the original experimental paper(s) on the folding kinetics, (ii) the PDB code, (iii) the structural class (α, β, α/β, and α + β), (iv) folds in the SCOP classification21 (http://scop.mrc-lmb.cam.ac.uk/scop/), (v) the number of residues in the PDB structure (LPDB), (vi) the actual number of residues of the protein used in the folding experiment (L), (vii) the experimental conditions (pH and temperature), (viii) the folding type (2S or N2S), (ix) the ln(kf) value reported, (x) the ln(kf) value after the temperature correction for the proteins whose folding experiments were carried out at a temperature other than 25 °C, (xi) the logarithmic rate constant of formation of a folding intermediate, ln(kI), when the value is available in the literature (only for N2S proteins), (xii) the ln(ku) value reported, (xiii) the ln(ku) value after the temperature correction, and (xiv) the Tanford β (βT) value, which is defined as βT = 1 − (mu/mNU), where mu (kJ/mol/M) and mNU (kJ/mol/M) are the denaturant concentration dependence of the activation free energy of unfolding and the denaturant concentration dependence of the unfolding free energy from the native (N) to the fully unfolded (U) state, respectively22. The ln(kf), ln(kI) and ln(ku) values listed in PFDB are those in the absence of denaturant, usually obtained by linear extrapolation of the logarithmic rate constant along denaturant concentration.
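As a small illustration of item (xiv), the Tanford β definition given above can be written as a one-line function. This is only a sketch: the function name and the numerical m values are hypothetical and are not taken from PFDB.

```python
def tanford_beta(m_u: float, m_nu: float) -> float:
    """Tanford beta, beta_T = 1 - (m_u / m_NU), with both m values in kJ/mol/M."""
    if m_nu == 0:
        raise ValueError("m_NU must be non-zero")
    return 1.0 - m_u / m_nu


# Hypothetical m values chosen only to show the arithmetic (not PFDB data).
example_beta_t = tanford_beta(m_u=1.5, m_nu=5.0)
print(f"beta_T = {example_beta_t:.2f}")
```

A value close to 1 would place the transition state near the native state in terms of the denaturant m values, and a value close to 0 would place it near the unfolded state.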

In PFDB, the folding type is thus clearly specified. The proteins that exhibited a stable folding intermediate during the kinetic folding process were classified as N2S proteins, while the proteins exhibiting single-exponential folding kinetics without stable intermediates were classified as 2S proteins, even if the existence of an unstable high-energy intermediate was expected from the unfolding-limb or the folding-limb curvature of the chevron plot23. To discriminate the 2S proteins with a high-energy intermediate from the other 2S proteins, the former proteins were denoted by 2S*. Each entry of the AG dataset is also included in PFDB for comparison. A comment section is provided in the final column of the dataset and explains discrepancies between the present and the AG datasets where necessary. Figure 1 depicts a snapshot of our dataset shown in the PFDB homepage.

Figure 1.

A snapshot of our dataset in the PFDB homepage. For each protein, our dataset lists (i) protein short name, (ii) PDB code, (iii) structural class (α, β, α/β, and α + β), (iv) folds in the SCOP classification, (v) the number of residues in the PDB structure (LPDB), (vi) the actual number of residues of the protein used in the folding experiment (L), (vii) experimental conditions (pH and temperature), (viii) folding type (2S or N2S), (ix) ln(kf) reported, (x) ln(kf) after temperature correction, (xi) ln(kI) (only for N2S proteins), (xii) ln(ku) reported, (xiii) ln(ku) after temperature correction, and (xiv) Tanford β (βT). The AG dataset is also included in our database for comparison. A comment section is provided in the final column.

The protein composition in PFDB in terms of the folding type and the structural class is given in Table 1. It shows that both the 2S and N2S proteins cover all four structural classes of globular proteins. However, the 2S proteins contain only one α/β protein.

Table 1.

The composition of the PFDB in terms of structural and folding class is shown.

| Folding type / Structural class | α | β | α + β | α/β | Total |
|---|---|---|---|---|---|
| 2S | 24 | 39 | 25 | 1 | 89 |
| N2S | 10 | 13 | 16 | 13 | 52 |
| Total | 34 | 52 | 41 | 14 | 141 |

Temperature correction

Figure 2A shows the distribution of the temperatures at which ln(kf) was determined experimentally for the proteins in our dataset. Among the 141 proteins in PFDB, 99 were measured at the standard temperature T0 (25 °C = 298.15 K), but the other 42 (24 2S and 18 N2S proteins) were measured at different temperatures (Tx). The Tx value ranged from 5 °C to 75 °C. To maintain the consistency of folding temperature in PFDB, we developed a method for temperature correction. The predicted shape of the Eyring plot of a particular protein is determined by two parameters of the folding or unfolding reaction, the activation heat capacity (ΔCp‡) and the temperature (TH) at which the activation enthalpy is zero (see Methods for more details). The predicted logarithmic rate constant at T0 (298.15 K) is thus given by the following equation:

$$\ln[k(T_0)] = \ln[k(T_x)] + \left[1 + \frac{\Delta C_p^{\ddagger}}{R}\right]\ln\left(\frac{T_0}{T_x}\right) + \frac{\Delta C_p^{\ddagger}}{R}\left[\left(\frac{1}{T_0} - \frac{1}{T_x}\right)T_H\right] \qquad (1)$$

where R is the gas constant, T0 and Tx are absolute temperatures, and ln[k(Tx)] is the logarithmic rate constant measured at Tx; the detailed derivation of Eq. (1) is given in Methods. We assumed that the activation heat capacity ΔCp‡ is proportional to the heat capacity change (ΔCp) of equilibrium protein unfolding. The ΔCp is approximately proportional to the protein chain length in the PDB structure (LPDB) and is empirically given by24:

$$\Delta C_p = 0.062\,L_{\mathrm{PDB}} - 0.53 \quad [\mathrm{kJ/mol/K}] \qquad (2)$$

Figure 2.

(A) The temperatures at which ln(kf) was experimentally determined for 2S and N2S proteins are shown. (B) Experimentally observed ln[kf(T0)] and predicted ones after temperature correction (red circles) are shown. Observed ln[kf(Tx)] values are also shown for comparison (blue crosses).

Now, it follows that:

$$\Delta C_p^{\ddagger} = \beta\,\Delta C_p = \beta\,(0.062\,L_{\mathrm{PDB}} - 0.53) \quad [\mathrm{kJ/mol/K}] \qquad (3)$$

where β is a proportionality constant. Therefore, once we have reasonable estimates of TH and β, we can evaluate ln[k(T0)] from ln[k(Tx)] and Tx by Eqs (1) and (3). It is worth mentioning that Eq. 2 is an empirical one, and theoretically, the ΔCp diminishes to zero when LPDB tends to zero. A regression equation between ΔCp and LPDB with the zero intercept has thus also been reported in the original literature as given by ΔCp = 0.058 LPDB24. Whether we used this equation or Eq. 2, the results of temperature correction were essentially identical for the proteins in our dataset, where LPDB ≥ 34.
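The temperature correction of Eqs (1) to (3) can be sketched in a few lines of Python. This is a minimal sketch rather than the code behind PFDB: the function names are hypothetical, and the example protein (LPDB = 80, ln k = 5.0 measured at 283.15 K) is invented for illustration. The TH and β values used in the example are the averaged 2S folding parameters listed in Table 2.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/mol/K
T0 = 298.15         # standard temperature in K (25 °C)


def equilibrium_delta_cp(l_pdb: int) -> float:
    """Eq. (2): equilibrium heat capacity change of unfolding in kJ/mol/K."""
    return 0.062 * l_pdb - 0.53


def correct_ln_k(ln_k_tx: float, t_x: float, l_pdb: int,
                 t_h: float, beta: float, t0: float = T0) -> float:
    """Eqs (1) and (3): predict ln k at t0 from ln k measured at t_x."""
    dcp_act = beta * equilibrium_delta_cp(l_pdb)  # Eq. (3), activation heat capacity
    a = dcp_act / R
    return (ln_k_tx
            + (1.0 + a) * math.log(t0 / t_x)
            + a * (1.0 / t0 - 1.0 / t_x) * t_h)


# Hypothetical 2S protein: L_PDB = 80, ln(kf) = 5.0 measured at 283.15 K.
# T_H = 314.70 K and beta = -0.62 are the averaged 2S folding values of Table 2.
ln_kf_25 = correct_ln_k(ln_k_tx=5.0, t_x=283.15, l_pdb=80, t_h=314.70, beta=-0.62)
print(f"ln kf at 25 °C: {ln_kf_25:.2f}")
```

When Tx equals T0, both correction terms vanish and the function returns the measured value unchanged, which is a convenient sanity check.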

Temperature correction for folding

We introduced the temperature corrections for the proteins whose kf values were measured at a temperature other than the standard temperature (298.15 K). First, we found that the Eyring plot or the equivalent plot of folding was well described for 14 2S proteins and 3 N2S proteins; the kf values were measured at intervals of a few kelvin from ~280 K to ~320 K for most of these proteins25–41. Both the TH and β values for folding kinetics, THf and βf, respectively, were similar among the different 2S proteins (Table 2) and also among the different N2S proteins (Table 3), except for two 2S proteins (1K9Q40 and 1PIN41), for which −ΔCpf, the magnitude of the activation heat capacity of folding, was larger than the equilibrium ΔCp. Therefore, we employed the 12 2S proteins other than these two and the 3 N2S proteins, and from their Eyring plots, we calculated the THf and ΔCpf. Examples of the Eyring plot for three proteins (1APS34, 1D6O35, and 1AVZ37) are shown in Figure S1. For folding kinetics, the Eyring plot is convex, and hence THf corresponds to the temperature of the maximum point in the Eyring plot. The ΔCpf is given by the curvature of the Eyring plot, and βf was thus evaluated as βf = ΔCpf/ΔCp, where ΔCp was obtained by Eq. (2); ΔCpf and βf are negative because the Eyring plot is convex. The THf and βf values thus obtained were averaged for the 12 2S proteins and for the 3 N2S proteins (Tables 2 and 3), giving 315 ± 1 K and −0.62 ± 0.03 for the 2S proteins, and 305 ± 4 K and −0.75 ± 0.07 for the N2S proteins (mean ± standard error).

Table 2.

List of proteins used to estimate THf and βf for two-state proteins.

| PDB | LPDB | Temp. (K) | ΔH (kJ/mol) | ΔCpf (kJ/mol/K) | THf (K) | ΔCp (kJ/mol/K) | βf |
|---|---|---|---|---|---|---|---|
| 1APS34 | 98 | 301.15 | 40.70 | −2.57 | 316.99 | 5.55 | −0.46 |
| 1D6O35 | 107 | 298.15 | 48.53 | −2.80 | 315.46 | 6.10 | −0.46 |
| 1E0G28 | 48 | 298.15 | 28.45 | −1.76 | 314.34 | 2.45 | −0.72 |
| 1HDN30 | 85 | 293.15 | 86.10 | −3.22 | 319.89 | 4.74 | −0.68 |
| 2VH729 | 94 | 301.15 | 23.60 | −2.48 | 310.67 | 5.30 | −0.47 |
| 3CI239 | 64 | 298.00 | 53.55 | −2.05 | 324.12 | 3.44 | −0.60 |
| 1EHB62 | 82 | 298.15 | 42.40 | −3.60 | 309.93 | 4.55 | −0.79 |
| 1CSP38 | 67 | 298.15 | 31.60 | −2.70 | 309.85 | 3.62 | −0.74 |
| 1AVZ37 | 57 | 293.00 | 43.09 | −1.86 | 316.20 | 3.00 | −0.62 |
| 1SHG36 | 57 | 298.00 | 37.00 | −2.30 | 314.09 | 3.00 | −0.77 |
| 1HCD31 | 118 | 293.15 | 57.74 | −4.39 | 306.29 | 6.79 | −0.65 |
| 2JMC25 | 77 | 298.15 | 45.00 | −2.20 | 318.60 | 4.24 | −0.52 |
| Mean ± SE | | | | | 314.70 ± 1.44 | | −0.62 ± 0.03 |

Table 3.

List of proteins used to estimate THf and βf for non-two-state proteins.

| PDB | LPDB | Temp. (K) | ΔH (kJ/mol) | ΔCpf (kJ/mol/K) | THf (K) | ΔCp (kJ/mol/K) | βf |
|---|---|---|---|---|---|---|---|
| 2CRO26 | 65 | 293.15 | 40.70 | −3.05 | 310.50 | 3.50 | −0.87 |
| 1PGB32 | 56 | 298.15 | 16.80 | −1.90 | 306.99 | 2.94 | −0.64 |
| 1L6333 | 162 | 285.15 | 92.05 | −6.84 | 298.61 | 9.51 | −0.72 |
| Mean ± SE | | | | | 305.369 ± 3.526 | | −0.746 ± 0.067 |
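The per-protein THf and βf values in Table 2 follow from the tabulated ΔH, ΔCpf and LPDB through the relation TH = Ta − ΔH(Ta)/ΔCp given in Methods and through βf = ΔCpf/ΔCp with ΔCp from Eq. (2). The sketch below recomputes them from the Table 2 inputs and averages them; the variable and function names are hypothetical, and the standard error is assumed to be the sample standard deviation divided by √n. Recomputed values can differ from the tabulated ones in the second decimal place because the tabulated inputs are rounded.

```python
import math
import statistics

# (PDB, L_PDB, T_a in K, dH in kJ/mol, dCp_f in kJ/mol/K), copied from Table 2.
TABLE2 = [
    ("1APS", 98, 301.15, 40.70, -2.57),
    ("1D6O", 107, 298.15, 48.53, -2.80),
    ("1E0G", 48, 298.15, 28.45, -1.76),
    ("1HDN", 85, 293.15, 86.10, -3.22),
    ("2VH7", 94, 301.15, 23.60, -2.48),
    ("3CI2", 64, 298.00, 53.55, -2.05),
    ("1EHB", 82, 298.15, 42.40, -3.60),
    ("1CSP", 67, 298.15, 31.60, -2.70),
    ("1AVZ", 57, 293.00, 43.09, -1.86),
    ("1SHG", 57, 298.00, 37.00, -2.30),
    ("1HCD", 118, 293.15, 57.74, -4.39),
    ("2JMC", 77, 298.15, 45.00, -2.20),
]


def t_h(t_a: float, d_h: float, d_cp_act: float) -> float:
    """Temperature of zero activation enthalpy: T_H = T_a - dH(T_a) / dCp."""
    return t_a - d_h / d_cp_act


def beta_f(d_cp_f: float, l_pdb: int) -> float:
    """beta_f = dCp_f / dCp, with the equilibrium dCp taken from Eq. (2)."""
    return d_cp_f / (0.062 * l_pdb - 0.53)


def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


th_values = [t_h(t_a, d_h, d_cp) for _, _, t_a, d_h, d_cp in TABLE2]
beta_values = [beta_f(d_cp, l_pdb) for _, l_pdb, _, _, d_cp in TABLE2]

th_mean, th_se = mean_and_se(th_values)
beta_mean, beta_se = mean_and_se(beta_values)
print(f"T_Hf   = {th_mean:.2f} ± {th_se:.2f} K")
print(f"beta_f = {beta_mean:.2f} ± {beta_se:.2f}")
```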

For the proteins whose THf and ΔCpf were not available directly, we employed Eqs (1) and (3) to predict ln[kf(T0)] by assigning the THf and βf values to TH and β in the equations. However, for the proteins whose THf and ΔCpf were available (1E0G28, 1HDN30, 2VH729, 1EHB27, 1HCD31, and 2CRO26), we directly calculated the ln[kf(T0)] values by Eq. (1). To distinguish ln[kf(T0)] predicted by using the averaged THf and βf from that directly calculated by Eq. (1) with the known THf and ΔCpf, the latter values are indicated in boldface type in our dataset. It should also be noted that the above THf and βf estimates were based on the folding data of proteins from mesophilic organisms, and hence some care may be required when they are applied to thermophilic proteins.

Next, we compared predicted ln[kf(T0)] after the temperature correction with the experimentally observed ln[kf(T0)]. For 9 2S and 5 N2S proteins (Table 4), which were not included in those used for estimating THf and βf, the experimental ln(kf) was available at both T0 and Tx. We thus applied the temperature correction to the ln[kf(Tx)] values using the above THf and βf, and compared predicted ln[kf(T0)] with the experimentally observed ln[kf(T0)]. From Fig. 2B, the predicted ln[kf(T0)] values show good agreement with the experimentally observed ones, showing the validity of our temperature correction. Although the number of data points used for this analysis is not very large (only 14 proteins), it may be enough to suggest that the temperature corrections have improved the quality of the database of protein folding.

Table 4.

List of proteins used for predicting ln(kf) at 25 °C.

| PDB | ln[kf(Tx)] | Tx (K) | ln[kf(T0)] observed | ln[kf(T0)] predicted |
|---|---|---|---|---|
| 1FNF63 | −2.66 | 278.15 | −0.92 | −0.12 |
| 1IMQ13,43 | 7.09 | 283.15 | 7.33 | 8.69 |
| 1K9Q40,44 | 8.92 | 311.15 | 8.37 | 8.67 |
| 1K9Q40,44 | 7.41 | 351.15 | 8.37 | 7.87 |
| 1RFA45 | 4.40 | 281.15 | 7.00 | 6.11 |
| 1SS146 | 12.41 | 323.15 | 12.08 | 12.07 |
| 1SS146 | 11.33 | 283.15 | 12.08 | 12.37 |
| 1U4Q47,48 | 9.48 | 283.15 | 11.00 | 11.56 |
| 2WXC49,50 | 11.17 | 283 | 11.73 | 12.00 |
| **1BNI** 51 | 2.07 | 318.15 | 2.50 | 2.31 |
| **1DWR*** 64,65 | 1.10 | 281.15 | 2.88 | 3.79 |
| **1NFI** 66 | 1.00 | 288.15 | 1.76 | 2.08 |
| **1NFI** 66 | 0.62 | 283.15 | 1.76 | 2.60 |
| **1EKG** 52 | 2.60 | 288.15 | 3.54 | 3.51 |

*T0 for 1DWR was 299.15 K (26 °C).

Normal font and bold, respectively, represent the 2S and N2S proteins.
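One way to quantify the agreement seen in Table 4 is to compare the root-mean-square deviation (RMSD) from the observed ln[kf(T0)] before and after correction. The sketch below does this directly from the tabulated values; the RMSD summary is an illustrative calculation added here and is not a statistic reported in the original analysis.

```python
import math

# (PDB, ln kf(Tx), ln kf(T0) observed, ln kf(T0) predicted), copied from Table 4.
TABLE4 = [
    ("1FNF", -2.66, -0.92, -0.12),
    ("1IMQ", 7.09, 7.33, 8.69),
    ("1K9Q", 8.92, 8.37, 8.67),
    ("1K9Q", 7.41, 8.37, 7.87),
    ("1RFA", 4.40, 7.00, 6.11),
    ("1SS1", 12.41, 12.08, 12.07),
    ("1SS1", 11.33, 12.08, 12.37),
    ("1U4Q", 9.48, 11.00, 11.56),
    ("2WXC", 11.17, 11.73, 12.00),
    ("1BNI", 2.07, 2.50, 2.31),
    ("1DWR", 1.10, 2.88, 3.79),
    ("1NFI", 1.00, 1.76, 2.08),
    ("1NFI", 0.62, 1.76, 2.60),
    ("1EKG", 2.60, 3.54, 3.51),
]


def rmsd(pairs) -> float:
    """Root-mean-square deviation between paired values."""
    pairs = list(pairs)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


# Deviation of the uncorrected ln kf(Tx) from the observed ln kf(T0).
rmsd_uncorrected = rmsd((row[1], row[2]) for row in TABLE4)
# Deviation of the temperature-corrected prediction from the observed ln kf(T0).
rmsd_corrected = rmsd((row[3], row[2]) for row in TABLE4)

print(f"RMSD without correction: {rmsd_uncorrected:.2f}")
print(f"RMSD with correction:    {rmsd_corrected:.2f}")
```

For the 14 rows of Table 4 this gives an RMSD of about 1.21 ln units without correction and about 0.64 with correction.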

Denaturant m values, the dependence of the free energy of unfolding on denaturant concentration, are well correlated with the ΔCp of unfolding42. Therefore, we can reasonably assume that βf is equivalent to −βT for 2S proteins. Therefore, for the 2S proteins for which the βT is available, we also calculated the ln[kf(T0)] values by assigning the THf and −βT values to TH and β in Eqs (1) and (3). The ln[kf(T0)] values thus obtained are also listed in PFDB and indicated in italic type to distinguish them from those (in roman type) predicted on the basis of THf and βf. As seen from the PFDB dataset, these two types of predicted ln[kf(T0)] are reasonably coincident with each other.

Temperature correction for unfolding

We introduced the temperature corrections for the proteins whose ku values were measured at a temperature other than the standard temperature (298.15 K), and the TH and β values for unfolding kinetics, THu and βu, respectively, were required for the temperature correction. For unfolding kinetics, the Eyring plot is usually concave with a positive βu. For 2S proteins, there is only a single transition state between U and N with a βf of −0.62 ± 0.03, and we can reasonably assume that βu = 1 + βf. Therefore, we find that βu = 0.38 ± 0.03. For N2S proteins, this simple relationship may not hold, because of a contribution from an intermediate (I) state. For the N2S proteins, however, (1 − βT) is expected to be equivalent to βu, because βT represents the relative position of the transition state between U and N in terms of the denaturant m values. The βT was reported for 38 N2S proteins in PFDB, and their average was estimated at 0.79 ± 0.02, and hence βu = 0.21 ± 0.02 for N2S proteins; 1FTG was excluded from this calculation because the I state was mostly off-pathway in this protein.

The THu corresponds to the temperature of the minimum point of the Eyring plot, but this is usually located far below the observable temperature range of unfolding kinetics, leading to a large error in the estimation of THu due to a long extrapolation along temperature. Furthermore, the Eyring plot of unfolding is not available for many of the proteins used above for the estimation of THf and βf. Therefore, we had to use a different way to estimate THu. We thus chose 6 2S proteins (1IMQ13,43, 1K9Q40,44, 1RFA45, 1SS146, 1U4Q47,48, and 2WXC49,50) and 3 N2S proteins (1BNI51, 1EKG52, and 1ENH53), for which the experimental ln(ku) is available at both T0 and Tx (Table 5). First, we assumed appropriate THu values (e.g., 200 K and 150 K) for 2S and N2S proteins, and assigned these THu values and the above βu values to TH and β in Eqs (1) and (3) to calculate tentative predictions of ln[ku(T0)] for 2S and N2S proteins. Then, the THu values were gradually increased or decreased until the root-mean-square deviation between the experimentally observed ln[ku(T0)] and the predicted ln[ku(T0)] values was minimized. The optimized THu values thus obtained were 224 K and 119 K for the 2S and N2S proteins, respectively. Figure 3 shows a comparison between the experimental ln[ku(T0)] values and those predicted by using the above THu and βu values, which indicates reasonable agreement between the experimental and predicted values.
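The THu optimization described above can be sketched as a one-dimensional search. The sketch below swaps the stepwise increase or decrease of THu for an exhaustive scan over integer candidate temperatures, which finds the same RMSD minimum on that grid. All names are hypothetical, and the input records are synthetic: they are generated from a known TH so that the search can be checked, and they are not the Table 5 data.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/mol/K
T0 = 298.15         # standard temperature in K


def predict_ln_k_t0(ln_k_tx: float, t_x: float, dcp_act: float, t_h: float) -> float:
    """Eq. (1) with a given activation heat capacity dcp_act (kJ/mol/K)."""
    a = dcp_act / R
    return ln_k_tx + (1.0 + a) * math.log(T0 / t_x) + a * (1.0 / T0 - 1.0 / t_x) * t_h


def fit_t_h(records, candidates):
    """Return the candidate T_H that minimizes the RMSD to the observed ln k(T0).

    Each record is (ln_k_tx, t_x, dcp_act, ln_k_t0_observed).
    """
    def rmsd(t_h):
        squared = [(predict_ln_k_t0(lk, tx, dcp, t_h) - obs) ** 2
                   for lk, tx, dcp, obs in records]
        return math.sqrt(sum(squared) / len(squared))
    return min(candidates, key=rmsd)


# Synthetic check: build "observed" values from a known T_H, then recover it.
BETA_U = 0.38   # beta_u for 2S proteins, from the text
TRUE_T_H = 224  # used only to generate the synthetic observations
measurements = [(-4.0, 283.15, 86), (7.0, 323.15, 60), (-3.0, 281.15, 100)]  # (ln ku, Tx, L_PDB), invented
records = []
for ln_ku, t_x, l_pdb in measurements:
    dcp_act = BETA_U * (0.062 * l_pdb - 0.53)  # Eq. (3)
    observed = predict_ln_k_t0(ln_ku, t_x, dcp_act, TRUE_T_H)
    records.append((ln_ku, t_x, dcp_act, observed))

best_t_h = fit_t_h(records, range(100, 351))
print(f"optimized T_Hu = {best_t_h} K")
```

Because Eq. (1) is linear in TH, the RMSD has a single minimum, so a simple scan or any one-dimensional minimizer should reach the same value.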

Table 5.

List of proteins used for predicting ln(ku) at 25 °C.

| PDB | ln[ku(Tx)] | Tx (K) | ln[ku(T0)] observed | ln[ku(T0)] predicted |
|---|---|---|---|---|
| 1IMQ13,43 | −4.42 | 283.15 | −1.87 | −1.79 |
| 1K9Q40,44 | 10.92 | 351.15 | 6.66 | 6.30 |
| 1K9Q40,44 | 7.38 | 311.15 | 6.66 | 6.33 |
| 1RFA45 | −3.10 | 281.15 | −1.17 | −0.45 |
| 1SS146 | 7.40 | 323.15 | 3.40 | 4.20 |
| 1SS146 | 0.92 | 283.15 | 3.40 | 2.61 |
| 1U4Q47,48 | −3.37 | 298.15 | 0.26 | 0.06 |
| 2WXC49,50 | 6.65 | 283 | 7.65 | 7.98 |
| **1BNI** 51 | −3.13 | 318.15 | −10.55 | −9.51 |
| **1EKG** 52 | −11.02 | 288.15 | −8.87 | −7.42 |
| **1ENH** 53 | 10.78 | 325.3 | 7.00 | 6.79 |

Normal font and bold, respectively, represent the 2S and N2S proteins.

Figure 3.

Experimentally observed ln[ku(T0)] and predicted ones after temperature correction (red circles) are shown. Observed ln[ku(Tx)] values are also shown for comparison (blue crosses).

For the proteins whose THu and ΔCpu were not available directly, we thus employed Eqs (1) and (3) to predict the ln[ku(T0)] by assigning the THu and βu values to TH and β in the equations. However, for the proteins whose THu and ΔCpu were available (1EHB27 and 1HCD31), we directly calculated the ln[ku(T0)] values by Eq. (1). To distinguish the ln[ku(T0)] predicted by using the optimized THu and βu and that directly calculated by Eq. (1) with the known THu and ΔCpu, the latter values are indicated in boldface type in our dataset.

For the 2S proteins for which the βT is available, we also calculated the ln[ku(T0)] values by assigning the THu and (1 − βT) values to TH and β in Eqs (1) and (3). The ln[ku(T0)] values thus obtained are also listed in PFDB and indicated in italic type to distinguish them from those (in roman type) predicted on the basis of THu and βu. As seen from the PFDB dataset, these two types of predicted ln[ku(T0)] are reasonably coincident with each other.

Availability of PFDB

As a user-friendly database, PFDB is freely available at http://lee.kias.re.kr/~bala/PFDB. The database main page contains the following options: HOME, N2S, 2S, DOWNLOAD DATASET, and CONTACT. Our dataset can be downloaded by clicking the “DOWNLOAD DATASET” button.

Conclusions

In this study, we have constructed PFDB, a systematically compiled standardized database of protein folding kinetics. It is currently the most updated one with the highest number of unique entries. The quality of the dataset has been improved significantly by our temperature correction method. Therefore, our dataset can be used as a standard for developing and testing future predictive and theoretical studies of protein folding kinetics.

Methods

Construction of the AG dataset

The most recent datasets of protein folding kinetics are ACPro19 and the Garbuzynskiy dataset17. Prior to the filtering processes described below, the ACPro dataset contained 126 proteins. Among these, we removed proteins with fewer than 34 residues (1PGB (41–56), 1L2Y and 3M48), proteins with disulfide bonds (2HQI, 1HEL, 1E65 and 1HMK), proteins with a covalently bound prosthetic group (1YCC, 1YEA, 256B and 1HRC), proteins with irrelevant rate constants, i.e., the rate constant for formation of an intermediate instead of the actual folding rate constant (kf) (1AON, 1BD8 and 1JON), and proteins whose kf was reported in the presence of denaturant (1QOP chain B). In the case of ileal lipid binding protein, the actual folding experiment was performed on the rat protein, but its PDB coordinates were not available at the time of our database creation. Instead, the reported PDB ID, 1EAL, is that of the pig protein, which has 71.1% sequence identity with the rat protein. Since the exact PDB coordinates were not available, we excluded this protein as well as another protein without experimental references (1PSF). Furthermore, 6 proteins had duplicate entries (1NTI–2FDQ, 1SRL–1FMK, 1BF4–1BNZ, 1POH–2HPR, 1O6X–1PBA and 1EAL–2EAL), which we corrected. These filtering processes reduced the size of the ACPro dataset from 126 to 102 proteins. We then applied the same filtering scheme to the Garbuzynskiy dataset (107 proteins), from which we removed proteins with fewer than 34 residues (1L2Y, 1T8J, 1PGB (41–56), and the 3rd entry in the Garbuzynskiy dataset), proteins with irrelevant rate constants (1AON and 1BD8), the protein 1EAL (for the reason given above), and a protein with a covalently bound prosthetic group (256B). This reduced the size of the Garbuzynskiy dataset from 107 to 99 proteins.

When we compared the updated Garbuzynskiy (99 proteins) and ACPro (102 proteins) datasets, 6 unique proteins (1IFC, 1CBI, 1IGS, 1OPA, 2MYO and 3H08) were identified in the Garbuzynskiy dataset. Therefore, we added these 6 proteins to the ACPro dataset, and collectively named it the AG dataset (108 proteins).

Data collection and construction of PFDB

We manually collected the data of protein folding and unfolding kinetics by an extensive literature search. Then we compared our collected data with those of the AG dataset. We carefully examined the data of each entry of the AG dataset; when no newer data existed, the data of that entry were included as is in PFDB, and otherwise they were replaced by the updated data. Finally, we added the data of 33 new proteins to PFDB from our own collection. Of these 33 proteins, 19 are 2S proteins (1DKT, 1FGA, 1IO2, 1KDX, 1NFI, 1QAU, 1RG8, 2BKF, 2GA5, 2J5A, 2JMC, 2LLH, 2L6R, 2WQG, 3O48, 3O49, 3O4D, 3ZRT (N-terminal), and 3ZRT (C-terminal)) with the remaining 14 being N2S proteins (1DWR, 1EKG, 1FA3, 1HRH, 1OKS, 1THF, 1UCH, 2BJD, 2FS6, 2KDI, 2KLL, 2X7Z, 3BLM, and 5L8I).

For 4 proteins (1RA9, 1B9C, 1FA3, and 2PQE), the presence of multiple parallel pathways of folding has been reported54–56, and the kf value was obtained by averaging the rate constant values along the individual pathways:

$$k_f = \sum_{i=1}^{n} f_i k_i \qquad (4)$$

where fi and ki are the fractional amplitude and the observed rate constant, respectively, for the ith pathway of folding, and the ln(kf) values thus obtained are listed in our dataset.
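Eq. (4) is an amplitude-weighted average of the pathway rate constants. A minimal sketch follows; the function name and the numerical amplitudes and rates are hypothetical, and the check that the fractional amplitudes sum to 1 is an assumption added here.

```python
import math


def average_kf(amplitudes, rates) -> float:
    """Eq. (4): k_f = sum_i f_i * k_i over the parallel folding pathways."""
    if len(amplitudes) != len(rates):
        raise ValueError("amplitudes and rates must have the same length")
    # Assumption: fractional amplitudes are normalized to 1.
    if not math.isclose(sum(amplitudes), 1.0, abs_tol=1e-6):
        raise ValueError("fractional amplitudes should sum to 1")
    return sum(f * k for f, k in zip(amplitudes, rates))


# Hypothetical two-pathway example: 70% of molecules fold at 10 s^-1, 30% at 2 s^-1.
k_f = average_kf([0.7, 0.3], [10.0, 2.0])
print(f"kf = {k_f:.2f} s^-1, ln(kf) = {math.log(k_f):.2f}")
```

Note that the average is taken over the rate constants themselves and the logarithm is applied afterwards, which is not the same as averaging the ln(k) values.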

The ln(kf), ln(kI) and ln(ku) values listed in PFDB are those in the absence of denaturant, usually obtained by linear extrapolation of the logarithmic rate constants along molar denaturant concentration. However, for 5 N2S proteins (1PHP (1–175)57, 1PHP (186–394)58, 1L6359, 1HNG60, and 1TTG61), the equilibria and kinetics of folding and unfolding were analyzed in terms of denaturant activity rather than the molar concentration. Whether we use the activity or the concentration in our calculation seriously affects the ln(ku) estimation, because a long extrapolation from high concentrations of denaturant back to the native condition is required. To keep consistency of our dataset, we used the linear extrapolation along the molar concentration, as far as such data were available, to estimate the ln(ku).
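The linear extrapolation to zero denaturant mentioned above amounts to a straight-line fit of ln(k) against molar denaturant concentration and reading off the intercept. The sketch below uses an ordinary least-squares fit; the function name and the data points are hypothetical (an exactly linear unfolding arm with an intercept of −9.0) and only illustrate how long the extrapolation to 0 M can be.

```python
def extrapolate_ln_k(concentrations, ln_k_values):
    """Least-squares line through (concentration, ln k); returns (intercept, slope).

    The intercept is the extrapolated ln k in the absence of denaturant.
    """
    n = len(concentrations)
    if n < 2 or n != len(ln_k_values):
        raise ValueError("need at least two paired points")
    mean_x = sum(concentrations) / n
    mean_y = sum(ln_k_values) / n
    s_xx = sum((x - mean_x) ** 2 for x in concentrations)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(concentrations, ln_k_values))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical unfolding arm measured between 4 M and 7 M denaturant.
conc = [4.0, 5.0, 6.0, 7.0]
ln_ku_obs = [-9.0 + 1.2 * c for c in conc]
ln_ku_water, slope = extrapolate_ln_k(conc, ln_ku_obs)
print(f"ln ku at 0 M = {ln_ku_water:.2f}, slope = {slope:.2f} per M")
```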

Derivation of Eq (1) for the temperature correction

In this study, we introduced a method for temperature correction, which gives the folding and unfolding rate constants at 25 °C (k(T0) where T0 = 298.15 K) for a protein whose rate constant at any temperature (Tx) is known. The following section will describe the derivation of Eq. (1).

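According to the Eyring–Kramers equation20, we find that:

$$\ln\left(\frac{k}{T}\right) = C - \frac{1}{RT}\left[\Delta H(T_H) - T\,\Delta S(T_H) + \Delta C_p^{\ddagger}\left\{T - T_H - T\ln\left(\frac{T}{T_H}\right)\right\}\right] \qquad (5)$$

where ΔH(TH) and ΔS(TH) are the activation enthalpy and the activation entropy, respectively, at a reference temperature TH, and ΔCp‡ is the activation heat capacity; we assume that ΔCp‡ is a constant independent of temperature (T). When we set TH to the temperature where ΔH is zero, i.e., the maximum or minimum point of the Eyring plot, Eq. (5) is rewritten as:

$$\ln\left(\frac{k}{T}\right) = C_2 - \frac{\Delta C_p^{\ddagger}}{RT}\left[T - T_H - T\ln\left(\frac{T}{T_H}\right)\right] \qquad (6)$$

where C2 is a temperature-independent constant (C2 = C + ΔS(TH)/R). When ΔCp‡ and the ΔH(Ta) at a particular temperature (Ta) are known, TH is simply given by TH = [Ta − ΔH(Ta)/ΔCp‡]. From Eq. (6), we can obtain the temperature dependence of ln(k/T), once we have TH and ΔCp‡. The difference in ln(k/T) between T0 (=298.15 K) and Tx is thus given by:

$$\ln\left[\frac{k(T_0)}{T_0}\right] - \ln\left[\frac{k(T_x)}{T_x}\right] = \frac{\Delta C_p^{\ddagger}}{R}\left[\frac{T_H}{T_0} - \frac{T_H}{T_x} + \ln\left(\frac{T_0}{T_x}\right)\right] \qquad (7)$$

Because the left-hand side equals ln[k(T0)] − ln[k(Tx)] − ln(T0/Tx), moving the ln(T0/Tx) term to the right-hand side gives Eq. (1).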
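The step from Eq. (6) to Eq. (1) can also be checked numerically: evaluating ln(k) from Eq. (6) at T0 and at Tx and taking the difference should reproduce the correction terms of Eq. (1). The sketch below performs this check; the function names and the parameter values (an activation heat capacity of −3.0 kJ/mol/K, TH = 315 K, Tx = 283.15 K) are arbitrary illustrative choices.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/mol/K


def ln_k_from_eq6(t: float, dcp_act: float, t_h: float, c2: float = 0.0) -> float:
    """ln k(T) obtained from Eq. (6): ln(k/T) plus ln(T)."""
    ln_k_over_t = c2 - dcp_act / (R * t) * (t - t_h - t * math.log(t / t_h))
    return ln_k_over_t + math.log(t)


def eq1_shift(t0: float, t_x: float, dcp_act: float, t_h: float) -> float:
    """ln k(T0) - ln k(Tx) according to Eq. (1)."""
    a = dcp_act / R
    return (1.0 + a) * math.log(t0 / t_x) + a * (1.0 / t0 - 1.0 / t_x) * t_h


# Arbitrary illustrative parameters (not taken from PFDB).
t0, t_x, dcp_act, t_h = 298.15, 283.15, -3.0, 315.0
difference_eq6 = ln_k_from_eq6(t0, dcp_act, t_h) - ln_k_from_eq6(t_x, dcp_act, t_h)
difference_eq1 = eq1_shift(t0, t_x, dcp_act, t_h)
print(f"Eq. (6) difference: {difference_eq6:.6f}")
print(f"Eq. (1) shift:      {difference_eq1:.6f}")
```

The constant C2 cancels in the difference, so its value does not affect the comparison.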

Acknowledgements

The work was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1E1A1A01077717), and by JSPS (Japan Society for the Promotion of Science) KAKENHI Grant Numbers JP25440075 and JP16K07314. The authors thank KIAS Center for Advanced Computation for providing computing resources for this work.

Author Contributions

B.M., K.K. and J.L. designed and performed research, analyzed the data and wrote the paper.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Kunihiro Kuwajima, Email: kuwajima@ims.ac.jp.

Jooyoung Lee, Email: jlee@kias.re.kr.

Electronic supplementary material

Supplementary information accompanies this paper at 10.1038/s41598-018-36992-y.

References

  • 1.Dill KA, MacCallum JL. The protein-folding problem, 50 years on. Science. 2012;338:1042–1046. doi: 10.1126/science.1219021. [DOI] [PubMed] [Google Scholar]
  • 2.Englander SW, Mayne L. The nature of protein folding pathways. Proc Natl Acad Sci USA. 2014;111:15873–15880. doi: 10.1073/pnas.1411798111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Bogatyreva NS, Osypov AA, Ivankov DN. KineticDB: a database of protein folding kinetics. Nucleic Acids Res. 2009;37:D342–346. doi: 10.1093/nar/gkn696. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.De Sancho D, Munoz V. Integrated prediction of protein folding and unfolding rates from only size and structural class. Phys Chem Chem Phys. 2011;13:17030–17043. doi: 10.1039/c1cp20402e. [DOI] [PubMed] [Google Scholar]
  • 5.Guo J, Rao N. Predicting protein folding rate from amino acid sequence. J Bioinform Comput Biol. 2011;9:1–13. doi: 10.1142/S0219720011005306. [DOI] [PubMed] [Google Scholar]
  • 6.Huang JT, Cheng JP, Chen H. Secondary structure length as a determinant of folding rate of proteins with two- and three-state kinetics. Proteins. 2007;67:12–17. doi: 10.1002/prot.21282. [DOI] [PubMed] [Google Scholar]
  • 7.Huang JT, Xing DJ, Huang W. Relationship between protein folding kinetics and amino acid properties. Amino acids. 2012;43:567–572. doi: 10.1007/s00726-011-1189-3. [DOI] [PubMed] [Google Scholar]
  • 8.Istomin AY, Jacobs DJ, Livesay DR. On the role of structural class of a protein with two-state folding kinetics in determining correlations between its size, topology, and folding rate. Protein Sci. 2007;16:2564–2569. doi: 10.1110/ps.073124507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Ivankov DN, Finkelstein AV. Prediction of protein folding rates from the amino acid sequence-predicted secondary structure. Proc Natl Acad Sci USA. 2004;101:8942–8944. doi: 10.1073/pnas.0402659101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Ivankov DN, et al. Contact order revisited: influence of protein size on the folding rate. Protein Sci. 2003;12:2057–2062. doi: 10.1110/ps.0302503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Jung J, Buglass AJ, Lee E-K. Topological quantities determining the folding/unfolding rate of two-state folding proteins. Journal of solution chemistry. 2010;39:943–958. doi: 10.1007/s10953-010-9556-3. [DOI] [Google Scholar]
  • 12.Jung J, Lee J, Moon HT. Topological determinants of protein unfolding rates. Proteins. 2005;58:389–395. doi: 10.1002/prot.20324. [DOI] [PubMed] [Google Scholar]
  • 13.Maxwell KL, et al. Protein folding: defining a “standard” set of experimental conditions and a preliminary kinetic data set of two-state proteins. Protein Sci. 2005;14:602–616. doi: 10.1110/ps.041205405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Ouyang Z, Liang J. Predicting protein folding rates from geometric contact and amino acid sequence. Protein Sci. 2008;17:1256–1263. doi: 10.1110/ps.034660.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhou H, Zhou Y. Folding rate prediction using total contact distance. Biophys J. 2002;82:458–463. doi: 10.1016/S0006-3495(02)75410-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Zou T, Ozkan SB. Local and non-local native topologies reveal the underlying folding landscape of proteins. Phys Biol. 2011;8:066011. doi: 10.1088/1478-3975/8/6/066011. [DOI] [PubMed] [Google Scholar]
  • 17.Garbuzynskiy SO, Ivankov DN, Bogatyreva NS, Finkelstein AV. Golden triangle for folding rates of globular proteins. Proc Natl Acad Sci USA. 2013;110:147–150. doi: 10.1073/pnas.1210180110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Gromiha MM, Thangakani AM, Selvaraj S. FOLD-RATE: prediction of protein folding rates from amino acid sequence. Nucleic Acids Res. 2006;34:W70–74. doi: 10.1093/nar/gkl043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Wagaman AS, Coburn A, Brand-Thomas I, Dash B, Jaswal SS. A comprehensive database of verified experimental data on protein folding kinetics. Protein Sci. 2014;23:1808–1812. doi: 10.1002/pro.2551. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bilsel O, Matthews CR. Barriers in protein folding reactions. Adv Protein Chem. 2000;53:153–207. doi: 10.1016/S0065-3233(00)53004-6. [DOI] [PubMed] [Google Scholar]
  • 21.Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008;36:D419–425. doi: 10.1093/nar/gkm993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Jackson SE. How do small single-domain proteins fold? Fold Des. 1998;3:R81–R91. doi: 10.1016/S1359-0278(98)00033-9. [DOI] [PubMed] [Google Scholar]
  • 23.Sanchez IE, Kiefhaber T. Evidence for sequential barriers and obligatory intermediates in apparent two-state protein folding. J Mol Biol. 2003;325:367–376. doi: 10.1016/S0022-2836(02)01230-5. [DOI] [PubMed] [Google Scholar]
  • 24.Robertson AD, Murphy KP. Protein Structure and the Energetics of Protein Stability. Chem Rev. 1997;97:1251–1268. doi: 10.1021/cr960383c. [DOI] [PubMed] [Google Scholar]
  • 25.Candel AM, Cobos ES, Conejero-Lara F, Martinez JC. Evaluation of folding co-operativity of a chimeric protein based on the molecular recognition between polyproline ligands and SH3 domains. Protein Eng Des Sel. 2009;22:597–606. doi: 10.1093/protein/gzp041. [DOI] [PubMed] [Google Scholar]
  • 26.Laurents DV, et al. Folding kinetics of phage 434 Cro protein. Biochemistry. 2000;39:13963–13973. doi: 10.1021/bi001388d. [DOI] [PubMed] [Google Scholar]
  • 27.Manyusa S, Whitford D. Defining folding and unfolding reactions of apocytochrome b5 using equilibrium and kinetic fluorescence measurements. Biochemistry. 1999;38:9533–9540. doi: 10.1021/bi990550d. [DOI] [PubMed] [Google Scholar]
  • 28.Nickson AA, Stoll KE, Clarke J. Folding of a LysM domain: entropy-enthalpy compensation in the transition state of an ideal two-state folder. J Mol Biol. 2008;380:557–569. doi: 10.1016/j.jmb.2008.05.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Taddei N, et al. Thermodynamics and kinetics of folding of common-type acylphosphatase: comparison to the highly homologous muscle isoenzyme. Biochemistry. 1999;38:2135–2142. doi: 10.1021/bi9822630. [DOI] [PubMed] [Google Scholar]
  • 30.Van Nuland NA, et al. Slow cooperative folding of a small globular protein HPr. Biochemistry. 1998;37:622–637. doi: 10.1021/bi9717946. [DOI] [PubMed] [Google Scholar]
  • 31.Wong HJ, Stathopulos PB, Bonner JM, Sawyer M, Meiering EM. Non-linear effects of temperature and urea on the thermodynamics and kinetics of folding and unfolding of hisactophilin. J Mol Biol. 2004;344:1089–1107. doi: 10.1016/j.jmb.2004.09.091. [DOI] [PubMed] [Google Scholar]
  • 32.Alexander P, Orban J, Bryan P. Kinetic analysis of folding and unfolding the 56 amino acid IgG-binding domain of streptococcal protein G. Biochemistry. 1992;31:7243–7248. doi: 10.1021/bi00147a006. [DOI] [PubMed] [Google Scholar]
  • 33.Chen BL, Baase WA, Schellman JA. Low-temperature unfolding of a mutant of phage T4 lysozyme. 2. Kinetic investigations. Biochemistry. 1989;28:691–699. doi: 10.1021/bi00428a042. [DOI] [PubMed] [Google Scholar]
  • 34.Chiti F, et al. Structural characterization of the transition state for folding of muscle acylphosphatase. J Mol Biol. 1998;283:893–903. doi: 10.1006/jmbi.1998.2010. [DOI] [PubMed] [Google Scholar]
  • 35.Main ER, Fulton KF, Jackson SE. Folding pathway of FKBP12 and characterisation of the transition state. J Mol Biol. 1999;291:429–444. doi: 10.1006/jmbi.1999.2941. [DOI] [PubMed] [Google Scholar]
  • 36.Martinez JC, Pisabarro MT, Serrano L. Obligatory steps in protein folding and the conformational diversity of the transition state. Nat Struct Biol. 1998;5:721–729. doi: 10.1038/1418. [DOI] [PubMed] [Google Scholar]
  • 37.Plaxco KW, et al. The folding kinetics and thermodynamics of the Fyn-SH3 domain. Biochemistry. 1998;37:2529–2537. doi: 10.1021/bi972075u. [DOI] [PubMed] [Google Scholar]
  • 38.Schindler T, Schmid FX. Thermodynamic properties of an extremely rapid protein folding reaction. Biochemistry. 1996;35:16833–16842. doi: 10.1021/bi962090j. [DOI] [PubMed] [Google Scholar]
  • 39.Tan YJ, Oliveberg M, Fersht AR. Titration properties and thermodynamics of the transition state for folding: comparison of two-state and multi-state folding pathways. J Mol Biol. 1996;264:377–389. doi: 10.1006/jmbi.1996.0647. [DOI] [PubMed] [Google Scholar]
  • 40.Crane JC, Koepf EK, Kelly JW, Gruebele M. Mapping the transition state of the WW domain beta-sheet. J Mol Biol. 2000;298:283–292. doi: 10.1006/jmbi.2000.3665. [DOI] [PubMed] [Google Scholar]
  • 41.Jager M, Nguyen H, Crane JC, Kelly JW, Gruebele M. The folding mechanism of a beta-sheet: the WW domain. J Mol Biol. 2001;311:373–393. doi: 10.1006/jmbi.2001.4873. [DOI] [PubMed] [Google Scholar]
  • 42.Myers JK, Pace CN, Scholtz JM. Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci. 1995;4:2138–2148. doi: 10.1002/pro.5560041020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Friel CT, Capaldi AP, Radford SE. Structural analysis of the rate-limiting transition states in the folding of Im7 and Im9: similarities and differences in the folding of homologous proteins. J Mol Biol. 2003;326:293–305. doi: 10.1016/S0022-2836(02)01249-4. [DOI] [PubMed] [Google Scholar]
  • 44.Ferguson N, Johnson CM, Macias M, Oschkinat H, Fersht A. Ultrafast folding of WW domains without structured aromatic clusters in the denatured state. Proc Natl Acad Sci USA. 2001;98:13002–13007. doi: 10.1073/pnas.221467198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Vallee-Belisle A, Turcotte JF, Michnick SW. raf RBD and ubiquitin proteins share similar folds, folding rates and mechanisms despite having unrelated amino acid sequences. Biochemistry. 2004;43:8447–8458. doi: 10.1021/bi0359426. [DOI] [PubMed] [Google Scholar]
  • 46.Dimitriadis G, et al. Microsecond folding dynamics of the F13W G29A mutant of the B domain of staphylococcal protein A by laser-induced temperature jump. Proc Natl Acad Sci USA. 2004;101:3809–3814. doi: 10.1073/pnas.0306433101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Scott KA, Batey S, Hooton KA, Clarke J. The folding of spectrin domains I: wild-type domains have the same stability but very different kinetic properties. J Mol Biol. 2004;344:195–205. doi: 10.1016/j.jmb.2004.09.037. [DOI] [PubMed] [Google Scholar]
  • 48.Wensley BG, Gartner M, Choo WX, Batey S, Clarke J. Different members of a simple three-helix bundle protein family have very different folding rate constants and fold by different mechanisms. J Mol Biol. 2009;390:1074–1085. doi: 10.1016/j.jmb.2009.05.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Neuweiler H, et al. Downhill versus barrier-limited folding of BBL 2: mechanistic insights from kinetics of folding monitored by independent tryptophan probes. J Mol Biol. 2009;387:975–985. doi: 10.1016/j.jmb.2008.12.056. [DOI] [PubMed] [Google Scholar]
  • 50.Neuweiler H, et al. The folding mechanism of BBL: Plasticity of transition-state structure observed within an ultrafast folding protein family. J Mol Biol. 2009;390:1060–1073. doi: 10.1016/j.jmb.2009.05.011. [DOI] [PubMed] [Google Scholar]
  • 51.Dalby PA, Clarke J, Johnson CM, Fersht AR. Folding intermediates of wild-type and mutants of barnase. II. Correlation of changes in equilibrium amide exchange kinetics with the population of the folding intermediate. J Mol Biol. 1998;276:647–656. doi: 10.1006/jmbi.1997.1547. [DOI] [PubMed] [Google Scholar]
  • 52.Faraj SE, Gonzalez-Lebrero RM, Roman EA, Santos J. Human Frataxin Folds Via an Intermediate State. Role of the C-Terminal Region. Sci Rep. 2016;6:20782. doi: 10.1038/srep20782. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Mayor U, Johnson CM, Daggett V, Fersht AR. Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc Natl Acad Sci USA. 2000;97:13518–13522. doi: 10.1073/pnas.250473497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Enoki S, Saeki K, Maki K, Kuwajima K. Acid denaturation and refolding of green fluorescent protein. Biochemistry. 2004;43:14238–14248. doi: 10.1021/bi048733+. [DOI] [PubMed]
  • 55.Kamagata K, Sawano Y, Tanokura M, Kuwajima K. Multiple parallel-pathway folding of proline-free Staphylococcal nuclease. J Mol Biol. 2003;332:1143–1153. doi: 10.1016/j.jmb.2003.07.002. [DOI] [PubMed] [Google Scholar]
  • 56.Patra AK, Udgaonkar JB. Characterization of the folding and unfolding reactions of single-chain monellin: evidence for multiple intermediates and competing pathways. Biochemistry. 2007;46:11727–11743. doi: 10.1021/bi701142a. [DOI] [PubMed] [Google Scholar]
  • 57.Parker MJ, Spencer J, Clarke AR. An integrated kinetic analysis of intermediates and transition states in protein folding reactions. J Mol Biol. 1995;253:771–786. doi: 10.1006/jmbi.1995.0590. [DOI] [PubMed] [Google Scholar]
  • 58.Parker MJ, Marqusee S. The cooperativity of burst phase reactions explored. J Mol Biol. 1999;293:1195–1210. doi: 10.1006/jmbi.1999.3204. [DOI] [PubMed] [Google Scholar]
  • 59.Parker MJ, et al. Domain behavior during the folding of a thermostable phosphoglycerate kinase. Biochemistry. 1996;35:15740–15752. doi: 10.1021/bi961330s. [DOI] [PubMed] [Google Scholar]
  • 60.Parker MJ, Dempsey CE, Lorch M, Clarke AR. Acquisition of native beta-strand topology during the rapid collapse phase of protein folding. Biochemistry. 1997;36:13396–13405. doi: 10.1021/bi971294c. [DOI] [PubMed] [Google Scholar]
  • 61.Cota E, Clarke J. Folding of beta-sandwich proteins: three-state transition of a fibronectin type III module. Protein Sci. 2000;9:112–120. doi: 10.1110/ps.9.1.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Manyusa S, Whitford D. Defining folding and unfolding reactions of apocytochrome b5 using equilibrium and kinetic fluorescence measurements. Biochemistry. 1999;38:9533–9540. doi: 10.1021/bi990550d. [DOI] [PubMed] [Google Scholar]
  • 63.Plaxco KW, Spitzfaden C, Campbell ID, Dobson CM. A comparison of the folding kinetics and thermodynamics of two homologous fibronectin type III modules. J Mol Biol. 1997;270:763–770. doi: 10.1006/jmbi.1997.1148. [DOI] [PubMed] [Google Scholar]
  • 64.Mizukami T, Abe Y, Maki K. Evidence for a Shared Mechanism in the Formation of Urea-Induced Kinetic and Equilibrium Intermediates of Horse Apomyoglobin from Ultrarapid Mixing Experiments. PLoS One. 2015;10:e0134238. doi: 10.1371/journal.pone.0134238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Uzawa T, et al. Collapse and search dynamics of apomyoglobin folding revealed by submillisecond observations of alpha-helical content and compactness. Proc Natl Acad Sci USA. 2004;101:1171–1176. doi: 10.1073/pnas.0305376101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.DeVries I, Ferreiro DU, Sanchez IE, Komives EA. Folding kinetics of the cooperatively folded subdomain of the IkappaBalpha ankyrin repeat domain. J Mol Biol. 2011;408:163–176. doi: 10.1016/j.jmb.2011.02.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
