Abstract
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence links amino acid substitutions in frataxin to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (frataxin challenge dataset) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔGH2O). The frataxin challenge dataset, composed of eight amino acid substitutions, was used to evaluate the performance of current computational methods in predicting the ΔΔGH2O value associated with the variants and in classifying them as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe and North America submitted 12 sets of predictions from different approaches. In this paper we report the results of our assessment and discuss the limitations of the tested algorithms.
Introduction
The human frataxin is a protein localized in the mitochondria and cytoplasm of the cells that promotes the heme biosynthesis, the assembly and repair of iron-sulfur clusters by delivering Fe2+ to proteins involved in these pathways. Frataxin may play a role in the protection against iron-catalyzed oxidative stress (Lupoli, Vannocci, Longo, Niccolai, & Pastore, 2018).
Frataxin (FXN) single-nucleotide variants have been associated with Friedreich Ataxia (MIM# 229300), a degenerative disorder primarily affecting the nervous system (Pandolfo, 2008). Moreover, frataxin might play a role in cancer, as previous studies have shown that it protects tumor cells against oxidative stress and apoptosis, but also acts as a tumor suppressor (Guccini et al., 2011; Schulz et al., 2006). The COSMIC (Catalogue Of Somatic Mutations In Cancer) database (Tate et al., 2019) collects a set of FXN somatic variations identified in cancer tissues. To investigate the possible thermodynamic effect of those variations on protein stability, a subset of eight variants was expressed as soluble recombinant protein in E. coli (Petrosino et al., 2019). For this dataset of amino acid substitutions, the stability of the variant proteins was experimentally measured with circular dichroism and fluorescence and compared with that of the wild-type. These measures have been used for the frataxin challenge of the fifth edition of the Critical Assessment of Genome Interpretation (CAGI5). For the frataxin challenge, participants were asked to predict the variation of unfolding free energy change at zero concentration of denaturant (ΔΔGH2O) upon single-point protein variation. During the last decades several methods have been developed to predict the impact of amino acid variants on protein stability (Compiani & Capriotti, 2013). These algorithms are mainly based on energy functions designed to assess the stability free energy of the protein and its variants and/or on machine-learning methods trained to predict the stability changes upon variation. In this manuscript we scored the performance of six research groups in predicting the measured ΔΔGH2O value (regression task) and its class (classification task) for eight frataxin single amino acid variants. 
The performances of all the groups are compared with those achieved by state-of-the-art methods (Capriotti, Fariselli, & Casadio, 2005; Guerois, Nielsen, & Serrano, 2002) to estimate the possible improvement with respect to previously developed algorithms. For the calibration of the predictions, previous experimental thermodynamic data on a different set of variants (Faraj, Gonzalez-Lebrero, Roman, & Santos, 2016) were used as a reference.
Material and methods
Dataset and classification
The CAGI5 frataxin challenge dataset consists of 8 coding variants of the FXN gene. These variants encode single amino acid substitutions reported in the COSMIC database. A representation of the variation sites in the three-dimensional structure of frataxin (PDB: 1EKG) is provided in Fig. 1.
Figure 1.
Mapping of the eight variation sites of the frataxin challenge data set on the three-dimensional structure of the protein (PDB: 1EKG)
For each protein variant, the unfolding free energy change (ΔGu) at different denaturant concentrations was experimentally determined with circular dichroism and fluorescence. These measures were used to calculate the unfolding free energy at zero concentration of denaturant (ΔGH2O). Finally, the change of ΔGH2O of the variant protein (ΔΔGH2O) was calculated using the following equation:
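$$\Delta\Delta G^{H_2O} = \Delta G^{H_2O}_{\mathrm{variant}} - \Delta G^{H_2O}_{\mathrm{wild\text{-}type}} \qquad [1]$$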
The average experimental values of the ΔΔGH2O obtained by circular dichroism and fluorescence (Supporting Table S1) were used for the challenge. An exception to this procedure is the case of the variant p.W173C which does not fold. In this case, we assumed its unfolding free energy equal to 0 kcal/mol and the ΔΔGH2O equal to the negative of the ΔGH2O of the wild-type protein.
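The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sign convention (variant minus wild-type) is the one implied by the p.W173C case, and the wild-type value of 9.5 kcal/mol is inferred from the −9.50 kcal/mol reported for that variant.

```python
# Wild-type unfolding free energy in water (kcal/mol). Assumption: inferred
# from the text, where -9.50 kcal/mol is the negative of the wild-type value.
DG_H2O_WILD_TYPE = 9.5


def ddg_h2o(dg_variant, dg_wild_type=DG_H2O_WILD_TYPE):
    """Equation 1: variant minus wild-type unfolding free energy in water."""
    return dg_variant - dg_wild_type


def mean_ddg(ddg_cd, ddg_fluorescence):
    """Challenge value: average of the circular dichroism and fluorescence estimates."""
    return (ddg_cd + ddg_fluorescence) / 2.0


# p.W173C does not fold, so its unfolding free energy is taken as 0 kcal/mol.
ddg_w173c = ddg_h2o(0.0)
print(ddg_w173c)  # -9.5
```

With this convention a negative ΔΔGH2O means that the variant is less stable than the wild-type protein.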
We also assessed the quality of the predictions by calculating the performance of the methods in classification mode. For this task we selected a threshold of −1.0 kcal/mol to discriminate between destabilizing (ΔΔGH2O <−1.0 kcal/mol) and not destabilizing variants (ΔΔGH2O ≥−1.0 kcal/mol). With this assumption, five variations in the dataset were classified as destabilizing and the remaining three as not destabilizing. A visual representation of the similarity between the ΔΔGH2O obtained by different experimental techniques (circular dichroism and fluorescence) and the classification of the variations is shown in Fig. S1.
The final set of 8 variations with the relative average ΔΔGH2O and their experimental errors are reported in Table 1.
Table 1.
Frataxin challenge dataset of amino acid substitutions. The mean variation of unfolding free energy change at zero denaturant concentration (ΔΔGH2O) is calculated as the mean of the ΔΔGH2O values from the fluorescence and circular dichroism experiments (see Supporting Table S1). The standard deviation (σ) is obtained by summing the errors associated with both types of measures. Variants with ΔΔGH2O < −1.0 kcal/mol are classified as destabilizing.
| DNA (hg38) | mRNA (NM_000144.4) | Protein (NP_000135.2) | ΔΔGH2O ± σ (kcal/mol) | Destabilizing |
|---|---|---|---|---|
| chr9:g.69053187A>G | c.311A>G | p.D104G | 0.4±0.4 | No |
| chr9:g.69053196C>T | c.320C>T | p.A107V | 0.0±0.6 | No |
| chr9:g.69053201T>C | c.325T>C | p.F109L | −2.8±0.4 | Yes |
| chr9:g.69053244A>C | c.368A>C | p.Y123S | −5.1±0.3 | Yes |
| chr9:g.69065035G>T | c.482G>T | p.S161I | −3.1±0.4 | Yes |
| chr9:g.69072648G>T | c.519G>T | p.W173C | −9.5±0.3* | Yes |
| chr9:g.69072671C>T | c.542C>T | p.S181F | −3.0±0.4 | Yes |
| chr9:g.69072734C>T | c.605C>T | p.S202F | −0.2±0.4 | No |
*The variant p.W173C does not fold into a three-dimensional structure. Thus, for calculating the ΔΔGH2O of p.W173C we assumed that its ΔGH2O = 0 kcal/mol. It follows that ΔΔGH2O is equal to −ΔGH2O of the wild-type, which is −9.50 kcal/mol.
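The class labels in Table 1 follow directly from the −1.0 kcal/mol threshold. A minimal sketch, using the mean values copied from the table and hypothetical variable names:

```python
THRESHOLD = -1.0  # kcal/mol, as stated in the text

# Mean experimental values copied from Table 1 (kcal/mol).
TABLE_1 = {
    "p.D104G": 0.4, "p.A107V": 0.0, "p.F109L": -2.8, "p.Y123S": -5.1,
    "p.S161I": -3.1, "p.W173C": -9.5, "p.S181F": -3.0, "p.S202F": -0.2,
}


def is_destabilizing(ddg, threshold=THRESHOLD):
    """A variant is destabilizing when its value is strictly below the threshold."""
    return ddg < threshold


labels = {variant: is_destabilizing(ddg) for variant, ddg in TABLE_1.items()}
destabilizing = sorted(v for v, flag in labels.items() if flag)
print(len(destabilizing), len(labels) - len(destabilizing))  # 5 3
```

The counts agree with the five destabilizing and three not destabilizing variants reported for the dataset.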
Experimental measures
Human frataxin (FXN) variants were obtained with specific mutagenesis primers with PCR, using wild-type as a template. Wild-type and variants were then expressed in E. coli and purified. The structural conformation of the variants was compared to that of the wild-type by monitoring the near and far-UV circular dichroism and intrinsic fluorescence spectra. The thermodynamic stability was measured at different concentrations of denaturant (Urea) by monitoring the spectral changes (far-UV circular dichroism and intrinsic fluorescence emission) induced by urea. The spectral changes were extrapolated to zero denaturant concentration (ΔΔGH2O). For equilibrium transition studies, FXN wild-type and variants were incubated at 20 °C at increasing concentrations of urea (0−9 M). After 10 min, equilibrium was reached and both intrinsic fluorescence emission and far-UV CD spectra were recorded in parallel at 20 °C. To test the reversibility of the unfolding, FXN wild type and variants were unfolded at 20 °C in 9.0 M urea. After 10 min, refolding was started by 10-fold dilution of the unfolding mixture at 20 °C into solutions of the same buffer used for unfolding containing decreasing urea concentrations. After 24 h, intrinsic fluorescence emission and far-UV CD spectra were recorded at 20 °C. All denaturation experiments were performed in triplicate. For thermal denaturation studies, FXN wild-type and variants were heated from 20°C to 95°C and then cooled from 95°C to 20°C. The dichroic activity at 222 nm was continuously monitored every 0.5°C. Melting temperature (Tm) values were calculated by taking the first derivative of the ellipticity at 222 nm with respect to temperature. All denaturation experiments were performed in triplicate. More details about the procedure for the calculation of the ΔΔGH2O and the analysis of the thermodynamic data are described in supplementary materials.
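As an illustration of the first-derivative approach to the melting temperature, the sketch below locates the temperature at which the slope of a signal is steepest. The curve is synthetic (a sigmoid with a hypothetical midpoint of 60 °C), the function name is an assumption, and real ellipticity data would probably need smoothing before differentiation; it is not the authors' analysis pipeline.

```python
import math


def melting_temperature(temps, signal):
    """Return the temperature where |d(signal)/dT| is largest (central differences)."""
    best_t, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Temperature grid from 20 to 95 degrees C in 0.5 degree steps, as in the text.
temps = [20.0 + 0.5 * i for i in range(151)]

# Synthetic two-state transition; midpoint and amplitudes are hypothetical.
HYPOTHETICAL_TM = 60.0
signal = [-20.0 + 15.0 / (1.0 + math.exp(-(t - HYPOTHETICAL_TM) / 2.0)) for t in temps]

tm = melting_temperature(temps, signal)
print(tm)
```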
Challenge participants and prediction methods
Six groups participated in the CAGI5 frataxin challenge by submitting a total of 12 sets of predictions using different procedures. The Lichtarge Lab at the Baylor College of Medicine, referred to as Group 1, submitted one set of predictions (G1–1) using the Evolutionary Action (EA) method (Katsonis & Lichtarge, 2014). The output of the program was normalized to return ΔΔGH2O values between 0 and −3 kcal/mol. The Biocomputing Group (Group 2) from the University of Bologna provided one batch of predictions (G2–1) using INPS-3D (Savojardo, Fariselli, Martelli, & Casadio, 2016). For this challenge the 1EKG structure from the Protein Data Bank was considered as wild-type. The Zhou Lab at Griffith University, labelled as Group 3, submitted three sets of predictions (G3–1, G3–2, G3–3) using the EASE-MM (Evolutionary, Amino acid, and Structural Encodings with Multiple Models) algorithm (Folkman, Stantic, Sattar, & Zhou, 2016). For the assessment we considered only one set of predictions (G3–1) because the three batches of predictions returned the same ΔΔGH2O values. The Shen Lab at Texas A&M University (Group 4) submitted two sets of predictions (G4–1, G4–2) using iCFN (interconnected Cost Function Network) (Karimi & Shen, 2018). This method was modified to fit the experimental ΔΔGH2O values for frataxin variants from a previous work (Correia, Pastore, Adinolfi, Pastore, & Gomes, 2008). The Pal Lab at the Indian Institute of Science in Bangalore, labelled as Group 5, submitted two batches of unscaled predictions (G5–1, G5–2) using GROMACS (Van Der Spoel et al., 2005). This approach uses molecular dynamics simulations to estimate the stability of unfolded and native conformations for the wild-type and variants. The Kim Lab at the University of Toronto (Group 6) submitted three batches of predictions (G6–1, G6–2, G6–3) using the ELASPIC algorithm (Berliner, Teyra, Colak, Garcia Lopez, & Kim, 2014; Witvliet et al., 2016). 
ELASPIC is a meta-predictor that combines predictions from other methods with sequence and structure-based features using a gradient boosting algorithm. During the assessment we observed that the predictions submitted by Group 6 showed a strong negative correlation with the experimental data. This is due to the difference between the challenge’s request to predict the variation of unfolding free energy change (ΔΔGu) and the predictions of folding free energy change (ΔΔGf) submitted by the Kim Lab. For this reason, we also scored the sign-reversed values of the three sets of Group 6 predictions (G6–R1, G6–R2, G6–R3).
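The effect of the folding/unfolding convention can be illustrated with a short sketch. It assumes that the two quantities differ only in sign (ΔΔGu = −ΔΔGf); the experimental values are those of Table 1, while the folding predictions and function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def folding_to_unfolding(ddg_folding):
    """Assumed conversion: change the sign of each predicted value."""
    return [-value for value in ddg_folding]


# Experimental unfolding values from Table 1 (kcal/mol).
experimental = [0.4, 0.0, -2.8, -5.1, -3.1, -9.5, -3.0, -0.2]
# Hypothetical predictions expressed as folding free energy changes.
hypothetical_ddg_f = [0.1, 0.5, 1.5, 3.0, 2.0, 4.0, 1.0, 0.3]

r_submitted = pearson(hypothetical_ddg_f, experimental)
r_reversed = pearson(folding_to_unfolding(hypothetical_ddg_f), experimental)
print(round(r_submitted, 2), round(r_reversed, 2))
```

Changing the sign flips the correlation coefficients without altering their magnitude, whereas error measures such as RMSE change in a way that depends on the individual values.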
Finally, to estimate the improvement of the performance between more recent algorithms and state-of-the-art methods, we included in our assessment the performance of FoldX (Guerois et al., 2002) and I-Mutant2.0 (Capriotti et al., 2005).
The methods and procedures used by each group to perform their predictions are described in more detail in the supplementary materials. A summary of all the submissions is reported in Supporting Table S2.
Prediction assessment
For the evaluation of the predictions we considered eight measures of performance for the regression and classification tasks, defined in supplementary materials (section Measures of performance). Comparing the predicted and experimental values of ΔΔGH2O of each protein variant, we calculated three types of correlations (Pearson, Spearman and Kendall-Tau) and two types of errors (Root Mean Square Error and Mean Absolute Error). Furthermore, we considered a threshold of −1.0 kcal/mol for classifying variants as destabilizing (ΔΔGH2O <−1.0 kcal/mol) or not destabilizing (ΔΔGH2O ≥−1.0 kcal/mol). Using this threshold for the binary classification task, we scored the predictions by calculating the balanced accuracy (BQ2), the Matthews correlation coefficient (MCC) and the Area Under the ROC Curve (AUC). Finally, we ranked all the submissions according to each of the eight measures of performance and calculated the average of the ranks, which is used to select the best predictions.
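The five regression measures can be sketched with the Python standard library as follows. The formulas are the textbook ones and are assumed, not confirmed, to match the definitions in the supplementary materials; in particular the Kendall coefficient is computed here in its tau-a form, and the predictions are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Concordant minus discordant pairs over all pairs (tied pairs count zero)."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


def rmse(pred, exp):
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def mae(pred, exp):
    return sum(abs(p - e) for p, e in zip(pred, exp)) / len(exp)


experimental = [0.4, 0.0, -2.8, -5.1, -3.1, -9.5, -3.0, -0.2]  # Table 1
hypothetical_pred = [-0.5, -0.3, -1.5, -2.0, -1.0, -3.5, -2.5, 0.1]
for name, fn in [("rP", pearson), ("rS", spearman), ("rKT", kendall_tau_a),
                 ("RMSE", rmse), ("MAE", mae)]:
    print(name, round(fn(hypothetical_pred, experimental), 2))
```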
In the second part of the assessment, we determined the significance of the differences between the performance of two methods with the Kolmogorov-Smirnov (KS) test. The KS test was used to compare the distribution of the ranks for each measure of performance.
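The comparison of two rank vectors can be illustrated with the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical cumulative distributions. The sketch below computes only the statistic; the p-value would normally come from a statistics library (for example `scipy.stats.ks_2samp`). The rank vectors are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    points = sorted(set(sample_a) | set(sample_b))
    largest_gap = 0.0
    for p in points:
        cdf_a = sum(v <= p for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= p for v in sample_b) / len(sample_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical rank vectors: one rank per measure, eight measures per method.
ranks_method_a = [1, 1, 2, 1, 2, 1, 1, 2]
ranks_method_b = [9, 10, 8, 9, 10, 9, 8, 10]
ranks_method_c = [2, 1, 3, 2, 1, 2, 1, 1]

print(ks_statistic(ranks_method_a, ranks_method_b))  # 1.0
print(ks_statistic(ranks_method_a, ranks_method_c))  # 0.125
```

A statistic close to 1 indicates rank distributions that do not overlap, while a value close to 0 indicates methods that rank similarly across the measures.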
Another important issue in the evaluation of the most reliable predictions is the presence of outliers in the experimental dataset. By outlier, we mean an experimental measure that, for different reasons, is considered to be less accurate or reliable than the others. In general, it is expected that most of the methods will fail in the prediction of the outliers. According to this assumption, in our assessment we also scored the performance of the algorithms after removing the outliers from the initial frataxin challenge dataset. In particular, for this calculation we removed from the dataset the variant p.W173C, for which the ΔGH2O was set to 0 kcal/mol because it did not fold properly.
The definitions of the eight measures of performance considered for this assessment are reported in supplementary materials.
Results
Assessment and performance evaluation
In our assessment we first evaluated the success of the participants in predicting the value of ΔΔGH2O. For this task, we calculated five performance measures, three of which score the correlations between experimental and predicted data (rP, rS and rKT) and two the prediction errors (RMSE and MAE). The performance in the regression task for the best predictions of each group is reported in Fig. S2. According to the calculated scores, Group 3 produced the best predictions, reaching the highest Pearson correlation coefficient (rP=0.84) and the lowest root mean square error (RMSE=2.94 kcal/mol). Our analysis also showed that Group 6 obtained negative values of the Pearson correlation coefficient close to −1 (rP=−0.89). Assuming that Group 6 predicted the ΔΔGH2O of folding instead of unfolding, we decided to include in our assessment the opposite of the predictions submitted by Group 6. The performances of the participants were compared with those achieved by state-of-the-art methods by including in our assessment the predictions returned by FoldX and I-Mutant2.0. Furthermore, we combined the regression measures with three classification scores (BQ2, MCC and AUC) obtained using a threshold of −1.0 kcal/mol for discriminating between destabilizing and not destabilizing variants. The assessment, including the eight scores of performance and sorted by the average of the rank orders of each method, is summarized in Table 2.
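The averaging of rank orders can be sketched as follows. The scores and method names are hypothetical, and the treatment of ties (tied methods share the best rank) is an assumption; the ranks in the article may have been computed on unrounded values or with a different tie rule, so this sketch is not expected to reproduce the published averages.

```python
def competition_ranks(scores, higher_is_better):
    """Rank 1 is best; tied scores share the best (lowest) rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def average_rank(table, higher_is_better):
    """Average, over all measures, of the rank each method obtains."""
    methods = list(next(iter(table.values())))
    totals = {method: 0.0 for method in methods}
    for measure, scores in table.items():
        ranks = competition_ranks(scores, higher_is_better[measure])
        for method in methods:
            totals[method] += ranks[method]
    return {method: totals[method] / len(table) for method in methods}


# Hypothetical scores for three methods on four measures.
hypothetical_scores = {
    "rP":   {"method_A": 0.80, "method_B": 0.85, "method_C": 0.40},
    "RMSE": {"method_A": 2.0, "method_B": 2.5, "method_C": 4.0},
    "MAE":  {"method_A": 1.5, "method_B": 2.0, "method_C": 3.5},
    "MCC":  {"method_A": 0.60, "method_B": 0.60, "method_C": 0.00},
}
# Correlations and classification scores are maximized, errors are minimized.
HIGHER_IS_BETTER = {"rP": True, "RMSE": False, "MAE": False, "MCC": True}

avg = average_rank(hypothetical_scores, HIGHER_IS_BETTER)
print(avg)  # {'method_A': 1.25, 'method_B': 1.5, 'method_C': 3.0}
```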
Table 2.
Assessment of the predictions of the 6 groups and the state-of-the-art methods (FoldX and I-Mutant2.0). The eight measures of performance are defined in supplementary materials. Zhou Lab submitted three sets of predictions with the same ΔΔGH2O values. For this reason we reported only the measures of performance for submission 1.
| Group | Submission | rP | rS | rKT | RMSE | MAE | BQ2 | MCC | AUC | <Rank> |
|---|---|---|---|---|---|---|---|---|---|---|
| Kim Lab | G6–R1* | 0.82 | 0.69 | 0.50 | 2.4 | 1.7 | 0.80 | 0.60 | 0.93 | 1.75 |
| FoldX | - | 0.84 | 0.64 | 0.57 | 2.2 | 1.7 | 0.73 | 0.47 | 0.87 | 2.00 |
| Zhou Lab | G3–1 | 0.85 | 0.64 | 0.64 | 3.0 | 2.3 | 0.70 | 0.45 | 0.80 | 2.88 |
| Kim Lab | G6–R2* | 0.71 | 0.57 | 0.43 | 2.7 | 2.0 | 0.63 | 0.26 | 0.80 | 4.13 |
| Biocomp | G2–1 | 0.74 | 0.52 | 0.36 | 3.2 | 2.3 | 0.80 | 0.60 | 0.80 | 4.25 |
| Lichtarge Lab | G1–1 | 0.46 | 0.60 | 0.50 | 3.1 | 2.2 | 0.63 | 0.26 | 0.87 | 4.38 |
| I-Mutant2.0 | - | 0.75 | 0.55 | 0.43 | 3.3 | 2.5 | 0.70 | 0.45 | 0.73 | 4.75 |
| Kim Lab | G6–R3* | 0.89 | 0.57 | 0.50 | 3.9 | 3.7 | 0.50 | 0.00 | 0.80 | 5.25 |
| Shen Lab | G4–2 | −0.02 | 0.12 | 0.07 | 4.1 | 2.6 | 0.70 | 0.45 | 0.60 | 7.00 |
| Shen Lab | G4–1 | −0.09 | 0.17 | 0.07 | 3.9 | 2.7 | 0.60 | 0.29 | 0.60 | 7.25 |
| Pal Lab | G5–1 | 0.57 | 0.43 | 0.29 | 41 | 36 | 0.63 | 0.26 | 0.67 | 7.88 |
| Kim Lab | G6–2 | −0.71 | −0.57 | −0.43 | 6.2 | 4.4 | 0.50 | 0.00 | 0.20 | 9.13 |
| Kim Lab | G6–1 | −0.89 | −0.57 | −0.50 | 10.9 | 9.5 | 0.50 | 0.00 | 0.20 | 10.00 |
| Kim Lab | G6–3 | −0.82 | −0.69 | −0.50 | 6.4 | 4.5 | 0.50 | 0.00 | 0.07 | 10.00 |
| Pal Lab | G5–2 | −0.42 | −0.64 | −0.50 | 1441 | 1378 | 0.50 | 0.00 | 0.27 | 10.13 |
*Submissions from the Kim Lab that were reversed. Confusion matrices for the binary classification are reported in Supporting Table S3.
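The classification scores can be illustrated with standard formulas, assumed here to match the definitions in the supplementary materials. With five destabilizing (positive) and three not destabilizing variants, a confusion matrix with 3 true positives, 2 false negatives, 3 true negatives and 0 false positives is one that is consistent with the BQ2 and MCC values of the first row of Table 2; the actual matrices are those of Supporting Table S3, which are not reproduced here. The predicted values used for the AUC are hypothetical.

```python
import math


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2.0


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0 by convention if the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(pred_destabilizing, pred_other):
    """Fraction of (destabilizing, other) pairs in which the destabilizing
    variant has the lower predicted value; ties count one half."""
    wins = 0.0
    for p in pred_destabilizing:
        for n in pred_other:
            wins += 1.0 if p < n else 0.5 if p == n else 0.0
    return wins / (len(pred_destabilizing) * len(pred_other))


# Assumed confusion matrix (positives are the destabilizing variants).
bq2 = balanced_accuracy(tp=3, fn=2, tn=3, fp=0)
coefficient = mcc(tp=3, fn=2, tn=3, fp=0)
print(round(bq2, 2), round(coefficient, 2))  # 0.8 0.6

# Hypothetical predictions with a single misordered pair out of 15.
area = auc([-3.0, -2.5, -2.0, -1.5, -0.4], [-0.5, -0.2, 0.1])
print(round(area, 2))  # 0.93
```

A submission that assigns every variant to the same class obtains BQ2 = 0.50 and, with the convention used above, MCC = 0.00.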
The results showed that the opposite predictions of submission 1 from Kim Lab (G6–R1) achieved the top average rank calculated over the eight measures of performance. It is worth noting that FoldX scored second in the ranking achieving on average lower performance on the binary classification task and better results in the prediction of the ΔΔGH2O value with respect to the Kim’s Lab R1 submission. Additional details about the comparison between Kim’s Lab R1 submission and the prediction of the state-of-the-art methods are shown in Fig. 2.
Figure 2.
Comparison between the performance achieved in the regression task by the top ranking submission from Kim Lab (G6-R1), FoldX, and I-Mutant2.0. rP, rS, rKT, RMSE, and MAE are defined in Supporting Information Materials. MAE, mean absolute error; RMSE, root mean square error
Dataset outlier
The analysis of all the submitted predictions revealed that on average all the groups failed in the prediction of the ΔΔGH2O for the variant p.W173C. Excluding Group 5, for this variant the difference between the average predicted and experimental ΔΔGH2O is ~6 kcal/mol (see Fig. 3). A possible explanation for the strong discrepancy between predicted and experimental ΔΔGH2O values is the partial indetermination of the ΔGH2O of unfolding of the p.W173C variant. Indeed, this protein variant did not fold into a three-dimensional structure. For this reason, we arbitrarily assigned to the p.W173C variant a ΔGH2O equal to 0 kcal/mol, which implies equal fractions of folded and unfolded protein at equilibrium. According to this observation, the protein variant p.W173C was considered an outlier and we performed a second assessment of the predictions after removing it from the frataxin challenge dataset. Sorting all the predictions according to the average ranking based on the eight measures of performance, we observed that the G6–R1 predictions from the Kim Lab and the FoldX predictions scored in the first and second position, respectively. The difference with respect to the previous assessment, which included all the frataxin variants, is the third position in the ranking achieved by the Biocomputing Group. As expected, the RMSE and MAE values decreased for all the submissions. Thus, after removing the variant p.W173C from the dataset, the average RMSE for the top four ranking submissions was ~1.7 kcal/mol, while it was ~2.6 kcal/mol for all the variants.
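The two average RMSE values quoted above can be checked against the tables, assuming that the "top four ranking submissions" are the four rows with the best average rank in Table 2 (all variants) and in Table 3 (p.W173C excluded):

```python
# RMSE values (kcal/mol) of the four top-ranked rows, copied from the tables.
rmse_all_variants = {"G6-R1": 2.4, "FoldX": 2.2, "G3-1": 3.0, "G6-R2": 2.7}   # Table 2
rmse_without_w173c = {"G6-R1": 1.5, "FoldX": 1.5, "G2-1": 1.9, "G6-R2": 1.8}  # Table 3


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_all = mean(rmse_all_variants.values())
mean_reduced = mean(rmse_without_w173c.values())
print(f"{mean_all:.1f} {mean_reduced:.1f}")  # 2.6 1.7
```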
Figure 3.
Linear regression between the average predicted and experimental ΔΔGH2O. The average predictions are calculated excluding the prediction from Group 5 and considering only one submission from Group 3. rP, rS, rKT, RMSE, and MAE are defined in Supporting Information Materials. MAE, mean absolute error; RMSE, root mean square error
Methods and predictions similarity
In the last part of our assessment we compared all the submissions to calculate the level of similarity among the predictions. For the comparison we assigned to each submission a ranking vector based on the eight measures of performance defined in supplementary materials. The statistical difference among such vectors was calculated with the Kolmogorov-Smirnov test. In Fig. 4 we summarized our analysis by assigning a blue color to the pairs of submissions that had significantly different ranking distributions (p-value<0.05). Conversely, red spots are assigned to the pairs of submissions that were statistically indistinguishable. The results showed that R1 (the reverse of submission 1 from the Kim Lab) is not statistically different from FoldX. This observation is consistent with the fact that the ELASPIC algorithm, used by the Kim Lab, includes the calculation of ΔΔG values with FoldX. Our analysis also revealed that, after the Kim Lab and FoldX predictions, the submissions from the Zhou Lab, Biocomputing Group and Lichtarge Lab were statistically indistinguishable. The performances of the methods from these three groups are comparable with those achieved by I-Mutant2.0. These observations, which are valid for the whole frataxin dataset (Fig. 4 panel A), are partially confirmed after removing the p.W173C variant. In this case the ranking of the predictions from the Kim Lab is statistically different from that of FoldX (Fig. 4 panel B), while the second group of submissions (Biocomputing Group, Zhou Lab and Lichtarge Lab) remains statistically indistinguishable.
Figure 4.
Similarity between the predictions based on the Kolmogorov–Smirnov test among the ranking vectors from the eight measures of performance. The color of each cell is proportional to the −log10 of the Kolmogorov–Smirnov p value. Similarities calculated considering the whole frataxin data set and excluding the variant p.W173C are plotted in panels (a) and (b), respectively
Discussion
The assessment of the frataxin challenge of the CAGI5 experiment provided an opportunity to evaluate the performance of the available variant annotation methods for predicting the impact of single amino acid variations on protein stability. In detail, we scored each submission by considering the performance of the corresponding method in predicting the ΔΔGH2O values (regression task) and in correctly classifying the variants as destabilizing or not destabilizing (classification task). The results showed that, in the regression task, the best methods achieved a Pearson correlation coefficient higher than 0.8 and a root mean square error (RMSE) lower than 2.4 kcal/mol (see Table 2). After removing from the dataset p.W173C, which represents an outlier with respect to all the other variants, the RMSE values of the best submissions decrease below 1.5 kcal/mol (see Table 3). For the classification task, we selected a ΔΔGH2O threshold of −1.0 kcal/mol for discriminating between destabilizing (ΔΔGH2O <−1.0 kcal/mol) and not destabilizing variants (ΔΔGH2O ≥−1.0 kcal/mol). Using this threshold, the best predictions (reversed submission 1 from the Kim Lab) achieved a balanced accuracy (BQ2), Matthews correlation coefficient (MCC) and area under the Receiver Operating Characteristic curve (AUC) of 0.80, 0.60 and 0.93, respectively (see Table 2). Slightly lower performance was obtained when the p.W173C variant was removed from the frataxin dataset. The evaluation of the similarities among the submissions showed that, although the reversed submission 1 (R1) from ELASPIC scores better than FoldX in the classification task, the difference between the ranking distributions of the two methods is not significant (Kolmogorov-Smirnov p-value=0.19). A significant difference between the ranking distributions of the Kim Lab G6–R1 and FoldX predictions is found when the p.W173C variant is removed from the dataset. 
In this case, the Kim Lab R1 submission ranks in the first position for seven of the eight measures of performance considered in our assessment. Comparing the ranking distributions of the second block of groups, we found that the predictions from the Zhou Lab, Biocomputing Group and Lichtarge Lab are statistically indistinguishable. Finally, the predictions from Group 5, which adopted a molecular dynamics-based approach, show the largest ΔΔGH2O values, resulting in the highest RMSE values. As suggested by the Group 5 submitters, their predictions could have been improved by normalizing the energies obtained from the simulations.
Table 3.
Assessment of the predictions submitted by the 6 groups and returned by state-of-the-art methods (FoldX and I-Mutant2.0) excluding the p.W173C variant. The eight measures of performance are defined in supplementary materials. Zhou Lab submitted three sets of predictions with the same ΔΔGH2O values. For this reason we reported only the measures of performance for submission 1.
| Group | Submission | rP | rS | rKT | RMSE | MAE | BQ2 | MCC | AUC | <Rank> |
|---|---|---|---|---|---|---|---|---|---|---|
| Kim Lab | G6–R1* | 0.75 | 0.57 | 0.43 | 1.5 | 1.2 | 0.75 | 0.55 | 0.92 | 1.13 |
| FoldX | - | 0.73 | 0.46 | 0.43 | 1.5 | 1.3 | 0.71 | 0.42 | 0.83 | 2.13 |
| Biocomp | G2–1 | 0.72 | 0.32 | 0.24 | 1.9 | 1.6 | 0.75 | 0.55 | 0.75 | 3.50 |
| Kim Lab | G6–R2* | 0.65 | 0.39 | 0.33 | 1.8 | 1.4 | 0.58 | 0.17 | 0.75 | 3.75 |
| Zhou Lab | G3–1 | 0.57 | 0.46 | 0.52 | 2.1 | 1.8 | 0.63 | 0.35 | 0.75 | 4.25 |
| Lichtarge Lab | G1–1 | 0.20 | 0.46 | 0.43 | 2.1 | 1.6 | 0.58 | 0.17 | 0.83 | 4.38 |
| Shen Lab | G4–1 | 0.50 | 0.50 | 0.33 | 2.2 | 1.8 | 0.63 | 0.35 | 0.67 | 4.88 |
| I-Mutant2.0 | - | 0.58 | 0.36 | 0.33 | 2.3 | 1.9 | 0.63 | 0.35 | 0.67 | 5.25 |
| Shen Lab | G4–2 | 0.22 | 0.07 | 0.05 | 2.5 | 1.6 | 0.75 | 0.55 | 0.58 | 5.63 |
| Kim Lab | G6–R3* | 0.66 | 0.36 | 0.33 | 4.1 | 3.8 | 0.50 | 0.00 | 0.75 | 6.00 |
| Pal Lab | G5–1 | 0.09 | 0.14 | 0.05 | 38 | 33 | 0.58 | 0.17 | 0.58 | 8.13 |
| Kim Lab | G6–2 | −0.65 | −0.39 | −0.33 | 4.5 | 3.1 | 0.50 | 0.00 | 0.25 | 8.50 |
| Kim Lab | G6–3 | −0.66 | −0.36 | −0.33 | 8.4 | 7.8 | 0.50 | 0.00 | 0.25 | 9.13 |
| Kim Lab | G6–1 | −0.75 | −0.57 | −0.43 | 4.5 | 3.2 | 0.50 | 0.00 | 0.08 | 9.50 |
| Pal Lab | G5–2 | −0.51 | −0.54 | −0.43 | 1472 | 1404 | 0.50 | 0.00 | 0.33 | 9.63 |
*Submissions from the Kim Lab that were reversed. Confusion matrices for the binary classification are reported in Supporting Table S4.
In conclusion, the assessment of the predictions submitted for the frataxin challenge confirmed that the methods for predicting the protein stability change upon variation achieved a good level of performance, especially in the classification task. For the prediction of the ΔΔGH2O values, the best methods achieved good performance in terms of correlation coefficient, but the error is still high (RMSE ~2.0 kcal/mol). Finally, we observed that all the algorithms failed to predict the ΔΔGH2O of the p.W173C variant, which has a high impact on protein stability. Our hypothesis is that the high error level is due to the low number of experimental data for highly destabilizing variants in the training sets. This hypothesis is consistent with the observation that machine learning-based methods such as INPS-3D and EASE-MM resulted in higher RMSE than FoldX, which implements an energy-function-based approach.
Although the selection of a single protein and the limited number of variants in the frataxin challenge dataset do not allow the results of our assessment to be generalized, it is noteworthy that the most accurate methods achieved good performance in terms of correlation coefficient and RMSE.
Supplementary Material
Acknowledgments
P.M.K. would like to acknowledge support from an NSERC Discovery Grant (#386671).
This project is in part supported by the National Institute of General Medical Sciences of the National Institutes of Health (R35GM124952 to Y.S.).
This work was supported in part by Australia Research Council DP180102060 and by National Health and Medical Research Council (1121629) of Australia to Y.Z.
We acknowledge John Moult for their support in the organization of the CAGI5 frataxin challenge.
The CAGI experiment coordination is supported by NIH U41 HG007346 and the CAGI conference by NIH R13 HG006650.
Footnotes
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding authors.
References
- Berliner N, Teyra J, Colak R, Garcia Lopez S, & Kim PM (2014). Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PLoS One, 9(9), e107353. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/25243403. doi: 10.1371/journal.pone.0107353
- Capriotti E, Fariselli P, & Casadio R (2005). I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res, 33(Web Server issue), W306–310. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/15980478. doi: 10.1093/nar/gki375
- Compiani M, & Capriotti E (2013). Computational and theoretical methods for protein folding. Biochemistry, 52(48), 8601–8624. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/24187909. doi: 10.1021/bi4001529
- Correia AR, Pastore C, Adinolfi S, Pastore A, & Gomes CM (2008). Dynamics, stability and iron-binding activity of frataxin clinical mutants. FEBS J, 275(14), 3680–3690. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/18537827. doi: 10.1111/j.1742-4658.2008.06512.x
- Faraj SE, Gonzalez-Lebrero RM, Roman EA, & Santos J (2016). Human Frataxin Folds Via an Intermediate State. Role of the C-Terminal Region. Sci Rep, 6, 20782. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/26856628. doi: 10.1038/srep20782
- Folkman L, Stantic B, Sattar A, & Zhou Y (2016). EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J Mol Biol, 428(6), 1394–1405. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/26804571. doi: 10.1016/j.jmb.2016.01.012
- Guccini I, Serio D, Condo I, Rufini A, Tomassini B, Mangiola A, … Malisan F (2011). Frataxin participates to the hypoxia-induced response in tumors. Cell Death Dis, 2, e123. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/21368894. doi: 10.1038/cddis.2011.5
- Guerois R, Nielsen JE, & Serrano L (2002). Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, 320(2), 369–387. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/12079393. doi: 10.1016/S0022-2836(02)00442-4
- Karimi M, & Shen Y (2018). iCFN: an efficient exact algorithm for multistate protein design. Bioinformatics, 34(17), i811–i820. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/30423073. doi: 10.1093/bioinformatics/bty564
- Katsonis P, & Lichtarge O (2014). A formal perturbation equation between genotype and phenotype determines the Evolutionary Action of protein-coding variations on fitness. Genome Res, 24(12), 2050–2058. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/25217195. doi: 10.1101/gr.176214.114
- Lupoli F, Vannocci T, Longo G, Niccolai N, & Pastore A (2018). The role of oxidative stress in Friedreich’s ataxia. FEBS Lett, 592(5), 718–727. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/29197070. doi: 10.1002/1873-3468.12928
- Pandolfo M (2008). Friedreich ataxia. Arch Neurol, 65(10), 1296–1303. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/18852343. doi: 10.1001/archneur.65.10.1296
- Petrosino M, Pasquo A, Novak L, Toto A, Gianni S, Mantuano E, … Consalvi V (2019). Characterization of human frataxin missense variants in cancer tissues. Hum Mutat. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/31074541. doi: 10.1002/humu.23789
- Savojardo C, Fariselli P, Martelli PL, & Casadio R (2016). INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics, 32(16), 2542–2544. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27153629. doi: 10.1093/bioinformatics/btw192
- Schulz TJ, Thierbach R, Voigt A, Drewes G, Mietzner B, Steinberg P, … Ristow M (2006). Induction of oxidative metabolism by mitochondrial frataxin inhibits cancer growth: Otto Warburg revisited. J Biol Chem, 281(2), 977–981. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/16263703. doi: 10.1074/jbc.M511064200
- Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, … Forbes SA (2019). COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res, 47(D1), D941–D947. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/30371878. doi: 10.1093/nar/gky1015
- Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, & Berendsen HJ (2005). GROMACS: fast, flexible, and free. J Comput Chem, 26(16), 1701–1718. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/16211538. doi: 10.1002/jcc.20291
- Witvliet DK, Strokach A, Giraldo-Forero AF, Teyra J, Colak R, & Kim PM (2016). ELASPIC web-server: proteome-wide structure-based prediction of mutation effects on protein stability and binding affinity. Bioinformatics, 32(10), 1589–1591. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/26801957. doi: 10.1093/bioinformatics/btw031