Abstract
Computer-based structure prediction has become a major tool for providing large-scale structure models to annotate the biological function of proteins. Information on residue-level accuracy and thermal mobility (or B-factor), which is critical in deciding how biologists utilize the predicted models, is however missing in most structure prediction pipelines. We developed ResQ for unified residue-level model quality and B-factor estimations by combining local structure assembly variations with sequence- and structure-based profiling. ResQ was tested on 635 non-redundant proteins with structure models generated by I-TASSER, where the average difference between estimated and observed distance errors is 1.4 Å for the confidently modeled proteins. ResQ was further tested on structure decoys from the CASP9-11 experiments, where the error of local structure quality prediction is consistently lower than or comparable to that of other state-of-the-art predictors. Finally, the ResQ B-factor profile was used to assist molecular replacement, which resulted in successful solutions for several proteins that could not be solved with constant B-factor settings.
Keywords: Protein structure prediction, model quality evaluation, local error estimation, B-factor profile
Introduction
With the rapidly increasing gap between the number of protein sequences and the number of experimentally characterized structures, computer-based structure prediction has become a major means for molecular and cellular biologists to obtain 3D structure models of proteins for interpreting biological function or designing new biochemical experiments. Although steady progress in structure prediction has been witnessed in the community-wide CASP experiments [1], a reliable estimation of the quality of predicted structure models, in particular the residue-level local accuracy that is critical to structure-based functional analyses (such as active site recognition, ligand docking and drug screening), is often missing in most state-of-the-art structure prediction pipelines [2].
Another highly relevant but often-missing local feature in structure prediction is the inherent thermal mobility of protein atoms. At absolute zero temperature, protein atoms are assumed to stay at the equilibrium position of the lowest energy. As the temperature increases, the ambient thermal energy causes the atoms to oscillate around their equilibrium positions, to an extent that varies with the 3D structure and the interactions with ligand and solvent atoms. This atomic motion can be measured experimentally in X-ray crystallography as the B-factor (or temperature factor), which was introduced as a correction to the structure factor calculations because X-ray scattering is reduced on oscillating atoms compared to atoms at rest [3]. The B-factor of the jth atom can be formally written as $B_j = 8\pi^2 \langle r_j^2 \rangle$, where rj corresponds to the displacement of the jth atom from its equilibrium position. The thermal motion and mobility of protein atoms are closely linked to how the protein folds to the native state and how it interacts with specific binding partners in the cellular environment [4].
In this work, we aim to develop a unified pipeline, ResQ, for the estimation of the residue-specific quality (RSQ) of protein structure prediction and the inherent B-factor profile (BFP) of all residues along the chain. A major advantage of the pipeline over many other sequence- or structure-based quality prediction approaches [4, 5] is that ResQ takes into consideration the intermediate procedures of structure modeling simulations (e.g. structural variations of the simulation trajectories) and homology-based data-mining (e.g. multiple template alignments from threading and structural alignment searches); intermediate modeling features have also been used by several other structural error predictors [6, 7]. Since these procedures are common settings in most cutting-edge protein structure assembly approaches, the inclusion of the intermediate features should not compromise the general usefulness of the ResQ program. The method also contains single-model options to predict RSQ and BFP from the structure alone. An on-line server and the source code of the ResQ program are freely available at http://zhanglab.ccmb.med.umich.edu/ResQ/.
Results and discussion
Datasets
A total of 1270 non-redundant single-domain proteins were collected from the PDB, with a pair-wise sequence identity <30% and sizes from 50 to 300 residues. We randomly selected half of the proteins for training ResQ and used the remaining half for testing. Because the thermal motion factors in protein crystals can be affected by systematic errors such as experimental resolution, crystal contacts and refinement procedures, the raw B-factor values are usually not comparable between different experimental structures. We therefore calculate a normalized B-factor, which reduces the systematic variations and is defined as: bj = (Bj − μ)/σ, where Bj is the raw B-factor value of the alpha-carbon atom of the jth residue; μ and σ are respectively the mean and standard deviation along the target sequence. The set of normalized B-factor values along the target chain is called the B-factor profile (BFP).
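The normalization bj = (Bj − μ)/σ can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `normalize_bfactors` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which one is used.

```python
import numpy as np

def normalize_bfactors(raw_b):
    """Turn the raw C-alpha B-factors of one chain into a z-score profile."""
    b = np.asarray(raw_b, dtype=float)
    mu = b.mean()
    sigma = b.std()  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("constant B-factors cannot be normalized")
    return (b - mu) / sigma

# Toy chain of four residues; the values are made up for illustration.
profile = normalize_bfactors([10.0, 20.0, 30.0, 40.0])
```

By construction the resulting profile has zero mean and unit standard deviation along the chain, which is what makes profiles from different crystal structures comparable.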
RSQ and BFP predictions by ResQ on I-TASSER models
Overall RSQ results
I-TASSER was used to generate structure predictions for the 635 non-redundant testing proteins, of which 483 are categorized as Easy and 152 as Hard targets according to the significance score of the LOMETS alignments [8]. After excluding the homologous templates, I-TASSER generated first models with an average TM-score of 0.71 and an RMSD of 5.7 Å; these modeling results are largely consistent with the results of I-TASSER modeling in the recent CASP experiments [9, 10].
ResQ was applied to the first I-TASSER models to estimate the distance of each residue to the native. As shown in Table 1, the average distance estimated by ResQ (dp) is 3.4 Å, which is consistently lower than the observed distance (do) of the residues to the native (4.3 Å), with an average difference between dp and do of Δd=2.4 Å. Because Δd is averaged over per-residue differences in magnitude, it is larger than the gap between the two mean values. The consistent underestimation is mainly due to the lower distance estimates for residues with large modeling errors. If we rescale dp and do by the TM-score scale that was designed to depress the contribution of large distance errors [11] (see Eq. 2 in Methods), the consistent difference between dp and do disappears, both being 0.72 (see values in parentheses in Table 1).
Table 1.
Summary of the RSQ predictions on the 635 test proteins. Numbers in parentheses are the values computed after normalizing the distance dp and do using Eq. (2).
| Groups | Npro | TM | do (Å) | dp (Å) | Δd (Å) | PCC | AUC |
|---|---|---|---|---|---|---|---|
| C-score>−1.5 | 506 | 0.80 | 2.7 (0.79) | 2.2 (0.82) | 1.4 (0.12) | 0.69 | 0.89 |
| C-score<−1.5 | 129 | 0.40 | 10.8 (0.40) | 8.6 (0.32) | 6.4 (0.22) | 0.53 | 0.78 |
| Overall | 635 | 0.71 | 4.3 (0.72) | 3.4 (0.72) | 2.4 (0.14) | 0.66 | 0.87 |
Npro: Number of proteins in the set.
TM: Average TM-score of the first I-TASSER model.
do(dp): Observed (predicted) distance between the model and the native structure.
Δd: Average difference between do and dp.
PCC: Pearson correlation coefficient between predicted and observed distances.
AUC: Area under the curve of the receiver-operating characteristic (ROC).
We further split the test proteins into two groups based on the I-TASSER confidence score (C-score), i.e., the high- or low-confidence groups with a C-score above or below −1.5. As expected, models with a high C-score have a much better quality (TM-score=0.8) than those with a low C-score (TM-score=0.4). Accordingly, the RSQ prediction for the high C-score proteins is much more accurate (Δd=1.4 Å) than that of low C-score (Δd=6.4 Å). The average PCC and AUC values of the high C-score models are also higher than those of low C-score ones by 30% and 14%, respectively (Table 1).
The dependence of RSQ prediction on C-score is expected because, for the low-confidence targets, most of the I-TASSER models have an incorrect fold (TM-score<0.5) and the actual distance from the model to the native is high (~10.8 Å). The estimation of such high distances is statistically more difficult. In Figure 1A, we divide the residues into 20 bins according to their observed distance from the I-TASSER models to the native (do), and compute the average error of the predicted distances in each bin (Δd). The majority of residues (>75%) were well modeled by I-TASSER with a distance to the native <3 Å. ResQ predicted the distance for these residues accurately with a small Δd, i.e., 0.56 Å, 0.60 Å, and 0.96 Å for the distance bins [0, 1], (1, 2], and (2, 3], respectively. But as the modeling error increases, the error of the RSQ predictions increases almost linearly with do.
Fig. 1.
Residue-specific quality (RSQ) and B-factor profile (BFP) predictions by ResQ. (A) Histogram and RSQ distribution at different modeling errors (do) based on I-TASSER models from 506 test proteins with a C-score >−1.5. Open circles connected by the dashed curve show the percentage of residues. Filled circles connected by the solid curve show the average prediction error (Δd) in each distance bin. (B-E) An illustrative example from 1id0A showing the ResQ predictions. (B) Superposition of the X-ray structure (red) and the I-TASSER model (blue). (C) The ensemble of superimposed structure decoys in I-TASSER modeling. Alpha, beta, and loop regions are shown in red, yellow, and green, respectively. Structure regions with a high variation are highlighted by a dashed circle. (D) The predicted and observed distance errors. (E) The predicted and observed BFPs. The bottom panel shows the secondary structure of the protein.
RSQ predictions on different local structures
In Table S2 in the Supplementary Materials (SM), we present ResQ predictions for the 506 targets with C-score >−1.5, with the residues split into six disjoint subsets: (aligned or unaligned) × (alpha, beta or coil). A residue is defined as aligned if >40% of the LOMETS templates are aligned on it, and as unaligned otherwise; alpha, beta and coil follow the standard secondary structure definitions.
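The six-way split described above can be illustrated with a short Python sketch. The function name `classify_residues`, the per-residue coverage fractions, and the one-letter secondary structure codes (H for alpha, E for beta, anything else for coil) are assumptions made for this example; only the 40% coverage cutoff comes from the text.

```python
def classify_residues(coverage, sec_struct, cutoff=0.40):
    """Assign each residue to one of the six (alignment, secondary structure) groups.

    coverage   : fraction of templates aligned on each residue (0 to 1)
    sec_struct : one-letter states, 'H' = alpha, 'E' = beta, other = coil (assumed coding)
    """
    groups = []
    for frac, ss in zip(coverage, sec_struct):
        # A residue counts as aligned only when strictly more than 40% of templates cover it.
        alignment = "aligned" if frac > cutoff else "unaligned"
        state = {"H": "alpha", "E": "beta"}.get(ss, "coil")
        groups.append((alignment, state))
    return groups

# Toy input: three residues with made-up coverage values.
labels = classify_residues([0.9, 0.4, 0.1], "HEC")
```

Per-group statistics such as the average do and Δd can then be accumulated by grouping residues on these labels.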
Since I-TASSER modeling is built on multiple threading templates, there is a strong correlation between the structural error (do) and the alignment coverage of the residues. The residues in the threading-aligned regions have a much smaller distance deviation from the native (2.5 Å) than the unaligned residues (11.7 Å), because structures in the threading-aligned regions have a higher number of spatial restraints, which results in a higher modeling accuracy. Accordingly, the error of the estimated RSQ is lower in the threading-aligned regions than in the unaligned regions, as shown by the decreased Δd in Table S2 (Columns 2–7), mainly because the residues with smaller modeling errors tend to have a lower uncertainty and ResQ can therefore generate more accurate distance estimates for them (Figure 1A). Similarly, structural modeling in the regular secondary structure regions (alpha and beta) is generally more accurate than that in the coil residues, which is true in both the threading-aligned and unaligned regions. The estimated RSQ is therefore closer to the actual modeling errors in these regions.
In Figure 1B–D, we present an illustrative example from the PhoQ histidine kinase domain (PDB ID: 1id0A). In this example, the I-TASSER assembly simulations are more divergent in the loop region (D96-L111) as shown in Figure 1C; this leads to a higher predicted distance deviation by ResQ that is consistent with the actual high modeling error of this region shown in Figure 1B. The overall RSQ profile is in close agreement with the observed distance deviations as shown in Figure 1D.
B-factor profile prediction
Three approaches were tested to generate BFP predictions. The template-based prediction is generated by transferring the B-factors of the threading template proteins using Eq. S3, while the profile-based prediction is obtained by training on the sequence profile generated by a PSI-BLAST search. The third approach is a combined method that trains the BFP on a combination of threading templates and sequence profiles. A summary of the PCC and AUC between the observed and predicted B-factors from the three approaches is listed in Table S3.
The profile-based training approach generated a slightly higher PCC value (0.59) than the template-based transferal (0.54), while the combination of the threading templates and sequence profiles achieved the highest PCC (0.61). The difference between the profile-based and combined methods is statistically significant, with a p-value in the Student's t-test below $10^{-12}$. A similar tendency is seen in the AUC assessment, where the combined prediction outperforms both the template-based and profile-based prediction methods.
In Figure 1E, we show the BFP predicted for 1id0A by the combination-based approach. Interestingly, the highest B-factors occur mostly around the loop region D96-L111, which shows some level of correlation between the local modeling error and the B-factor profile.
Comparison of ResQ predictions with other methods on the I-TASSER models
To benchmark ResQ with other methods, we downloaded and installed two recently published RSQ and BFP prediction programs to our computers. The SMOQ program was designed to predict residue-specific local quality based on machine learning, which has three options of basic (B), basic+profile (B+P), basic+profile+SOV (B+P+S) [5]. The PROFbval program was developed for B-factor prediction trained on sequence profile and secondary structure predictions [4]. Both programs were run with default settings on the 635 testing proteins.
For the RSQ prediction, SMOQ was tested on the same set of I-TASSER models. As shown in Table S4, the average error of the estimated distance deviation (Δd) by ResQ is 2.4 Å, which is about 1.2 Å lower than the best result from the SMOQ program, obtained with the basic option (3.63 Å). In terms of the TM-score normalized distance, Δd of ResQ is also lower (0.14 vs. 0.26). The average PCC of the RSQ predictions by ResQ is 47% higher than the best SMOQ result. The difference between ResQ and SMOQ is statistically significant, with a p-value <$10^{-89}$ in the Student's t-test. One reason for the significant improvement in RSQ predictions might be that ResQ has been trained on I-TASSER-specific features, including structural variations of the templates and the simulation decoys (see Table S1), which are not included in the SMOQ program. In fact, we re-ran ResQ with the intermediate features from the I-TASSER simulations turned off. The result of ResQ then becomes much worse and only marginally outperforms the best result from SMOQ (see Tables S1 and S4). Although the degradation is partly due to the fact that ResQ has been trained on the full set of sequence and structure features, this result highlights the importance of the intermediate modeling features in model quality evaluation.
The last row in Table S4 lists the results of the B-factor prediction, where the PCC of ResQ (0.61) is 17% higher than that of PROFbval (0.52). Again, the difference is statistically significant, with a p-value in the Student's t-test below $10^{-70}$.
Comparison of RSQ prediction with other methods on CASP decoys
In this section, we test ResQ for local structure quality prediction using the decoys that were generated in the recent CASP experiments, which gives us an opportunity to compare ResQ with many state-of-the-art model quality assessment programs (MQAPs). Because the structural decoys were generated by different methods from multiple laboratories, this testing data also allows us to examine the robustness of the ResQ method, which was primarily trained on the I-TASSER decoys. Since the intermediate features from structure assembly simulations are not available for the CASP decoys, ResQ skips these features when generating RSQ predictions. Other features, including those from sequence- and structure-based database searches, are still used.
The detailed comparisons of ResQ with the top CASP predictors are listed in Table S5 (for CASP9 decoys), Tables S6–7 (CASP10), and Tables S8–9 (CASP11), with analysis and discussion presented in Text S4. In general, ResQ consistently outperforms most of the CASP MQAP predictors in local distance error prediction; its PCC and AUC scores are among the top, although it is often slightly outperformed by the best predictors (in particular for AUC). The overall data suggest that the prediction results of ResQ are comparable to or better than those of the state-of-the-art MQAP methods in CASP.
Application of ResQ B-factor prediction to molecular replacement
One of the important uses of B-factor prediction is to assist molecular replacement (MR) based structure determination in X-ray crystallography. In MR, close homologous structure models are used in place of the unknown target for deciding the phase of the diffraction waves, so that the electron density of the target protein can be calculated by Fourier transformation from the diffraction pattern data. Recently, a closely related study suggested that local error estimates improve MR dramatically [12]. Unlike the local error estimates, which evaluate the quality of target models, the thermal mobility of protein atoms can directly affect the intensity of the reflection waves. Therefore, appropriate B-factor estimates, which roughly reflect the atomic thermal mobility, should be important for correct structure factor calculation and MR solution.
To test the effect of ResQ B-factor prediction on MR, we collected 100 non-redundant proteins from the PDB that have both X-ray structure and electron density data available, with fewer than 300 residues and ≤4 copies in the asymmetric unit. The I-TASSER models built without using homologous templates were then used as the probes for molecular replacement. A progressive model truncation and editing procedure based on a structural deviation score [13] was applied to generate up to 40 truncated models for each target, which were submitted to Phenix [14] for automated phase determination and model reconstruction. Correct MR solutions could be obtained for 54 out of the 100 proteins if we use an optimal constant B-factor (=20). Here, an MR solution is defined as correct if more than 25% of the target sequence can be built by the automated MR procedure, with the final structure model being closer to the experimental structure than the starting model.
When we applied the B-factor predicted by ResQ to the MR program, MR solutions were successfully obtained for three additional proteins (PDB ID: 1i12, 2ra9 and 2tnf). Figure 2 shows a comparison of the final models for the three proteins overlaid on the experimental electron density maps, obtained using the ResQ B-factor prediction and the optimal constant B-factor, respectively. Even though the same I-TASSER model and the same model editing procedure were applied, the use of the ResQ B-factor prediction resulted in much closer fits of the final models to the electron density maps. The average Rfree value, which measures how well the simulated diffraction data match the experimentally observed diffraction pattern, was significantly reduced from 0.53 to 0.27.
Fig. 2.
Structure models generated by molecular replacement overlaid on the experimental electron density image. The left and right panels are MR results using the ResQ predicted and constant B-factors, respectively. (A) 2ra9; (B) 1i12; (C) 2tnf.
We note that the current strategy, i.e., using RSQ to truncate structural models and then using BFP to scale atomic motion, generated the best MR results with 57 successful targets in our test. Simply replacing the B-factor by RSQ does not improve the results, probably because the RSQ information was already used in the model truncation procedure. We also tested the procedure using RSQ as the B-factor on the full-length models without truncation, which gave slightly better MR results than using a constant B-factor; but the overall results are much worse than the optimal results with model truncation [13].
Conclusion
We developed a new algorithm, ResQ, for unified predictions of protein residue-specific quality (RSQ) and B-factor profile (BFP). One of the major advantages of the method is the integration of the intermediate information of structure modeling, including threading coverage and conformational variations, with the structure profile information from homologous database searches.
ResQ was first tested on a set of 635 non-homologous proteins with structural models generated by I-TASSER simulations. The residue-level distance to the native could be predicted with an average error of 2.4 Å. For the models with C-score >−1.5, which have a better global quality and are thus more relevant to biological uses, the distance error of RSQ is reduced to 1.4 Å. Detailed data analyses showed that the absolute error of RSQ is highly correlated with the quality of the structure models, where the RSQ has a much higher accuracy in the conserved regions of regular secondary structures than in the threading-unaligned and coil/tail regions. The overall results of ResQ on both RSQ and B-factor predictions showed an advantage over other methods that do not use the intermediate structural features.
Second, we tested ResQ on the CASP9-11 decoys generated by various structure prediction methods. Although no intermediate structure features are used, the single-model mode prediction of ResQ, built on the sequence- and structure-based homologous database searches, generated local structural quality estimates with an accuracy comparable (or superior) to the best performing MQAP methods that have been trained on various consensus features and/or statistical potentials. The data demonstrate the robustness of the ResQ algorithm in dealing with both small and large numbers of structure decoys. However, when the number of reference decoys decreased, ResQ's performance was slightly degraded, as seen in the difference in performance on the CASP Stage-1 and Stage-2 decoys. Part of the degradation is due to the reduced number of decoys in Stage-1; this suggests room for further improving ResQ, e.g., by exploring multiple statistical potentials [15], in particular for the low-quality decoys.
As a test of the usefulness of the ResQ predictions, the predicted B-factor profile was applied to 100 medium- to large-size proteins for assisting molecular replacement in X-ray crystallography. Starting from the same set of I-TASSER models, the use of the ResQ predicted B-factor resulted in successful MR solutions for three proteins (PDB ID: 1i12, 2ra9 and 2tnf) that could not be solved with the constant B-factor setting, whereby the average Rfree value was dramatically reduced from 0.53 to 0.27 in the three examples.
The ResQ algorithm has been successfully integrated with the I-TASSER pipeline, including on-line server and standalone package, for the RSQ and BFP estimations. Although ResQ was primarily trained with the I-TASSER decoys, the program can be used for models generated by other structure prediction methods since the intermediate features, including query-to-template alignment and structure decoy clustering, are standard output in the pipelines of most state-of-the-art structure prediction approaches. As an illustration, we list in Table S10 the application of ResQ to the models generated by QUARK-based ab initio simulation [16] on a set of 50 non-redundant proteins. Although the average RSQ error is higher than that for the I-TASSER models due to the relatively lower accuracy of the global models, for the top 20 proteins, most having a TM-score above 0.5, the average RSQ error is 1.6 Å for the QUARK models, comparable to the results for the confidently predicted I-TASSER models.
Methods
ResQ method
Support vector regressions (SVRs), with the implementation provided in the SVM-light package (http://svmlight.joachims.org/), are used to train the RSQ and BFP data, with parameters and combinations of features optimized on the 635 training proteins. The I-TASSER simulations [17] are conducted to generate structure predictions for the training proteins, where homologous templates have been excluded from the threading template library. A brief outline of the I-TASSER pipeline is described in Text S1 in Supplementary Materials.
For RSQ prediction, a total of 12 features are used to represent each residue: 3 from structural assembly simulations (i.e., average variation and standard deviation in Eq. S1 and relative size of SPICKER cluster); 1 from threading templates distance variation (Eq. S2); 1 from TM-align structure alignment templates distance variation; 2 from threading alignment coverage computed from the top 200 and 10 templates respectively; and 5 from the consistency between model and sequence-based feature predictions (i.e., three probabilities in alpha-helix, beta-strand and coil states, one for secondary structure state, and one for relative solvent accessibility). A detailed description of the RSQ features and their respective contributions to the predictions are provided in Text S2 and Table S1.
For the BFP prediction, by using the window size of 9, a total of 72 (=8×9) features are used to represent each residue. For each residue, the 8 features include 2 from template-based assignments based on LOMETS and TM-align respectively (see Eq. S3); 1 from alignment coverage with the top 200 LOMETS templates; 2 from relative solvent accessibility and secondary structure predictions; and 3 from secondary structure profile including probabilities in alpha-helix, beta-strand and random coil states (see Text S3).
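How 8 per-residue features and a window of 9 give 72 inputs per residue can be shown with a minimal sketch. The function name `window_features` is hypothetical, and zero padding at the chain termini is an assumption; the text does not state how windows that extend beyond the ends of the chain are handled.

```python
import numpy as np

def window_features(per_residue, window=9):
    """Stack the features of each residue and its neighbors into one vector.

    per_residue : array of shape (L, 8), one row of features per residue
    returns     : array of shape (L, 8 * window)
    """
    x = np.asarray(per_residue, dtype=float)
    half = window // 2
    # Pad both chain termini with zero rows (assumption) so every residue has a full window.
    padded = np.pad(x, ((half, half), (0, 0)), mode="constant")
    length = x.shape[0]
    return np.stack([padded[i:i + window].ravel() for i in range(length)])

# Toy chain of 20 residues with all features set to 1.
feats = window_features(np.ones((20, 8)))
```

Each row of `feats` is then one training or prediction example for the regression, centered on the corresponding residue.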
After trial and error on the training dataset, we found that the best results are generated using the linear SVRs in SVM-light with the parameters "-z r -c 0.5 -w 0.5", for both RSQ and BFP predictions.
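As a rough illustration of how such a training run could be prepared, the sketch below writes one example in the SVM-light input format (`<target> <index>:<value> ...`, with feature indices starting at 1) and assembles an `svm_learn` call with the quoted parameters. The helper names and the file names `train.dat` and `model.dat` are hypothetical; the actual ResQ training scripts are not described in the text.

```python
def to_svmlight_line(target, features):
    """Format one example as an SVM-light line: target followed by index:value pairs."""
    pairs = " ".join(f"{i}:{v:.6g}" for i, v in enumerate(features, start=1))
    return f"{target:.6g} {pairs}"

def svm_learn_command(train_file, model_file):
    """Argument list for svm_learn in regression mode with the quoted parameters.

    -z r selects regression, -c is the trade-off parameter, -w is the width of
    the epsilon tube; the linear kernel is the SVM-light default.
    """
    return ["svm_learn", "-z", "r", "-c", "0.5", "-w", "0.5", train_file, model_file]

line = to_svmlight_line(1.5, [0.25, 0.0, 3])
cmd = svm_learn_command("train.dat", "model.dat")  # hypothetical file names
```

The argument list can be passed to `subprocess.run` once SVM-light is installed and the training file has been written.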
Although ResQ has been designed to generate both RSQ and BFP predictions, we note that RSQ and BFP are two distinct concepts: RSQ is closely related to the model construction method, while the B-factor reflects inherent motions of atoms in the physiological environment that are independent of the modeling process. In fact, we collected the data of all 1270 training and testing proteins, and found that the Pearson's correlation coefficient between the B-factor and the local error of the I-TASSER models is only 0.38 (see data at http://zhanglab.ccmb.med.umich.edu/ResQ/benchmark/RB_cc.txt). This weak correlation is probably due to the fact that the sequence regions with a high B-factor are often less conserved, and thus models generated by template-based methods (such as I-TASSER) tend to have a higher local error in these regions. However, no deterministic relation exists between these two quantities. Nevertheless, a combined prediction of RSQ and BFP is feasible since both require training on multiple local sequence and structure features. Such a combination should help make ResQ convenient for multiple uses, for example, local accuracy estimation and molecular replacement.
Assessment criteria of the RSQ and BFP predictions
Three measures are used to evaluate the accuracy of the RSQ prediction, where each measure is first computed per model and then presented as an average over all models. The first measure is the average difference (Δd) between the predicted (dp) and observed (do) distances of the model relative to the native, i.e.
$$\Delta d = \frac{1}{L}\sum_{i=1}^{L}\left| d_{p,i} - d_{o,i} \right| \tag{1}$$
where L is the length of the protein and the distances are calculated based on the TM-score superposition [11]. One flaw of this metric is that Δd can be dominated by the large distance pairs, where distinguishing distance errors beyond some cutoffs can be meaningless (e.g. an error of 10 Å is not necessarily better than 20 Å). Thus, we also use a metric based on a normalized distance to depress the contribution of large distance errors in RSQ evaluation, i.e.,
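$$\Delta d = \frac{1}{L}\sum_{i=1}^{L}\left| \frac{1}{1+(d_{p,i}/d_0)^2} - \frac{1}{1+(d_{o,i}/d_0)^2} \right| \tag{2}$$
where $d_0 = 1.24\sqrt[3]{L-15} - 1.8$ is a TM-score-like scale to rule out protein length dependence of the distances [11].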
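The two Δd measures can be sketched as follows. The sketch assumes that Δd is the mean absolute per-residue difference between predicted and observed distances, and that the normalized variant maps each distance d to 1/[1+(d/d0)²] with the TM-score scale d0 = 1.24·(L−15)^(1/3) − 1.8 from [11]; the function names are hypothetical.

```python
import numpy as np

def tm_d0(length):
    """TM-score distance scale d0(L) in Å; intended for chains well above 15 residues."""
    return 1.24 * np.cbrt(length - 15) - 1.8

def delta_d(d_pred, d_obs):
    """Mean absolute difference between predicted and observed distances (Å)."""
    d_pred = np.asarray(d_pred, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)
    return float(np.mean(np.abs(d_pred - d_obs)))

def delta_d_normalized(d_pred, d_obs):
    """Same measure after mapping every distance d to 1 / (1 + (d / d0)^2)."""
    d_pred = np.asarray(d_pred, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)
    d0 = tm_d0(len(d_obs))
    s_pred = 1.0 / (1.0 + (d_pred / d0) ** 2)
    s_obs = 1.0 / (1.0 + (d_obs / d0) ** 2)
    return float(np.mean(np.abs(s_pred - s_obs)))

# Toy 50-residue chains: a 10 Å error on badly modeled residues versus a 1 Å error on good ones.
far = delta_d_normalized(np.full(50, 10.0), np.full(50, 20.0))
near = delta_d_normalized(np.full(50, 1.0), np.full(50, 2.0))
```

In this toy case the 1 Å miss on well-modeled residues weighs more than the 10 Å miss on badly modeled ones, which is the intended effect of depressing large distance errors.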
The second measure is the Pearson's correlation coefficient (PCC) between dp and do. The third measure is the area under the curve (AUC) of the receiver-operating characteristic (ROC), designed to evaluate the ability of ResQ to discriminate between well and badly modeled regions, where a residue is defined as 'well modeled' (positive) if its distance from the model to the native is <3.8 Å upon the TM-score superposition, and as 'badly modeled' (negative) otherwise. PCC and AUC are the same measures used by the CASP assessors for evaluating the accuracy of model quality estimation [18]. Following Kryshtafovych et al., we converted dp into the range of (0, 1) by $d_{pn} = 1/[1+(d_p/5)^2]$ in the AUC calculation, so that a fixed number of divisions can be used for different data samples to draw the ROC curves. The 3.8 Å cutoff is taken from the CASP assessors and may be too high for this study, since the AUC values are generally high for most predictions. We therefore tested the impact of the distance cutoff on the AUC calculations using the 635 test proteins. As expected, the AUC value decreases slightly with a reduced distance cutoff: the average AUC decreases from 0.88 to 0.87, 0.86, 0.84 and 0.79 when the distance cutoff is reduced from 5 to 4, 3, 2 and 1 Å, respectively.
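The PCC and AUC measures can be illustrated with a small NumPy sketch. Note that the AUC here is computed with a rank-based (Mann-Whitney) formula instead of the fixed-division ROC curve described above; this is a substitution made for brevity, and the function names and toy distances are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def to_unit_score(d_pred, scale=5.0):
    """Map predicted distances to a quality score via 1 / (1 + (d / 5)^2)."""
    d = np.asarray(d_pred, dtype=float)
    return 1.0 / (1.0 + (d / scale) ** 2)

def roc_auc(scores, positives):
    """Rank-based AUC: chance that a positive outscores a negative, ties counted as 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(positives, dtype=bool)
    pos, neg = s[y], s[~y]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative residue")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Toy model of five residues; distances in Å are made up for illustration.
d_obs = np.array([1.0, 2.0, 3.0, 6.0, 9.0])
d_pred = np.array([1.5, 1.0, 4.0, 5.0, 8.0])
pcc = pearson(d_pred, d_obs)
auc = roc_auc(to_unit_score(d_pred), d_obs < 3.8)  # 'well modeled' if do < 3.8 Å
```

As in the text, both quantities would be computed per model and then averaged over all models.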
The BFP prediction is evaluated by the Pearson's correlation between the predicted and the experimental B-factors, which was also used in previous B-factor prediction studies [4]. Similar to the RSQ evaluation, we also use AUC for measuring the ability in discriminating between stable and flexible residues in structures, where a residue is defined as stable (positive) if the normalized B-factor is below 0 or as flexible (negative) otherwise. For uniform ROC division, we have renormalized the predicted B-factor values (b) to the range of (0, 1) by 1/[1+exp(-b)].
Supplementary Material
Highlights
- Provides a unified prediction of local modeling errors and B-factor
- High-resolution local modeling error estimation with average deviation 1.4 Å
- High-resolution B-factor estimation with area under the curve 0.79
- B-factor data found useful to improve molecular replacement
- Highlights the importance of intermediate structural variation in local error estimation
Acknowledgements
We are grateful to Baoji He for providing the QUARK models. The project is supported in part by the NIGMS (GM083107 and GM084222), and the start-up grant of Nankai University (ZB15006103).
Abbreviations
- RSQ: residue-specific quality
- BFP: B-factor profile
- PCC: Pearson correlation coefficient
- AUC: area under the curve
- MQAP: model quality assessment program
References
- [1] Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Current Opinion in Structural Biology. 2005;15:285–9. doi: 10.1016/j.sbi.2005.05.011.
- [2] Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol. 2009;19:145–55. doi: 10.1016/j.sbi.2009.02.005.
- [3] Sherwood D, Cooper J. Crystals, X-rays and Proteins: Comprehensive Protein Crystallography. Oxford Univ Pr. 2011.
- [4] Schlessinger A, Yachdav G, Rost B. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics. 2006;22:891–3. doi: 10.1093/bioinformatics/btl032.
- [5] Cao R, Wang Z, Wang Y, Cheng J. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics. 2014;15:120. doi: 10.1186/1471-2105-15-120.
- [6] Wallner B. ProQM-resample: improved model quality assessment for membrane proteins by limited conformational sampling. Bioinformatics. 2014;30:2221–3. doi: 10.1093/bioinformatics/btu187.
- [7] McGuffin LJ, Buenavista MT, Roche DB. The ModFOLD4 server for the quality assessment of 3D protein models. Nucleic Acids Res. 2013;41:W368–72. doi: 10.1093/nar/gkt294.
- [8] Wu S, Zhang Y. LOMETS: A local meta-threading-server for protein structure prediction. Nucl Acids Res. 2007;35:3375–82. doi: 10.1093/nar/gkm251.
- [9] Zhang Y. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins. 2014;82(Suppl 2):175–87. doi: 10.1002/prot.24341.
- [10] Xu D, Zhang J, Roy A, Zhang Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins. 2011;79(Suppl 10):147–60. doi: 10.1002/prot.23111.
- [11] Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004;57:702–10. doi: 10.1002/prot.20264.
- [12] Bunkoczi G, Wallner B, Read RJ. Local error estimates dramatically improve the utility of homology models for solving crystal structures by molecular replacement. Structure. 2015;23:397–406. doi: 10.1016/j.str.2014.11.020.
- [13] Wang Y, Xue Z, Virtanen J, Zhang Y. Molecular replacement for distant-homology proteins using iterative fragment assembly and progressive sequence truncation. 2015. submitted.
- [14] Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta crystallographica Section D, Biological crystallography. 2010;66:213–21. doi: 10.1107/S0907444909052925.
- [15] Benkert P, Tosatto SC, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins. 2008;71:261–77. doi: 10.1002/prot.21715.
- [16] Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins. 2012;80:1715–35. doi: 10.1002/prot.24065.
- [17] Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. The I-TASSER Suite: protein structure and function prediction. Nature Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- [18] Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A. Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins. 2014;82(Suppl 2):112–26. doi: 10.1002/prot.24347.