Author manuscript; available in PMC: 2015 Apr 14.
Published in final edited form as: Proteins. 2013 Oct 18;82(Suppl 2):7–13. doi: 10.1002/prot.24399

CASP PREDICTION CENTER INFRASTRUCTURE AND EVALUATION MEASURES IN CASP10 AND CASP ROLL

Andriy Kryshtafovych 1, Bohdan Monastyrskyy 1, Krzysztof Fidelis 1,*
PMCID: PMC4396618  NIHMSID: NIHMS678590  PMID: 24038551

Abstract

The Protein Structure Prediction Center at the University of California, Davis, supports the CASP experiments by identifying prediction targets, accepting predictions, performing standard evaluations, assisting independent CASP assessors, presenting and archiving results, and facilitating information exchange relating to CASP and structure prediction in general. We provide an overview of the CASP infrastructure implemented at the Center and summarize the standard measures used for evaluating predictions in the latest round of CASP. Several components were introduced or significantly redesigned for CASP10, in particular an improved common web workspace for assessors; the SphereGrinder visualization tool for analyzing the local accuracy of predictions; new evaluation blocks for contact prediction and contact-assisted structure prediction; and expanded evaluation and visualization tools for tertiary structure, refinement, and quality assessment. Technical aspects of conducting the CASP10 and CASP ROLL experiments and relevant statistics are also provided.

Keywords: CASP, protein structure prediction

Introduction

We present an update on the operations of the CASP Prediction Center in the form of a short note. The basic CASP infrastructure and the majority of standard prediction evaluation methods were discussed in previous papers1–4. Here we provide a brief overview of the technical aspects of conducting CASP experiments, paying closer attention to the newly introduced evaluation measures and web resources.

Prediction targets

Regular CASP10 targets (T0xxx)

As in previous CASPs, collection, preparation, and release of the targets were performed by the Prediction Center. During the course of CASP10, we received 130 sequences, which were pre-screened and, if necessary, verified with the centers or research groups that submitted them. One hundred and fourteen sequences were selected as targets (T0644 through T0758, excluding T0748) and released for prediction. Details of some of the most interesting targets are discussed elsewhere in this issue5.

As in the previous two CASPs, the selected target sequences were divided into two categories: (1) targets for prediction by all groups (all-group targets, or expert/server targets), and (2) targets for server prediction only (server-only targets). The all-group targets were typically selected from among the more challenging targets, as indicated by difficulty estimates based on the results of HHsearch6,7 and PSI-BLAST8 template searches. For example, all targets for which HHsearch failed to identify templates covering more than 75% of the sequence with probability over 60% were included in the “all-group” dataset. Some easier targets were also included to balance the dataset difficulty. The daily release package typically consisted of one expert/server target and one or two server-only targets. Targets were posted at the CASP10 website for expert groups 5 days a week for 11 weeks (May 1 through July 17, 2012). At the time of web posting, the targets were also forwarded to the participating servers through an automatic distribution system, which tracked the submission/receipt status of the servers and generated warnings in case of apparent communication issues.
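
As an illustration, the difficulty filter described above can be sketched as follows (a minimal sketch; the Hit record and the way HHsearch output is aggregated are our assumptions, not the Center's actual code):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One HHsearch template hit (hypothetical record)."""
    coverage: float     # fraction of the target sequence covered, 0..1
    probability: float  # HHsearch match probability, 0..100

def is_all_group_target(hits: list[Hit]) -> bool:
    """A target goes to the all-group (expert/server) set if no template
    covers more than 75% of the sequence with probability over 60%."""
    for hit in hits:
        if hit.coverage > 0.75 and hit.probability > 60.0:
            return False  # a good template exists: server-only candidate
    return True
```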

CASP ROLL targets (R0xxx)

In addition to the targets in the CASP10 experiment, from December 2011 through April 2012 we collected and released sequences for prediction in the CASP ROLL experiment (see the ORGANIZERS' INTRODUCTION paper – THIS ISSUE). Target candidates were selected for CASP ROLL only if they passed the difficulty filters for the “all-group” targets described above. All in all, we prepared and released 18 challenging targets in CASP ROLL prior to CASP10 (target names starting with “R”, i.e., R0001 through R0018). Models submitted for these targets were evaluated by the Prediction Center and assessed by the CASP10 free modeling assessors in time for discussion at the CASP10 meeting in Gaeta, Italy. We continue to identify and release targets within the framework of the rolling experiment, and following CASP10 we have released 19 more targets (as of May 2013).

Refinement CASP10 targets (TRxxx)

During the three-week prediction window for the regular targets, or immediately thereafter, the organizers solicited the coordinates of the released targets from the Structural Genomics Initiative centers and individual research groups. After preliminary evaluation of the server predictions, targets for which predictors submitted reasonably accurate models (GDT_TS9,10 >60) were considered as candidates for the refinement experiment (names starting with TR), while targets that appeared more difficult were considered for the contact-assisted experiment (names starting with Tc; see below).

Selection of the targets and starting models for the refinement experiment was a collaborative effort of the Prediction Center and the refinement assessor. A target was considered appropriate for the refinement experiment if it was relatively short (<250 residues), was missing at most a few residues, and had no apparent crystal contact distortions (if X-ray) or loosely defined loops (if NMR). For a detailed explanation of the selection process please refer to the refinement assessment paper [Ref. – THIS ISSUE]. All in all, 28 targets were selected for the refinement category.

Targets for contact-assisted structure prediction (Tcxxx, Rcxxx)

CASP10 introduced a new prediction category, contact-assisted modeling, suggested for inclusion at the CASP9 predictors' meeting. The category was introduced to test the ability of predictors to build or improve 3D models given knowledge of a few important long-range contacts. The experiment was carried out only for the more challenging CASP10 and CASP ROLL targets, and only for those whose coordinates had been available to the organizers for at least two weeks prior to public release, allowing sufficient time to generate the appropriate sets of contacts and to carry out the modeling.

Targets and the corresponding lists of contacts for this category were selected by the CASP organizers according to the following procedure. After a regular target (T0xxx or R0xxx) was evaluated and proved challenging, the Prediction Center analyzed all submitted models to identify which pairs of residues were predicted as being in contact (<8Å distance between Cβ atoms, Cα in case of GLY). Comparing the predicted contacts with the contacts extracted from the experimental structure, we composed two lists for every candidate domain.

The first list included contacts from the native structure, ordered by descending separation along the sequence. For each native contact, we calculated the percentage of predictions in which this contact was present. Working down this list, we selected contacts that were represented in fewer than 10% of the submitted predictions: approximately L/12 - L/10 non-redundant contacts for CASP10 (L is the domain length) and 3–5 contacts for CASP ROLL. Contacts were considered redundant if they held together the same secondary structure elements in the protein. The contacts most difficult to predict were released as lists of residue pairs, e.g., CONTACT_1: 199Q:28D, CONTACT_2: 176S:34T, etc. A sketch of this selection is given below.

The second list comprised all contacts from the predictions, regardless of whether they were present in the native structure. We ordered this list on the same principles as the native contact list (above), and noted which residues were often predicted as being in contact while in fact they were not. These non-contacts were released to predictors instead of contacts in cases where this made more sense, e.g., for the leucine-rich repeat target Tc653.
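
A minimal sketch of the first-list selection under these rules (the redundancy filter on secondary structure elements is omitted, and the data structures are our assumptions rather than the Center's actual code):

```python
def select_release_contacts(native_contacts, predictions, n_wanted,
                            max_frequency=0.10):
    """native_contacts: [(i, j)] residue pairs in contact in the target;
    predictions: one set of predicted (i, j) contact pairs per model;
    n_wanted: roughly L/12 - L/10 for CASP10, 3-5 for CASP ROLL."""
    # Order native contacts by descending separation along the sequence.
    ordered = sorted(native_contacts,
                     key=lambda c: abs(c[0] - c[1]), reverse=True)
    selected = []
    for contact in ordered:
        if len(selected) == n_wanted:
            break
        # Fraction of submitted models that reproduce this native contact.
        frequency = sum(contact in p for p in predictions) / len(predictions)
        if frequency < max_frequency:
            selected.append(contact)  # hard to predict, so release it
    return selected
```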

Registration

CASP10 registration opened in March 2012 and lasted until the end of the prediction season (August 2012). Registration for the rolling CASP experiment (CASP ROLL) opened in November 2011 and is open-ended. Groups that participated in CASP ROLL prior to the start of CASP10 were automatically enrolled in CASP10. If a target was selected for both the CASP10 and CASP ROLL experiments, the CASP10 predictions were automatically copied to the CASP ROLL infrastructure.

The CASP system allows registration of participants for both experiments in one of the following roles: (1) an expert group leader; (2) a server group leader; (3) a member of a prediction group; or (4) an observer. The group leaders were assigned a registration code allowing submission of predictions and were solely responsible for submissions from their group, even though they could delegate submission privileges to any registered member of the group. The expert groups were allowed to use any combination of knowledge and computational techniques, and had approximately three weeks to submit predictions for a given target. Server groups had to respond to the Prediction Center queries automatically and return models within 72 hours, without any human intervention.

Participants and predictions

In CASP10, 217 predictor groups representing 98 scientific centers from 23 countries submitted over 66,000 predictions on 114 targets. In each of the latest three CASPs, over 100 server groups participated (122 in CASP10), and the number of server groups exceeded the number of expert groups, reflecting the tendency for increased automation of methods in structure prediction. Sixty-three of the CASP10 groups also actively participated in CASP ROLL and submitted over 7,000 models on 18 targets.

To accommodate the high volume of predictions in both experiments, we revised our procedures for prediction processing, storage, evaluation, and visualization. All predictions were collected, checked for format consistency, and stored in separate CASP10 and CASP ROLL relational databases at the Prediction Center. The data processing pipelines for the two experiments were separate but could use the same (sometimes slightly modified) software packages and evaluation facilities. In CASP10 we accepted predictions in five different formats: tertiary structure (TS), residue-residue contacts (RR), disordered regions (DR), model quality assessment (QA), and binding site predictions (FN) (see http://predictioncenter.org/casp10/index.cgi?page=format for format details); in CASP ROLL we accepted only tertiary structure and contact predictions.

There were several changes in the acceptance procedures and formats since CASP9. The biggest changes were in the quality assessment category, where a new, two-stage submission procedure was introduced allowing for a more thorough analysis of methods in this category11. Besides QA, we also slightly changed the rules for the RR and DR categories, restricting the number of predictions to one per target12,13. In the DR category we also began to require that a prediction of a residue in a disordered state be accompanied by a probability value in the range [0.5–1.0]. In future CASP experiments we also plan to require that all records in residue-residue contact predictions be ordered according to the predicted probability of the contacts.
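
A minimal sketch of the new DR rule (treating ordered residues as the complement is our assumption; the official format definitions are at the URL above):

```python
def check_dr_record(state: str, probability: float) -> bool:
    """CASP10 rule: a residue predicted as disordered ('D') must carry a
    probability in [0.5, 1.0]; an ordered residue ('O') is assumed here to
    carry the complementary probability below 0.5."""
    if state == "D":
        return 0.5 <= probability <= 1.0
    if state == "O":
        return 0.0 <= probability < 0.5
    return False  # unknown state code
```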

Target structures

Submitted models in all prediction categories were evaluated against reference structures prepared from the corresponding experimental structures by the Prediction Center in consultation with the independent assessors [DOMAIN DEFINITION paper, LOOP MODELING paper – both this issue]. If the experimental structure was a multimer, it was analyzed for chain similarity, and the most typical chain and/or the one missing the fewest residues was selected as the reference. NMR structures were checked for agreement among the ensemble models, and only well-defined regions were included. For evaluation purposes, the structure's sequence, residue numbering, and chain ID were brought into agreement with the released sequence. The target structures and the details of their analysis (resolution, R-factor, space group, ligands, suggested parsing into domains, templates, etc.) were posted in a secure web workspace, which assessors used as a discussion forum for defining domains and categorizing them by difficulty.

For some targets multiple reference structures or several evaluation unit definition schemes were considered to accommodate assessors' requests for additional analyses based on alternative domain definitions or alternative reference structures.

Evaluation at the Prediction Center

Evaluation of the submitted models was performed on a cluster of computation servers at the Genome Center of the University of California, Davis. The evaluation involved calculation of several million scores with various software packages10,14–19. Some of the evaluation measures were already used in previous CASPs, but quite a few were developed only recently.

Evaluation measures for tertiary structure predictions: global superposition scores

Historically, RMSD (root mean square deviation) was the first measure used in CASP evaluations, and it is still reported by our automatic evaluation system. This measure is appropriate for comparing very similar structures (e.g., in high-resolution homology modeling), but is not optimal when models are far from the experimental structure4.

The GDT-TS score was developed to overcome RMSD's shortcomings and has become a standard evaluation measure in CASP (see papers9,10 for details). Covering a wide range of cutoffs, GDT-TS has proved effective for the automatic evaluation of predictions, as it quite adequately reflects the absolute and relative accuracy of models over a wide range of target difficulty.

In addition to GDT-TS, several other conceptually similar measures are calculated with the LGA program and reported on our Results pages (http://predictioncenter.org/casp10/results.cgi). GDT-HA (“HA” stands for high accuracy) is a modification of the GDT-TS score that uses tighter Cα distance cutoffs (0.5, 1, 2, and 4Å) and is better suited to evaluating backbone accuracy on high-homology targets. GDC-SC20 (“SC” stands for side chain) uses, instead of Cα atoms, a characteristic atom near the end of each side chain to compare residue positions, and therefore accentuates differences between models with respect to side-chain positioning. GDC-ALL is another representative of the GDT family of measures; it compares the similarity of structures taking into account the positions of all atoms.
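
In outline, the GDT family scores average, over a set of Cα distance cutoffs, the percentage of model residues that can be placed within the cutoff of their target counterparts. The sketch below is a simplification: it scores a single given superposition, whereas LGA maximizes each per-cutoff percentage over many alternative superpositions.

```python
import numpy as np

def gdt(distances: np.ndarray, cutoffs=(1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT: `distances` holds per-residue Ca-Ca distances (Å)
    between model and target for ONE superposition; the real GDT_TS takes,
    for each cutoff, the maximum percentage over many superpositions."""
    return float(np.mean([100.0 * np.mean(distances <= c) for c in cutoffs]))

def gdt_ha(distances: np.ndarray) -> float:
    """GDT-HA: the same average over the tighter cutoffs 0.5, 1, 2, 4 Å."""
    return gdt(distances, cutoffs=(0.5, 1.0, 2.0, 4.0))
```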

The GDT-like scores evaluate the overall similarity of model and target assuming that residue #1 in the model corresponds to residue #1 in the target, and so forth. By their sequence-dependent nature, these scores cannot distinguish models with incorrect conformations from those having high structural similarity to the target but a systematic alignment shift along the sequence. To reveal such differences, we report the alignment accuracy scores AL0 [AL4], which show the percentage of residues aligned correctly [with a shift of no more than 4 residues]. Since their introduction in CASP9, the alignment accuracy scores have been calculated from LGA sequence-independent superpositions of the model and the experimental structure, originally with a distance threshold of 5Å. This cutoff now seems somewhat larger than appropriate, and in CASP10 we calculated the alignment scores using the tighter 4Å cutoff. A residue in the model is considered correctly aligned if its Cα atom is within 3.8 Å of the position of the corresponding experimental atom, and no other Cα atom lies closer. Besides the AL0 and AL4 scores, we also provide the LGA_S scores as reported by the LGA program in the sequence-independent section of our Results pages10.
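
A sketch of the AL0 count under these definitions (a simplification: it assumes Cα coordinates already placed in the LGA sequence-independent superposition frame and a 1:1 residue correspondence, both of which LGA actually computes):

```python
import numpy as np

def al0(model_ca: np.ndarray, target_ca: np.ndarray, cutoff=3.8) -> float:
    """Percentage of correctly aligned residues. model_ca, target_ca: (N, 3)
    Ca coordinates in the superposition frame. Residue i is correct if its
    model Ca lies within `cutoff` Å of target Ca i and no other target Ca
    is closer."""
    # dists[i, j] = distance from model Ca i to target Ca j
    dists = np.linalg.norm(model_ca[:, None, :] - target_ca[None, :, :],
                           axis=-1)
    n = len(model_ca)
    correct = (np.diag(dists) <= cutoff) & \
              (np.argmin(dists, axis=1) == np.arange(n))
    return 100.0 * float(correct.mean())
```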

In addition to the scores calculated with LGA, we also calculated global accuracy scores with other structure superposition programs routinely used in the field: DALI14 and MAMMOTH15.

Evaluation measures for tertiary structure predictions: superposition-free and local model accuracy scores

CAD-score16 is a new evaluation measure that compares two structures based on the differences in their residue-residue contact areas (CAD). The score favors physically realistic models and can be used directly to assess the quality of submitted models on multi-domain targets without prior splitting into separate evaluation units. Two variants of the CAD-score, based on side-chain-atom and all-atom comparisons of contact areas, are calculated.
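
In outline, and in our notation rather than that of ref. 16, the bounded CAD-score over the set G of contacting residue pairs, with T and M the contact areas between residues i and j in the target and the model, can be written as:

```latex
\mathrm{CAD} \;=\; 1 \;-\;
  \frac{\sum_{(i,j)\in G} \min\!\left(\left|T_{(i,j)} - M_{(i,j)}\right|,\; T_{(i,j)}\right)}
       {\sum_{(i,j)\in G} T_{(i,j)}}
```

A model that reproduces every target contact area exactly scores 1; a contact entirely missing from the model is penalized by at most the corresponding target contact area, keeping the score within [0, 1].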

lDDT17 is another newly introduced superposition-free measure, based on the comparison of all-atom distance maps of the model and target structures. Similarly to the CAD-score, it is well suited for evaluating local model quality in the presence of domain movements while maintaining good correlation with global measures. It is also well adapted to evaluating models against NMR structures, as it can use the whole ensemble of equivalent structures in place of a single reference structure.
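
A minimal sketch of this idea (our simplification of ref. 17: a single reference structure, a 1:1 atom correspondence, and no exclusion of intra-residue atom pairs):

```python
import numpy as np

def lddt(model_xyz: np.ndarray, target_xyz: np.ndarray,
         inclusion_radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)) -> float:
    """For every atom pair closer than `inclusion_radius` in the TARGET,
    test whether the model preserves the pair distance to within each
    threshold; the score is the preserved fraction averaged over the
    thresholds. Both arrays are (N, 3) coordinates."""
    d_t = np.linalg.norm(target_xyz[:, None] - target_xyz[None, :], axis=-1)
    d_m = np.linalg.norm(model_xyz[:, None] - model_xyz[None, :], axis=-1)
    iu = np.triu_indices(len(target_xyz), k=1)   # unique atom pairs
    keep = d_t[iu] < inclusion_radius            # pairs defined by the target
    diff = np.abs(d_t[iu][keep] - d_m[iu][keep])
    return float(np.mean([(diff < t).mean() for t in thresholds]))
```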

The SG (SphereGrinder) score is an all-atom local structure fitness score introduced in CASP9. It reports model-target similarity based on the local similarity of their substructures. For every residue, an RMSD is calculated on the sets of atoms inside spheres of a selected radius centered on the same Cα atom in the model and the target. For a selected sphere radius (from 1 to 300 Å) and an RMSD cutoff (from 1 to 15 Å), the SG score reports the percentage of spheres (residue-attached substructures) for which the model is in agreement with the target. The SG score provided on the web is calculated for spheres with a radius of 6Å and an RMSD cutoff of 2Å; the page allows reporting scores for any other sphere radius and RMSD agreement cutoff within the above ranges.
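
The per-residue calculation can be sketched as follows (a simplified illustration assuming a 1:1 atom correspondence between model and target, with sphere membership determined in the target structure; the actual SphereGrinder implementation is described in ref. 22):

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD of point sets P and Q ((N, 3)) after optimal superposition."""
    P, Q = P - P.mean(axis=0), Q - Q.mean(axis=0)
    V, S, Wt = np.linalg.svd(P.T @ Q)
    S[-1] *= np.sign(np.linalg.det(V @ Wt))   # guard against reflections
    msd = ((P**2).sum() + (Q**2).sum() - 2.0 * S.sum()) / len(P)
    return float(np.sqrt(max(msd, 0.0)))

def sg_score(model_xyz, target_xyz, ca_indices, radius=6.0, rmsd_cut=2.0):
    """Percentage of residue-centered spheres where the model matches the
    target: for each residue's Ca, the atoms within `radius` of it in the
    target are superposed onto the corresponding model atoms."""
    ok = 0
    for ca in ca_indices:
        sphere = np.linalg.norm(target_xyz - target_xyz[ca], axis=1) <= radius
        if kabsch_rmsd(model_xyz[sphere], target_xyz[sphere]) <= rmsd_cut:
            ok += 1
    return 100.0 * ok / len(ca_indices)
```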

The RPF score was originally developed for assessing the accuracy of NMR structures21 and was adapted for the evaluation of template-based models in CASP10. Similarly to lDDT, the RPF score is a superposition-free measure based on the comparison of model and target distance matrices. In essence, the RPF score is a normalized F-measure (from information retrieval statistics) that estimates the accuracy of a model based on the differences in inter-atomic distances between the model and the target.

Evaluation measures for tertiary structure predictions: model validation scores

MolProbity18 scores were calculated to help assessors differentiate between models with correct and distorted stereochemical features. The overall MolProbity score incorporates four components (all reported on the web) characterizing the correctness of the evaluated structures: the clash score, rotamer outlier score, Ramachandran outlier score, and Ramachandran favored score. The ProSA z-score19 indicates overall model quality and measures the deviation of the total pseudo-energy of the structure from an energy distribution derived from random conformations.

Evaluation measures for other prediction categories

The residue-residue contact predictions in CASP10 were evaluated with a battery of measures, including precision, recall, the Xd-score, the Matthews correlation coefficient, the F-score, and PR/ROC curves. Please consult our contact assessment paper13 for a detailed description of these and other measures, and for the results of the evaluation. The assessment paper discusses only a small portion of the evaluation data, as calculations were performed on multiple datasets. The comprehensive results of the contact prediction evaluations can be accessed through our contact evaluation infrastructure, specifically developed and implemented for the CASP10 experiment.
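
As an illustration of the simplest of these measures, precision on a length-dependent contact list might be computed as shown below (a sketch; the data layout is our assumption, and the official definitions are given in the contact assessment paper13):

```python
def contact_precision(predicted, native_contacts, L, fraction=0.2):
    """Precision on the top L*fraction predicted contacts (fraction=0.2
    gives the L/5 list). `predicted` is a list of (i, j, probability)
    records; `native_contacts` is the set of (i, j) pairs in contact in
    the target; L is the domain length in residues."""
    n_top = max(1, int(L * fraction))
    top = sorted(predicted, key=lambda c: c[2], reverse=True)[:n_top]
    true_positives = sum((i, j) in native_contacts for i, j, _ in top)
    return true_positives / len(top)
```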

A variety of scores is reported for the QA assessment at the CASP10 website. Please check the QA evaluation paper11 for more details.

Evaluation results for the disorder and FN predictions are currently not reported at the CASP website. These results and descriptions of the evaluation measures used can be found in the corresponding assessment papers [Disorder assessment, FN assessment – THIS ISSUE].

Reporting and visualizing results

During the CASP prediction season and thereafter, the evaluation results were made available to the independent assessors through a password-protected gateway on a continuing, target-by-target basis, as soon as the calculations were completed at the Prediction Center. A week before the CASP10 meeting, all evaluation data for CASP10 and CASP ROLL were publicly released through the Prediction Center website (http://predictioncenter.org/{casp10|casprol}/results.cgi).

The general organizational principles of the infrastructure for displaying CASP results were outlined in our previous papers1–3. Here we briefly describe the new elements of the system, such as dynamic web pages with cumulative per-group performance scores, new evaluation blocks for contact and contact-assisted predictions, expanded evaluation and visualization tools for tertiary structure, refinement, and quality assessment, and the new visualization tool, SphereGrinder22.

The basis for the SphereGrinder visualizations is the calculation described in the SG score section above. In the 1D Plot mode, RMSD values are shown for each residue of the protein sequence and for a selected value of the sphere radius; models may be selected and colored individually. In the 2D Map mode, models are visualized one at a time, with RMSD values represented on a color scale as a function of the sphere radius and of position along the protein sequence. In the 3D Profile view, the RMSD values within spheres are additionally shown along the z-axis; otherwise the layout is as in the 2D Map. Two additional modes of visualization are dedicated to (A) displaying average RMSD results for a model as a function of the sphere size, and (B) displaying the fraction of residues for which model-target agreement fulfills a pre-selected RMSD criterion, also plotted as a function of the sphere size. The latter option, with a sphere radius of 6Å and an RMSD cutoff of 2Å, is used as the CASP10 cumulative SG score. Both the A and B modes have the option of displaying any subset of models on a given target. Figure 1 shows the 3D Profile view of two models submitted on target T0693.

Figure 1.

A comparison of the SphereGrinder 3D Profile views for two of the 502 models submitted on the CASP10 free modeling target T0693_D1. The SphereGrinder visualization allows model quality to be grasped at a single glance, which for larger structures may be challenging even with structural representations. RMSD values (shown along the vertical z-axis) are color coded (1–15 Å, blue-to-red scale) and plotted as a two-dimensional function of position along the protein sequence (x-axis) and of the sphere radius (y-axis). Shift, rotation, and zoom in/out controls allow more detailed viewing. In this particular case, it can be seen that the first model (upper part of the figure) is of consistent quality along the chain, while the second is better in approximately the first one-fourth of the chain and considerably worse in the remaining sections. The sphere radii are calculated for values of 1–30 Å and for 300 Å (back plane of the profile).

Results of the contact prediction assessment are presented at the CASP website for the first time (http://predictioncenter.org/{casp10|casprol}/rr_results.cgi). The data are organized in tables dynamically generated from the CASP10 and CASP ROLL SQL databases for a set of user-specified parameters, including group selection, target/domain ID, the range of contacts according to separation along the sequence (long-, medium-, or short-range), and the list of contacts evaluated (L/5, L/10, or Top5, where L is the length of the domain in residues). Two additional tabs are available for the CASP10 data: the “Summary” tab (http://predictioncenter.org/casp10/rr_summary_results.cgi), reporting cumulative group results on the selected types of domains (FM or FM+TBM_hard) and lists of contacts; and the “Additional Analysis” tab (http://predictioncenter.org/casp10/rr_additional.cgi), reporting general tendencies in CASP10 contact prediction and evaluation scores computed outside of the Prediction Center.

For all structure predictions, including those in the refinement and contact-assisted categories, we added tables and graphs visualizing the fine-grained local accuracy of the predictions, expressed as the distance between the corresponding residues in the model and the target. Clicking on a strip in the bar chart opens a diagram with a superposition of the selected model and the native structure.

For the refinement and contact-assisted targets we added a visualization tool displaying the improvement of the submitted models over the reference ones: in the case of refinement, over the starting model released by the organizers; in the case of the contact-assisted targets, over the corresponding unassisted model submitted by the same group, if any (Figure 2).

Figure 2.

Improvement of the contact-assisted models over the original ones for target Tc649. For every model, the color-coded bars show regions of improvement (blue-green) or decline (orange-red) relative to the original model submitted by the same group. Clicking on the strip graph opens a line plot (see callout) showing the actual distances between the corresponding Cα atoms in the native structure and the assisted (blue line) and unassisted (green line) models, and the difference between them (red line).

The overall performance of the structure prediction groups is summarized in the dynamic web pages at http://predictioncenter.org/{casp10|casprol}/groups_analysis.cgi. The cumulative score tables can be generated for the first model or the best model submitted by a group on every target; for all groups on “expert/server” targets or for server groups on all targets; and for each target difficulty category separately or for all categories combined.

ACKNOWLEDGMENTS

We acknowledge the crystallographers and NMR spectroscopists taking part in CASP10, especially the researchers from the three structural genomics centers – JCSG, NESG, and MCSG – who provided 89 of the 114 prediction targets (see http://predictioncenter.org/casp10/numbers.cgi). Special thanks are extended to the staff of the Protein Data Bank for providing targets to the experiment through the CASP hold structure submission option. This work was supported by NIH/NIGMS grant R01GM100482.

REFERENCES

1. Kryshtafovych A, Krysko O, Daniluk P, Dmytriv Z, Fidelis K. Protein structure prediction center in CASP8. Proteins. 2009;77(Suppl 9):5–9. doi: 10.1002/prot.22517.
2. Kryshtafovych A, Prlic A, Dmytriv Z, Daniluk P, Milostan M, Eyrich V, Hubbard T, Fidelis K. New tools and expanded data analysis capabilities at the Protein Structure Prediction Center. Proteins. 2007;69(Suppl 8):19–26. doi: 10.1002/prot.21653.
3. Kryshtafovych A, Milostan M, Szajkowski L, Daniluk P, Fidelis K. CASP6 data processing and automatic evaluation at the protein structure prediction center. Proteins. 2005;61(Suppl 7):19–23. doi: 10.1002/prot.20718.
4. Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins. 2009;77(Suppl 9):18–28. doi: 10.1002/prot.22561.
5. Kryshtafovych A, et al. Challenging the state-of-the-art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10. Proteins. 2013;(This issue). doi: 10.1002/prot.24489.
6. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005;33(Web Server issue):W244–248. doi: 10.1093/nar/gki408.
7. Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012;9(2):173–175. doi: 10.1038/nmeth.1818.
8. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
9. Zemla A, Venclovas Č, Moult J, Fidelis K. Processing and evaluation of predictions in CASP4. Proteins. 2001;(Suppl 5):13–21. doi: 10.1002/prot.10052.
10. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31(13):3370–3374. doi: 10.1093/nar/gkg571.
11. Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A. Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins. 2013;(This issue). doi: 10.1002/prot.24347.
12. Monastyrskyy B, Kryshtafovych A, Moult J, Tramontano A, Fidelis K. Assessment of protein disorder region predictions in CASP10. Proteins. 2013;(This issue). doi: 10.1002/prot.24391.
13. Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact prediction in CASP10. Proteins. 2013;(This issue). doi: 10.1002/prot.24340.
14. Holm L, Kaariainen S, Rosenstrom P, Schenkel A. Searching protein structure databases with DaliLite v.3. Bioinformatics. 2008;24(23):2780–2781. doi: 10.1093/bioinformatics/btn507.
15. Ortiz AR, Strauss CE, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 2002;11(11):2606–2621. doi: 10.1110/ps.0215902.
16. Olechnovic K, Kulberkyte E, Venclovas C. CAD-score: a new contact area difference-based function for evaluation of protein structural models. Proteins. 2013;81(1):149–162. doi: 10.1002/prot.24172.
17. Mariani V, Biasini M, Barbato A, Schwede T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics. 2013; submitted. doi: 10.1093/bioinformatics/btt473.
18. Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010;66(Pt 1):12–21. doi: 10.1107/S0907444909042073.
19. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35(Web Server issue):W407–410. doi: 10.1093/nar/gkm290.
20. Keedy DA, Williams CJ, Arendall WB 3rd, Chen VB, Kapral GJ, Gillespie RA, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Cαs for CASP8 template-based models. Proteins. 2009;77(Suppl 9):29–49. doi: 10.1002/prot.22551.
21. Huang YJ, Powers R, Montelione GT. Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J Am Chem Soc. 2005;127(6):1665–1674. doi: 10.1021/ja047109h.
22. Lukasiak P, Wojciechowski M, Ratajczak T, Hasinski K, Monastyrskyy B, Kryshtafovych A, Fidelis K. SphereGrinder - estimating similarity of structures on a local scale. Proceedings of the CASP10 Conference; Gaeta, Italy; 2012. pp. 274–275.
