Author manuscript; available in PMC: 2020 Jan 15.
Published in final edited form as: Proteins. 2013 Oct 17;82(Suppl 2):84–97. doi: 10.1002/prot.24367

Assessment of CASP10 contact-assisted predictions

Todd J Taylor 1, Hongjun Bai 1, Chin-Hsien Tai 1, Byungkook Lee 1,*
PMCID: PMC6961783  NIHMSID: NIHMS1066295  PMID: 23873510

Abstract

In CASP10, for the first time, contact-assisted structure predictions have been assessed. Sets of pairs of contacting residues from target structures were provided to predictors for a second round of prediction after the initial round in which they were given only sequences. The objective of the experiment was to measure model quality improvement resulting from the added contact information and thereby assess and help develop so-called hybrid prediction methods—methods where some experimentally determined distance constraints are used to augment de novo computational prediction methods. The results of the experiment were, overall, quite promising.

Keywords: protein structure prediction, CASP10, contact assisted, hybrid prediction

INTRODUCTION

Contact-assisted (or contact-aided, CA) structure predictions have been assessed for the first time in CASP10. For this experiment, in addition to the amino acid sequences, sets of pairs of contacting residues from selected target structures were provided to predictors for a second round of prediction after the initial round to measure the resulting improvement in prediction quality. Ten targets from CASP ROLL (See Tai et al., Assessment of Template-free Modeling in CASP10 and ROLL in this issue) were chosen for the CA experiment, as were 17 targets from CASP10. Relatively few contacts were provided for the former (typically ~5), while more were provided for the CASP10 targets. See Figure 1 for an example structure with contacts shown. Only more difficult targets were chosen, as gross contact constraints, for example CB-CB distance ≤ 8.5Å, should not significantly improve predictions for a target for which a good template exists.

Figure 1.


Target T0719-D6 with the 13 provided contacts. The ribbon rendering is rainbow-colored blue to red from the N- to C-termini. Each provided contact is indicated by two balls at the Cα positions of the two contacting residues and a black broken line between them.

The experiment was intended to objectively assess hybrid prediction methods (e.g., Refs. 1–9) where sparse contact constraints, for example from NMR or correlated mutation data, are combined with de novo structure prediction methods. For this initial experiment, however, the contacts come from a simple computational procedure devised by the Prediction Center (see below) to give a set of long range contacts that most predictors missed in the initial, unassisted round. They were not designed to mimic the contacts likely to come from real life experiments, nor those that are expected to be important for folding or function.

To choose the list of contacts for a particular target, the Prediction Center tabulated all contacts in the target with CB-CB separation ≤ 6.5Å. The resulting list was sorted in decreasing order of sequence separation. Proceeding down the sorted list, if a contact was present in ~10%–15% or less of the unassisted predictions, it was included in the list of contacts with the caveat that only one representative was included from any related set of contacts on this list (e.g., (res 72, res 101), (res 70, res 99), (res 69, res 102), etc.).

The procedure was terminated after the number of contacts reached approximately one tenth the number of residues for CASP10 targets, or some small, predetermined number in the case of CASP ROLL targets. This procedure ensured that the most abundant sequence separation between selected contacting residue pairs in each target was typically well above 30. No contacts were included that crossed domain boundaries in multidomain targets.
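The selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not the Prediction Center's actual code: the function name, the 15% cutoff, and the `neighbor_window` rule for deciding that two contacts are "related" are all assumptions made for the example, and the domain-boundary filter is omitted.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]  # (residue i, residue j) with i < j


def select_contacts(
    native_contacts: List[Contact],
    predicted_fraction: Dict[Contact, float],
    n_residues: int,
    max_fraction: float = 0.15,        # assumed; the text says "~10%-15% or less"
    neighbor_window: int = 3,          # assumed definition of a "related" contact
    contacts_per_residue: float = 0.1, # "approximately one tenth" for CASP10 targets
) -> List[Contact]:
    """Pick long-range native contacts that most unassisted models missed."""
    limit = max(1, round(n_residues * contacts_per_residue))
    # Sort in decreasing order of sequence separation.
    ranked = sorted(native_contacts, key=lambda c: c[1] - c[0], reverse=True)
    chosen: List[Contact] = []
    for i, j in ranked:
        if len(chosen) >= limit:
            break
        # Skip contacts already present in many unassisted predictions.
        if predicted_fraction.get((i, j), 0.0) > max_fraction:
            continue
        # Keep only one representative from a set of related contacts.
        if any(abs(i - a) <= neighbor_window and abs(j - b) <= neighbor_window
               for a, b in chosen):
            continue
        chosen.append((i, j))
    return chosen


# Toy input: three related contacts, one frequently predicted, one unrelated.
toy_contacts = [(72, 101), (70, 99), (69, 102), (10, 60), (5, 20)]
toy_fraction = {(10, 60): 0.5, (72, 101): 0.05, (70, 99): 0.05,
                (69, 102): 0.05, (5, 20): 0.05}
print(select_contacts(toy_contacts, toy_fraction, n_residues=30))
```

In the toy input, the three contacts near (69, 102) collapse to one representative, and (10, 60) is dropped because half of the unassisted models already contain it.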

All 17 CASP10-CA targets were X-ray structures, and all belonged to the all groups prediction track. Six were FM, four TBM, six TBM-hard, and one FM/TBM (See Taylor et al., Definition and Classification of Evaluation Units for CASP10 in this issue). Twenty-four groups submitted a total of 1102 predictions for the CASP10-CA targets. All 10 CASP ROLL-CA targets were X-ray structures and all belonged to the human/server prediction track. Twenty-two groups submitted a total of 395 predictions for the CASP ROLL-CA targets. See Tables I and II for more details.

Table I.

Targets from CASP10 (A) and ROLL (B) Used in the Contact-Assisted Experiment Along with the Number of Residues in the Domain (num res), Prediction Class for CASP10 Targets, PDB ID if Available, and Description of Architecture

A
ID Num res Pred class PDB Architecture
T0649 184 TBM-hard 4f54 (A/B) 5-strand sheet, 2 big helices on one side
T0653 383 FM/TBM 4fs7 (A/B) bent LRR (SCOP c.10)
T0658-D1 166 FM 4fj6 (B) 4-strand /5-strand barrel-like B-sandwich
T0666 195 FM NA (A) 6-helix trans-membrane protein
T0673 62 TBM 4f98 (B) 3-strand sheet
T0676 173 TBM-hard 4e6f (A/B) 5-strand sheet, 2 big helices on one side
T0678 160 TBM-hard 4epz (A) alpha-alpha superhelix
T0680 96 TBM 4fm3 (A) big 3-helix bundle
T0684-D1 73 TBM 4gl6 (A+B) 4-strand sheet tied together by 1 helix (SCOP c.37)
T0684-D2 168 FM 4gl6 (A/B) 4-strand sheet surrounded by 4+2 helices
T0691 130 TBM 4gzv (B) 4-hairpin B-barrel
T0705-D2 344 TBM-hard 4ftd (B) 6-bladed B-propeller (SCOP b.68)
T0717-D2 166 TBM-hard 4h0a (A/B) 6-strand sheet with helices on top and one side
T0719-D6 163 FM 4ak1 (B) 4-strand sheet/4-strand sheet B-sandwich + 1 helix
T0734 213 FM NA (A/B) two 5-helical bundles joined by a B-ribbon; roughly dumbbell
T0735-D1 233 TBM-hard 4g2a (B) 7-strand sheet/7-strand sheet B-sandwich
T0735-D2 88 FM 4g2a (A) small 5-helix domain
B
ID Num res PDB Architecture
Rc001-D1 93 4a0u (A+B) 5-strand sheet, 3-strand sheet, helix all in a line
Rc001-D2 90 4a0u (B) 4-strand sheet/4-strand sheet B-sandwich
Rc006-D9 169 4e0e (A+B) 5-strand sheet/6-strand sheet B-sandwich, tied by 1 helix on both (SCOP b.30)
Rc007 161 4dkc (A) alpha-alpha superhelix
Rc012-D1 308 4dwe (A+B) TIM or TIM-like
Rc012-D2 104 4dwe (B) 4-strand sheet/4-strand sheet B-sandwich
Rc012-D3 48 4dwe (A) 3-helix bundle
Rc013 121 4ecn (B) 4-strand sheet/3-strand sheet B-sandwich
Rc014 136 4ecn (B) 4-strand sheet/4-strand sheet B-sandwich (SCOP d.17)
Rc015 240 4e9k (B) 4-strand sheet/8-strand sheet B-sandwich plus small helix

Table II.

Number of CASP10 (A) and ROLL (B) Targets for which Contact-Aided and Unaided Predictions were Submitted by Each Group

Group Name Type Aided Unaided
A
045 Zhang_Ab_Initio Human 17 17
077 FLOUDAS Human 15 14
103 PconsM Server 2 2
108 PMS Server 17 17
124 PconsD Server 2 2
179 Lenserver Server 4 1
198 chuo-fams-server Server 17 17
201 TsaiLab Human 10 10
222 MULTICOM-CONSTRUCT Server 14 14
238 chuo-repack-server Server 17 17
292 Pcons-net Server 2 2
294 chuo-repack Human 17 17
301 LEE Human 17 17
311 Laufer Human 11 1
365 chuo-fams Human 17 17
373 Kim_Kihara Human 17 17
434 chuo-fams-consensus Human 16 16
471 chuo-binding-sites Human 17 16
473 Seok Human 16 16
477 BAKER Human 17 17
482 biouv Human 1 1
489 MULTICOM Human 10 10
490 Zhang_Refinement Human 17 17
493 LEEMO Human 15 15
B
045 Zhang_Ab_Initio Human 10 10
087 Distill_roll Server 2 2
110 DAVIS-7 Human 1 1
113 SAM-T08-server Server 2 2
124 PconsD Server 10 10
179 Lenserver Server 3 0
249 Wsb Human 1 0
261 Seok-server Server 1 1
267 Pcons Human 4 4
292 Pcons-net Server 10 10
298 MidwayFolding Human 1 1
308 nns Server 4 4
311 Laufer Human 1 0
330 BAKER-ROSETTASERVER Server 2 2
344 Jones-UCL Human 1 1
381 SAM-T06-server Server 2 2
413 ZHOU-SPARKS-X Server 1 0
428 PconsQ Human 4 4
435 ossia Human 9 8
438 FALCON-server Server 1 1
444 Lenregular Human 7 4
477 BAKER Human 10 9

MATERIALS AND METHODS

Both in the interest of time and to detect small differences in the improvements between CA predictions that the human eye might miss, we resorted to purely score-based schemes to rank predictions and measure the overall increase in model quality for the contact assisted predictions. No clustering was done before ranking, as in our assessment of FM and ROLL (Tai et al., this issue) predictions, because ranking was purely score based and it was not necessary to minimize the number of models visually assessed. However, we did cluster the models after ranking to see if any server models had been submitted by multiple groups.

Except when we measured the community-wide improvement (see below), we used only one model from each group for a particular target, which was the best model as measured by the score function in use and not necessarily model 1. We chose two scores, GDT-TS,10 often abbreviated here as GDT, and QCS,11,12 created for CASP9 FM assessment, to rank the predictions. GDT-TS, an alignment based score, has been a staple in CASP evaluation for years.12–18 But constructing an alignment for marginal or poor predictions can be problematic, and GDT can be less reliable for such predictions,12–14,19–21 which is why we also used QCS, a score that is not alignment dependent.

Structure comparison functions not reliant on alignment are not new (e.g., Refs. 22–25) and have been used by previous FM assessors to supplement GDT-TS (e.g., QCS,11,12 lDDT,18 and Q16). We chose QCS to supplement GDT in evaluating the CA experiment because it reproduced our own visual assessments on test sets better than other previously published nonalignment-based scores we evaluated (data not shown). QCS gave results similar to GDT, so we have put much of the QCS data in the Supporting Information in the interest of space. Most, though not all, of the results presented will be those obtained using the GDT score.

To measure community-wide improvement, we performed t-tests for each target on all, not just the best, aided predictions versus all unaided predictions from those groups who submitted both predictions.

We measured group performance several ways. First, we defined two simple quantities, the absolute improvement and individual improvement.

Define A(g, T) as the best CA prediction by group g for target T, and U(h, T) as the best unaided prediction by group h for target T. Let S be a target-prediction scoring function. Then the absolute improvement under S of group g with respect to target T is defined as Ia(g, T) = S(A(g, T)) − max over all groups h of S(U(h, T)), in other words, the score of the best assisted model submitted by group g for a particular target T minus the score of the best overall unassisted model from among all groups for T. Similarly, the individual improvement of group g with respect to target T is defined as Ii(g, T) = S(A(g, T)) − S(U(g, T)), in other words, the difference between the scores of the best assisted and best unassisted models submitted by group g for a particular target T. We computed the Z-scores of these two quantities and then ranked each group both by the sum and by the average of its Z-scores over all predictions. We set negative Z-scores to 0 before summing or averaging, as most recent CASP assessors have done, so as not to penalize bad predictions more severely than nonsubmissions.
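The two improvement measures and the clipped Z-scores can be illustrated with a short sketch for a single target. The function names and the toy scores are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Dict


def absolute_improvement(aided: Dict[str, float],
                         unaided: Dict[str, float]) -> Dict[str, float]:
    """Best assisted score per group minus the best unassisted score of any group."""
    best_unaided = max(unaided.values())
    return {g: s - best_unaided for g, s in aided.items()}


def individual_improvement(aided: Dict[str, float],
                           unaided: Dict[str, float]) -> Dict[str, float]:
    """Best assisted score minus the same group's best unassisted score."""
    return {g: aided[g] - unaided[g] for g in aided if g in unaided}


def clipped_z_scores(values: Dict[str, float]) -> Dict[str, float]:
    """Z-scores over groups for one target, with negative values set to 0."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # assumption: population standard deviation
    if sigma == 0:
        return {g: 0.0 for g in values}
    return {g: max(0.0, (v - mu) / sigma) for g, v in values.items()}


# Toy best-model scores for one target (made-up numbers, not CASP data).
aided = {"G1": 50.0, "G2": 30.0, "G3": 40.0}
unaided = {"G1": 35.0, "G2": 28.0}  # G3 submitted no unassisted model

abs_imp = absolute_improvement(aided, unaided)
ind_imp = individual_improvement(aided, unaided)
z_abs = clipped_z_scores(abs_imp)
print(abs_imp, ind_imp, z_abs)
```

Note that a group with no unassisted submission (G3 here) still receives an absolute improvement but no individual improvement, which is why the number of predictions per group can differ between the two rankings.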

We also measured group performance by running head-to-head paired t-tests on the best assisted predictions for all common targets between each pair of predictors, as previous CASP assessors have done.12,26,27 Predictors were ranked by the number of statistically significant (at a 5% level) wins.

As a final measure of the performance of group G, we counted the total number of models submitted by all other groups, counting only the best models from each group, that were inferior (wins) or superior (losses) to the best models from G. This sum was taken over all targets. For example, suppose group G beat groups H and I on target T1, and that G, H, and I were the only groups who predicted T1. Further suppose that G beat groups H and J on target T2, and G, H, and J were the only groups who predicted T2. Finally, suppose that G lost to H and J on target T3 where again G, H, and J were the only groups who predicted T3. We tally four wins and two losses for group G among the total of six possible head-to-head trials. The advantage of this score is that it can seamlessly handle uneven numbers of predicting groups for different targets and that it can be compared with the outcome from a null hypothesis of random wins. Assuming that the probability of a win per trial for group G is p0 and uniform over all trials, and assuming that ties happen with negligible frequency, the probability that G will have exactly k wins is:28

$$p(k,n) = \frac{n!}{k!\,(n-k)!}\,p_0^{k}\,(1-p_0)^{n-k}$$

where n is the total number of wins and losses (e.g., six in the above example). Under the null hypothesis that G performs no better or worse on average than any other group, p0 = ½. The probability P that G will have at least k wins is given by:

$$P = \sum_{i=k}^{n} p(i,n)$$
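The win-count probability can be checked numerically with a few lines of Python. This is a minimal sketch with hypothetical function names; it evaluates the binomial tail for the worked example in the text (four wins out of six trials) under the null hypothesis p0 = ½.

```python
from math import comb


def prob_exactly(k: int, n: int, p0: float = 0.5) -> float:
    """Binomial probability of exactly k wins in n win/loss trials."""
    return comb(n, k) * p0**k * (1 - p0) ** (n - k)


def prob_at_least(k: int, n: int, p0: float = 0.5) -> float:
    """Probability of k or more wins in n trials (the tail sum P)."""
    return sum(prob_exactly(i, n, p0) for i in range(k, n + 1))


# Worked example from the text: four wins and two losses, so n = 6.
print(prob_at_least(4, 6))  # 0.34375
```

With only six trials, a 4-2 record is unremarkable (P ≈ 0.34), which illustrates why groups that predicted few targets need a very lopsided record to reach significance.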

RESULTS

Community-wide improvement and the number of contacts required to improve predictions

The best GDT-TS, abbreviated as GDT, and the best QCS scores by each prediction group with and without contact information for all the targets are given in the Supporting Information ST1–ST8.

The community-wide t-tests (Table IIIA) show significant improvement in the mean scores resulting from contact information for all 17 CASP10-CA (Tc) targets. The box plot of GDT scores for Tc targets [Fig. 2(a)] shows that the median score increased in all 17 cases and the best score improved by at least ~8 GDT points in all but two cases after including the contact information. The bar plot of the absolute improvements (see Materials and Methods) of the five best predictions for each Tc target [Fig. 3(a)] also shows improvements in most cases. Many top predictions improved by 20 GDT points or more. The results from using the QCS score are similar (see Supporting Information ST9–ST12, SF1–SF6).

Table III.

One-Tailed t-tests of GDT Scores of all (not just the Best) Assisted Versus All Unassisted Models for CASP10 (A) and CASP ROLL (B) Targets from the Groups Submitting both Types of Models

ID Num res nc Num_T0 Num_Tc Mean_GDT_T0 Mean_GDT_Tc P-value
A
Tc649 184 16 87 76 22.31 24.51 0.0394
Tc653 383 12 75 63 22.35 25.59 2.60 E –03
Tc658-D1 166 16 71 55 13.26 20.51 3.30 E –08
Tc666 195 14 80 71 22.20 31.34 8.45 E –11
Tc673 62 5 78 59 39.60 47.10 6.10 E –06
Tc676 173 17 95 76 23.83 31.99 2.90 E –10
Tc678 160 12 84 71 26.82 37.41 1.30 E –13
Tc680 96 3 82 64 48.39 61.69 1.17 E –08
Tc684-D1 73 8 85 72 26.32 37.77 1.96 E –10
Tc684-D2 168 18 85 72 16.81 24.35 6.16 E –12
Tc691 130 15 84 69 30.24 34.56 4.70 E –04
Tc705-D2 344 34 70 56 23.34 28.69 3.64 E –04
Tc717-D2 166 15 71 56 24.74 31.53 8.62 E –04
Tc719-D6 163 13 74 66 13.57 16.53 9.33 E –04
Tc734 213 20 72 54 14.13 21.83 8.30 E –07
Tc735-D1 233 28 81 61 14.43 27.46 9.41 E –14
Tc735-D2 88 7 81 61 28.94 35.23 2.03 E –06
B
Rc001-D1 93 0 42 40 23.20 22.55 0.72
Rc001-D2 90 4 37 42 25.28 25.25 0.51
Rc006-D9 169 4 61 72 18.47 17.11 0.82
Rc007 161 4 74 74 23.09 24.00 0.24
Rc012-D1 308 6 27 26 14.27 13.77 0.59
Rc012-D2 104 4 25 25 18.80 18.98 0.45
Rc012-D3 48 0 26 26 43.81 49.53 0.09
Rc013 121 6 22 26 24.45 25.02 0.36
Rc014 136 6 15 33 20.54 24.36 0.030
Rc015 240 10 24 26 10.08 11.81 0.020

Cases where the assisted means are significantly greater (with P-value < 0.05) are shaded. Abbreviations are nc: number of contacts; num_T0 and num_Tc: number of unassisted and assisted models, respectively, in the t-test; mean_GDT_T0 and mean_GDT_Tc: mean GDT scores of unassisted and assisted models, respectively.

Figure 2.


Box plots of GDT-TS scores of assisted (red) and unassisted (green) predictions for CASP10 (top, a) and CASP-ROLL (bottom, b) targets. The boxes show the 25th to 75th percentiles and the whiskers extend to the 0th and 100th percentiles. The medians are indicated by the short horizontal bars. These plots include all predictions, not just the best.

Figure 3.


The five best absolute (upper left and lower left) and individual (upper right and lower right) improvements for Tc (top) and Rc (bottom) targets under GDT-TS. The predictor ID numbers are shown below the bars.

On the other hand, Table IIIB for CASP ROLL (Rc) targets shows that there was no significant community-wide gain in ROLL target prediction scores from including contact information, except for Rc014 and Rc015. The box plot [Fig. 2(b)] and the bar plot of absolute improvements [Fig. 3(c)] show that improvements were made for Rc014 and Rc015 by a few at the very top, but that the median scores were essentially unchanged for these targets as well. A larger number of contacts were provided for Rc014 and Rc015 than for most other Rc targets, and for these two there was improvement at least for the very best predictions. Clearly, the relatively few contacts provided for the Rc targets were not enough to broadly improve predictions, while the larger number provided for the Tc targets usually did improve predictions.

Figure 4 shows the scatter plot of the best absolute improvement in GDT achieved for all Tc and Rc targets against the number of provided contacts per residue. The overall trend is increasing improvement with increasing number of contacts per residue, but with much variation. Despite a relatively large number of provided contacts, predictors as a group did not improve the quality of the best models for Tc649 and Tc691. Conversely, they achieved considerable improvement from relatively few contacts for Tc653, Tc680, and Rc014. The case of Tc680 is understandable because of its simple 3-helix bundle architecture: if the helices are predicted correctly, only a very few contacts are needed to restrict a prediction to the correct structure. However, predictors were told that this target is a tetramer and given additional interchain contacts, not counted in Figure 4, which may have contributed to the improvement of the predicted monomer structures. Tc653 is a unique case, for which “negative” contacts were provided, that is, pairs of residues that were not in contact. The better models for this curved LRR, both assisted and unassisted, are clearly leucine rich repeats. However, the GDT scores of the unassisted models are low because they, like all searchable LRR template structures, are bent in the opposite direction to that in the target. The twelve negative contacts given, relatively few for such a large target, are apparently enough to constrain the direction and degree of curvature to significantly improve the best prediction, although not more broadly [see Fig. 3(a)]. Figure 4 together with Tables IIIA and IIIB indicates that the number of contacts per residue where we begin to see significant improvement is in the range ~0.04–0.06.
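As a rough illustration of the quantity on the horizontal axis of Figure 4, the sketch below computes contacts per residue from the nc and Num res columns of Table III for the targets singled out in this paragraph. The dictionary and function names are illustrative only.

```python
# (number of provided contacts nc, number of residues), copied from Table III.
TARGETS = {
    "Tc649": (16, 184),
    "Tc653": (12, 383),  # "negative" contacts for this target
    "Tc680": (3, 96),
    "Tc691": (15, 130),
    "Rc014": (6, 136),
}


def contacts_per_residue(nc: int, num_res: int) -> float:
    """Number of provided contacts divided by the domain length."""
    return nc / num_res


density = {name: contacts_per_residue(nc, n) for name, (nc, n) in TARGETS.items()}
for name, value in sorted(density.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.3f}")
```

Tc653 and Tc680 come out near 0.03 contacts per residue and Rc014 near 0.04, while Tc649 and Tc691 sit well above 0.08, consistent with the description of which targets had relatively few or relatively many contacts.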

Figure 4.


The best absolute improvement in GDT scores achieved for all Tc (circles) and Rc (triangles) targets plotted against the number of contacts provided per residue.

Per group improvement

The performances of individual groups relative to other predictors are shown in Table IV, which gives the sums and averages of the GDT and QCS Z-scores; in Table V, which gives win counts over all predictors and all targets in terms of the GDT scores; in Supporting Information ST13–ST16, which give similar win counts based on the QCS scores; and in Table VI, which gives the win counts in head-to-head comparisons with other predictors. The actual amounts of improvement in terms of the GDT scores are given in Figure 3 for the top five predictions for each target, and in the Supporting Information SF3–SF6 in terms of the QCS scores.

Table IV.

Per Group Sums and Means of Z-Scores over both GDT and QCS Scores (e.g., if a Group Submitted 17 predictions, there are 34 Terms in the Sum, 17 GDT Z-scores, and 17 QCS Z-scores)

Gr# Sum Z Mean Z #Target submit
A
477 67.14 1.97 17
045 25.66 0.75 17
301 19.11 0.56 17
108 19.11 0.56 17
490 19.06 0.56 17
311 9.49 0.43 11
077 9.37 0.31 15
222 9.31 0.33 14
493 8.48 0.28 15
238 5.85 0.17 17
294 5.70 0.17 17
124 4.67 1.17 2
198 4.35 0.13 17
103 4.18 1.04 2
292 3.59 0.90 2
365 3.55 0.10 17
471 3.54 0.10 17
434 3.21 0.10 16
373 3.00 0.09 17
489 2.74 0.14 10
473 1.24 0.04 16
201 1.08 0.05 10
482 0.00 0.00 1
179 0.00 0.00 4
B
477 47.54 1.40 17
471 28.39 0.89 16
493 19.79 0.66 15
108 18.91 0.56 17
045 17.30 0.51 17
301 15.74 0.46 17
238 15.05 0.44 17
198 14.81 0.44 17
222 11.91 0.43 14
077 10.14 0.36 14
124 6.31 1.58 2
490 5.95 0.18 17
201 4.44 0.22 10
473 4.12 0.13 16
103 2.85 0.71 2
292 2.08 0.52 2
434 0.14 0.00 16
294 0.14 0.00 17
489 0.11 0.01 10
373 0.04 0.00 17
482 0.00 0.00 1
365 0.00 0.00 17
311 0.00 0.00 1
179 0.00 0.00 1
C
477 20.52 1.03 10
045 16.20 0.81 10
435 10.53 0.59 9
413 4.16 2.08 1
330 3.83 0.96 2
124 3.13 0.16 10
428 2.52 0.31 4
381 2.07 0.52 2
444 0.88 0.06 7
087 0.57 0.14 2
292 0.52 0.03 10
267 0.52 0.07 4
110 0.21 0.10 1
308 0.06 0.01 4
249 0.06 0.03 1
438 0.00 0.00 1
344 0.00 0.00 1
311 0.00 0.00 1
298 0.00 0.00 1
261 0.00 0.00 1
179 0.00 0.00 3
113 0.00 0.00 2
D
435 12.79 0.80 8
045 9.69 0.48 10
124 7.03 0.35 10
477 6.34 0.35 9
330 4.94 1.24 2
292 4.41 0.22 10
308 2.95 0.42 4
444 1.78 0.22 4
428 1.74 0.22 4
267 1.01 0.13 4
113 0.81 0.20 2
087 0.62 0.16 2
261 0.58 0.29 1
438 0.51 0.26 1
381 0.38 0.10 2
344 0.15 0.07 1
110 0.03 0.01 1
298 0.00 0.00 1

The tables show absolute (A,C) and individual (B,D) improvements on Tc (A,B) and Rc (C,D) targets. Absolute improvement is a group’s best assisted score minus the community-wide best unassisted score. Individual improvement is a group’s best assisted score minus the same group’s best unassisted score. Any Z-score less than 0 was set to 0. Note that the number of predictions submitted by a group can differ between these two tables as some groups submitted an assisted prediction but no corresponding unassisted predictions.

Table V.

Overall Win/Loss Counts for Each Group in All-Against-All Pair-wise Comparisons of Assisted Predictions for the Tc (VA) and Rc (VB) Targets using GDT Score

Group Num pred Wins Losses Fraction wins P
A
477 17 272 14 0.951 <2.2 E –16
045 17 211 76 0.735 3.7 E –16
301 17 197 72 0.732 6.6 E –15
108 17 197 72 0.732 6.6 E –15
124 2 36 2 0.947 2.7 E –09
103 2 35 2 0.946 5.1 E –09
490 17 192 96 0.667 8.1 E –09
292 2 34 3 0.919 6.2 E –08
222 14 124 116 0.517 0.33
311 11 95 90 0.514 0.38
077 15 123 132 0.482 0.73
493 15 113 143 0.441 0.97
238 17 110 140 0.440 0.98
294 17 103 154 0.401 0.9994
198 17 94 148 0.388 0.9998
434 16 87 138 0.387 0.9998
489 10 60 104 0.366 0.9998
471 17 86 158 0.352 1
365 17 78 162 0.325 1
373 17 81 206 0.282 1
201 10 49 126 0.280 1
473 16 64 208 0.235 1
179 4 7 67 0.095 1
482 1 0 19 0.000 1
B
045 10 59 18 0.766 1.5 E –06
413 1 14 0 1.000 6.1 E –05
330 2 24 5 0.828 2.7 E –04
477 10 52 25 0.675 1.4 E –03
428 4 28 16 0.636 4.8 E –02
435 9 41 32 0.562 0.17
249 1 6 3 0.667 0.25
110 1 9 6 0.600 0.30
124 10 38 34 0.528 0.36
087 2 15 14 0.517 0.50
311 1 3 3 0.500 0.66
308 4 22 25 0.468 0.72
344 1 5 8 0.385 0.87
381 2 12 17 0.414 0.87
261 1 4 11 0.267 0.98
267 4 14 29 0.326 0.9931
444 7 20 41 0.328 0.9978
113 2 7 22 0.241 0.9988
292 10 23 48 0.324 0.9991
179 3 0 16 0.000 1
298 1 0 9 0.000 1
438 1 0 14 0.000 1

P is the probability that a win/loss record equal to or better than the observed record could have been obtained by chance.

Table VI.

Number of Wins in Head-to-Head Pairwise Comparisons of Predicting groups.

Group Num pred Num wins
A
477 17 38
045 17 32
490 17 28
108 17 26
301 17 25
222 14 12
124 2 8
238 17 7
294 17 7
493 15 6
103 2 6
292 2 6
077 15 5
198 17 5
311 11 4
434 16 3
471 17 3
201 10 2
365 17 2
373 17 1
473 16 1
489 10 1
179 4 0
482 1 0
B
045 10 11
124 10 8
428 4 6
435 9 4
477 10 4
292 10 1
330 2 1
087 2 0
110 1 0
113 2 0
179 3 0
249 1 0
261 1 0
267 4 0
298 1 0
308 4 0
311 1 0
344 1 0
381 2 0
413 1 0
438 1 0
444 7 0

The pairwise comparison is made by a t-test of contact-assisted GDT or QCS scores for all common Tc (VIA) or Rc (VIB) targets. A group wins the pairwise comparison if its set of scores is significantly (at the 5% significance level) better than its competitor’s by the t-test. The number of wins reported is the sum of the number of wins under GDT and that under QCS.

For the Tc targets, Tables IV–VI show that group 477 (BAKER) performed better than all others by wide margins. Figure 3(a,b) also shows that this group made larger improvements than others for the largest number of targets. For example, for Tc734, this group used the contact information to improve their GDT score by more than 40 [Fig. 3(b)] and produce a high quality model (Fig. 5), which they could not do without the contact information.

Figure 5.


Top row: the target T0734 (left), the best assisted model Tc734TS477_4 (middle, GDT = 56.34), and the best unassisted model T0734TS108_4 (right, GDT = 18.31). Contact information resulted in much improvement in model quality here. Middle row: the target T0691 (left), the best assisted model Tc691TS222_5 (middle, GDT = 42.91), and the best unassisted model T0691TS301_1 (right, GDT = 45.03). Contact information resulted in no improvement in model quality here. Bottom row: the target T0680 (left), the best assisted model Tc680TS477_4 (middle, GDT = 89.6), and the best unassisted model T0680TS489_1 (right, GDT = 75.5). All molecules are rainbow colored blue to red from the N- to C-termini. Models were optimally superimposed to the target, and then separated by translations along the horizontal direction.

The next group of predictors who used the contact information better than others are 045 (Zhang_Ab_Initio), 301 (LEE), 108 (PMS), and 490 (Zhang_Refinement) by all three measures although the rank order varied depending on the particular measure used. Groups 471 (chuo-binding-sites) and 493 (LEEMO) made more improvement over their own unassisted predictions than most others (Table IVB) but were not among the top performers when compared with the best unassisted predictions (Table IVA). Groups 124 (PconsD), 103 (PconsM), and 292 (Pcons-net) produced highly competitive models (high mean Z-scores in Tables IVA and IVB and high win fractions in Table V) but submitted for only two targets.

For the Rc targets, group 477 (BAKER) again made the most absolute improvement, followed by 045 (Zhang_Ab_Initio) and 435 (ossia). Figure 3(c,d) shows that, while a number of predictors made substantial improvements over their own predictions by using the contact information, the improved models were not any better than the best unassisted models for most targets. Figure 3(c) shows that improvements over the best unassisted models (positive absolute improvements) were made only by the above three groups. Tables VB and VIB show that there are other groups who did well in terms of win counts. However, their winning models are not better than the best unassisted models.

Selected examples

Figure 3(a) shows that most improvement was made for T0/Tc734 and that no absolute improvement was made for T0/Tc691 by any predictor. As pointed out above already, the best contact-assisted model for the primarily helical T0/Tc734 structure (Fig. 5) is impressively better than the best unassisted model.

Figure 5 shows, for T0/Tc691, the target structure, the best assisted, and best unassisted predictions. The GDT scores of the two predictions are 42.9 and 45.0, respectively. In contrast to the large improvement seen for T0/Tc734 (Fig. 5), the best assisted model in this case is slightly better than the unassisted model from the same group but poorer than the best unassisted model. It is interesting to speculate if there is some correlation of degree of improvement with protein architecture, but there are too few targets to make any strong claims.

Figure 2(a) shows that the contact-assisted model that has the highest GDT score (89.6) is for T0/Tc680. The median score (72) is also the highest for this target. The best assisted and unassisted models for this target are shown in Figure 5. Many models for this target are so close to the target structure that they begin to be similar among themselves. For example, taking only the best assisted models from each group, there were 51 pairs of models (out of a total of 153 possible pairs) that were highly similar (RMSD < 3Å) but non-identical (RMSD > 0.1Å). Each member of these pairs had an RMSD with respect to the target no greater than 4.52Å, with the mean being 3.5Å. This number of 51 highly similar model pairs for Tc680 is more than three times as many as for any other CA target and also more than three times as many as for T0680.

Correlation of scores with the fraction of satisfied contacts

Table VII shows the fraction of provided contacts satisfied (averaged over all contacts and predictions) for each Tc target (frac_Tc) and the T0 predictions (frac_T0) from the groups who also predicted Tc. The fraction satisfied for T0 is quite low, at or below 10%. This is because the Prediction Center deliberately chose contacts for the CA experiment that were mostly missed in the unassisted predictions. More interestingly, the frac_Tc numbers show that, even knowing a list of contacts from the target structure, predictors were able to satisfy only about half of these contacts in their models, on average. It should be noted that there are many predictions that satisfy much more than half, and indeed some that satisfy all of them.

Table VII.

Correlation Coefficients of the Fraction of Contacts Satisfied (Defined as CB-CB < 6.5Å or 8.5Å) and GDT-TS or QCS Scores for Assisted Predictions for each CASP10 Target

Target nc Median GDT Corr/GDT 6.5 Å Corr/GDT 8.5 Å Median QCS Corr/QCS 6.5 Å Corr/QCS 8.5 Å Frac T0 Frac Tc
Tc649 16 27.65 0.015 0.010 60.09 0.041 0.11 0.027 0.42
Tc653 12 22.98 −0.30 −0.31 38.85 −0.15 −0.16 0.16 0.54
Tc658-D1 16 19.43 0.78 0.65 42.25 0.74 0.65 0.034 0.38
Tc666 14 30.00 0.76 0.62 47.45 0.69 0.57 0.063 0.44
Tc673 5 45.56 0.21 −0.13 60.60 0.15 −0.20 0.086 0.68
Tc676 17 32.80 0.39 0.38 58.32 0.45 0.30 0.048 0.49
Tc678 12 38.47 0.71 0.83 61.97 0.40 0.72 0.087 0.59
Tc680 3 59.12 0.15 0.082 67.98 0.13 0.11 0.012 0.59
Tc684-D1 8 36.82 0.52 0.70 45.99 0.46 0.68 0.082 0.54
Tc684-D2 18 22.77 0.79 0.81 46.59 0.78 0.83 0.033 0.49
Tc691 15 36.88 −0.11 0.11 67.36 −0.16 0.084 0.10 0.61
Tc705-D2 34 32.31 −0.063 0.42 61.05 −0.018 0.48 0.067 0.49
Tc717-D2 15 30.57 0.70 0.39 54.14 0.70 0.22 0.045 0.57
Tc719-D6 13 14.11 0.75 0.40 35.55 0.75 0.44 0.050 0.55
Tc734 20 18.02 0.87 0.48 40.83 0.86 0.52 0.024 0.51
Tc735-D1 26 31.65 0.47 0.42 58.49 0.54 0.48 0.049 0.45
Tc735-D2 7 31.82 0.67 0.64 37.60 0.73 0.69 0.062 0.53

The final two columns: frac_T0, the fraction of contacts satisfied in unassisted predictions (averaged over all contacts and all models) and frac_Tc, the fraction of contacts satisfied in assisted predictions. Note that column nc shows number of contacts provided and that, for Tc653, the nc column gives the number of non-contacts and the last two columns indicate the absence of contacts in the models. These statistics are for all predictions, not just the best from each group.

In good models, many provided contacting residue pairs are indeed in contact according to the criterion that their Cβ-Cβ distance be less than 6.5–8.5Å. But the converse is often not true, that is, a model that has all of the given contacts is not necessarily good overall. We counted the fraction of provided contacts satisfied in each model and calculated the correlation coefficient between these numbers and the GDT or QCS scores of all assisted models for each target. Table VII shows the correlations for all Tc targets. For at least five of the 17 targets, the correlation coefficient is less than 0.2, including the bottom 3 Tc targets in Figure 4. Even when the correlation coefficient is relatively high, poor models exist that have many contacts satisfied. As an example, Figure 6 shows two models for Tc719-D6, both of which satisfy all the provided contacts. But one (477_2, GDT = 46.93) is clearly superior to the second (301_5, GDT = 18.86). To build a good model, it is not sufficient to simply satisfy the relatively few contact constraints given in this experiment.
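The per-model fraction of satisfied contacts and its correlation with a quality score can be sketched as below. This is an illustration only: the function names and toy coordinates are hypothetical, Cβ coordinates are assumed to be already extracted into a dictionary keyed by residue number, and a plain Pearson coefficient is assumed because the text does not name the correlation measure.

```python
from math import dist, sqrt
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def fraction_satisfied(cb: Dict[int, Coord],
                       contacts: List[Tuple[int, int]],
                       cutoff: float = 8.5) -> float:
    """Fraction of provided residue pairs whose CB-CB distance is below cutoff (in Å)."""
    hits = sum(1 for i, j in contacts if dist(cb[i], cb[j]) < cutoff)
    return hits / len(contacts)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed measure; not stated in the text)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy model with three residues on a line; only the first pair is in contact.
toy_cb = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
toy_contacts = [(1, 2), (1, 3)]
print(fraction_satisfied(toy_cb, toy_contacts))  # 0.5

# Made-up per-model fractions and scores, just to exercise the function.
print(pearson([0.2, 0.5, 1.0], [15.0, 30.0, 50.0]))
```

Running `fraction_satisfied` at both 6.5 Å and 8.5 Å and correlating each against GDT or QCS would give one row of a table shaped like Table VII for each target.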

Figure 6.

The target T0719-D6 (left), the best assisted model Tc719TS477_2-D6 (middle, GDT = 46.93), and another assisted model Tc719TS301_5-D6 (right, GDT = 18.86). Both models satisfy all contacts (CB-CB < 8.5Å), although Tc719TS477_2-D6 is clearly better than Tc719TS301_5-D6.

Redundancy among predictions

As mentioned before, we clustered the models after ranking to see if any server models had been submitted by multiple groups. The RMSD distance matrices used to do the clustering may be more instructive than the clustering itself. Between 16 and 20 groups submitted predictions for each Tc target. Considering only the best prediction from each group for a particular target, there are therefore between 120 and 190 pairs of best predictions for each target (number of pairs N_i = n_i(n_i − 1)/2, where n_i is the number of groups predicting target i), and we computed the RMSD for each such pair. There were 2603 such pairs for all 17 CASP10 CA targets combined. Removing those from Tc680, which was a special case as detailed in a previous section, and eliminating pairs with RMSD > 3Å leaves 208 pairs. For 186 of these 208 pairs, the two models have distinct group IDs, but both group IDs are associated with the same research group (e.g., LEE group 301 and PMS group 108). The remaining 22 pairs consist of a model from one research group paired with several identical models all from one other research group (with different group IDs, at least one of which is a server) for each of the targets Tc666, Tc673, Tc676, Tc678, and Tc684-D1. It appears, therefore, that five server models were copied, each by only one other research group. This amount of presumed copying is less than half that detectable in the T0 submissions for the Tc targets (data not shown), and small compared with that among FM models (see Tai et al., Assessment of Template-free Modeling in CASP10 and ROLL in this issue).
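The pair counting and RMSD filtering described above can be sketched as follows. The function names and the `rmsd_fn` callable are hypothetical placeholders for whatever superposition routine is used; only the pair-count formula and the 3Å cutoff come from the text.

```python
from itertools import combinations

def n_pairs(n_groups):
    """Number of unordered pairs of best predictions: n(n - 1)/2."""
    return n_groups * (n_groups - 1) // 2

def close_pairs(models, rmsd_fn, cutoff=3.0):
    """Return (group_a, group_b, rmsd) for every pair with RMSD <= cutoff.

    models:  dict mapping group ID -> that group's best model for one target
    rmsd_fn: callable(model_a, model_b) -> RMSD in Angstroms (assumed supplied)
    """
    kept = []
    for ga, gb in combinations(sorted(models), 2):
        d = rmsd_fn(models[ga], models[gb])
        if d <= cutoff:
            kept.append((ga, gb, d))
    return kept
```

With 16 and 20 groups, `n_pairs` gives 120 and 190, matching the range quoted in the text.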

Contact-assisted predictions for the T0680 tetramer

In addition to the three intrachain contacts provided for the contact assisted monomeric target Tc680, the Prediction Center also told predictors that this target was a tetramer and provided six interchain contacts. Predictors were asked to submit contact-assisted models for the tetramer. This was the only target in the CA experiment for which oligomer predictions were solicited.

Considering the great abundance of programs to compare protein monomers, there are surprisingly few programs available designed to compare multichain complexes, and hence deal with the added complication of chain id ambiguity and cross-chain alignments,29 where a whole chain can be aligned to parts of two other chains. We chose two such programs, MM-align29 and SCPC,30 to score Tc680o predictions (Table VIII). MM-align is not ideal for this task, however, as it is designed to compare complexes with non-identical sequences and therefore inserts gaps. Neither is SCPC ideal, since it compares constituent subunits of complexes by matching secondary structure elements, but the matches need not respect topology.31 We want a sequence dependent alignment for complexes, as we are dealing with structure predictions.

Table VIII.

Scoring Tc680o Predictions using GDT and QCS on Tetramers Converted to Pseudo-Monomers with the Predictions Sorted in Decreasing Order of GDT Score

Model GDT Perm QCS Perm MM SCPC RMSD (TM) Perm RMSD (MM) SD top 4 perm GDT
477_5o 75.59 DCBA 96.82 BADC 0.89 86.24 2.32 ABCD 2.20 0.00
477_2o 69.01 DCBA 95.38 DCBA 0.87 83.07 2.68 ABCD 2.74 0.00
477_4o 66.97 DCBA 96.29 DCBA 0.86 64.02 2.81 ABCD 2.77 0.00
477_3o 54.21 DCBA 95.81 CDAB 0.74 74.60 4.47 ABCD 4.50 0.00
477_1o 47.04 DCBA 88.77 CDAB 0.71 48.15 6.64 ABCD 4.12 0.00
045_2o 43.82 ABCD 82.92 CDAB 0.65 10.85 6.47 BADC 4.67 0.38
490_5o 42.11 BADC 83.24 ABCD 0.66 29.63 6.25 ABCD 5.32 0.42
490_1o 38.75 DCBA 81.19 ABCD 0.66 29.10 6.53 DCBA 4.73 0.18
490_3o 37.63 BCDA 83.79 CBAD 0.38 13.23 7.26 CBAD 5.40 0.48
490_2o 37.30 ABCD 83.48 DCBA 0.62 34.92 6.83 BADC 5.31 0.27
490_4o 33.55 BADC 80.72 DCBA 0.62 11.38 8.39 DCBA 5.01 0.33
045_3o 33.09 DCBA 81.80 DCBA 0.57 17.99 7.65 DCBA 5.96 0.33
045_5o 32.83 DCBA 77.94 ABCD 0.61 20.11 7.89 DCBA 5.46 0.22
045_4o 32.50 CDAB 81.96 DCBA 0.61 27.51 8.58 DCBA 5.07 0.05
045_1o 31.51 DCBA 77.11 ABCD 0.61 18.52 8.10 DCBA 5.23 0.14
01_1013 28.68 BADC 59.88 BADC 0.51 5.03 22.30 CADB 5.00 0.22
108_1o 28.68 BADC 59.88 BADC 0.51 5.03 22.30 CADB 5.00 0.22
493_1o 28.68 BADC 59.88 BADC 0.51 5.03 22.30 CADB 5.00 0.22
077_1o 25.86 DABC 61.90 CBAD 0.35 13.49 12.99 DABC 5.93 0.20
077_4o 23.42 ABCD 60.99 CDAB 0.44 5.82 13.95 BADC 6.07 0.11
077_2o 22.96 BADC 60.19 DCBA 0.44 8.99 14.00 BCDA 6.02 0.30
01_1012 22.30 DCBA 51.32 CDAB 0.37 1.32 14.18 DABC 5.33 0.18
077_5o 21.65 DABC 59.53 CBAD 0.34 7.41 13.55 DCBA 5.92 0.07
077_3o 21.58 CDAB 59.86 BADC 0.42 4.23 13.38 BCDA 5.61 0.06
201_3o 19.74 BADC 50.80 DCBA 0.33 6.61 13.83 DABC 6.65 0.13
201_2o 19.28 DCAB 51.23 DBAC 0.29 NA 15.07 DBAC 6.74 0.16

Results from MM-align and SCPC, designed specifically to compare multichain complexes, are also shown. The RMSD, calculated by TM-score, on the sequence dependent, ungapped alignment and RMSD from the gapped MM-align alignment are also shown. GDT, QCS and RMSD were obtained by considering the T0680 tetramer target and the oligomer predictions as single chains by concatenating the chains and scoring them as pseudo-monomers. All 24 possible orderings of the prediction’s chain id’s in the pseudo-monomer were scored, and the highest scoring one is shown along with that permutation (in the perm column). The SD top 4 perm GDT column is explained in the text. The target pseudo-monomer was simply constructed by concatenating chains A, B, C, and D in that order. MM-align and SCPC were run on the unaltered tetrameric target/predictions. NA indicates that the scoring program failed to give a score.

We also scored the oligomeric predictions in a more straightforward way: each was treated as a pseudo-monomer by concatenating all monomers into one chain, after renumbering residues and merging chain IDs, and then scored with conventional structure comparison functions. As the chain IDs of prediction and target will not in general correspond, one must score all permutations of chain IDs in the prediction against the target.29 Cross-chain alignments29 can be avoided by performing sequence-dependent alignments.

We used the score functions GDT-TS (sequence dependent mode), QCS,11 and TM-score32 (the sequence dependent version of TM-align, used here only to calculate RMSD) to score Tc680o predictions with chains concatenated to form pseudo-monomers (Table VIII). Nonalignment-based comparison methods, like QCS, obviously cannot give cross-chain alignments, and using the sequence dependent mode of the other two scores again prohibits cross-chain alignment. For each prediction, all 24 permutations of chain order were converted to pseudo-monomers and were scored against the target, similarly converted to a pseudo-monomer with chain order ABCD. The top scoring such permutation gave the score for that tetramer prediction. As there are only 26 predictions, we also looked at the superpositions of each to the target manually.
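The permutation search described above can be sketched as follows. This is an illustration under stated assumptions: chains are represented as plain lists of per-residue items, and `score_fn` is a hypothetical stand-in for a sequence-dependent scorer such as GDT-TS; neither reflects the actual file formats or programs used.

```python
from itertools import permutations

def pseudo_monomer(chains, order):
    """Concatenate chains in the given order into one residue list.

    chains: dict mapping chain ID -> list of per-residue items (assumed format)
    order:  iterable of chain IDs, e.g. "DCBA"
    """
    merged = []
    for chain_id in order:
        merged.extend(chains[chain_id])
    return merged

def best_permutation(pred_chains, target_chains, score_fn, target_order="ABCD"):
    """Score every chain ordering of the prediction against the fixed target."""
    target = pseudo_monomer(target_chains, target_order)
    scores = {}
    for perm in permutations(sorted(pred_chains)):
        label = "".join(perm)
        scores[label] = score_fn(pseudo_monomer(pred_chains, perm), target)
    best = max(scores, key=scores.get)  # highest score is taken as best
    return best, scores[best], scores
```

For a tetramer the `scores` dictionary has 24 entries, one per chain ordering, and the top entry supplies the reported score and permutation.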

The target homotetramer consists of four helix bundle monomers arranged in a rectangular X shaped configuration (Fig. 7), with each monomer forming one arm of the X. With the A chain on the upper left and proceeding clockwise around the X, the chains are A, D, B, and C. The tetramer has D2 symmetry, with three mutually perpendicular symmetry axes. Therefore, the scores of the 24 chain id permutations of the pseudomonomers for each prediction fall into groups of four with similar or identical scores, for example ABCD, BADC, CDAB, and DCBA. The standard deviation of the scores of the group of four with the highest GDT scores is shown in Table VIII in the SD top 4 perm column, and is a measure of model symmetry, lower score indicating higher symmetry.
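The grouping of the 24 permutations into sets of four can be sketched as below. The four relabelings are taken from the example in the text; the choice of the population standard deviation and of "the set containing the single best-scoring permutation" as the top group are assumptions, since the text does not specify either.

```python
from itertools import permutations
from statistics import pstdev

# Chain relabelings given as the example in the text (identity plus three swaps).
D2_RELABELINGS = ["ABCD", "BADC", "CDAB", "DCBA"]

def compose(perm, relabel, labels="ABCD"):
    """Reorder perm according to relabel: slot k takes perm[labels.index(relabel[k])]."""
    return "".join(perm[labels.index(c)] for c in relabel)

def symmetry_classes(labels="ABCD"):
    """Partition all chain orderings into sets related by the D2 relabelings."""
    seen, classes = set(), []
    for perm in map("".join, permutations(labels)):
        if perm in seen:
            continue
        cls = sorted({compose(perm, g, labels) for g in D2_RELABELINGS})
        seen.update(cls)
        classes.append(cls)
    return classes

def sd_top_class(scores):
    """Return the class holding the best-scoring permutation and the SD of its scores.

    scores: dict mapping each of the 24 orderings (e.g. "DCBA") -> score
    """
    top = max(symmetry_classes(), key=lambda cls: max(scores[p] for p in cls))
    return top, pstdev([scores[p] for p in top])
```

A perfectly symmetric model gives identical scores within the top set and hence a standard deviation of zero, which is the sense in which a lower value indicates higher symmetry.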

Figure 7.

The tetrameric target T0680o (shown in red) and the best tetramer model Tc680oTS477_5o (shown in blue). The RMSD of the tetramer model with respect to the tetramer target is 2.32Å. The monomer is shown in Fig. 5 and the chains are, clockwise starting on the upper left, chain B, chain C, chain A, and chain D.

All the score functions put predictions Tc680TS477_5o, Tc680TS477_2o, and Tc680TS477_4o in the top five predictions, and visual assessment confirms that all three are excellent. Although models like Tc680TS490_2o are still very good, there is a clear drop off in model quality below these three predictions from the BAKER group. The superposition of Tc680TS477_5o and the target, shown in Figure 7, is very impressive.

Note that 24 oligomeric structure predictions for T0680 (before the contact information was given) were also submitted. However, all of them predicted the structure as a dimer, and only one group submitted oligomer predictions in both the assisted and unassisted cases. It makes little sense, therefore, to try to compute absolute or individual improvements for Tc680o, as we did in the monomeric target cases.

CONCLUSIONS AND SUGGESTIONS FOR THE FUTURE

The CASP10-CA experiment has shown that added contact information can substantially improve protein structure predictions. The experiment was one of the most hopeful parts of CASP10. In all cases except one for Tc targets, the best assisted models scored higher than the best unassisted, and in the best cases, assisted predictions had GDT scores 40 points higher than the unassisted. This experiment has also provided some information on the number of contacts required to produce improved predictions. The number of contacts per residue where we begin to see significant improvement is in the range 0.04–0.06.

We note that the contact information provided to the CA predictors indirectly carried information on domain boundaries because no contacts were provided that crossed a domain boundary. The domain boundary information implicit in such sets of contacts might provide an advantage for contact-assisted modeling over and above the direct benefit from the contact information. In Figure 4, there does not appear to be any systematic tendency for larger improvement in domains from multidomain targets than single domain targets, but there is such a large per target variation that it is difficult to rule out this effect. If no inter-domain contacts are given, one way of more purely assessing the benefits of the contact information would be to provide the domain boundary information both for the assisted and unassisted predictions.

Early in our analysis of CA predictions, we attempted to measure the relative importance of the provided contacts. One of the ways this was done was to split the predictions into two classes, good and bad. As the distribution of prediction scores is most often bimodal,17 it was straightforward to find a natural good/bad score cutoff for each target. We then built decision tree classifiers to map a vector representing if each contact was satisfied in a model (0 = unsatisfied, 1 = satisfied) to the binary classes good/bad. We used classifiers rather than simply measuring the degree to which models that satisfied a single particular contact were over-represented in the good predictions, because it is possible that combinations of contacts may be better indicators of model goodness (e.g., contacts A and B may be satisfied individually in many bad models, but both satisfied only in good models). Although we could almost always build classifiers that correctly labeled the great majority of predictions good/bad, the results of such experiments were ultimately not convincing because the number of predictions was not very much greater than the number of provided contacts. Hopefully, given the promising results of this experiment, there will be very many more contact-assisted predictions in the next round of CASP, and it may then be possible to say something about the relative importance of the provided contacts. When this becomes possible, it may also be interesting to include disulfides, salt bridges, high network centrality contacts, and so forth to see if contacts more important for folding or function are also more important for structure prediction.
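The motivating example in this paragraph, in which two contacts are uninformative individually but informative together, can be illustrated with a simple conditional fraction. This is not the decision tree classifier described above; it is a hypothetical sketch with invented toy data, showing only why single-contact statistics can miss combinations.

```python
def good_fraction(models, contact_idxs):
    """Fraction of good models among those satisfying ALL the listed contacts.

    models:       list of (satisfied_vector, is_good) with 0/1 entries per contact
    contact_idxs: indices of the contacts that must all be satisfied
    Returns None when no model satisfies the listed contacts.
    """
    selected = [is_good for vec, is_good in models
                if all(vec[i] for i in contact_idxs)]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Toy data (invented): contact 0 and contact 1 each appear in bad models,
# but only good models satisfy both at once.
toy_models = [
    ([1, 0], False),
    ([1, 0], False),
    ([0, 1], False),
    ([1, 1], True),
    ([1, 1], True),
]
single_a = good_fraction(toy_models, [0])    # contact 0 alone
single_b = good_fraction(toy_models, [1])    # contact 1 alone
both = good_fraction(toy_models, [0, 1])     # both contacts together
```

In the toy data the pair separates good from bad models perfectly while neither contact does so alone, which is the kind of interaction a tree-based classifier can capture.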

Although the results of this initial CASP10 experiment are highly encouraging for the future of contact-assisted modeling, we must caution that we may not have put hybrid prediction to the true test here because the contacts provided in this first experiment were highly artificial and may not resemble those that are likely to be obtained from a real hybrid modeling experiment. The results of contact-assisted CASP experiments will be most convincing if the contacts used come from experimental data or are at least chosen to closely mimic experimental data.

ACKNOWLEDGMENTS

The authors thank the experimental groups who provided the target structures and the prediction groups who provided the predictions. They also thank Dr. Andriy Kryshtafovych and the staff of the Prediction Center for the extensive support provided for this work. Molecule pictures were generated by the UCSF Chimera package33 and PyMOL (the PyMOL Molecular Graphics System, Version 1.4, Schroedinger, LLC). Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311). This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.

Footnotes

Additional Supporting Information may be found in the online version of this article.

REFERENCES

  • 1. Schmitz C, Vernon R, Otting G, Baker D, Huber T. Protein structure determination from pseudocontact shifts using ROSETTA. J Mol Biol 2012;416:668–677.
  • 2. Lange OF, Baker D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 2012;80:884–895.
  • 3. Lange OF, Rossi P, Sgourakis NG, Song Y, Lee HW, Aramini JM, Ertekin A, Xiao R, Acton TB, Montelione GT, Baker D. Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proc Natl Acad Sci USA 2012;109:10873–10878.
  • 4. Raman S, Huang YJ, Mao B, Rossi P, Aramini JM, Liu G, Montelione GT, Baker D. Accurate automated protein NMR structure determination using unassigned NOESY data. J Am Chem Soc 2010;132:202–207.
  • 5. Bowers PM, Strauss CE, Baker D. De novo protein structure determination using sparse NMR data. J Biomol NMR 2000;18:311–318.
  • 6. Shen Y, Vernon R, Baker D, Bax A. De novo protein structure generation from incomplete chemical shift assignments. J Biomol NMR 2009;43:63–78.
  • 7. Meiler J, Baker D. Rapid protein fold determination using unassigned NMR data. Proc Natl Acad Sci USA 2003;100:15404–15409.
  • 8. Sulkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN. Genomics-aided structure prediction. Proc Natl Acad Sci USA 2012;109:10340–10345.
  • 9. Hirst SJ, Alexander N, McHaourab HS, Meiler J. RosettaEPR: an integrated tool for protein structure determination from sparse EPR data. J Struct Biol 2011;173:506–514.
  • 10. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:3370–3374.
  • 11. Cong Q, Kinch LN, Pei J, Shi S, Grishin VN, Li W, Grishin NV. An automatic method for CASP9 free modeling structure prediction assessment. Bioinformatics 2011;27:3371–3378.
  • 12. Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins 2011;79(Suppl 10):59–73.
  • 13. Vincent JJ, Tai CH, Sathyanarayana BK, Lee B. Assessment of CASP6 predictions for new and nearly new fold targets. Proteins 2005;61(Suppl 7):67–83.
  • 14. Jauch R, Yeo HC, Kolatkar PR, Clarke ND. Assessment of CASP7 structure predictions for template free targets. Proteins 2007;69(Suppl 8):57–67.
  • 15. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins 2007;69(Suppl 8):38–56.
  • 16. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins 2009;77(Suppl 9):50–65.
  • 17. Keedy DA, Williams CJ, Headd JJ, Arendall WB III, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins 2009;77(Suppl 9):29–49.
  • 18. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins 2011;79(Suppl 10):37–58.
  • 19. Kryshtafovych A, Fidelis K, Moult J. CASP9 results compared to those of previous CASP experiments. Proteins 2011;79(Suppl 10):196–207.
  • 20. Kinch LN, Wrabl JO, Krishna SS, Majumdar I, Sadreyev RI, Qi Y, Pei J, Cheng H, Grishin NV. CASP5 assessment of fold recognition target predictions. Proteins 2003;53(Suppl 6):395–409.
  • 21. Tress M, Ezkurdia I, Grana O, Lopez G, Valencia A. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 2005;61(Suppl 7):27–45.
  • 22. Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics 2000;16:566–567.
  • 23. Bostick DL, Shen M, Vaisman II. A simple topological representation of protein structure: implications for new, fast, and robust structural classification. Proteins 2004;56:487–501.
  • 24. Zotenko E, Dogan RI, Wilbur WJ, O’Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol 2007;7:53.
  • 25. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol 1996;6:377–385.
  • 26. Tramontano A, Morea V. Assessment of homology-based predictions in CASP5. Proteins 2003;53(Suppl 6):352–368.
  • 27. Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins 2009;77(Suppl 9):18–28.
  • 28. Hollander M, Wolfe DA. Nonparametric statistical methods. New York: Wiley; 1973. xviii, 503 p.
  • 29. Mukherjee S, Zhang Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res 2009;37:e83.
  • 30. Koike R, Ota M. SCPC: a method to structurally compare protein complexes. Bioinformatics 2012;28:324–330.
  • 31. Mizuguchi K, Go N. Comparison of spatial arrangements of secondary structural elements in proteins. Protein Eng 1995;8:353–362.
  • 32. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004;57:702–710.
  • 33. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–1612.
