Assessment of CASP10 contact-assisted predictions

Todd J Taylor; Hongjun Bai; Chin-Hsien Tai; Byungkook Lee

doi:10.1002/prot.24367

Author manuscript; available in PMC: 2020 Jan 15.

Published in final edited form as: Proteins. 2013 Oct 17;82(Suppl 2):84–97. doi: 10.1002/prot.24367


PMCID: PMC6961783 NIHMSID: NIHMS1066295 PMID: 23873510

Abstract

In CASP10, for the first time, contact-assisted structure predictions have been assessed. Sets of pairs of contacting residues from target structures were provided to predictors for a second round of prediction after the initial round in which they were given only sequences. The objective of the experiment was to measure model quality improvement resulting from the added contact information and thereby assess and help develop so-called hybrid prediction methods—methods where some experimentally determined distance constraints are used to augment de novo computational prediction methods. The results of the experiment were, overall, quite promising.

Keywords: protein structure prediction, CASP10, contact assisted, hybrid prediction

INTRODUCTION

Contact-assisted (or contact-aided, CA) structure predictions have been assessed for the first time in CASP10. For this experiment, in addition to the amino acid sequences, sets of pairs of contacting residues from selected target structures were provided to predictors for a second round of prediction after the initial round, in order to measure the resulting improvement in prediction quality. Ten targets from CASP ROLL (see Tai et al., Assessment of Template-free Modeling in CASP10 and ROLL in this issue) were chosen for the CA experiment, as were 17 targets from CASP10. Relatively few contacts were provided for the former (typically ~5), while more were provided for the CASP10 targets. See Figure 1 for an example structure with contacts shown. Only more difficult targets were chosen, as gross contact constraints, for example CB-CB distance ≤ 8.5Å, should not significantly improve predictions for a target for which a good template exists.

Figure 1. Target T0719-D6 with the 13 provided contacts. The ribbon rendering is rainbow-colored blue to red from the N- to C-termini. Each provided contact is indicated by two balls at the Cα positions of the two contacting residues and a black broken line between them.

The experiment was intended to objectively assess hybrid prediction methods (e.g., Refs. 1–9) where sparse contact constraints, for example from NMR or correlated mutation data, are combined with de novo structure prediction methods. For this initial experiment, however, the contacts come from a simple computational procedure devised by the Prediction Center (see below) to give a set of long range contacts that most predictors missed in the initial, unassisted round. They were not designed to mimic the contacts likely to come from real life experiments, nor those that are expected to be important for folding or function.

To choose the list of contacts for a particular target, the Prediction Center tabulated all contacts in the target with CB-CB separation ≤ 6.5Å. The resulting list was sorted in decreasing order of sequence separation. Proceeding down the sorted list, if a contact was present in ~10%–15% or less of the unassisted predictions, it was included in the list of contacts with the caveat that only one representative was included from any related set of contacts on this list (e.g., (res 72, res 101), (res 70, res 99), (res 69, res 102), etc.).

The procedure was terminated after the number of contacts reached approximately one tenth the number of residues for CASP10 targets, or some small, predetermined number in the case of CASP ROLL targets. This procedure ensured that the most abundant sequence separation between selected contacting residue pairs in each target was typically well above 30. No contacts were included that crossed domain boundaries in multidomain targets.
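The selection procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Prediction Center's actual code: the 15% cutoff, the ±3-residue window used to decide that two contacts are "related", and all function and parameter names are hypothetical, and the exclusion of contacts that cross domain boundaries is not modeled.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]  # (residue i, residue j)


def select_contacts(
    native_contacts: List[Contact],
    predicted_fraction: Dict[Contact, float],
    n_residues: int,
    max_fraction: float = 0.15,      # assumed cutoff; the text gives ~10%-15%
    neighbor_window: int = 3,        # assumed definition of a "related" contact
    per_residue_quota: float = 0.1,  # about one contact per ten residues
) -> List[Contact]:
    """Pick long-range native contacts that most unassisted models missed."""
    quota = max(1, round(n_residues * per_residue_quota))
    # Walk the list from the largest sequence separation downward.
    ordered = sorted(native_contacts, key=lambda c: abs(c[1] - c[0]), reverse=True)
    chosen: List[Contact] = []
    for i, j in ordered:
        if len(chosen) >= quota:
            break
        if predicted_fraction.get((i, j), 0.0) > max_fraction:
            continue  # already present in too many unassisted predictions
        related = any(
            abs(i - a) <= neighbor_window and abs(j - b) <= neighbor_window
            for a, b in chosen
        )
        if related:
            continue  # keep a single representative of each related set
        chosen.append((i, j))
    return chosen
```

With the related set quoted in the text, (72, 101), (70, 99), and (69, 102), only the member with the largest sequence separation survives under these assumptions.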

All 17 CASP10-CA targets were X-ray structures, and all belonged to the all groups prediction track. Six were FM, four TBM, six TBM-hard, and one FM/TBM (See Taylor et al., Definition and Classification of Evaluation Units for CASP10 in this issue). Twenty-four groups submitted a total of 1102 predictions for the CASP10-CA targets. All 10 CASP ROLL-CA targets were X-ray structures and all belonged to the human/server prediction track. Twenty-two groups submitted a total of 395 predictions for the CASP ROLL-CA targets. See Tables I and II for more details.

Table I.

Targets from CASP10 (A) and ROLL (B) Used in the Contact-Assisted Experiment Along with the Number of Residues in the Domain (num res), Prediction Class for CASP10 Targets, PDB ID if Available, and Description of Architecture

A
ID	Num res	Pred class	PDB	Architecture
T0649	184	TBM-hard	4f54	(A/B) 5-strand sheet, 2 big helices on one side
T0653	383	FM/TBM	4fs7	(A/B) bent LRR (SCOP c.10)
T0658-D1	166	FM	4fj6	(B) 4-strand /5-strand barrel-like B-sandwich
T0666	195	FM	NA	(A) 6-helix trans-membrane protein
T0673	62	TBM	4f98	(B) 3-strand sheet
T0676	173	TBM-hard	4e6f	(A/B) 5-strand sheet, 2 big helices on one side
T0678	160	TBM-hard	4epz	(A) alpha-alpha superhelix
T0680	96	TBM	4fm3	(A) big 3-helix bundle
T0684-D1	73	TBM	4gl6	(A+B) 4-strand sheet tied together by 1 helix (SCOP c.37)
T0684-D2	168	FM	4gl6	(A/B) 4-strand sheet surrounded by 4+2 helices
T0691	130	TBM	4gzv	(B) 4-hairpin B-barrel
T0705-D2	344	TBM-hard	4ftd	(B) 6-bladed B-propeller (SCOP b.68)
T0717-D2	166	TBM-hard	4h0a	(A/B) 6-strand sheet with helices on top and one side
T0719-D6	163	FM	4ak1	(B) 4-strand sheet/4-strand sheet B-sandwich + 1 helix
T0734	213	FM	NA	(A/B) two 5-helical bundles joined by a B-ribbon; roughly dumbbell
T0735-D1	233	TBM-hard	4g2a	(B) 7-strand sheet/7-strand sheet B-sandwich
T0735-D2	88	FM	4g2a	(A) small 5-helix domain

B
ID	Num res	PDB	Architecture
Rc001-D1	93	4a0u	(A+B) 5-strand sheet, 3-strand sheet, helix all in a line
Rc001-D2	90	4a0u	(B) 4-strand sheet/4-strand sheet B-sandwich
Rc006-D9	169	4e0e	(A+B) 5-strand sheet/6-strand sheet B-sandwich, tied by 1 helix on both (SCOP b.30)
Rc007	161	4dkc	(A) alpha-alpha superhelix
Rc012-D1	308	4dwe	(A+B) TIM or TIM-like
Rc012-D2	104	4dwe	(B) 4-strand sheet/4-strand sheet B-sandwich
Rc012-D3	48	4dwe	(A) 3-helix bundle
Rc013	121	4ecn	(B) 4-strand sheet/3-strand sheet B-sandwich
Rc014	136	4ecn	(B) 4-strand sheet/4-strand sheet B-sandwich (SCOP d.17)
Rc015	240	4e9k	(B) 4-strand sheet/8-strand sheet B-sandwich plus small helix


Table II.

Number of CASP10 (A) and ROLL (B) Targets for which Contact-Aided and Unaided Predictions were Submitted by Each Group

Group	Name	Type	Aided	Unaided
A
045	Zhang_Ab_Initio	Human	17	17
077	FLOUDAS	Human	15	14
103	PconsM	Server	2	2
108	PMS	Server	17	17
124	PconsD	Server	2	2
179	Lenserver	Server	4	1
198	chuo-fams-server	Server	17	17
201	TsaiLab	Human	10	10
222	MULTICOM-CONSTRUCT	Server	14	14
238	chuo-repack-server	Server	17	17
292	Pcons-net	Server	2	2
294	chuo-repack	Human	17	17
301	LEE	Human	17	17
311	Laufer	Human	11	1
365	chuo-fams	Human	17	17
373	Kim_Kihara	Human	17	17
434	chuo-fams-consensus	Human	16	16
471	chuo-binding-sites	Human	17	16
473	Seok	Human	16	16
477	BAKER	Human	17	17
482	biouv	Human	1	1
489	MULTICOM	Human	10	10
490	Zhang_Refinement	Human	17	17
493	LEEMO	Human	15	15
B
045	Zhang_Ab_Initio	Human	10	10
087	Distill_roll	Server	2	2
110	DAVIS-7	Human	1	1
113	SAM-T08-server	Server	2	2
124	PconsD	Server	10	10
179	Lenserver	Server	3	0
249	Wsb	Human	1	0
261	Seok-server	Server	1	1
267	Pcons	Human	4	4
292	Pcons-net	Server	10	10
298	MidwayFolding	Human	1	1
308	nns	Server	4	4
311	Laufer	Human	1	0
330	BAKER-ROSETTASERVER	Server	2	2
344	Jones-UCL	Human	1	1
381	SAM-T06-server	Server	2	2
413	ZHOU-SPARKS-X	Server	1	0
428	PconsQ	Human	4	4
435	ossia	Human	9	8
438	FALCON-server	Server	1	1
444	Lenregular	Human	7	4
477	BAKER	Human	10	9


MATERIALS AND METHODS

Both in the interest of time and to detect small differences in the improvements between CA predictions that the human eye might miss, we resorted to purely score-based schemes to rank predictions and measure the overall increase in model quality for the contact assisted predictions. No clustering was done before ranking, as in our assessment of FM and ROLL (Tai et al., this issue) predictions, because ranking was purely score based and it was not necessary to minimize the number of models visually assessed. However, we did cluster the models after ranking to see if any server models had been submitted by multiple groups.

Except when we measured the community-wide improvement (see below), we used only one model from each group for a particular target, which was the best model as measured by the score function in use and not necessarily model 1. We chose two scores, GDT-TS,¹⁰ often abbreviated here as GDT, and QCS,^11,12 created for CASP9 FM assessment, to rank the predictions. GDT-TS, an alignment based score, has been a staple in CASP evaluation for years.^12–18 But constructing an alignment for marginal or poor predictions can be problematic, and GDT can be less reliable for such predictions,^12–14,19–21 which is why we also used QCS, a score that is not alignment dependent.

Structure comparison functions not reliant on alignment are not new (e.g., Refs. 22–25) and have been used by previous FM assessors to supplement GDT-TS (e.g., QCS,^11,12 lDDT,¹⁸ and Q¹⁶). We chose QCS to supplement GDT in evaluating the CA experiment because it reproduced our own visual assessments on test sets better than other previously published nonalignment-based scores we evaluated (data not shown). QCS gave results similar to GDT, so we have put much of the QCS data in the Supporting Information in the interest of space. Most, though not all, of the results presented will be those obtained using the GDT score.

To measure community-wide improvement, we performed t-tests for each target on all, not just the best, aided predictions versus all unaided predictions from those groups who submitted both predictions.
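As a rough illustration of the per-target comparison, the sketch below computes a two-sample t statistic for aided versus unaided scores. The text does not say whether an equal-variance or unequal-variance test was used; Welch's unequal-variance form is assumed here, and the function name is hypothetical. A one-tailed P-value would then be read from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(aided: Sequence[float], unaided: Sequence[float]) -> Tuple[float, float]:
    """Return (t, degrees of freedom) for mean(aided) - mean(unaided).

    Assumption: Welch's unequal-variance test; the source does not specify.
    """
    na, nu = len(aided), len(unaided)
    va = variance(aided) / na      # squared standard error of the aided mean
    vu = variance(unaided) / nu    # squared standard error of the unaided mean
    t = (mean(aided) - mean(unaided)) / math.sqrt(va + vu)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vu) ** 2 / (va ** 2 / (na - 1) + vu ** 2 / (nu - 1))
    return t, dof
```

A positive t indicates a higher mean for the aided models; the one-tailed test asks whether it is large enough to reject the hypothesis of no improvement.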

We measured group performance several ways. First, we defined two simple quantities, the absolute improvement and individual improvement.

Define A(g, T) as the best CA prediction by group g for target T, and U(h, T) as the best unaided prediction by group h for target T. Let S be a target-prediction scoring function. Then the absolute improvement under S of group g with respect to target T is defined as

I_{a}(g, T) = S(A(g, T)) - \max_{h} S(U(h, T))

in other words, the score of the best assisted model submitted by group g for target T minus the score of the best unassisted model for T from among all groups. Similarly, the individual improvement of group g with respect to target T is defined as

I_{i}(g, T) = S(A(g, T)) - S(U(g, T))

in other words, the difference between the scores of the best assisted and best unassisted models submitted by group g for target T. We computed the Z-scores of these two quantities and then ranked each group both by the sum and by the average of its Z-scores over all predictions. We set negative Z-scores to 0 before summing or averaging, as most recent CASP assessors have done, so as not to penalize bad predictions more severely than nonsubmissions.
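A minimal sketch of these two quantities and of the floored Z-scores, for a single target, might look as follows. The source does not state which population the Z-scores were normalized over or whether the population or sample standard deviation was used; per-target normalization across groups with the population standard deviation is assumed here, and all names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict

Scores = Dict[str, float]  # group id -> best score of that group on one target


def absolute_improvement(aided: Scores, unaided: Scores) -> Scores:
    """Best aided score of each group minus the best unaided score of any group."""
    best_unaided = max(unaided.values())
    return {g: s - best_unaided for g, s in aided.items()}


def individual_improvement(aided: Scores, unaided: Scores) -> Scores:
    """Best aided minus best unaided score of the same group.

    Groups without an unaided model for this target are skipped.
    """
    return {g: s - unaided[g] for g, s in aided.items() if g in unaided}


def floored_z_scores(values: Scores) -> Scores:
    """Z-score the improvements across groups; negative Z-scores are set to 0."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # assumption: population standard deviation
    if sigma == 0:
        return {g: 0.0 for g in values}
    return {g: max(0.0, (v - mu) / sigma) for g, v in values.items()}
```

Summing or averaging the floored Z-scores of a group over its targets then gives ranking quantities of the kind described above.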

We also measured group performance by running head-to-head paired t-tests on the best assisted predictions for all common targets between each pair of predictors, as previous CASP assessors have done.^12,26,27 Predictors were ranked by the number of statistically significant (at a 5% level) wins.

As a final measure of the performance of group G, we counted the total number of models submitted by all other groups, counting only the best model from each group, that were inferior (wins) or superior (losses) to the best models from G. This sum was taken over all targets. For example, suppose group G beat groups H and I on target T1, and that G, H, and I were the only groups who predicted T1. Further suppose that G beat groups H and J on target T2, and G, H, and J were the only groups who predicted T2. Finally, suppose that G lost to H and J on target T3, where again G, H, and J were the only groups who predicted T3. We tally four wins and two losses for group G among the total of six possible head-to-head trials. The advantage of this score is that it can seamlessly handle uneven numbers of predicting groups for different targets and that it can be compared with the outcome from a null hypothesis of random wins. Assuming that the probability of a win per trial for group G is p₀ and uniform over all trials, and assuming that ties happen with negligible frequency, the probability that G will have exactly k wins is:²⁸

p (k, n) = \frac{n!}{k! (n - k)!} p_{0}^{k} (1 - p_{0})^{n - k}

where n is the total number of wins and losses (e.g., six in the above example). Under the null hypothesis that G performs no better or worse on average than any other group, p₀ = ½. The probability P that G will have at least k wins is given by:

P = \sum_{i = k}^{n} p (i, n)
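The tally and the binomial tail probability can be sketched as below, using the worked example from the text (four wins and two losses for group G). The numerical scores are invented purely to reproduce that example, and the function names are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def tally(group: str, best_scores: Dict[str, Dict[str, float]]) -> Tuple[int, int]:
    """Count wins and losses of `group` against every other group, over all targets."""
    wins = losses = 0
    for scores in best_scores.values():
        if group not in scores:
            continue
        for other, score in scores.items():
            if other == group:
                continue
            if score < scores[group]:
                wins += 1
            elif score > scores[group]:
                losses += 1  # exact ties are ignored
    return wins, losses


def prob_exactly(k: int, n: int, p0: float = 0.5) -> float:
    """p(k, n): probability of exactly k wins in n trials."""
    return comb(n, k) * p0 ** k * (1.0 - p0) ** (n - k)


def prob_at_least(k: int, n: int, p0: float = 0.5) -> float:
    """P: probability of at least k wins in n trials."""
    return sum(prob_exactly(i, n, p0) for i in range(k, n + 1))


# Hypothetical best scores reproducing the T1/T2/T3 example in the text.
example = {
    "T1": {"G": 50.0, "H": 40.0, "I": 30.0},
    "T2": {"G": 50.0, "H": 45.0, "J": 20.0},
    "T3": {"G": 30.0, "H": 40.0, "J": 50.0},
}
wins, losses = tally("G", example)          # (4, 2)
p = prob_at_least(wins, wins + losses)      # 22/64 = 0.34375
```

Under the null hypothesis p₀ = ½, a record of four wins in six trials is therefore unremarkable, whereas a record of 14 wins and no losses has probability (½)¹⁴, or about 6.1 × 10⁻⁵.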

RESULTS

Community-wide improvement and the number of contacts required to improve predictions

The best GDT-TS, abbreviated as GDT, and the best QCS scores by each prediction group with and without contact information for all the targets are given in the Supporting Information ST1–ST8.

The community-wide t-tests (Table IIIA) show significant improvement in the mean scores resulting from contact information for all 17 CASP10-CA (Tc) targets. The box plot of GDT scores for Tc targets [Fig. 2(a)] shows that the median score increased in all 17 cases and the best score improved by at least ~8 GDT points in all but two cases after including the contact information. The bar plot of the absolute improvements (see Materials and Methods) of the five best predictions for each Tc target [Fig. 3(a)] also shows improvements in most cases. Many top predictions improved by 20 GDT points or more. The results from using the QCS score are similar (see Supporting Information ST9–ST12, SF1–SF6).

Table III.

One-Tailed t-tests of GDT Scores of all (not just the Best) Assisted Versus All Unassisted Models for CASP10 (A) and CASP ROLL (B) Targets from the Groups Submitting both Types of Models

ID	Num res	nc	Num_T0	Num_Tc	Mean_GDT_T0	Mean_GDT_Tc	P-value
A
Tc649	184	16	87	76	22.31	24.51	0.0394
Tc653	383	12	75	63	22.35	25.59	2.60 E –03
Tc658-D1	166	16	71	55	13.26	20.51	3.30 E –08
Tc666	195	14	80	71	22.20	31.34	8.45 E –11
Tc673	62	5	78	59	39.60	47.10	6.10 E –06
Tc676	173	17	95	76	23.83	31.99	2.90 E –10
Tc678	160	12	84	71	26.82	37.41	1.30 E –13
Tc680	96	3	82	64	48.39	61.69	1.17 E –08
Tc684-D1	73	8	85	72	26.32	37.77	1.96 E –10
Tc684-D2	168	18	85	72	16.81	24.35	6.16 E –12
Tc691	130	15	84	69	30.24	34.56	4.70 E –04
Tc705-D2	344	34	70	56	23.34	28.69	3.64 E –04
Tc717-D2	166	15	71	56	24.74	31.53	8.62 E –04
Tc719-D6	163	13	74	66	13.57	16.53	9.33 E –04
Tc734	213	20	72	54	14.13	21.83	8.30 E –07
Tc735-D1	233	28	81	61	14.43	27.46	9.41 E –14
Tc735-D2	88	7	81	61	28.94	35.23	2.03 E –06
B
Rc001-D1	93	0	42	40	23.20	22.55	0.72
Rc001-D2	90	4	37	42	25.28	25.25	0.51
Rc006-D9	169	4	61	72	18.47	17.11	0.82
Rc007	161	4	74	74	23.09	24.00	0.24
Rc012-D1	308	6	27	26	14.27	13.77	0.59
Rc012-D2	104	4	25	25	18.80	18.98	0.45
Rc012-D3	48	0	26	26	43.81	49.53	0.09
Rc013	121	6	22	26	24.45	25.02	0.36
Rc014	136	6	15	33	20.54	24.36	0.030
Rc015	240	10	24	26	10.08	11.81	0.020


Cases where the assisted means are significantly greater (with P-value < 0.05) are shaded. Abbreviations are nc: number of contacts; num_T0 and num_Tc: number of unassisted and assisted models, respectively, in the t-test; mean_GDT_T0 and mean_GDT_Tc: mean GDT scores of unassisted and assisted models, respectively.

Figure 2. Box plots of GDT-TS scores of assisted (red) and unassisted (green) predictions for CASP10 (top, a) and CASP-ROLL (bottom, b) targets. The boxes show the 25th to 75th percentiles and the whiskers extend to the 0th and 100th percentiles. The medians are indicated by the short horizontal bars. These plots include all predictions, not just the best.

Figure 3. The five best absolute (upper left and lower left) and individual (upper right and lower right) improvements for Tc (top) and Rc (bottom) targets under GDT-TS. The predictor ID numbers are shown below the bars.

On the other hand, Table IIIB for CASP ROLL (Rc) targets shows that there was no significant community-wide gain in ROLL target prediction scores from including contact information, except for Rc014 and Rc015. The box plot [Fig. 2(b)] and the bar plot of absolute improvements [Fig. 3(c)] show that improvements were made for Rc014 and Rc015 by a few at the very top, but that the median scores were essentially unchanged for these targets as well. A larger number of contacts were provided for Rc014 and Rc015 than for most other Rc targets, and for these two there was improvement at least for the very best predictions. Clearly, the relatively few contacts provided for the Rc targets were not enough to broadly improve predictions, while the larger number provided for the Tc targets usually did improve predictions.

Figure 4 shows the scatter plot of the best absolute improvement in GDT achieved for all Tc and Rc targets against the number of provided contacts per residue. The overall trend is increasing improvement with increasing number of contacts per residue, but with much variation. Despite a relatively large number of provided contacts, predictors as a group did not improve the quality of the best models for Tc649 and Tc691. Conversely, they achieved considerable improvement from relatively few contacts for Tc653, Tc680, and Rc014. The case of Tc680 is understandable because of its simple 3-helix bundle architecture: if the helices are predicted correctly, only a very few contacts are needed to restrict a prediction to the correct structure. However, predictors were also told that this target is a tetramer and were given additional interchain contacts, not counted in Figure 4, which may have contributed to the improvement of the predicted monomer structures. Tc653 is a unique case, for which “negative” contacts, that is, pairs of residues that were not in contact, were provided. The better models for this curved LRR, both assisted and unassisted, are clearly leucine-rich repeats. However, the GDT scores of the unassisted models are low because they, like all searchable LRR template structures, are bent in the opposite direction to that in the target. The twelve negative contacts given, relatively few for such a large target, are apparently enough to constrain the direction and degree of curvature and so significantly improve the best prediction, although the improvement did not extend to the predictions more broadly [see Fig. 3(a)]. Figure 4 together with Tables IIIA and IIIB indicates that the number of contacts per residue at which we begin to see significant improvement is in the range ~0.04–0.06.

Figure 4. The best absolute improvement in GDT scores achieved for all Tc (circles) and Rc (triangles) targets plotted against the number of contacts provided per residue.
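As a small worked example of the horizontal axis of Figure 4, the contact density of the targets singled out above can be recomputed from the nc and Num res columns of Table III. The variable and function names below are illustrative only; note that for Tc653 the 12 entries are negative contacts.

```python
# (number of contacts, number of residues), copied from Table III.
TARGETS = {
    "Tc649": (16, 184),
    "Tc691": (15, 130),
    "Tc653": (12, 383),  # negative contacts
    "Tc680": (3, 96),
    "Rc014": (6, 136),
}


def contacts_per_residue(n_contacts: int, n_residues: int) -> float:
    """Contact density as plotted on the horizontal axis of Figure 4."""
    return n_contacts / n_residues


density = {
    name: contacts_per_residue(nc, n_res) for name, (nc, n_res) in TARGETS.items()
}
for name, value in sorted(density.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.3f} contacts per residue")
```

This gives roughly 0.031 for Tc680 and Tc653 and 0.044 for Rc014, against 0.087 for Tc649 and 0.115 for Tc691, which makes concrete why the first three stand out as large improvements from few contacts and the last two as small improvements from many.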

Per group improvement

The performance of individual groups relative to other predictors is shown in Table IV, which gives the sums and averages of the GDT and QCS Z-scores; in Table V, which gives win counts over all predictors and all targets in terms of the GDT scores; in Supporting Information ST13–ST16, which give the corresponding win counts based on the QCS scores; and in Table VI, which gives the win counts in head-to-head comparisons with other predictors. The actual improvements are given in Figure 3 in terms of the GDT scores for the top five predictions for each target, and in Supporting Information SF3–SF6 in terms of the QCS scores.

Table IV.

Per Group Sums and Means of Z-Scores over both GDT and QCS Scores (e.g., if a Group Submitted 17 predictions, there are 34 Terms in the Sum, 17 GDT Z-scores, and 17 QCS Z-scores)

Gr#	Sum Z	Mean Z	#Target submit
A
477	67.14	1.97	17
045	25.66	0.75	17
301	19.11	0.56	17
108	19.11	0.56	17
490	19.06	0.56	17
311	9.49	0.43	11
077	9.37	0.31	15
222	9.31	0.33	14
493	8.48	0.28	15
238	5.85	0.17	17
294	5.70	0.17	17
124	4.67	1.17	2
198	4.35	0.13	17
103	4.18	1.04	2
292	3.59	0.90	2
365	3.55	0.10	17
471	3.54	0.10	17
434	3.21	0.10	16
373	3.00	0.09	17
489	2.74	0.14	10
473	1.24	0.04	16
201	1.08	0.05	10
482	0.00	0.00	1
179	0.00	0.00	4
B
477	47.54	1.40	17
471	28.39	0.89	16
493	19.79	0.66	15
108	18.91	0.56	17
045	17.30	0.51	17
301	15.74	0.46	17
238	15.05	0.44	17
198	14.81	0.44	17
222	11.91	0.43	14
077	10.14	0.36	14
124	6.31	1.58	2
490	5.95	0.18	17
201	4.44	0.22	10
473	4.12	0.13	16
103	2.85	0.71	2
292	2.08	0.52	2
434	0.14	0.00	16
294	0.14	0.00	17
489	0.11	0.01	10
373	0.04	0.00	17
482	0.00	0.00	1
365	0.00	0.00	17
311	0.00	0.00	1
179	0.00	0.00	1
C
477	20.52	1.03	10
045	16.20	0.81	10
435	10.53	0.59	9
413	4.16	2.08	1
330	3.83	0.96	2
124	3.13	0.16	10
428	2.52	0.31	4
381	2.07	0.52	2
444	0.88	0.06	7
087	0.57	0.14	2
292	0.52	0.03	10
267	0.52	0.07	4
110	0.21	0.10	1
308	0.06	0.01	4
249	0.06	0.03	1
438	0.00	0.00	1
344	0.00	0.00	1
311	0.00	0.00	1
298	0.00	0.00	1
261	0.00	0.00	1
179	0.00	0.00	3
113	0.00	0.00	2
D
435	12.79	0.80	8
045	9.69	0.48	10
124	7.03	0.35	10
477	6.34	0.35	9
330	4.94	1.24	2
292	4.41	0.22	10
308	2.95	0.42	4
444	1.78	0.22	4
428	1.74	0.22	4
267	1.01	0.13	4
113	0.81	0.20	2
087	0.62	0.16	2
261	0.58	0.29	1
438	0.51	0.26	1
381	0.38	0.10	2
344	0.15	0.07	1
110	0.03	0.01	1
298	0.00	0.00	1


The tables show absolute (A,C) and individual (B,D) improvements on Tc (A,B) and Rc (C,D) targets. Absolute improvement is a group’s best assisted score minus the community-wide best unassisted score. Individual improvement is a group’s best assisted score minus the same group’s best unassisted score. Any Z-score less than 0 was set to 0. Note that the number of predictions submitted by a group can differ between these two tables as some groups submitted an assisted prediction but no corresponding unassisted predictions.

Table V.

Overall Win/Loss Counts for Each Group in All-Against-All Pair-wise Comparisons of Assisted Predictions for the Tc (VA) and Rc (VB) Targets using GDT Score

Group	Num pred	Wins	Losses	Fraction wins	P
A
477	17	272	14	0.951	<2.2 E –16
045	17	211	76	0.735	3.7 E –16
301	17	197	72	0.732	6.6 E –15
108	17	197	72	0.732	6.6 E –15
124	2	36	2	0.947	2.7 E –09
103	2	35	2	0.946	5.1 E –09
490	17	192	96	0.667	8.1 E –09
292	2	34	3	0.919	6.2 E –08
222	14	124	116	0.517	0.33
311	11	95	90	0.514	0.38
077	15	123	132	0.482	0.73
493	15	113	143	0.441	0.97
238	17	110	140	0.440	0.98
294	17	103	154	0.401	0.9994
198	17	94	148	0.388	0.9998
434	16	87	138	0.387	0.9998
489	10	60	104	0.366	0.9998
471	17	86	158	0.352	1
365	17	78	162	0.325	1
373	17	81	206	0.282	1
201	10	49	126	0.280	1
473	16	64	208	0.235	1
179	4	7	67	0.095	1
482	1	0	19	0.000	1
B
045	10	59	18	0.766	1.5 E –06
413	1	14	0	1.000	6.1 E –05
330	2	24	5	0.828	2.7 E –04
477	10	52	25	0.675	1.4 E –03
428	4	28	16	0.636	4.8 E –02
435	9	41	32	0.562	0.17
249	1	6	3	0.667	0.25
110	1	9	6	0.600	0.30
124	10	38	34	0.528	0.36
087	2	15	14	0.517	0.50
311	1	3	3	0.500	0.66
308	4	22	25	0.468	0.72
344	1	5	8	0.385	0.87
381	2	12	17	0.414	0.87
261	1	4	11	0.267	0.98
267	4	14	29	0.326	0.9931
444	7	20	41	0.328	0.9978
113	2	7	22	0.241	0.9988
292	10	23	48	0.324	0.9991
179	3	0	16	0.000	1
298	1	0	9	0.000	1
438	1	0	14	0.000	1


P is the probability that a win/loss record equal to or better than the observed record could have been obtained by chance.

Table VI.

Number of Wins in Head-to-Head Pairwise Comparisons of Predicting groups.

Group	Num pred	Num wins
A
477	17	38
045	17	32
490	17	28
108	17	26
301	17	25
222	14	12
124	2	8
238	17	7
294	17	7
493	15	6
103	2	6
292	2	6
077	15	5
198	17	5
311	11	4
434	16	3
471	17	3
201	10	2
365	17	2
373	17	1
473	16	1
489	10	1
179	4	0
482	1	0
B
045	10	11
124	10	8
428	4	6
435	9	4
477	10	4
292	10	1
330	2	1
087	2	0
110	1	0
113	2	0
179	3	0
249	1	0
261	1	0
267	4	0
298	1	0
308	4	0
311	1	0
344	1	0
381	2	0
413	1	0
438	1	0
444	7	0


The pairwise comparison is made by a t-test of contact-assisted GDT or QCS scores for all common Tc (VIA) or Rc (VIB) targets. A group wins the pairwise comparison if its set of scores is significantly (at the 5% significance level) better than its competitor’s by the t-test. The number of wins reported is the sum of the number of wins under GDT and that under QCS.

For the Tc targets, Tables IV–VI show that group 477 (BAKER) performed better than all others by wide margins. Figure 3(a,b) also shows that this group made larger improvements than others for the largest number of targets. For example, for Tc734, this group used the contact information to improve their GDT score by more than 40 [Fig. 3(b)] and produce a high quality model (Fig. 5), which they could not do without the contact information.

Figure 5. Top row: the target T0734 (left), the best assisted model Tc734TS477_4 (middle, GDT = 56.34), and the best unassisted model T0734TS108_4 (right, GDT = 18.31). Contact information resulted in much improvement in model quality here. Middle row: the target T0691 (left), the best assisted model Tc691TS222_5 (middle, GDT = 42.91), and the best unassisted model T0691TS301_1 (right, GDT = 45.03). Contact information resulted in no improvement in model quality here. Bottom row: the target T0680 (left), the best assisted model Tc680TS477_4 (middle, GDT = 89.6), and the best unassisted model T0680TS489_1 (right, GDT = 75.5). All molecules are rainbow colored blue to red from the N- to C-termini. Models were optimally superimposed to the target, and then separated by translations along the horizontal direction.

The next group of predictors who used the contact information better than others are 045 (Zhang_Ab_Initio), 301 (LEE), 108 (PMS), and 490 (Zhang_Refinement) by all three measures although the rank order varied depending on the particular measure used. Groups 471 (chuo-binding-sites) and 493 (LEEMO) made more improvement over their own unassisted predictions than most others (Table IVB) but were not among the top performers when compared with the best unassisted predictions (Table IVA). Groups 124 (PconsD), 103 (PconsM), and 292 (Pcons-net) produced highly competitive models (high mean Z-scores in Tables IVA and IVB and high win fractions in Table V) but submitted for only two targets.

For the Rc targets, group 477 (BAKER) again made the most absolute improvement, followed by 045 (Zhang_Ab_Initio) and 435 (ossia). Figure 3(c,d) show that, while a number of predictors made substantial improvements over their own predictions by using the contact information, the improved models were not any better than the best unassisted models for most targets. Figure 3(c) shows that improvements over the best unassisted models (positive absolute improvements) were made only by the above three groups. Tables VB and VIB show that there are other groups who did well in terms of win counts. However, their winning models are not better than the best unassisted models.

Selected examples

Figure 3(a) shows that most improvement was made for T0/Tc734 and that no absolute improvement was made for T0/Tc691 by any predictor. As pointed out above already, the best contact-assisted model for the primarily helical T0/Tc734 structure (Fig. 5) is impressively better than the best unassisted model.

Figure 5 shows, for T0/Tc691, the target structure and the best assisted and best unassisted predictions. The GDT scores of the two predictions are 42.9 and 45.0, respectively. In contrast to the large improvement seen for T0/Tc734 (Fig. 5), the best assisted model in this case is slightly better than the unassisted model from the same group but poorer than the best unassisted model. It is interesting to speculate whether the degree of improvement correlates with protein architecture, but there are too few targets to make any strong claims.

Figure 2(a) shows that the contact-assisted model that has the highest GDT score (89.6) is for T0/Tc680. The median score (72) is also the highest for this target. The best assisted and unassisted models for this target are shown in Figure 5. Many models for this target are so close to the target structure that they begin to be similar among themselves. For example, taking only the best assisted models from each group, there were 51 pairs of models (out of a total of 153 possible pairs) that were highly similar (RMSD < 3Å) but non-identical (RMSD > 0.1Å). Each member of these pairs had an RMSD with respect to the target no greater than 4.52Å, with the mean being 3.5Å. This number of 51 highly similar model pairs for Tc680 is more than three times as many as for any other CA target and also more than three times as many as for T0680.

Correlation of scores with the fraction of satisfied contacts

Table VII shows the fraction of provided contacts satisfied (averaged over all contacts and predictions) for each Tc target (frac_Tc) and the T0 predictions (frac_T0) from the groups who also predicted Tc. The fraction satisfied for T0 is quite low, at or below 10%. This is because the Prediction Center deliberately chose contacts for the CA experiment that were mostly missed in the unassisted predictions. More interestingly, the frac_Tc numbers show that, even knowing a list of contacts from the target structure, predictors were able to satisfy only about half of these contacts in their models, on average. It should be noted that there are many predictions that satisfy much more than half, and indeed some that satisfy all of them.

Table VII.

Correlation Coefficients of the Fraction of Contacts Satisfied (Defined as CB-CB < 6.5Å or 8.5Å) and GDT-TS or QCS Scores for Assisted Predictions for each CASP10 Target

Target	nc	Median GDT	Corr/GDT 6.5 Å	Corr/GDT 8.5 Å	Median QCS	Corr/QCS 6.5 Å	Corr/QCS 8.5 Å	Frac T0	Frac Tc
Tc649	16	27.65	0.015	0.010	60.09	0.041	0.11	0.027	0.42
Tc653	12	22.98	−0.30	−0.31	38.85	−0.15	−0.16	0.16	0.54
Tc658-D1	16	19.43	0.78	0.65	42.25	0.74	0.65	0.034	0.38
Tc666	14	30.00	0.76	0.62	47.45	0.69	0.57	0.063	0.44
Tc673	5	45.56	0.21	−0.13	60.60	0.15	−0.20	0.086	0.68
Tc676	17	32.80	0.39	0.38	58.32	0.45	0.30	0.048	0.49
Tc678	12	38.47	0.71	0.83	61.97	0.40	0.72	0.087	0.59
Tc680	3	59.12	0.15	0.082	67.98	0.13	0.11	0.012	0.59
Tc684-D1	8	36.82	0.52	0.70	45.99	0.46	0.68	0.082	0.54
Tc684-D2	18	22.77	0.79	0.81	46.59	0.78	0.83	0.033	0.49
Tc691	15	36.88	−0.11	0.11	67.36	−0.16	0.084	0.10	0.61
Tc705-D2	34	32.31	−0.063	0.42	61.05	−0.018	0.48	0.067	0.49
Tc717-D2	15	30.57	0.70	0.39	54.14	0.70	0.22	0.045	0.57
Tc719-D6	13	14.11	0.75	0.40	35.55	0.75	0.44	0.050	0.55
Tc734	20	18.02	0.87	0.48	40.83	0.86	0.52	0.024	0.51
Tc735-D1	26	31.65	0.47	0.42	58.49	0.54	0.48	0.049	0.45
Tc735-D2	7	31.82	0.67	0.64	37.60	0.73	0.69	0.062	0.53


The final two columns: frac_T0, the fraction of contacts satisfied in unassisted predictions (averaged over all contacts and all models) and frac_Tc, the fraction of contacts satisfied in assisted predictions. Note that column nc shows number of contacts provided and that, for Tc653, the nc column gives the number of non-contacts and the last two columns indicate the absence of contacts in the models. These statistics are for all predictions, not just the best from each group.

In good models, many provided contacting residue pairs are indeed in contact according to the criterion that their Cβ-Cβ distance be less than 6.5–8.5Å. But the converse is often not true; that is, a model that satisfies all of the given contacts is not necessarily good overall. We counted the fraction of provided contacts satisfied in each model and calculated the correlation coefficient between these numbers and the GDT or QCS scores of all assisted models for each target. Table VII shows the correlations for all Tc targets. For at least five of the 17 targets, the correlation coefficient is less than 0.2, including the bottom 3 Tc targets in Figure 4. Even when the correlation coefficient is relatively high, poor models exist that have many contacts satisfied. As an example, Figure 6 shows two models for Tc719-D6, both of which satisfy all the provided contacts. But one (477_2, GDT = 46.93) is clearly superior to the other (301_5, GDT = 18.86). To build a good model, it is not sufficient to simply satisfy the relatively few contact constraints given in this experiment.

The target T0719-D6 (left), the best assisted model Tc719TS477_2-D6 (middle, GDT = 46.93), and another assisted model Tc719TS301_5-D6 (right, GDT = 18.86). Both models satisfy all contacts (CB-CB < 8.5Å), although Tc719TS477_2-D6 is clearly better than Tc719TS301_5-D6.
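The contact-satisfaction statistic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the assessors' actual code: the 8.5Å cutoff is taken from the figure caption, the correlation is assumed to be the Pearson coefficient (the text says only "correlation coefficient"), and the function names and the input layout (a dictionary from residue number to Cβ coordinates) are hypothetical.

```python
import math

def fraction_satisfied(cb_coords, contacts, cutoff=8.5):
    """Fraction of provided contacts whose CB-CB distance is below the cutoff.

    cb_coords: dict mapping residue number -> (x, y, z) of the CB atom (assumed layout)
    contacts:  list of (i, j) residue-number pairs provided to predictors
    cutoff:    distance threshold in Angstrom (8.5 assumed, as in the caption)
    """
    satisfied = sum(
        1 for i, j in contacts
        if math.dist(cb_coords[i], cb_coords[j]) < cutoff
    )
    return satisfied / len(contacts)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For one target, one would compute `fraction_satisfied` for every assisted model and pass those fractions, together with the models' GDT (or QCS) scores, to `pearson`.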

Redundancy among predictions

As mentioned before, we clustered the models after ranking to see if any server models had been submitted by multiple groups. The RMSD distance matrices used to do the clustering may be more instructive than the clustering itself. There were between 16 and 20 groups who submitted predictions for each Tc target. Considering only the best prediction from each group for a particular target, there are therefore between 120 and 190 pairs of best predictions for each target (number of pairs = N_i = n_i(n_i − 1)/2 where n_i is the number of groups predicting target i), and we computed the RMSD for each such pair. There were 2603 such pairs for all 17 CASP10 CA targets combined. Removing those from Tc680, which was a special case as detailed in a previous section, and eliminating pairs with RMSD > 3Å leaves 208. For 186 of these 208 pairs, both models have distinct group id’s, but both group id’s are associated with the same research group (e.g., LEE group 301 and PMS group 108). The remaining 22 pairs consist of a model from one research group paired with several identical models all from one other research group (with different group IDs, at least one of which is a server) for each of the targets Tc666, Tc673, Tc676, Tc678, and Tc684-D1. It appears, therefore, that five server models were copied, each by only one other research group. This is an amount of presumed copying less than half that detectable in the T0 submissions for the Tc targets (data not shown), and an amount small compared with those among FM models (See Tai et al., Assessment of Template-free Modeling in CASP10 and ROLL in this issue).
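The pair counting used above can be checked with a short sketch. The helper names and the dictionary-of-dictionaries layout for the RMSD matrix are assumptions made for illustration; only the formula n_i(n_i − 1)/2 and the 3Å cutoff come from the text.

```python
from itertools import combinations

def n_pairs(n_groups):
    """Number of unordered pairs among n_groups best predictions: n(n-1)/2."""
    return n_groups * (n_groups - 1) // 2

def similar_pairs(rmsd, cutoff=3.0):
    """Return the group pairs whose RMSD does not exceed the cutoff.

    rmsd: dict mapping group id -> dict mapping group id -> RMSD in Angstrom
          (a symmetric matrix; this layout is a hypothetical choice)
    """
    groups = sorted(rmsd)
    return [(a, b) for a, b in combinations(groups, 2) if rmsd[a][b] <= cutoff]

print(n_pairs(16), n_pairs(20))  # 120 190
```

With 16 to 20 groups per target, this reproduces the stated range of 120 to 190 pairs per target.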

Contact-assisted predictions for the T0680 tetramer

In addition to the three intrachain contacts provided for the contact assisted monomeric target Tc680, the Prediction Center also told predictors that this target was a tetramer and provided six interchain contacts. Predictors were asked to submit contact-assisted models for the tetramer. This was the only target in the CA experiment for which oligomer predictions were solicited.

Considering the great abundance of programs to compare protein monomers, there are surprisingly few programs available designed to compare multichain complexes, and hence deal with the added complication of chain id ambiguity and cross-chain alignments,²⁹ where a whole chain can be aligned to parts of two other chains. We chose two such programs, MM-align²⁹ and SCPC,³⁰ to score Tc680o predictions (Table VIII). MM-align is not ideal for this task, however, as it is designed to compare complexes with non-identical sequences and therefore inserts gaps. Neither is SCPC ideal, since it compares constituent subunits of complexes by matching secondary structure elements, but the matches need not respect topology.³¹ We want a sequence dependent alignment for complexes, as we are dealing with structure predictions.

Table VIII.

Scoring Tc680o Predictions using GDT and QCS on Tetramers Converted to Pseudo-Monomers with the Predictions Sorted in Decreasing Order of GDT Score

Model	GDT	Perm (GDT)	QCS	Perm (QCS)	MM	SCPC	RMSD (TM)	Perm (RMSD)	RMSD (MM)	SD top 4 perm GDT
477_5o	75.59	DCBA	96.82	BADC	0.89	86.24	2.32	ABCD	2.20	0.00
477_2o	69.01	DCBA	95.38	DCBA	0.87	83.07	2.68	ABCD	2.74	0.00
477_4o	66.97	DCBA	96.29	DCBA	0.86	64.02	2.81	ABCD	2.77	0.00
477_3o	54.21	DCBA	95.81	CDAB	0.74	74.60	4.47	ABCD	4.50	0.00
477_1o	47.04	DCBA	88.77	CDAB	0.71	48.15	6.64	ABCD	4.12	0.00
045_2o	43.82	ABCD	82.92	CDAB	0.65	10.85	6.47	BADC	4.67	0.38
490_5o	42.11	BADC	83.24	ABCD	0.66	29.63	6.25	ABCD	5.32	0.42
490_1o	38.75	DCBA	81.19	ABCD	0.66	29.10	6.53	DCBA	4.73	0.18
490_3o	37.63	BCDA	83.79	CBAD	0.38	13.23	7.26	CBAD	5.40	0.48
490_2o	37.30	ABCD	83.48	DCBA	0.62	34.92	6.83	BADC	5.31	0.27
490_4o	33.55	BADC	80.72	DCBA	0.62	11.38	8.39	DCBA	5.01	0.33
045_3o	33.09	DCBA	81.80	DCBA	0.57	17.99	7.65	DCBA	5.96	0.33
045_5o	32.83	DCBA	77.94	ABCD	0.61	20.11	7.89	DCBA	5.46	0.22
045_4o	32.50	CDAB	81.96	DCBA	0.61	27.51	8.58	DCBA	5.07	0.05
045_1o	31.51	DCBA	77.11	ABCD	0.61	18.52	8.10	DCBA	5.23	0.14
01_1013	28.68	BADC	59.88	BADC	0.51	5.03	22.30	CADB	5.00	0.22
108_1o	28.68	BADC	59.88	BADC	0.51	5.03	22.30	CADB	5.00	0.22
493_1o	28.68	BADC	59.88	BADC	0.51	5.03	22.30	CADB	5.00	0.22
077_1o	25.86	DABC	61.90	CBAD	0.35	13.49	12.99	DABC	5.93	0.20
077_4o	23.42	ABCD	60.99	CDAB	0.44	5.82	13.95	BADC	6.07	0.11
077_2o	22.96	BADC	60.19	DCBA	0.44	8.99	14.00	BCDA	6.02	0.30
01_1012	22.30	DCBA	51.32	CDAB	0.37	1.32	14.18	DABC	5.33	0.18
077_5o	21.65	DABC	59.53	CBAD	0.34	7.41	13.55	DCBA	5.92	0.07
077_3o	21.58	CDAB	59.86	BADC	0.42	4.23	13.38	BCDA	5.61	0.06
201_3o	19.74	BADC	50.80	DCBA	0.33	6.61	13.83	DABC	6.65	0.13
201_2o	19.28	DCAB	51.23	DBAC	0.29	NA	15.07	DBAC	6.74	0.16

Results from MM-align and SCPC, designed specifically to compare multichain complexes, are also shown. The RMSD, calculated by TM-score, on the sequence dependent, ungapped alignment and RMSD from the gapped MM-align alignment are also shown. GDT, QCS and RMSD were obtained by considering the T0680 tetramer target and the oligomer predictions as single chains by concatenating the chains and scoring them as pseudo-monomers. All 24 possible orderings of the prediction’s chain id’s in the pseudo-monomer were scored, and the highest scoring one is shown along with that permutation (in the perm column). The SD top 4 perm GDT column is explained in the text. The target pseudo-monomer was simply constructed by concatenating chains A, B, C, and D in that order. MM-align and SCPC were run on the unaltered tetrameric target/predictions. NA indicates that the scoring program failed to give a score.

We also scored such oligomeric predictions in a more straightforward way: we treated each tetramer as a pseudo-monomer by concatenating all monomers into one chain, renumbering residues, and merging chain id's, and then applied conventional structure comparison functions. As the chain id's of prediction and target will not in general be the same, one must score all permutations of chain id's in the prediction against the target.²⁹ Cross-chain alignments²⁹ can be avoided by performing sequence-dependent alignments.

We used the score functions GDT-TS (sequence dependent mode), QCS,¹¹ and TM-score³² (the sequence dependent version of TM-align, used here only to calculate RMSD) to score Tc680o predictions with chains concatenated to form pseudo-monomers (Table VIII). Nonalignment-based comparison methods, like QCS, obviously cannot give cross-chain alignments, and using the sequence dependent mode of the other two scores again prohibits cross-chain alignment. For each prediction, all 24 permutations of chain order were converted to pseudo-monomers and were scored against the target, similarly converted to a pseudo-monomer with chain order ABCD. The top scoring such permutation gave the score for that tetramer prediction. As there are only 26 predictions, we also looked at the superpositions of each to the target manually.
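The permutation search described above might be organized as in the following sketch. It is a minimal illustration, not the assessors' pipeline: `score_fn` is a hypothetical stand-in for a sequence-dependent scorer such as GDT-TS (higher is better), and chains are represented simply as lists of per-residue coordinates.

```python
from itertools import permutations

def pseudo_monomer(chains, order):
    """Concatenate per-chain residue lists into one chain, in the given chain order.

    chains: dict mapping chain id -> list of per-residue records (assumed layout)
    order:  iterable of chain ids, e.g. "ABCD"
    """
    merged = []
    for chain_id in order:
        merged.extend(chains[chain_id])
    return merged

def best_permutation(pred_chains, target_chains, score_fn, target_order="ABCD"):
    """Score every chain-order permutation of the prediction against the target.

    score_fn(pred, target) is a hypothetical scorer where higher means better.
    Returns (best_order, best_score, all_scores).
    """
    target = pseudo_monomer(target_chains, target_order)
    scores = {}
    for perm in permutations(sorted(pred_chains)):
        order = "".join(perm)
        scores[order] = score_fn(pseudo_monomer(pred_chains, perm), target)
    best = max(scores, key=scores.get)
    return best, scores[best], scores
```

For a distance-like measure such as RMSD, where lower is better, the same loop applies with `min` in place of `max`. A tetramer gives 4! = 24 permutations, so exhaustive enumeration is cheap.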

The target homotetramer consists of four helix bundle monomers arranged in a rectangular X shaped configuration (Fig. 7), with each monomer forming one arm of the X. With the A chain on the upper left and proceeding clockwise around the X, the chains are A, D, B, and C. The tetramer has D₂ symmetry, with three mutually perpendicular symmetry axes. Therefore, the scores of the 24 chain id permutations of the pseudomonomers for each prediction fall into groups of four with similar or identical scores, for example ABCD, BADC, CDAB, and DCBA. The standard deviation of the scores of the group of four with the highest GDT scores is shown in Table VIII in the SD top 4 perm column, and is a measure of model symmetry, lower score indicating higher symmetry.

The tetrameric target T0680o (shown in red) and the best tetramer model Tc680oTS477_5o (shown in blue). The RMSD of the tetramer model with respect to the tetramer target is 2.32Å. The monomer is shown in Fig. 5 and the chains are, clockwise starting on the upper left, chain B, chain C, chain A, and chain D.
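The grouping of the 24 chain orders into symmetry-equivalent sets of four, and the "SD top 4 perm" statistic, can be sketched as follows. Two assumptions are made here and are not stated in the text: the "group of four with the highest GDT scores" is taken to be the set containing the single best-scoring permutation, and the population standard deviation is used. The four relabelings are the example given in the text (ABCD, BADC, CDAB, DCBA).

```python
from itertools import permutations
from statistics import pstdev

# Chain relabelings corresponding to the D2 symmetry example in the text.
D2_RELABELINGS = [dict(zip("ABCD", v)) for v in ("ABCD", "BADC", "CDAB", "DCBA")]

def symmetry_class(order):
    """Set of the four chain orders equivalent to `order` under the D2 relabelings."""
    return frozenset("".join(m[c] for c in order) for m in D2_RELABELINGS)

def symmetry_classes():
    """All distinct symmetry classes among the 24 chain orders."""
    return {symmetry_class("".join(p)) for p in permutations("ABCD")}

def sd_top4(scores):
    """SD of scores over the symmetry class containing the best permutation.

    scores: dict mapping chain order (e.g. "DCBA") -> GDT score.
    Population SD is an assumption; the text does not say which SD was used.
    """
    best = max(scores, key=scores.get)
    return pstdev([scores[order] for order in symmetry_class(best)])

print(len(symmetry_classes()))  # 6
```

A perfectly symmetric model scores identically for all four equivalent orders, giving an SD of zero, as for the 477 models in Table VIII.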

All the score functions put predictions Tc680TS477_5o, Tc680TS477_2o, and Tc680TS477_4o in the top five predictions, and visual assessment confirms that all three are excellent. Although models like Tc680TS490_2o are still very good, there is a clear drop off in model quality below these three predictions from the BAKER group. The superposition of Tc680TS477_5o and the target, shown in Figure 7, is very impressive.

Note that 24 oligomeric structure predictions for T0680 (before the contact information was given) were also submitted. However, all of them predicted the structure as a dimer, and only one group submitted oligomer predictions in both the assisted and unassisted cases. It makes little sense, therefore, to try to compute absolute or individual improvements for Tc680o, as we did in the monomeric target cases.

CONCLUSIONS AND SUGGESTIONS FOR THE FUTURE

The CASP10-CA experiment has shown that added contact information can substantially improve protein structure predictions. The experiment was one of the most hopeful parts of CASP10. In the best cases, assisted predictions had GDT scores 40 points higher than the unassisted, and in all cases except one for Tc targets, the best assisted models scored higher than the best unassisted. This experiment has also provided some information on the number of contacts required to produce improved predictions. The number of contacts per residue where we begin to see significant improvement is in the range 0.04–0.06.

We note that the contact information provided to the CA predictors indirectly carried information on domain boundaries because no contacts were provided that crossed a domain boundary. The domain boundary information implicit in such sets of contacts might provide an advantage for contact-assisted modeling over and above the direct benefit from the contact information. In Figure 4, there does not appear to be any systematic tendency for larger improvement in domains from multidomain targets than single domain targets, but there is such a large per target variation that it is difficult to rule out this effect. If no inter-domain contacts are given, one way of more purely assessing the benefits of the contact information would be to provide the domain boundary information both for the assisted and unassisted predictions.

Early in our analysis of CA predictions, we attempted to measure the relative importance of the provided contacts. One of the ways this was done was to split the predictions into two classes, good and bad. As the distribution of prediction scores is most often bimodal,¹⁷ it was straightforward to find a natural good/bad score cutoff for each target. We then built decision tree classifiers to map a vector representing if each contact was satisfied in a model (0 = unsatisfied, 1 = satisfied) to the binary classes good/bad. We used classifiers rather than simply measuring the degree to which models that satisfied a single particular contact were over-represented in the good predictions, because it is possible that combinations of contacts may be better indicators of model goodness (e.g., contacts A and B may be satisfied individually in many bad models, but both satisfied only in good models). Although we could almost always build classifiers that correctly labeled the great majority of predictions good/bad, the results of such experiments were ultimately not convincing because the number of predictions was not very much greater than the number of provided contacts. Hopefully, given the promising results of this experiment, there will be very many more contact-assisted predictions in the next round of CASP, and it may then be possible to say something about the relative importance of the provided contacts. When this becomes possible, it may also be interesting to include disulfides, salt bridges, high network centrality contacts, and so forth to see if contacts more important for folding or function are also more important for structure prediction.
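The reason for preferring classifiers over single-contact statistics can be shown with a toy calculation. The sketch below is not a decision tree and does not use data from the experiment; the satisfaction vectors and labels are hypothetical, chosen so that contacts A and B are each uninformative alone but decisive together.

```python
def good_fraction(vectors, labels, required):
    """Fraction of models satisfying every contact in `required` that are labeled good.

    vectors:  list of 0/1 lists, one entry per provided contact (1 = satisfied)
    labels:   list of booleans, True for a model in the good class
    required: indices of contacts that must all be satisfied
    """
    hits = [lab for vec, lab in zip(vectors, labels) if all(vec[i] for i in required)]
    return sum(hits) / len(hits) if hits else float("nan")

# Hypothetical toy data: column 0 is contact A, column 1 is contact B.
vectors = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 1]]
labels = [True, True, False, False, False, False]

print(good_fraction(vectors, labels, [0]))     # 0.5
print(good_fraction(vectors, labels, [1]))     # 0.5
print(good_fraction(vectors, labels, [0, 1]))  # 1.0
```

A tree that splits first on A and then on B would capture this conjunction, whereas a per-contact enrichment test would rate both contacts as equally weak indicators.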

Although the results of this initial CASP10 experiment are highly encouraging for the future of contact-assisted modeling, we must caution that we may not have put hybrid prediction to the true test here because the contacts provided in this first experiment were highly artificial and may not resemble those that are likely to be obtained from a real hybrid modeling experiment. The results of contact-assisted CASP experiments will be most convincing if the contacts used come from experimental data or are at least chosen to closely mimic experimental data.

Supplementary Material

NIHMS1066295-supplement-1.pdf^{(1.7MB, pdf)}

NIHMS1066295-supplement-18.pdf^{(6.8KB, pdf)}

NIHMS1066295-supplement-19.pdf^{(5.9KB, pdf)}

NIHMS1066295-supplement-2.pdf^{(2.1MB, pdf)}

NIHMS1066295-supplement-20.pdf^{(5.9KB, pdf)}

NIHMS1066295-supplement-21.pdf^{(6KB, pdf)}

NIHMS1066295-supplement-22.pdf^{(6.1KB, pdf)}

NIHMS1066295-supplement-3.pdf^{(2.3MB, pdf)}

NIHMS1066295-supplement-4.pdf^{(2.7MB, pdf)}

NIHMS1066295-supplement-5.pdf^{(2.2MB, pdf)}

NIHMS1066295-supplement-6.pdf^{(3.3MB, pdf)}

NIHMS1066295-supplement-10.pdf^{(15.6KB, pdf)}

NIHMS1066295-supplement-7.pdf^{(13.8KB, pdf)}

NIHMS1066295-supplement-8.pdf^{(32.5KB, pdf)}

NIHMS1066295-supplement-9.pdf^{(7.5KB, pdf)}

NIHMS1066295-supplement-11.pdf^{(13.5KB, pdf)}

NIHMS1066295-supplement-12.pdf^{(33.1KB, pdf)}

NIHMS1066295-supplement-13.pdf^{(7.6KB, pdf)}

NIHMS1066295-supplement-14.pdf^{(16.1KB, pdf)}

NIHMS1066295-supplement-15.pdf^{(6KB, pdf)}

NIHMS1066295-supplement-16.pdf^{(6KB, pdf)}

NIHMS1066295-supplement-17.pdf^{(6.7KB, pdf)}

ACKNOWLEDGMENTS

The authors thank the experimental groups who provided the target structures and the prediction groups who provided the predictions. They also thank Dr. Andriy Kryshtafovych and the staff of the Prediction Center for the extensive support provided for this work. Molecule pictures were generated by the UCSF Chimera package³³ and PyMOL (the PyMOL Molecular Graphics System, Version 1.4, Schroedinger, LLC). Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311). This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.

Footnotes

Additional Supporting Information may be found in the online version of this article.

REFERENCES

1. Schmitz C, Vernon R, Otting G, Baker D, Huber T. Protein structure determination from pseudocontact shifts using ROSETTA. J Mol Biol 2012;416:668–677.
2. Lange OF, Baker D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 2012;80:884–895.
3. Lange OF, Rossi P, Sgourakis NG, Song Y, Lee HW, Aramini JM, Ertekin A, Xiao R, Acton TB, Montelione GT, Baker D. Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proc Natl Acad Sci USA 2012;109:10873–10878.
4. Raman S, Huang YJ, Mao B, Rossi P, Aramini JM, Liu G, Montelione GT, Baker D. Accurate automated protein NMR structure determination using unassigned NOESY data. J Am Chem Soc 2010;132:202–207.
5. Bowers PM, Strauss CE, Baker D. De novo protein structure determination using sparse NMR data. J Biomol NMR 2000;18:311–318.
6. Shen Y, Vernon R, Baker D, Bax A. De novo protein structure generation from incomplete chemical shift assignments. J Biomol NMR 2009;43:63–78.
7. Meiler J, Baker D. Rapid protein fold determination using unassigned NMR data. Proc Natl Acad Sci USA 2003;100:15404–15409.
8. Sulkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN. Genomics-aided structure prediction. Proc Natl Acad Sci USA 2012;109:10340–10345.
9. Hirst SJ, Alexander N, McHaourab HS, Meiler J. RosettaEPR: an integrated tool for protein structure determination from sparse EPR data. J Struct Biol 2011;173:506–514.
10. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:3370–3374.
11. Cong Q, Kinch LN, Pei J, Shi S, Grishin VN, Li W, Grishin NV. An automatic method for CASP9 free modeling structure prediction assessment. Bioinformatics 2011;27:3371–3378.
12. Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins 2011;79(Suppl 10):59–73.
13. Vincent JJ, Tai CH, Sathyanarayana BK, Lee B. Assessment of CASP6 predictions for new and nearly new fold targets. Proteins 2005;61(Suppl 7):67–83.
14. Jauch R, Yeo HC, Kolatkar PR, Clarke ND. Assessment of CASP7 structure predictions for template free targets. Proteins 2007;69(Suppl 8):57–67.
15. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins 2007;69(Suppl 8):38–56.
16. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins 2009;77(Suppl 9):50–65.
17. Keedy DA, Williams CJ, Headd JJ, Arendall WB III, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins 2009;77(Suppl 9):29–49.
18. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins 2011;79(Suppl 10):37–58.
19. Kryshtafovych A, Fidelis K, Moult J. CASP9 results compared to those of previous CASP experiments. Proteins 2011;79(Suppl 10):196–207.
20. Kinch LN, Wrabl JO, Krishna SS, Majumdar I, Sadreyev RI, Qi Y, Pei J, Cheng H, Grishin NV. CASP5 assessment of fold recognition target predictions. Proteins 2003;53(Suppl 6):395–409.
21. Tress M, Ezkurdia I, Grana O, Lopez G, Valencia A. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 2005;61(Suppl 7):27–45.
22. Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics 2000;16:566–567.
23. Bostick DL, Shen M, Vaisman II. A simple topological representation of protein structure: implications for new, fast, and robust structural classification. Proteins 2004;56:487–501.
24. Zotenko E, Dogan RI, Wilbur WJ, O’Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol 2007;7:53.
25. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol 1996;6:377–385.
26. Tramontano A, Morea V. Assessment of homology-based predictions in CASP5. Proteins 2003;53(Suppl 6):352–368.
27. Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins 2009;77(Suppl 9):18–28.
28. Hollander M, Wolfe DA. Nonparametric statistical methods. New York: Wiley; 1973. xviii, 503 p.
29. Mukherjee S, Zhang Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res 2009;37:e83.
30. Koike R, Ota M. SCPC: a method to structurally compare protein complexes. Bioinformatics 2012;28:324–330.
31. Mizuguchi K, Go N. Comparison of spatial arrangements of secondary structural elements in proteins. Protein Eng 1995;8:353–362.
32. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004;57:702–710.
33. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–1612.

Group	Num pred	Num wins
A
477	17	38
045	17	32
490	17	28
108	17	26
301	17	25
222	14	12
124	2	8
238	17	7
294	17	7
493	15	6
103	2	6
292	2	6
077	15	5
198	17	5
311	11	4
434	16	3
471	17	3
201	10	2
365	17	2
373	17	1
473	16	1
489	10	1
179	4	0
482	1	0
B
045	10	11
124	10	8
428	4	6
435	9	4
477	10	4
292	10	1
330	2	1
087	2	0
110	1	0
113	2	0
179	3	0
249	1	0
261	1	0
267	4	0
298	1	0
308	4	0
311	1	0
344	1	0
381	2	0
413	1	0
438	1	0
444	7	0
