Author manuscript; available in PMC: 2020 Dec 1.
Published in final edited form as: Proteins. 2019 Aug 27;87(12):1190–1199. doi: 10.1002/prot.25795

Assessment of protein assembly prediction in CASP13

Dmytro Guzenko 1, Aleix Lafita 2, Bohdan Monastyrskyy 3, Andriy Kryshtafovych 3, Jose M Duarte 1
PMCID: PMC6851419  NIHMSID: NIHMS1048339  PMID: 31374138

Abstract

We present the assembly category assessment in the 13th edition of the CASP community-wide experiment. For the second time, protein assemblies constitute an independent assessment category. Compared to the last edition we see a clear increase in participation, more oligomeric targets released, and consistent, albeit modest, improvement in prediction quality. Looking at the tertiary structure predictions we observe that ignoring the oligomeric state of the targets hinders modelling success. We also note that some contact prediction groups successfully predicted homomeric interfacial contacts, though it appears that these predictions were not used for assembly modelling. Homology modelling with sizeable human intervention appears to form the basis of the assembly prediction techniques in this round of CASP. Future developments should see more integrated approaches where subunits are modelled in the context of the assemblies they form.

Keywords: CASP, protein assembly, protein interfaces, structure prediction

1 |. INTRODUCTION

In their physiological environment, protein chains commonly associate with other chains or copies of themselves to form protein assemblies. This is the so-called quaternary structure, an intrinsic property of the native state of a protein, known before the first atomic structures were solved [1]. Protein function is linked to, and often determined or regulated by, the oligomeric structure [2, 3, 4]. As of March 2019, the average structure in the Protein Data Bank (PDB) [5] is a dimer and approximately half of the PDB is annotated as oligomeric. Estimates of the average protein oligomeric state in the cell point to an even higher tetrameric assembly [6].

Protein oligomerization is a broad term that encompasses states with different degrees of affinity. The association between polypeptide chains in stable obligate oligomers can be regarded as an extension of protein folding and often occurs simultaneously [7]. At the other extreme are transient protein-protein complexes where the association is opportunistic and promiscuous, representing the functions of the proteins involved [8]. It is important to note that there is a continuum between these states, and in the context of CASP no effort has yet been made to distinguish them.

Due to intrinsic limitations of the different experimental methods used for structure determination, protein assemblies are likely underrepresented in the PDB. The three methods most commonly used are X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and 3-dimensional electron microscopy (3DEM).

X-ray crystallography has been and remains the main source of atomic-resolution protein structures in the PDB. The majority of these are homomeric (85% of depositions in 2018), from which about half are oligomeric. Crystallization of hetero-oligomers is more technically challenging, especially as the interaction becomes more transient [9]. Consequently, hetero-oligomeric complexes are severely underrepresented in the X-Ray crystallographic output.

Historically the second-most popular method for protein structure determination, NMR spectroscopy does not contribute significantly to our knowledge of protein oligomerization. It accounted for 3% of overall depositions to the PDB in 2018, with 90% of entries being monomers. The reasons are mostly technical: protein complexes are often large and symmetric, and both of these factors complicate NMR data interpretation.

The rapidly expanding 3DEM technique is naturally suited for determination of protein complexes (95% of the EM entries) and has the most potential to boost our quaternary structure knowledge. In 2018 3DEM accounted for 10% of PDB depositions and, notably, for about a third of all deposited hetero-oligomeric complexes. Traditionally, the interpretation of the experimental maps was more challenging due to low resolution (median 4.3 Å) and less well-developed data-model fit quality metrics. However, there is plenty of room for optimism as the technique continues to develop actively and achieves ever higher resolutions (the median resolution was 3.8 Å in 2018) [10, 11].

The Critical Assessment of protein Structure Prediction (CASP) experiment was established as a means to consistently evaluate the state of the protein structure modeling field. The experiment focuses on problems at the frontier of the research and evolves together with it. New prediction categories deemed attainable are regularly introduced, and those where the progress is believed to have been exhausted are discontinued [12].

Quaternary structure has a rather peculiar history within the experiment. While oligomeric protein targets were incidentally featured in CASP2 (1996) [13], CASP7 (2006) [14] and CASP9 (2010) [15], the experiment was mainly focused on tertiary structure prediction. On the other hand, the Critical Assessment of PRedicted Interactions (CAPRI), an independent experiment inspired by CASP, was established in 2001 to address the protein-protein docking problem. With such an arrangement, the assessment of the quaternary structure modeling was explicitly branched into “subunits” (CASP) and “interfaces” (CAPRI). Recognizing the growing importance of integrated quaternary structure prediction, CASP and CAPRI conducted a parallel assessment of selected oligomeric targets in 2014 (CASP11/CAPRI30) [16]. In 2016 (CASP12), a separate “Assembly” category was introduced to evaluate predictions of the complete 3-dimensional functional units on all oligomeric CASP targets [17]. The assembly category serves to highlight the importance of considering proteins in their native solution state, with the ultimate goal of producing complete models that can shed light on the biology and function of the molecular systems under scrutiny.

By introducing new assessment categories, the CASP experiment shapes and drives the development of methods necessary to excel in them [12]. Recent breakthroughs in both domain structure [18] and contact predictions [19] suggest that higher-order complexity targets, protein assemblies, are feasible. Here we present our analysis of the CASP13 assembly predictions, compare the results to those of CASP12 and discuss the status and outlook of the field.

2 |. METHODS

2.1 |. Assembly targets

In CASP13, the organizers proactively gathered protein assemblies, specifically targeting heteromeric complexes. This has resulted in 64% of the targets (42 out of 66) being oligomeric - a marked increase from 42% (30 targets out of 71) in CASP12 [20]. 20 targets were selected for the combined CAPRI/CASP experiment.

In terms of experimental methods the vast majority of targets came from X-ray crystallography (36 out of 42), whilst the rest were solved with 3DEM techniques. Compared to CASP12 (26 X-ray, 2 NMR and 2 3DEM) we observe a significant increase in structures solved with 3DEM, consistent with the recent developments in experimental structural biology.

Assigning the oligomeric state of targets was not always a straightforward task, specifically in the case of crystal structures, where the contacts in the crystal lattice can lead to different interpretations [21]. This step was done in collaboration with the CAPRI assessment team, with contributions from the CASP organizers. In broad terms, to assign the oligomeric state we considered the following (in order of priority):

  1. experimentalists' indication, preferred if backed by experimental evidence;

  2. if structure was known, EPPIC [22] and PISA [23] analysis;

  3. stoichiometry consensus of homologous structures in the PDB found with HHpred [24].

All CASP13 targets were examined in this way, even when assumed to be monomers by the experimentalists. After this procedure, 5 cases remained ambiguous and were assigned with low confidence (see Table S1). This shows that defining the ground truth remains one of the challenges in assembly prediction [21].

The selection process resulted in a wide range of stoichiometries and symmetries (see Table S1). They included a helical symmetry (T0995) and a very large complex with A6B6C6 stoichiometry (H1021) solved by 3DEM. Out of 42 targets, 12 were heteromeric and 30 homomeric, double the proportion of heteromers that would be expected if drawn randomly from the PDB [25]. Two of the heteromeric targets presented uneven stoichiometry (H0953 with stoichiometry A3B1 and H1022 with A6B3), a rather unusual event in the PDB with only 10% occurrence among all known heteromers [25].

2.2 |. Target difficulty

We have classified the targets into three difficulty levels based on the information available to the predictors prior to the experiment, similarly to the CASP12 assembly assessment [17]. The outcome of predictions (i.e., posterior difficulty) was not considered.

We define three difficulty classes with the following criteria:

  • Easy: the target has templates for both the subunits and the overall assembly, findable by sequence homology detection methods.

  • Medium: the target has partial templates identifiable by sequence homology detection methods. Partial can mean that the full subunit templates are known but no information to model the interface can be found, or that information of only part of the interfaces is known (e.g. a dimer template available for half of a tetrameric target).

  • Difficult: the target does not have templates findable by sequence homology detection methods, for either the subunits or the assembly.
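The three classes above can be read as a small decision rule. The sketch below is one possible encoding, not the assessors' actual procedure: the coverage labels `"none"`, `"partial"` and `"full"` and the function name `classify_difficulty` are hypothetical, and the rule treats any combination other than "both full" or "both none" as Medium.

```python
from enum import Enum


class Difficulty(Enum):
    EASY = "Easy"
    MEDIUM = "Medium"
    DIFFICULT = "Difficult"


# Hypothetical coverage labels; the paper describes template availability
# in prose only and does not define such categories.
COVERAGE_LEVELS = ("none", "partial", "full")


def classify_difficulty(subunit_templates: str, assembly_templates: str) -> Difficulty:
    """Map template availability (by sequence homology) to a difficulty class."""
    for level in (subunit_templates, assembly_templates):
        if level not in COVERAGE_LEVELS:
            raise ValueError(f"unknown coverage level: {level!r}")
    if subunit_templates == "full" and assembly_templates == "full":
        return Difficulty.EASY
    if subunit_templates == "none" and assembly_templates == "none":
        return Difficulty.DIFFICULT
    # Everything in between: e.g. full subunit templates but no interface
    # information, or a dimer template for half of a tetramer.
    return Difficulty.MEDIUM
```

A rule of this kind only covers the automatic part of the decision; manual overrides remain necessary whenever a formally complete template turns out to be structurally misleading.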

One of the targets (T0965) was classified as Medium (see Table S1), despite availability of a complete template, because the arrangement of helices at the interface differed substantially in the target structure.

2.3 |. Evaluation scores

We assess the accuracy of the predicted protein-protein interfaces with the two measures introduced in the CASP12 assembly assessment: Interface Contact Similarity (ICS) and Interface Patch Similarity (IPS) [17], both in the range [0,1] with 0 worst and 1 best. ICS is an F1-score of the sets of predicted and native contacts, whilst IPS is a Jaccard index of the sets of predicted and native residues composing each of the two sides of the interface. In the official evaluation tables on the predictioncenter.org website, these scores are called F1 and Jaccard respectively. IPS is less sensitive than ICS to rotations and translations at the interface.
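To make the two definitions concrete, the following is a minimal sketch of ICS and IPS computed from sets of inter-chain contacts. It is not the official CASP implementation: the representation of a residue as a `(chain, number)` tuple, the helper names, and the choice to pool both sides of the interface into a single residue set for IPS are all simplifying assumptions.

```python
from typing import FrozenSet, Set, Tuple

Residue = Tuple[str, int]      # (chain id, residue number); assumed representation
Contact = FrozenSet[Residue]   # unordered pair of residues in contact


def contact(a: Residue, b: Residue) -> Contact:
    return frozenset((a, b))


def ics(predicted: Set[Contact], native: Set[Contact]) -> float:
    """Interface Contact Similarity: F1-score of predicted vs native contacts."""
    true_positives = len(predicted & native)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(native)
    return 2 * precision * recall / (precision + recall)


def interface_residues(contacts: Set[Contact]) -> Set[Residue]:
    return {residue for pair in contacts for residue in pair}


def ips(predicted: Set[Contact], native: Set[Contact]) -> float:
    """Interface Patch Similarity: Jaccard index of the interface residue sets."""
    pred_res = interface_residues(predicted)
    nat_res = interface_residues(native)
    union = pred_res | nat_res
    return len(pred_res & nat_res) / len(union) if union else 0.0
```

With a hypothetical native interface of three contacts (A10-B20, A11-B20, A12-B21) and a prediction of four (A10-B20, A11-B21, A12-B21, A13-B22), two contacts match, giving ICS = 4/7, while five of the seven residues involved are shared, giving IPS = 5/7. The slightly shifted contact A11-B21 is penalized by ICS but not by IPS, which illustrates why IPS is the more tolerant of the two.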

Evaluation of the interfaces is sufficient if the subunits are known or are relatively easy to model independently of each other. However, CASP assembly targets are not selected with this assumption in mind and in practice often require non-trivial subunit modelling. To capture performance of the tertiary structure prediction methods in the context of quaternary structure, we have chosen to add two other scores to the pool. First, Global Distance Test (GDT) [26] quantifies the maximum fraction of residues that can be superposed under specified distance thresholds. Segments of a model beyond a threshold are essentially ignored and do not penalize the score further, no matter how far they deviate from the target. It is most popular for assessment of the tertiary structure prediction in CASP, and, as the name implies, GDT is best suited for assessing globally rigid structures. Finally, Local Distance Difference Test (lDDT) [27] is in a sense orthogonal to GDT, comparing interatomic distances within a certain radius of each target residue to those of a model. Thus it is a superposition-free score that is sensitive to the local model quality along the entire polypeptide chain, while the global domain movements have no significant effect.
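The contrast between the two scores can be illustrated with a toy calculation. The sketch below is deliberately simplified and is not the reference implementation of either score: it works on one point per residue, the GDT-like function assumes the model is already superposed on the target (the real GDT searches for the best superpositions), and the lDDT-like function pools all residue pairs rather than averaging per residue. The thresholds (1, 2, 4, 8 Å for GDT_TS; 0.5, 1, 2, 4 Å within a 15 Å radius for lDDT) are the commonly quoted defaults and are not stated in this paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gdt_ts_presuperposed(model: Sequence[Point], target: Sequence[Point],
                         thresholds: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average fraction of residues within each threshold, for a fixed superposition."""
    deviations = [math.dist(m, t) for m, t in zip(model, target)]
    fractions = [sum(d <= cut for d in deviations) / len(target) for cut in thresholds]
    return sum(fractions) / len(thresholds)


def lddt_like(model: Sequence[Point], target: Sequence[Point], radius: float = 15.0,
              thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Fraction of local target distances that are preserved in the model."""
    preserved = 0
    total = 0
    for i in range(len(target)):
        for j in range(i + 1, len(target)):
            d_target = math.dist(target[i], target[j])
            if d_target >= radius:
                continue  # only local distances are compared
            total += 1
            difference = abs(d_target - math.dist(model[i], model[j]))
            preserved += sum(difference < cut for cut in thresholds)
    return preserved / (total * len(thresholds)) if total else 0.0


# Toy example: two rigid three-residue "domains" far apart; in the model the
# second domain is displaced by 20 Å as a rigid body.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (100.0, 0.0, 0.0), (103.8, 0.0, 0.0), (107.6, 0.0, 0.0)]
model = target[:3] + [(x, y + 20.0, z) for x, y, z in target[3:]]
```

In this toy case the lDDT-like score stays at 1.0 because every local distance is preserved, whereas the GDT-like score drops to 0.5 because half of the residues lie beyond all thresholds in the chosen frame.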

These scores are not directly applicable to the multichain models, as the order of chains in the file is not necessarily preserved with respect to their 3-dimensional arrangement. Therefore, ‘chain mapping’ has to be established between the target and the prediction prior to regular scoring. For this purpose, we used the QS-score algorithm, which evaluates all possible non-symmetrical chain mapping combinations [28]. For large targets with many subunits the combinatorial problem becomes intractable. This was the case in H1021, for which we used the QS-align tool [29], a greedy algorithm based on the iterative addition of chain pairs close in space after an initial superposition. The obtained scores were rescaled to the [0,1] range and are referred to here as GDT/lDDT Oligomeric (or GDTo/lDDTo for brevity). In addition, we calculated these scores for the CASP12 targets and predictions to enable direct comparison of the results. Figure 1 shows score correlations for all models in CASP13, with clear blocks differentiating how interface (local) scores capture different information than assembly (global) scores.
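The chain mapping step can be sketched as an exhaustive search over chain permutations. This is an illustration of the general idea only, not the QS-score or QS-align algorithm: the function names are hypothetical, and the example scoring function simply compares chain centroids under the assumption that target and model already share a coordinate frame.

```python
import math
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Mapping = Dict[str, str]  # target chain id -> model chain id


def best_chain_mapping(target_ids: Sequence[str], model_ids: Sequence[str],
                       score: Callable[[Mapping], float]) -> Tuple[Mapping, float]:
    """Try every assignment of model chains to target chains; keep the best one."""
    best_mapping: Mapping = {}
    best_score = -math.inf
    for perm in permutations(model_ids, len(target_ids)):
        mapping = dict(zip(target_ids, perm))
        current = score(mapping)
        if current > best_score:
            best_mapping, best_score = mapping, current
    return best_mapping, best_score


def centroid_score(target: Dict[str, Point], model: Dict[str, Point]) -> Callable[[Mapping], float]:
    """Toy score: negative sum of centroid distances (higher is better)."""
    def score(mapping: Mapping) -> float:
        return -sum(math.dist(target[t], model[m]) for t, m in mapping.items())
    return score
```

The number of permutations grows factorially with the number of chains: for an 18-chain complex such as an A6B6C6 assembly, an unrestricted search would visit 18! ≈ 6.4 × 10^15 orderings, which is why restricting the search (for example to chains of the same entity, or to symmetry-unique mappings) or switching to a greedy strategy becomes necessary.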

FIGURE 1.


Score correlations. A heat map with correlations among all relevant scores used in the predictioncenter.org web site. The “local” block of scores captures interface features, the “global” block captures features of the whole assembly.

Z-scores were calculated for every score per evaluation target. The first submitted model (supposedly the best out of five allowed) was used for each group. To avoid penalizing unsuccessful prediction attempts and software glitches, we followed the CASP convention of removing outliers (Z < −2), recalculating the Z-scores and flattening negative values to zero. The total group score is a simple sum of all Z-scores for all targets it submitted predictions for. It has been noted [30] that difficult targets with few good predictions may result in inflated Z-scores. To mitigate this effect we performed ‘leave-one-out ranking’, whereby each target is consecutively removed from consideration, and groups’ mean total score is used for the ranking. The maximum and minimum total score values can be used to assess the significance of the differences between the closely ranked groups (shown in Figure 4 as error bars).
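The ranking procedure described above can be sketched in a few lines. This is an illustrative reading rather than the assessors' code: the function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and outliers are assumed to be excluded only from the mean and standard deviation, so that they still receive a (zero) score.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def _z(scores: Dict[str, float], reference: List[float]) -> Dict[str, float]:
    mu, sigma = mean(reference), pstdev(reference)
    if sigma == 0:
        return {group: 0.0 for group in scores}
    return {group: (value - mu) / sigma for group, value in scores.items()}


def casp_z_scores(scores: Dict[str, float], outlier_cut: float = -2.0) -> Dict[str, float]:
    """Per-target Z-scores: drop outliers, recompute, floor negatives at zero."""
    first_pass = _z(scores, list(scores.values()))
    kept = [scores[g] for g, z in first_pass.items() if z >= outlier_cut]
    second_pass = _z(scores, kept)
    return {group: max(0.0, z) for group, z in second_pass.items()}


def leave_one_out_totals(z_by_target: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, float, float]]:
    """For each group: (mean, min, max) of the summed Z-score with one target left out."""
    groups = {g for zs in z_by_target.values() for g in zs}
    result = {}
    for group in groups:
        totals = [
            sum(zs.get(group, 0.0) for target, zs in z_by_target.items() if target != left_out)
            for left_out in z_by_target
        ]
        result[group] = (mean(totals), min(totals), max(totals))
    return result
```

The mean of the leave-one-out totals would drive the ranking, while the minimum and maximum give the error bars; a group whose total depends heavily on a single target shows a wide interval.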

FIGURE 4.


Group rankings in the assembly category. The groups are sorted by the sum of Z-scores for all difficulty classes. The error bars are obtained by iteratively excluding every target from each difficulty class and recalculating the cumulative Z-scores. The server groups are labeled in violet.

3 |. RESULTS

A total of 45 groups participated in the CASP13 assembly category. Of those, 22 groups participated only in the subset of targets selected for the joint CASP/CAPRI experiment, while 23 submitted predictions for all targets. 17 groups submitted models for more than 10 targets. That compares to only 10 groups submitting models for more than 10 targets in the CASP12 assembly category [17]. In terms of number of models submitted there was a dramatic increase from 1600 in CASP12 to more than 5000 in CASP13.

Clear improvements in the prediction format and methodology were introduced in this edition compared to the first assembly category experiment in CASP12. First, the stoichiometry information is now provided to the prediction servers in an automated way. Second, model files can now be multi-chain, eliminating the need for assessors to guess whether predictors are actually attempting assembly prediction or not. Whether stoichiometries should continue to be given as input to predictors is an open question for future CASP assembly editions.

3.1 |. Performance

We present detailed score distributions for all targets in Figure 2, each panel corresponding to one of the 4 scores used. We used the Seok-naive_assembly method [16] as an indication of baseline for each homomeric target.

FIGURE 2.


Per-target score distributions and comparison to the baseline (naïve) values, if present. The targets for which the median prediction is worse than the baseline in each score are labeled in red. The naïve baseline comes from the Seok-naive_assembly method, which employs HHsearch to find the best templates for homo-oligomers, then builds an alignment per template with HHalign and finally calculates several models with Modeller [46], from which the lowest energy (as calculated by Modeller) model is selected.

In order to qualitatively analyze the prediction outcomes, we consider a target to be solved if there exist models for which all four scores (ICS, IPS, lDDTo, GDTo) have values greater than 0.5. It follows that 9 assembly targets out of 42 are solved in CASP13: T0961o, T0973o, H0974, T0983o, T1003o, T1004o, T1006o, T1016o, T1020o (Figure 2). However, 4 of these are also solved by the baseline method. T1004o is a notable improvement on the baseline, as it had two partial assembly templates (PDB IDs 5EFV and 5M9F), which most groups successfully combined. In contrast to the results of tertiary structure prediction in this round of CASP, absence of detectable assembly templates with near-complete coverage guarantees absence of good models.
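The "solved" criterion can be written as a one-line check. The sketch below assumes the reading in which a single model must exceed the threshold on all four scores at once; the dictionary layout and the function name `is_solved` are hypothetical.

```python
from typing import Dict, Iterable

SCORES = ("ICS", "IPS", "lDDTo", "GDTo")


def is_solved(models: Iterable[Dict[str, float]], threshold: float = 0.5) -> bool:
    """True if at least one model exceeds the threshold on all four scores."""
    return any(all(model[name] > threshold for name in SCORES) for model in models)


# Best per-score values reported for T0976 in Table 1, combined into one
# optimistic pseudo-model: even this upper bound fails, since ICS is 0.4.
t0976_upper_bound = {"ICS": 0.4, "IPS": 0.58, "lDDTo": 0.619, "GDTo": 0.47}
```

Because the criterion is a conjunction, a target with good subunit models but a poor interface (or the reverse) is not counted as solved.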

Using the same criteria as above, we find that 6 (easy) targets out of 30 were solved in CASP12 - the same proportion as in CASP13. To evaluate the progress quantitatively, we assume that the difficulty of the assembly targets in CASP12 and CASP13 has roughly the same distribution (evidence in [31]), and compare the relative performance of the predictors by matching score percentiles. For example, GDTo value of 0.5 in CASP12 is at the 76th percentile of all best predictions. In CASP13, the 76th percentile corresponds to the GDTo value of 0.55, which indicates 5% improvement. Figure 3 reveals the complete picture of such analysis and shows 5–15% improvement for all scores across the board.
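The percentile matching used for this comparison can be sketched as follows. The helper names and the linear-interpolation quantile are assumptions made for illustration; the input lists stand for the best-prediction scores of each CASP round and the values in the usage note are invented, not taken from the assessment data.

```python
from typing import List, Sequence, Tuple


def value_at_percentile(values: Sequence[float], pct: float) -> float:
    """Quantile by linear interpolation between order statistics (0 <= pct <= 100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def matched_percentiles(old: Sequence[float], new: Sequence[float],
                        percentiles: Sequence[float]) -> List[Tuple[float, float]]:
    """Pairs (old score, new score) found at the same percentile of each distribution."""
    return [(value_at_percentile(old, p), value_at_percentile(new, p)) for p in percentiles]
```

For instance, `matched_percentiles([0.1, 0.2, 0.3, 0.4, 0.5], [0.15, 0.25, 0.35, 0.45, 0.55], (0, 25, 50, 75, 100))` yields pairs in which the second value always exceeds the first, which would place every point above the diagonal in a plot of the kind shown in Figure 3. The comparison is only meaningful under the stated assumption that target difficulty is similarly distributed in both rounds.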

FIGURE 3.


Performance comparison between CASP13 and CASP12. 5 top predictions per target (maximum 1 per group) were selected for each score from CASP12 and CASP13 submissions. The scores were matched by percentiles and plotted as CASP12 (x axis) vs. CASP13 (y axis). Values above the diagonal correspond to improvement in CASP13.

Finally, the CASP13 group ranking is shown in Figure 4. The Venclovas group consistently outperformed the rest in all difficulty classes, followed by Seok and BAKER. Success of the top-performing groups appears to be in large part due to human intervention, as all participating servers are ranked similarly to the naïve strategy.

3.2 |. Prediction highlights

An interesting and quite successful prediction target was T0976. The homodimer target is composed of 4 copies of a well-known domain with many templates available in the PDB (CATH superfamily 3.40.250.10, Oxidized Rhodanese domain 1 [32]). However, there were no templates with this particular dimer. Rather, a monomeric template (PDB ID: 1YT8) had a similar overall arrangement of the 4 domains with interdomain interfaces resembling the dimeric interface in the target (see Figure 7A). Groups like D-Haven, ZouTeam and ClusPro achieved relatively good scores for the dimeric interface and for the assembly, see Table 1.

FIGURE 7.


Prediction highlights. A) The homodimeric target T0976 and the monomeric template that matches the global arrangement of the 4 domains, B) Homodimeric target T1001 and the template PDB entry 5LLW, a much larger protein, the highlighted central domain has a very close tertiary structure and a similar interface region. C) The A2B2 heterotetramer T0968 with a main homomeric interface (cyan and yellow chains) via beta pairing, composing a large beta sandwich. The other subunit attaches on either side of the beta sheets.

TABLE 1.

Summary of best scores for the highlighted targets

Target  ICS                             IPS                             lDDTo                           GDTo
        Score  Model (Group)            Score  Model (Group)            Score  Model (Group)            Score  Model (Group)
T0976   0.4    TS155_4 (ZouTeam)        0.58   TS155_4 (ZouTeam)        0.619  TS155_4 (ZouTeam)        0.47   TS329_1 (D-Haven)
T1001   0.52   TS086_3 (BAKER)          0.5    TS470_2 (Seok-assembly)  0.7    TS068_1 (Seok)           0.45   TS068_1 (Seok)
H0968   0.05   TS208_4 (KIAS-Gdansk)    0.31   TS163_4 (Bates-BMM)      0.47   TS163_4 (Bates-BMM)      0.26   TS208_4 (KIAS-Gdansk)

Target T1001, classified as difficult, was another success story from predictors. A good dimeric template exists in the PDB (PDB ID: 5LLW); however, the matching domain in 5LLW is only a small part of the full-length protein (Figure 7B) and importantly contains a very long insertion when compared to T1001. Indeed, HHpred is not able to find either this or a tertiary-only template (PDB ID: 3OOV) when submitting different subsets of the target sequence. Relatively good predictions were submitted by the Seok and BAKER groups.

An example of an unsuccessful multimeric prediction was H0968, classified as difficult due to lack of assembly templates and with both monomers being FM targets. The subunits were well modelled by a few groups, presumably aided by contact prediction. However, there was essentially no group that came close to either of the two interfaces present in the target (Figure 7C). Nevertheless, some groups could predict interface contacts for this target’s homomeric interface, as detailed in the Contact Prediction section below.

3.3 |. Importance of quaternary modelling

While analyzing the results, we noticed a tendency in how the quaternary structure is handled by the predictors, in particular those who did not participate in the assembly category. Most groups seemingly split the problem into two consecutive steps: 1) modelling the subunits, 2) modelling the complex. However, results from this CASP show that such a strategy is flawed. This can be appreciated very clearly in multiple targets (Figure 5), which we discuss below.

FIGURE 5.


Importance of quaternary modelling. A) Targets T0973, T0991 and T0998 with very large dimeric interfaces and the main hydrophobic core split at the interface. The best regular prediction GDT_TS scores for their corresponding monomeric evaluation units were: 82.62 for T0973-D1, 37.16 for T0991-D1 and 35.54 for T0998-D1. B) Trimeric part of target H0953 showing the intertwined beta-strand geometry in the C-terminal half of the fold.

T0973, T0991 and T0998: all 3 targets have similar folds and dimeric quaternary structures. The dimeric interface is formed by the swapping of a helix folding onto the beta sheet of the other monomer, with an enormous buried surface area resulting in an intimate and very stable dimer (see footnote 1). However, the evaluation unit for the regular prediction was the full monomer (including the swapped helix) in all 3 cases. Unsurprisingly, these targets received poor overall predictions. A good quaternary template was available for the target T0973, which resulted in some modellers achieving good scores. Notably, the best performing group in the regular category, A7D, did not use templates explicitly and showed poor performance for T0973 (GDT_TS=32.62).

Target H0953 is an A3B1 multimer, composed of a trimeric part with a beta helix fold attached to a monomeric receptor recognition protein. The trimer consists of single-chain beta sheets in the N-terminal and of interdigitated beta strands coming from each of the chains in the C-terminal. The interface buried area is not exceptionally large but the intertwining geometry makes it an obligate multimer. Again in this case, the evaluation unit (T0953s1-D1) was assigned to a single full-length monomer out of the trimer. This resulted in overall bad predictions in the C-terminal region for regular category models. is the only group that comes close to a reasonable prediction for the C-terminal.

Other examples are T0981, T0989 and H0957. Without going into detail, all of these had relatively low-quality predictions due to treating the chains as completely independent folding units.

3.4 |. Sequence-based contact predictions for homomeric interfaces

Given the success of sequence-based contact predictions in CASP12 [18, 19] and in this CASP13 edition [33, 34], we were interested to see if these techniques could translate into the prediction of protein interfaces. Thus we looked into contact predictions in relation to quaternary structure modelling. Although interface contacts are not considered in the contact prediction category in CASP13 [35], homomeric interfaces are formed by contacts within a single target and should therefore be accounted for. In total, 37 CASP13 targets form homomeric interactions, which on average account for 13% of all contacts in the target, ranging from 2% to over 50% (Figure S1). To our surprise, we find that homomeric contacts are usually among the top-ranked predictions from the best groups in each respective target. In the examples shown in Figure 6, good predictions exist for both the tertiary and interface contacts, but the interface contacts are currently regarded as false positives for the contact assessment category. In fact, we find that considering homomeric contacts would have changed group rankings for some targets, e.g. T0968s2. In view of these results, we encourage future CASP editions to consider evaluating homomeric contacts as part of the contact prediction category.

FIGURE 6.


Sequence-based prediction of homomeric interface contacts for three CASP13 FM targets: A) interdigitated trimer T0953s1 and prediction by group RR106; B) dimeric interface (isologous) of T0968s2 and prediction by group RR036; and C) hexameric subunit T1022s1 and prediction by group RR164. Contacts in the target assembly are shown in the upper-right of the matrix and predicted contacts are shown in the lower-left, with transparency levels proportional to the contact rank. Contacts are classified based on the nature of the interaction: interface (orange), tertiary (blue), tertiary and interface (shared, purple) or prediction false positives (no contact, black).

Homomeric interface contacts also present a challenge for protein structure modelling from contact matrix predictions, since currently most regular predictors try to fold a single subunit. The additional interface contacts in the matrix would impose unrealistic constraints between residues in the folding protocol, similarly to false positives, known to negatively affect 3D reconstruction [36,37]. Modellers would need to disentangle intra-chain from inter-chain contacts in the matrix and adapt their pipelines to fold multiple chains according to the given stoichiometry. Previous work has had some success in the heteromeric case [38, 39, 40] but to our knowledge there are no such studies for the homomeric case.
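When the target structure is known, the disentangling step amounts to labelling each predicted residue pair against two reference contact sets, as in the colour scheme of Figure 6. The sketch below illustrates that bookkeeping only; it is not a method for separating the two kinds of contact at prediction time. The function name and the representation of contacts as unordered pairs of sequence positions are assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Pair = FrozenSet[int]  # unordered pair of sequence positions; {i} encodes a self-pair (i, i)


def pair(i: int, j: int) -> Pair:
    return frozenset((i, j))


def classify_predictions(predicted: Iterable[Tuple[int, int]],
                         intra_chain: Set[Pair],
                         inter_chain: Set[Pair]) -> List[Tuple[Tuple[int, int], str]]:
    """Label predicted contacts as tertiary, interface, shared or false positive."""
    labelled = []
    for i, j in predicted:
        key = pair(i, j)
        in_tertiary = key in intra_chain
        in_interface = key in inter_chain
        if in_tertiary and in_interface:
            label = "shared"
        elif in_tertiary:
            label = "tertiary"
        elif in_interface:
            label = "interface"
        else:
            label = "false positive"
        labelled.append(((i, j), label))
    return labelled
```

Under an assessment that only considers intra-chain contacts, every pair labelled "interface" here would be counted as a false positive, which is the effect discussed in this section.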

Among all types of homomeric interactions, isologous interfaces (as found in dimers and dihedral symmetries) present yet another challenge for protein assembly modelling from contact predictions. Due to their two-fold symmetry, many of the contacts at the interface, especially those close to the axis of symmetry, will be between the same residues (residue interacting with itself in another subunit) or residues very close in sequence, as is the case for the homodimeric interface in target T0968s2. Prediction methods currently ignore by design contacts between residues close in sequence, in part due to technical limitations, which would need to be overcome in the future.

3.5 |. Data-assisted predictions and assemblies

A total of 7 assembly targets were also released as ‘data assisted’ targets (Figure S2), a category that attempts to evaluate advances in integrative modelling methods [41]. SAXS data was collected for all 7 of the targets, whilst cross-link data was collected for 5 of them and NMR data for 1 (H0980). The experimental details and the data-assisted specific assessment are discussed in the respective papers [42, 35]. Here, as part of our assembly analysis, we looked into how the data-assisted assembly predictions compare with the regular ones, using the regular evaluation strategy. All 7 targets were selected from the difficult group, for which there is little homology information available to perform traditional modelling. SAXS data has the potential to provide valuable information about the global shape of the assemblies and thus should be particularly helpful for this category. At the same time, cross-linking and NMR data can provide information on the inter-chain interfaces, potentially helping the assembly modelling process.

Figure S3 presents the evaluation of all the targets on the 4 scores used here (see Methods). The score ranges for all of them are not significantly different from the regular predictions. Barring target X0957 (Figure S4), no systematic improvement is detectable in this experiment. The reasons appear to be twofold. First, difficulty of the targets may have limited the search space of the prediction methods too early in the pipeline. Second, the groups with the best non-assisted predictions generally did not participate in the data-assisted category, which limits comparability of the outcomes between the categories. A further possible problem can be disagreement of assisted experimental data and structural data (crystallography), which is discussed in the accompanying papers in this issue [42, 35].

4 |. CONCLUSIONS AND OUTLOOK

We have presented the CASP13 assembly category assessment, the second edition of CASP with a dedicated assembly category. We have seen a significant increase in participation, indicating more interest in quaternary structure modelling, a trend that can only be beneficial to the further development of methods. In addition, the quality of the predictions consistently increased as well. We hope that the trend will continue in the next CASPs and that quaternary structure modelling becomes mainstream. Unfortunately, predictions in the regular categories are still not taking into account quaternary structure as an essential part of their modelling pipelines. We also showed that contact prediction for homomeric interfaces is already surprisingly successful, an aspect likely ignored by predictors, assessors and organizers at the moment.

We still see room for improvement in several places. Automation is rather limited in this category. For instance, only 2 servers (Swiss-Model [43] and Robetta [44]) participate in the multimeric section of the fully automated CAMEO experiment [45]. The sophistication of the methods in assembly modelling is falling behind traditional tertiary modelling. Specifically, we have not seen much utilization of the machine learning methods, popular in the tertiary structure and contact prediction categories. It appears that traditional homology modelling still dominates the field.

In conclusion, we would like to emphasize that quaternary modelling is intrinsic to the protein modelling problem and must be considered from the outset in the design of modelling pipelines. Correspondingly, a CASP evaluation unit should match the functional form of a protein structure, be it a monomer or an assembly, with consistent metrics throughout.

Supplementary Material

Supplementary
Table S1

Acknowledgements

We would like to thank the CASP management committee for organization and support. We are grateful to Chaok Seok for contributing the naïve prediction method. We thank Susan Tsutakawa, Gregory Hura, Gaetano Montelione and Andras Fiser for their help in interpreting data-assisted predictions. We thank Spencer Bliven for discussions of the CASP12 assembly prediction results.

Funding information

DG and JD were supported by the RCSB PDB, jointly funded by the National Science Foundation, the National Institute of General Medical Sciences, the National Cancer Institute, and the Department of Energy (NSF-DBI 1338415; Principal Investigator: Stephen K. Burley). AK and BM were supported by the US National Institute of General Medical Sciences (NIGMS/NIH) grant GM100482. AL was supported by the EMBL International PhD Programme.

Footnotes

1

Indeed, quoting Kaspars Tars (Latvian Biomedical Research and Study Centre) who provided the experimental structure: “Monomers do not exist in a free state, so modelling a monomer structure makes no sense. (…) The hydrophobic core of the protein is in part composed of inter-monomer contacts in dimer.”

References

  • [1]. Svedberg T, Nichols J. The application of the oil turbine type of ultracentrifuge to the study of the stability region of carbon monoxide-hemoglobin. Journal of the American Chemical Society 1927;49(11):2920–2934.
  • [2]. Goodsell DS, Olson AJ. Structural symmetry and protein function. Annual review of biophysics and biomolecular structure 2000;29(1):105–153.
  • [3]. Selwood T, Jaffe EK. Dynamic dissociating homooligomers and the control of protein function. Archives of Biochemistry and Biophysics 2012;519(2):131–143.
  • [4]. Hashimoto K, Panchenko AR. Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states. Proceedings of the National Academy of Sciences 2010;107(47):20352–20357.
  • [5]. Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, et al. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic acids research 2018;47(D1):D464–D474.
  • [6]. Goodsell DS. Inside a living cell. Trends in biochemical sciences 1991;16:203–206.
  • [7]. Tsai CJ, Xu D, Nussinov R. Protein folding via binding and vice versa. Folding and Design 1998;3(4):R71–R80.
  • [8]. Nooren IM, Thornton JM. Diversity of protein-protein interactions. The EMBO journal 2003;22(14):3486–3492.
  • [9]. Radaev S, Li S, Sun PD. A survey of protein-protein complex crystallizations. Acta Crystallographica Section D: Biological Crystallography 2006;62(6):605–612.
  • [10]. Kühlbrandt W. The resolution revolution. Science 2014;343(6178):1443–1444.
  • [11]. Ognjenović J, Grisshammer R, Subramaniam S. Frontiers in Cryo Electron Microscopy of Complex Macromolecular Assemblies. Annual review of biomedical engineering 2019;21.
  • [12]. Kryshtafovych A, Fidelis K, Moult J 2. In: CASP: A Driving Force in Protein Structure Modeling John Wiley & Sons, Ltd; 2010. p. 15–32.
  • [13]. Dunbrack RL, Gerloff DL, Bower M, Chen X, Lichtarge O, Cohen FE. Meeting review: the Second Meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP2), Asilomar, California, December 13–16, 1996. Folding and Design 1997;2(2):R27–R42. http://www.sciencedirect.com/science/article/pii/S1359027897000114.
  • [14]. Jauch R, Yeo HC, Kolatkar PR, Clarke ND. Assessment of CASP7 structure predictions for template free targets. Proteins: Structure, Function, and Bioinformatics 2007;69(S8):57–67. https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.21771.
  • [15]. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins: Structure, Function, and Bioinformatics 2011;79(S10):37–58.
  • [16]. Lensink MF, Velankar S, Baek M, Heo L, Seok C, Wodak SJ. The challenge of modeling protein assemblies: the CASP12-CAPRI experiment. Proteins: Structure, Function, and Bioinformatics 2018;86:257–273.
  • [17]. Lafita A, Bliven S, Kryshtafovych A, Bertoni M, Monastyrskyy B, Duarte JM, et al. Assessment of protein assembly prediction in CASP12. Proteins: Structure, Function, and Bioinformatics 2018;86:247–256.
  • [18]. Abriata LA, Tamò GE, Monastyrskyy B, Kryshtafovych A, Dal Peraro M. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods. Proteins: Structure, Function, and Bioinformatics 2018;86:97–112.
  • [19]. Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins: Structure, Function, and Bioinformatics 2018;86:51–66.
  • [20]. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)-Round XII. Proteins: Structure, Function, and Bioinformatics 2018;86:7–15.
  • [21]. Capitani G, Duarte JM, Baskaran K, Bliven S, Somody JC. Understanding the fabric of protein crystals: computational classification of biological interfaces and crystal contacts. Bioinformatics 2015;32(4):481–489.
  • [22]. Bliven S, Lafita A, Parker A, Capitani G, Duarte JM. Automated evaluation of quaternary structures from protein crystals. PLoS computational biology 2018;14(4):e1006104.
  • [23]. Krissinel E. Stock-based detection of protein oligomeric states in jsPISA. Nucleic acids research 2015;43(W1):W314–W319.
  • [24]. Zimmermann L, Stephens A, Nam SZ, Rau D, Kübler J, Lozajic M, et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. Journal of molecular biology 2018;430(15):2237–2243.
  • [25]. Xu Q, Dunbrack RL Jr. Principles and characteristics of biological assemblies in experimentally determined protein structures. Current opinion in structural biology 2019;55:34–49.
  • [26]. Zemla A, Venclovas Č, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins: Structure, Function, and Bioinformatics 1999;37(S3):22–29.
  • [27]. Mariani V, Biasini M, Barbato A, Schwede T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013;29(21):2722–2728.
  • [28]. Bertoni M, Kiefer F, Biasini M, Bordoli L, Schwede T. Modeling protein quaternary structure of homo-and hetero-oligomers beyond binary interactions by homology. Scientific reports 2017;7(1):10480.
  • [29]. Lafita A, Bliven S, Prlić A, Guzenko D, Rose PW, Bradley A, et al. BioJava 5: A community driven open-source bioinformatics library. PLoS computational biology 2019;15(2):e1006791.
  • [30]. Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins: Structure, Function, and Bioinformatics 2009;77(S9):18–28.
  • [31]. Kinch LN, Kryshtafovych A, Monastyrskyy B, Grishin NV. CASP13 target classification into tertiary structure prediction categories. Proteins: Structure, Function, and Bioinformatics 2019.
  • [32]. Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, et al. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic acids research 2016;45(D1):D289–D295.
  • [33]. Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, et al. Assessing the accuracy of contact predictions in CASP13. Proteins: Structure, Function, and Bioinformatics 2019.
  • [34]. Abriata LA, Tamò GE, Dal Peraro M. A further leap of improvement in tertiary structure prediction in CASP13 prompts new routes for future assessments. Proteins: Structure, Function, and Bioinformatics 2019.
  • [35]. Eduardo Fajardo J, Shrestha R, Gil N, et al. Assessment of chemical-crosslink-assisted protein structure modeling in CASP13. Proteins: Structure, Function, and Bioinformatics 2019.
  • [36]. Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC bioinformatics 2010;11(1):283.
  • [37]. Sathyapriya R, Duarte JM, Stehr H, Filippis I, Lappe M. Defining an essence of structure determining residue contacts in proteins. PLoS computational biology 2009;5(12):e1000584.
  • [38]. Hopf TA, Schärfe CP, Rodrigues JP, Green AG, Kohlbacher O, Sander C, et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. Elife 2014;3:e03430.
  • [39]. Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. Elife 2014;3:e02030.
  • [40]. Quignot C, Rey J, Yu J, Tufféry P, Guerois R, Andreani J. InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs. Nucleic acids research 2018;46(W1):W408–W416.
  • [41]. Ogorzalek TL, Hura GL, Belsom A, Burnett KH, Kryshtafovych A, Tainer JA, et al. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy. Proteins: Structure, Function, and Bioinformatics 2018;86:202–214.
  • [42]. Hura G, Hodge C, Rosenberg D, Guzenko D, Duarte JM, Monastyrskyy B, et al. Small angle X-ray scattering-assisted protein structure prediction in CASP13 and emergence of solution structure differences. Proteins: Structure, Function, and Bioinformatics 2019.
  • [43]. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic acids research 2018;46(W1):W296–W303.
  • [44]. Kim DE, Chivian D, Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic acids research 2004;32(suppl_2):W526–W531.
  • [45]. Haas J, Barbato A, Behringer D, Studer G, Roth S, Bertoni M, et al. Continuous Automated Model Evaluation (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins: Structure, Function, and Bioinformatics 2018;86:387–398.
  • [46]. Šali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. Journal of molecular biology 1993;234(3):779–815.
