Towards a structurally resolved human protein interaction network

David F Burke; Patrick Bryant; Inigo Barrio-Hernandez; Danish Memon; Gabriele Pozzati; Aditi Shenoy; Wensi Zhu; Alistair S Dunham; Pascal Albanese; Andrew Keller; Richard A Scheltema; James E Bruce; Alexander Leitner; Petras Kundrotas; Pedro Beltrao; Arne Elofsson

doi:10.1038/s41594-022-00910-8

2023 Jan 23;30(2):216–225.

PMCID: PMC9935395 PMID: 36690744

Abstract

Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.

Subject terms: Systems biology, Structural biology, Protein folding

Here the authors explore the ability of AlphaFold2 to predict structures across the human protein-protein interactome and the limitations thereof.

Main

Proteins are key cellular effectors that determine most cellular processes. They rarely act in isolation; instead, the coordination of diverse processes arises from interactions among multiple proteins and other biomolecules. Characterizing protein-protein interactions (PPIs) is crucial for understanding which groups of proteins form functional units, and it underlies the study of cell biology. Diverse experimental and computational approaches have been developed to determine the PPI network of the cell (that is, the interactome), with hundreds of thousands of human protein interactions determined to date^1–3. Protein interactions range from transient interactions that regulate an enzyme to permanent interactions within molecular machines.

The structural characterization of the human interactome has lagged behind, with experimental and homology models currently covering an estimated 15% of protein interactions^4,5. Structural characterization of protein complexes is a critical step in understanding the mechanisms of protein function, in studying the impact of mutations^4,6–8 and in studying the regulation of cellular processes through post-translational tuning of binding affinities^9–12.

Computational approaches for predicting the structures of interacting protein pairs are primarily based on identifying structural similarity between pairs of proteins and experimentally determined protein complexes^4,6,13,14. The Interactome3D repository (refs. ⁴,¹⁴) currently lists 7,625 predicted models based on domain homology, a number similar to the 8,359 pairs it lists with an experimentally determined structure. In addition, co-evolutionary information has been used to predict protein interactions and to guide structural docking of bacterial proteins¹⁵. Recently, neural network-based approaches have demonstrated the ability to accurately predict the structures of individual proteins^16,17 and protein complexes^16,18–21. These approaches can correctly predict the structures of up to 60% of dimers¹⁸ and have been used to predict the structures of 1,506 Saccharomyces cerevisiae protein interactions²². However, the large-scale application of these neural network models to predicting human complex structures has not yet been tested.

Here, we assess the possibilities and limitations of applying AlphaFold2 to modeling human protein interactions on a large scale. We predicted the complex structures for two sets of human interactions obtained using different experimental methods, comprising 65,484 unique human interactions. We show that it is possible to rank the models according to confidence, with 3,137 predicted structures ranked as highly confident. Further, we show that the higher-confidence predictions are enriched among those supported by a combination of experimental methods. We showcase the value of a structurally resolved interactome by studying disease mutations and phosphorylation of interface residues. Finally, we provide some indication that binary complexes can be used to build higher-order assemblies.

Structure prediction of human protein interactions

We selected experimentally identified human protein interactions from the Human Reference Interactome (HuRI)² and the Human Protein Complex Map (hu.MAP v.2.0)³. HuRI comprises protein interactions determined by yeast two-hybrid (Y2H) screening², from which we modeled 55,586 pairs. From hu.MAP, we selected 10,207 high-quality PPIs³. While HuRI is likely to be enriched for direct protein interactions, including transient partners, the hu.MAP set is more likely to reflect stable protein interactions, including members of the same complex that may not interact directly. The overlap between the two datasets is small (309 pairs), and a comparison with two large-scale compendia of structural models⁴ indicates that 62,019 of the combined pairs lack experimental models and cannot easily be modeled by homology, suggesting a large potential gain in structural knowledge.
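The pair counts above are consistent with a plain set union of unordered pairs (55,586 + 10,207 − 309 = 65,484). A minimal Python sketch of that bookkeeping, assuming each interaction is stored as an unordered pair of protein identifiers. The identifiers in the example are hypothetical placeholders, not real HuRI or hu.MAP entries.

```python
from __future__ import annotations


def canonical_pair(a: str, b: str) -> tuple[str, str]:
    """Order-independent key, so (A, B) and (B, A) count as the same interaction."""
    return (a, b) if a <= b else (b, a)


def merge_pair_sets(*datasets) -> set[tuple[str, str]]:
    """Union any number of iterables of (protein, protein) pairs without duplicates."""
    merged: set[tuple[str, str]] = set()
    for dataset in datasets:
        for a, b in dataset:
            merged.add(canonical_pair(a, b))
    return merged


# Hypothetical toy identifiers; ("P2", "P1") is the one pair shared by both sets.
y2h_pairs = [("P1", "P2"), ("P2", "P3")]
complex_pairs = [("P2", "P1"), ("P4", "P5")]
print(len(merge_pair_sets(y2h_pairs, complex_pairs)))  # 3

# The dataset sizes quoted in the text follow the same inclusion-exclusion.
assert 55_586 + 10_207 - 309 == 65_484
```

Using a canonical ordering of each pair, rather than storing both orientations, keeps the overlap count (309 here) equal to the number of shared interactions.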

We predicted the structures of the 65,484 nonredundant pairs using the FoldDock pipeline¹⁸, which is based on AlphaFold2 (ref. ¹⁷). As in FoldDock, we combined the interface size and the predicted local Distance Difference Test (pLDDT) scores of interface residues into a single score, dubbed pDockQ (Methods), which predicts the DockQ score of a complex and can be used to rank models by confidence. We benchmarked pDockQ by comparing predicted models with 1,465 experimental structures, of which 742 (50%) were correctly predicted (DockQ > 0.23). Among predictions with pDockQ > 0.23, 70% (671 of 955) were correctly modeled; for pDockQ > 0.5, this rose to 80% (521 of 651).
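To make the scoring step concrete, the sketch below implements pDockQ as a logistic function of x, the mean interface pLDDT multiplied by log10 of the number of interface contacts. The constants (L = 0.724, x0 = 152.611, k = 0.052, b = 0.018) are those reported for FoldDock. They are quoted here as an assumption to be checked against the Methods. The interface-contact definition itself is not implemented. `precision_at` shows how the benchmark fractions above (for example, 521 of 651 correct models at pDockQ > 0.5) are obtained.

```python
from __future__ import annotations

import math

# Logistic constants as reported for FoldDock (assumption: verify against the Methods).
L_MAX, X0, K, B = 0.724, 152.611, 0.052, 0.018


def pdockq(interface_plddt: list[float], n_contacts: int) -> float:
    """pDockQ from the pLDDT of interface residues and the number of interface contacts."""
    if n_contacts < 1 or not interface_plddt:
        return 0.0  # no interface at all: lowest confidence (convention of this sketch)
    x = (sum(interface_plddt) / len(interface_plddt)) * math.log10(n_contacts)
    return L_MAX / (1.0 + math.exp(-K * (x - X0))) + B


def precision_at(cutoff: float, scores: list[float], correct: list[bool]) -> float:
    """Fraction of models scoring above `cutoff` that are correct (DockQ > 0.23)."""
    selected = [ok for s, ok in zip(scores, correct) if s > cutoff]
    return sum(selected) / len(selected) if selected else float("nan")


# A toy interface: 30 residues at pLDDT 85 with 120 contacts.
print(round(pdockq([85.0] * 30, 120), 2))  # about 0.58
```

With the benchmark counts quoted above, the same ratio gives 671/955 ≈ 0.70 at pDockQ > 0.23 and 521/651 ≈ 0.80 at pDockQ > 0.5.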

We show in Fig. 1a the distribution of pDockQ for the predicted and random protein interactions, and provide data for all models in Supplementary Table 1. The pDockQ of known interacting proteins tends to be higher than that of the random set, with predictions for hu.MAP showing on average higher confidence than those for HuRI. Selecting hu.MAP interactions that are also supported by Y2H or crosslinking data results in even higher confidence values (Fig. 1a). This suggests that high-confidence models are enriched for protein interactions supported by the two types of methods, which are associated with high-affinity, direct interactions. We identified 3,137 structures (Fig. 1b) as high-confidence models (pDockQ > 0.5). The number of structures increased to 10,061 with a cut-off of 0.23. Only 0.3% of the random set of models would be considered confident predictions at this cut-off. In Fig. 1c we show examples of predicted structures aligned to experimental or homology models, illustrating how the predictions and the confidence score relate to the observed alignments. For most of these cases, even those with lower confidence values, the interaction interface is in good agreement. The exception is the interaction between two subunits of the 26S proteasome, ATPase 2 (PSMC2) and non-ATPase 11 (PSMD11). Several of the models in Fig. 1c are parts of large complexes: PRDX2–PRDX3, members of the peroxiredoxin family of antioxidant enzymes; RFC2–RFC5, subunits of the heteropentameric replication factor C (RF-C); YWHAB–YWHAG, the beta (YWHAB) and gamma (YWHAG) members of the 14-3-3 family of tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation proteins; and RPL9–RPL18A, ribosomal proteins L9 (RPL9) and L18a (RPL18A). This shows that AlphaFold2 can predict the structures of directly interacting protein pairs present in large complexes.

Features impacting prediction confidence

As shown in Fig. 1a, protein pairs present in the Protein Data Bank (PDB) are enriched in high-scoring models compared with pairs in HuRI and hu.MAP. There are several possible explanations for this, such as the inability of AlphaFold2 to predict transient or indirect interactions. However, it is also possible that the two high-throughput datasets contain noninteracting pairs. To understand this difference better, we first studied an additional dataset created from large (>10 chains) heteromeric protein complexes.

The set of large complexes consists of 12 heteromeric protein complexes, and all pairs of nonidentical chains in each complex were docked with each other. These pairs can be divided into those that interact directly and those that do not. Here, we defined a direct interaction as more than 20 contacts between Cα atoms closer than 8 Å, to exclude small interaction interfaces. When a complex contained multiple copies of identical chains, all interactions were included to allow for alternative interactions between the chains. The difference in pDockQ scores between directly and indirectly interacting pairs is striking: only 6% of the indirect pairs have a pDockQ score > 0.5, compared with 38% of the directly interacting pairs (Fig. 2a). This shows that directly interacting pairs can often be predicted even when they are part of large complexes, in contrast to indirectly interacting pairs.
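The direct-contact criterion takes only a few lines to write down. This sketch assumes Cα coordinates are already available as (x, y, z) tuples in Å for each chain; parsing a structure file is out of scope. It counts inter-chain Cα pairs closer than 8 Å. The function names are illustrative and not part of any published pipeline.

```python
import math


def count_ca_contacts(ca_a, ca_b, cutoff=8.0):
    """Number of inter-chain Cα–Cα pairs closer than `cutoff` Å (brute force)."""
    return sum(1 for p in ca_a for q in ca_b if math.dist(p, q) < cutoff)


def is_direct_interaction(ca_a, ca_b, min_contacts=20, cutoff=8.0):
    """Direct interaction: more than `min_contacts` Cα–Cα contacts below `cutoff` Å."""
    return count_ca_contacts(ca_a, ca_b, cutoff) > min_contacts


# Toy chains: five Cα atoms each, placed 5 Å or 20 Å apart.
chain_a = [(float(i), 0.0, 0.0) for i in range(5)]
chain_b = [(float(i), 5.0, 0.0) for i in range(5)]
far_b = [(float(i), 20.0, 0.0) for i in range(5)]
print(count_ca_contacts(chain_a, chain_b), is_direct_interaction(chain_a, chain_b))  # 25 True
print(is_direct_interaction(chain_a, far_b))  # False
```

The all-pairs loop scales with the product of the chain lengths. For very large complexes, a spatial index such as a k-d tree would be the usual replacement.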

Fig. 2 — In all subfigures, proteins from HuRI are shown in green, hu.MAP in gray, CORUM in orange and large PDB complexes in blue. a, pDockQ values of directly and indirectly interacting proteins from the same complex (blue); for comparison, HuRI and hu.MAP data are shown with thin lines. b, pDockQ values of the CORUM (orange), HuRI (green) and hu.MAP (gray) datasets. c, Fraction of residues predicted to be disordered (pLDDT < 0.5), showing that protein pairs in HuRI are enriched in disorder. d, Proteins in HuRI have fewer sequences in the paired MSAs, as measured by the mean number of effective sequences in the MSA (meff). e, Proteins that share subcellular localization (solid lines) are enriched in high pDockQ scores in all three datasets. f, Only protein pairs in hu.MAP are coexpressed according to STRING, based on similarity in Genotype-Tissue Expression (GTEx) data, and coexpressed pairs are enriched in pairs with high pDockQ scores.


hu.MAP has many more high-confidence predictions than HuRI, which is based on Y2H experiments. To understand this difference further, we analyzed a subset of protein pairs from CORUM²³, a manually curated database of mammalian protein complexes, and predicted the structures of all pairs within the same complex. The average pDockQ score of CORUM is slightly higher than that of hu.MAP, but the fraction of high-quality predictions is similar (16% versus 19%). This indicates that different databases of protein complexes have a similar fraction of high-confidence predictions and that HuRI is the outlier (Fig. 2b).

The Y2H data in HuRI are unlikely to contain many indirect interactions, because only the two tested human proteins are expressed in each yeast cell. There must therefore be another reason for the small number of high-confidence predictions. We examined the properties of the pairs in the two datasets and found that HuRI pairs differ from those in hu.MAP (and the other datasets) in two ways. They contain more intrinsic disorder (Fig. 2c), and they have fewer effective sequences (meff) in their multiple sequence alignments (MSAs) (Fig. 2d). These figures also show that pDockQ values tend to increase with less disorder and more sequences in the alignments, although the relationship is clearly not absolute. Furthermore, protein pairs in HuRI are less likely to be found in the same subcellular compartment (Fig. 2e) and less likely to have similar coexpression profiles (Fig. 2f). Taken together, these observations suggest that many protein interactions in HuRI are transient and that AlphaFold2 cannot reliably predict such interactions.
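Meff is usually computed by down-weighting redundant sequences in the alignment. The sketch below uses a common convention that is assumed here rather than taken from this study. Each sequence gets weight 1/n, where n is the number of sequences (itself included) sharing at least 80% identity with it over non-gap columns. Meff is the sum of the weights. For a paired MSA, each row would be the concatenated sequences of both partners.

```python
from __future__ import annotations


def identity(s1: str, s2: str) -> float:
    """Fraction of identical residues over aligned columns where neither row has a gap."""
    cols = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not cols:
        return 0.0
    return sum(a == b for a, b in cols) / len(cols)


def meff(msa: list[str], threshold: float = 0.8) -> float:
    """Effective number of sequences: sum over rows of 1 / (size of identity neighbourhood)."""
    total = 0.0
    for i, seq in enumerate(msa):
        # Count the row itself plus every other row at or above the identity threshold.
        neighbours = 1 + sum(
            1 for j, other in enumerate(msa) if j != i and identity(seq, other) >= threshold
        )
        total += 1.0 / neighbours
    return total


toy_msa = ["ACDEFGHIKL", "ACDEFGHIKW", "WWWWWWWWWW"]  # hypothetical alignment rows
print(meff(toy_msa))  # 2.0: the two 90%-identical rows count once together
```

The pairwise loop is quadratic in the number of rows, which is acceptable for a sketch but slow for deep alignments.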

Crosslinking support for predicted complex structures

Chemical crosslinking followed by mass spectrometry identifies pairs of reactive residues (usually lysines) that are in spatial proximity, within a maximal distance set by the geometry of the crosslinker. Identifying such residue pairs across two proteins can help define the likely protein interface. To determine whether the predicted complex structures agree with these orthogonal spatial constraints, we obtained a compilation of crosslinks between residue pairs across 528 protein pairs with predicted models (Fig. 3a, Supplementary Table 1 and Methods). In total, 51% of the models had one or more crosslinks within the expected maximal distance (Fig. 3a). Restricting the analysis to higher-confidence models, by pDockQ score, increased the fraction of complexes with satisfied crosslinks, reaching 75% for pDockQ > 0.5 (Fig. 3a). This result is in line with the benchmark results above.
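Checking whether a model satisfies a crosslink reduces to a distance comparison. The sketch below assumes Cα coordinates keyed by (chain, residue number). It also assumes a maximal Cα–Cα distance of 30 Å, a value often used for lysine-reactive crosslinkers such as DSS. The threshold actually used in this study is specified in the Methods and may differ. A model counts as supported if at least one of its crosslinks is satisfied, mirroring the "one or more crosslinks" criterion above. Chain labels and residue numbers in the example are hypothetical.

```python
import math


def crosslink_satisfied(res_a, res_b, ca_coords, max_dist=30.0):
    """True if the Cα–Cα distance between two crosslinked residues is within `max_dist` Å.

    `ca_coords` maps (chain, residue_number) -> (x, y, z); 30 Å is an assumed cutoff.
    """
    return math.dist(ca_coords[res_a], ca_coords[res_b]) <= max_dist


def model_supported(crosslinks, ca_coords, max_dist=30.0):
    """A model is supported if at least one inter-protein crosslink is satisfied."""
    return any(crosslink_satisfied(a, b, ca_coords, max_dist) for a, b in crosslinks)


def fraction_supported(models, min_pdockq=0.0, max_dist=30.0):
    """Fraction of models with pDockQ above `min_pdockq` that have >= 1 satisfied crosslink.

    `models` is a list of (pdockq, crosslinks, ca_coords) tuples.
    """
    kept = [m for m in models if m[0] > min_pdockq]
    if not kept:
        return float("nan")
    return sum(model_supported(xl, coords, max_dist) for _, xl, coords in kept) / len(kept)


coords = {("A", 10): (0.0, 0.0, 0.0), ("B", 20): (18.0, 0.0, 0.0), ("B", 30): (45.0, 0.0, 0.0)}
good = (0.7, [(("A", 10), ("B", 20))], coords)  # 18 Å apart: satisfied
bad = (0.6, [(("A", 10), ("B", 30))], coords)   # 45 Å apart: violated
print(fraction_supported([good, bad], min_pdockq=0.5))  # 0.5
```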

Fig. 3 — a, The numbers and ratios of predicted structures having crosslink information for pairs of residues that bridge the two proteins in the predicted structure, broken down by whether the crosslinks satisfy their expected maximal distance and by the predicted model quality (pDockQ). b–e, Examples of high-confidence predicted structures with no previous structural information, supported by at least one crosslink (blue line): the ERLIN1/ERLIN2 complex (b); IMMT and CHCHD3, components of the mitochondrial inner membrane MICOS complex (c); the complex of transfer RNA-guanine-N(7)-methyltransferase (METTL1) with its noncatalytic subunit WDR4 (d); and heterogeneous nuclear ribonucleoprotein C (HNRNPC) with the RNA-binding protein RALY (e).


In total, we identified 479 crosslinks providing supporting evidence for 171 predicted complex structures with pDockQ > 0.5. Of these, 41 correspond to complexes with no experimental structure or homology model, and we selected some of them for illustration (Fig. 3b–e). Figure 3b shows the AlphaFold2 (AF2) model for the full length of the ERLIN1/ERLIN2 complex, which mediates the endoplasmic reticulum-associated degradation (ERAD) of inositol 1,4,5-trisphosphate receptors (IP3Rs). AlphaFold2 predicts a globular domain (residues 1–190) followed by an extended helical region with a kink around position 280. Unlike the model in Interactome3D, the paralogous proteins are stacked side by side, with the hydrophobic faces of the helices buried and the hydrophilic faces (mainly Lys) exposed to solvent. A crosslink between the C-terminal residues K275 (ERLIN1) and K287 (ERLIN2) spans a predicted distance of 18 Å, supporting the model. In Fig. 3c we show the model for IMMT and CHCHD3, components of the mitochondrial inner membrane MICOS complex. AlphaFold2 predicts that a globular helical domain at the C-terminal end of IMMT (residues 550–750) interacts with the C-terminal end of CHCHD3 (residues 150–225). This is supported by three crosslinks: one between K173 (CHCHD3) and K565 (IMMT), and two between K203 (CHCHD3) and K714 and K726 of IMMT. Figure 3d shows the complex of transfer RNA-guanine-N(7)-methyltransferase (METTL1) with its noncatalytic subunit WDR4. The structure of WDR4 has not yet been solved experimentally, but it contains WD40 repeats, which are expected to form a β-propeller domain, as predicted here. METTL1 is predicted to bind the side of the WD40 propeller, away from the ligand-binding pore. This orientation is supported by a crosslink between K122 (WDR4) and K143 (METTL1) (18 Å). Finally, in Fig. 3e we show the predicted complex structure of heterogeneous nuclear ribonucleoprotein C (HNRNPC) and the RNA-binding protein RALY. Two regions in each protein are predicted with high confidence (pLDDT > 70); lower-confidence regions are not shown. The N-terminal domain of HNRNPC (residues 16–85) is predicted to interact with the N-terminal domain of RALY (residues 1–100), and a long helix in HNRNPC (185–233) is predicted to interact with a helix in RALY (169–228). This interhelix interface is supported by crosslinks between three pairs of lysines at either end of the helices (HNRNPC → RALY: 189 → 222, 229 → 179 and 232 → 183).

Disease-associated missense mutations at interfaces

Missense mutations associated with human diseases can alter protein function through diverse mechanisms, including disrupting protein stability, allosterically modulating enzyme activity and altering PPIs. Structural models can help rationalize the possible mechanisms of interface disease mutations. To assess the usefulness of the predicted structures for this purpose, we compiled a set of mutations at interface residues whose impact on the corresponding interaction had previously been tested experimentally²⁴. We then predicted changes in binding affinity upon mutation in silico using FoldX²⁵. Mutations known to disrupt the interactions are predicted to destabilize binding more strongly than mutations known to have no effect (Fig. 4a and Supplementary Table 2). Restricting the analysis to mutated residues modeled with very high confidence (pLDDT > 90) gave a clearer separation between mutations known to disrupt complex formation and those known not to (Fig. 4a). This indicates that very accurate models are needed when using the FoldX force field to estimate the impact of mutations on binding affinity.
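Written out, the quantity estimated here is the change in binding free energy upon mutation. The expressions below use the standard thermodynamic decomposition, which is assumed here rather than taken from this study's Methods. G_AB, G_A and G_B denote the computed free energies of the complex and of each partner alone. With this convention, a positive ΔΔG_bind means that the mutation weakens binding. The exact FoldX commands and sign conventions used in this work are not specified in this section.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= G_{AB} - G_{A} - G_{B},\\
\Delta\Delta G_{\mathrm{bind}} &= \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}.
\end{aligned}
```

Because both terms of ΔΔG_bind are computed on the same model, errors in the placement of interface side chains enter the difference directly. This is consistent with the observation above that only very high-pLDDT interface residues give a clear separation.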

Next, we mapped human disease mutations (from ClinVar) and cancer mutations (from The Cancer Genome Atlas) to the interface residues defined by the high-confidence protein complex predictions (pDockQ > 0.5) (Supplementary Table 1). The confident hu.MAP and HuRI predictions identified 280 interfaces carrying pathogenic mutations. They also identified 602 interfaces in the top 25% of recurrently mutated interfaces in cancer, ranked by the number of mutations per interface position (Fig. 4b and Methods). We find a strong enrichment of pathogenic versus benign mutations at interface residues relative to the rest of the protein (2.3-fold enrichment, P = 2.7 × 10⁻³¹).
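The enrichment can be framed as a 2 × 2 table of mutation class (pathogenic or benign) by location (interface or rest of the protein). The sketch below makes two assumptions. First, the fold enrichment is taken to be the odds ratio of that table. Second, significance is taken to be a one-sided Fisher's exact test, computed from the hypergeometric upper tail. The statistic actually used in this study is described in the Methods. The counts in the example are toy numbers, not data from this work.

```python
from math import comb


def odds_ratio(path_iface, benign_iface, path_rest, benign_rest):
    """Odds of pathogenic vs benign at interfaces, relative to the rest of the protein."""
    return (path_iface / benign_iface) / (path_rest / benign_rest)


def fisher_one_sided(path_iface, benign_iface, path_rest, benign_rest):
    """P(X >= path_iface) under the hypergeometric null (one-sided Fisher's exact test)."""
    total = path_iface + benign_iface + path_rest + benign_rest
    n_path = path_iface + path_rest      # all pathogenic mutations
    n_iface = path_iface + benign_iface  # all mutations at interface residues
    upper = min(n_path, n_iface)
    tail = sum(
        comb(n_path, k) * comb(total - n_path, n_iface - k)
        for k in range(path_iface, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(total, n_iface)


# Toy counts for illustration only (not the ClinVar numbers used in this study).
print(odds_ratio(30, 10, 20, 20))  # 3.0
print(fisher_one_sided(30, 10, 20, 20))
```

Keeping the binomial coefficients as Python integers avoids overflow for realistic mutation counts; only the final ratio is converted to a float.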

We illustrate in Fig. 4c examples of protein network clusters with interface disease mutations across a range of biological functions. These include interface mutations in chromatin remodeling proteins, such as members of the SWI/SNF complex (SMARCD1, SMARCD2 and SMARCD3), and in several transcription factors related to development (for example, TCF3, TCF4, LMO1 and LMO2).

We selected examples of interfaces with disease mutations and no previous experimental data or homology to available models (Fig. 4d–g). Figure 4d shows the WDR4–METTL1 interface, which is supported by the crosslink described above. WDR4 has two annotated pathogenic variants at this interface, linked with Galloway–Mowat syndrome 6; the highlighted R170 interacts with a negatively charged residue of METTL1. Figure 4e shows an interface with 32 recorded cancer mutations across both proteins, including the highlighted arginines in LDOC1, which form electrostatic interactions with the opposite chain. TWIST1 has several annotated pathogenic mutations, including L149R and L159H, at residues buried in the interface (Fig. 4f). In particular, the L149R mutation, associated with Saethre–Chotzen syndrome, would strongly disrupt packing. The R118G mutation would disrupt the interaction with the main-chain carbonyl oxygen of F22 in TCF4. In RAD51D, we found the mutation R266C (associated with familial breast–ovarian cancer) at a residue that interacts across the interface with XRCC2 (Fig. 4g). RAD51D and XRCC2 are paralogous genes involved in the repair of DNA double-strand breaks by homologous recombination. We also found mutations of R239 (to Trp, Gln or Gly) associated with breast–ovarian cancer. R239 interacts with Tyr119 in XRCC2, a residue that is itself annotated as mutated in hereditary cancer-predisposing syndrome.

Phospho-regulation of protein complex interfaces

Protein phosphorylation can regulate protein interactions by modulating binding affinity through changes in the size and charge of the modified residue. Over 100,000 human phosphorylation sites have been identified experimentally to date^26,27, but only 5–10% of these have a known function²⁸. Mapping phosphorylation sites to protein interfaces can generate mechanistic hypotheses for their roles in controlling protein interactions. We used a recent characterization of the human phosphoproteome²⁶ to identify 4,145 unique phosphosites at interface residues among the high-confidence models. The average functional importance, as measured by a previously described functional score²⁶, was generally higher for phosphorylation sites at interfaces than for random sites (Fig. 5a). We also found some enrichment for substrates of several kinases, including tyrosine kinases (ERBB2, AXL, ABL2, FER) (Fig. 5b). This suggests that some interfaces may be under coordinated regulation by specific kinases and conditions.

Fig. 5 — a, Distribution of phosphosite functional scores for phosphosites at interface residues and random phosphosites. The min, mean and max values were as follows: Random = 0.02, 0.26, 0.98; HuRI = 0.06, 0.37, 0.99; hu.MAP = 0.06, 0.33, 0.99. The boxes represent the first and third quartiles. The upper whisker extends from the third quartile to the largest value no further than 1.5 × IQR. The lower whisker extends from the first quartile to the smallest value no further than 1.5 × IQR. b, Enrichment of kinase substrates among phosphosites at interface residues. The P value was derived from an over-representation analysis using a one-sided hypergeometric test (N = 7,150). c, Hierarchical clustering of the pairwise correlation values for changes in phosphosite levels across conditions. Groups of highly correlated phosphosites were defined as clusters (1 to 16), indicated by colors along the outside of the clustergram. d, Degree of regulation of phosphosites from each cluster in a selected panel of conditions, defined by a one-sided Z-test comparing the fold changes of the phosphosites in a cluster with the entire distribution of fold changes in that condition. The result is summarized as the −log(P value), signed positive if the median value is above the background and negative otherwise. e, Gene ontology enrichment analysis of the proteins with phosphosites in selected clusters.
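The signed regulation score in panel d can be sketched as follows. Several details are not fixed by the caption, so each is an assumption:

- The background mean and standard deviation are taken from all fold changes in the condition.
- The Z statistic uses the standard error of the cluster mean.
- The one-sided tail follows the direction of the cluster median relative to the background median.
- The logarithm is base 10.

The background is assumed to be non-constant.

```python
import math
from statistics import NormalDist, mean, median, pstdev


def signed_regulation(cluster_fc, background_fc):
    """Signed -log10(P) of a one-sided Z-test for a cluster's fold changes in one condition.

    Assumes a non-constant background (pstdev > 0).
    """
    mu, sigma = mean(background_fc), pstdev(background_fc)
    z = (mean(cluster_fc) - mu) / (sigma / math.sqrt(len(cluster_fc)))
    up = median(cluster_fc) > median(background_fc)
    # One-sided tail in the direction of the cluster's shift; cdf(-z) avoids 1 - cdf(z) rounding.
    p = NormalDist().cdf(-z) if up else NormalDist().cdf(z)
    score = -math.log10(max(p, 1e-300))  # guard against log(0) for extreme shifts
    return score if up else -score


background = [-2.0, -1.0, 0.0, 1.0, 2.0]  # toy fold changes for one condition
print(signed_regulation([2.0, 2.0, 2.0, 2.0], background) > 0)     # True
print(signed_regulation([-2.0, -2.0, -2.0, -2.0], background) < 0)  # True
```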


To identify potentially co-regulated interfaces, we collected measurements of changes in phosphorylation levels across a panel of over 200 conditions²⁹. We retained 260 phosphosites with significant regulation in three conditions and computed all-by-all pairwise correlations of their fold changes across conditions (Supplementary Table 1). Clustering these phosphosites by their correlation profiles (Fig. 5c) identified 16 groups of co-regulated interface phosphorylation sites (Fig. 5c and Supplementary Table 3). For each group, we identified the conditions with the strongest up- or down-regulation (Supplementary Fig. 1) and plotted a subset of these conditions in Fig. 5d. We also performed a gene ontology enrichment analysis for each group of co-regulated phosphosites, including both proteins of each modified interface, to search for common biological functions (Fig. 5e and Supplementary Table 4). These analyses used one-sided hypergeometric tests. For example, a cluster of interface phosphosites in proteins related to intermediate filaments (cluster 7) shows strong regulation along the cell cycle, with down-regulation in S phase and up-regulation in G1 and mitosis. Phosphosites in cluster 1 (cell cycle G1–S phase transition) show the opposite trend, with up-regulation in late S phase and down-regulation in G1 and mitosis. Some clusters are regulated under specific kinase inhibition, which may suggest new hypotheses for kinase regulation of specific processes. For example, phosphosites in cluster 9 (regulation of chromosome assembly) tend to be up-regulated after inhibition of either ROCK or mTOR.
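The grouping of phosphosites by correlated fold-change profiles can be illustrated with a simplified variant. It computes Pearson correlations between all pairs of sites, then applies single-linkage clustering cut at a fixed correlation threshold. This cut is equivalent to taking connected components of the graph that links site pairs with r at or above the threshold. The threshold of 0.8 and the site names are hypothetical. The clustering behind Fig. 5c operates on the full correlation profiles and may use a different linkage.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_clusters(profiles, min_r=0.8):
    """Single-linkage groups: two sites are joined whenever their correlation is >= min_r."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical phosphosites with fold changes across four conditions.
profiles = {
    "siteA": [1.0, 2.0, 3.0, 4.0],
    "siteB": [2.0, 4.0, 6.0, 8.0],
    "siteC": [4.0, 3.0, 2.0, 1.0],
    "siteD": [8.0, 6.0, 4.0, 2.0],
}
print(correlation_clusters(profiles))  # [['siteA', 'siteB'], ['siteC', 'siteD']]
```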

Although not all phosphosites at interfaces are likely to regulate binding affinity, this analysis provides hypotheses for the coordinated regulation of multiple proteins through the tuning of their interactions after specific perturbations.

Higher-order assemblies from binary protein interactions

Proteins interact with multiple partners either simultaneously, as part of larger protein complexes, or separately in time and space. This is also reflected in our structurally characterized network, in which proteins form groups, as illustrated in a global network view of the protein interactions with confident models (Fig. 6, Supplementary Fig. 2 and Supplementary Data 1). One key benefit of structurally characterizing an interaction network is the identification of interfaces shared by multiple interactors. As an example, we highlight GDI1 (Rab GDP dissociation inhibitor alpha), which interacts with multiple Rab proteins and regulates their activity by inhibiting the dissociation of GDP. The predicted complex structures show that these interactions share the same interface and therefore cannot occur simultaneously. Other clusters in the network suggest that the proteins form larger protein complex assemblies with many-to-many interactions. Computational requirements can limit the use of AlphaFold2 for predicting larger assemblies directly. We therefore tested whether the structures of protein pairs could be iteratively structurally aligned to build such assemblies. We tested this procedure on a small set of complexes in this network that have known structures and between five (RFC complex, TFIIH core complex) and 14 (20S proteasome) subunits. We then aligned an experimentally determined structure with the predicted models (Fig. 6; gray, experimental model). These examples showcase both the potential and the limitations of this procedure.
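Aligning a predicted dimer onto a growing assembly amounts to superposing the subunit they share and carrying the partner along with it. The minimal NumPy sketch below does this with the Kabsch algorithm. It assumes both copies of the shared subunit are given as N × 3 arrays with the same residue correspondence, for example matched Cα atoms. It illustrates the idea and is not the pipeline used in this study.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation R and translation t minimizing the RMSD of mobile @ R.T + t to target."""
    mob_c, tgt_c = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mob_c).T @ (target - tgt_c)  # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_c - mob_c @ r.T
    return r, t


def place_partner(shared_in_assembly, shared_in_dimer, partner_in_dimer):
    """Superpose the dimer on its shared subunit and return the partner's new coordinates."""
    r, t = kabsch(shared_in_dimer, shared_in_assembly)
    return partner_in_dimer @ r.T + t


# Toy check: move a random "dimer" rigidly and confirm the partner is placed correctly.
rng = np.random.default_rng(0)
shared = rng.normal(size=(10, 3))
partner = rng.normal(size=(8, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
shift = np.array([5.0, -2.0, 3.0])
placed = place_partner(shared @ rot.T + shift, shared, partner)
print(np.allclose(placed, partner @ rot.T + shift))  # True
```

Each placement inherits any error in the superposition of the shared subunit. This is one way the compounding errors described for the RFC complex below can arise.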

Fig. 6 — The middle circle is a network view of all PPIs predicted with high confidence (pDockQ > 0.5). The edges and nodes are colored in red if there is a previous experimental or homology model for the interaction and blue if such information is unavailable. We selected four examples of recapitulated complexes (yellow circles and black arrows) plotted in further detail. In these small networks, only the edges are colored based on structural evidence. In the case of RabGDP, the faded nodes and edges represent predictions with slightly lower confidence (pDockQ > 0.3).

The TFIIH core complex is composed of five subunits with 1-to-1 stoichiometry. All subunits can be modeled, and the final complex generally agrees with a cryoEM structure of these subunits (PDB:6NMI) (Fig. 6). The largest difference from the cryoEM model is the relative position of the ERCC3 subunit. The exact final model can vary depending on which pairs are aligned, yielding multiple possible final conformations (Supplementary Fig. 3). Figure 6 shows the conformation that best matches the cryoEM model in PDB:6NMI. Among the alternatives, one predicted model adopts a more open conformation (as seen in PDB:5OQJ), and others place the GTF2H1 subunit differently.

The RFC complex is also composed of five subunits with 1-to-1 stoichiometry. One iterative alignment of pairwise protein interactions builds a model that includes all five subunits, organized similarly to the PDB:6VVO cryoEM structure (Fig. 6). In this model, subunits RFC2/5/4/3 match the experimental structure well, but there are apparent deviations caused by alignment errors that compound during the iterative process. Any individual subunit of the cryoEM structure can be aligned well to the corresponding model subunit, but the fit of the rest of the model worsens progressively with distance from the aligned subunit. The RFC1 subunit is itself not well predicted. Furthermore, the RFC3–RFC5 pair is predicted with high confidence, although these subunits do not share a direct contact in the experimental structure. AlphaFold2 places RFC3 at the RFC5–RFC4 interface, likely owing to the structural similarity between RFC3 and RFC4.

Encouraged by these examples, we defined an automatic procedure to generate larger models by iterative alignment of pairs (Methods). We first build all possible dimers in a complex, sort them by pDockQ and start from the top-ranked dimer. We then add the highest-ranked dimer that shares one subunit with the current complex, provided that the new subunit does not overlap with the existing model. This step is repeated until the complex is complete or no additional proteins can be added. We tested this on the 20S proteasome, a particularly challenging example with stoichiometries different from 1-to-1 and homologous subunits. The automatic procedure could build a model containing all 14 subunits (half of the proteasome), which are mostly placed in agreement with the experimental model (Fig. 6). However, the order of the chains is incorrect, that is, at each location an incorrect protein is placed. This highlights that AF2 cannot distinguish which two proteins interact within a set of homologous proteins.
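The greedy procedure described above can be sketched as follows. For simplicity, the sketch assumes 1-to-1 stoichiometry, so each chain identifier is placed at most once. The overlap test is left to a caller-supplied function, `clashes`. This is a hypothetical hook that could, for example, superpose the dimer on the shared subunit and check for steric overlap with the chains already placed.

```python
from __future__ import annotations

from typing import Callable


def greedy_assembly(
    dimers: list[tuple[float, str, str]],
    clashes: Callable[[list[str], str, str], bool],
) -> list[str]:
    """Grow an assembly from (pDockQ, chain_a, chain_b) dimers, best-scoring first.

    `clashes(assembly, new_chain, via_chain)` is a hypothetical hook that should return
    True if placing `new_chain` through its dimer with `via_chain` overlaps the assembly.
    """
    if not dimers:
        return []
    ranked = sorted(dimers, key=lambda d: d[0], reverse=True)
    _, first_a, first_b = ranked[0]
    assembly = [first_a, first_b]  # seed with the top-ranked dimer
    grew = True
    while grew:
        grew = False
        for _, a, b in ranked:
            in_a, in_b = a in assembly, b in assembly
            if in_a == in_b:
                continue  # both already placed, or not yet connected to the assembly
            via, new = (a, b) if in_a else (b, a)
            if clashes(assembly, new, via):
                continue
            assembly.append(new)
            grew = True
            break  # rescan from the highest-ranked dimer
    return assembly


toy_dimers = [(0.8, "A", "B"), (0.6, "B", "C"), (0.7, "C", "D"), (0.3, "A", "C"), (0.5, "D", "E")]
print(greedy_assembly(toy_dimers, lambda asm, new, via: False))        # ['A', 'B', 'C', 'D', 'E']
print(greedy_assembly(toy_dimers, lambda asm, new, via: new == "E"))   # ['A', 'B', 'C', 'D']
```

The loop terminates because every accepted dimer adds a chain that was not yet in the assembly. Note that the dimer C–D (pDockQ 0.7) cannot be used until C has been placed through B–C, which is why the ranking alone does not fix the order of addition.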

Two additional complexes for which we could build good models are heterodisulfide reductase from Methanothermococcus thermolithotrophicus (PDB:5ODC) and eukaryotic translation initiation factor 2B from Schizosaccharomyces pombe (PDB:5B04) (Supplementary Fig. 4). For PDB:5ODC, starting from dimers, we could build a complete model of the complex with an r.m.s. deviation of 6.0 Å (TM-score 0.90)³⁰. For PDB:5B04, this was not possible because the chains started to overlap when we tried to build a larger model. However, if we first build trimers and then use all three dimers from these trimers, we can build a complete model with an r.m.s. deviation of 7.3 Å (TM-score 0.86). This shows that it is sometimes necessary to assemble complexes from larger subunits. Results from a follow-up study³¹ show that it is often possible to build the structures of complexes if the subunits are well predicted. In summary, we find that it is possible to iteratively align structures of pairs of interacting proteins to build larger assemblies, but we also identified issues that currently limit this procedure.

Concluding discussion

We have predicted complex structures for pairs of human proteins known to physically interact, drawn from two datasets based on different experimental approaches. The source of the protein interaction data is important and affects the fraction of models that can be confidently predicted. Our analysis suggests that protein interactions supported by a combination of affinity-, co-fractionation- and complementation-based methods result in higher-confidence models. We believe these tend to be high-affinity interactions that are very likely to be direct, stable physical interactions. We show that model metrics (for example, the pDockQ score) can be used to rank models by confidence. This adds a layer of accuracy to large-scale PPI studies and, in the future, can provide additional high-quality targets for detailed studies of stable complexes. Data from crosslinking mass spectrometry experiments provide an ideal resource for further validating these predictions by orthogonal means.

Based on comparisons with solved structures, we suggest that models with pDockQ > 0.5 are 80% likely to be correct. Models with lower scores (pDockQ > 0.23) are still 70% likely to contain many correct contacts and may highlight correct interfaces. Such lower-confidence models are likely to be useful for generating hypotheses and for large-scale analyses of global properties. Equally important is the caveat that high-confidence predictions will still contain errors. In particular, for protein complexes containing paralogous proteins (which are common in higher eukaryotes³²), the current procedure cannot identify the exact pairing of the proteins. Additional methods need to be developed for such cases.

Structural models of protein interfaces are critical for understanding molecular mechanisms and the impact of mutations and post-translational modifications. We illustrate this using disease mutations and phosphorylation data. Although much disease-associated variation lies in noncoding regions of the genome, the growth of exome sequencing in large patient cohorts will uncover many more disease-linked protein mutations, which will require large-scale structural characterization. For both mutations and phosphorylation sites, these analyses should be seen as generating hypotheses for further testing, and we provide this information in the supplementary material to facilitate such future work.

Finally, we show that it is in principle possible to build structural models of larger assemblies from predicted binary complexes. In a follow-up paper we have shown that large assemblies can be built fully automatically from predictions of dimers and trimers³¹. Factors that may limit this approach include structural homology between subunits, unknown subunit stoichiometries and limitations of the predicted interactions³¹. Additional work will be needed to determine exact stoichiometries, to design methods and scoring systems for building such larger assemblies, and to predict proteins that interact only weakly or transiently.

Methods

Protein interaction data and annotations

Human protein pairs known to physically interact were obtained from the hu.MAP dataset, retaining pairwise protein interactions with ≥0.5 confidence, and most interactions from the HuRI dataset. These interactions were further annotated with crosslinked peptides matched across pairs of interacting proteins, disease-related mutations and protein phosphorylation sites in the selected proteins. In addition, all nonhomologous pairs from 12 protein complexes (Supplementary Table 5) and 4,320 protein pairs from 2,102 different protein complexes (Supplementary Table 6) in CORUM²³ were used for additional analyses. A complete list of all datasets is available in the supplementary data.

A subset of crosslink data was collected from refs. 33–43 and filtered for peptides unique to a single protein sequence. A crosslink was considered validated by the structure if the distance between the epsilon amino groups on the side chains of the relevant pair of lysine residues was within 32 Å. Clinical missense variants associated with disease were collected from ClinVar; we selected only those with pathogenic or likely pathogenic effects and mapped them to UniProt protein sequences using VarMap. The final list of mutated positions was then compared with the interface positions. We obtained a list of protein phosphorylation sites with predicted functional relevance²⁶, phosphosite annotations²⁸ and the regulation of phosphorylation sites across a large panel of conditions²⁹. These phosphosites were also mapped to interface positions as defined by the predicted models. All protein interaction networks were processed using the R packages igraph (v.1.2.5) and qgraph (v.1.9), and further graphical editing was done using Cytoscape⁴⁴.
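The crosslink validation rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a model in PDB format, reads the lysine NZ atom (the PDB name for the epsilon amino nitrogen) using the fixed ATOM-record columns, and applies the 32 Å cutoff from the text. The function names `lysine_nz_coords` and `crosslink_satisfied` are hypothetical.

```python
import math


def lysine_nz_coords(pdb_text):
    """Map (chain, residue number) -> NZ coordinates for every lysine in a PDB file."""
    nz = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue name 18-20, chain 22,
        # residue number 23-26, x/y/z 31-54 (1-based, inclusive).
        if (line.startswith("ATOM") and line[17:20] == "LYS"
                and line[12:16].strip() == "NZ"):
            key = (line[21], int(line[22:26]))
            nz[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return nz


def crosslink_satisfied(nz, site_a, site_b, max_dist=32.0):
    """True if the NZ-NZ distance of two crosslinked lysines is within max_dist Å.

    Returns None when either lysine is missing from the model, so that
    unmodelled residues are not counted as violations.
    """
    if site_a not in nz or site_b not in nz:
        return None
    return math.dist(nz[site_a], nz[site_b]) <= max_dist
```

Returning None for lysines absent from the model keeps "not testable" separate from "violated"; how the original analysis treated such cases is not stated here.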

Protein complex prediction

To predict protein complexes of pairwise interactions, we used the FoldDock pipeline¹⁸ based on AlphaFold2 (ref. ¹⁷). We used the option of fused + paired MSAs and ran the model configuration m1-10-1, as this provides the highest success rate together with a 20-fold speed-up. Both the fused and paired MSAs were constructed by running HHblits on every single chain against Uniclust30. The fused MSA was generated by simply concatenating the outputs of the single-chain HHblits runs for the two interacting chains. The paired MSA was constructed from the same single-chain HHblits outputs by combining, for each OX (organism taxonomy) identifier found for both interacting chains, the top hit of each chain.
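The paired-MSA construction described above can be sketched as below. This is a minimal illustration under stated assumptions, not the FoldDock implementation: it assumes each chain's HHblits hits are available as (header, aligned sequence) pairs in rank order, that headers carry a UniProt-style `OX=<NCBI taxonomy id>` field, and that every aligned row has the same number of match columns as its query (as in a3m output). The names `top_hit_per_taxon` and `paired_msa` are hypothetical.

```python
import re

# UniProt-style header field holding the NCBI taxonomy identifier (assumed present).
OX_PATTERN = re.compile(r"\bOX=(\d+)")


def top_hit_per_taxon(hits):
    """hits: (header, aligned_sequence) pairs in HHblits rank order, best first.

    Keeps only the highest-ranked hit for each OX identifier.
    """
    best = {}
    for header, seq in hits:
        match = OX_PATTERN.search(header)
        if match and match.group(1) not in best:
            best[match.group(1)] = seq
    return best


def paired_msa(query_a, hits_a, query_b, hits_b):
    """Rows of a paired MSA: the concatenated queries first, then, for every OX
    present in both chains' hits, the top hit of chain A joined to that of chain B."""
    best_a = top_hit_per_taxon(hits_a)
    best_b = top_hit_per_taxon(hits_b)
    rows = [query_a + query_b]
    rows += [best_a[ox] + best_b[ox] for ox in best_a if ox in best_b]
    return rows
```

For example, if chain A has hits from OX 9606 and 10090 and chain B only from 10090 and 7227, only the 10090 pair survives, giving two rows: the joined queries and the joined top hits from that taxon.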

pDockQ confidence score

To score models, we used features from the predicted complexes to calculate the predicted DockQ score, pDockQ. This score is defined with the following sigmoidal equation:

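$$\mathrm{pDockQ} = \frac{0.707}{1 + e^{-0.03148\,(x - 388.06)}} + 0.03138$$

where

$$x = \overline{\mathrm{plDDT}}_{\mathrm{interface}} \cdot \log\left(N_{\mathrm{contacts}}\right),$$

that is, the average interface plDDT multiplied by the logarithm of the number of interface contacts.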

The parameters were optimized to predict the DockQ score using the dataset from ref. ⁴⁵. The number of interface contacts is defined as elsewhere in this paper (any residues with an interface atom within 10 Å of the other chain), and the plDDT is the predicted lDDT score from AlphaFold2 taken over the interface residues as defined by the interface contacts.
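The pDockQ calculation can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code, and it makes several assumptions not fixed by the text. It uses one representative atom per residue (for example Cβ) instead of "any atom", and it counts contacts as residue pairs across the two chains closer than 10 Å. It takes the natural logarithm, because the base of "log" is not stated (`log_fn` lets you swap it). It returns 0.0 when there is no interface. Per-residue plDDT values can be read from the B-factor column of AlphaFold2 PDB output. The function names are hypothetical.

```python
import numpy as np


def pdockq_from_x(x, L=0.707, x0=388.06, k=0.03148, b=0.03138):
    """Sigmoid mapping x -> pDockQ, with the parameters given in the text."""
    return L / (1.0 + np.exp(-k * (x - x0))) + b


def pdockq(coords_a, coords_b, plddt_a, plddt_b, cutoff=10.0, log_fn=np.log):
    """coords_*: (N, 3) arrays, one representative atom per residue (assumption).
    plddt_*: per-residue plDDT arrays. Contacts are residue pairs across the
    two chains closer than `cutoff` Å."""
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    contacts = d < cutoff
    n_contacts = int(contacts.sum())
    if n_contacts == 0:
        return 0.0  # no interface: treated as zero confidence (assumption)
    # Interface residues are those taking part in at least one contact.
    interface_plddt = np.concatenate([plddt_a[contacts.any(axis=1)],
                                      plddt_b[contacts.any(axis=0)]])
    x = interface_plddt.mean() * log_fn(n_contacts)
    return float(pdockq_from_x(x))
```

The resulting value can then be compared with the thresholds discussed in the Concluding discussion (pDockQ > 0.5 for higher-confidence models, pDockQ > 0.23 for lower-confidence models).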

Building larger complexes from binary protein interactions

We developed a simple procedure to build larger complexes from a set of pairwise (dimer) models. By default, all dimers in the set are ranked by their pDockQ values.

Building starts from a single dimer, by default the dimer with the highest pDockQ value. This is referred to as the ‘complex’.
Each remaining dimer is then tried in turn, starting with the one with the second highest pDockQ. The second chain of a dimer is added to the complex if all of the following steps succeed:
1. Exactly one chain of the dimer is identical to one chain in the complex.
2. The structures of these two chains are similar enough (default TM-score > 0.8).
3. The dimer is rotated so that the two chains overlap.
4. After this rotation, no more than 25% of the residues of the second chain of the dimer clash (Cα-Cα distance < 5 Å) with any chain in the complex.
If a chain is added, the procedure is started over again and repeated until no more chains can be added.
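The assembly loop can be sketched as follows. This is an illustrative sketch, not the code in the authors' repository. Assumptions: chains are identified by a string ID (for example a UniProt accession); identical chains share the same residue ordering, so their Cα arrays correspond row by row; superposition uses the Kabsch algorithm; the TM-score check is approximated by the TM-score evaluated at that Kabsch superposition (a lower bound on what a dedicated TM-score program would report); and the 25% clash limit is applied separately to each chain already in the complex. All function names are hypothetical.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t that superpose the rows of P onto Q (least squares)."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Qc - Pc @ R.T


def tm_at_superposition(P, Q):
    """TM-score of P against Q (same residue order) at the Kabsch superposition."""
    L = len(Q)
    d0 = max(1.24 * max(L - 15, 1) ** (1 / 3) - 1.8, 0.5)
    R, t = kabsch(P, Q)
    d = np.linalg.norm(P @ R.T + t - Q, axis=1)
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))


def max_clash_fraction(new_ca, placed, cutoff=5.0):
    """Largest fraction of new_ca residues within `cutoff` Å of any single placed chain."""
    worst = 0.0
    for ca in placed:
        d = np.linalg.norm(new_ca[:, None, :] - ca[None, :, :], axis=-1)
        worst = max(worst, float((d < cutoff).any(axis=1).mean()))
    return worst


def assemble(dimers, tm_cut=0.8, clash_cut=0.25):
    """dimers: list of (id_a, ca_a, id_b, ca_b, pdockq), ca_* being (N, 3) Cα arrays.
    Returns {chain_id: placed Cα coordinates}."""
    ranked = sorted(dimers, key=lambda dm: dm[4], reverse=True)
    id_a, ca_a, id_b, ca_b, _ = ranked[0]
    complex_ = {id_a: ca_a, id_b: ca_b}  # seed: the highest-pDockQ dimer
    added = True
    while added:
        added = False
        for id_a, ca_a, id_b, ca_b, _ in ranked[1:]:
            pairs = ((id_a, ca_a, id_b, ca_b), (id_b, ca_b, id_a, ca_a))
            shared = [p for p in pairs if p[0] in complex_]
            if len(shared) != 1:  # step 1: exactly one chain already present
                continue
            anchor_id, anchor_ca, new_id, new_ca = shared[0]
            ref = complex_[anchor_id]
            if ref.shape != anchor_ca.shape or tm_at_superposition(anchor_ca, ref) <= tm_cut:
                continue  # step 2: shared chains not similar enough
            R, t = kabsch(anchor_ca, ref)  # step 3: overlap the shared chains
            moved = new_ca @ R.T + t
            if max_clash_fraction(moved, list(complex_.values())) > clash_cut:
                continue  # step 4: too many clashes
            complex_[new_id] = moved
            added = True
            break  # restart from the top of the ranking
    return complex_
```

Because a chain is only added when exactly one chain of the dimer is already present, every addition introduces a new chain ID, so the loop always terminates. One consequence is that this literal reading cannot place a second copy of the same protein.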

Analysis of phosphosites in the protein-protein interfaces

Phosphosite residues in interfaces were identified from a previously published comprehensive list of known human phosphosites²⁶. Kinases associated with phosphorylation of interface residues were obtained from the PhosphoSitePlus database, and over-representation of kinases was assessed using a hypergeometric test.

Highly regulated interface phosphosites were defined as those with a more than twofold change in phosphorylation in more than two perturbation conditions, across a collated phosphoproteomics dataset comprising a range of physiological conditions and drug treatments²⁹. Pearson correlations were calculated among these regulated phosphosites, and clusters of co-regulated phosphosites were identified by hierarchical clustering (‘Ward’ method) of the Euclidean distances of the correlation matrix. Phosphosite clusters were obtained by cutting the dendrogram with the cutree function in R (h = 17). Clusters significantly regulated in each perturbation condition were identified with a Z-test comparing the fold changes of all phosphosites in a cluster against the overall distribution of phosphorylation fold changes in that condition.

Gene Ontology over-representation was tested separately for each cluster using a hypergeometric test in R, with Gene Ontology terms obtained from the c5 category of the Molecular Signatures Database (MSigDB v7.1)⁴⁶. All over-representation analyses were performed using the enricher function of the clusterProfiler package (v.3.12.0)⁶ in R.
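The statistical steps above were run in R. The following is a rough Python/SciPy translation for readers who want to reproduce the logic. It is a sketch under assumptions and not the authors' code. Fold changes are assumed to be log2 values in a sites × conditions array, with missing measurements treated as unchanged when counting regulated conditions. The clustering input is assumed to have no missing values. The Z-test is assumed to compare the cluster mean with the condition-wide mean and standard deviation. SciPy's Ward merge heights correspond to one of R's Ward variants, so a cut height such as h = 17 need not carry over unchanged. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import hypergeom, norm


def highly_regulated(log2fc, min_conditions=3):
    """Sites with a >2-fold change (|log2 FC| > 1) in more than two conditions.
    log2fc: (n_sites, n_conditions); NaN (not measured) counts as unchanged."""
    changed = np.abs(np.nan_to_num(log2fc)) > 1.0
    return changed.sum(axis=1) >= min_conditions


def coregulated_clusters(log2fc, cut_height):
    """Ward clustering of Euclidean distances between rows of the site-by-site
    Pearson correlation matrix, cut at a fixed height (cf. cutree(h = ...) in R)."""
    corr = np.corrcoef(log2fc)           # rows of log2fc are phosphosites
    tree = linkage(corr, method="ward")  # rows of corr are treated as vectors
    return fcluster(tree, t=cut_height, criterion="distance")


def cluster_z_test(cluster_fc, condition_fc):
    """Two-sided Z-test of a cluster's mean fold change in one condition against
    the mean and SD of all phosphosite fold changes in that condition."""
    mu, sd = np.nanmean(condition_fc), np.nanstd(condition_fc)
    x = cluster_fc[~np.isnan(cluster_fc)]
    z = (x.mean() - mu) / (sd / np.sqrt(len(x)))
    return z, 2.0 * norm.sf(abs(z))


def overrepresentation_p(k, n_background, k_background, n_selected):
    """One-sided hypergeometric P(X >= k): k of the n_selected genes carry a term
    that k_background of the n_background genes carry."""
    return hypergeom.sf(k - 1, n_background, k_background, n_selected)
```

With toy data in which three sites share one fold-change pattern and three share its mirror image, `coregulated_clusters` with a small cut height returns two clusters that separate the two groups.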

Comparison with other databases

All proteins used here were mapped to UniProt⁴⁷ to retrieve subcellular localization, to STRING⁴⁸ for coexpression and other interaction data, and to GTEx⁴⁹ for tissue-specific expression.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41594-022-00910-8.

Supplementary information

Supplementary Information (5.2 MB, pdf): Supplementary Figs. 1–4.

Reporting Summary (5 MB, pdf)

Peer Review File (1.2 MB, pdf)

Supplementary Tables (13.8 MB, xlsx): Supplementary Tables 1–6.

Supplementary Data 1 (5.2 MB, zip): Supplementary Data (Cytoscape session) for Supplementary Table 1.

Acknowledgements

R.A.S. acknowledges funding through the European Union Horizon 2020 program INFRAIA project Epic-XS (project no. 823839) and the research program NWO TA with project no. 741.018.201, which is partly financed by the Dutch Research Council (NWO). A.E. was funded by the Vetenskapsrådet (grant nos. 2016-03798 and 2021-03979) and the Knut and Alice Wallenberg Foundation. The computations and data handling were enabled by the supercomputing resource Berzelius, provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg Foundation, and by SNIC, grant nos. SNIC 2021/5-297 and Berzelius-2021-29. J.E.B. acknowledges funding from the National Heart, Lung, and Blood Institute (grant no. 5R01HL144778) and the National Institute of General Medical Sciences (grant no. 5R35GM13625). P. Beltrao is supported by the Helmut Horten Stiftung and the ETH Zurich Foundation.

Source data

Source Data Fig. 1 (4.5 MB, zip)

Source Data Fig. 2 (13.5 MB, zip)

Source Data Fig. 3 (18.8 KB, csv)

Source Data Fig. 4 (144 KB, csv)

Source Data Fig. 5 (66.5 KB, zip)

Author contributions

D.F.B. analyzed disease-causing mutations in interfaces, phosphosites and crosslinking with help from I.B.-H., D.M., A.S.D. and P. Beltrao. P. Bryant ran the prediction for the HuRI dataset. P.A., A.K., R.A.S., J.E.B. and A.L. provided data for the analysis. A.E. provided the prediction for the hu.MAP dataset and analyzed the complex structural features with help from P.K., G.P., A.S., P. Bryant and W.Z. P. Beltrao and A.E. wrote the manuscript with help from all authors.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Shoshana Wodak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Sara Osman, in collaboration with the Nature Structural & Molecular Biology team. Peer reviewer reports are available.

Funding

Open access funding provided by Stockholm University.

Data availability

All datasets and meta-data are available from 10.17044/scilifelab.16866202.v1. Further, all models generated as well as some of the multiple sequence alignments can be found at https://archive.bioinfo.se/huintaf2/. Source data are provided with this paper.

Code availability

All code used in this project can be found at https://gitlab.com/ElofssonLab/huintaf2/. Tools to run AlphaFold2 for combined folding and docking can be found at https://gitlab.com/ElofssonLab/FoldDock/.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: David F. Burke, Patrick Bryant, Inigo Barrio-Hernandez, Danish Memon and Gabriele Pozzati.

Contributor Information

Petras Kundrotas, Email: pkundro@ku.edu.

Pedro Beltrao, Email: pbeltrao@ebi.ac.uk.

Arne Elofsson, Email: arne@bioinfo.se.

Supplementary information

The online version contains supplementary material available at 10.1038/s41594-022-00910-8.

References

1. Orchard S, et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42:D358–D363. doi: 10.1093/nar/gkt1115.
2. Luck K, et al. A reference map of the human binary protein interactome. Nature. 2020;580:402–408. doi: 10.1038/s41586-020-2188-x.
3. Drew K, Wallingford JB, Marcotte EM. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 2021;17:e10016. doi: 10.15252/msb.202010016.
4. Mosca R, Céol A, Aloy P. Interactome3D: adding structural details to protein networks. Nat. Methods. 2012;10:47–53. doi: 10.1038/nmeth.2289.
5. Burley SK, et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2021;49:D437–D451. doi: 10.1093/nar/gkaa1038.
6. Wang X, et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 2012;30:159–164. doi: 10.1038/nbt.2106.
7. Kamburov A, et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA. 2015;112:E5486–E5495. doi: 10.1073/pnas.1516373112.
8. Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A. A pan-cancer catalogue of cancer driver protein interaction interfaces. PLoS Comput. Biol. 2015;11:e1004518. doi: 10.1371/journal.pcbi.1004518.
9. Beltrao P, et al. Systematic functional prioritization of protein posttranslational modifications. Cell. 2012;150:413–425. doi: 10.1016/j.cell.2012.05.036.
10. Nishi H, Hashimoto K, Panchenko AR. Phosphorylation in protein-protein binding: effect on stability and function. Structure. 2011;19:1807–1815. doi: 10.1016/j.str.2011.09.021.
11. Šoštarić N, et al. Effects of acetylation and phosphorylation on subunit interactions in three large eukaryotic complexes. Mol. Cell. Proteom. 2018;17:2387–2401. doi: 10.1074/mcp.RA118.000892.
12. Betts MJ, et al. Systematic identification of phosphorylation-mediated protein interaction switches. PLoS Comput. Biol. 2017;13:e1005462. doi: 10.1371/journal.pcbi.1005462.
13. Zhang QC, et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012;490:556–560. doi: 10.1038/nature11503.
14. Mosca R, Céol A, Stein A, Olivella R, Aloy P. 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2014;42:D374–D379. doi: 10.1093/nar/gkt887.
15. Cong Q, Anishchenko I, Ovchinnikov S, Baker D. Protein interaction networks revealed by proteome coevolution. Science. 2019;365:185–189. doi: 10.1126/science.aaw6718.
16. Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373:871–876. doi: 10.1126/science.abj8754.
17. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
18. Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 2022;13:1265. doi: 10.1038/s41467-022-28865-w.
19. Pozzati G, et al. Limits and potential of combined folding and docking using PconsDock. Bioinformatics. 2022;38:954–961.
20. Akdel M, et al. A structural biology community assessment of AlphaFold 2 applications. Nat. Struct. Mol. Biol. 2022;29:1056–1067.
21. Evans R, et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv. 2021. doi: 10.1101/2021.10.04.463034.
22. Humphreys IR, et al. Computed structures of core eukaryotic protein complexes. Science. 2021;374:eabm4805.
23. Giurgiu M, et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 2019;47:D559–D563. doi: 10.1093/nar/gky973.
24. IMEx Consortium Curators, et al. Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set. Nat. Commun. 2019;10:10. doi: 10.1038/s41467-018-07709-6.
25. Delgado J, Radusky LG, Cianferoni D, Serrano L. FoldX 5.0: working with RNA, small molecules and a new graphical interface. Bioinformatics. 2019;35:4168–4169. doi: 10.1093/bioinformatics/btz184.
26. Ochoa D, et al. The functional landscape of the human phosphoproteome. Nat. Biotechnol. 2020;38:365–373. doi: 10.1038/s41587-019-0344-3.
27. Lawrence RT, Searle BC, Llovet A, Villén J. Plug-and-play analysis of the human phosphoproteome by targeted high-resolution mass spectrometry. Nat. Methods. 2016;13:431–434. doi: 10.1038/nmeth.3811.
28. Hornbeck PV, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512–D520. doi: 10.1093/nar/gku1267.
29. Ochoa D, et al. An atlas of human kinase regulation. Mol. Syst. Biol. 2016;12:888. doi: 10.15252/msb.20167295.
30. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004;57:702–710. doi: 10.1002/prot.20264.
31. Bryant P, et al. Predicting the structure of large protein complexes using AlphaFold and sequential assembly. Nat. Commun. 2022;13:6027. doi: 10.1038/s41467-022-33729-4.
32. Marchant A, et al. The role of structural pleiotropy and regulatory evolution in the retention of heteromers of paralogs. eLife. 2019;8:e46754. doi: 10.7554/eLife.46754.
33. Yugandhar K, et al. MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity. Mol. Cell. Proteom. 2020;19:554–568. doi: 10.1074/mcp.TIR119.001847.
34. Schweppe DK, et al. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data. Bioinformatics. 2016;32:2716–2718. doi: 10.1093/bioinformatics/btw232.
35. Klykov O, van der Zwaan C, Heck AJR, Meijer AB, Scheltema RA. Missing regions within the molecular architecture of human fibrin clots structurally resolved by XL-MS and integrative structural modeling. Proc. Natl Acad. Sci. USA. 2020;117:1976–1987. doi: 10.1073/pnas.1911785117.
36. Steigenberger B, Pieters RJ, Heck AJR, Scheltema RA. PhoX: an IMAC-enrichable cross-linking reagent. ACS Cent. Sci. 2019;5:1514–1522. doi: 10.1021/acscentsci.9b00416.
37. Klykov O, et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 2018;13:2964–2990. doi: 10.1038/s41596-018-0074-x.
38. Fasci D, van Ingen H, Scheltema RA, Heck AJR. Histone interaction landscapes visualized by crosslinking mass spectrometry in intact cell nuclei. Mol. Cell. Proteom. 2018;17:2018–2033. doi: 10.1074/mcp.RA118.000924.
39. Eliseev B, et al. Structure of a human cap-dependent 48S translation pre-initiation complex. Nucleic Acids Res. 2018;46:2678–2689. doi: 10.1093/nar/gky054.
40. Gestaut D, et al. The chaperonin TRiC/CCT associates with prefoldin through a conserved electrostatic interface essential for cellular proteostasis. Cell. 2019;177:751–765.e15. doi: 10.1016/j.cell.2019.03.012.
41. Klatt F, et al. A precisely positioned MED12 activation helix stimulates CDK8 kinase activity. Proc. Natl Acad. Sci. USA. 2020;117:2894–2905. doi: 10.1073/pnas.1917635117.
42. Sabath K, et al. INTS10-INTS13-INTS14 form a functional module of Integrator that binds nucleic acids and the cleavage module. Nat. Commun. 2020;11:3422. doi: 10.1038/s41467-020-17232-2.
43. Mohamed WI, et al. The human GID complex engages two independent modules for substrate recruitment. EMBO Rep. 2021;22:e52981. doi: 10.15252/embr.202152981.
44. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
45. Green AG, et al. Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences. Nat. Commun. 2021;12:1396. doi: 10.1038/s41467-021-21636-z.
46. Subramaniam V, Vincent IR, Jothy S. Upregulation and dephosphorylation of cofilin: modulation by CD44 variant isoform in human colon cancer cells. Exp. Mol. Pathol. 2005;79:187–193. doi: 10.1016/j.yexmp.2005.08.004.
47. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46:2699.
48. Szklarczyk D, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47:D607–D613. doi: 10.1093/nar/gky1131.
49. Aguet F, et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(5.2MB, pdf)}

Supplementary Figs. 1–4.

Reporting Summary^{(5MB, pdf)}

Peer Review File^{(1.2MB, pdf)}

Supplementary Tables^{(13.8MB, xlsx)}

Supplementary Tables 1–6.

Supplementary Data 1^{(5.2MB, zip)}

Supplementary Data (cytoscape session) to Supplementary Table 1.

Data Availability Statement

[CR1] 1.Orchard S, et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42:D358–D363. doi: 10.1093/nar/gkt1115. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Luck K, et al. A reference map of the human binary protein interactome. Nature. 2020;580:402–408. doi: 10.1038/s41586-020-2188-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Drew K, Wallingford JB, Marcotte EM. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 2021;17:e10016. doi: 10.15252/msb.202010016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Mosca R, Céol A, Aloy P. Interactome3D: adding structural details to protein networks. Nat. Methods. 2012;10:47–53. doi: 10.1038/nmeth.2289. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Burley SK, et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2021;49:D437–D451. doi: 10.1093/nar/gkaa1038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Wang X, et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 2012;30:159–164. doi: 10.1038/nbt.2106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Kamburov A, et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA. 2015;112:E5486–E5495. doi: 10.1073/pnas.1516373112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A. A pan-cancer catalogue of cancer driver protein interaction interfaces. PLoS Comput. Biol. 2015;11:e1004518. doi: 10.1371/journal.pcbi.1004518. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Beltrao P, et al. Systematic functional prioritization of protein posttranslational modifications. Cell. 2012;150:413–425. doi: 10.1016/j.cell.2012.05.036. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Nishi H, Hashimoto K, Panchenko AR. Phosphorylation in protein-protein binding: effect on stability and function. Structure. 2011;19:1807–1815. doi: 10.1016/j.str.2011.09.021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Šoštarić N, et al. Effects of acetylation and phosphorylation on subunit interactions in three large eukaryotic complexes. Mol. Cell. Proteom. 2018;17:2387–2401. doi: 10.1074/mcp.RA118.000892. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Betts MJ, et al. Systematic identification of phosphorylation-mediated protein interaction switches. PLoS Comput. Biol. 2017;13:e1005462. doi: 10.1371/journal.pcbi.1005462. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Zhang QC, et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012;490:556–560. doi: 10.1038/nature11503. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Mosca R, Céol A, Stein A, Olivella R, Aloy P. 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2014;42:D374–D379. doi: 10.1093/nar/gkt887. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Cong Q, Anishchenko I, Ovchinnikov S, Baker D. Protein interaction networks revealed by proteome coevolution. Science. 2019;365:185–189. doi: 10.1126/science.aaw6718. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373:871–876. doi: 10.1126/science.abj8754. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 2022;13:1265. doi: 10.1038/s41467-022-28865-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Pozzati, G. et al. Limits and potential of combined folding and docking using PconsDock. Bioinformatics38, 954–961 (2022). [DOI] [PMC free article] [PubMed]

[CR20] 20.Akdel, M. et al. A structural biology community assessment of AlphaFold 2 applications. Nat. Struct. Mol. Biol.29, 1056–1067 (2022). [DOI] [PMC free article] [PubMed]

[CR21] 21.Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv10.1101/2021.10.04.463034 (2021).

[CR22] 22.Humphreys, I. R. et al. Computed structures of core eukaryotic protein complexes. Science374, eabm4805 (2021). [DOI] [PMC free article] [PubMed]

[CR23] 23.Giurgiu M, et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 2019;47:D559–D563. doi: 10.1093/nar/gky973. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.IMEx Consortium Curators et al. Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set. Nat. Commun. 2019;10:10. doi: 10.1038/s41467-018-07709-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Delgado J, Radusky LG, Cianferoni D, Serrano L. FoldX 5.0: working with RNA, small molecules and a new graphical interface. Bioinformatics. 2019;35:4168–4169. doi: 10.1093/bioinformatics/btz184. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Ochoa D, et al. The functional landscape of the human phosphoproteome. Nat. Biotechnol. 2020;38:365–373. doi: 10.1038/s41587-019-0344-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Lawrence RT, Searle BC, Llovet A, Villén J. Plug-and-play analysis of the human phosphoproteome by targeted high-resolution mass spectrometry. Nat. Methods. 2016;13:431–434. doi: 10.1038/nmeth.3811. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Hornbeck PV, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512–D520. doi: 10.1093/nar/gku1267. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] 29.Ochoa D, et al. An atlas of human kinase regulation. Mol. Syst. Biol. 2016;12:888. doi: 10.15252/msb.20167295. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004;57:702–710. doi: 10.1002/prot.20264. [DOI] [PubMed] [Google Scholar]

[CR31] 31.Bryant P, et al. Predicting the structure of large protein complexes using AlphaFold and sequential assembly. Nat. Commun. 2022;13:6027. doi: 10.1038/s41467-022-33729-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Marchant A, et al. The role of structural pleiotropy and regulatory evolution in the retention of heteromers of paralogs. eLife. 2019;8:e46754. doi: 10.7554/eLife.46754. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Yugandhar K, et al. MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity. Mol. Cell. Proteom. 2020;19:554–568. doi: 10.1074/mcp.TIR119.001847. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Schweppe DK, et al. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data. Bioinformatics. 2016;32:2716–2718. doi: 10.1093/bioinformatics/btw232. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Klykov O, van der Zwaan C, Heck AJR, Meijer AB, Scheltema RA. Missing regions within the molecular architecture of human fibrin clots structurally resolved by XL-MS and integrative structural modeling. Proc. Natl Acad. Sci. USA. 2020;117:1976–1987. doi: 10.1073/pnas.1911785117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Steigenberger B, Pieters RJ, Heck AJR, Scheltema RA. PhoX: an IMAC-enrichable cross-linking reagent. ACS Cent. Sci. 2019;5:1514–1522. doi: 10.1021/acscentsci.9b00416. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Klykov O, et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 2018;13:2964–2990. doi: 10.1038/s41596-018-0074-x. [DOI] [PubMed] [Google Scholar]

38. Fasci D, van Ingen H, Scheltema RA, Heck AJR. Histone interaction landscapes visualized by crosslinking mass spectrometry in intact cell nuclei. Mol. Cell. Proteom. 2018;17:2018–2033. doi: 10.1074/mcp.RA118.000924.

39. Eliseev B, et al. Structure of a human cap-dependent 48S translation pre-initiation complex. Nucleic Acids Res. 2018;46:2678–2689. doi: 10.1093/nar/gky054.

40. Gestaut D, et al. The chaperonin TRiC/CCT associates with prefoldin through a conserved electrostatic interface essential for cellular proteostasis. Cell. 2019;177:751–765.e15. doi: 10.1016/j.cell.2019.03.012.

41. Klatt F, et al. A precisely positioned MED12 activation helix stimulates CDK8 kinase activity. Proc. Natl Acad. Sci. USA. 2020;117:2894–2905. doi: 10.1073/pnas.1917635117.

42. Sabath K, et al. INTS10-INTS13-INTS14 form a functional module of Integrator that binds nucleic acids and the cleavage module. Nat. Commun. 2020;11:3422. doi: 10.1038/s41467-020-17232-2.

43. Mohamed WI, et al. The human GID complex engages two independent modules for substrate recruitment. EMBO Rep. 2021;22:e52981. doi: 10.15252/embr.202152981.

44. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.

45. Green AG, et al. Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences. Nat. Commun. 2021;12:1396. doi: 10.1038/s41467-021-21636-z.

46. Subramaniam V, Vincent IR, Jothy S. Upregulation and dephosphorylation of cofilin: modulation by CD44 variant isoform in human colon cancer cells. Exp. Mol. Pathol. 2005;79:187–193. doi: 10.1016/j.yexmp.2005.08.004.

47. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46:2699.

48. Szklarczyk D, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47:D607–D613. doi: 10.1093/nar/gky1131.

49. Aguet F, et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330.
