Abstract
Protein–protein interaction experiments still yield many false positive interactions. The socioaffinity metric can distinguish true protein–protein interactions from noise using available data. Here, we present WeSA (Weighted SocioAffinity), which uses large interaction proteomics datasets (IntAct, BioGRID, BioPlex) to score human protein interactions and, in a statistically robust way, flag those (even from a single experiment) that are likely to be false positives. ROC analysis (using CORUM-PDB positives and Negatome negatives) shows that WeSA improves over other measures of interaction confidence. WeSA performs consistently well across all datasets (up to AUC = 0.93 and, at the best threshold, TPR = 0.84, FPR = 0.11, Precision = 0.98). WeSA is freely available without login (wesa.russelllab.org). Users can either retrieve available information for a list of proteins of interest or calculate scores for new experiments. The server outputs pre-computed or updated WeSA scores for the input, enriched with information from databases. The summary is presented as a table and a network-based visualization that allows the user to remove the nodes/edges that the method considers spurious.
Graphical Abstract
Introduction
Affinity proteomics is extremely effective at suggesting interacting partners for a protein of interest. Methods vary, but in general a copy of the protein of interest (the bait) is genetically fused to other proteins that, usually when over-expressed, enable its purification and, under the right conditions, co-purification of interaction partners (the preys). An early popular method was tandem-affinity purification (TAP) (1), shown in several organisms to be able to detect complexes, though less adept at finding more transient interactors. Today scientists tend to use less stringent purification techniques that provide more interactors, at the cost of a much greater proportion of noise (2–4).
An issue with affinity proteomics is that experiments tend to identify a great many prey proteins for any bait, and many of these are clearly not true interactors, likely arising from over-expression of the bait and from non-specific interactions. To improve interaction sets derived from affinity proteomics, socioaffinity (5) and related methods (e.g. (6)) use the dataset itself to score every observed interaction within an affinity proteomics dataset. When applied to yeast (5), bacterial (7) or human (8) affinity proteomics data, the method ranked true interactions highly while giving low ranks to noise, 'sticky' proteins (9) or contaminants (10). With sufficient data, the method can sometimes even resolve sub-complexes, such as the IFT-B1 and -B2 complexes, first proposed by applying the method to a ciliary affinity proteomics dataset (8).
Today, many scientists have access to affinity proteomics and often obtain data for one or a handful of baits. Socioaffinity-like methods do not work on a single sample, as they require a larger dataset to obtain any statistical power. It would nevertheless be useful to compute similar scores from such a limited dataset. Accordingly, we present here WeSA (Weighted SocioAffinity), a tool designed to aid scientists performing human affinity proteomics studies, particularly when only a handful of bait/prey pairs have been determined. We use a slightly adjusted version of the socioaffinity metric previously applied in affinity proteomics studies in several species. The tool scores and ranks interactions identified in new affinity proteomics experiments by exploiting the large protein–protein interaction datasets from high- and low-throughput studies now stored in databases (e.g. (11–13)). For one or more bait/prey result sets, the tool integrates these data in the context of all previously observed data. This down-weights frequently occurring prey proteins and up-weights those likely to be true interactors with the baits of interest. Beyond the initial data processing, this analysis does not remove common contaminants, nor require users to remove them, as these are naturally down-weighted.
WeSA uses the ratio of observed to expected co-occurrences, which can reduce the need for controls or replicates. Nevertheless, replicates can still be used, as every observation adds further evidence and makes WeSA scores more accurate. This sets the method apart from other statistical techniques that operate on a single dataset (e.g. (14,15)). Moreover, using only information from the experimental precipitate avoids further biases, whose inaccuracy scales rapidly with added model complexity (e.g. (16)). Methods that rely on well-curated details can form part of further analysis, while WeSA can save time by shortening the list of potential interactions and avoiding biases such as dependencies between the target and control, or between the target and the CRAPome (10).
The scores potentially provide a less biased alternative to regular case-control experiments (e.g. (4)), as the concept of a control experiment is effectively replaced by the large dataset of previous observations. Heterogeneity between datasets and methods might be expected to require method-specific adaptations, such as fitting parameters or incorporating an estimated accuracy for each individual study (6). We find that our scores perform surprisingly well in combining datasets even without any dataset-specific corrections (5,17).
Materials and methods
Implementation
WeSA is built in Python 3 and is available as a Flask application. The interactive network display is created with the Cytoscape.js graph library (18). All code can be accessed at: https://github.com/russelllab/wesa.
Data sources and preprocessing
We only include human protein interactions. Data sizes for the three sources are shown in Figure 1B. From human interactions, we also exclude direct pairwise studies of interactions, such as yeast two-hybrid and protein complementation assays. This is because many such interactions, while real, are random (non-specific) with respect to function (19) and carry no information about protein complexes as a whole. Since the idea of the WeSA score described below is to separate interactions by their specificity, random pairwise testing is of little use. We leave splicing unchanged in this context, but different splice forms can be treated as equivalent if needed.
Figure 1.
(A) A schema linking affinity purification experiments to WeSA. The former yield a list of proteins in the precipitate with no information on the true configuration, which may be as in the top left: there is no direct connection between the bait and prey C, and prey B is a contaminant. The results can be recorded using one of two models: spoke (the bait protein is linked to everything) or matrix (all retrieved proteins are linked to each other). WeSA combines all this information in the score. (B) Number of records obtained from each data source as used in building WeSA. The information from each source is recorded using a spoke model. For BioPlex, the study identifier is substituted by the number of experiments in the dataset. (C) ROC and precision-recall curves for WeSA scores using each of the three datasets as well as their combinations. The point corresponding to the optimal threshold for each dataset is marked.
IntAct
The filtered IntAct data contain 291.6 thousand pairs of human protein–protein interactions from affinity-purification-like experiments (11). Since IntAct records use a spoke model (i.e. every row contains a tagged bait protein and one of the proteins retrieved in the experiment; Figure 1A, top right), a matrix expansion (i.e. pairs of proteins observed together in the precipitate of the same bait, as in Figure 1A, bottom left) resulted in >14 million pairs, whose marginal data provide more context for the WeSA score.
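The spoke-to-matrix expansion can be sketched as follows. This is a minimal illustration of the idea described above; the function and variable names are ours, not those of the WeSA codebase:

```python
from collections import defaultdict
from itertools import combinations

def matrix_expand(spoke_records):
    """Expand spoke records (experiment, bait, prey) into matrix pairs:
    all unordered pairs of preys co-purified in the same experiment."""
    preys_by_experiment = defaultdict(set)
    for experiment, bait, prey in spoke_records:
        preys_by_experiment[experiment].add(prey)
    matrix_pairs = set()
    for preys in preys_by_experiment.values():
        # every unordered pair of co-retrieved preys becomes a matrix pair
        for a, b in combinations(sorted(preys), 2):
            matrix_pairs.add((a, b))
    return matrix_pairs

records = [("e1", "BAIT", "A"), ("e1", "BAIT", "B"), ("e1", "BAIT", "C")]
print(matrix_expand(records))  # {('A', 'B'), ('A', 'C'), ('B', 'C')}
```

Three spoke rows under one bait thus yield three matrix pairs, which is why the expansion grows the 291.6 thousand spoke records to many millions of pairs.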
BioPlex
While we include information from IntAct, a criticism of the interactions recorded there is their bias towards overly stable or high-affinity interactions (20). AP methods used until recently, which predominate in IntAct and BioGRID, aimed to wash away artifacts and so tended to remove weaker interaction partners. Most proteins have between 2 and 100 partners, which is indeed the case for most records in IntAct, but BioPlex (13) aims to capture more than just the few stably attached preys. That dataset is part of an effort to understand transient relations by shifting experiments towards less washing and larger prey spectra. To our knowledge, the BioPlex Interactome is the largest study of the human interactome in which only a single tag (and thus a single washing stage) is used; filtering is thus significantly reduced, allowing the observation of weaker interactions.
BioGRID
Compared to the above databases, BioGRID (12) is less filtered and contains more data than IntAct. There is no complete curation of the Homo sapiens data, so the landscape still contains a lot of noise. On the other hand, BioGRID records fewer interactions per experiment than BioPlex. The full database covers many species and experiment types, but we have filtered it for relevance.
Scoring and testing
Socioaffinity is implemented exactly as in the original paper (5). Weighted socioaffinity (WeSA) uses the same components multiplied by weights, so that:

$$\mathrm{WeSA}_{ij} = n_{j|i}\,S_{j|i} + n_{i|j}\,S_{i|j} + \tilde{n}_{ij}\,M_{ij}$$

where $n_{j|i}$ is the observed number of retrievals of $j$ when $i$ is the experiment bait, $n_{ij}$ is the observed number of retrievals of $i$ and $j$ simultaneously with a third bait, and $\tilde{n}_{ij} = n_{ij}\,\bar{n}_{\mathrm{spoke}}/\bar{n}_{\mathrm{matrix}}$ is the scaled observed number of joint retrievals, where $\bar{n}$ denotes the respective sample averages. The terms $S_{j|i}$, $S_{i|j}$ and $M_{ij}$ are the same as defined in (5). The higher the WeSA score, the greater the likelihood that the interaction is real, as it is observed more times than expected. Conversely, low scores mean fewer occurrences than expected and a possible lack of biological specificity.
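The weighted combination can be sketched as below. This is an illustrative reading of the score, assuming the matrix count is rescaled by the ratio of the spoke and matrix sample averages; the S and M socioaffinity terms are taken as given, since they are defined in (5):

```python
def wesa(n_j_given_i, s_j_given_i, n_i_given_j, s_i_given_j,
         n_matrix, mean_n_spoke, mean_n_matrix, m_ij):
    """Sketch of the weighted socioaffinity combination (assumed form):
    each socioaffinity component is multiplied by its observation count,
    and the matrix count is rescaled so spoke and matrix evidence are
    on a comparable footing. Argument names are ours, for illustration."""
    n_matrix_scaled = n_matrix * mean_n_spoke / mean_n_matrix
    return (n_j_given_i * s_j_given_i      # spoke: j retrieved with bait i
            + n_i_given_j * s_i_given_j    # spoke: i retrieved with bait j
            + n_matrix_scaled * m_ij)      # matrix: joint retrievals

# 3 and 2 spoke observations, 4 matrix observations, illustrative S/M values
print(wesa(3, 1.0, 2, 0.5, 4, 2.0, 4.0, 0.25))  # 4.5
```

The effect is that well-observed terms dominate the sum, while terms supported by little evidence contribute correspondingly little.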
In the resulting table output of the server, both WeSA and SA are presented. The multiplication factors (weights) that WeSA incorporates re-evaluate the contribution of each component of the score. More observations in a category, spoke and/or matrix, increase the weight of that term, while terms with few pieces of supporting evidence are weighted down.
Socioaffinity has previously been shown to be a reliable metric, often outperforming methods of similar design (5,6,17). We also tested its weighted version, the WeSA score, against its predecessor, the socioaffinity score; while both are implemented online, our metrics show that WeSA performs better in most cases (Table 1). To construct an independent test set of positives, we took the full list of human protein complexes from CORUM (21) and restricted it to interactions with 3D structures in the Protein Data Bank (PDB) (22). Pairs of proteins with at least three amino acids in the interface were considered, and additionally validated using information from pairwise studies (Y2H, PCA) in IntAct. For negatives, we enriched the Negatome (23) with pairs of proteins that were tested in pairwise studies but never observed to interact (24), excluding any interactions identified in the PDB or in pairwise IntAct studies. This resulted in 4493 positives and 8468 negatives. The performance of WeSA on each dataset is shown in Figure 1C and Table 1.
Table 1.
Statistics from ROC analysis for all datasets and scores calculated for pairs with at least three direct observations (spoke evidence) from AP experiments (WeSA/SA)
| Data | AUC | TPR | FPR | Precision | Negatives | Positives |
|---|---|---|---|---|---|---|
| IntAct | 0.873/0.804 | 0.778/0.767 | 0.126/0.191 | 0.988/0.981 | 230 | 1609 |
| BioPlex | 0.826/0.803 | 0.782/0.687 | 0.254/0.205 | 0.904/0.911 | 697 | 1635 |
| BioGRID | 0.914/0.849 | 0.837/0.832 | 0.126/0.184 | 0.98/0.972 | 523 | 2307 |
| IntAct & BioPlex | 0.852/0.848 | 0.815/0.765 | 0.248/0.182 | 0.932/0.946 | 848 | 2469 |
| IntAct & BioGRID | 0.925/0.861 | 0.84/0.853 | 0.106/0.196 | 0.981/0.967 | 606 | 2442 |
| BioPlex & BioGRID | 0.916/0.857 | 0.868/0.81 | 0.165/0.212 | 0.951/0.934 | 1099 | 2938 |
| IntAct & BioPlex & BioGRID | 0.922/0.861 | 0.857/0.822 | 0.148/0.212 | 0.954/0.932 | 1159 | 3048 |
Performance metrics are given for the optimal threshold for each dataset.
Researchers should avoid submitting experiments that have clearly gone wrong, even though the score is fairly robust to single observations. Normally, adding a single piece of spoke evidence will not change the score noticeably, but this is especially important to keep in mind near thresholds and when comparing scores that are close to each other. In an extreme hypothetical experiment in which we test a bait that has been observed only twice before and retrieve 20 000 prey proteins, simply submitting those results enriched with BioGRID data may increase related WeSA scores by up to 4.6.
In general, however, the many interactions from a single experiment 'spread' their effect, i.e. they have less impact on any single score, and the change in WeSA score is larger for smaller experiments. The raw observation counts are another relevant factor, which is why they are also shown in the results table. If a bait has only been seen once, seeing it a second time (all else being equal) will decrease the scores by 0.7.
Errors can accumulate, but their impact decreases with every experiment that does not contain the erroneous pair. WeSA (and indeed all socioaffinity-based methods) works under the assumption that 'wrong' interactions do not repeat as often as true ones, meaning they tend to average out.
Optimal threshold
After plotting the ROC curves (Figure 1C), we also determine the optimal threshold, defined as the cutoff giving the 'best balance' between TPR and FPR. There are multiple popular ways to calculate this threshold (25). We tested three of the most popular: the Youden index (26), the concordance method (27) and the closest-to-(0,1) method (28). In our implementations they are all close to overlapping, so for simplicity we present only the closest-to-(0,1)-defined threshold. This threshold corresponds to the point on the ROC curve closest to the top left corner of the graph, i.e. the point with coordinates (0,1). That is, the threshold is the value minimizing the distance $\sqrt{(1-\mathrm{TPR})^2 + \mathrm{FPR}^2}$. Statistics for the optimal thresholds for each dataset are presented in Table 1.
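The closest-to-(0,1) selection is straightforward to sketch: given (threshold, FPR, TPR) points read off a ROC curve, pick the one minimizing the Euclidean distance to the perfect-classifier corner. The example points below are made up for illustration:

```python
import math

def closest_to_01_threshold(roc_points):
    """Pick the cutoff whose ROC point (FPR, TPR) lies closest to (0, 1)."""
    def distance(point):
        threshold, fpr, tpr = point
        return math.hypot(fpr, 1.0 - tpr)
    return min(roc_points, key=distance)

# (threshold, FPR, TPR) triples, e.g. read off a computed ROC curve
points = [(0.1, 0.40, 0.95), (0.5, 0.15, 0.85), (0.9, 0.02, 0.40)]
print(closest_to_01_threshold(points))  # (0.5, 0.15, 0.85)
```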
Results and discussion
WeSA is available at wesa.russelllab.org. The web tool is free and open to all users without any login requirement.
WeSA: home page and input
The server allows users to integrate one or more human bait/prey datasets derived from affinity proteomics results with the pre-processed data from IntAct (11), BioGRID (12), BioPlex (13) or combinations of these. As we make use of published datasets, the presumption is that the user has done a similar kind of pre-processing of the data. This can include imposing filters on peptide coverage (8), other mass-spectrometry quality metrics, or differential identification relative to a control bait (4). This typically leaves at most a few hundred interactors for any given bait.
WeSA offers two main functionalities or query types
Submission of new data for scoring
Submissions of this type have two or three space-separated columns, recording the bait and prey proteins and, optionally, the experiment identifier; the identifier can be omitted when results from a single experiment are submitted. The algorithm merges the input data with the chosen background database and completes the WeSA calculations. The returned output contains scores based on the selected dataset, updated with the new information. This is helpful for re-ranking results from big and noisy experiments and focusing validation on the most likely interacting pairs.
Retrieving all known information about a protein or a protein list
This type of submission has a single column listing all proteins for which information is desired (possibly a single protein). The algorithm searches the selected pre-recorded dataset and outputs only pairs involving the queried proteins. This is useful for obtaining an unbiased overview of already published data, with scores that help rank the most likely interactors.
When submitting a query, users can paste their data in the submission box or upload a file (Figure 2A, option 1). The input should be space-separated. Depending on whether the submission has one, two or three columns, the program either calculates WeSA scores for the new data or returns pre-calculated records.
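The column-count dispatch described above can be illustrated as follows (the function name and returned labels are ours, for illustration, not the server's own):

```python
def classify_submission(lines):
    """Infer the WeSA query type from the number of space-separated columns:
    1 column  -> retrieve stored scores for the listed proteins
    2 columns -> score a single new experiment (bait, prey)
    3 columns -> score one or more experiments (bait, prey, experiment id)
    """
    rows = [line.split() for line in lines if line.strip()]
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("all rows must have the same number of columns")
    return {1: "retrieve", 2: "score-single", 3: "score-multi"}[widths.pop()]

print(classify_submission(["BAIT PREY1 exp1", "BAIT PREY2 exp1"]))  # score-multi
```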
Figure 2.
(A) Submission options: (1) the user can insert their input in the submission box, select a file or choose an example; (2) a dropdown menu to choose the dataset used to enrich the user data and calculate WeSA scores. (B) Results page: network view. The connection between NEW and IFT172 has no supporting evidence from prior studies (only from the user input) and is therefore shown as a dashed line. The WeSA score between COG6 and IFT57 is below the optimal pre-defined threshold, so the edge is colored red (as is the link between IFT172 and NEW). Edge width corresponds to the calculated WeSA score. Clicking on a protein (here: CLUAP1) or an edge opens an information box. (C) A table summarizing the scores and the absolute counts used for the calculations is placed below the network.
Along with their input, users can choose which PPI repository to use for calculations or querying (Figure 2A, option 2). For calculating updated scores from user-submitted results, data from the chosen database are used as background to enrich the submission and compute scores from the combined information. For querying already known protein interactions, results are displayed only from the chosen dataset. The current options are IntAct (11), BioGRID (12), BioPlex (13) and their various combinations.
We provide several examples (Figure 2A, option 1) to demonstrate the submission format and to show typical results. A more in-depth explanation of the submission options together with examples can be found in the help page of the WeSA website: wesa.russelllab.org/help.
The output network and table of results
Once processed, the output page presents results in both a network and a table view. The network visualizes the protein–protein interactions from the query for which there is any evidence of interaction. Nodes correspond to the relevant proteins, while edges depict possible physical interactions between them and are colored according to the WeSA score: low-scoring, thus unlikely, interactions are colored red, while high-scoring pairs are linked by a blue edge. The threshold separating the two categories depends on the chosen background dataset (see Materials and methods for details). Nodes are also color-coded based on the scores of their connections: if all adjacent edges have low WeSA scores, the protein is unlikely to be relevant to this network and is colored red; conversely, nodes connected by likely true edges are colored blue.
In addition, clicking on nodes and edges displays tooltips with extra information. For nodes, these show the protein name and the number and names of CORUM complexes (21) in which the protein has been recorded. For edges, tooltips show the exact WeSA and SA scores, as well as a summary of the amount and type of evidence from AP experiments, aggregating the background and user-submitted data used for the calculations.
Most importantly, the network is fully interactive to facilitate exploration of results and customization of the view before exporting it as an image. Users can drag and rearrange nodes manually or change the layout automatically using the side panel option. Other options allow filtering based on a specified false positive rate (FPR) cut-off (1%, 5%, 10% or 20%), as well as to toggle on or off proteins with no interactions, with previously unobserved interactions or with low WeSA-ranked interactions.
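The FPR-based filtering amounts to dropping edges whose WeSA score falls below the cutoff associated with the chosen FPR level. A minimal sketch, with an illustrative cutoff value and made-up edges:

```python
def filter_edges(edges, score_cutoff):
    """Split (protein_a, protein_b, wesa_score) edges into those kept
    (score at or above the cutoff) and those dropped (below it)."""
    kept = [edge for edge in edges if edge[2] >= score_cutoff]
    dropped = [edge for edge in edges if edge[2] < score_cutoff]
    return kept, dropped

edges = [("IFT57", "CLUAP1", 12.4), ("COG6", "IFT57", 0.8)]
kept, dropped = filter_edges(edges, score_cutoff=2.0)
print(kept)     # [('IFT57', 'CLUAP1', 12.4)]
print(dropped)  # [('COG6', 'IFT57', 0.8)]
```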
In addition to the network, in an adjacent tab, users can find a table containing the full information about the protein pairs, together with their calculated WeSA and SA scores and the raw observation numbers from the whole dataset. The latter are presented separately depending on the role of each protein in the experiments: tagged (bait) or not (prey). While the table is sorted by WeSA score by default, users can sort by any column. Results are also searchable and can be printed or exported in several formats.
Examples
We selected affinity proteomics data from recent publications to illustrate the method on data not yet included in the databases used.
For instance, Hoffmann et al. recently performed affinity proteomics on the TTC30B protein, which is involved in ciliary transport (2). They identified a total of 73 other proteins interacting with this bait. Submitting these data to WeSA using all three databases together gave insignificant scores to all but ten proteins, all of which are members of the intraflagellar transport (IFT) B complex (Figure 3A–D).
Figure 3.
Example submissions illustrating some of the filtering options on the website. The examples use data from (2) (A–D) and (3) (E–G). (A) Results of the complete submission scored after enrichment with the combined data from all three datasets. (B) The remove-dashed-edges option, showing only links with supporting evidence from prior studies. (C) Removing links with scores below the pre-computed optimal threshold for this background dataset (combined data from IntAct, BioGRID and BioPlex). (D) Two slider options: removing links with scores above the 20% FPR level (top) and the 5% FPR level (bottom). (E) Network of the full data (including all 8 transcription factors). (F) Results from submitting just the DCAF7 data; the combined BioGRID plus IntAct data are used for the calculations. (G) The remove-dashed-edges option. (H) Slider option removing links with scores above the 5% FPR level but leaving in the unconnected nodes. The unlinked nodes shown here are proteins with links above the optimal threshold but below the 5% FPR threshold.
Elsewhere, a screen for interactors of human transcription factors identified 302 bait/prey pairs (3). WeSA (using IntAct and BioGRID together) reduces these to 21, including several known interactions (e.g. involving DCAF7, such as DYRK1A/B and DIAPH1) and, importantly, removing several interactions likely to be non-specific, including those with the prefoldin complex and ribosomal proteins (Figure 3E). The filtering is not perfect, as some known interactions (e.g. ZNF702) are filtered out, though different results are obtained with other parameters/datasets (indeed, ZNF702 is significant when focusing on the DCAF7 data; Figure 3F–H). This highlights both that current datasets are still incomplete/inconsistent and that results are likely to improve as more data become available.
Limitations
WeSA relies on the background datasets (e.g. IntAct, BioGRID, BioPlex) being unbiased. For IntAct and BioGRID this is not true: bias in these datasets has been discussed for decades (29,30), stemming from the tendency of researchers to focus on 'trendy' proteins (e.g. TP53, and kinases in particular). Nevertheless, this bias diminishes as datasets grow, and large, systematically acquired datasets such as BioPlex are unbiased by design. Users should be aware of these potential biases.
The method also relies on users defining what constitutes a 'real' bait-prey relationship from raw datasets that can contain thousands of preys for an individual bait. There are, of course, several ways to do this, but our tests and the examples above demonstrate that, provided the number of bait-prey pairs is not in the thousands, the method is fairly resilient to different ways of defining the initial bait-prey set.
Conclusion
WeSA arises directly from our interactions with those performing proteomics experiments. The ability to exploit the growing dataset of previously determined interactions allows noise to be filtered more systematically and, as the examples above illustrate, true interactions to be identified effectively.
Future updates will include datasets from other organisms (e.g. yeast, Drosophila, E. coli), and we anticipate that, once sufficient data are available, we will be able to separate data into finer subsets, such as tissues or cell types. We also plan to extend the tool to BioID and other newer data types as they become available.
Acknowledgements
We would like to thank Ivan Shtetinski for the reliable technical support.
Author contributions: M.S. and R.B.R.: Conceptualization, Formal analysis, Methodology, Validation, Writing. J.C.G.S.: Methodology, Writing. K.B.: Conceptualization, Methodology. T.B., M.U.: Methodology.
Contributor Information
Magdalena M Shtetinska, BioQuant, Heidelberg University, 69120 Heidelberg, Germany; Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany.
Juan-Carlos González-Sánchez, BioQuant, Heidelberg University, 69120 Heidelberg, Germany; Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany.
Tina Beyer, Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany.
Karsten Boldt, Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany.
Marius Ueffing, Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany.
Robert B Russell, BioQuant, Heidelberg University, 69120 Heidelberg, Germany; Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany.
Data availability
Data for developing the tool are downloaded from the three public sources: IntAct, BioGRID, the BioPlex Interactome. Data used for the example section can be found within the papers or under the following links: http://www.russelllab.org/wesa_eg/TTC20B_eg.txt and http://www.russelllab.org/wesa_eg/DCAF7_eg.txt.
All code can be accessed at: https://github.com/russelllab/wesa and https://figshare.com/articles/software/WeSA/25467793.
Funding
This project has received funding (support for M.S.) from the European Union's Horizon 2020 research and innovation programme, Marie Sklodowska-Curie Innovative Training Networks (ITN), under grant no. 861 329 (SCiLS). J.C.G.S. was supported by a grant from the Swedish Research Council (VR) and by the Horizon 2020 project PrecisionTox (grant no. 965 406).
Conflict of interest statement. None declared.
References
- 1. Puig O., Caspary F., Rigaut G., Rutz B., Bouveret E., Bragado-Nilsson E., Wilm M., Séraphin B. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods. 2001; 24:218–229. [DOI] [PubMed] [Google Scholar]
- 2. Hoffmann F., Bolz S., Junger K., Klose F., Stehle I.F., Ueffing M., Boldt K., Beyer T. Paralog-specific TTC30 regulation of Sonic hedgehog signaling. Front. Mol. Biosci. 2023; 10:1268722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Alerasool N., Leng H., Lin Z.-Y., Gingras A.-C., Taipale M. Identification and functional characterization of transcriptional activators in Human cells. Mol. Cell. 2022; 82:677–695. [DOI] [PubMed] [Google Scholar]
- 4. Beyer T., Klose F., Kuret A., Hoffmann F., Lukowski R., Ueffing M., Boldt K. Tissue- and isoform-specific protein complex analysis with natively processed bait proteins. J. Proteomics. 2021; 231:103947. [DOI] [PubMed] [Google Scholar]
- 5. Gavin A.-C., Aloy P., Grandi P., Krause R., Boesche M., Marzioch M., Rau C., Jensen L., Bastuck S., Dümpelfeld B. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006; 440:631–636. [DOI] [PubMed] [Google Scholar]
- 6. Collins S.R., Kemmeren P., Zhao X.-C., Greenblatt J.F., Spencer F., Holstege F.C.P., Weissman J.S., Krogan N.J. Toward a comprehensive atlas of the physical interactome of saccharomyces cerevisiae. Mol. Cell. Proteomics. 2007; 6:439–450. [DOI] [PubMed] [Google Scholar]
- 7. Kühner S., Noort V.v., Betts M.J., Leo-Macias A., Batisse C., Rode M., Yamada T., Maier T., Bader S., Beltran-Alvarez P. et al. Proteome organization in a genome-reduced bacterium. Science. 2009; 326:1235–1240. [DOI] [PubMed] [Google Scholar]
- 8. Boldt K., Reeuwijk J.v., Lu Q., Koutroumpas K., Nguyen T.-M.T., Texier Y., Beersum S.E.C.v., Horn N., Willer J.R., Mans D.A. et al. An organelle-specific protein landscape identifies novel diseases and molecular mechanisms. Nat. Commun. 2016; 7:11491. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Feller S.M., Lewitzky M. Very ‘sticky’ Proteins – Not too sticky after all. Cell Commun. Signal. CCS. 2012; 10:15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Mellacheruvu D., Wright Z., Couzens A., Lambert J.-P., St-Denis N., Li T., Miteva Y., Hauri S., Sardiu M., Low T. et al. The CRAPome: a contaminant repository for affinity purification mass spectrometry data. Nat. Methods. 2013; 10:730–736. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., Campbell N., Chavali G., Chen C., Del Toro Ayllón N. et al. The MIntAct Project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2013; 42:D358–D363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Oughtred R., Rust J., Chang C., Breitkreutz B.-J., Stark C., Willems A., Boucher L., Leung G., Kolas N., Zhang F. et al. The BioGRID Database: a comprehensive biomedical resource of curated protein, genetic and chemical interactions. Protein Sci. Publ. Protein Soc. 2021; 30:187–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Huttlin E., Bruckner R., Navarrete-Perea J., Cannon J., Baltier K., Gebreab F., Gygi M., Thornock A., Zarraga G., Tam S. et al. Dual proteome-scale networks reveal cell-specific remodeling of the Human interactome. Cell. 2021; 184:3022–3040. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Cao M., Zhang H., Park J., Daniels N.M., Crovella M.E., Cowen L.J., Hescott B. Going the distance for protein function prediction: a new distance metric for protein interaction networks. PLoS One. 2013; 8:e76339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Cox J., Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26:1367–1372. [DOI] [PubMed] [Google Scholar]
- 16. Cho H., Berger B., Peng J. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 2016; 3:540–548. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Schelhorn S.-E., Mestre J., Albrecht M., Zotenko E. Inferring physical protein contacts from large-scale purification data of protein complexes. Mol. Cell. Proteomics. 2011; 10:M110.004929. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Franz M., Lopes C.T., Fong D., Kucera M., Cheung M., Siper M.C., Huck G., Dong Y., Sumer O., Bader G.D. Cytoscape.Js 2023 update: a graph theory library for visualization and analysis. Bioinformatics. 2023; 39:btad031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Brückner A., Polge C., Lentze N., Auerbach D., Schlattner U. Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 2009; 10:2763–2788. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Aloy P., Russell R.B. The third dimension for protein interactions and complexes. Trends Biochem. Sci. 2002; 27:633–638. [DOI] [PubMed] [Google Scholar]
- 21. Tsitsiridis G., Steinkamp R., Giurgiu M., Brauner B., Fobo G., Frishman G., Montrone C., Ruepp A. CORUM: the Comprehensive Resource of Mammalian protein Complexes-2022. Nucleic Acids Res. 2023; 51:D539–D545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Berman H.M. The Protein Data Bank. Nucleic Acids Res. 2000; 28:235–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Blohm P., Frishman G., Smialowski P., Goebels F., Wachinger B., Ruepp A., Frishman D. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis. Nucleic Acids Res. 2014; 42:D396–D400. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Trabuco L.G., Betts M.J., Russell R.B. Negative protein–Protein interaction datasets derived from large-scale two-hybrid experiments. Methods. 2012; 58:343–348. [DOI] [PubMed] [Google Scholar]
- 25. Unal I. Defining an optimal cut-point value in ROC analysis: an alternative approach. Comput. Math. Methods Med. 2017; 2017:3762651. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Youden W.J. Index for rating diagnostic tests. Cancer. 1950; 3:32–35. [DOI] [PubMed] [Google Scholar]
- 27. Liu X. Classification accuracy and cut point selection. Stat. Med. 2012; 31:2676–2686. [DOI] [PubMed] [Google Scholar]
- 28. Pepe M.S. The Statistical Evaluation of Medical Tests for Classification and Prediction. 2003; Oxford: Oxford University Press. [Google Scholar]
- 29. Schramm S.-J., Jayaswal V., Goel A., Li S.S., Yang Y.H., Mann G.J., Wilkins M.R. Molecular Interaction Networks for the analysis of Human disease: utility, limitations and considerations. Proteomics. 2013; 13:3393–3405. [DOI] [PubMed] [Google Scholar]
- 30. Braun P., Tasan M., Dreze M., Barrios-Rodiles M., Lemmens I., Yu H., Sahalie J.M., Murray R.R., Roncari L., de Smet A.-S. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods. 2009; 6:91–97. [DOI] [PMC free article] [PubMed] [Google Scholar]