Abstract
Analysis of multi-validated protein interaction data reveals networks with greater interconnectivity than the more segregated structures seen in previously available data. To help visualize this, the authors draw comparisons between continuous stratus clouds and altocumulus clouds.
Using a small dataset of protein-protein interactions [1], it was proposed that the yeast protein interaction network is made up of two sorts of hubs, party and date, and that these define modularity in the network. We found [2], by using several larger high-confidence datasets and appropriate statistical analyses, that we could not support these conclusions. Bertin et al. now invite analysis of a further dataset of protein-protein interactions, which they argue supports the party/date distinction. The claimed properties of party and date hubs are not, however, present in this dataset either. In particular, when controlling for important covariables where necessary, there is no evidence for (1) bimodality in partner co-expression, (2) enrichment for similarly localized proteins that physically interact with party hubs, (3) a lower rate of evolution of party hubs, (4) differences in the effects of deletion of date and party hubs, or (5) higher genetic connectivity of date hubs. In sum, all of our prior conclusions remain robust and there is no evidence for distinctive classes of network hubs.
It was suggested [1] that some hub proteins operate at the same intracellular place and time with their multiple interactants (as if at a party) while others operate on a one-by-one basis with their numerous partners (as if on a date). Is this distinction informative? Originally, four features were used to draw a partition between date and party hubs: expression bimodality, localization entropy, network fragmentation, and genetic connectivity [1]. A subsequent analysis suggested a fifth distinction, namely different rates of evolution after control for covariables [3]. Given the small size of the original dataset and the absence of statistical support for some of the assertions, we asked [2] whether these claims were robust. In both the original dataset and in new high-confidence interaction datasets [2], we found we could not support any of the five points of evidence.
Bertin et al. now nominate a new dataset, which they argue supports three of the five points of evidence. Bertin et al. first note a curation issue with one of our many datasets, called HC, which inadvertently contains interactions that were, owing to an ambiguity in the literature, supported by a single analysis. We certainly agree that inclusion of the data from [4] and [5] as independent validations was in error, as the data in [5] indeed fully encompass those of [4] (A-C Gavin, personal communication). However, approximately half of the interactions reported in [4] remain multi-validated by other means. An updated high-confidence dataset that removes this duplication and incorporates more recent interaction data is available as Dataset S1 and as a download from the BioGRID database (see http://www.thebiogrid.org). Importantly, however, our dataset HCm is unaffected by the above concern, as we required validation of an interaction by multiple different methods. The new build of Bertin et al. (called “filtered-HC”) mimics HCm by excluding interactions not multi-validated with different methods. As the results of HCm confirmed those of our other datasets [2], we were surprised by the claim that the date/party distinction is still supported in filtered-HC. Because this dataset provides the most robustly defendable set of interactions, here we re-analyse the filtered-HC network to ask whether it substantiates the date/party distinction.
No Evidence for Bimodality of Co-Expression Values
Han et al. originally proposed that clear evidence for a binary hub classification (party versus date) derives from bimodal distribution of co-expression (PCC) values: one class with high average PCCs (party hubs) and the other with low average PCCs (date hubs) [1]. This proposal was based exclusively on visual inspection of the data. By contrast, we applied a formal test that examines deviation from a null of unimodality [6,7] and found no evidence for bimodality. Up to now we have analysed 25 expression datasets across seven protein interaction builds (including filtered-HC; Table 1) and added datasets nominated by Han et al., a total of 181 separate tests. To a first approximation, by chance we should expect to see around nine incidences of significance at the 5% significance level owing to type I error (although this assumes independence between datasets). We find just two.
Table 1. Test for Bimodality of Neighbour Correlation Distribution at Two Different Hub Connectivity Thresholds.
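The expected count of chance significances quoted for the 181 tests follows from simple arithmetic, and the same independence assumption gives a rough sense of how unusual two significant results would be. The sketch below is illustrative only: the function names are hypothetical, and the binomial model assumes fully independent tests, which, as noted in the text, is only a first approximation.

```python
from math import comb

def expected_false_positives(n_tests, alpha):
    """Expected number of significant results under the null (type I errors)."""
    return n_tests * alpha

def prob_at_most(k, n_tests, alpha):
    """Binomial probability of k or fewer significant results,
    assuming independent tests that each reject with probability alpha."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k + 1)
    )

n_tests, alpha, observed = 181, 0.05, 2  # values quoted in the text
expected = expected_false_positives(n_tests, alpha)
p_low = prob_at_most(observed, n_tests, alpha)
print(f"expected by chance: {expected:.2f}")
print(f"P(at most {observed} significant | independence): {p_low:.4f}")
```

Seeing fewer significant tests than chance alone would predict is consistent with the datasets not being independent, so the binomial probability should be read as a guide rather than as a formal test.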
Given this lack of evidence for bimodality, Bertin et al. appear to concur that bimodality cannot be used to define party and date hubs. Surprisingly though, they now assert that bimodality never was a key point of evidence. The original definition of date and party hubs, however, stated that bimodality represented a “natural boundary” between the two classes [1]; indeed it was argued that the lack of obvious bimodality in some expression datasets was due to low sample sizes [1]. At the same time, Bertin et al. also venture to suggest that the standard statistical test for deviation from unimodality [6,7] has a high false negative rate. It does not (see Text S1).
Neighbours of Date Hubs Do Not Have more Diverse Localizations
Originally, Han et al. reported that the partners of date hubs have more diverse intracellular localizations, as measured by information entropy [1]. However, this analysis did not normalise for connection density and arbitrarily omitted data from some cellular compartments [1]. In the filtered-HC dataset (Table 2), as before [2], once the data are normalised and all compartments are included, the entropy difference runs in the opposite direction to that predicted by the date/party hypothesis. We showed [2] that this inversion is due to differences in abundance that follow from the assignment of party hubs as those with highly co-expressed partners (PCC > 0.5). As Bertin et al. make no statement on this issue, we assume that they do not dispute this result.
Table 2. Opposite Localization Entropy of Date and Party Hubs.
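For readers unfamiliar with the measure, the information entropy of a hub's partner localizations can be sketched as plain Shannon entropy over compartment labels. This is a minimal illustration only: the function name and the toy compartment labels are hypothetical, and the sketch does not include the normalisation for connection density used in [2], whose details are not given here.

```python
from collections import Counter
from math import log2

def localization_entropy(partner_compartments):
    """Shannon entropy (in bits) of the compartments of a hub's partners.

    Not normalised for connection density; that step is described in [2]
    and is deliberately left out of this sketch.
    """
    counts = Counter(partner_compartments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Toy examples (hypothetical labels, not real annotations)
same_place = ["nucleus"] * 4
two_places = ["nucleus", "cytoplasm", "nucleus", "cytoplasm"]
print(localization_entropy(same_place))  # partners all in one compartment
print(localization_entropy(two_places))  # partners split evenly over two
```

Higher values indicate partners spread over more compartments; the date/party hypothesis predicts higher values for date hubs.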
Definitions and Inferences
The evidence for the biological relevance of date and party hubs falls into two classes: the definitional, namely bimodality/co-expression and subcellular colocalization, and the inferential, or corollary behaviours that may derive from the underlying biology. As the definitional aspects do not bear scrutiny, one must be suspicious that any correlates are merely consequences of the method used to define the two hub classes. The only standing criterion left is the arbitrary distinction between those hubs with a PCC > 0.5 and those without. Highly co-expressed proteins do have a number of odd properties, namely higher connectivity and abundance. These biases are robust in the filtered-HC dataset: party hubs have higher connectivity (p = 0.00006) and protein abundance (p = 0.001). It is then important to ask whether further properties stem from such biases.
Given their Abundance, Date and Party Hubs Do Not Evolve at Different Rates
Bertin et al. find that party hubs evolve more slowly. As originally noted [3], the question is whether party hubs evolve more slowly than date hubs when controlling for important covariates, most notably protein abundance [8]. We showed previously that any weak tendency for party hubs to evolve more slowly was accounted for by their abundance [2]. Unlike our prior analysis, Bertin et al. do not ask if party and date hubs evolve at different rates controlling for abundance but, instead, ask if PCC is related to evolutionary rate controlling for abundance. However, they inappropriately apply a parametric test (Pearson product-moment correlation) that requires all variables to be normally distributed. Although the method is robust to some degree of deviation from normality, the abundance data are extremely non-normal (Shapiro-Wilk test of the null of normality: W = 0.2, p << 0.0001; W = 1 implies normality, W << 1 implies deviation from normality). This leaves two avenues: either to transform the data to make them approximately normal or to perform the equivalent non-parametric test.
Partial Spearman's correlation is the nonparametric equivalent. Using evolutionary rate data from sensu stricto yeasts [9], controlling for abundance [10], the more highly co-expressed genes have, if anything, a slightly higher rate of evolution (partial rho controlling for abundance, rho = +0.13, p = 0.02, p determined by simulation, implemented in R [11]). If we log transform the abundance data, then the parametric partial correlation likewise changes sign (rho = +0.029, p = 0.36). The log-transformed abundance data have a Shapiro-Wilk W statistic of 0.95, as opposed to 0.2 for the untransformed data.
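The idea of a partial rank correlation can be illustrated with a short sketch. It uses the standard first-order partial correlation formula applied to Spearman coefficients; it is not the R implementation referred to in the text, it omits the simulation-based p-value, and the variable names and toy numbers are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def partial_spearman(x, y, z):
    """Rank correlation of x and y with the covariate z partialled out."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Toy data: both x and y track the covariate z (hypothetical values)
abundance = [1, 2, 3, 4, 5, 6]   # z
coexpression = [2, 1, 4, 3, 6, 5]  # x
rate = [1, 3, 2, 5, 4, 6]          # y
raw = spearman(coexpression, rate)
partial = partial_spearman(coexpression, rate, abundance)
print(f"raw rho = {raw:.3f}, partial rho = {partial:.3f}")
```

In this toy case the raw correlation is positive while the partial correlation is negative, which shows how a shared dependence on a covariate such as abundance can create or reverse an apparent relationship.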
Our previous tests differed from that performed by Bertin et al.: we employed analysis of covariance (ANCOVA) to ask whether date and party hubs evolve at different rates when covariate controlled (this being the prior claim [3]). We find, in accord with our earlier results [2], that differences in abundance explain all of the difference in rates of evolution between the two classes (Fig 1). In the ANCOVA, as above, if anything date hubs evolve slightly more slowly than party hubs (Fig 1). Analysis of residuals supports these results (Fig 1). Although we can recover the result of Bertin et al. when Pearson's partial correlation is inappropriately applied to nontransformed data, all appropriate tests reject the contention of evolutionary rate differences.
Bertin et al. also suggest that a recent study of hub proteins that bind partners at multiple different sites, as opposed to re-use of the same site, provides support for the difference in evolutionary rate between party and date proteins [12]. However, this report failed to properly control for abundance [12]; when this control is performed, no differences remain (p > 0.45) (Text S2). These results accord with our prior finding that, controlling for abundance, more highly connected hubs do not evolve more slowly, in no small part owing to re-use of binding sites [13]. In summary, evolutionary rate differences do not support the date/party distinction.
No Evidence for Large Differences in Effects of Hub Deletion when Allowing for Connectivity
It is argued that date hubs establish network integrity because of their positioning as intermodule linkers, as opposed to the intramodule positioning of party hubs [1]. But might any differences in deletion of date versus party hubs merely reflect a difference in connectivity of the two hub classes? Two metrics were used to measure the effect of hub deletion on the network: characteristic pathway length (CPL) and main component size (MCS) [1]. We previously considered [2] CPL to be of limited worth, because differences in pathway length may not have biological consequences (for example, since diffusion is fast, transmission delays due to increase in number of intermediate steps may be inconsequential). Moreover, CPL is susceptible to network incompleteness, which is acute for small stringent datasets such as filtered-HC. However, to enable comparison with Bertin et al., we analyze both MCS and CPL.
In addition to connectivity, it is desirable to correct for dispensability, because it is not biologically sensible to analyze networks that are fatally crippled by the loss of essential genes. Fortunately, nonessential date and party hubs have equal connectivity (p = 0.94), so restricting the comparison to them controls for both parameters simultaneously. Deletion of nonessential date and party hubs has an identical effect on network integrity (for MCS, see Figure 2; for CPL, see Figure S1). Bertin et al. observe the same result for MCS even without controlling for dispensability. As an alternative means to correct for connectivity, we randomly swapped date and party hubs of the same connectivity. If the differential deletion effect is solely due to inter- versus intramodule positioning, then interchanging date with party hubs should obviate the difference. Instead, hub swapping yields the same deletion profile as the original unswapped case (Figure 3). Finally, we asked whether, even in the absence of controls for connectivity or dispensability, the difference between party and date hubs is sensitive to removal of just a few extreme hubs. Removal of just the top two percent of hubs obviates the difference between date and party hub deletion on MCS (Figure S2).
In sum, controlling for connectivity by two different means eliminates the difference between date and party hub deletion; even when not controlling for connectivity, the deletion effect relies entirely on a few extreme date hubs. There is thus no reason to suppose date and party hubs have different network positions.
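The main component size metric used in the deletion analyses can be sketched as the size of the largest connected component that remains once a set of hubs has been removed. The code below is a minimal illustration on a hypothetical toy network; the function name and node labels are assumptions, and it does not reproduce the hub-swapping or dispensability controls described above.

```python
from collections import deque

def main_component_size(edges, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes.

    Nodes left with no interactions are ignored, so the result is 0
    when no edges survive the deletion.
    """
    adjacency = {}
    for a, b in edges:
        if a in removed or b in removed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen, largest = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one component
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest

# Toy network (hypothetical): H is a hub, D hangs off C
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B"), ("C", "D")]
print(main_component_size(toy_edges))                  # intact network
print(main_component_size(toy_edges, removed={"H"}))   # hub deleted
print(main_component_size(toy_edges, removed={"D"}))   # peripheral node deleted
```

Comparing the drop in this quantity for deleted date hubs against that for deleted party hubs of matched connectivity is the kind of comparison the text describes.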
No Evidence for a Difference in Genetic Connectivity
While Bertin et al. contend that date hubs have more genetic interactions in filtered-HC, they acknowledge that study bias may confound analysis, as noted [1]. Using a metric of study bias [14] (see Figure 4), we find that date and party hubs do indeed differ in their study bias (p = 0.039, Mann-Whitney U-test). To examine the impact of this, we considered the difference in mean number of genetic interactions per physical connection (g_i/p_i) between date and party hubs; this metric controls for the fact that genetic and physical interactions are positively correlated [15]. As we incrementally purge the data of study bias, the difference in mean g_i/p_i between date and party hubs diminishes to zero (Figure 4). Even making no allowance for study bias, g_i/p_i for date and party hubs is not significantly different (Mann-Whitney U-test; p > 0.06). There is thus no significant difference in genetic connectivity of party and date hubs.
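As an illustration of the metric, the mean number of genetic interactions per physical connection can be computed per hub class and then compared. The sketch below uses hypothetical toy counts and a hypothetical function name; it does not implement the study-bias purge or the Mann-Whitney U-test reported in the text.

```python
def mean_gp_ratio(hubs):
    """Mean of g_i / p_i over hubs, given (genetic, physical) count pairs.

    Hubs with no physical interactions are skipped to avoid division by zero.
    """
    ratios = [g / p for g, p in hubs if p > 0]
    return sum(ratios) / len(ratios)

# Toy (genetic, physical) interaction counts per hub; not real data
date_hubs = [(4, 8), (6, 12), (3, 6)]
party_hubs = [(2, 8), (5, 10), (3, 12)]

difference = mean_gp_ratio(date_hubs) - mean_gp_ratio(party_hubs)
print(f"date: {mean_gp_ratio(date_hubs):.3f}")
print(f"party: {mean_gp_ratio(party_hubs):.3f}")
print(f"difference: {difference:.3f}")
```

Dividing by the number of physical connections is what removes the trivial effect of better-connected hubs also having more genetic interactions.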
Conservation of Date/Party Classification Is a Consequence of Definition, Not of Biology
Finally, Bertin et al. raise one new prospective line of evidence, namely, those proteins that appear as hubs across datasets tend to preserve their status as party or date. This observation, however, follows definition: if a hub is co-expressed (with PCC > 0.5) in any one dataset, it is defined as a party hub; if not, by default it is a date hub. Once a hub is classified as a party hub, its status cannot change solely with the addition of extra expression data. The reverse classification, i.e., date to party, is also disfavored because co-expression across different assays is not independent. The low rates of transfer of hub status merely follow from definitions and do not address the biological validity of the date/party distinction.
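The one-way nature of the classification rule can be made concrete with a short sketch. It assumes, as stated above, that a hub is called a party hub if its PCC exceeds 0.5 in any one expression dataset; the function name and the toy PCC values are hypothetical.

```python
PARTY_THRESHOLD = 0.5  # PCC cut-off quoted in the text

def classify_hub(pcc_per_dataset):
    """Party if the hub exceeds the threshold in any dataset, else date."""
    if any(pcc > PARTY_THRESHOLD for pcc in pcc_per_dataset):
        return "party"
    return "date"

# Add expression datasets one at a time and track the hub's label
pcc_values = [0.2, 0.6, 0.1, -0.3]  # toy values, not real data
labels = [classify_hub(pcc_values[: n + 1]) for n in range(len(pcc_values))]
print(labels)
```

Because the rule is an "any" over datasets, a label can move from date to party as data are added but never from party to date, so apparent stability of party status carries no biological information.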
Summary
In the new filtered-HC dataset, as in others, the two definitional criteria of date/party hubs find no support. Four corollary points of evidence (rate of evolution, effect of deletion on network topology, genetic connectivity, and hub status quo) also find no support. That across multiple datasets, and under multiple different tests, we repeatedly find no evidence for the date/party hypothesis suggests that network hubs do not fall into discrete classes.
Supporting Information
Footnotes
Nizar N. Batada and Laurence D. Hurst are with the Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom. Teresa Reguly, Ashton Breitkreutz, Lorrie Boucher, Bobby-Joe Breitkreutz, and Mike Tyers are with Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada. Lorrie Boucher and Mike Tyers are also with the Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Canada.
References
1. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004;430:88–93. doi: 10.1038/nature02555.
2. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, et al. Stratus not altocumulus: A new view of the yeast protein interaction network. PLoS Biol. 2006;4(10):e317. doi: 10.1371/journal.pbio.0040317.
3. Fraser HB. Modularity and evolutionary constraint on proteins. Nat Genet. 2005;37:351–352. doi: 10.1038/ng1530.
4. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147. doi: 10.1038/415141a.
5. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440:631–636. doi: 10.1038/nature04532.
6. Hartigan JA, Hartigan PM. The dip test of unimodality. Ann Stat. 1985;13:70–84.
7. Hartigan PM. Computation of the dip statistic to test for unimodality. J Roy Stat Soc C, App Stat. 1985;34:320–325.
8. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2005;102:14338–14343. doi: 10.1073/pnas.0504070102.
9. Hirsh AE, Fraser HB, Wall DP. Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol. 2005;22:174–177. doi: 10.1093/molbev/msh265.
10. Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006;441:840–846. doi: 10.1038/nature04785.
11. R Development Core Team. R: A language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing; 2005.
12. Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006;314:1938–1941. doi: 10.1126/science.1136174.
13. Batada NN, Hurst LD, Tyers M. Evolutionary and physiological importance of hub proteins. PLoS Comput Biol. 2006;2(7):e88. doi: 10.1371/journal.pcbi.0020088.
14. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon G, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006;5:11. doi: 10.1186/jbiol36.
15. Ozier O, Amin N, Ideker T. Global architecture of genetic interactions on the protein network. Nat Biotechnol. 2003;21:490–491. doi: 10.1038/nbt0503-490.