Abstract
A longstanding puzzle in human genetics is what limits the clinical manifestation of hundreds of hereditary diseases to certain tissues, while their causal genes are expressed throughout the human body. A general conception is that tissue-selective disease phenotypes emerge when masking factors operate in unaffected tissues, but are specifically absent or insufficient in disease-manifesting tissues. Although this conception has critical impact on the understanding of disease manifestation, it was never challenged in a systematic manner across a variety of hereditary diseases and affected tissues. Here, we address this gap in our understanding via rigorous analysis of the susceptibility of over 30 tissues to 112 tissue-selective hereditary diseases. We focused on the roles of paralogs of causal genes, which are presumably capable of compensating for their aberration. We show for the first time at large-scale via quantitative analysis of omics datasets that, preferentially in the disease-manifesting tissues, paralogs are under-expressed relative to causal genes in more than half of the diseases. This was observed for several susceptible tissues and for causal genes with varying number of paralogs, suggesting that imbalanced expression of paralogs increases tissue susceptibility. While for many diseases this imbalance stemmed from up-regulation of the causal gene in the disease-manifesting tissue relative to other tissues, it was often combined with down-regulation of its paralog. Notably in roughly 20% of the cases, this imbalance stemmed only from significant down-regulation of the paralog. Thus, dosage relationships between paralogs appear as important, yet currently under-appreciated, modifiers of disease manifestation.
Author summary
A longstanding enigma in human genetics is what limits the clinical manifestation of hundreds of hereditary diseases to certain tissues or cell types, while their causal genes are present and expressed throughout the human body. A general conception was that the tissue-wide robustness to the causal aberration is achieved owing to the presence of a compensatory factor, and that disease phenotypes emerge wherever this factor is limited. Here, we tested this general conception at large-scale for the first time. We focused on paralogs of disease-causing genes, which share their functionality and may compensate for their aberration. Based on quantitative analyses of several types of omics data, we show that paralogs of causal genes are down-regulated relative to the disease-causing gene preferentially in the respective disease-manifesting tissue. This tendency is common across various subsets of causal genes, diseases, and tissues. Thus, paralogs of causal genes appear to contribute to the tissue-wide robustness against causal aberrations, and serve as important, yet currently under-appreciated, modifiers of disease manifestation.
Introduction
Hereditary diseases are caused by germline aberrations that are common to cells throughout the human body. For hundreds of these diseases, these germline aberrations have been identified and mapped to causal genes [1], and many more causal genes are likely to be identified in coming years owing to the extensive usage of sequencing techniques in medical settings [2]. However, the identification of a causal gene is often just the starting point for understanding the molecular basis of each disease. The genotype-to-phenotype relationship between a causal gene and the respective disease phenotype is typically complex [3,4], and, for numerous hereditary diseases remains to be elucidated. By shedding light on these relationships, we hope to obtain better understanding of disease mechanisms and advance the search for cures.
Tissue-selectivity is a hallmark of many hereditary diseases [5]. For example, familial mutations in BRCA1 gene increase the risk for breast and ovarian cancers, and familial mutations in RB1 gene lead primarily to retinoblastoma. From an evolutionary point of view, tissue-selectivity is not surprising given that limited manifestation is probably less detrimental than whole body diseases, and thus more likely heritable. Yet, tissue-selectivity is intriguing due to the pattern of expression of causal genes. For example, both BRCA1 and RB1 are expressed ubiquitously across most human tissues without eliciting disease phenotypes in those tissues. In fact, most causal genes exhibit tissue-specific disease manifestation along with tissue-wide expression [5,6]. Several molecular mechanisms may lead to this phenomenon. In some cases, the disease-manifesting tissue (denoted disease tissue henceforth) has unique features [7,8], such as long-lived neurons and age-related protein misfolding diseases [9]. In other cases, the tissue-selective effect may depend on the specific isoform expressed in that tissue [10]. Meta-analysis studies showed that causal genes tend to have elevated expression preferentially in their disease tissues [5,6], hinting to a quantitative basis for tissue selectivity. Previously, we showed that causal genes tend to form tissue-specific interactions preferentially in their respective disease tissues, suggesting that these interactions contribute to tissue selectivity [5]. Yet for many hereditary diseases, the molecular mechanisms that underlie them remain hidden.
Here, we consider the role that paralogs of causal genes may play in determining the tissue-selectivity of hereditary diseases. Paralogs, namely homologous genes within the same species resulting from gene duplication events, have been repeatedly shown to have redundant functions and to compensate for the loss of each other (reviewed in [11]). At a systems level, paralogs were shown to be less essential than genes lacking paralogs (singletons) in yeast [12], worms [13], mice [14] and plant [15]. A recent measurement of the essentiality of over 17,000 human genes showed that the same tendency holds for human paralogs [16]. The impact of paralogs was also demonstrated in the context of disease. For example, a mouse model of retinoblastoma that carries a homozygous deletion in the Rb gene, the homolog of human RB1 gene, does not develop retinoblastoma [17], unless one of the paralogs of Rb, p107 [18] or p130 [19], is removed.
Interestingly, in several cases the compensatory impact of paralogs was found to be dosage-dependent. For example, mouse embryos that are homozygous for Mek1 gene deletion and which typically die due to placental defects, survive if two copies of Mek2 gene are inserted, while one copy of Mek2 is not sufficient [20]. Similarly, the Eif2s3y gene on the mouse Y chromosome was shown to be replaceable by its X-linked homolog Eif2s3x gene for spermatogenesis initiation, but more copies of Eif2s3x were required for progression through meiosis [21]. In human, the essentiality of each of the two paralogous helicases, the genes DDX3Y and DDX3X, was inversely correlated with the expression level of the other paralog, stressing that their functional redundancy is dosage-dependent [16].
We hypothesized that tissue-selectivity of some hereditary diseases may be related to quantitative relationships between causal genes and their paralogs. Accordingly, owing to the functional redundancy between paralogs, a paralog of an aberrant causal gene can generally compensate for its malfunction (Fig 1A). However, when the quantitative relationships between them change, compensation may become insufficient and disease phenotypes will emerge. This might occur when the causal gene is up-regulated in the disease tissue without a similar change in the level of the paralog (Fig 1B). Alternatively, the causal gene may be expressed at an intermediate level in the disease tissue, but the paralog is down-regulated at the disease tissue (Fig 1C). While dosage-dependent compensation relationships between paralogs were demonstrated previously (e.g., [16,20,21]), this phenomenon was never analyzed systematically at large-scale in the context of tissue-selective diseases.
In this study, we analyze quantitatively the relationships between 80 causal genes and their paralogs across 112 hereditary diseases. We focused on hereditary diseases that manifest selectively in distinct tissues, including the brain, skeletal muscle, heart, skin, liver, thyroid or testis. We took advantage of various omics data including 420 RNA-sequencing profiles of 45 human tissues made available by the Genotype-Tissue Expression (GTEx) consortium [22], to assess quantitatively the relationships between causal genes and their paralogs across tissues. The majority of the causal genes were functionally-overlapping with their paralogs, and causal genes were less essential than singleton genes. Next, we computed the expression ratios between causal genes and their paralogs in different tissues. The ratios were typically similar across tissues, except for the disease tissues where the ratios were significantly high. These high ratios were observed for causal genes with different numbers of paralogs and in various disease tissues. To distinguish between the possible scenarios leading to imbalanced expression (Fig 1B and 1C), we carried differential expression analysis across the different tissues. In 26% of the cases, the causal gene was significantly up-regulated in the disease tissue. In 19% of the cases a paralog was significantly down-regulated in the disease tissue, and in additional 24% of the cases both occurred. These results suggest that paralogous compensation can shed light on the tissue-selective manifestations of hereditary diseases.
Results
Functional relationships between causal genes and paralogs
We started by creating a high-confidence dataset of tissue-selective hereditary diseases. For this, we manually curated hereditary diseases that manifested clinically in one of the following tissues: brain, skeletal muscle, heart, skin, liver, thyroid or testis. We included diseases with various modes of inheritance, since in both dominant and recessive disorders paralogous compensation may contribute to the robustness of unaffected tissues. For each disease, we extracted its known causal genes from the OMIM database [1], and identified paralogs of the causal gene based on phylogeny and sequence identity (see Methods). The causal genes contained various types of aberrations, several of which can lead to partial or complete loss of the gene product and its function, due to, e.g., protein truncation or in-frame missense mutations [23,24]. Other aberrations could lead to gain-of-function for which paralogous compensation may not be relevant, but these were shown in a systematic screen to occur at low frequency [23]. Next, we examined the pattern of expression of causal genes across tissues, to avoid causal genes that are tissue-specific and thus inevitably elicit tissue-specific phenotypes. For this, we used RNA-sequencing profiles of human tissues made public by the GTEx consortium [22] (see Methods). We computed the number of tissues expressing each causal gene above a certain threshold (see Methods), and limited our analysis to causal gene and their paralogs that were co-expressed in at least five tissues.
Henceforth, we analyzed 112 hereditary diseases caused by germline mutations in 80 causal genes (Fig 2A and S1 Table). Some of the genes were causal for distinct diseases (due to distinct mutations) that manifested in different tissues, resulting in 93 pairs of causal genes and disease tissues (S2 Table). The majority of the causal genes were expressed ubiquitously across tissues (83%, Fig 2B and S1 Fig). Thus, in agreement with previous studies [5,6], tissue-selectivity of their respective diseases could not stem simply from tissue-selective expression of the casual genes. Most causal genes were associated with one or two paralogs (76%, Fig 2C), and many paralogs were also globally expressed (66%, Fig 2B).
Monogenic disease genes and their paralogs were shown previously to be frequently functionally overlapping, based on their co-expression relationships [25] and overlap in interaction partners [26]. We used similar measures to test for functional overlap between causal genes and their paralogs in our dataset. For each causal gene and its paralog, denoted causal gene–paralog (CGP) pair, we computed the expression correlation between them across all tissues, and the overlap in their protein interaction partners (see Methods). The majority of the CGP pairs were significantly functionally overlapping by at least one measure, and, as expected, functional overlap was more frequent among pairs with higher sequence identity (Fig 2D).
The functional overlap between causal genes and their paralogs suggests that causal genes in our dataset would have a lower tendency to be essential, relative to singleton genes. This was recently shown for human genes with paralogs in general [16]. We repeated the same test for the causal genes in our dataset (Fig 2E). Indeed, causal genes were significantly less essential than singleton genes (Kolmogorov-Smirnov test, p = 0.02), in agreement with the presence of a functionally redundant paralog.
Imbalanced expression of causal genes and paralogs occurs preferentially in the disease tissue
According to the imbalance hypothesis, the relative levels of a causal gene and its paralog are comparable across tissues, except for the disease tissue, where we expect to find a shift in balance (Fig 1). To test this hypothesis, we computed the ratio between the expression levels of causal genes and their paralogs across the different tissues (see Methods). We then compared the ratios obtained in disease tissues to the ratios obtained in other tissues (Fig 3A, left). The ratios in the disease tissue were significantly higher than the median ratios in unaffected tissues, in accordance with the imbalance hypothesis (Mann-Whitney, p<10−15). A similar shift in balance was evident upon considering only causal genes with a single paralog (Fig 3A, middle, p = 0.0058). In case a causal gene has multiple paralogs, paralogs might have distinct compensatory behaviors or a cumulative effect. Thus, for causal genes with multiple paralogs, we additionally tested whether imbalance was still observable upon combining all the paralogs of each causal gene (see Methods). Indeed, these ratios too were significantly higher in the disease tissue relative to unaffected tissues (Fig 3A, right, p = 0.0012).
We further tested the generality of the imbalance by dividing causal genes according to their disease tissues and repeating this test. Notably, the ratios obtained for pairs in their respective disease tissue were higher than the ratios obtained for the same pairs in the six unaffected tissues, for all disease tissues except testis and thyroid, which included a single causal gene (Fig 3B–3E and S2 Fig). We extended this test to include all tissues. Upon comparing the ratios for CGP pairs in their disease tissue to their ratios in all other tissues, we find that with the exception of testis and thyroid, the highest ratios were obtained consistently in the disease tissue and its closely related tissues, such as different regions of the heart, skin that is sun-exposed and unexposed, and closely related brain regions (Fig 3F and S3 Fig). Thus, imbalanced expression of causal genes and their paralogs is prevailing among hereditary diseases.
Down-regulation of paralogs can contribute to expression imbalance
Our next goal was to analyze quantitatively the causes for the shift in balance observed for causal genes and their paralogs in their respective disease tissues. In general, their balance could be shifted due to up-regulation of the causal gene or down-regulation of its paralog in the disease tissue relative to other tissues, or both, as shown schematically in Fig 4A. The CGP pair CAV1, CAV3 presented in Fig 1B demonstrates the first scenario, while the CGP pair VRK1, VRK2 presented in Fig 1C demonstrates the second scenario. To distinguish between these scenarios and to quantify their frequency in our dataset, we used differential expression analysis. We focused on the 36 tissues for which five or more samples were available. For each disease tissue, we calculated the differential expression of genes in this tissue relative to all other tissues. This allowed us to identify rigorously genes that were up-regulated or down-regulated significantly in the respective disease tissue (2-fold change and p<0.01, see Methods). We then analyzed the frequency of the different scenarios among our CGP pairs (Fig 4B).
The most common scenario was the up-regulation of the causal gene, which we observed in 52% of the CGP pairs. In additional 15% of the pairs, this was combined with down-regulation of the paralog. Notably, in another 9% of the pairs the paralog alone was significantly down-regulated. To test the generality of these trends, we repeated this analysis for several partitions of the causal genes. We observed the same trends upon analyzing separately genes that share the same disease tissue (e.g., Fig 4C and S4 Fig). The different scenarios were evident also when analyzing causal genes with a single paralog (Fig 4D, left). Specifically, 36% of the causal genes were up-regulated, including 7% where the paralog was down-regulated. In additional 20% of the genes, only the paralog was down-regulated (Fig 4E). We further tested whether these scenarios occurred preferentially in the disease tissue by carrying randomization tests (see Methods). We found that each scenario was significantly more frequent in the disease tissue than in unaffected tissues. This included up-regulation of the casual gene (p<10−3), down-regulation of the paralog (p<10−3), or both (p<0.05). We repeated this analysis for genes with multiple paralogs by combining CGP pairs of the same causal gene (see Methods, Fig 4D, right). The frequency of each scenario in the combined dataset including all genes was significantly larger than expected by chance (Fig 4F). This suggests that imbalance in general, and specifically imbalance due to paralog down-regulation, occurs preferentially in the disease tissue.
An example for down-regulation of a paralog is presented by the charged multivesicular body protein 1a (CHMP1A) gene. CHMP1A is causal for pontocerebellar hypoplasia type 8, an autosomal recessive neurodevelopmental disorder [27]. Two germline mutations in CHMP1A were identified independently in patients, both leading to lack of CHMP1A expression in patient-derived cells [28]. Our data shows ubiquitous expression of CHMP1A across tissues, with intermediary expression in the cerebellum (Fig 5A, left). CHMP1A has a paralog, CHMP1B, with considerable sequence identity (55.6%) and significantly overlapping protein interaction partners. CHMP1B was significantly under-expressed in the cerebellum (Fig 5A middle), leading to imbalance and potentially insufficient compensation specifically in the disease tissue (Fig 5A, right). Additional examples appear in Fig 5B–5E.
Discussion
Paralogous compensation is a key mechanism for maintaining genetic robustness [12–15,25]. Paralogs result from gene duplication events, and may be retained in the genome following sub- or neo-functionalization [29] or the sharing of gene dosage [30]. Their ability to compensate for each other relies on their functional similarity, and in some cases involves changes in the abundance of the functional paralog, its cellular localization or protein interactions (reviewed in [11]). Paralogous compensation is more frequent among young paralogs that have not diverged much [15,25], but was also observed among ancient paralogs [31]. Here we harnessed the concept of paralogous compensation to illuminate a fundamental question: What makes certain cell types or tissues succumb to a germline aberration while others remain robust. We hypothesized that paralogous compensation acts in various tissues throughout the body, however is limited and thus insufficient in the disease tissue, which therefore becomes vulnerable to germline mutations (Fig 1).
We tested our hypothesis on genes causal for 112 hereditary diseases that manifest predominantly in a single tissue (Fig 2). The causal variants of the genes we analyzed contained various types of aberrations, several of which can lead to partial or complete loss of the gene product or its function, due to, e.g., protein truncation or in-frame missense mutations [23,24]. Other aberrations can lead to gain-of-function for which paralogous compensation may not be relevant, but these were shown in a systematic screen to occur at low frequency [23]. The causal genes and their paralogs in our dataset appeared to be functionally related by various measures, and, in accordance, were less essential than singleton genes (Fig 2D and 2E).
Dosage sharing between paralogs was suggested to be one of the first steps following gene duplication [30]. Previous studies in yeast, fly and Arabidopsis observed that duplicate genes have higher expression divergence compared to singleton genes [32–34]. In human, variation in gene expression was shown to be far greater among tissues than among individuals [35], and was high among disease-related genes [36], as we also observed for the paralogs that we studied. Here, we analyzed systematically for the first-time dosage relationships between genes that are causal for tissue-selective hereditary diseases and their paralogs. For this, we exploited transcriptional profiles of 36 human tissues sampled from deceased donors with no genetic diseases [22]. Together, these profiles provided an atlas of gene expression in normal tissues and a baseline indicating the relevance of a gene within a specific tissue. While expression levels of wildtype alleles in patients carrying causal aberrations might differ from the levels observed in the general population, such changes were assessed previously and shown to be very limited [37–39]. Using these data, we found that the ratio between the expression levels of causal genes and their paralogs tend to be significantly high particularly in their disease tissues (Fig 3). This agrees with our hypothesis that, upon causal aberration, paralogous compensation will be limited specifically in the disease tissue, thereby making this tissue more vulnerable than other tissues expressing the same causal gene.
The relatively high expression ratios could stem from up-regulation of the causal gene in the vulnerable tissue, or from down-regulation of its paralog (Fig 1). To differentiate between these scenarios, we carried a rigorous differential expression analysis, which was enabled by the large numbers of samples available per tissue. The largest fraction of the cases included causal genes that were up-regulated significantly in their disease tissues, as previously observed [5,6]. However, in many other cases the causal gene was not upregulated and there was no tissue-specific isoform, yet a paralog was down-regulated significantly (Fig 4E and Fig 5). Notably, the frequency of each of these scenarios was higher than expected by chance (p<0.05, randomization test). A specifically interesting example involved two paralogous causal genes that were down-regulated at each other’s disease tissue (Fig 5E). LDLR and VLDLR belong to the low-density lipoprotein receptor gene family and are functionally overlapping (>47% sequence identity and significantly overlapping interactions). LDLR is causal for familial hypercholesterolemia that manifests in the liver, and VLDLR is causal for cerebellar hypoplasia and mental retardation. Each of them was expressed at intermediate levels at its respective disease tissue, and down-regulated at the disease tissue of its paralog, suggesting that by this they elicit tissue-specific phenotypes.
There remains a subset of causal genes for which imbalanced expression or significant expression changes were not observed. This includes cases where paralogous compensation may be irrelevant, due to limited functional overlap between paralogs or their tissue-specific isoforms, or a gain-of-function aberration. Interestingly a recent study showed that some paralogs in yeast are dependent on each other, and thus when mutated impart fragility rather than robustness [39]. We identified one such disease-related pair in human. BRAF and RAF1 are two functionally-related paralogous genes of the RAF family of serine/threonine protein kinases, known to be causal for various types of Noonan and Leopard syndromes. Interestingly, they were shown to form a heterodimer, thus explaining their dependency and common phenotypes [40]. Additional cases of paralogous compensation may remain hidden due to under-sampling of relevant cell types, or to lack of post-transcriptional profiling [38]. These might come to light upon analyzing data from specific cell types, or by rigorous proteomic analyses at large scales. In the future, it will be intriguing to extend the concept of compensation to higher-order entities such as pathways (e.g., [15,41]).
Our results show that systematic analysis of large-scale datasets illuminates dosage relationships and paralogous compensation events. They suggest that compensatory factors underlie tissue-selective genotype-phenotype relationships and particularly disease susceptibility, and point to paralogs as new and effective modifiers of tissue robustness.
Methods
Disease set and genes analyzed in this study
The disease set included hereditary diseases with known protein-coding causal genes according to OMIM [42] that were predicted to manifest in either the brain, heart, liver, skeletal muscle, skin, testis or thyroid [6]. We used literature and expert curation to validate their clinical association and to filter out diseases with multiple affected tissues. Causal genes were downloaded from OMIM [42]. Paralogs were extracted from Ensembl-Biomart [6,43,44] and limited to paralogs with reciprocal sequence identity of 40% or more.
Gene expression analyses
RNA sequencing profiles were obtained from the GTEx portal on 2/22/17 (version 6p) [44]. In the expression quantification of genes by GTEx, only uniquely mapped reads were considered [45]. Only samples from individuals with traumatic injury as cause of death were included as proxy for healthy tissues (S3 Table). We verified the absence of possible confounders, including gender and age group, by applying mixed linear models to each tissue separately. To evaluate the expression distribution of a gene across tissues, we considered a gene as expressed in a certain tissue if its level exceeded 0.3 RPKM in at least half of the samples of that tissue. For each gene, the number of tissues expressing that gene was recoded. We included in the analyses only CGP pairs that were co-expressed in at least 5 tissues. Ratios between RPKM gene expression levels were calculated per sample. A distinct ratio was computed for a causal gene and each of its paralogs. In the combined analysis, a ratio was computed between the expression level of a causal gene and the sum of the expression levels of its paralogs. The ratio in a specific tissue was set to the median ratio across all samples of that tissue. To assess the general difference in expression patterns between paralogs that were not filtered for causal genes, we analyzed paralogs with at least 40% identity that were expressed in at least 5 common tissues (similarly to causal genes and their paralogs). For each pair, we calculated the expression ratio in each sample, across all samples of the same tissue.
Functional overlap analysis
We correlated between the expression levels of a causal gene and each of its paralogs across tissues by using Pearson correlation. For each gene, its expression level per tissue was set to the median RPKM level over samples of that tissue. We downloaded data of experimentally-detected protein-protein interactions from BioGRID [46], DIP [47] and IntAct [48] by using myProteinNet [49] and computed the number of proteins that interact with a causal gene, with its paralog, and with both. The CRISPR scores of genes, which represent their essentiality, were extracted from [16].
Differential expression analysis
Differential expression analysis was applied to 36 GTEx tissues with at least five samples. Raw counts were extracted from GTEx portal and normalized using the TMM method by the edgeR package (27), to obtain the same library size for every sample. Genes with less than 10 counts in all samples were removed before normalization. In each sample, we transformed RNA-sequencing normalized counts using VOOM [50], and calculated differential expression using a linear model in the R-package Limma [51]. Specifically, all samples of the same tissue were compared to a background set containing all other samples (not limited to tissues with 5 samples or more). Only genes with an absolute 2-fold change or more and FDR adjusted P-values <0.01 were considered differentially expressed.
Statistical analysis
We used mixed linear models to predict the expression of a causal gene in each tissue by the expression levels of its paralogs, by the number of its paralogs, by whether its expression was measured in the disease tissue and by the amount of tissues which manifest a disease associated with this gene. We accounted for the clustered structure of the donors by including a random intercept in all of the models. Mixed linear models were computed by using IBM SPSS Statistics, Version 23.0. We compared between the expression distributions of causal genes and their paralogs across tissues by using the Kolmogorov-Smirnov test. The significance of protein interactions overlap between a causal gene and its paralog was computed by using Fisher exact test. We compared between essentiality scores of causal genes and protein-coding genes without paralogs by using the Kolmogorov-Smirnov test. This test was also used to compare between the distribution of ratios obtained in disease tissues versus unaffected tissues. We used a randomization test to assess whether causal genes (and their paralogs) are over-expressed (or under-expressed) preferentially in the disease tissue relative to other tissues. Specifically, in each randomized run each causal gene was assigned a randomly selected disease tissue out of the set of GTEx tissues expressing the causal gene and its paralog. For causal genes with multiple paralogs, the disease tissue was selected randomly from the set of GTEx tissues expressing the causal gene and at least one of its paralogs. We then counted the number of causal genes that were significantly over-expressed, had an under-expressed paralog, or both, in the randomly selected disease tissues. We repeated this analysis 1,000 times. Statistical significance was set to the fraction of randomized runs in which the number of causal genes in a given subset was at least as high as the fraction observed for these pairs in the original dataset.
Supporting information
Acknowledgments
We thank Francois Aguet, Ayellet Segre and Vered Chalifa-Caspi for their help in the gene expression analyses of GTEx samples. We thank Liad Alfandri for his help in the association between diseases and tissues.
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
This research was supported by the Israel Science Foundation (ISF) through grant number 860/13 and the Joint Broad-ISF Research Grant 2435/16. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.McKusick VA (2007) Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet 80: 588–604. doi: 10.1086/514346 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Alkuraya FS (2016) Discovery of mutations for Mendelian disorders. Hum Genet 135: 615–623. doi: 10.1007/s00439-016-1664-8 [DOI] [PubMed] [Google Scholar]
- 3.Moaven N, Tayebi N, Goldin E, Sidransky E (2015) Complexity of Genotype-Phenotype Correlations in Mendelian Disorders: Lessons from Gaucher Disease. Rare Diseases. Springer Netherlands. pp. 69–90. [Google Scholar]
- 4.Antonarakis SE, Chakravarti A, Cohen JC, Hardy J (2010) Mendelian disorders and multifactorial traits: the big divide or one for all? Nat Rev Genet 11: 380–384. doi: 10.1038/nrg2793 [DOI] [PubMed] [Google Scholar]
- 5.Barshir R, Shwartz O, Smoly IY, Yeger-Lotem E (2014) Comparative analysis of human tissue interactomes reveals factors leading to tissue-specific manifestation of hereditary diseases. PLoS Comput Biol 10: e1003632 doi: 10.1371/journal.pcbi.1003632 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, et al. (2008) A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc Natl Acad Sci U S A 105: 20870–20875. doi: 10.1073/pnas.0810772105 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Sekijima Y, Wiseman RL, Matteson J, Hammarstrom P, Miller SR, et al. (2005) The biological and chemical basis for tissue-selective amyloid disease. Cell 121: 73–85. doi: 10.1016/j.cell.2005.01.018 [DOI] [PubMed] [Google Scholar]
- 8.Godfrey R, Quinlivan R (2016) Skeletal muscle disorders of glycogenolysis and glycolysis. Nat Rev Neurol 12: 393–402. doi: 10.1038/nrneurol.2016.75 [DOI] [PubMed] [Google Scholar]
- 9.Taylor RC, Dillin A (2011) Aging as an event of proteostasis collapse. Cold Spring Harb Perspect Biol 3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Buljan M, Chalancon G, Eustermann S, Wagner GP, Fuxreiter M, et al. (2012) Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell 46: 871–883. doi: 10.1016/j.molcel.2012.05.039 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Diss G, Ascencio D, DeLuna A, Landry CR (2014) Molecular mechanisms of paralogous compensation and the robustness of cellular networks. J Exp Zool B Mol Dev Evol 322: 488–499. doi: 10.1002/jez.b.22555 [DOI] [PubMed] [Google Scholar]
- 12.Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, et al. (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421: 63–66. doi: 10.1038/nature01198 [DOI] [PubMed] [Google Scholar]
- 13.Conant GC, Wagner A (2004) Duplicate genes and robustness to transient gene knock-downs in Caenorhabditis elegans. Proc Biol Sci 271: 89–96. doi: 10.1098/rspb.2003.2560 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.White JK, Gerdin AK, Karp NA, Ryder E, Buljan M, et al. (2013) Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell 154: 452–464. doi: 10.1016/j.cell.2013.06.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Hanada K, Sawada Y, Kuromori T, Klausnitzer R, Saito K, et al. (2011) Functional compensation of primary and secondary metabolites by duplicate genes in Arabidopsis thaliana. Mol Biol Evol 28: 377–382. doi: 10.1093/molbev/msq204 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, et al. (2015) Identification and characterization of essential genes in the human genome. Science 350: 1096–1101. doi: 10.1126/science.aac7041 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Maandag EC, van der Valk M, Vlaar M, Feltkamp C, O'Brien J, et al. (1994) Developmental rescue of an embryonic-lethal mutation in the retinoblastoma gene in chimeric mice. EMBO J 13: 4260–4268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Robanus-Maandag E, Dekker M, van der Valk M, Carrozza ML, Jeanny JC, et al. (1998) p107 is a suppressor of retinoblastoma development in pRb-deficient mice. Genes Dev 12: 1599–1609. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.MacPherson D, Sage J, Kim T, Ho D, McLaughlin ME, et al. (2004) Cell type-specific effects of Rb deletion in the murine retina. Genes Dev 18: 1681–1694. doi: 10.1101/gad.1203304 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Aoidi R, Maltais A, Charron J (2016) Functional redundancy of the kinases MEK1 and MEK2: Rescue of the Mek1 mutant phenotype by Mek2 knock-in reveals a protein threshold effect. Sci Signal 9: ra9 doi: 10.1126/scisignal.aad5658 [DOI] [PubMed] [Google Scholar]
- 21.Yamauchi Y, Riel JM, Ruthig VA, Ortega EA, Mitchell MJ, et al. (2016) Two genes substitute for the mouse Y chromosome for spermatogenesis and reproduction. Science 351: 514–516. doi: 10.1126/science.aad1795 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Consortium G (2015) Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348: 648–660. doi: 10.1126/science.1262110 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Sahni N, Yi S, Taipale M, Fuxman Bass JI, Coulombe-Huntington J, et al. (2015) Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161: 647–660. doi: 10.1016/j.cell.2015.04.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wang X, Wei X, Thijssen B, Das J, Lipkin SM, et al. (2012) Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol 30: 159–164. doi: 10.1038/nbt.2106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Hsiao TL, Vitkup D (2008) Role of duplicate genes in robustness against deleterious human mutations. PLoS Genet 4: e1000014 doi: 10.1371/journal.pgen.1000014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Chen WH, Zhao XM, van Noort V, Bork P (2013) Human monogenic disease genes have frequently functionally redundant paralogs. PLoS Comput Biol 9: e1003073 doi: 10.1371/journal.pcbi.1003073 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Johns Hopkins University B, MD. Online Mendelian Inheritance in Man, OMIM®. pp. MIM Number: {614961}:.
- 28.Mochida GH, Ganesh VS, de Michelena MI, Dias H, Atabay KD, et al. (2012) CHMP1A encodes an essential regulator of BMI1-INK4A in cerebellar development. Nat Genet 44: 1260–1264. doi: 10.1038/ng.2425 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459–473. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Lan X, Pritchard JK (2016) Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. Science 352: 1009–1013. doi: 10.1126/science.aad8411 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Kafri R, Dahan O, Levy J, Pilpel Y (2008) Preferential protection of protein interaction network hubs in yeast: evolved functionality of genetic redundancy. Proc Natl Acad Sci U S A 105: 1243–1248. doi: 10.1073/pnas.0711043105 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Gu Z, Rifkin SA, White KP, Li WH (2004) Duplicate genes increase gene expression diversity within and between species. Nat Genet 36: 577–579. doi: 10.1038/ng1355 [DOI] [PubMed] [Google Scholar]
- 33.Dong D, Yuan Z, Zhang Z (2011) Evidences for increased expression variation of duplicate genes in budding yeast: from cis- to trans-regulation effects. Nucleic Acids Res 39: 837–847. doi: 10.1093/nar/gkq874 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Ha M, Kim ED, Chen ZJ (2009) Duplicate genes increase expression diversity in closely related species and allopolyploids. Proc Natl Acad Sci U S A 106: 2295–2300. doi: 10.1073/pnas.0807350106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, et al. (2015) Human genomics. The human transcriptome across tissues and individuals. Science 348: 660–665. doi: 10.1126/science.aaa0355 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Alemu EY, Carl JW Jr., Corrada Bravo H, Hannenhalli S (2014) Determinants of expression variability. Nucleic Acids Res 42: 3503–3514. doi: 10.1093/nar/gkt1364 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK, et al. (2015) Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348: 666–669. doi: 10.1126/science.1261877 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.DeLuna A, Springer M, Kirschner MW, Kishony R (2010) Need-based up-regulation of protein levels in response to deletion of their duplicate genes. PLoS Biol 8: e1000347 doi: 10.1371/journal.pbio.1000347 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Diss G, Gagnon-Arsenault I, Dion-Cote AM, Vignaud H, Ascencio DI, et al. (2017) Gene duplication can impart fragility, not robustness, in the yeast protein interaction network. Science 355: 630–634. doi: 10.1126/science.aai7685 [DOI] [PubMed] [Google Scholar]
- 40.Weber CK, Slupsky JR, Kalmes HA, Rapp UR (2001) Active Ras induces heterodimerization of cRaf and BRaf. Cancer Res 61: 3595–3598. [PubMed] [Google Scholar]
- 41.Axelrod M, Gordon VL, Conaway M, Tarcsafalvi A, Neitzke DJ, et al. (2013) Combinatorial drug screening identifies compensatory pathway interactions and adaptive resistance mechanisms. Oncotarget 4: 622–635. doi: 10.18632/oncotarget.938 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.McKusick-Nathans Institute of Genetic Medicine JHUB, MD) Online Mendelian Inheritance in Man, OMIM®.
- 43.Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, et al. (2009) EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 19: 327–335. doi: 10.1101/gr.073585.107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, et al. (2016) Ensembl comparative genomics resources. Database (Oxford) 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.GTEx project (2016) GTEx Release v6p Analysis Methods.
- 46.Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, et al. (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34: D535–539. doi: 10.1093/nar/gkj109 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, et al. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30: 303–305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, et al. (2014) The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 42: D358–363. doi: 10.1093/nar/gkt1115 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Basha O, Flom D, Barshir R, Smoly I, Tirman S, et al. (2015) MyProteinNet: build up-to-date protein interaction networks for organisms, tissues and user-defined contexts. Nucleic Acids Res 43: W258–263. doi: 10.1093/nar/gkv515 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Law CW, Chen Y, Shi W, Smyth GK (2014) voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15: R29 doi: 10.1186/gb-2014-15-2-r29 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, et al. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43: e47 doi: 10.1093/nar/gkv007 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All relevant data are within the paper and its Supporting Information files.