Abstract
The molecular and clinical features of a complex disease can be influenced by other diseases affecting the same individual. Understanding disease-disease interactions is therefore crucial for revealing shared molecular mechanisms among diseases and designing effective treatments. Here we introduce Flow Centrality (FC), a network-based approach to identify the genes mediating the interaction between two diseases in a protein-protein interaction network. We focus on asthma and COPD, two chronic respiratory diseases that have long been hypothesized to share common genetic determinants and mechanisms. We show that FC highlights potential mediator genes between the two diseases, and observe similar outcomes when applying FC to 66 additional pairs of related diseases. Further, we perform in vitro perturbation experiments on a widely replicated asthma gene, GSDMB, showing that FC identifies candidate mediators of the interactions between GSDMB and COPD-associated genes. Our results indicate that FC predicts promising gene candidates for further study of disease-disease interactions.
Subject terms: Network topology, Statistical methods, Biochemical networks, Asthma, Chronic obstructive pulmonary disease
Complex diseases often share genetic determinants and symptoms, but the mechanistic basis of disease interactions remains elusive. Here, the authors propose a network topological measure to identify proteins linking complex diseases in the interactome, and identify mediators between COPD and asthma.
Introduction
Biological networks are powerful resources for discovering and understanding the mechanisms that underlie human complex diseases1,2. Indeed, it is accepted that biological components such as genes and proteins do not act in isolation, but are connected through intricate networks of molecular interactions that allow perturbations to diffuse across the system and generate, enhance or alter the disease phenotype. Over the last decade it has been observed that protein-coding genes associated with a disease have a strong tendency to interact with each other and agglomerate in a specific network neighborhood called the disease module3–6. However, disease progression is strongly influenced by the biological context of the organism. Perturbations causing one disease might affect other diseases, especially when the involved genes lie in the same network neighborhood, producing complex phenotypes and comorbidities7.
Finding the molecular commonalities between related diseases is crucial in understanding their heterogeneity as well as identifying common biomarkers and therapeutics. As a step in this direction, Menche et al.5 measured the network-based separation between 226 disease pairs, observing that overlapping disease modules display significant molecular similarity, elevated coexpression of their associated genes, similar symptoms and high comorbidity. However, while the introduced separation measure offers information on the similarity of two diseases, it does not help in identifying the genes encoding proteins that influence both diseases. Furthermore, mediator genes may not be part of either disease module, but they could mediate the interactions between the two diseases without participating in the core pathways of the individual diseases. In this work we propose a methodology to identify the mediators linking pairs of complex diseases, focusing on asthma and chronic obstructive pulmonary disease (COPD), two of the most widespread chronic respiratory diseases that have been estimated to be the cause of over 3 million deaths worldwide8. Asthma and COPD are influenced by genetic and environmental factors and they often manifest through similar phenotypes, like airflow obstruction, inflammation, and shortness of breath9,10. A widely-accepted definition of their differences is still lacking since many cases fall in-between the two classic descriptions of these conditions, and patients often show asthma-like and COPD-like features simultaneously. For example, airflow obstruction reversibility, considered one of the main hallmarks of asthma, can be present in many COPD patients9,10. On the other hand, fixed airflow obstruction, a cardinal manifestation of COPD, can develop in asthmatics as well, particularly those with severe disease or persistent symptoms since childhood11,12. 
Moreover, people affected by asthma since birth are more likely to develop COPD at later ages13–15. This phenotypic gray area has been the source of extensive debate on a possible common genetic origin of the two diseases, a hypothesis first proposed by Orie and Sluiter16, and termed the “Dutch hypothesis”. Despite the considerable effort in delineating and summarizing the richness of the clinical manifestations of asthma and COPD, there is still little understanding of the shared molecular mechanisms and the causal relationships between the two disorders. Next-generation sequencing and genome-wide association studies (GWAS) make it possible to identify potential causal genes that can explain the development of these chronic respiratory diseases and possibly offer mechanistic insights into their shared causality17,18. Although the presence of shared disease gene associations might be expected in the context of the asthma-COPD overlap, previous work has provided little genetic support for the Dutch hypothesis, finding little to no overlap between the major asthma and COPD genes identified via GWAS12. Here we show that network-based statistical methods can provide additional avenues to explore this problem.
We model asthma and COPD in the protein–protein interaction (PPI) network, also referred to as the interactome. Each node of the network corresponds to a protein-coding gene and the link between two genes represents a physical interaction between the corresponding proteins. In order to find the mediators between the two diseases we define a topological measure, called flow centrality (FC), identifying the genes that are involved in most of the molecular interactions occurring between the two disorders. We show that flow central genes are more functionally related to each other and to the disease genes of asthma and COPD than expected by chance. Furthermore, we generalize these results by replicating them on 66 additional pairs of related diseases. Using multiple lines of evidence, including prior literature, gene coexpression analysis in multiple transcriptomics datasets from asthmatic and COPD subjects, and in vitro genetic perturbation in a bronchial epithelial cell line (a cell type relevant to both asthma and COPD), we show that genes with high FC values are biologically meaningful and related to known asthma-specific, COPD-specific and overlapping processes. Together, these results establish flow centrality as a valuable tool in the detection of genes mediating the interaction between different diseases, offering an opportunity to understand the relation between complex diseases.
Results
Disease modules construction
We considered the protein–protein interaction network constructed previously19, which integrates high-quality yeast-two-hybrid data from publicly available datasets and literature-derived interactions (see Methods). While a gene may express different isoforms, we only considered one protein product per gene, and thus we refer to the nodes of the network as genes or proteins interchangeably throughout the text. We compiled two sets of seed genes representing known GWAS loci associated with asthma and COPD from the recent literature (see Methods). The asthma seed gene set is composed of 36 genes (35 mapped in the network) and the COPD gene set is composed of 30 genes (Supplementary Data 1 and 2, respectively), and the two sets have no overlap. To explore the network neighborhood of each disease we constructed a disease module by applying the DIAMOnD algorithm, a procedure for ranking the genes in the network according to their connectivity significance to the seed genes20 (see Methods). To define a cutoff for the gene ranking calculated through DIAMOnD, we considered two reference sets of GWAS-significant genes associated, respectively, with asthma and COPD, downloaded from the UK-Biobank repository21 (UKB). For both diseases, the final module size was chosen as the size that maximized the enrichment of UKB genes in each respective module (see Methods). The two modules have 14 overlapping genes (see Supplementary Fig. 1b), summarized in Supplementary Data 3. Most of the overlapping genes in the list, such as TP53, MDM2, NFKB1, RELA, CTNNB1, TGFBR2, SMAD3, MAPK1, MAPK3, MAPK8, STAT1, and STAT3, are known to be involved in the regulation of apoptosis, proliferation, inflammation, cellular remodeling and differentiation22–26. Although these biological processes may play a role in asthma and COPD, they are not unique to these disorders. This inherent non-specificity can also be deduced from the high degree that characterizes all these genes, as shown in Supplementary Data 3. 
Furthermore, the empirical p-value quantifying the significance of their overlap is largely non-significant, confirming the elusive nature of the asthma-COPD relationship. This lack of significance in the overlap motivated the analysis that follows.
Flow centrality between modules
Asthma and COPD manifest through similar phenotypes and symptoms, and many asthma patients develop COPD at older ages9,10,12. This observation suggests that a perturbation originating from asthma-specific genetic risk factors may slowly disrupt critical pathways, ultimately leading to the development of COPD in susceptible subjects. This perturbation may not be carried exclusively by the direct interactions of disease-specific genes; it may in fact travel through mediating genes that are not specifically linked to a single disease, thus making their recognition with standard approaches challenging.
These mediating genes are likely to be participating in the majority of the interactions between the two modules, constituting a “bottleneck” in the communication between the two diseases. In a network, betweenness centrality measures quantify the frequency of occurrence of a node in the paths that connect all the other nodes. A path is defined as an ordered sequence of steps across the edges of the network that start from a source node and lead to a destination node. There are multiple possible paths between any source and destination, and several works in literature have been dedicated to exploring different criteria for selecting and weighting these paths. For example, the classic betweenness centrality measure, proposed by Freeman27, considers only the shortest paths between the source and destination nodes. In other work a random walk betweenness centrality is proposed, where paths are weighted by the probability of being traversed by a walker in a random walk process28. Further, in another study, the authors designed a factorial weighting scheme that favors paths of shorter lengths, called communicability betweenness29. Kivimaki et al.30 defined the framework of randomized shortest paths (RSP), which interpolates between the classic concept of shortest-path-based betweenness centrality and the random walk betweenness centrality through a temperature parameter. The canonical form of these measures is an average across all the paths starting from any source node and leading to any destination node, resulting in an estimate of the node’s centrality in the global network topology. While betweenness-central nodes may have a role in the pathways of asthma and COPD, by definition they are not specific to these two disorders (since their centrality does not change when considering different diseases), and thus they are less likely to provide meaningful information about their shared pathways.
In this work we introduce the concept of flow centrality, explained in detail in the Methods section (see Fig. 1a). Flow centrality is a betweenness measure that is parametric on a source set and destination set of nodes, and its coverage spans exclusively the shortest paths connecting the two modules, instead of the whole network, similarly to a recently proposed measure called Double Specific Betweenness (S2B)31. Therefore, when all the nodes of the network are selected as both sources and targets of the shortest paths flow centrality reduces to the classic betweenness centrality defined in ref. 27. Flow centrality and the betweenness centrality measures described above are correlated to the node degree, regardless of the chosen source and target modules. To correct for this effect we defined a randomization scheme of the source and target modules to generate a null distribution of expected flow centrality values. The flow centrality score (FCS) is then calculated as the z-score of the flow centrality value when compared with the null distribution (see Fig. 1b and Methods section). A large positive value of the FCS implies that the node is highly central with respect to the source and target gene sets, even when accounting for its global centrality.
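The idea of a source-to-target betweenness and its z-score can be illustrated with a minimal sketch. It assumes an unweighted, undirected network stored as an adjacency dictionary, counts only shortest paths that start in the source set and end in the target set, and builds the null distribution by drawing random node sets of the same sizes uniformly at random. That uniform sampling is a simplifying assumption: the randomization scheme actually used is described in the Methods section and is not reproduced here. All function names (`bfs_counts`, `flow_centrality`, `flow_centrality_scores`) are hypothetical.

```python
import random
import statistics
from collections import deque


def bfs_counts(adj, root):
    """Return (dist, sigma): hop distance and number of shortest paths from root."""
    dist = {root: 0}
    sigma = {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def flow_centrality(adj, sources, targets):
    """For each node, sum over (source, target) pairs of the fraction of
    shortest paths between the pair that pass through the node."""
    from_target = {t: bfs_counts(adj, t) for t in targets}
    fc = {v: 0.0 for v in adj}
    for s in sources:
        dist_s, sigma_s = bfs_counts(adj, s)
        for t in targets:
            if t == s or t not in dist_s:  # skip trivial or disconnected pairs
                continue
            dist_t, sigma_t = from_target[t]
            for v in dist_s:
                if v == s or v == t:       # endpoints are not intermediates
                    continue
                if dist_s[v] + dist_t[v] == dist_s[t]:
                    fc[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return fc


def flow_centrality_scores(adj, sources, targets, n_random=200, seed=0):
    """Z-score of the observed flow centrality against random modules.
    Assumption: random modules are uniform node samples of matching size."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = flow_centrality(adj, sources, targets)
    null = {v: [] for v in adj}
    for _ in range(n_random):
        rand_s = rng.sample(nodes, len(sources))
        rand_t = rng.sample(nodes, len(targets))
        for v, value in flow_centrality(adj, rand_s, rand_t).items():
            null[v].append(value)
    scores = {}
    for v in adj:
        mu = statistics.mean(null[v])
        sd = statistics.pstdev(null[v])
        scores[v] = (observed[v] - mu) / sd if sd > 0 else 0.0
    return scores
```

In this sketch, passing every node as both source and target would reproduce, up to normalization, the classic shortest-path betweenness, which matches the limiting case mentioned in the text.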
By defining the asthma node set as the source module and the COPD node set as the destination module, we calculated the flow centrality scores of all the nodes of the network. While all the betweenness centrality measures are highly correlated with the degree and with each other according to Spearman’s correlation (see Supplementary Figs. 2 and 3), denoting low specificity with respect to the asthma and COPD modules, we find that the flow centrality scores are nearly orthogonal to the other measures under the same correlation, suggesting that FC is highly specific to the particular source and target gene sets.
Among the top flow central nodes (see Supplementary Data 4), several genes, such as SLC39A8, SOX17, and MFAP4, show a direct relationship with asthma and COPD. More specifically, the literature reports that the expression levels of SLC39A8, SOX17, and MFAP4 might directly affect both asthma and COPD. For example, MFAP4-deficient mice showed attenuated eosinophilic inflammation, eotaxin production, airway remodeling and airway hyper-responsiveness, which are classical characteristics of asthma, while expression of SOX17 in respiratory epithelial cells decreased the expression of transforming growth factor-beta (TGF-β)-responsive cell cycle inhibitors such as p15, p21, and p57 in the adult mouse lung32,33. SOX17 also inhibited TGF-β-mediated transcriptional responses in vitro, demonstrating an inhibitory effect on the TGF-β pathway32,33. TGF-β, which is highly expressed in small airway epithelium of COPD patients34, is known to play a role in the increased submucosal collagen expression occurring within the disease, and is also known to be a mediator involved in tissue remodeling in the asthmatic lung35,36. SLC39A8, a zinc transporter, is a major portal for cadmium (Cd) uptake37. SLC39A8 mRNA and protein expression levels were found to be significantly increased in lungs of chronic smokers compared with nonsmokers37. Cd is found in cigarette smoke, and it could contribute to smoking-induced lung diseases such as COPD37. In the presence of Cd, inhibition of the NF-κB pathway and SLC39A8 expression reduces cell toxicity, while TNF-α treatment of primary human lung epithelia and A549 (lung cancer cell line) cells induced expression of SLC39A8, resulting in higher cell death37,38. IHH and DHH are part of the sonic hedgehog pathway and are known to directly interact with HHIP (hedgehog interacting protein), which is strongly associated with the risk of COPD39,40. HHIP competes with Ptch1 (which is the membrane receptor for IHH) for the binding of IHH and DHH. 
Ptch1 binding to IHH and DHH triggers the hedgehog signaling pathway, therefore the binding of HHIP with IHH negatively regulates the hedgehog pathway which is known to have a crucial role in lung development39,41.
Functional similarity of flow central genes
To validate the biological relevance of flow central genes, we selected the shortest paths between asthma and COPD seed genes whose intermediate nodes (i.e., all the nodes in the path except for the source and target) are characterized by high FCS (see Methods section for further details on the selection). By applying this selection criterion we obtained 371 distinct central paths, which we refer to as flow central paths (see Fig. 1c).
We assessed the degree of functional relatedness between the genes occurring in the flow central paths by considering their associated Gene Ontology (GO) terms. The GO similarity between two genes is defined as the best-match average (BMA) of Resnik’s similarity measure, one of the most well-known information-based similarity measures for hierarchically-ordered elements42. Further, we defined the sequential similarity (SS), a path-level quantity that measures the average GO similarity between adjacent genes in a network path (see Fig. 1d top left and Methods section). The higher the SS, the more functionally similar are the genes along the path.
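The path-level quantity described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the best-match average is taken in its common symmetric form (mean of row maxima and column maxima, averaged), Resnik's similarity is the information content of the most informative common ancestor, and the tiny ontology, information-content values and gene annotations are invented toy data. The function names are hypothetical; the exact formulation used in the study is given in the Methods section.

```python
from statistics import mean


def resnik(term_a, term_b, ancestors, ic):
    """Resnik similarity: information content of the most informative
    common ancestor (each term counts as its own ancestor)."""
    common = ancestors[term_a] & ancestors[term_b]
    return max((ic[t] for t in common), default=0.0)


def best_match_average(terms_a, terms_b, term_sim):
    """Symmetric best-match average of a term-level similarity."""
    rows = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    cols = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (mean(rows) + mean(cols)) / 2


def sequential_similarity(path, annotations, term_sim):
    """Average gene-gene similarity over adjacent genes in a path."""
    return mean(
        best_match_average(annotations[g], annotations[h], term_sim)
        for g, h in zip(path, path[1:])
    )


# Toy data (hypothetical): a four-term ontology and three annotated genes.
ANCESTORS = {
    "root": {"root"},
    "X": {"X", "root"},
    "X1": {"X1", "X", "root"},
    "Y": {"Y", "root"},
}
IC = {"root": 0.0, "X": 1.0, "X1": 3.0, "Y": 2.0}
ANNOTATIONS = {"g1": {"X1"}, "g2": {"X"}, "g3": {"Y"}}


def toy_sim(a, b):
    return resnik(a, b, ANCESTORS, IC)


toy_ss = sequential_similarity(["g1", "g2", "g3"], ANNOTATIONS, toy_sim)
```

In the toy path, g1 and g2 share the ancestor X (similarity 1.0) while g2 and g3 share only the root (similarity 0.0), so the path-level value is their average.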
We calculated the SS for each flow central path, obtaining a distribution of 371 similarity values. To estimate their significance we generated two null distributions of network paths, namely the random paths of Type A and Type B. To generate the Type A set we extract 10,000 random paths with a distribution of lengths that matches the empirical distribution observed in the FC paths (length-preserved) using the randomization scheme explained in Methods. The Type B set is constructed by randomly extracting 10,000 paths from the pool of the shortest paths between asthma and COPD seed genes (endpoints-preserved). Type A accounts for the possible biases related to the particular lengths of the FC paths, while Type B allows a direct comparison to the case where no FC information is utilized.
Figure 2a shows the comparison of the SS distributions for the flow central, Type A and Type B paths. The sequential similarities of FC paths are considerably greater than the similarities of Type A and Type B paths (one-tailed Mann–Whitney test p-values 1.12e−111 and 2.06e−77, respectively). We evaluated the separate contributions of the three main Gene Ontology categories to the global similarity (see Fig. 2b): cellular component (CC), molecular function (MF), and biological process (BP). In all cases the similarities of FC paths are significantly higher than expected. In Fig. 3a we show the FC paths ordered by number of GO annotations and the top 50 BP GO terms ordered by their information content, i.e., their specificity in the entire GO database. Biological regulation is one of the most enriched categories, which is expected because of the large number of genes annotated to regulatory processes. However, its occurrence is still more frequent than cellular process terms which are more common in the GO annotation corpus, suggesting the importance of regulatory mechanisms in the cross-talk between asthma and COPD pathways. For example, Figs. 3b–d show three FC paths that are enriched in several biological processes which are relevant for both the disease onset and exacerbation. Regulation of chemokine production, regulation of T-cell activation, wound healing, tube development and inflammatory response are biological processes that are involved in airway remodeling and immune response for both asthma and COPD. More specifically, the genes of the paths in Fig. 3b, c are highly related to the TGF-β signaling pathway. The TGF-β signaling pathway, which consists of proteins such as TGFBR1, TGFBR2, SMAD2, and SMAD3, is involved in differentiation, cell growth and many other cellular functions that play a crucial role in development and wound healing43,44. 
The RAR pathway, which interacts with the TGF-β signaling pathway through the SMAD proteins, is activated by binding retinoic acid to the retinoic acid receptors (RARs) such as RARB45,46. The RAR pathway is also involved in cellular functions that play a crucial role in development and wound healing45. On the other hand, the FC path shown in Fig. 3d consists of genes that are involved in the inflammatory response through the JAK-STAT signaling pathway and the TLR4 signaling pathway47,48. Both the JAK-STAT signaling pathway and the TLR4 signaling pathway play a crucial role in immune response and the cross-talk between the two pathways is thought to regulate the severity of the host inflammatory response49.
Functional similarity of FC genes of related diseases
To test whether the previous result holds in general, we considered the corpus of gene-disease associations (GDA) contained in the DisGeNet repository50 and the disease–disease similarities extracted from the Disease Ontology knowledge base. We selected all the pairs of similar diseases with a minimum of 50 associated genes and low overlap, so as to reproduce a setting similar to that of asthma and COPD (see Methods section and Supplementary Figs. 6 and 7). These criteria result in 66 distinct pairs of diseases that are related according to their phenotypes, genetic causes, localization in the organism, etc. (Supplementary Data 5). Some examples are Alzheimer’s disease and amyotrophic lateral sclerosis, both neurodegenerative diseases that share similar phenotypical features such as dementia, language dysfunction, and muscle weakness51,52, and pathologic processes involving genes playing a major role in protein homeostasis and endoplasmic reticulum stress53,54; psoriasis and allergic contact dermatitis, both inflammatory skin diseases involving the immune response that share similar phenotypical features due to inflammation55,56 and pro-inflammatory pathways involving IL-3657; and polycystic ovary syndrome and Alzheimer’s disease, which do not share phenotypical features, yet studies showed that the two diseases might have a causal relation based on insulin resistance and through the protein phosphatase 2A pathway58–60. For each pair, we calculated the flow centrality of all the nodes in the network, selected their corresponding FC paths and extracted 10,000 Type A and B paths, following the same scheme defined above. We proceeded to evaluate the SS values of FC paths and Type A/B paths, computing two p-values, p_A and p_B, corresponding, respectively, to the comparisons FC vs. Type A paths and FC vs. Type B paths. Then, we classified every disease pair with its least significant p-value (i.e., max(p_A, p_B)), determining a worst-case estimate of the SS increase in FC paths. 
The scores of the resulting p-values, computed as their negative log-transformed values, are shown in Fig. 2c. We find that for the vast majority of disease pairs (58 out of 66) we obtain highly significant differences (p-value < 1e−20) between the SS of FC paths and random. In addition, we tested the specificity of the previous result. We generated 100 random degree-preserved sets of nodes of each disease module occurring in the 66 pairs (6600 pairs of random modules). For each original disease pair, we compared its SS distribution to each random pair through a Mann–Whitney test, obtaining 100 worst-case p-values (see Methods). We find that the FC paths of the original disease pairs are almost always more sequentially similar than their randomized counterparts (Supplementary Fig. 8), with the only exception being the disease pair hydrocephalus-leukodystrophy, possibly due to a weaker genetic link between the two diseases. Overall, this result shows that flow centrality is a highly specific property of the source and destination modules, and that it would not yield the same outcomes if applied to unrelated genes.
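The worst-case classification of a disease pair reduces to a short computation, sketched below. The two inputs stand for the p-values of the FC vs. Type A and FC vs. Type B comparisons; the names `p_a` and `p_b`, the function name, and the choice of a base-10 logarithm are assumptions made for illustration, since the text only states that the scores are negative log-transformed p-values.

```python
import math


def worst_case_score(p_a, p_b):
    """Score a disease pair by its least significant p-value.

    p_a, p_b: p-values of the FC vs. Type A and FC vs. Type B comparisons.
    Returns -log10(max(p_a, p_b)); base 10 is an assumption.
    """
    worst = max(p_a, p_b)
    if worst == 0.0:          # numerically underflowed p-value
        return math.inf
    return -math.log10(worst)


# Hypothetical example: one comparison is far stronger than the other,
# and the pair is scored by the weaker of the two.
example_score = worst_case_score(1e-30, 1e-20)
```

Taking the maximum means a disease pair is counted as significant only when the FC paths beat both null models, which is the conservative reading of the comparison.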
Coexpression of flow central genes
To highlight the putative mechanistic connections between asthma and COPD, we measured the coexpression of the genes along the flow central paths connecting the two diseases. Although gene coexpression does not necessarily imply a functional relation, it indicates whether two genes are synergistic (or antagonistic) in terms of expression, suggesting co-participation in the same biological processes. Thus, a higher degree of coordination between FC genes with asthma and COPD disease genes indicates their involvement in biological processes common to both diseases.
As reference expression data we considered two expression datasets of asthmatic and COPD patients from Gene Expression Omnibus. The first dataset is a microarray expression measurement of airway epithelial cells in asthmatics and healthy controls (GSE4302, ref. 61), and the second one is an RNA-seq profiling of lung tissue in COPD patients and healthy controls (GSE57148, ref. 62) (see Supplementary Data 6 and Methods section for details). To measure the coexpression of the genes along each path, we defined the sequential coexpression (SC) as the average absolute coexpression between adjacent genes in the path (see Methods). For a given path, a higher sequential coexpression denotes a larger degree of coexpression between the genes interacting along the path. For each expression dataset, we calculated the SC of the FC paths for the healthy and disease states separately (Fig. 1d, e), obtaining two distributions of SC values for asthma and COPD, respectively. In the same way, we evaluated the SC values of the Type A and Type B paths for the same cases described above (asthma control/disease and COPD control/disease).
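The sequential coexpression of a path can be sketched as follows. The example assumes that coexpression is measured with the Pearson correlation across samples, which is an assumption made here for illustration (the measure actually used is specified in the Methods section), and that expression values are stored as one list of per-sample values per gene. The function names and toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sequential_coexpression(path, expression):
    """Average absolute correlation between adjacent genes in a path."""
    values = [
        abs(pearson(expression[g], expression[h]))
        for g, h in zip(path, path[1:])
    ]
    return sum(values) / len(values)


# Toy profiles (hypothetical) over four samples.
EXPRESSION = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with g2
    "g4": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with g1
}

toy_sc = sequential_coexpression(["g1", "g2", "g3"], EXPRESSION)
```

Because the absolute value is taken, the anti-correlated pair g2 and g3 contributes as much as the positively correlated pair g1 and g2, in line with the statement that both synergistic and antagonistic expression indicate co-participation.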
We find that in both the asthma and COPD data the FC paths show significantly higher SC values than both Type A paths (MW p-values 8.38e−10 and 2.14e−18, respectively) and Type B paths (p-values 2.25e−8 and 1.41e−33, see Fig. 4a). In addition, the same result holds in the samples of healthy subjects (worst-case p-value 1e−9), suggesting that FC paths correspond to interaction cascades that can be active in both the healthy and the disease state.
We repeated the same analysis in 16 additional GEO expression datasets. In each dataset, several subdivisions of the disease and healthy samples (classes) were considered when further information was available (such as cell type, tissue, or disease severity, see Supplementary Data 6). Similarly as before, we classified every dataset with its least significant p-value across all the classes. The SC values and the scores of the resulting p-values are shown, respectively, in Supplementary Fig. 9 and Fig. 4b. Despite the large variability of the considered expression datasets, we find similar outcomes for all the disease classes in a total of 13 out of 18 GEO datasets, with five cases being largely significant (worst-case p-value 1e−10). These results suggest that the interaction paths identified by flow centrality are robust to fluctuations and are not specific to a single cell type, tissue or experimental setting. Interestingly, we observe that the same result holds also in the respective classes of the healthy or control states (see Supplementary Fig. 10).
Since asthma and COPD are related, we hypothesized that their flow central paths are more coexpressed than random paths connecting asthma to other unrelated diseases. To test this hypothesis, we considered the DisGeNet GDA corpus, from which we extracted all the unrelated diseases and phenotypes with a number of annotated genes similar to asthma and COPD (between 25 and 35 genes), for a total of 59 phenotypes. We then measured the SC of random paths connecting the asthma and COPD seed genes to the genes associated with these phenotypes (see Methods). The SC values of the random paths connecting the asthma seeds and each DisGeNet phenotype were measured in the epithelial brushings of asthmatic samples (GSE4302), whereas the SC values between these phenotypes and COPD seeds were measured in the lung tissue of COPD samples (GSE57148). Figure 5a shows the SC distributions of FC paths and random paths of each DisGeNet phenotype in the asthma case (top) and COPD case (bottom). For clarity we show only the distributions of the top 10 phenotypes, ordered by their p-value scores (bars at the top of each plot). In both cases we find that the FC paths are characterized by significantly larger coexpression values, confirming the close relationship between asthma and COPD. In order to further test the specificity of the asthma-COPD relation and account for possible intrinsic biases of the processing steps, including disease module construction and flow centrality evaluation, we re-executed the whole processing pipeline between asthma and two related diseases of the lung, pneumonia and idiopathic pulmonary fibrosis (IPF) (see Methods). We find that asthma and COPD are characterized by higher SC values with respect to the asthma-pneumonia and asthma-IPF pairs in the epithelial brushings of asthmatic samples (GSE4302) (Fig. 5b, top). We then repeated the same analysis for the pairs COPD-pneumonia and COPD-IPF, obtaining a similar result in lung tissue of COPD samples (GSE57148) (Fig. 5b, bottom). This result suggests that the molecular interaction of asthma and COPD may be deeper than expected when compared with other lung diseases, as conjectured by the Dutch hypothesis.
Overexpression and knockdown experiments in cell lines
To further validate the FC approach, we used in vitro gene perturbation to experimentally establish a connection between an asthma source seed gene and a COPD target seed gene via a network path of high flow centrality (see Methods). For this, we focused our attention on the asthma seed gene GSDMB, one of several genes on 17q21 that harbors the most replicated asthma-susceptibility locus identified by GWAS63. GSDMB is expressed in bronchial epithelium (a cell type relevant to the pathogenesis of both asthma and COPD) and recent murine models suggest that GSDMB overexpression results in spontaneous airway remodeling64 (subepithelial fibrosis) that in humans contributes to the fixed airway obstruction observed in COPD. For this experiment, we considered all the flow central paths between GSDMB and any of the COPD seed genes (Fig. 6a), i.e., those paths where all the intermediate genes have a significant FCS. To maximize the sensitivity of the analysis we considered as significant those genes whose FCS is >2 or whose flow centrality value has a right-tailed empirical p-value <0.05. We find eight paths satisfying these criteria. Of note, all eight flow central paths pass through one of two GSDMB neighbors, HIVEP1 and PEBP1 (Fig. 6b). In experiments conducted in triplicate in a human bronchial epithelial cell line, we either augmented or suppressed GSDMB mRNA expression by plasmid transfection or siRNA knockdown, respectively, and obtained expression data for GSDMB, all predicted flow central genes, and target COPD seed genes from RNA-seq profiles of global gene expression (see Methods for details). We found strong evidence for connections between the asthma seed GSDMB and its predicted downstream target COPD seeds IL27, HHIP, and GSTCD. As summarized in Fig. 6b, both overexpression and silencing of GSDMB resulted in reciprocal downstream alterations in the expression of most flow central genes and target COPD genes. 
For example, GSDMB silencing resulted in significant changes in the expression of flow central HIVEP1 (expression increased), MAPK8 (decreased), IL27RA (increased), and the COPD seed gene IL27 (increased), while GSDMB overexpression resulted in changes in expression opposite to those induced by GSDMB silencing: MAPK8 increased and IL27RA decreased, with a non-significant decrease in the expression of HIVEP1 (see path 1 in Fig. 6b; baseline expression of IL27 was below meaningful detection levels, precluding its analysis). Similar patterns were observed for most genes in paths connecting GSDMB to HHIP and GSTCD.
Discussion
The causal relationships of complex diseases are elusive because often multiple mechanistic processes explain why these diseases occur and develop in many different forms. However, with the advent of sequencing technologies and multi-omic assays it is now possible to obtain a more complete overview of the genetic profiles that are more susceptible to developing a condition. The long-standing question of the potential mechanistic relationships between asthma and COPD can thus be approached from a molecular viewpoint, and the putative causes analyzed at the level of genes and proteins. Yet, the information obtained by such technologies is mostly about the ‘actors’ of the processes, more than the processes themselves, leaving room for targeted studies analyzing the relations between the genes involved in disease development and pathways cross-talk.
The analysis of protein–protein interactions connecting the two diseases represents a first step in disentangling the intricate pathways that are responsible for the common pathogenesis of diseases such as asthma and COPD.
In this work we defined flow centrality, a topological measure to detect the genes mediating the molecular interactions occurring between asthma and COPD. Flow central genes show high specificity and cannot be trivially associated with disease genes through first-neighbor interactions. By analyzing the network paths connecting asthma to COPD, we showed that flow central genes are functionally similar to the seed genes of the two diseases. This pattern is quite general: for a multitude of related disease pairs we observed high functional similarity between the flow central genes and their respective sources and targets, suggesting that flow centrality captures low-level molecular mechanisms that underlie different pathological conditions. As further support of this hypothesis, we measured high coexpression between flow central genes and the disease genes of asthma and COPD in multiple human transcriptomics datasets. To obtain experimental evidence of the regulation patterns occurring between the asthma and COPD genes, we restricted our attention to GSDMB, one of the most replicated genes associated with asthma, and assessed the downstream effects of its perturbation through in vitro overexpression/knockdown experiments. The flow central nodes occurring within the network paths connecting GSDMB to the COPD seed genes show strong differential expression patterns, hinting that these genes could participate in the molecular mechanisms carrying the perturbation from the asthma-specific to the COPD-specific domain.
These results suggest that flow centrality can help in identifying the genes involved in the key pathways associated with the transitioning or hybrid phenotypes between the two diseases. Multi-omics measurements (such as transcriptomics, genomics and epigenomics assays) could be leveraged to define a molecular profile of the flow central genes in affected patients65. By correlating these molecular profiles with the patients’ clinical conditions and outcomes, it would be in principle possible to locate these profiles on the asthma-COPD spectrum, creating new opportunities for targeted therapeutics.
The effectiveness of the flow centrality approach depends on the reliability of current PPI data. However, it is estimated that only around 20% of the total protein interactions are known, and a considerable number of the modeled interactions could be false positives5. Moreover, the discovery of real interactions is nonuniform, being mainly driven by the interest in researching proteins that are associated with important functions or diseases, and this may result in an inaccurate modeling of the actual wiring patterns of the network. Nevertheless, the increase in reliability allowed by new and improved bias-free experimental and prediction66 assays of protein interactions (such as yeast two-hybrid) will be crucial in refining our understanding of the genes responsible for carrying a disease perturbation.
Methods
Construction of the interactome
The network we utilized in this work has been compiled by Cheng et al.19, and integrates protein–protein interactions extracted from 15 databases:
- Binary PPIs tested by high-throughput yeast-two-hybrid (Y2H) systems (refs. 67,68, http://interactome.baderlab.org).
- Kinase-substrate interactions from KinomeNetworkX69, Human Protein Resource Database (HPRD)70, PhosphoNetworks71,72, PhosphositePlus73, DbPTM 3.074, and Phospho.ELM75.
- PPIs identified by affinity purification followed by mass spectrometry (AP-MS), Y2H and by literature-derived low-throughput experiments, and protein three-dimensional structures from BioGRID76, PINA77, Instruct78, HPRD70, MINT79, IntAct80, and InnateDB81.
- Signaling network by literature-derived low-throughput experiments as annotated in SignaLink2.082.
By considering only the largest connected component of the network and removing self-loops, the resulting interactome includes 16,656 proteins and 243,592 interactions. For further details, refer to ref. 19.
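The two cleaning steps described above (removing self-loops and keeping the largest connected component) can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_interactome` and the edge-list input format are assumptions made for illustration, and only the Python standard library is used.

```python
from collections import deque

def build_interactome(edges):
    """Return adjacency sets restricted to the largest connected component.

    `edges` is an iterable of (protein_a, protein_b) pairs (hypothetical input format).
    """
    adj = {}
    for u, v in edges:
        if u == v:  # drop self-loops
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen, largest = set(), set()
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects one connected component
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return {node: adj[node] & largest for node in largest}
```

On the real interactome this procedure would be expected to return the 16,656 proteins and 243,592 interactions reported above, provided the same input databases are used.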
Asthma and COPD seed genes
We identified a set of well-established genes by aggregating several sources of genome-wide associations studies that have been replicated for COPD and asthma susceptibility and specific genes implicated by eQTL or functional studies within GWAS regions. The sources considered for asthma and COPD are detailed, respectively, in Supplementary Data 1 and 2. For COPD, we also considered genes causing Mendelian syndromes which include emphysema as part of their phenotypes: alpha-1 antitrypsin deficiency (SERPINA1) and cutis laxa (ELN and FBLN5).
Disease module construction
The asthma and COPD disease modules are built through the DIAMOnD algorithm20. DIAMOnD is based on an iterative scheme that exploits the network’s topology to gradually build a disease module. Given a disease gene set of $s_0$ seed genes, at each iteration DIAMOnD calculates the statistical significance of the connectivity of each node of the network to the disease genes. If the disease module at the current iteration is composed of $s$ genes, then a candidate node with degree $k$ and $k_s$ edges connected to the genes in the module has a p-value
$$p(k, k_s) = \sum_{k_i = k_s}^{k} h(k, k_i) \tag{1}$$
where $h$ is the hypergeometric distribution
$$h(k, k_i) = \frac{\binom{s}{k_i}\binom{N - s}{k - k_i}}{\binom{N}{k}} \tag{2}$$
and $N$ is the total number of genes in the network. In ref. 20, seed genes can be weighted in order to be more preponderant in the p-value calculation, but in this analysis this possibility is not explored. Among the candidate nodes, the node that is most significantly connected to the current gene set (i.e., the one with the smallest p-value) is added to the module and the procedure starts again with the enlarged gene set. This operation is repeated for a fixed number of iterations $n$, reaching a final module size of $s_0 + n$ genes. In order to choose $n$ we extracted from UK-Biobank21 the genes significantly associated with asthma and COPD, using a fixed p-value threshold. While UKB genes are in general different from the seed genes of asthma and COPD, some overlap may occur. Therefore, we considered only the UKB genes that are not present in the seed genes of asthma and COPD, obtaining 742 and 458 genes, respectively. Starting from the asthma seed genes we executed DIAMOnD and, at each iteration, we measured the hypergeometric p-value between GWAS-significant genes and the genes in the current module, obtaining the curve shown in Supplementary Fig. 1(a, left). We then selected as iteration cutoff the value that yielded the lowest p-value in the curve. We repeated the same operations for the COPD module (Supplementary Fig. 1(b, right)). The final sizes of the asthma and COPD modules are, respectively, 373 genes and 228 genes, with 14 overlapping genes.
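As an illustration of the connectivity p-value and of a single DIAMOnD iteration, the following sketch may help. It is a simplified reading of the description above, not the reference DIAMOnD implementation: the names `connectivity_pvalue` and `diamond_step` are hypothetical, seed weighting is omitted (as in this analysis), and candidates with no links to the module are skipped because their p-value is 1.

```python
from math import comb

def connectivity_pvalue(k, k_s, s, N):
    """Probability that a node of degree k has at least k_s links to a set of
    s module genes in a network of N genes (hypergeometric right tail)."""
    upper = min(k, s)
    tail = sum(comb(s, j) * comb(N - s, k - j) for j in range(k_s, upper + 1))
    return tail / comb(N, k)

def diamond_step(adj, module):
    """Return (node, p-value) for the candidate most significantly connected
    to `module`. `adj` maps each node to its set of neighbors; `module` is a set."""
    N, s = len(adj), len(module)
    best_node, best_p = None, float("inf")
    for node in sorted(adj):
        if node in module:
            continue
        k = len(adj[node])
        k_s = len(adj[node] & module)
        if k_s == 0:  # no links to the module: p-value is 1, never the best
            continue
        p = connectivity_pvalue(k, k_s, s, N)
        if p < best_p:
            best_node, best_p = node, p
    return best_node, best_p
```

Calling `diamond_step` repeatedly, each time adding the returned node to `module`, would grow the module one gene per iteration.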
Significance of overlap between the modules
In order to test the significance of the overlap between the asthma and COPD modules we generated 1000 random pairs of gene sets of asthma and COPD with the procedure described below (section Gene set randomization in Methods), and calculated the fraction of times when the measured overlap between random samples is equal or greater than the observed value (14 genes).
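The empirical test described above reduces to counting how often random module pairs overlap at least as much as the real ones. A minimal sketch, in which the function name and the representation of random pairs as tuples of sets are assumptions:

```python
def overlap_pvalue(module_a, module_b, random_pairs):
    """Fraction of random (module_a, module_b) pairs whose overlap is equal to
    or greater than the observed overlap."""
    observed = len(set(module_a) & set(module_b))
    hits = sum(1 for ra, rb in random_pairs if len(set(ra) & set(rb)) >= observed)
    return hits / len(random_pairs)
```

With 1000 random pairs, the smallest non-zero p-value this estimator can return is 0.001.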
Flow centrality
Given a source disease module $S$ and a destination module $T$, the flow centrality of a node $i$ is given by
$$FC_{S \to T}(i) = \frac{1}{|S|\,|T|} \sum_{s \in S} \sum_{t \in T} \frac{\sigma_{st}(i)}{\sigma_{st}} \tag{3}$$
where $\sigma_{st}(i)$ is the number of shortest paths from $s$ to $t$ passing through node $i$, $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $|\cdot|$ is the size of the corresponding set. In the particular case when $S = T = V$, where $V$ is the set of all the nodes of the network, the flow centrality reduces to the betweenness centrality measure. Note that, while Eq. (3) implies a directionality between the source disease module $S$ and the target module $T$, in undirected networks such roles are interchangeable.
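A brute-force sketch of this definition on an unweighted, undirected network is given below. It counts shortest paths with breadth-first search and is meant only to make the definition concrete. Two points are assumptions rather than statements from the text: endpoints are not counted as lying on their own paths, and the sum is normalized by the product of the module sizes. The function names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distances from `source` and number of shortest paths to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def flow_centrality(adj, S, T):
    """Flow centrality of every node for source set S and target set T."""
    info = {n: bfs_counts(adj, n) for n in set(S) | set(T)}
    fc = dict.fromkeys(adj, 0.0)
    for s in S:
        dist_s, sig_s = info[s]
        for t in T:
            if t == s or t not in dist_s:
                continue
            dist_t, sig_t = info[t]
            for i in adj:
                if i in (s, t) or i not in dist_s or i not in dist_t:
                    continue
                # i lies on a shortest s-t path iff the distances add up
                if dist_s[i] + dist_t[i] == dist_s[t]:
                    fc[i] += sig_s[i] * sig_t[i] / sig_s[t]
    norm = len(S) * len(T)
    return {i: value / norm for i, value in fc.items()}
```

For networks the size of the interactome, a Brandes-style accumulation would be much faster; the triple loop here favors readability.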
The raw values of flow centrality as calculated by Eq. (3) are biased toward hubs: high-degree nodes are more likely to participate in shortest paths between node pairs just by chance. To account for this bias we calculate the statistical significance of the obtained values by comparing them with a null distribution generated by randomizing the source and target modules 1000 times. The details of the randomization scheme are described in the Gene set randomization section of Methods. For each random pair of source and target modules we calculate the flow centrality of each node of the network and measure the average $\mu_{\mathrm{rand}}(i)$ and standard deviation $\sigma_{\mathrm{rand}}(i)$ across all the samples. The flow centrality score (FCS) of a node $i$ is then calculated as
$$FCS(i) = \frac{FC_{S \to T}(i) - \mu_{\mathrm{rand}}(i)}{\sigma_{\mathrm{rand}}(i)} \tag{4}$$
A large positive FCS indicates that the node is more likely to occur in the shortest paths connecting the source and target modules, while a small or negative value suggests that the node is not relevant to the chosen pair of modules.
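The score is a z-score against the null distribution, and the significance criterion used in the GSDMB experiment combines it with a right-tailed empirical p-value (FCS > 2 or p < 0.05). A minimal sketch of both, assuming the population standard deviation is used (the text does not say which estimator was chosen) and with hypothetical function names:

```python
from statistics import mean, pstdev

def empirical_pvalue(observed, null_values):
    """Right-tailed empirical p-value: share of null values >= observed."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)

def flow_centrality_score(observed, null_values):
    """z-score of the observed flow centrality against the null samples."""
    mu, sd = mean(null_values), pstdev(null_values)
    return (observed - mu) / sd if sd > 0 else float("nan")

def is_significant(observed, null_values, z_cut=2.0, alpha=0.05):
    """Criterion from the perturbation analysis: FCS > 2 or empirical p < 0.05."""
    z = flow_centrality_score(observed, null_values)
    return z > z_cut or empirical_pvalue(observed, null_values) < alpha
```

Here `null_values` would hold the flow centrality of one node across the 1000 randomized module pairs.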
FCS stability
To evaluate the stability of FCS values to moderate variations of the boundaries of the disease modules we performed the following test. We defined a range of possible small variations in the selected cutoff value iteration of DIAMOnD modules, i.e., . For example, when considering a variation 30 from the list in the case of the asthma module (373 genes), we build a perturbed asthma module by considering only the first genes prioritized by DIAMOnD, where is the original cutoff value, obtaining a module size of genes. We repeat the same scheme for COPD. For each value of we calculate the perturbed FCS values by setting the perturbed modules as source and target. The perturbed FCS are then compared with the original ones (see Supplementary Fig. 4), and Supplementary Fig. 5 shows their Spearman’s correlation for each value of . The obtained correlation values are very high (), indicating the robustness of the FCS scores to moderate variations of the modules size.
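The comparison between original and perturbed scores relies on Spearman's rank correlation. A standard-library sketch of that statistic, with tied values receiving their average rank, is shown below; the function names are hypothetical and the inputs are assumed to be two equally long lists of FCS values for the same genes in the same order.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```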
Gene set randomization
We defined a randomization scheme designed to create a null distribution of random modules that are topologically similar to a given DIAMOnD module. A straightforward way to generate randomized gene sets would be to select $n$ random genes in a degree-preserved way, where $n$ is the size of the disease module we want to randomize, and repeat this process a number of times to obtain the samples. This method, however, has the drawback of generating disease modules that are quite different from the asthma and COPD sets we calculated with DIAMOnD. DIAMOnD iteratively searches in the neighborhood of the seed genes, generating modules that are more compact and better interconnected than a random selection. Therefore, a z-score evaluated against such samples would be confounded by the different topological properties of the random modules. For this reason we defined the following randomization scheme:
1. Given the set $S_0$ of seed genes of a disease module $M$ (obtained with DIAMOnD), we extract a new set $S_0'$ of random seed genes in a degree-preserved way.
2. We run DIAMOnD on the set $S_0'$ of random seed genes for $n - |S_0|$ iterations, where $n$ is the size of $M$, obtaining a new random module of size $n$.
In this way, the procedure generates random modules that are topologically more similar to those generated by DIAMOnD.
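The first step of the scheme, degree-preserved sampling of seed genes, could be sketched as follows. The text does not specify how degree preservation was implemented, so the strategy here (draw a node of identical degree, widening the degree window only when no unused node is available) is an assumption, as is the function name. The second step would then grow each sampled seed set with DIAMOnD.

```python
import random

def degree_preserved_sample(adj, genes, rng):
    """Draw one random node per gene, matching its degree as closely as possible.

    `adj` maps nodes to neighbor sets, `genes` is a collection of distinct
    nodes of `adj`, and `rng` is a random.Random instance.
    """
    by_degree = {}
    for node in sorted(adj):
        by_degree.setdefault(len(adj[node]), []).append(node)

    chosen = set()
    for gene in genes:
        k = len(adj[gene])
        width = 0
        while True:
            # candidate nodes whose degree is within `width` of k and still unused
            pool = [n for d in range(max(0, k - width), k + width + 1)
                    for n in by_degree.get(d, []) if n not in chosen]
            if pool:
                chosen.add(rng.choice(pool))
                break
            width += 1
    return chosen
```

Degree binning is a common alternative that avoids the widening loop; either choice keeps hubs matched with hubs.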
Selection of network paths
The flow central paths are selected as all the shortest paths connecting the asthma and COPD seed genes, whose intermediate genes (i.e., those genes that are not the source or destination of the path) have a flow centrality score of 2 or greater. Assuming a normality in the null distribution of the FC values, a value that is 2 standard deviations away from the average value is well outside the bulk of the null distribution. Choosing an excessively large threshold can result in too few nodes being selected and might lead to missing important nodes in areas of lower edge density, while a too low threshold would increase the false positives. As an additional constraint we require for all the intermediate nodes in the paths to participate in at least five shortest paths connecting COPD and asthma nodes, in order to remove from the pool all the nodes that have unstable FCS values because of low shortest-path statistics. Note that while the full disease module information has not been used to select the initial pool of shortest paths, this information is embedded in the calculation of flow centrality of each gene in the network, since the FC depends on the source and target disease modules.
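The two selection constraints (intermediate genes with FCS of 2 or greater, each participating in at least five shortest paths) amount to a simple filter over candidate paths. In this sketch the names are hypothetical, paths are lists of genes from source to target, and direct source-target edges with no intermediate gene are excluded, which is an assumption not stated in the text.

```python
def select_flow_central_paths(paths, fcs, path_counts, min_fcs=2.0, min_paths=5):
    """Keep the paths whose intermediate genes all pass both thresholds.

    `fcs` maps gene -> flow centrality score; `path_counts` maps gene ->
    number of source-target shortest paths the gene participates in.
    """
    selected = []
    for path in paths:
        inner = path[1:-1]  # genes that are neither source nor destination
        if inner and all(
            fcs.get(g, float("-inf")) >= min_fcs and path_counts.get(g, 0) >= min_paths
            for g in inner
        ):
            selected.append(path)
    return selected
```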
The Type A path randomization scheme is structured as follows:
1. Extract one length value $\ell$ from the empirical distribution of FC path lengths.
2. Create an empty path $P$.
3. Select a node uniformly at random in the network and add it to $P$.
4. Select one random neighbor of the last node added to $P$, among those not already in $P$, and add it to $P$.
5. Repeat from step 4 until the length of $P$ is $\ell$.
6. Add $P$ to the current set of random paths.
7. Repeat from step 1 until a desired number of random paths is obtained.
Note that in the actual implementation of the scheme above some additional controls are performed in order to account for edge cases such as when no new neighbors can be added to the path, etc.
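The Type A scheme is essentially a self-avoiding random walk whose length is drawn from the observed FC path lengths. The sketch below is one possible reading: path length is taken to be the number of nodes, a walk that reaches a dead end is restarted from a new random node, and the number of attempts is bounded. These choices, and the function names, are assumptions standing in for the edge-case controls mentioned above.

```python
import random

def random_path(adj, length, rng, max_tries=1000):
    """Self-avoiding random walk with `length` nodes, or None if none is found."""
    nodes = sorted(adj)
    for _ in range(max_tries):
        path = [rng.choice(nodes)]
        while len(path) < length:
            options = sorted(n for n in adj[path[-1]] if n not in path)
            if not options:  # dead end: give up on this walk and restart
                break
            path.append(rng.choice(options))
        if len(path) == length:
            return path
    return None

def type_a_paths(adj, fc_path_lengths, n_paths, rng):
    """Sample up to `n_paths` random paths with lengths drawn from `fc_path_lengths`."""
    paths = []
    for _ in range(n_paths * 100):  # bounded number of attempts
        if len(paths) == n_paths:
            break
        path = random_path(adj, rng.choice(fc_path_lengths), rng)
        if path is not None:
            paths.append(path)
    return paths
```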
The Type B random paths are selected by uniformly sampling paths from the pool of shortest paths connecting the genes of the two diseases.
Sequential similarity
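Given a path $P = (g_1, g_2, \ldots, g_\ell)$ of length $\ell$, i.e., an ordered sequence of unique genes in the network $G$, the sequential similarity is defined as
$$SS(P) = \frac{1}{\ell - 1} \sum_{i=1}^{\ell - 1} \mathrm{sim}(g_i, g_{i+1}) \tag{5}$$
where $\mathrm{sim}$ is any GO terms similarity measure between genes. In this work we considered the best-match average (BMA) of Resnik’s similarity measure83,84. Given two genes $g_a$ and $g_b$ associated to the sets of GO terms $T_a$ and $T_b$, respectively, the BMA Resnik similarity pairs each term of one gene with its best match (the term of maximum similarity) among the terms of the other gene and averages these best-match values over the terms of both genes,
where $\mathrm{sim}_{\mathrm{Res}}(t_a, t_b)$ denotes the Resnik similarity measure between the GO terms $t_a$ and $t_b$.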
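A compact sketch of the two ingredients follows. The term-level similarity is passed in as a function, since computing Resnik similarity requires the GO graph and annotation frequencies, which are outside the scope of this example. Best-match averaging has more than one normalization in the literature; the variant used here (the mean of all row and column maxima) is an assumption, as are the function names.

```python
def bma(terms_a, terms_b, term_sim):
    """Best-match average of a term-level similarity between two term sets."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))

def sequential_similarity(path, gene_sim):
    """Average similarity between adjacent genes along a path (at least two genes)."""
    pairs = list(zip(path, path[1:]))
    return sum(gene_sim(a, b) for a, b in pairs) / len(pairs)
```

In practice `gene_sim` would wrap `bma` with each gene's GO annotations and a Resnik term similarity.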
Sequential coexpression
Given a path $P = (g_1, g_2, \ldots, g_\ell)$ of length $\ell$, i.e., an ordered sequence of unique genes in the network $G$, the sequential coexpression is defined as
$$SC(P) = \frac{1}{\ell - 1} \sum_{i=1}^{\ell - 1} \left| \rho\left(X_{g_i}, X_{g_{i+1}}\right) \right| \tag{7}$$
where $X_g$ is the random variable indicating the expression values of gene $g$ and $\rho$ is the Pearson correlation. The sequential coexpression is the average absolute correlation of the expression of adjacent genes in a network path, and therefore it measures the extent of coordination in the gene expression along the path. Notice that multiple transcripts in the expression data can be associated to the same gene. In those cases, we calculated the coexpression of two adjacent genes as the maximum absolute value of correlation between all the possible pairs of probes/transcripts associated to the two genes. If at least one gene is not present in the expression dataset considered, then the sequential coexpression of the path is considered null and the path is excluded from the analysis.
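The definition, including the handling of multiple probes per gene and of missing genes, can be sketched as below. The data layout (a dictionary mapping each gene to a list of expression profiles, one per probe or transcript) and the function names are assumptions; profiles with zero variance are not handled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def sequential_coexpression(path, expr):
    """Average absolute correlation of adjacent genes along `path`.

    Returns None when a gene of the path is missing from `expr`, so that the
    path can be excluded from the analysis.
    """
    if any(gene not in expr for gene in path):
        return None
    total = 0.0
    for a, b in zip(path, path[1:]):
        # with several probes per gene, keep the strongest absolute correlation
        total += max(abs(pearson(pa, pb)) for pa in expr[a] for pb in expr[b])
    return total / (len(path) - 1)
```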
Sequential similarity of related disease pairs
We downloaded from the DisGeNet repository50 all the curated gene-disease associations (GDA) and the disease mappings to convert the UMLS CUI identifiers to the identifiers of several other vocabularies. Note that in order to limit the amount of false positives, the DisGeNet associations obtained by text mining of MEDLINE abstracts (extracted through the BeFree tool) are excluded from the analysis. The data have been downloaded on July 19th from the webpage http://www.disgenet.org/downloads. From the GDA data we selected only the annotations to phenotypes of the type “Disease or Syndrome”. We then filtered all the diseases that are associated to <50 genes that can be mapped on the PPI network. Each resulting disease is associated to a list of Disease Ontology IDs (DOID), as annotated in the disease mapping data. Disease Ontology is a standardized ontology of human diseases that semantically integrates disease and medical vocabularies through cross mapping and integration of MeSH, ICD, NCI’s thesaurus, SNOMED CT and OMIM85,86. Notice that a disease can be mapped to multiple DOIDs, since it can belong to multiple categories in the ontology tree. We proceeded to calculate the pairwise similarities between diseases, using the R package DOSE87. For each pair, the similarity is calculated as the maximum Resnik similarity between all their associated DOIDs. The calculated similarities are shown in Supplementary Fig. 2, where the similarities that could not be retrieved (i.e., returned as null by the DOSE function) are set as 0. The related disease pairs are selected as those pairs with:
- Similarity greater than the 90th percentile in the overall distribution of similarities, not considering the similarities that could not be retrieved and the similarities equal to 0.
- Similarity <1, to avoid selecting disease IDs that are synonyms of the same phenotype.
- Number of overlapping associated genes <10, to retrieve related diseases with little common genetic basis, as in the asthma-COPD case.
After applying these criteria, we obtained 66 disease pairs, listed in Supplementary Data 5. We manually scrutinized the disease pairs to assess the existence of an actual relation between them, obtaining for most of them a positive match. Given a disease pair D1–D2, we evaluated the flow centrality values of all the nodes in the network, by following the scheme outlined in the main text, with the only difference being in the generation of the random modules for the FCS calculation. In this case the disease modules correspond to the set of GDA retrieved from DisGeNet, without resorting to DIAMOnD prioritization, and thus the random samples are obtained with a simple degree-preserved randomization of the disease genes. After the FCS values are calculated, we selected the corresponding FC paths, extracted the random paths of Type A (RdmA) and B (RdmB) as described in the main text, and evaluated the three distributions of sequential similarities (SS). For each disease pair we performed two right-sided Mann–Whitney tests, comparing the SS of the FC paths with the SS of RdmA and the SS of RdmB, obtaining two p-values, $p_A$ and $p_B$, which are then aggregated into a final p-value. Significance of the aggregated p-value implies that the SS of the FC paths are significantly greater than the SS of both RdmA and RdmB, and thus the FC paths are more likely to represent meaningful biological links between the two diseases.
To assess the specificity of the result, for each disease pair D1–D2 in the pool defined above we generated two sets of 100 random modules, by randomizing the disease genes of D1 and D2 in a degree-preserving way. By using the values of flow centrality evaluated for the original pair, we selected the flow central paths of each random pair, i.e., the shortest paths connecting the nodes of the two random modules where all the intermediate genes have flow centrality >2. We refer to these paths as random FC paths. Then, we performed a Mann–Whitney test between the distribution of SS values of the original disease pair and the SS values of each random pair, separately. As a result of this operation we obtained 100 p-values comparing the SS of the actual disease pair with the SS of each random pair.
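The three pair-selection criteria listed earlier in this section (similarity above the 90th percentile of the retrievable non-zero values, similarity below 1, fewer than 10 shared genes) can be written as a small filter. The sketch is illustrative only: the data layout, the function names, and the linear-interpolation percentile are assumptions, and the actual analysis used the R package DOSE for the similarities.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0-100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def select_related_pairs(similarity, genes, q=90, max_overlap=10):
    """Return the disease pairs that satisfy the three selection criteria.

    `similarity` maps (disease_1, disease_2) -> similarity or None when it
    could not be retrieved; `genes` maps disease -> set of associated genes.
    """
    valid = [s for s in similarity.values() if s is not None and s > 0]
    cutoff = percentile(valid, q)
    return sorted(
        pair for pair, s in similarity.items()
        if s is not None and cutoff < s < 1
        and len(genes[pair[0]] & genes[pair[1]]) < max_overlap
    )
```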
Selection of random diseases and phenotypes
In order to test the significance of the sequential coexpression of the flow central paths, we considered a number of diseases and phenotypes from the DisGeNet repository50 that are unrelated to asthma and COPD. The objective of this test is to compare the coexpression of the FC paths of asthma and COPD with random paths connecting asthma to a random disease, and repeat the same for COPD. We selected only the diseases and phenotypes with gene set sizes similar to the asthma and COPD seed gene sets, i.e., between 25 and 35 genes after mapping to the PPI network. In addition, we restrict the selection to the phenotypes annotated as “Disease or Syndrome”. With this criterion we obtained a total of 59 diseases and phenotypes. For each of these gene sets, we sampled 10,000 shortest paths by iteratively choosing one random seed gene of asthma as source and one random gene in the set as target, and repeated the same for COPD.
GEO expression data
We considered 18 microarray and RNA-seq expression datasets from GEO, as detailed in Supplementary Data 6. For each dataset, standard data processing steps were applied, such as conversion of probe IDs and gene symbols to Entrez IDs and log-transformation, when necessary. For each expression dataset different subgroups of samples were selected, depending on the information available. Samples were first divided into disease and healthy conditions, when present, and analyzed separately. Different tissues or cell types in the same dataset were further divided into separate classes and analyzed separately, when the information was available. For example, in GSE104468 data the asthmatic and control samples are further divided into bronchial epithelia, nasal epithelia and PBMC. Classes of samples that were not relevant for the analysis of the asthma-COPD overlap were excluded. For example, since allergy is an asthma-specific feature, atopic samples were excluded from the analysis when non-atopic counterparts were available (e.g., GSE473). In addition, in some cases we also considered overlapping groupings. For example, in GSE37147, we selected both the class of COPD smokers and its subclass of COPD smokers with no history of asthma. More details on the selected classes and the corresponding numbers of samples are provided in Supplementary Data 6.
SC of asthma and COPD with pneumonia and IPF
We selected the seed genes of pneumonia and idiopathic pulmonary fibrosis (IPF) from the DisGeNet repository, obtaining, respectively, 52 and 18 genes mapped on the PPI. In order to build modules with the same size as the COPD module, we ran DIAMOnD with $n_{\mathrm{COPD}} - 52$ iterations for pneumonia and $n_{\mathrm{COPD}} - 18$ iterations for IPF, where $n_{\mathrm{COPD}}$ is the size of the COPD module. We then evaluated the flow centrality of the PPI nodes with the asthma module as source and the pneumonia module as target gene set, and repeated the same for asthma and IPF. The FCS of each gene is calculated by randomizing the asthma, pneumonia and IPF modules with the procedure described above. In brief, seed genes are randomized in a degree-preserved way, and DIAMOnD is subsequently executed on the random gene sets to create the random modules. The sequential coexpression of the two pairs is then evaluated and compared with the SC of the asthma-COPD pair on expression data of asthmatic patients (GSE4302). The same process is repeated for COPD-pneumonia and COPD-IPF on expression data of COPD patients (GSE57148).
Overexpression and knockdown experiments
Cell culture: The human bronchial epithelial cell lines Beas-2B and 16HBE were purchased from ATCC and cultured in Dulbecco’s modified Eagle medium (DMEM) and Eagle’s minimal essential medium (EMEM), respectively, supplemented with 10% fetal bovine serum, penicillin (50 units/ml), and streptomycin (50 μg/ml).
Overexpression of recombinant GSDMB in Beas-2B cells: Human GSDMB in pCMV6 (epitope-tagged with Myc and FLAG, both at the carboxy-terminus) was purchased from Origene (catalog number RC202279). Beas-2B cells were plated in 6-well plates at 4 × 10⁵ cells/well overnight in complete medium. The next day, 0.5 μg of GSDMB plasmids or control plasmids and 1 μl of Lipofectamine 3000® (Thermo Fisher) were added to the cells with fresh DMEM medium according to the manufacturer’s instructions in triplicate wells. RNA was extracted 48 h after transfection for RNA sequencing.
siRNA knockdown in 16HBE cells: 16HBE cells were plated in 6-well plates at 6 × 10⁵ cells/well overnight in complete medium. The next day, 30 pmol of GSDMB siRNA or control siRNA and 5 μl of Lipofectamine RNAiMAX (Thermo Fisher) were added to the cells with fresh EMEM medium according to the manufacturer’s instructions in triplicate wells. Two different hairpins (Thermo Fisher, s31709, s31711) were used in the experiments. RNA was extracted 48 h after transfection for RNA sequencing.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
The authors would like to thank Istvan Kovacs, Marc Santolini, and Ayse Kilic for useful discussion, and the Rivas lab for making the Global Biobank Engine resource available. We acknowledge the support of the National Institutes of Health (NIH) grants R01 HL118455-04-1 and P01 HL13285. The funders had no role in study design, data collection, and analysis, decision to publish, or preparation of the paper.
Author contributions
E.M. and A.S. conceived and developed the idea. E.M. analyzed the data and was the lead writer of the manuscript. F.G., P.H.K. and X.Z. carried out the experiments. S.H.B., E.K.S., A.-L.B., S.T.W., B.A.R. and A.S. contributed to the writing of the paper, provided critical feedback and helped shape the research and the analysis of the problem.
Data availability
The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary Information files. Extra data are available from the corresponding author upon request.
Code availability
The source code for reproducing the analysis was developed in Python 3.6 and is available as a GitHub repository at https://github.com/reemagit/flowcentrality.
Competing interests
A.L.B. is founder of Nomix, Foodome and Scipher Medicine, companies that explore the role of networks and food in health. The remaining authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Anil Jegga and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Deceased: Amitabh Sharma.
Change history
4/19/2021
A Correction to this paper has been published: 10.1038/s41467-021-22939-x
Supplementary information
Supplementary information is available for this paper at 10.1038/s41467-020-14600-w.
References
- 1.Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 2011;12:56. doi: 10.1038/nrg2918. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Goh K-I, et al. The human disease network. Proc. Natl Acad. Sci. USA. 2007;104:8685–8690. doi: 10.1073/pnas.0701361104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Sharma A, et al. A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma. Hum. Mol. Genet. 2015;24:3005–3020. doi: 10.1093/hmg/ddv001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Menche J, et al. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347:1257601. doi: 10.1126/science.1257601. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Sharma, A. et al. Integration of molecular interactome and targeted interaction analysis to identify a COPD disease network module. bioRxiv10.1101/408229 (2018). [DOI] [PMC free article] [PubMed]
- 7.Gratten J, Visscher PM. Genetic pleiotropy in complex traits and diseases: implications for genomic medicine. Genome Med. 2016;8:78. doi: 10.1186/s13073-016-0332-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Soriano JB, et al. Global, regional, and national deaths, prevalence, disability-adjusted life years, and years lived with disability for chronic obstructive pulmonary disease and asthma, 1990–2015: a systematic analysis for the global burden of disease study 2015. Lancet Respiratory Med. 2017;5:691–706. doi: 10.1016/S2213-2600(17)30293-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Desai M, Oppenheimer J, Tashkin DP. Asthma–chronic obstructive pulmonary disease overlap syndrome: What we know and what we need to find out. Ann. Allergy, Asthma Immunol. 2017;118:241–245. doi: 10.1016/j.anai.2016.12.016. [DOI] [PubMed] [Google Scholar]
- 10.Wurst KE, Kelly-Reif K, Bushnell GA, Pascoe S, Barnes N. Understanding asthma-chronic obstructive pulmonary disease overlap syndrome. Respiratory Med. 2016;110:1–11. doi: 10.1016/j.rmed.2015.10.004. [DOI] [PubMed] [Google Scholar]
- 11.McGeachie MJ, et al. Genetics and genomics of longitudinal lung function patterns in individuals with asthma. Am. J. Respiratory Crit. Care Med. 2016;194:1465–1474. doi: 10.1164/rccm.201602-0250OC. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Postma DS, Weiss ST, van den Berge M, Kerstjens HA, Koppelman GH. Revisiting the dutch hypothesis. J. Allergy Clin. Immunol. 2015;136:521–529. doi: 10.1016/j.jaci.2015.06.018. [DOI] [PubMed] [Google Scholar]
- 13.McGeachie MJ, et al. Patterns of growth and decline in lung function in persistent childhood asthma. N. Engl. J. Med. 2016;374:1842–1852. doi: 10.1056/NEJMoa1513737. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Svanes C, et al. Early life origins of chronic obstructive pulmonary disease. Thorax. 2010;65:14–20. doi: 10.1136/thx.2008.112136. [DOI] [PubMed] [Google Scholar]
- 15.Sears MR, et al. A longitudinal, population-based, cohort study of childhood asthma followed to adulthood. N. Engl. J. Med. 2003;349:1414–1422. doi: 10.1056/NEJMoa022363. [DOI] [PubMed] [Google Scholar]
- 16.Orie, N. & Sluiter, H. (eds). Bronchitis. in Proceedings of the International Symposium on Bronchitis, Groningen, The Netherlands (RoyalVan Gorcum, Assen, 1961).
- 17.Visscher PM, et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Klein RJ, Xu X, Mukherjee S, Willis J, Hayes J. Successes of genome-wide association studies. Cell. 2010;142:350–351. doi: 10.1016/j.cell.2010.07.026. [DOI] [PubMed] [Google Scholar]
- 19.Cheng F, et al. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nat. Commun. 2018;9:2691. doi: 10.1038/s41467-018-05116-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Ghiassian SD, Menche J, Barabási A-L. A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome. PLoS Computational Biol. 2015;11:e1004120. doi: 10.1371/journal.pcbi.1004120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Global Biobank Engine. http://gbe.stanford.edu/ (Stanford, CA, 2017).
- 22.Liu Y, Kulesz-Martin M. p53 protein at the hub of cellular DNA damage response pathways through sequence-specific and non-sequence-specific DNA binding. Carcinogenesis. 2001;22:851–860. doi: 10.1093/carcin/22.6.851.
- 23.Ma B, Hottiger MO. Crosstalk between Wnt/β-catenin and NF-κB signaling pathway during inflammation. Front. Immunol. 2016;7:378. doi: 10.3389/fimmu.2016.00378.
- 24.Roux PP, Blenis J. ERK and p38 MAPK-activated protein kinases: a family of protein kinases with diverse biological functions. Microbiol. Mol. Biol. Rev. 2004;68:320–344. doi: 10.1128/MMBR.68.2.320-344.2004.
- 25.Moens U, Kostenko S, Sveinbjørnsson B. The role of mitogen-activated protein kinase-activated protein kinases (MAPKAPKs) in inflammation. Genes. 2013;4:101–133. doi: 10.3390/genes4020101.
- 26.Simon AR, Takahashi S, Severgnini M, Fanburg BL, Cochran BH. Role of the JAK-STAT pathway in PDGF-stimulated proliferation of human airway smooth muscle cells. Am. J. Physiol.-Lung Cell. Mol. Physiol. 2002;282:L1296–L1304. doi: 10.1152/ajplung.00315.2001.
- 27.Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40:35–41.
- 28.Newman ME. A measure of betweenness centrality based on random walks. Soc. Netw. 2005;27:39–54.
- 29.Estrada E, Higham DJ, Hatano N. Communicability betweenness in complex networks. Phys. A. 2009;388:764–774.
- 30.Kivimäki I, Lebichot B, Saramäki J, Saerens M. Two betweenness centrality measures based on randomized shortest paths. Sci. Rep. 2016;6:19668. doi: 10.1038/srep19668.
- 31.Garcia-Vaquero, M. L., Gama-Carvalho, M., Rivas, J.D.L. et al. Searching the overlap between network modules with specific betweeness (S2B) and its application to cross-disease analysis. Sci. Rep. 8, 11555 (2018).
- 32.Pilecki B, et al. Microfibrillar-associated protein 4 modulates airway smooth muscle cell phenotype in experimental asthma. Thorax. 2015;70:862–872. doi: 10.1136/thoraxjnl-2014-206609.
- 33.Lange AW, Keiser AR, Wells JM, Zorn AM, Whitsett JA. Sox17 promotes cell cycle progression and inhibits TGF-β/Smad3 signaling to initiate progenitor cell behavior in the respiratory epithelium. PLoS One. 2009;4:e5711. doi: 10.1371/journal.pone.0005711.
- 34.Takizawa H, et al. Increased expression of transforming growth factor-β1 in small airway epithelium from tobacco smokers and patients with chronic obstructive pulmonary disease (COPD). Am. J. Respiratory Crit. Care Med. 2001;163:1476–1483. doi: 10.1164/ajrccm.163.6.9908135.
- 35.Sime PJ, Xing Z, Graham FL, Csaky KG, Gauldie J. Adenovector-mediated gene transfer of active transforming growth factor-beta1 induces prolonged severe fibrosis in rat lung. J. Clin. Investig. 1997;100:768–776. doi: 10.1172/JCI119590.
- 36.Makinde T, Murphy RF, Agrawal DK. The regulatory role of TGF-β in airway remodeling in asthma. Immunol. Cell Biol. 2007;85:348–356. doi: 10.1038/sj.icb.7100044.
- 37.Napolitano JR, et al. Cadmium-mediated toxicity of lung epithelia is enhanced through NF-κB-mediated transcriptional activation of the human zinc transporter ZIP8. Am. J. Physiol.-Lung Cell. Mol. Physiol. 2012;302:L909–L918. doi: 10.1152/ajplung.00351.2011.
- 38.Besecker B, et al. The human zinc transporter SLC39A8 (Zip8) is critical in zinc-mediated cytoprotection in lung epithelia. Am. J. Physiol.-Lung Cell. Mol. Physiol. 2008;294:L1127–L1136. doi: 10.1152/ajplung.00057.2008.
- 39.Chuang P-T, Kawcak T, McMahon AP. Feedback control of mammalian hedgehog signaling by the hedgehog-binding protein, HIP1, modulates FGF signaling during branching morphogenesis of the lung. Genes Dev. 2003;17:342–347. doi: 10.1101/gad.1026303.
- 40.Zhou X, et al. Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP. Hum. Mol. Genet. 2011;21:1325–1335. doi: 10.1093/hmg/ddr569.
- 41.Zhou X, et al. Gene expression analysis uncovers novel hedgehog interacting protein (HHIP) effects in human bronchial epithelial cells. Genomics. 2013;101:263–272. doi: 10.1016/j.ygeno.2013.02.010.
- 42.Resnik P. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 1999;11:95–130.
- 43.Horbelt D, Denkis A, Knaus P. A portrait of transforming growth factor β superfamily signalling: background matters. Int. J. Biochem. Cell Biol. 2012;44:469–474. doi: 10.1016/j.biocel.2011.12.013.
- 44.Ramirez H, Patel SB, Pastar I. The role of TGFβ signaling in wound epithelialization. Adv. Wound Care. 2014;3:482–491. doi: 10.1089/wound.2013.0466.
- 45.Wicke C, et al. Effects of steroids and retinoids on wound healing. Arch. Surg. 2000;135:1265–1270. doi: 10.1001/archsurg.135.11.1265.
- 46.Pendaries V, Verrecchia F, Michel S, Mauviel A. Retinoic acid receptors interfere with the TGF-β/Smad signaling pathway in a ligand-specific manner. Oncogene. 2003;22:8212. doi: 10.1038/sj.onc.1206913.
- 47.Čokić VP, et al. Proinflammatory cytokine IL-6 and JAK-STAT signaling pathway in myeloproliferative neoplasms. Mediators Inflamm. 2015;2015:453020. doi: 10.1155/2015/453020.
- 48.Troutman TD, Bazan JF, Pasare C. Toll-like receptors, signaling adapters and regulation of the pro-inflammatory response by PI3K. Cell Cycle. 2012;11:3559–3567. doi: 10.4161/cc.21572.
- 49.Greenhill CJ, et al. IL-6 trans-signaling modulates TLR4-dependent inflammatory responses via STAT3. J. Immunol. 2011;186:1199–1208. doi: 10.4049/jimmunol.1002971.
- 50.Piñero, J. et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015 (2015).
- 51.Dugger BN, Dickson DW. Pathology of neurodegenerative diseases. Cold Spring Harb. Perspect. Biol. 2017;9:a028035. doi: 10.1101/cshperspect.a028035.
- 52.Nixon RA. The role of autophagy in neurodegenerative disease. Nat. Med. 2013;19:983. doi: 10.1038/nm.3232.
- 53.Montibeller L, de Belleroche J. Amyotrophic lateral sclerosis (ALS) and Alzheimer’s disease (AD) are characterised by differential activation of ER stress pathways: focus on UPR target genes. Cell Stress Chaperones. 2018;23:897–912. doi: 10.1007/s12192-018-0897-y.
- 54.Kim D, et al. SIRT1 deacetylase protects against neurodegeneration in models for Alzheimer’s disease and amyotrophic lateral sclerosis. EMBO J. 2007;26:3169–3179. doi: 10.1038/sj.emboj.7601758.
- 55.Kostner L, et al. Allergic contact dermatitis. Immunology Allergy Clin. 2017;37:141–152. doi: 10.1016/j.iac.2016.08.014.
- 56.Lowes MA, Suarez-Farinas M, Krueger JG. Immunology of psoriasis. Annu. Rev. Immunol. 2014;32:227–255. doi: 10.1146/annurev-immunol-032713-120225.
- 57.Balato A, et al. IL-36 is involved in psoriasis and allergic contact dermatitis. J. Investigative Dermatol. 2016;136:1520. doi: 10.1016/j.jid.2016.03.020.
- 58.Sabayan B, Foroughinia F, Haghighi AB, Mowla A. Are women with polycystic ovary syndrome (PCOS) at higher risk for development of Alzheimer disease? Alzheimer Dis. Associated Disord. 2007;21:265–267. doi: 10.1097/WAD.0b013e31813e89d5.
- 59.Jiang S-W, et al. Pathologic significance of SET/I2PP2A-mediated PP2A and non-PP2A pathways in polycystic ovary syndrome (PCOS). Clin. Chim. Acta. 2017;464:155–159. doi: 10.1016/j.cca.2016.11.010.
- 60.Arif M, et al. Cytoplasmic retention of protein phosphatase 2A inhibitor 2 (I2PP2A) induces Alzheimer-like abnormal hyperphosphorylation of Tau. J. Biol. Chem. 2014;289:27677–27691. doi: 10.1074/jbc.M114.565358.
- 61.Woodruff PG, et al. Genome-wide profiling identifies epithelial cell genes associated with asthma and with treatment response to corticosteroids. Proc. Natl Acad. Sci. USA. 2007;104:15858–15863. doi: 10.1073/pnas.0707413104.
- 62.Singh D, et al. Altered gene expression in blood and sputum in COPD frequent exacerbators in the ECLIPSE cohort. PLoS One. 2014;9:e107381. doi: 10.1371/journal.pone.0107381.
- 63.Torgerson DG, et al. Meta-analysis of genome-wide association studies of asthma in ethnically diverse North American populations. Nat. Genet. 2011;43:887. doi: 10.1038/ng.888.
- 64.Das, S. et al. GSDMB induces an asthma phenotype characterized by increased airway responsiveness and remodeling without lung inflammation. Proc. Natl Acad. Sci. USA 113, 13132–13137 (2016).
- 65.Menche J, et al. Integrating personalized gene expression profiles into predictive disease-associated gene pools. NPJ Syst. Biol. Appl. 2017;3:10. doi: 10.1038/s41540-017-0009-0.
- 66.Kovács, I. A. et al. Network-based prediction of protein interactions. bioRxiv 10.1101/275529 (2018).
- 67.Rolland T, et al. A proteome-scale map of the human interactome network. Cell. 2014;159:1212–1226. doi: 10.1016/j.cell.2014.10.050.
- 68.Rual J-F, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173. doi: 10.1038/nature04209.
- 69.Cheng F, Jia P, Wang Q, Zhao Z. Quantitative network mapping of the human kinome interactome reveals new clues for rational kinase inhibitor discovery and individualized cancer therapy. Oncotarget. 2014;5:3697. doi: 10.18632/oncotarget.1984.
- 70.Peri S, et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. 2004;32:D497–D501. doi: 10.1093/nar/gkh070.
- 71.Newman RH, et al. Construction of human activity-based phosphorylation networks. Mol. Syst. Biol. 2013;9:655. doi: 10.1038/msb.2013.12.
- 72.Hu J, et al. PhosphoNetworks: a database for human phosphorylation networks. Bioinformatics. 2013;30:141–142. doi: 10.1093/bioinformatics/btt627.
- 73.Hornbeck PV, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2014;43:D512–D520. doi: 10.1093/nar/gku1267.
- 74.Lu C-T, et al. DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res. 2012;41:D295–D305. doi: 10.1093/nar/gks1229.
- 75.Dinkel H, et al. Phospho.ELM: a database of phosphorylation sites—update 2011. Nucleic Acids Res. 2010;39:D261–D267. doi: 10.1093/nar/gkq1104.
- 76.Chatr-Aryamontri A, et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2014;43:D470–D478. doi: 10.1093/nar/gku1204.
- 77.Cowley MJ, et al. PINA v2.0: mining interactome modules. Nucleic Acids Res. 2011;40:D862–D865. doi: 10.1093/nar/gkr967.
- 78.Meyer MJ, Das J, Wang X, Yu H. INstruct: a database of high-quality 3D structurally resolved protein interactome networks. Bioinformatics. 2013;29:1577–1579. doi: 10.1093/bioinformatics/btt181.
- 79.Licata L, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2011;40:D857–D861. doi: 10.1093/nar/gkr930.
- 80.Orchard S, et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2013;42:D358–D363. doi: 10.1093/nar/gkt1115.
- 81.Breuer K, et al. InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic Acids Res. 2012;41:D1228–D1233. doi: 10.1093/nar/gks1147.
- 82.Fazekas D, et al. SignaLink 2—a signaling pathway resource with multi-layered regulatory networks. BMC Syst. Biol. 2013;7:7. doi: 10.1186/1752-0509-7-7.
- 83.Azuaje, F., Wang, H. & Bodenreider, O. Ontology-driven similarity approaches to supporting gene functional assessment. in Proceedings of the ISMB’2005 SIG Meeting on Bio-ontologies, 9–10 (2005).
- 84.Resnik, P. Using information content to evaluate semantic similarity in a taxonomy. in Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448–453 (1995).
- 85.Schriml LM, et al. Human disease ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2018;47:D955–D962. doi: 10.1093/nar/gky1032.
- 86.Schriml LM, et al. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2011;40:D940–D946. doi: 10.1093/nar/gkr972.
- 87.Yu G, Wang L-G, Yan G-R, He Q-Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2014;31:608–609. doi: 10.1093/bioinformatics/btu684.
Data Availability Statement
The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary Information files. Extra data are available from the corresponding author upon request.
The source code for reproducing the analysis was developed in Python 3.6 and is available as a GitHub repository at https://github.com/reemagit/flowcentrality.