Abstract
Disease–disease relationships (e.g., disease comorbidities) play crucial roles in the pathobiological manifestations of diseases and in personalized approaches to managing those conditions. In this study, we develop a network-based methodology, termed the meta-path-based Disease Network (mpDisNet) capturing algorithm, to infer disease–disease relationships by assembling four biological networks: disease–miRNA, miRNA–gene, disease–gene, and the human protein–protein interactome. mpDisNet uses a meta-path-based random walk to reconstruct the heterogeneous neighbors of a given node and a heterogeneous skip-gram model to learn the network representation of the nodes. We find that mpDisNet performs well in inferring clinically reported disease–disease relationships, outperforming traditional gene/miRNA-overlap approaches. In addition, mpDisNet identifies network-based comorbidities for pulmonary diseases driven by underlying miRNA-mediated pathobiological pathways (i.e., hsa-let-7a- or hsa-let-7b-mediated airway epithelial apoptosis and pro-inflammatory cytokine pathways), as derived from human interactome network analysis. mpDisNet offers a powerful tool for network-based identification of disease–disease relationships with miRNA-mediated pathobiological pathways.
Subject terms: Computational biology and bioinformatics, Cardiology, Regulatory networks
Disease comorbidity analysis from microRNA regulatory networks
Identifying disease–disease relationships plays an essential role in the development of personalized approaches to managing those conditions. A team led by Dr. Feixiong Cheng at Cleveland Clinic developed a network-based methodology (termed mpDisNet) to infer clinically relevant disease–disease relationships from miRNA regulatory networks. mpDisNet shows high performance in identifying clinically reported disease–disease relationships through a unique integration of miRNA–gene–disease networks under the human interactome model. Importantly, mpDisNet successfully identifies a network-based relationship between asthma and chronic obstructive pulmonary disease driven by underlying miRNA-mediated pathobiological pathways. From a translational perspective, if broadly applied, mpDisNet would offer a powerful network-based tool for understanding clinical comorbidities of multiple complex diseases from heterogeneous biological networks, a significant challenge in precision medicine.
Introduction
The manifestation and clinical severity of human diseases are affected by myriad factors, including genetic, epigenetic, lifestyle, and various environmental variables.1 Identifying disease–disease relationships not only offers insights into disease heterogeneity but also reveals the etiology and pathogenesis of disease comorbidities,2,3 thus driving the development of effective therapeutic strategies.4,5 Previous studies designed to map comprehensive disease–disease connections focused mainly on known associations among diseases and their associated genes/proteins. However, the predisposition to human disease is dictated by a complex, polygenic, and pleiotropic genetic architecture.6 Moreover, some complex diseases that are driven mainly by environmental or acquired triggers display more limited genetic risk. Thus, traditional bioinformatics analysis of genetic risk factors has limited power to detect the true breadth of complex disease–disease relationships.
Beyond genetic analysis, shared patterns of gene expression offer another way to inspect disease–disease relationships.6 Altered and dysregulated gene expression can be caused by several biological mechanisms, including microRNA (miRNA) dysregulation. In 1993, Ambros and colleagues discovered the first miRNA (lin-4) in the nematode Caenorhabditis elegans, revealing for the first time the essential function of miRNAs in the posttranscriptional regulation of gene expression.7 MiRNAs are a class of endogenous, small, non-coding RNAs (~22 nucleotides) that inhibit the expression of target mRNAs at the posttranscriptional level.8 Specifically, miRNAs regulate target genes by partially or completely pairing with the 3′ untranslated region (UTR) of their target mRNAs, thereby reducing the stability of the target mRNA or inhibiting its translation, which downregulates expression of the target gene.9 The resulting regulatory network is complex: a single miRNA can regulate the expression of multiple genes, and combinations of several miRNAs can finely regulate the expression of multiple genes. Thus, shared patterns of miRNA-regulated gene expression may offer a way to inspect disease–disease relationships.
Currently, more than 30,000 miRNAs across ~200 species have been identified.10 Accumulating empirical evidence shows that miRNAs are closely related to the development, progression, and prognosis of multiple diseases, such as pulmonary vascular disease.11,12 However, it is not obvious whether ascertaining the comprehensive breadth of miRNA-mediated gene networks offers the discerning power to reveal important disease–disease relationships. Recent network modeling of the human protein–protein interactome has shown that network-based approaches can identify disease–disease relationships2 and drug–disease associations.4
In this study, we developed a network-based methodology, termed the meta-path-based Disease Network (mpDisNet) capturing algorithm, to infer new disease–disease relationships from a miRNA-mediated network perspective. We built a heterogeneous miRNA–gene–disease network by assembling four biological networks: disease–miRNA, miRNA–gene, gene–disease, and the human protein–protein interactome (Table 1). mpDisNet searches a specific meta-path (a meta-path is a sequence of node types, defined on the network schema, that links two specified nodes in a heterogeneous network) using a random walk algorithm13 to reconstruct the heterogeneous neighbors of a node. We then used a heterogeneous skip-gram model14 to learn the network representation of the nodes in mpDisNet (Fig. 1). We found that mpDisNet performed better in inferring disease–disease relationships than traditional miRNA-overlap approaches. t-Distributed stochastic neighbor embedding (t-SNE) analysis15 of the disease representations learned from the disease–miRNA–gene and disease–gene networks showed that mpDisNet can effectively distinguish different classes of human diseases, offering potential pathobiological implications. We further identified pulmonary disease comorbidities (e.g., lung cancer–asthma and asthma–chronic obstructive pulmonary disease) with potential miRNA-mediated pathobiological mechanisms. If broadly applied, mpDisNet would offer a powerful network-based tool for identifying disease–disease relationships for multiple complex diseases from heterogeneous biological networks.
Table 1.
| Network | Node type | # of nodes | # of links (edges) |
|---|---|---|---|
| Disease–miRNA | Diseases | 394 | 7669 |
|  | miRNAs | 691 |  |
| miRNA–gene | miRNAs | 568 | 163,090 |
|  | Genes | 14,762 |  |
| Disease–gene | Diseases | 394 | 50,589 |
|  | Genes | 2684 |  |
| The human interactome | Proteins | 16,706 | 246,995 |
Note: The numbers of nodes and edges in each network are shown; the corresponding data resources are described in the Methods, and more details about those data resources are provided in the Supplementary Methods.
Results
Pipeline of mpDisNet
mpDisNet infers miRNA-mediated disease–disease relationships based on the topology of multiple networks among diseases, miRNAs, and genes (Fig. 1). The pipeline of mpDisNet has four key steps (see Methods section): (i) network data integration: we reconstructed a heterogeneous network by assembling four experimentally validated networks, namely the disease–miRNA, miRNA–gene, disease–gene, and human interactome networks (Table 1); (ii) meta-path-based random walks: we reconstructed the heterogeneous neighbors of the nodes using meta-path-guided random walks and generated instance sequences;14 (iii) heterogeneous skip-gram: we generated a multidimensional vector for each disease by applying the skip-gram to the instance sequences; and (iv) network-based inference of disease–disease relationships: we calculated disease–disease cosine similarities from the multidimensional vectors generated by the skip-gram in step (iii). The detailed pipeline of mpDisNet is illustrated in Fig. 1.
Performance of mpDisNet
We compared mpDisNet with the miRNA-overlap measure on the experimentally validated disease–miRNA association network (see Methods section). Here, mpDisNet results were obtained by selecting meta-paths M1 (disease–miRNA–gene–gene–miRNA–disease) and M3 (disease–gene–gene–disease) in an integrated heterogeneous network (Fig. 1). For the miRNA-overlap measure, let Am denote the set of miRNAs associated with disease A and Bm the set of miRNAs associated with disease B. We calculated the disease–disease similarity based on the overlap measure as follows:
1 |
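The overlap baseline can be sketched in a few lines of Python. This is a minimal sketch only: because the exact normalization of Eq. 1 may differ, the code assumes a Jaccard-style overlap |Am ∩ Bm| / |Am ∪ Bm|, one common choice, and the function names and toy associations are hypothetical.

```python
from itertools import combinations


def mirna_overlap(a_mirnas: set, b_mirnas: set) -> float:
    """Jaccard-style overlap |Am ∩ Bm| / |Am ∪ Bm| (assumed form of Eq. 1)."""
    union = a_mirnas | b_mirnas
    if not union:
        return 0.0  # neither disease has any known miRNA
    return len(a_mirnas & b_mirnas) / len(union)


def all_pair_overlaps(disease_mirnas: dict) -> dict:
    """Score every unordered disease pair; keys are (disease_a, disease_b) sorted by name."""
    return {
        (a, b): mirna_overlap(disease_mirnas[a], disease_mirnas[b])
        for a, b in combinations(sorted(disease_mirnas), 2)
    }


# Hypothetical toy associations, for illustration only (not database records).
toy = {
    "Asthma": {"hsa-let-7a", "hsa-mir-155"},
    "COPD": {"hsa-let-7a", "hsa-mir-20a"},
    "Lung Neoplasms": {"hsa-mir-155", "hsa-mir-34a"},
}
scores = all_pair_overlaps(toy)
```

With these toy sets, asthma and COPD share one of three distinct miRNAs (score 1/3), while COPD and lung neoplasms share none (score 0). A measure of this kind can only link diseases that share miRNAs directly, which is the limitation the meta-path approach is meant to address.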
We selected the 300 disease pairs with the highest similarity (Supplementary Table 1) obtained by the miRNA-overlap measure and by mpDisNet, and plotted the corresponding network graphs for the miRNA-overlap measure (Fig. 2a) and mpDisNet (Fig. 2b). Each disease node is colored according to the pathobiological classification from a previous study.16 Overall, mpDisNet (Fig. 2b) captures clinically reported disease–disease comorbidities within the same pathobiological categories better than the miRNA-overlap measure (Fig. 2a). For example, associations among obesity (MeSH ID: D009765), diabetes mellitus (MeSH ID: D003920), cystic fibrosis (MeSH ID: D003550), osteoporosis (MeSH ID: D010024), and metabolic syndrome X (MeSH ID: D024821) are well captured by mpDisNet (Fig. 2b). For cardiovascular disease, mpDisNet also identifies significant associations among heart disease (myocardial infarction), coronary artery disease, atherosclerosis, ischemia, and hypertension (Fig. 2b). For neurological diseases, the mpDisNet-predicted relationships among schizophrenia, bipolar disorder, and Alzheimer’s disease are consistent with a recent study.6 Finally, mpDisNet identifies strong associations among multiple types of cancer, consistent with recent pan-cancer studies.17,18 Altogether, mpDisNet recovers well-known disease–disease relationships.
To further validate the performance of mpDisNet, we collected 220 clinically reported disease–disease pairs from a previous study.19 All 220 pairs were correctly re-identified by mpDisNet, whereas the miRNA-overlap measure identified only 120 of them. Figure 3 shows the network map of the 100 comorbid disease pairs predicted by mpDisNet (Supplementary Table 2) that were not identified by the miRNA-overlap measure. For example, mpDisNet successfully identifies the associations of autoimmune lymphoproliferative syndrome with bipolar disorder, cataract, celiac disease, and Crohn disease. In addition, mpDisNet associates cerebral infarction with several diseases or syndromes, including Friedreich ataxia, long QT syndrome, multiple endocrine neoplasia type 1, osteogenesis imperfecta, retinitis pigmentosa, hereditary hemorrhagic telangiectasia, and thalassemia (Fig. 3 and Supplementary Table 2).
We next evaluated the receiver operating characteristic (ROC) and precision-recall curves on 66 clinically reported disease–disease pairs (Supplementary Table 3) derived from a previously published implicit semantic similarity measure.20 mpDisNet showed reasonable accuracy (area under the ROC curve [AUROC] = 0.65 and area under the precision-recall curve [AUPR] = 0.68, Fig. 4) in inferring the clinically reported disease–disease pairs, outperforming the miRNA-overlap measure (AUROC = 0.59 and AUPR = 0.56, Fig. 4). In addition, mpDisNet showed reasonable accuracy (AUROC = 0.67 and AUPR = 0.66) in inferring clinically reported disease–disease pairs on an external validation set,21 suggesting good generalizability. Altogether, mpDisNet infers disease–disease relationships more accurately than the traditional miRNA-overlap measure.
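AUROC and AUPR can both be computed from a list of predicted pair scores with binary labels (1 for a clinically reported pair, 0 otherwise). The sketch below is a dependency-free illustration, not the evaluation code used in this study. It computes AUROC as the Mann-Whitney probability that a random positive pair outscores a random negative pair, and it approximates AUPR by average precision, which is an assumption because other AUPR interpolations exist.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPR approximated as the mean precision at the rank of each true positive.

    Tied scores keep their input order (Python's sort is stable), so results
    with ties depend on that order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive pair")
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / n_pos
```

For example, scores (0.9, 0.8, 0.7, 0.6) with labels (1, 0, 1, 0) give AUROC = 0.75 and average precision = 5/6.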
Biological interpretation of mpDisNet
We next investigated whether the underlying miRNA-mediated subnetworks identified by mpDisNet can offer potential pathobiological mechanisms for the inferred disease–disease relationships. Specifically, we integrated two networks into a single heterogeneous network and evaluated two meta-paths, M1 (disease–miRNA–gene–gene–miRNA–disease) and M3 (disease–gene–gene–disease), as shown in Fig. 1. The multidimensional vectors for the two meta-paths were obtained by random walk and skip-gram and then concatenated to infer disease–disease relationships (see Methods). We then performed dimensionality reduction and visualization using the t-SNE algorithm.22 We removed diseases with unknown classification and kept only pathobiological categories containing at least seven diseases with well-known annotations. In the dimensionality reduction diagram (Fig. 5), a shorter distance between two diseases indicates a closer pathobiological relationship. Diseases in the same pathobiological category clustered together by their multidimensional vectors (Fig. 5), suggesting that mpDisNet can capture the underlying miRNA-mediated pathobiological pathways.
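The category filtering applied before the t-SNE projection can be expressed as a small helper. This sketch assumes a hypothetical mapping from disease name to pathobiological class, with unclassified diseases labeled "unknown" or left empty; the threshold of seven follows the text, and the t-SNE step itself is not shown.

```python
from collections import Counter


def filter_categories(disease_class: dict, min_size: int = 7) -> dict:
    """Keep diseases whose class is known and has at least `min_size` members."""
    known = {
        d: c for d, c in disease_class.items()
        if c and c.lower() != "unknown"  # drop empty or "unknown" classifications
    }
    counts = Counter(known.values())  # number of diseases per class
    return {d: c for d, c in known.items() if counts[c] >= min_size}
```

Unknown diseases are removed before counting, so they cannot help a small category reach the threshold.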
Network-based identification of miRNA-mediated pathobiological pathways between lung cancer and asthma
As shown in Fig. 2b, we found a strong association of cancers (e.g., lung neoplasms) with asthma and chronic obstructive pulmonary disease (COPD). This finding is consistent with recent meta-analyses suggesting potential associations of COPD and asthma with several cancer types, such as lung cancer.23,24 For example, shortness of breath and respiratory distress often increase the suffering of patients with advanced-stage lung cancer.23,24 However, the underlying disease pathways for lung cancer-associated asthma remain unclear. Asthma is a condition characterized by chronic inflammation of the lungs, including airway hyper-reactivity, excessive mucus formation, and respiratory obstruction. We hypothesized that lung cancer-associated asthma may arise from the tumor microenvironment, for example, through pro-inflammatory cross-talk pathways. Similarly, recent studies showed that microenvironmental inflammation driven by tumor cell–immune cell cross-talk may induce lung cancer-associated pulmonary hypertension.25,26
We therefore performed a multi-layer human interactome network analysis via mpDisNet to inspect the miRNA-mediated pathobiological pathways for lung cancer-associated asthma (Fig. 6). For example, two highlighted miRNAs, hsa-mir-7a and hsa-mir-155, play important roles in both lung cancer27,28 and asthma29,30 and are involved in multiple meta-paths in Fig. 6. The miRNA hsa-mir-34a has been reported to act as a tumor suppressor by inhibiting non-small cell lung cancer (NSCLC) growth and suppressing CD44hi stem-like NSCLC cells.31,32 Meta-path-based network analysis within the human protein–protein interactome identified a meta-path, hsa-mir-34a-SAA1-APBB1, that may be involved in lung cancer-associated asthma (Fig. 6). SAA1, which encodes serum amyloid A1, activates the NLRP3 inflammasome and promotes asthma in mice.33 Thus, hsa-mir-34a, which mediates lung tumor growth, may also be involved in inflammasome-mediated pathways in asthma.
We next examined whether we could identify novel miRNA-mediated pathways for lung cancer-associated asthma. Figure 6 reveals a meta-path, hsa-mir-17-STK11/LKB1, that plays a key role in lung cancer by regulating cancer cell metabolism.34–36 STK11/LKB1 is also a central regulator of T cell development, activation, and metabolism,37 and T cells play an important functional role in asthma.38 Collectively, hsa-mir-17-STK11/LKB1 may offer a potential pathobiological pathway for lung cancer-associated asthma. In summary, the potential miRNA-mediated disease pathways captured by mpDisNet offer candidate biomarkers for understanding the pathobiological mechanisms of lung cancer-associated asthma. However, these candidate network biomarkers require further experimental or clinical validation.
Network-based identification of miRNA-mediated pathobiological pathways between COPD and asthma
Asthma and COPD are obstructive pulmonary diseases that affect millions of people worldwide.39 The two diseases differ in etiology, symptoms, type of airway inflammation, inflammatory cells, mediators, consequences of inflammation, response to therapy, and disease course.39 However, the similarities in airway inflammation between severe asthma and COPD, and the good response of both diseases to combination therapies, suggest that they may share some pathophysiologic characteristics.40,41
We next inspected the miRNA-mediated pathways between asthma and COPD. Both hsa-let-7a (differentially expressed in patients with severe asthma42) and hsa-let-7b play important roles in asthma by targeting pro-inflammatory pathways.29 Using mpDisNet, we found two meta-paths between asthma and COPD: hsa-let-7a-CASP3-CCND1-hsa-mir-20a and hsa-let-7b-CCND2-FOXO4-hsa-mir-499a (Fig. 7). Genetic studies and in vitro observations have shown potential associations of CCND1 and CCND2 with asthma and COPD.43–45 In addition, CASP3 has been reported to play a functional role in airway epithelial apoptosis,46,47 and FOXO4, a component of pro-inflammatory cytokine pathways, may contribute to the regulation of muscle atrophy and smooth muscle cell migration.48,49 Altogether, miRNA-mediated airway epithelial apoptosis and pro-inflammatory cytokine pathways (via hsa-let-7a and hsa-let-7b) may offer potential mechanisms for the overlap syndrome between asthma and COPD. Several other mpDisNet-predicted meta-paths, such as hsa-mir-148b-ADAM33-PGD-hsa-mir-1 and hsa-mir-221-ACTB-BUB1-hsa-mir-196a (Fig. 7), may also offer new pathobiological pathways to explain asthma–COPD comorbidity.50–54
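Listing the concrete instances of meta-path M1 between two diseases is one way to surface candidate miRNA–gene–gene–miRNA chains such as hsa-let-7a-CASP3-CCND1-hsa-mir-20a. The sketch below enumerates such chains exhaustively; it is an illustration under assumptions, not how mpDisNet scores disease pairs (mpDisNet uses random walks and embeddings). The adjacency dictionaries and toy edges are hypothetical, chosen only to reproduce the path named above, and are not database records.

```python
def m1_instances(a, b, disease_mirna, mirna_gene, ppi):
    """List (miRNA, gene, gene, miRNA) chains linking diseases a and b along M1.

    disease_mirna: disease -> set of miRNAs; mirna_gene: miRNA -> set of target
    genes; ppi: gene -> set of interacting genes (both directions stored).
    """
    b_mirnas = sorted(disease_mirna.get(b, ()))
    paths = []
    for m1 in sorted(disease_mirna.get(a, ())):    # disease a -> miRNA
        for g1 in sorted(mirna_gene.get(m1, ())):  # miRNA -> target gene
            for g2 in sorted(ppi.get(g1, ())):     # gene -> interacting gene
                for m2 in b_mirnas:                # gene <- miRNA of disease b
                    if g2 in mirna_gene.get(m2, ()):
                        paths.append((m1, g1, g2, m2))
    return paths


# Hypothetical toy edges chosen to reproduce the path named in the text.
disease_mirna = {"Asthma": {"hsa-let-7a"}, "COPD": {"hsa-mir-20a"}}
mirna_gene = {"hsa-let-7a": {"CASP3"}, "hsa-mir-20a": {"CCND1"}}
ppi = {"CASP3": {"CCND1"}, "CCND1": {"CASP3"}}
print(m1_instances("Asthma", "COPD", disease_mirna, mirna_gene, ppi))
# [('hsa-let-7a', 'CASP3', 'CCND1', 'hsa-mir-20a')]
```

Exhaustive enumeration grows quickly with node degree in the full networks, which is one reason sampling-based random walks are attractive at scale.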
Discussion
Understanding disease–disease relationships is important for the diagnosis, prevention, and treatment of human disease. Most existing comorbidity data come from analyses of patients’ medical records.3 This approach requires extensive data processing and is affected by many confounding factors. Recent advances in systems biology technologies and network medicine approaches have made it possible to predict disease comorbidities from the human protein–protein interactome.2,3 To integrate biological networks for predicting disease–disease relationships, we presented a network-based methodology, termed mpDisNet, that infers disease–disease relationships from a miRNA regulatory network perspective.
Specifically, we constructed a comprehensive, multi-layer biological network connecting diseases, miRNAs, and genes. We employed a skip-gram algorithm to obtain multidimensional feature vectors of diseases and then calculated disease–disease similarities from these informative vectors. We demonstrated that mpDisNet can identify both clinically reported and new disease–disease associations, outperforming the miRNA-overlap measure. Moreover, mpDisNet reveals miRNA-mediated pathobiological pathways by searching miRNA meta-paths within the human protein–protein interactome, as we showcased for lung cancer-associated asthma and asthma–COPD. However, comprehensive validation of more mpDisNet-predicted disease–disease relationships is warranted in the future.
We highlight several significant contributions of the current study. We assembled four comprehensive networks, namely disease–miRNA, miRNA–gene, disease–gene, and the human protein–protein interactome, to search for meta-paths with mpDisNet. In this way, we can utilize complementary information from different biological networks, unlike traditional network-based approaches that use a single type of data.55,56 Network analysis further shows that integrating the miRNA-mediated network improves the ability to infer disease–disease relationships, offering a new network-based tool for assessing disease comorbidities. In addition, the network-based framework presented in mpDisNet could be applied to the prediction of drug–target interactions, gene–gene (protein–protein) interactions, RNA–RNA interactions, and other biological networks. Finally, the new disease–disease relationships inferred by mpDisNet may offer candidate network biomarkers for better understanding of the underlying pathobiological pathways from a miRNA network perspective.
We acknowledge several potential limitations of the current network-based framework of mpDisNet. First, when few miRNAs are known to be associated with a disease, the comorbidities computed for that disease from miRNA-mediated networks may be false positives. Second, literature data bias (e.g., the higher degree/connectivity of well-studied miRNAs/proteins) may increase the false positive rate. Third, each random walk requires a specific meta-path, and the choice of this single meta-path may also affect the performance of mpDisNet. In the future, we may improve mpDisNet by integrating more comprehensive biological networks, analyzing the relevant associations in tissue-specific networks of the tissues in which the disease occurs, and adopting more flexible random walk strategies.
In summary, this study offers a network-based, systems biology methodology for comprehensive identification of disease–disease relationships from a miRNA regulatory network perspective. From a translational perspective, if broadly applied, mpDisNet would offer a powerful network-based tool for understanding clinical comorbidities of multiple complex diseases from heterogeneous biological networks, a significant challenge in precision medicine.
Methods
Reconstruction of heterogeneous networks
We reconstructed a heterogeneous miRNA–gene–disease network by assembling four types of networks: (a) disease–miRNA, (b) miRNA–gene, (c) disease–gene, and (d) the human protein–protein interactome networks.
Disease–miRNA network
We collected experimentally validated disease–miRNA associations from two databases: miR2Disease57 and HMDD v3.0.58 All disease terms were annotated using Medical Subject Headings (MeSH) and Unified Medical Language System (UMLS) vocabularies.59 The disease–miRNA associations from the two databases were combined, and duplicate associations were removed. In total, we kept 7669 associations connecting 691 miRNAs with 394 diseases in this study.
miRNA–gene network
We collected the known miRNA targets to build miRNA–gene networks from miRTarBase database.60 We annotated all protein-coding genes using gene Entrez ID, chromosomal location, and the official gene symbols from the National Center for Biotechnology Information (NCBI) database.61 In this study, we only kept the data from Homo sapiens. After excluding duplicate associations, 163,090 miRNA–gene associations connecting 568 miRNAs with 14,762 human genes were used.
Disease–gene network
We assembled disease–gene associations from four public databases: the Online Mendelian Inheritance in Man (OMIM),62 HuGE Navigator,63 PharmGKB,64 and Comparative Toxicogenomics Database (CTD).65 All disease terms were annotated using MeSH vocabularies,66 and the genes were annotated using the Entrez IDs and official gene symbols from the NCBI database.66 Duplicated pairs from different data sources were deleted. In total, we obtained 50,589 disease–gene associations connecting 2684 genes with 394 unique disease terms.
The human protein–protein interactome
To build a comprehensive human protein–protein interactome, we focused on high-quality protein–protein interactions (PPIs) with five types of experimental evidence: (i) binary PPIs tested by high-throughput yeast-two-hybrid (Y2H) systems;67,68 (ii) kinase–substrate interactions from literature-derived low-throughput and high-throughput experiments; (iii) literature-curated PPIs identified by affinity purification followed by mass spectrometry (AP-MS), by Y2H, and by literature-derived low-throughput experiments; (iv) PPIs from protein three-dimensional (3D) structures; and (v) signaling networks supported by literature-derived low-throughput experiments. The genes were mapped to their Entrez IDs based on the NCBI database61 and to their official gene symbols based on GeneCards (http://www.genecards.org/). Duplicate PPIs and all computationally predicted data (e.g., from evolutionary analysis, metabolic associations, and gene co-expression) were removed. The resulting updated human interactome used in this study includes 246,995 PPIs connecting 16,706 unique proteins. Detailed descriptions are provided in our recent studies.4,5
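Merging the four association sources into one heterogeneous network, with duplicate associations removed, can be sketched as follows. This is a minimal illustration under assumptions: the type labels ("disease", "miRNA", "gene") and the input layout are hypothetical, and interactome proteins are treated as gene nodes because both the miRNA targets and the PPIs are mapped to Entrez IDs.

```python
from collections import defaultdict


def build_network(edge_lists: dict):
    """Merge typed edge lists into one undirected heterogeneous network.

    edge_lists maps a (type_u, type_v) pair, e.g. ("disease", "miRNA"), to an
    iterable of (u, v) name pairs. Nodes are keyed as (type, name) so that a
    disease and a gene with the same name never collide.
    Returns (adjacency dict, number of unique edges).
    """
    adj = defaultdict(set)
    unique_edges = set()
    for (type_u, type_v), edges in edge_lists.items():
        for u, v in edges:
            a, b = (type_u, u), (type_v, v)
            # frozenset makes (a, b) and (b, a) the same edge, so an association
            # listed twice, or by several sources, is counted once.
            unique_edges.add(frozenset((a, b)))
            adj[a].add(b)  # sets also absorb duplicate neighbours
            adj[b].add(a)
    return dict(adj), len(unique_edges)
```

For instance, passing the same disease–miRNA association twice, or one PPI in both directions, still yields a single edge.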
Meta-path-based random walks
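We employed a meta-path-based random walk to capture the semantic and structural correlations between different types of nodes. Given a heterogeneous network G = (V, E, F) and a meta-path $P: V_1 \xrightarrow{R_1} V_2 \xrightarrow{R_2} \cdots V_f \xrightarrow{R_f} V_{f+1} \cdots \xrightarrow{R_{l-1}} V_l$, the transition probability at step i is defined as follows:

$$
p(v^{i+1} \mid v_f^i, P) =
\begin{cases}
\dfrac{1}{|N_{f+1}(v_f^i)|}, & (v^{i+1}, v_f^i) \in E,\ \phi(v^{i+1}) = f+1 \\
0, & (v^{i+1}, v_f^i) \in E,\ \phi(v^{i+1}) \neq f+1 \\
0, & (v^{i+1}, v_f^i) \notin E
\end{cases}
\tag{2}
$$

where $v_f^i \in V_f$, $\phi(\cdot)$ maps a node to its type, and $N_{f+1}(v_f^i)$ represents the set of nodes of type $V_{f+1}$ in the neighborhood of node $v_f^i$. In other words, $v^{i+1} \in V_{f+1}$: the walk is conditioned on the preset meta-path P. Moreover, meta-paths are generally symmetric, that is, the first node type $V_1$ is the same as the last one $V_l$, which allows the meta-path to be applied recursively during random walks, i.e.,

$$
p(v^{i+1} \mid v_f^i) = p(v^{i+1} \mid v_1^i), \quad \text{if } f = l
\tag{3}
$$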
The meta-path-based random walk strategy ensures that the semantic relationships among different types of nodes are properly conserved in the reconstructed heterogeneous network.
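A meta-path-guided walk following Eqs. 2 and 3 can be sketched in Python. The sketch assumes a hypothetical typed adjacency map keyed by (type, name) and a symmetric meta-path given as a list of node types. At each step the next node is drawn uniformly from the current node's neighbors of the required type, and the walk stops early at a dead end, a choice the text does not specify. It also assumes that the "50 steps" in the Methods counts traversals of the full meta-path: M1 adds five nodes per traversal, so 50 × 5 + 1 = 251 nodes, which matches the stated walk length.

```python
import random


def metapath_walk(adj, start, metapath, walk_length, rng=None):
    """Generate one random walk guided by a symmetric meta-path (Eqs. 2-3).

    adj: dict mapping (type, name) -> set of (type, name) neighbours.
    metapath: list of node types whose first and last entries are equal,
        e.g. ["disease", "miRNA", "gene", "gene", "miRNA", "disease"].
    walk_length: maximum number of nodes in the returned walk.
    """
    if metapath[0] != metapath[-1] or start[0] != metapath[0]:
        raise ValueError("meta-path must be symmetric and begin with the start node's type")
    rng = rng if rng is not None else random.Random()
    cycle = metapath[:-1]  # Eq. 3: after V_l the walk continues from V_1
    walk = [start]
    while len(walk) < walk_length:
        next_type = cycle[len(walk) % len(cycle)]
        # N_{f+1}(v): neighbours of the current node that have the required type
        candidates = sorted(n for n in adj.get(walk[-1], ()) if n[0] == next_type)
        if not candidates:
            break  # dead end (assumption: stop the walk early)
        walk.append(rng.choice(candidates))  # uniform probability 1/|N_{f+1}(v)|
    return walk
```

Sorting the candidates makes a walk reproducible for a seeded `random.Random`; repeating the call 1000 times per disease would give the instance sequences described in the Methods.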
Heterogeneous skip-gram
Furthermore, we employed a heterogeneous skip-gram representation learning model.13 The heterogeneous skip-gram extends the original skip-gram model by accounting for different node types. For a heterogeneous network G = (V, E, F), each node ν and each edge e are associated with mapping functions $\phi(\nu)$ and $\psi(e)$, which assign their node and edge types, respectively. Given a node ν, the model maximizes the probability of its heterogeneous context $N_f(\nu)$, as follows:

$$
\arg\max_{\theta} \sum_{v \in V} \sum_{f \in F} \sum_{c_f \in N_f(v)} \log p(c_f \mid v; \theta)
\tag{4}
$$
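where $N_f(\nu)$ denotes the neighborhood of ν with the fth type of nodes. The conditional probability $p(c_f \mid v; \theta)$ is defined as a softmax function69 restricted to a specific node type f,70 as follows:

$$
p(c_f \mid v; \theta) = \frac{e^{X_{c_f} \cdot X_v}}{\sum_{u_f \in V_f} e^{X_{u_f} \cdot X_v}}
\tag{5}
$$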
where $X_v$ is the vth row of X, i.e., the embedding vector of node v, and $V_f$ is the set of nodes of type f in the network. This specifies a multinomial distribution for each node type in the output layer of the skip-gram. Following negative sampling71 in word2vec,72 the above function is approximated as follows:

$$
O(X) = \log \sigma(X_{c_f} \cdot X_v) + \sum_{m=1}^{M} \mathbb{E}_{u_f^m \sim P_f(u_f)}\left[\log \sigma(-X_{u_f^m} \cdot X_v)\right]
\tag{6}
$$
where $\sigma(x) = 1/(1 + e^{-x})$ and $P_f(u_f)$ is the pre-defined distribution over nodes of type f, the type of the neighbor $c_f$, from which a negative node $u_f^m$ is drawn M times.
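The gradients of this objective are derived as follows:

$$
\frac{\partial O(X)}{\partial X_{u_f^m}} = \left(\mathbb{I}_{c_f}[u_f^m] - \sigma(X_{u_f^m} \cdot X_v)\right) X_v
\tag{7}
$$

$$
\frac{\partial O(X)}{\partial X_v} = \sum_{m=0}^{M} \left(\mathbb{I}_{c_f}[u_f^m] - \sigma(X_{u_f^m} \cdot X_v)\right) X_{u_f^m}
\tag{8}
$$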
where $\mathbb{I}_{c_f}[u_f^m]$ is an indicator function that equals 1 if $u_f^m$ is the neighborhood context node $c_f$ and 0 otherwise; when m = 0, $u_f^0 = c_f$. The model is optimized using the stochastic gradient descent algorithm.73
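One stochastic update of the type-aware negative-sampling objective of Eq. 6, using the gradients of Eqs. 7 and 8, can be sketched as below. Several assumptions are hypothetical: embeddings are plain Python lists; separate input (X) and context (C) tables are kept, as in common word2vec implementations, whereas the equations write a single embedding matrix X; and negatives are drawn uniformly from nodes of the context's type as a stand-in for $P_f(u_f)$. The update moves uphill on the objective, which is equivalent to descending its negative.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def sgd_step(X, C, v, c, nodes_by_type, node_type, M=5, lr=0.025, rng=None):
    """Update embeddings for one (centre v, context c) pair.

    X, C: dicts mapping node -> list of floats (input and context embeddings).
    nodes_by_type: dict type -> list of nodes; node_type: dict node -> type.
    Negatives come only from nodes sharing c's type (type-specific softmax).
    """
    rng = rng if rng is not None else random.Random()
    pool = nodes_by_type[node_type[c]]
    samples = [c] + [rng.choice(pool) for _ in range(M)]  # m = 0 is the context node
    grad_v = [0.0] * len(X[v])
    for u in samples:
        indicator = 1.0 if u == c else 0.0        # I_{c_f}[u_f^m]
        g = indicator - sigmoid(dot(C[u], X[v]))  # scalar factor in Eqs. 7-8
        for k in range(len(grad_v)):
            grad_v[k] += g * C[u][k]              # accumulate Eq. 8 with the old C[u]
            C[u][k] += lr * g * X[v][k]           # step along Eq. 7
    for k in range(len(grad_v)):
        X[v][k] += lr * grad_v[k]                 # step along Eq. 8
```

Repeated over every (node, context) pair inside a sliding window of each walk, updates of this kind pull co-occurring nodes together and push same-type negatives apart.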
Network-based inferring disease–disease relationships
The network-based similarity between two diseases can be calculated from a single meta-path or from multiple meta-paths. In this study, we evaluated three meta-paths (M1, M2, and M3) to infer disease–disease relationships. For M1 (disease–miRNA–gene–gene–miRNA–disease), as shown in Fig. 1, we performed random walks on the disease–miRNA–gene heterogeneous network guided by meta-path M1 for 50 steps; each walk includes 251 nodes. We ran 1000 random walks for each disease, generating 1000 instance sequences per disease. By feeding all the sequences into the heterogeneous skip-gram, we obtained the representation vector of each disease and then calculated the cosine similarity between diseases based on these vectors. The disease similarities for meta-paths M2 (disease–miRNA–gene–gene–gene–miRNA–disease) and M3 (disease–gene–gene–disease) were calculated in the same way. To predict disease–disease relationships from multiple meta-paths, we concatenated the representation vectors learned from each meta-path and calculated the cosine similarity between the concatenated vectors. For this purpose, we assembled the disease–miRNA–gene network and the disease–gene network into a single heterogeneous network, in which we selected meta-paths M1 and M3. The multidimensional vectors for the two meta-paths were obtained by random walk and skip-gram and then concatenated to infer disease–disease relationships. The detailed network-based analyses are provided in our recent studies.4,5,74
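Concatenating per-meta-path disease vectors and ranking disease pairs by cosine similarity, as done for the top-scoring pairs in Fig. 2b, can be sketched as follows. The input layout is a hypothetical assumption: one dict per meta-path (e.g., M1 and M3) mapping disease names to fixed-length vectors, with diseases missing from any meta-path skipped.

```python
import math
from itertools import combinations


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_pairs(per_path_vecs, k):
    """Rank disease pairs by cosine similarity of concatenated meta-path vectors.

    per_path_vecs: list of dicts, one per meta-path (e.g. M1, M3),
        each mapping disease -> embedding vector.
    Returns the k highest-scoring ((disease_a, disease_b), similarity) tuples.
    """
    # Only diseases embedded under every meta-path can be concatenated.
    diseases = sorted(set.intersection(*(set(vecs) for vecs in per_path_vecs)))
    concat = {d: [x for vecs in per_path_vecs for x in vecs[d]] for d in diseases}
    scored = [((a, b), cosine(concat[a], concat[b])) for a, b in combinations(diseases, 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A call such as `top_pairs([m1_vecs, m3_vecs], 300)` (hypothetical variable names) would return the 300 most similar pairs. Concatenation before the cosine effectively gives each meta-path a weight proportional to the squared norm of its vectors, so normalizing each block first is a reasonable variant to consider.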
Acknowledgements
This work was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K99HL138272 and R00HL138272 to F.C. This work was partly supported by NIH grants R01 HL124021, HL122596, HL138437, and UH2 TR002073 as well as the American Heart Association Established Investigator Award 18EIA33900027 (S.Y.C.).
Author contributions
F.C. conceived the study. S.J., X.Z. and F.C. performed all experiments and analysis. J.L., J.F. and S.Y.C. performed data analysis. S.Y.C. and S.C.E. critically discussed the paper. F.C., S.J. and S.C.E. wrote and critically reviewed the paper.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files, and https://github.com/ChengF-Lab/mpDisNet.
Code availability
Custom codes used in this study are available at https://github.com/ChengF-Lab/mpDisNet.
Competing interests
S.Y.C. has served as a consultant for Zogenix, Vivus, Aerpio, and United Therapeutics; S.Y.C. is a director, officer, and shareholder in Numa Therapeutics; S.Y.C. holds research grants from Actelion and Pfizer. S.Y.C. has filed patent applications regarding the targeting of metabolism in pulmonary hypertension.
Footnotes
These authors contributed equally: Shuting Jin, Xiangxiang Zeng
Supplementary information
Supplementary information is available for this paper at 10.1038/s41540-019-0115-2.
References
- 1. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 2011;12:56–68. doi: 10.1038/nrg2918.
- 2. Menche J, et al. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347:1257601. doi: 10.1126/science.1257601.
- 3. Zhou X, et al. A systems approach to refine disease taxonomy by integrating phenotypic and molecular networks. EBioMedicine. 2018;31:79–91. doi: 10.1016/j.ebiom.2018.04.002.
- 4. Cheng F, et al. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nat. Commun. 2018;9:2691. doi: 10.1038/s41467-018-05116-5.
- 5. Cheng F, Kovacs I, Barabasi AL. Network-based prediction of drug combinations. Nat. Commun. 2019;10:1197. doi: 10.1038/s41467-019-09186-x.
- 6. Gandal MJ, et al. Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science. 2018;359:693–697. doi: 10.1126/science.aad6469.
- 7. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-Y.
- 8. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/S0092-8674(04)00045-5.
- 9. Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. doi: 10.1038/nature02871.
- 10. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42:D68–D73. doi: 10.1093/nar/gkt1181.
- 11. Parikh VN, et al. MicroRNA-21 integrates pathogenic signaling to control pulmonary hypertension: results of a network bioinformatics approach. Circulation. 2012;125:1520–1532. doi: 10.1161/CIRCULATIONAHA.111.060269.
- 12. Bertero T, et al. Matrix remodeling promotes pulmonary hypertension through feedback mechanoactivation of the YAP/TAZ-miR-130/301 circuit. Cell Rep. 2015;13:1016–1032. doi: 10.1016/j.celrep.2015.09.049.
- 13. Dong Y, Chawla NV, Swami A. metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, NS, Canada: ACM; 2017. p. 135–144.
- 14. Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y. A closer look at skip-gram modelling. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC-2006). 2006. p. 1222–1225.
- 15. van der Maaten L, Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
- 16. Goh KI, et al. The human disease network. Proc. Natl Acad. Sci. USA. 2007;104:8685–8690. doi: 10.1073/pnas.0701361104.
- 17. Bailey MH, et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;174:1034–1035. doi: 10.1016/j.cell.2018.07.034.
- 18. Sondka Z, et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018;18:696–705. doi: 10.1038/s41568-018-0060-1.
- 19. Blair DR, et al. A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk. Cell. 2013;155:70–80. doi: 10.1016/j.cell.2013.08.030.
- 20. Mathur S, Dinakarpandian D. Finding disease similarity based on implicit semantic similarity. J. Biomed. Inform. 2012;45:363–371. doi: 10.1016/j.jbi.2011.11.017.
- 21. Li P, Nie Y, Yu J. Fusing literature and full network data improves disease similarity computation. BMC Bioinformat. 2016;17:326. doi: 10.1186/s12859-016-1205-4.
- 22. Sun X, Nobel AB. On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 2008;9:2431–2453.
- 23. Denholm R, Crellin E, Arvind A, Quint J. Asthma and lung cancer, after accounting for co-occurring respiratory diseases and allergic conditions: a systematic review protocol. BMJ Open. 2017;7:e013637.
- 24. Qu YL, et al. Asthma and the risk of lung cancer: a meta-analysis. Oncotarget. 2017;8:11614–11620. doi: 10.18632/oncotarget.14595.
- 25. Pullamsetti SS, et al. Lung cancer-associated pulmonary hypertension: Role of microenvironmental inflammation based on tumor cell-immune cell cross-talk. Sci. Transl. Med. 2017;9:eaai9048. doi: 10.1126/scitranslmed.aai9048.
- 26. Cheng F, Loscalzo J. Pulmonary comorbidity in lung cancer. Trends Mol. Med. 2018;24:239–241. doi: 10.1016/j.molmed.2018.01.005.
- 27. Yerukala Sathipati S, Ho SY. Identifying the miRNA signature associated with survival time in patients with lung adenocarcinoma using miRNA expression profiles. Sci. Rep. 2017;7:7507. doi: 10.1038/s41598-017-07739-y.
- 28. Chu D, et al. Quantitative proteomic analysis of the miR-148a-associated mechanisms of metastasis in non-small cell lung cancer. Oncol. Lett. 2018;15:9941–9952. doi: 10.3892/ol.2018.8581.
- 29. Polikepahad S, et al. Proinflammatory role for let-7 microRNAs in experimental asthma. J. Biol. Chem. 2010;285:30139–30149. doi: 10.1074/jbc.M110.145698.
- 30. Oglesby IK, McElvaney NG, Greene CM. MicroRNAs in inflammatory lung disease–master regulators or target practice? Respir. Res. 2010;11:148. doi: 10.1186/1465-9921-11-148.
- 31. Li XJ, Ren ZJ, Tang JH. MicroRNA-34a: a potential therapeutic target in human cancer. Cell Death Dis. 2014;5:e1327. doi: 10.1038/cddis.2014.270.
- 32. Shi Y, Liu C, Liu X, Tang DG, Wang J. The microRNA miR-34a inhibits non-small cell lung cancer (NSCLC) growth and the CD44hi stem-like NSCLC cells. PLoS ONE. 2014;9:e90022. doi: 10.1371/journal.pone.0090022.
- 33. Ather JL, et al. Serum amyloid A activates the NLRP3 inflammasome and promotes Th17 allergic asthma in mice. J. Immunol. 2011;187:64–73. doi: 10.4049/jimmunol.1100500.
- 34. Sanchez-Cespedes M. The role of LKB1 in lung cancer. Fam. Cancer. 2011;10:447–453. doi: 10.1007/s10689-011-9443-0.
- 35. Facchinetti F, et al. LKB1/STK11 mutations in non-small cell lung cancer patients: Descriptive analysis and prognostic value. Lung Cancer. 2017;112:62–68. doi: 10.1016/j.lungcan.2017.08.002.
- 36. Izreig S, et al. The miR-17~92 microRNA cluster is a global regulator of tumor metabolism. Cell Rep. 2016;16:1915–1928. doi: 10.1016/j.celrep.2016.07.036.
- 37. MacIver NJ, et al. The liver kinase B1 is a central regulator of T cell development, activation, and metabolism. J. Immunol. 2011;187:4187–4198. doi: 10.4049/jimmunol.1100367.
- 38. Robinson DS. The role of the T cell in asthma. J. Allergy Clin. Immunol. 2010;126:1081–1091. doi: 10.1016/j.jaci.2010.06.025.
- 39. Martinez FD. Early-life origins of chronic obstructive pulmonary disease. N. Engl. J. Med. 2016;375:871–878. doi: 10.1056/NEJMra1603287.
- 40. Cukic V, Lovre V, Dragisic D, Ustamujic A. Asthma and chronic obstructive pulmonary disease (COPD) - differences and similarities. Mater. Sociomed. 2012;24:100–105. doi: 10.5455/msm.2012.24.100-105.
- 41. Postma DS, Rabe KF. The Asthma-COPD overlap syndrome. N. Engl. J. Med. 2015;373:1241–1249. doi: 10.1056/NEJMra1411863.
- 42. Rijavec M, Korosec P, Zavbi M, Kern I, Malovrh MM. Let-7a is differentially expressed in bronchial biopsies of patients with severe asthma. Sci. Rep. 2014;4:6103. doi: 10.1038/srep06103.
- 43. Du CL, et al. Up-regulation of cyclin D1 expression in asthma serum-sensitized human airway smooth muscle promotes proliferation via protein kinase C alpha. Exp. Lung Res. 2010;36:201–210. doi: 10.3109/01902140903290022.
- 44. Thun GA, Imboden M, Berger W, Rochat T, Probst-Hensch NM. The association of a variant in the cell cycle control gene CCND1 and obesity on the development of asthma in the Swiss SAPALDIA study. J. Asthma. 2013;50:147–154. doi: 10.3109/02770903.2012.757776.
- 45. Xaing M, Liu X, Zeng D, Wang R, Xu Y. Changes of protein kinase Calpha and cyclin D1 expressions in pulmonary arteries from smokers with and without chronic obstructive pulmonary disease. J. Huazhong Univ. Sci. Technol. Med. Sci. 2010;30:159–164. doi: 10.1007/s11596-010-0205-2.
- 46. Truong-Tran AQ, Grosser D, Ruffin RE, Murgia C, Zalewski PD. Apoptosis in the normal and inflamed airway epithelium: role of zinc in epithelial protection and procaspase-3 regulation. Biochem. Pharmacol. 2003;66:1459–1468. doi: 10.1016/S0006-2952(03)00498-2.
- 47. Demedts IK, Demoor T, Bracke KR, Joos GF, Brusselle GG. Role of apoptosis in the pathogenesis of COPD and pulmonary emphysema. Respir. Res. 2006;7:53. doi: 10.1186/1465-9921-7-53.
- 48. Okamoto T, Machida S. Changes in FOXO and proinflammatory cytokines in the late stage of immobilized fast and slow muscle atrophy. Biomed. Res. 2017;38:331–342. doi: 10.2220/biomedres.38.331.
- 49. Li H, et al. FoxO4 regulates tumor necrosis factor alpha-directed smooth muscle cell migration by activating matrix metalloproteinase 9 gene transcription. Mol. Cell Biol. 2007;27:2676–2686. doi: 10.1128/MCB.01748-06.
- 50. Kozmus CE, Potocnik U. Reference genes for real-time qPCR in leukocytes from asthmatic patients before and after anti-asthma treatment. Gene. 2015;570:71–77. doi: 10.1016/j.gene.2015.06.001.
- 51. Wang X, et al. Association of ADAM33 gene polymorphisms with COPD in a northeastern Chinese population. BMC Med. Genet. 2009;10:132. doi: 10.1186/1471-2350-10-132.
- 52. Wang X, et al. Genetic variants in ADAM33 are associated with airway inflammation and lung function in COPD. BMC Pulm. Med. 2014;14:173. doi: 10.1186/1471-2466-14-173.
- 53. Davies ER, et al. Soluble ADAM33 initiates airway remodeling to promote susceptibility for allergic asthma in early life. JCI Insight. 2016;1:e87632. doi: 10.1172/jci.insight.87632.
- 54. Domingo C, Palomares O, Sandham DA, Erpenbeck VJ, Altman P. The prostaglandin D2 receptor 2 pathway in asthma: a key player in airway inflammation. Respir. Res. 2018;19:189. doi: 10.1186/s12931-018-0893-x.
- 55. Li J, et al. Network-based identification of microRNAs as potential pharmacogenomic biomarkers for anticancer drugs. Oncotarget. 2016;7:45584–45596. doi: 10.18632/oncotarget.10052.
- 56. Li J, et al. Computational prediction of microRNA networks incorporating environmental toxicity and disease etiology. Sci. Rep. 2014;4:5576. doi: 10.1038/srep05576.
- 57. Jiang Q, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37:D98–D104. doi: 10.1093/nar/gkn714.
- 58. Li Y, et al. HMDDv2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42:D1070–D1074. doi: 10.1093/nar/gkt1023.
- 59. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267–D270. doi: 10.1093/nar/gkh061.
- 60. Hsu SD, et al. miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic Acids Res. 2014;42:D78–D85. doi: 10.1093/nar/gkt1266.
- 61. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2016;44:D7–D19. doi: 10.1093/nar/gkv1290.
- 62. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33:D514–D517. doi: 10.1093/nar/gki033.
- 63. Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ. A navigator for human genome epidemiology. Nat. Genet. 2008;40:124–125. doi: 10.1038/ng0208-124.
- 64. Hernandez-Boussard T, et al. The pharmacogenetics and pharmacogenomics knowledge base: accentuating the knowledge. Nucleic Acids Res. 2008;36:D913–D918. doi: 10.1093/nar/gkm1009.
- 65. Davis AP, et al. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res. 2011;39:D1067–D1072. doi: 10.1093/nar/gkq813.
- 66. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2013;41:D8–D20. doi: 10.1093/nar/gks1189.
- 67. Rolland T, et al. A proteome-scale map of the human interactome network. Cell. 2014;159:1212–1226. doi: 10.1016/j.cell.2014.10.050.
- 68. Rual JF, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178. doi: 10.1038/nature04209.
- 69. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013;35:1798–1828. doi: 10.1109/TPAMI.2013.50.
- 70. Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method. Preprint. 2014. https://arXiv.org/abs/1402.3722.
- 71. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc.; 2013. p. 3111–3119.
- 72. Rong X. word2vec parameter learning explained. Preprint. 2014. https://arXiv.org/abs/1411.2738.
- 73. Lan GH. An optimal method for stochastic composite optimization. Math. Program. 2012;133:365–397. doi: 10.1007/s10107-010-0434-y.
- 74. Cheng F, et al. A genome-wide positioning systems network algorithm for in silico drug repurposing. Nat. Commun. 2019;10:3476. doi: 10.1038/s41467-019-10744-6.
Data Availability Statement
The authors declare that the data supporting the findings of this study are available within the paper, in its supplementary information files, and at https://github.com/ChengF-Lab/mpDisNet.
Custom code used in this study is available at https://github.com/ChengF-Lab/mpDisNet.