Weighted gene co-expression network analysis revealed key biomarkers associated with the diagnosis of hypertrophic cardiomyopathy

Xin Li; Chenxin Wang; Xiaoqing Zhang; Jiali Liu; Yu Wang; Chunpu Li; Dongmei Guo

doi:10.1186/s41065-020-00155-9

. 2020 Oct 24;157:42. doi: 10.1186/s41065-020-00155-9

Weighted gene co-expression network analysis revealed key biomarkers associated with the diagnosis of hypertrophic cardiomyopathy

Xin Li ^1,^#, Chenxin Wang ^2,^#, Xiaoqing Zhang ^3,^#, Jiali Liu ⁴, Yu Wang ¹, Chunpu Li ^5,^✉, Dongmei Guo ^4,^✉

PMCID: PMC7585681 PMID: 33099311

Abstract

Objective

To reveal the molecular mechanism underlying the pathogenesis of HCM and find new effective therapeutic strategies using a systematic biological approach.

Methods

The WGCNA algorithm was applied to building the co-expression network of HCM samples. A sample cluster analysis was performed using the hclust tool and a co-expression module was constructed. The WGCNA algorithm was used to study the interactive connection between co-expression modules and draw a heat map to show the strength of interactions between modules. The genetic information of the respective modules was mapped to the associated GO terms and KEGG pathways, and the Hub Genes with the highest connectivity in each module were identified. The Wilcoxon test was used to verify the expression level of hub genes between HCM and normal samples, and the “pROC” R package was used to verify the possibility of hub genes as biomarkers. Finally, the potential functions of hub genes were analyzed by GSEA software.

Results

Seven co-expression modules were constructed using sample clustering analysis. GO and KEGG enrichment analysis judged that the turquoise module is an important module. The hub genes of each module are RPL35A for module Black, FH for module Blue, PREI3 for module Brown, CREB1 for module Green, LOC641848 for module Pink, MYH7 for module Turquoise and MYL6 for module Yellow. The results of the differential expression analysis indicate that MYH7 and FH are considered true hub genes. In addition, the ROC curves revealed their high diagnostic value as biomarkers for HCM. Finally, in the results of the GSEA analysis, MYH7 and FH highly expressed genes were enriched with the “proteasome” and a “PPAR signaling pathway,” respectively.

Conclusions

The MYH7 and FH genes may be the true hub genes of HCM. Their respective enriched pathways, namely the “proteasome” and the “PPAR signaling pathway,” may play an important role in the development of HCM.

Keywords: Hypertrophic cardiomyopathy, WGCNA, Hub gene

Introduction

Hypertrophic cardiomyopathy (HCM) is the most common genetic heart disease with a prevalence of approximately 1:500 [18]. HCM is characterized by ventricular hypertrophy, with clinical features in patients ranging from asymptomatic to heart failure and sudden cardiac death. At present, symptomatic treatment is still used to delay the progress of the disease, while traditional treatment is still unable to reverse the disease. Since the first chromosomal location was mapped in 1989 [13], variants in numerous genes have been reported to cause HCM. The clinical diagnostic for genetic testing of HCM is increasingly becoming a part of the mainstream clinical management of patients and playing a key role in the cascade detection of family members [5, 10]. Previous studies have shown that in more than 50% of HCM samples, the disease is caused by mutations in genes encoding cardiac myosin, such as MYH7, TNNI3, MYBPC3 and MYL3. However, the molecular mechanism underlying the pathogenesis of HCM remains unclear, and new advanced analytic strategies and genomic technologies provide opportunities to explore the genetic structure of HCM. Therefore, this is an opportunity to reveal the molecular mechanism of HCM and find new effective therapeutic strategies.

In a number of computational studies, disease risk modules have been developed to provide important measurement for the diagnosis of genetic mutations and the development of new treatment strategies [11, 27, 28]. WGCNA (weighted gene co-expression network analysis) is a powerful genetic analysis strategy. It is a systematic biological analysis method based on “guilt-by-association.” This method is used to identify gene modules that can be used as candidate biomarkers or therapeutic targets [8, 16]. It is currently widely used for research and analysis of schizophrenia [7, 21], cancer [6], intracranial aneurysm [29] and other diseases. By using WGCNA, we can create co-expression networks to find differentially relevant gene clusters and carry out gene-specific analysis [2, 27, 28].

In this study, we used this method to analyze a large amount of HCM genetic data to find genes and pathways that play an important role in the occurrence and development of HCM. Provide guidance for disease research and identify potential effective treatment options. It is hoped that the results of this study will provide guidance for the study of the disease and the search for potential effective treatments.

Materials and methods

Microarray data analysis

Analysis was carried out of the gene expressions of the HCM datasets acquired from the GEO database (http://www.ncbi.nlm.nih.gov/geo )[3]. GSE36961, a much larger and newer microarray dataset of HCM, includes 106 cases and 39 controls (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36961). Clinical information, including age and gender, was also acquired. The data set was based on Platforms GPL15389 (Illumina humanHT-12 V3.0 expression beadchip). Gene IDs were mapped to the microarray probes using the annotated information offered by the record. Probes corresponding to more than one gene were excluded from the dataset. The average expression values of the genes were obtained using measurements from a number of probes. A suitable threshold value was selected based on the number of probes with different thresholds of expression. The WGCNA algorithm [16] was applied to building the co-expression network of HCM samples. Sample cluster analysis was performed using the hclust tool (R package, https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/hclust) with a threshold value of 35.

Co-expression module construction

The power value was evaluated during the construction of the modules using the WGCNA package in R (https://cran.r-project.org/web/packages/WGCNA/). The mean connectivity and scale independence of network modules were analyzed using the gradient test under different power values, which ranged from 1 to 20. The soft threshold power of 9 was chosen based on the scale-free topology criterion. The WGCNA algorithm further identified co-expression modules under these conditions. The minimum size of the gene group was set at 50 to ensure the reliability of the results for this module.

Interaction analysis of co-expression modules

The interactive connection among the co-expression modules was studied using the WGCNA algorithm. The WGCNA R software package [16] can be used to determine network construction, the calculation of topological properties, gene selection, module detection, differential network analysis, and network statistics. Furthermore, a heat map was plotted to exhibit the strength of interaction among the modules.

Functional enrichment analysis

Functional enrichment analysis was carried out in co-expression modules. The genetic information of the respective modules was mapped to the associated gene ontology (GO) terms and KEGG pathways using the DAVID tool (version 6.8; http://david.abcc.ncifcrf.gov/) [12]. The top five records with p-value < 0.05 were retained for analysis.

Hub gene identification and validation

Hub Genes with the highest connectivity in each module were identified using the function “chooseTopHubInEachModule” in WGCNA [23]. Bee swarm plots were created using the R package (https://cran.r-project.org/web/packages/beeswarm/index.html). Wilcoxon tests were performed to validate the hub genes’ expression levels between HCM and normal tissue samples. P < 0.05 was considered statistically significant. To validate the possibility of a hub gene as a biomarker, we outlined receiver operating characteristic (ROC) curves and calculated the area under the ROC curve (AUC) with the “pROC” R package [24].

Gene set enrichment analysis

To detect potential function of the hub genes, gene set enrichment analysis (GSEA) was conducted to explore the high-risk score associated KEGG pathways on GSEA software downloaded from the Broad Institute (http://www.broadinstitute. org/gsea). By running GSEA, normalized enrichment scores and p-value were generated.

Results

HCM dataset pre-processing

A total amount of 37,846 gene expression values were derived from the raw file.Four thousand genes with the greatest average expression values were chosen for cluster evaluation (Fig. 1). To ensure that the results of network construction were reliable, 105 HCM samples remained for subsequent WGCNA analysis after GSM 907253 (which was distant from other samples in the sample cluster analysis) was removed.

Fig. 1 — Hierarchical clustering of the top 4000 genes with the highest average expression values of HCM samples in the clustering analysis. There were 1 outlier samples in the total 106 samples, that is GSM907253, when the threshold value was determined as 35 (red line)

Identification of co-expression modules of HCM genes

The expression values of 4000 genes in 105 HCM samples were analyzed to identify the modules of highly correlated genes. The soft threshold power was set at 9 (scale-free R2 = 0.88) to guarantee a scale-free network (Fig. 2). A total of 8 modules including green (255 genes), turquoise (992 genes), grey (932 genes), blue (473 genes), brown (424 genes), yellow (387 genes), pink (149 genes), and black (388 genes) were identified (Fig. 3). The genes in grey were not included in any module, so no further analysis was conducted for these genes.

Fig. 2 — The influence of different soft threshold power on the scale independence degree of coexpression modules of HCM genes. The influence of different soft threshold power on the average connectivity degree of coexpression modules of HCM genes

Fig. 3 — The constructed coexpression modules of HCM genes by WGCNA software

Correlation analysis of co-expression modules

The WGCNA package analyzed the interactive relationships underlying the seven co-expression modules (Fig. 4). Gene expression levels were relatively independent as illustrated by the topological overlap matrix (TOM) plot of 4000 genes, suggesting that each module was independently validated. The connectivity degree of eigengenes was assessed to further quantify the similarity of co-expression. These seven modules yielded two main clusters followed by cluster analysis (Fig. 5a), including two modules (modules Green and Yellow) and five modules (modules Brown, Blue, Pink, Turquoise and Black), respectively. Based on the heatmap plot of the adjacencies (Fig. 5b), we found two pairs (modules Blue and Pink; modules Black and Turquoise) had the higher adjacency value.

Fig. 5 — Connectivity analysis between different modules. a Hierarchical cluster analysis of the genes in different modules; b Connectivity level analysis of the genes in different modules. Within the heatmap, red represents a positive correlation and blue represents a negative correlation. Squares of red color along the diagonal are the meta-modules

Functional enrichment analysis in critical co-expression modules

Enrichment analysis of GO and KEGG were conducted to assess the functions of genes in the seven identified modules. The leading five enriched GO and KEGG terms with p value < 0.05 were selected for further analysis. Based on the heatmap plot for GO (Fig. 6a) and KEGG (Fig. 6b) evaluations, significant differences in the enriched terms and enriched degree was detected among the co-expression modules. Through the analysis of the GO biological process, each module was very different from the others (Table 1). The genes in module Black were generally enriched in GO:0006614 (SRP-dependent co-translational protein targeting membranes), GO:0000184 (nuclear-transcribed mRNA catabolic process, nonsense-mediated decay) and GO:0006413 (translational initiation). The genes in module Green were primarily enriched in GO:0006281 (DNA repair) and GO:1903146 (regulation of mitophagy). The genes in module Turquoise were mainly enriched in GO:0006412 (translation). The genes in the other four modules were mainly enriched in GO cellular component and GO molecular function. The results of the KEGG pathway enrichment analysis are shown in Table 2. Module Black and Turquoise were mainly enriched in the pathways hsa03010: Ribosome. Module Blue was mainly enriched in the pathways hsa04120: Ubiquitin mediated proteolysis. Module Brown was mostly enriched in cellular processes hsa01200: Carbon metabolism. Module Green was enriched in pathways hsa00190: Oxidative phosphorylation. Module Pink was mainly enriched in hsa00020: Citrate cycle (TCA cycle). In module Yellow, a total of 14 genes were mainly enriched in hsa04145: Phagosome.

Fig. 6 — The heatmap for GO (a) and KEGG (b) enrichment analysis of HCM genes in coexpression modules. Rows and columns represent the terms and modules, respectively

Table 1.

GO enrichment for the genes in the coexpression modules of HCM

	Term	Count	Percentage%	p value
Module black	GO:0006614 ~ SRP-dependent cotranslational protein targeting to membrane	42	13.72549	2.17E-48
	GO:0000184 ~ nuclear-transcribed mRNA catabolic process, nonsense-mediated decay	43	14.05229	7.92E-45
	GO:0003735 ~ structural constituent of ribosome	52	16.99345	1.92E-43
	GO:0006413 ~ translation initiation	42	13.72549	2.32E-40
	GO:0005840 ~ ribosome	44	14.37908	3.65E-40
Module blue	GO:0005829 ~ cytosol	154	38.11881	7.31E-23
	GO:0044822 ~ poly(A) RNA binding	75	18.56436	3.88E-17
	GO:0005515 ~ protein binding	273	67.57426	8.90E-15
	GO:0005739 ~ mitochondrion	75	18.56436	2.61E-14
	GO:0070062 ~ extracellular exosome	115	28.46535	3.13E-12
Module brown	GO:0005515 ~ protein binding	239	67.12483	4.32E-14
	GO:0070062 ~ extracellular exosome	102	28.65169	1.69E-11
	GO:0005829 ~ cytosol	112	31.46067	9.05E-11
	GO:0005739 ~ mitochondrion	58	16.29213	3.13E-09
	GO:0043209 ~ myelin sheath	16	4.494382	1.75E-07
Module green	GO:0006281 ~ DNA repair	9	4.945055	0.001606
	GO:0005739 ~ mitochondrion	24	13.18681	0.001903
	GO:0005743 ~ mitochondrial inner membrance	12	6.593407	0.002201
	GO:1903146 ~ regulation of mitophagy	4	2.197802	0.003498
	GO:0005515 ~ protein binding	101	55.49451	0.003597
Module pink	GO:0044822 ~ poly(A) RNA binding	24	24	1.27E-08
	GO:0005515 ~ protein binding	70	70	5.93E-07
	GO:0005829 ~ cytosol	36	36	4.25E-06
	GO:0005654 ~ nucleoplasm	32	32	6.74E-06
	GO:0003723 ~ RNA binding	13	13	2.91E-05
Module turquoise	GO:0070062 ~ extracellular exosome	237	28.31541	1.66E-24
	GO:0005515 ~ protein binding	535	63.91876	6.94E-24
	GO:0005829 ~ cytosol	259	30.94385	3.09E-22
	GO:0044822 ~ poly(A) RNA binding	122	14.57587	2.18E-19
	GO:0006412 ~ translation	49	5.854241	2.69E-17
Module yellow	GO:0070062 ~ extracellular exosome	125	36.54971	6.04E-23
	GO:0031012 ~ extracellular matrix	40	11.69591	2.10E-22
	GO:0005615 ~ extracellular space	59	17.25146	7.14E-10
	GO:0005925 ~ focal adhesion	29	8.479532	8.16E-10
	GO:0005576 ~ extracellular region	65	19.00585	2.04E-09

Open in a new tab

Abbreviations: GO Gene Ontology, SRP signal recognition particle

Table 2.

KEGG pathway enrichment for the genes in the coexpression modules of HCM

	Term	Count	Percentage%	p value
Module black	hsa03010:Ribosome	48	15.68627	7.55E-42
	hsa00190:Oxidative phosphorylation	22	7.189542	1.93E-11
	hsa05010: Alzheimer’s disease	17	5.555556	6.60E-06
	hsa05012: Parkinson’s disease	15	4.901961	1.72E-05
	hsa04932: Non-alcoholic fatty liver disease (NAFLD)	15	4.901961	3.45E-05
Module blue	hsa04120: Ubiquitin mediated proteolysis	14	3.465347	1.61E-04
	hsa03015: mRNA surveillance pathway	10	2.475248	0.001237
	hsa00270: Cysteine and methionine metabolism	6	1.485149	0.004544
	hsa04114: Oocyte meiosis	10	2.475248	0.004845
	hsa01130: Biosynthesis of antibiotics	14	3.465347	0.008644
Module brown	hsa01200: Carbon metabolism	13	3.651685	4.17E-06
	hsa00020: Citrate cycle (TCA cycle)	7	1.966292	3.22E-05
	hsa04144: Endocytosis	15	4.213483	5.62E-04
	hsa04141: Protein processing in endoplasmic reticulum	12	3.370787	8.83E-04
	hsa01130: Biosynthesis of antibiotics	13	3.651685	0.001751
Module green	hsa00190: Oxidative phosphorylation	7	3.846154	0.001913
	hsa05010: Alzheimer’s disease	7	3.846154	0.00608
	hsa05016: Huntington’s disease	7	3.846154	0.011398
	hsa05012: Parkinson’s disease	6	3.296703	0.012855
Module pink	hsa00020: Citrate cycle (TCA cycle)	3	3	0.015527
Module pink	hsa01200: Carbon metabolism	4	4	0.035113
Module turquoise	hsa03010: Ribosome	31	3.703704	4.60E-11
	hsa05016: Huntington’s disease	31	3.703704	2.27E-07
	hsa03050: Proteasome	12	1.433692	2.10E-05
	hsa05012: Parkinson’s disease	21	2.508961	1.12E-04
	hsa00190: Oxidative phosphorylation	20	2.389486	1.37E-04
Module yellow	hsa04145: Phagosome	14	4.093567	5.39E-05
	hsa04510: Focal adhesion	15	4.385965	3.75E-04
	hsa04611: Platelet activation	11	3.216374	0.001026
	hsa04974: Protein digestion and absorption	9	2.631579	0.00112
	hsa05146: Amoebiasis	9	2.631579	0.003661

Open in a new tab

Abbreviations: KEGG Kyoto Encyclopedia of Genes and Genomes

Hub gene identification and validation

Modular hub genes with the highest connectivity were identified using the WGCNA package function “chooseTopHubInEachModule”. The hub genes of each module are RPL35A for module Black, FH for module Blue, PREI3 for module Brown, CREB1 for module Green, LOC641848 for module Pink, MYH7 for module Turquoise and MYL6 for module Yellow. Differential expression analysis between HCM and normal tissue samples of these hub genes was performed to evaluate their effects. Briefly, only two hub genes showed significantly higher expression compared to the normal group (p < 0.05). MYH7 (Fig. 7a) and FH (Fig. 7b) were regarded as true hub genes. Moreover, ROC curves revealed their high diagnostic value as biomarkers for HCM (Fig. 7c; MYH7 AUC: 0.762, FH AUC: 0.612).

Fig. 7 — Differential expression analysis of real hub genes between HCM and normal tissues. Expression levels of MYH7 (a) and FH (b) were significantly up-regulated in HCM samples in comparison to normal tissues in the GSE36961. c ROC analysis of real hub genes

Gene set enrichment analysis

We performed GSEA to further explore the potential functions of MYH7 and FH in HCM. As shown in Fig. 8a and b, genes in high expression groups of MYH7 and FH were enriched (p < 0.05) in “Proteasome” and “PPAR signaling” pathways, respectively.

Discussion

HCM is a recognized genetic form of heart disease. Many patients have poor long-term outcomes, including heart failure, malignant arrhythmias and sudden cardiac death. Currently there are no treatments that can effectively reverse the disease, including drug therapy, interventional therapy, and surgical treatment. Therefore, there is an urgent need to explore new effective therapeutic strategies and etiological explanations for HCM. Genetic testing is an indispensable part of labor practice, which has diagnostic and predictive value. It also offers hope that scientists will be able to decipher the mechanisms of disease occurrence and identify targets for effective treatments. As well, new sequencing techniques have led to a large number of candidate genes. In this study, we used a global approach to construct a gene co-expression network to predict candidate gene clusters in the pathogenesis of HCM. Furthermore, we hope to predict candidate gene sets as the basis for a given biological process through closed co-expressed gene modules with common functional annotations.

WGCNA is a powerful statistical method based on gene correlation which can be used to construct gene networks, detect modules, identify central genes and screen candidate genes as biomarkers [16]. In the statistical process, WGCNA focuses on processing a set of gene modules rather than individual genes. This avoids the disadvantages of only processing genes and therefore ignoring molecular transcription networks. A few similar bioinformatic studies have been previously reported. Jing et al. identified specific modules and hub genes related to coronary artery disease by WGCNA (Jing [17]). In 2019, Ran et al. identified biomarkers correlated with hypertrophic cardiomyopathy with co-expression analysis (Ran [4]). In order to avoid the failures of choosing soft thresholding power by scale-free topology fit, we did not filter genes by differential expression when using WGCNA as in previous similar studies. In this study, we obtained 4000 genes of 105 HCM samples from a NCBI dataset, which yielded 7 modules through the use of the WGCNA method. According to a correlation study by the topological overlap matrix (TOM) plot (Fig. 4), each module was shown to be independent of the others. In addition, functional enrichment analysis was performed on the genes in these modules to identify important modules and the genes they contained. By analyzing the functional richness of these seven modules, there are significant differences in their enrichment degrees and terms. By analyzing these data, we found that in both GO enrichment and KEGG pathway analysis, the turquoise module has the highest enrichment. The greatest number of genes (1202 genes) were enriched in the turquoise module. It accounts for 30% of the total number of genes. Therefore, the turquoise module is the most relevant module from the 7 previously identified. Through GO analysis, the genes were mainly concentrated in protein binding, poly(A) RNA binding and translation. Through KEGG analysis, we can find that differentially expressed genes in the turquoise module are highly rich in Ribosome, Proteasome and Oxidative phosphorylation. The WGCNA package function “chooseTopHubInEachModule” was used to identify the modular hub genes with the highest connectivity. The hub genes of each module are RPL35A for module Black, FH for module Blue, PREI3 for module Brown, CREB1 for module Green, LOC641848 for module Pink, MYH7 for module Turquoise and MYL6 for module Yellow. Interestingly, the hub gene of the most important module Turquoise obtained by GO and KEGG analysis was MYH7. It has been repeatedly reported as one of the prevalent pathogenic mutation genes for HCM [1, 22]. Compared to normal people, the above 7 hub genes were further differentially expressed. We found that FH and MYH7 were highly expressed in the HCM group compared to the normal group. These results suggested that these two genes may play an important role in the occurrence and development of HCM. Therefore MYH7 (Fig. 7a) and FH (Fig. 7b) were regarded as true hub genes. Moreover, ROC curves revealed their high diagnostic value as biomarkers for HCM (Fig. 7c; MYH7 AUC: 0.762, FH AUC: 0.612).

MYH7 was a gene encoding myosin heavy chain beta (MHC-β). It was mainly expressed in the heart and is also expressed in skeletal muscles [20]. Multiple previous independent studies have demonstrated that pathogenic mutations in the β-myosin heavy chain (MYH7) gene caused HCM [9, 25]. In addition, mutations in the MYH7 gene are very common in HCM, and can be seen in 25 to 40% of patients [26]. The hub genes in the important modules calculated by WGCNA in this study are consistent with the results of many of the above studies.

The fumarate hydratase (FH) gene is localized to the chromosomal position 1q42.3-q43. In normal cells, the FH gene is located in both mitochondria and cytosol and catalyzes fumarate to malate [14]. Fumarate is a covalent oncometabolite. Its accumulation is characteristic of hereditary leiomyomatosis of genetic cancer syndrome. The mutation of the FH gene may cause the affected cells to transition to aerobic glycolysis (Warburg effect) [15]. It has been found that mutations in fumaric acid can cause several fumarase-related diseases in humans, such as benign mesenchymal tumors of the uterus, leiomyomatosis and so on. In the results of this study, the FH gene may also be a key gene in the occurrence and development of HCM. At present there is little research on their correlation. In the future, exploration of the FH gene may be a salient new direction for further research on HCM.

Therefore, in order to determine the potential molecular function of these two important hub genes, we continued to use GSEA to search for KEGG pathways that are rich in high-expression samples. As shown in Fig. 8a and b, genes in high expression groups of MYH7 and FH were enriched (p < 0.05) in “Proteasome” and “PPAR signaling” pathways, respectively.

The proteasome is the main multicatalytic protease complex. It is involved in the degradation of most intracellular proteins. In addition, muscle fibers and sarcoma proteins have been shown to be primarily degraded by proteasome. Some scholars have pointed out that proteasome dysfunction is related to human HCM [19]. As well, a previous study suggested that the protease inhibitor ps-519 may cause a significant regression of cardiac hypertrophy.

Peroxisome proliferator-activated receptors (PPAR) are expressed in many tissues, such as skeletal and cardiac muscles, fat cells, liver cells, etc. Different subtypes of PPAR have different tissue distribution and expression profiles, leading to different clinical outcomes. In particular, the subtype of PPARα is highly expressed in tissues with high fatty acid oxidation capacity, such as liver, heart, and skeletal muscle. In addition, activation of the receptor for another subtype, PPARβ/δ has been shown to protect the myocardium from ischemia-reperfusion injury typical of diabetic cardiomyopathy. Although there is no research on the direct relationship between the PPAR pathway and HCM, this pathway may also be a direction for further researching HCM and finding therapeutic approaches.

Conclusions

In this study, two key genes of HCM, FH and MYH7, were identified from extensive genetic data through co-expression network analysis. In addition, the most enriched pathways for two key genes were discovered. They are the PPAR signaling pathway and proteasome. They may have played a very important role in the occurrence and development of HCM. The key genes and pathways identified in this study may provide guidance for further study of the mechanism and treatment of HCM in the future. These findings still need to be verified in a large number of clinical practices in the future.

Acknowledgements

The present study was supported by grants from the Science and Technology Development Plan Project of Taian City (No. 2019NS096, No. 2016NS116, No.2019NS159, and No.2019NS161).

Abbreviations

HCM: Hypertrophic cardiomyopathy
WGCNA: Weighted gene co-expression network analysis
GO: Gene ontology
ROC: Receiver operating characteristic
GSEA: Gene set enrichment analysis
TOM: Topological overlap matrix
MHC-β: myosin heavy chain beta
FH: Fumarate hydratase
PPAR: Peroxisome proliferator-activated receptors

Authors’ contributions

XL, XZ, DG, and CL conceived this study; XL, XZ, DG and CL performed the analysis; XL, XZ, DG, CL, JL, YW and CW prepared the manuscript. All authors have read and approved the submitted manuscript.

Funding

No funding was obtained for this study.

Availability of data and materials

The datasets generated during the current study are available in the GEO database (http://www.ncbi.nlm.nih.gov/geo).

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

No competing financial interests exist.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xin Li, Chenxin Wang and Xiaoqing Zhang contributed equally to this work.

Contributor Information

Xin Li, Email: 15763027@qq.com.

Chenxin Wang, Email: alicechenxin@sohu.com.

Xiaoqing Zhang, Email: over974@sohu.com.

Jiali Liu, Email: li41li@126.com.

Yu Wang, Email: carlwy68@aliyun.com.

Chunpu Li, Email: simonlx000@126.com.

Dongmei Guo, Email: dongmeiguo@aliyun.com.

References

1.Andersen PS, Havndrup O, Hougs L, et al. Diagnostic yield, interpretation, and clinical utility of mutation screening of sarcomere encoding genes in Danish hypertrophic cardiomyopathy patients and relatives. Hum Mutat. 2009;30:363–370. doi: 10.1002/humu.20862. [DOI] [PubMed] [Google Scholar]
2.Bakhshi S, Gupta A, Sharma MC, Khan SA, Rastogi S. Her-2/neu, p-53, and their coexpression in osteosarcoma. J Pediatr Hematol Oncol. 2009;31:245–251. doi: 10.1097/MPH.0b013e318197947e. [DOI] [PubMed] [Google Scholar]
3.Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. 2007;35(Database issue):D760–D765. doi: 10.1093/nar/gkl887. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Chen R, Ge T, Jiang W, Huo J, Chang Q, Geng J, Shan Q. Identification of biomarkers correlated with hypertrophic cardiomyopathy with co-expression analysis. J Cell Physiol. 2019;234(12):21999–22008. doi: 10.1002/jcp.28762. [DOI] [PubMed] [Google Scholar]
5.Ciriono AL, et al. Role of genetic testing in inherited cardiovascular disease: a review. JAMA Cardiol. 2017;2:1153–1160. doi: 10.1001/jamacardio.2017.2352. [DOI] [PubMed] [Google Scholar]
6.Clarke C, Madden SF, Doolan P, Aherne ST, Joyce H, O'Driscoll L, Gallagher WM, Hennessy BT, Moriarty M, Crown J, et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis. 2013;34:2300–2308. doi: 10.1093/carcin/bgt208. [DOI] [PubMed] [Google Scholar]
7.de Jong S, Boks MP, Fuller TF, Strengman E, Janson E, de Kovel CG, Ori AP, Vi N, Mulder F, Blom JD, et al. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012;7:e39498. doi: 10.1371/journal.pone.0039498. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Dileo MV, Strahan GD, den Bakker M, Hoekenga OA. Weighted correlation network analysis (WGCNA) applied to the tomato fruit metabolome. PLoS One. 2011;6:e26683. doi: 10.1371/journal.pone.0026683. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Geisterfer-Lowrance AAT, Kass S, Tanigawa G, et al. A molecular basis for familial hypertrophic cardiomyopathy: a β cardiac myosin heavy chain gene missense mutation. Cell. 1990;62:999–1006. doi: 10.1016/0092-8674(90)90274-I. [DOI] [PubMed] [Google Scholar]
10.Hershberger RE, et al. ACMG professional practice and guidelines committee. Genetic evaluation of cardiomyopathy: a clinlcal practice resource of the American College of Medical Genetics and Genomics (ACMG) Genet Med. 2018;20:899–909. doi: 10.1038/s41436-018-0039-z. [DOI] [PubMed] [Google Scholar]
11.Hu YS, Pan Y, Li WH, Zhang Y, Li J, Ma BA. Association between TGFBR1*6A and osteosarcoma: a Chinese case-control study. BMC Cancer. 2010;10:169. doi: 10.1186/1471-2407-10-169. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Huang da W, Sheiman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13. doi: 10.1093/nar/gkn923. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Jarcho JA, et al. Mapping a gene for familial hypertrophic cardiomyopathy to chromosome 14q1. N Engl J Med. 1989;321:1372–1378. doi: 10.1056/NEJM198911163212005. [DOI] [PubMed] [Google Scholar]
14.Jardim-Messeder D, Cabreira-Cagliari C, Rauber R, Turchetto-Zolet AC, Margis R, Masrgis-Pinheiro M. Fumarate reductase superfamily: a diverse group of enzymes whose evolution heterozygotes to the establishment of different metabolic pathways. Mitochondrion. 2017;34:56–66. doi: 10.1016/j.mito.2017.01.002. [DOI] [PubMed] [Google Scholar]
15.King A, Selak MA, Gottlieb E. Succinate dehydrogenase and fumarate hudratase: linking mitochondrial dysfunction and cancer. Oncogene. 2006;25(34):4675–4682. doi: 10.1038/sj.onc.1209594. [DOI] [PubMed] [Google Scholar]
16.Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Liu J, Jing L, Xinlin T. Weighted gene co-expression network analysis identifies specific modules and hub genes related to coronary artery disease. BMC Cardiovasc Disord. 2016;16:54. doi: 10.1186/s12872-016-0217-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Maron BJ. Hypertrophic cardiomyopathy: a systematic review [J] JAMA. 2002;287(10):1308–1320. doi: 10.1001/jama.287.10.1308. [DOI] [PubMed] [Google Scholar]
19.Predmore JM, Wang P, Davis F, Bartolone S, Westfall MV, Dyke DB, Pagani F, Powell SR, Day SM. Ubiquitin proteasome dysfunction in human hypertrophic and dilated cardiomyopathies. Circulation. 2010;121:997–1004. doi: 10.1161/CIRCULATIONAHA.109.904557. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Quiat D, Voelker KA, Pei J, Grishin NV, Grange RW, Bassel-Duby R, Olson EN. concerted regulation of myofiber-specific gene expression and muscle performance by the transcriptional repressor Sox6. Proc Natl Acad Sci U S A. 2011;108(25):10196–10201. doi: 10.1073/pnas.1107413108. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Ren Y, Cui Y, Li X, Wang B, Na L, Shi J, Wang L, Qiu L, Zhang K, Liu G, Xu Y. A co-expression network analysis reveals lncRNA abnormalities in peripheral blood in early-onset schizophrenia. Prog Neuro-Psychopharmacol Biol Psychiatry. 2015;63:1–5. doi: 10.1016/j.pnpbp.2015.05.002. [DOI] [PubMed] [Google Scholar]
22.Richard P, Charron P, Carrier L, et al. Hypertrophic cardiomyopathy. Circulation. 2003;107:2227–2232. doi: 10.1161/01.CIR.0000066323.15244.54. [DOI] [PubMed] [Google Scholar]
23.Sezin T, Vorobyev A, Sadik CD, Zillikens D, Gupta Y, Ludwig RJ. Gene expression analysis reveals novel shared gene signatures and candidate molecular mechanisms between pemphigus and systemic lupus Erythematosus in CD4+ T cells. Front Immunol. 2017;8:1992. doi: 10.3389/fimmu.2017.01992. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21(20):3940–3941. doi: 10.1093/bioinformatics/bti623. [DOI] [PubMed] [Google Scholar]
25.Tanigawa G, Jarcho JA, Kass S, et al. A molecular basis for familial hypertrophic cardiomyopathy: an α/β cardiac myosin heavy chain hybrid gene. Cell. 1990;62:991–998. doi: 10.1016/0092-8674(90)90273-H. [DOI] [PubMed] [Google Scholar]
26.Van Driest SL, Ommen SR, Tajik AJ, et al. Sarcomeric genotyping in hypertrophic cardiomyopathy. Mayo Clin Proc. 2005;80(4):463–469. doi: 10.1016/S0025-6196(11)63196-0. [DOI] [PubMed] [Google Scholar]
27.Wang J, Liu X, Qi X. Effect of variation of FGF2 genotypes on the risk of osteosarcoma susceptibly: a case control study. Int J Clin Exp Med. 2015;8:6114–6118. [PMC free article] [PubMed] [Google Scholar]
28.Wang YB, Jia N, Xu CM, Zhao L, Zhao Y, Wang X, Jia TH. Selecting key genes associated with osteosarcoma based on a differential expression network. Genet Mol Res. 2015;14:17708–17717. doi: 10.4238/2015.December.21.44. [DOI] [PubMed] [Google Scholar]
29.Zheng X, Xue C, Luo G, Hu Y, Luo W, Sun X. Identification of crucial genes in intracranial aneurysm based on weighted gene coexpression network analysis. Cancer Gene Ther. 2015;22:238–245. doi: 10.1038/cgt.2015.10. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The datasets generated during the current study are available in the GEO database (http://www.ncbi.nlm.nih.gov/geo).

[CR1] 1.Andersen PS, Havndrup O, Hougs L, et al. Diagnostic yield, interpretation, and clinical utility of mutation screening of sarcomere encoding genes in Danish hypertrophic cardiomyopathy patients and relatives. Hum Mutat. 2009;30:363–370. doi: 10.1002/humu.20862. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Bakhshi S, Gupta A, Sharma MC, Khan SA, Rastogi S. Her-2/neu, p-53, and their coexpression in osteosarcoma. J Pediatr Hematol Oncol. 2009;31:245–251. doi: 10.1097/MPH.0b013e318197947e. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. 2007;35(Database issue):D760–D765. doi: 10.1093/nar/gkl887. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Chen R, Ge T, Jiang W, Huo J, Chang Q, Geng J, Shan Q. Identification of biomarkers correlated with hypertrophic cardiomyopathy with co-expression analysis. J Cell Physiol. 2019;234(12):21999–22008. doi: 10.1002/jcp.28762. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Ciriono AL, et al. Role of genetic testing in inherited cardiovascular disease: a review. JAMA Cardiol. 2017;2:1153–1160. doi: 10.1001/jamacardio.2017.2352. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Clarke C, Madden SF, Doolan P, Aherne ST, Joyce H, O'Driscoll L, Gallagher WM, Hennessy BT, Moriarty M, Crown J, et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis. 2013;34:2300–2308. doi: 10.1093/carcin/bgt208. [DOI] [PubMed] [Google Scholar]

[CR7] 7.de Jong S, Boks MP, Fuller TF, Strengman E, Janson E, de Kovel CG, Ori AP, Vi N, Mulder F, Blom JD, et al. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012;7:e39498. doi: 10.1371/journal.pone.0039498. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Dileo MV, Strahan GD, den Bakker M, Hoekenga OA. Weighted correlation network analysis (WGCNA) applied to the tomato fruit metabolome. PLoS One. 2011;6:e26683. doi: 10.1371/journal.pone.0026683. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Geisterfer-Lowrance AAT, Kass S, Tanigawa G, et al. A molecular basis for familial hypertrophic cardiomyopathy: a β cardiac myosin heavy chain gene missense mutation. Cell. 1990;62:999–1006. doi: 10.1016/0092-8674(90)90274-I. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Hershberger RE, et al. ACMG professional practice and guidelines committee. Genetic evaluation of cardiomyopathy: a clinlcal practice resource of the American College of Medical Genetics and Genomics (ACMG) Genet Med. 2018;20:899–909. doi: 10.1038/s41436-018-0039-z. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Hu YS, Pan Y, Li WH, Zhang Y, Li J, Ma BA. Association between TGFBR1*6A and osteosarcoma: a Chinese case-control study. BMC Cancer. 2010;10:169. doi: 10.1186/1471-2407-10-169. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Huang da W, Sheiman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13. doi: 10.1093/nar/gkn923. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Jarcho JA, et al. Mapping a gene for familial hypertrophic cardiomyopathy to chromosome 14q1. N Engl J Med. 1989;321:1372–1378. doi: 10.1056/NEJM198911163212005. [DOI] [PubMed] [Google Scholar]

[CR14] 14.Jardim-Messeder D, Cabreira-Cagliari C, Rauber R, Turchetto-Zolet AC, Margis R, Masrgis-Pinheiro M. Fumarate reductase superfamily: a diverse group of enzymes whose evolution heterozygotes to the establishment of different metabolic pathways. Mitochondrion. 2017;34:56–66. doi: 10.1016/j.mito.2017.01.002. [DOI] [PubMed] [Google Scholar]

[CR15] 15.King A, Selak MA, Gottlieb E. Succinate dehydrogenase and fumarate hudratase: linking mitochondrial dysfunction and cancer. Oncogene. 2006;25(34):4675–4682. doi: 10.1038/sj.onc.1209594. [DOI] [PubMed] [Google Scholar]

[CR16] 16.Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Liu J, Jing L, Xinlin T. Weighted gene co-expression network analysis identifies specific modules and hub genes related to coronary artery disease. BMC Cardiovasc Disord. 2016;16:54. doi: 10.1186/s12872-016-0217-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Maron BJ. Hypertrophic cardiomyopathy: a systematic review [J] JAMA. 2002;287(10):1308–1320. doi: 10.1001/jama.287.10.1308. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Predmore JM, Wang P, Davis F, Bartolone S, Westfall MV, Dyke DB, Pagani F, Powell SR, Day SM. Ubiquitin proteasome dysfunction in human hypertrophic and dilated cardiomyopathies. Circulation. 2010;121:997–1004. doi: 10.1161/CIRCULATIONAHA.109.904557. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Quiat D, Voelker KA, Pei J, Grishin NV, Grange RW, Bassel-Duby R, Olson EN. concerted regulation of myofiber-specific gene expression and muscle performance by the transcriptional repressor Sox6. Proc Natl Acad Sci U S A. 2011;108(25):10196–10201. doi: 10.1073/pnas.1107413108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Ren Y, Cui Y, Li X, Wang B, Na L, Shi J, Wang L, Qiu L, Zhang K, Liu G, Xu Y. A co-expression network analysis reveals lncRNA abnormalities in peripheral blood in early-onset schizophrenia. Prog Neuro-Psychopharmacol Biol Psychiatry. 2015;63:1–5. doi: 10.1016/j.pnpbp.2015.05.002. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Richard P, Charron P, Carrier L, et al. Hypertrophic cardiomyopathy. Circulation. 2003;107:2227–2232. doi: 10.1161/01.CIR.0000066323.15244.54. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Sezin T, Vorobyev A, Sadik CD, Zillikens D, Gupta Y, Ludwig RJ. Gene expression analysis reveals novel shared gene signatures and candidate molecular mechanisms between pemphigus and systemic lupus Erythematosus in CD4+ T cells. Front Immunol. 2017;8:1992. doi: 10.3389/fimmu.2017.01992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21(20):3940–3941. doi: 10.1093/bioinformatics/bti623. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Tanigawa G, Jarcho JA, Kass S, et al. A molecular basis for familial hypertrophic cardiomyopathy: an α/β cardiac myosin heavy chain hybrid gene. Cell. 1990;62:991–998. doi: 10.1016/0092-8674(90)90273-H. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Van Driest SL, Ommen SR, Tajik AJ, et al. Sarcomeric genotyping in hypertrophic cardiomyopathy. Mayo Clin Proc. 2005;80(4):463–469. doi: 10.1016/S0025-6196(11)63196-0. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Wang J, Liu X, Qi X. Effect of variation of FGF2 genotypes on the risk of osteosarcoma susceptibly: a case control study. Int J Clin Exp Med. 2015;8:6114–6118. [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Wang YB, Jia N, Xu CM, Zhao L, Zhao Y, Wang X, Jia TH. Selecting key genes associated with osteosarcoma based on a differential expression network. Genet Mol Res. 2015;14:17708–17717. doi: 10.4238/2015.December.21.44. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Zheng X, Xue C, Luo G, Hu Y, Luo W, Sun X. Identification of crucial genes in intracranial aneurysm based on weighted gene coexpression network analysis. Cancer Gene Ther. 2015;22:238–245. doi: 10.1038/cgt.2015.10. [DOI] [PubMed] [Google Scholar]

PERMALINK

Weighted gene co-expression network analysis revealed key biomarkers associated with the diagnosis of hypertrophic cardiomyopathy

Xin Li

Chenxin Wang

Xiaoqing Zhang

Jiali Liu

Yu Wang

Chunpu Li

Dongmei Guo

Abstract

Objective

Methods

Results

Conclusions

Introduction

Materials and methods

Microarray data analysis

Co-expression module construction

Interaction analysis of co-expression modules

Functional enrichment analysis

Hub gene identification and validation

Gene set enrichment analysis

Results

HCM dataset pre-processing

Fig. 1.

Identification of co-expression modules of HCM genes

Fig. 2.

Fig. 3.

Correlation analysis of co-expression modules

Fig. 4.

Fig. 5.

Functional enrichment analysis in critical co-expression modules

Fig. 6.

Table 1.

Table 2.

Hub gene identification and validation

Fig. 7.

Gene set enrichment analysis

Fig. 8.

Discussion

Conclusions

Acknowledgements

Abbreviations

Authors’ contributions

Funding

Availability of data and materials

Ethics approval and consent to participate

Consent for publication

Competing interests

Footnotes

Contributor Information

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases