Abstract
Pulmonary arterial hypertension (PAH) is a life-threatening disease with a poor prognosis, and metabolic abnormalities play a critical role in its development. This study used metabolomics, machine learning algorithms and bioinformatics to screen for potential metabolic biomarkers for the diagnosis of PAH. Plasma samples were collected from 17 patients diagnosed with idiopathic pulmonary arterial hypertension (IPAH) and 20 healthy controls, and plasma metabolomic profiling was performed by high-performance liquid chromatography-mass spectrometry. Gene expression profiles of PAH patients were obtained from the GEO database. Key differentially expressed metabolites (DEMs) and metabolism-related genes were then identified using machine learning algorithms. Twenty differential plasma metabolites associated with IPAH were identified (VIP score > 1 and p < 0.05), and enrichment analysis revealed the arginine biosynthesis pathway as the most altered pathway. Using machine learning models, including least absolute shrinkage and selection operator (LASSO), random forest (RF) and support vector machine (SVM), we extracted key metabolites that correlated with clinical phenotypes. Our results suggested that five metabolites, kynurenine, homoserine, tryptophan, AMP, and spermine, are potential biomarkers for IPAH. Bioinformatics analysis also identified three metabolism-related genes, MAPK6, SLC7A11 and CDC42BPA, that were strongly correlated with pulmonary hypertension (PH) and showed strong predictive power and clinical relevance. These findings reveal key metabolism-associated genes in PH, provide insight into complex metabolic reprogramming in the disease, and may lead to the identification of useful metabolic biomarkers for the diagnosis of PAH.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-76514-7.
Keywords: Idiopathic pulmonary arterial hypertension, Metabolomics, Biomarkers, Machine learning, Biochemical pathways
Subject terms: Computational biology and bioinformatics, Biomarkers, Cardiology, Diseases, Medical research
Introduction
Pulmonary arterial hypertension (PAH) is a rare and severe condition characterized by progressive remodeling of the pulmonary vasculature, which leads to failure of the right ventricle (RV) and, if left untreated, death1,2. While conventional diagnostic approaches and therapeutic strategies have proven effective in certain cases, they have been unable to fully address the complex pathophysiology of PAH. The five-year and seven-year survival rates for individuals diagnosed with PAH are approximately 57% and 49%, respectively3, so early diagnosis is of significant importance to patients. Consequently, there is an urgent need to elucidate the mechanisms underlying PAH and to identify associated biomarkers.
In recent years, research into the potential causes of PAH has focused on the aberrant regulation of metabolic pathways. Metabolomics is becoming an increasingly important tool for investigating the nature of metabolic abnormalities associated with PAH. This approach allows the identification of novel biomarkers and therapeutic targets4,5. Furthermore, the integration of metabolomic data with other omics technologies, including genomics, transcriptomics, and proteomics, has begun to enhance our understanding of the pathogenesis of PAH. This approach has led to the discovery of new metabolic pathways that were previously not associated with this disease6,7. The utilization of these comprehensive omics strategies is likely to elucidate the intricate etiology of PAH and facilitate the development of novel therapeutic interventions.
A number of metabolomic studies of PAH have been conducted, and the results have already provided valuable insights into new pathway markers that could aid in the diagnosis and prognosis of the disease8–10. Furthermore, high-resolution mass spectrometry (MS)-based metabolomics in conjunction with machine learning (ML) algorithms has been widely used to identify candidate metabolite biomarkers for disease diagnosis and the classification of clinical presentations11,12. A major challenge for metabolomic studies of PAH and of other clinical phenotypes of similar complexity is the selection of an appropriate statistical methodology to identify associations between metabolites and disease outcome. Alotaibi et al. demonstrated that combining conventional and statistical learning techniques to analyze metabolomic data reveals both convergent and divergent metabolic markers in patients with pulmonary arterial hypertension, highlighting the complexity of the pathophysiology of the disease6.
The objective of this study was to identify the key metabolites and abnormal pathways associated with pulmonary hypertension. To this end, we performed targeted metabolomics analysis of plasma samples from 17 IPAH patients and 20 healthy controls. To identify key metabolites that correlate with clinical phenotypes, we employed machine learning models, including LASSO, random forest and support vector machine models. Additionally, dysregulated genes associated with PAH were investigated by enrichment analysis and by gene expression profiling of lung tissues and peripheral blood mononuclear cells from PAH patients versus healthy individuals. Notably, the enriched pathways associated with metabolites partly overlapped with those linked to genes. Using targeted metabolomic differential analysis and multiple machine learning algorithms as core tools, this study focused on the expression landscape of metabolism-related genes and metabolites in PAH to identify promising signature metabolic markers. We hope that these results will provide new insights for the diagnostic study of PAH.
Results
Plasma metabolic profile and enrichment and clustering of metabolites of interest
Metabolite profiles of IPAH patients and healthy controls were investigated as outlined in the flowchart in Fig. 1. The OPLS-DA score plot showed a clear distinction between IPAH patients and healthy controls (Fig. 2a), and the R2Y and Q2 values for this model are presented in Fig. 2b. The plasma samples from the PAH and control groups clustered distinctly, and both R2Y (the cumulative proportion of the variance in Y explained by the model) and Q2 exceeded the threshold of 0.5, indicating good predictive ability and reliability of the OPLS-DA model. The significantly altered metabolites revealed distinct metabolite profiles in healthy controls and patients with IPAH (Fig. 2c). A total of 8 upregulated and 15 downregulated metabolites were detected in the IPAH group compared with the healthy control group (FC > 1.2 or < 0.5, p < 0.05) (Fig. 2d). The altered metabolites were then ranked by their variable importance in projection (VIP) value, with a VIP value greater than 1.0 indicating a significant contribution to the model (Fig. 2e). Among these altered metabolites, 20 had a VIP score > 1 and p < 0.05; their levels in each sample are shown in Supplementary Table S1.
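The fold-change and p-value criteria above can be expressed as a simple filter. The sketch below is a minimal Python illustration, not the authors' pipeline: it assumes FC is the ratio of group means (IPAH/control) and uses Welch's t-test, since the text does not name the test. The function name `volcano_classify` is hypothetical; only the thresholds (1.2, 0.5, 0.05) come from the text.

```python
import numpy as np
from scipy import stats


def volcano_classify(ipah, control, fc_up=1.2, fc_down=0.5, alpha=0.05):
    """Label each metabolite 'up', 'down' or 'ns' (not significant).

    ipah, control: 2-D arrays of concentrations, shape (samples, metabolites).
    """
    ipah = np.asarray(ipah, dtype=float)
    control = np.asarray(control, dtype=float)
    fc = ipah.mean(axis=0) / control.mean(axis=0)  # ratio of group means (assumption)
    _, p = stats.ttest_ind(ipah, control, axis=0, equal_var=False)  # Welch's t-test
    labels = np.full(fc.shape, "ns", dtype=object)
    labels[(fc > fc_up) & (p < alpha)] = "up"
    labels[(fc < fc_down) & (p < alpha)] = "down"
    return fc, p, labels
```

Because the cut-offs are asymmetric (1.2 upward, 0.5 downward), a metabolite with an FC of, say, 0.7 is not flagged even when its p value is small.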
Fig. 1.
Flowchart of the main analysis. Discriminating metabolites were identified in 17 IPAH patients versus 20 healthy controls, and enrichment of metabolite sets was analyzed using MetaboAnalyst. The top 5 enriched pathways were selected for further identification of metabolically associated genes (MAGs). Next, genes shared between MAGs and differentially expressed genes (DEGs) in the lungs of PAH patients versus normal controls from the GEO database were identified. Key metabolites and genes that correlated with clinical phenotypes were identified using machine learning models, including least absolute shrinkage and selection operator (LASSO), random forest (RF), and support vector machine (SVM). Hub genes were validated in peripheral blood mononuclear cells from the GSE131793 dataset and in lungs of PH patients from the GSE53408 dataset.
Fig. 2.
Identification of metabolites and enriched pathways that distinguish patients with IPAH from controls. (a) Score plot from the OPLS-DA model showing well-separated sample distributions of healthy controls and IPAH patients. (b) R2Y and Q2 scores generated in PLS-DA (both > 0.5). (c) Heatmap of plasma metabolic profiles of IPAH patients; each column represents a single sample. (d) Volcano plot of 8 upregulated metabolites (red dots) and 15 downregulated metabolites (green dots) (fold change > 1.2 or < 0.5 and p < 0.05). (e) Significantly altered metabolites in partial least squares discriminant analysis (PLS-DA) ranked by variable importance in projection (VIP) score. (f) Enriched pathways of all distinguishing metabolites.
Metabolic pathway enrichment and clustering analysis revealed that arginine biosynthesis, histidine metabolism, arginine and proline metabolism, and glycine, serine and threonine metabolism were enriched with the metabolites of interest that distinguished IPAH patients from healthy controls (Fig. 2f). The top 5 metabolite sets were selected for further analysis in GeneCards, which identified 2589 unique genes with a relevance score > 8 as MAGs (Supplementary Table S2).
DEGs screening from the GEO database and bioinformatic analysis
The GSE113439 dataset was used as the training set. Using R software, we extracted 523 DEGs from the gene expression matrix of the training set, of which 440 were significantly upregulated and 83 were significantly downregulated. The DEGs are shown as a volcano plot (Fig. 3a) and a heatmap (Fig. 3b). A Venn diagram showed 89 overlapping genes between DEGs and MAGs (Fig. 3c), of which 78 were upregulated and 11 were downregulated. KEGG pathway analysis showed that the overlapping genes were significantly enriched in lipid and atherosclerosis, ferroptosis, tryptophan metabolism, the NOD-like receptor signaling pathway, and fatty acid degradation (Fig. 3d). Disease Ontology (DO) analysis showed that the overlapping genes were significantly enriched for cardiovascular diseases such as myocardial infarction, atherosclerosis, pulmonary hypertension and peripheral vascular disease; metabolic diseases such as hyperglycemia, lipid storage disease, fatty liver disease and glucose intolerance; and inflammatory diseases such as hepatitis and inflammatory bowel disease (Fig. 3e).
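The MAG definition (GeneCards relevance score > 8) and the DEG/MAG overlap shown in the Venn diagram amount to a threshold followed by a set intersection. The sketch assumes the GeneCards results and the DEG calls are available as plain dictionaries; the function name `overlap_genes` and the toy gene identifiers in any example are hypothetical.

```python
def overlap_genes(genecards_scores, deg_direction, min_score=8.0):
    """Return (MAGs, overlapping up-regulated genes, overlapping down-regulated genes).

    genecards_scores: {gene: relevance score} from the GeneCards keyword search.
    deg_direction: {gene: "up" or "down"} for the DEGs of the training set.
    """
    mags = {g for g, score in genecards_scores.items() if score > min_score}
    overlap = mags & set(deg_direction)  # the Venn-diagram intersection
    up = sorted(g for g in overlap if deg_direction[g] == "up")
    down = sorted(g for g in overlap if deg_direction[g] == "down")
    return mags, up, down
```

The cut-off is strict (`> 8`), so a gene scoring exactly 8 is not counted as a MAG.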
Fig. 3.
Identification of metabolites in association with DEGs. (a) Volcano plot of the differentially expressed genes; red and blue represent up- and downregulated genes, respectively, and gray represents genes with no significant difference. (b) Heatmap of the up- and downregulated genes. The two colors represent opposite trends; the darker the color, the more prominent the trend. (c) Venn diagram of the 89 genes overlapping between metabolite-associated genes (MAGs) and DEGs between patients and healthy controls. (d) KEGG enrichment analysis results. (e) Disease Ontology (DO) enrichment analysis.
Machine learning of metabolites and diagnostic efficacy
After the 20 metabolites were entered into the LASSO regression model (Fig. 4a), the RF classifier (Fig. 4b) and the SVM classifier (Fig. 4c and d), 10, 18 and 15 signature metabolites were identified, respectively. Intersecting the metabolites identified by the three machine learning algorithms yielded five characteristic metabolites: kynurenine, homoserine, tryptophan, AMP, and spermine (Fig. 4e). To assess the diagnostic capacity of each metabolite for IPAH, receiver operating characteristic (ROC) curve analysis was performed. The area under the ROC curve (AUC) was 91.3% for AMP (Fig. 5a), 81.8% for kynurenine (Fig. 5b), 81.8% for tryptophan (Fig. 5c), 80.3% for homoserine (Fig. 5d), and 79.1% for spermine (Fig. 5e). The joint diagnostic efficacy of the selected metabolites, analyzed using the random forest method, yielded an AUC of 95.2% for IPAH (Fig. 5f).
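Two steps in this paragraph can be sketched in Python with scikit-learn: intersecting the three algorithms' selections and estimating AUCs. This is an illustrative sketch, not the authors' code. It assumes label 1 denotes IPAH and makes single-metabolite AUCs direction-agnostic (a metabolite lower in IPAH is scored as 1 - AUC). It also estimates the joint random-forest AUC from out-of-fold predictions under stratified cross-validation. The text does not say how the joint AUC was validated, so the fold count and the number of trees are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def shared_features(*selections):
    """Intersection (Venn overlap) of the feature sets chosen by each algorithm."""
    common = set(selections[0])
    for s in selections[1:]:
        common &= set(s)
    return sorted(common)


def single_feature_auc(X, y):
    """Direction-agnostic ROC AUC for each column of X (label 1 = IPAH)."""
    aucs = []
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])
        aucs.append(max(auc, 1.0 - auc))  # assumption: flip metabolites lower in IPAH
    return np.array(aucs)


def joint_rf_auc(X, y, n_splits=5, seed=0):
    """Out-of-fold AUC of a random forest that uses all selected features jointly."""
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    prob = cross_val_predict(rf, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, prob)
```

Scoring out-of-fold probabilities avoids the optimistic AUC that results from evaluating a random forest on the same samples it was trained on.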
Fig. 4.
Screening of DEMs via machine learning. (a) DEMs screened by LASSO. (b) Random forest (RF) algorithm. (c) and (d) Feature selection using the SVM technique. (e) Venn diagram of hub DEMs.
Fig. 5.
ROC curve analysis. ROC curves of AMP (a), kynurenine (b), tryptophan (c), homoserine (d), and spermine (e). (f) AUC for the joint diagnostic efficacy of the selected metabolites, analyzed using the random forest method.
Machine learning algorithms for identifying the target genes
We used three machine learning models to analyze gene expression data from PAH patients. LASSO regression narrowed down the candidate metabolism-related DEGs to 10 genes as potential markers for PAH (Fig. 6a). Nine genes were shortlisted by random forest (Fig. 6b), and the SVM method identified 14 genes as significant biomarkers (Fig. 6c and d). Intersecting the genes identified by the three machine learning algorithms yielded three key characteristic genes: MAPK6, SLC7A11 and CDC42BPA (Fig. 6e).
Fig. 6.
Machine learning to jointly screen hub genes. (a) 10 characteristic genes selected by the LASSO algorithm. (b) 9 characteristic genes selected by the RF algorithm. (c) and (d) 14 characteristic genes selected by the SVM algorithm. (e) Venn diagram of the overlapping genes identified by the three machine learning algorithms.
Verification of hub gene expression and diagnostic efficacy
In the validation sets GSE53408 and GSE131793, these three genes were confirmed as key PAH genes, with expression patterns consistent with those in the training set. The mRNA expression levels of MAPK6, SLC7A11 and CDC42BPA were significantly elevated in PAH patients compared with the control group (all P < 0.05) (Fig. 7a, b and c). In the GSE113439 training dataset, the AUC values for MAPK6, SLC7A11, and CDC42BPA were 1, 0.9879, and 1, respectively (Fig. 7a). In the GSE131793 validation set, the AUC values for the same markers were 0.86, 0.89, and 0.83, respectively (Fig. 7b), and in the GSE53408 validation set they remained consistently high at 1, 0.9621, and 0.9773, respectively (Fig. 7c). Taken together, these results identify MAPK6, SLC7A11 and CDC42BPA as strong predictors of PAH.
Fig. 7.
Verification of the expression and diagnostic power of hub genes. (a) Expression of hub genes and receiver operating characteristic (ROC) curve analysis of the training set (GSE113439). (b) Expression of hub genes and ROC curve analysis of the validation set (GSE131793). AUC represents the area under the curve. (c) Expression of hub genes and ROC curve analysis of the validation set (GSE53408). AUC represents the area under the curve.
Discussion
Recent research has increasingly highlighted the role of metabolomics in understanding PAH. Metabolomics provides a unique vantage point from which to examine perturbations in small-molecule metabolites that result from or contribute to PAH pathophysiology. For instance, studies have identified alterations in lipid metabolism, glycolysis, and amino acid pathways in PAH patients, indicating a systemic shift in energy utilization and biosynthesis13. Another study highlighted the role of disrupted nitric oxide pathways and suggested metabolic signatures that could be linked to the severity and progression of PAH14. Using a combination of liquid and gas chromatography-based mass spectrometry, another study found that patients with severe PAH exhibited disrupted glycolysis, increased tricarboxylic acid (TCA) cycle activity, and altered fatty acid metabolism with changes in oxidation pathways15.
Nevertheless, constructing a clinically useful predictor from such a large number of features is impractical. Furthermore, given the limited predictive power of a single metabolite and its variation across cohorts16,17, our objective was to construct a robust predictor for PAH diagnosis with a small number of metabolites by leveraging the diagnosis-associated metabolites in our cohort. Machine learning algorithms are becoming increasingly important for interpreting complex biological datasets in metabolomics. Machine learning offers robust analytical capabilities that can handle the large, complex datasets typical of metabolomic studies, enabling researchers to uncover patterns and associations that may be missed by traditional statistical methods18,19. In this study, the machine learning algorithms LASSO, SVM, and RF were jointly used for feature selection of disease diagnostic factors. These approaches were chosen for their complementary features. In particular, LASSO is effective for feature selection when the number of predictors is much larger than the sample size20. LASSO can also be applied to categorical variables, enhancing the predictive accuracy and interpretability of statistical models21. Random forest and SVM are robust machine learning techniques that can effectively process high-dimensional data and have previously been used in metabolomics22. In our study, random forest and SVM allowed effective differentiation between controls and IPAH patients, highlighting key metabolites that serve as potential biomarkers. This approach not only improves the accuracy of biomarker identification but also enhances our understanding of the underlying molecular mechanisms of diseases.
The predictive modeling capabilities of machine learning can further assist in the clinical translation of metabolomic findings, providing a foundation for personalized medicine approaches for treating IPAH.
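As an illustration of LASSO feature selection when predictors outnumber samples, the sketch below uses scikit-learn's L1-penalized logistic regression with tenfold cross-validation over the penalty strength. It is an assumed Python analogue of a binomial LASSO fitted with glmnet, not the authors' code; the function name `lasso_select`, the size of the C grid and the log-loss scoring are choices made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, names, seed=0):
    """Return the feature names with non-zero L1-penalized coefficients."""
    model = make_pipeline(
        StandardScaler(),  # common scale so the penalty treats features equally
        LogisticRegressionCV(
            Cs=20,               # grid of inverse penalty strengths
            cv=10,               # tenfold (stratified) cross-validation
            penalty="l1",        # LASSO penalty
            solver="liblinear",  # a solver that supports the L1 penalty
            scoring="neg_log_loss",
            max_iter=5000,
            random_state=seed,
        ),
    )
    model.fit(X, y)
    coef = model[-1].coef_.ravel()  # coefficients of the refitted best-C model
    return [n for n, c in zip(names, coef) if c != 0.0]
```

With 40 samples and 100 candidate features, the L1 penalty returns only a small subset with non-zero coefficients, which is the property that makes LASSO useful when predictors outnumber samples.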
The arginine pathway was the most prominent in our cohort, consistent with the findings of other relevant studies. Arginine serves as the substrate for the synthesis of nitric oxide (NO), a crucial mediator of vascular homeostasis and vasodilation23. Patients with PAH, as well as those with other forms of pulmonary hypertension, exhibit reduced arginine bioavailability compared with healthy controls. One study demonstrated that arginine metabolism is altered in pulmonary hypertension, with a strong inverse correlation between the ratio of arginine to ornithine plus citrulline and key pulmonary hemodynamic indicators, indicating significant changes in arginine bioavailability24. Altered arginine metabolism, particularly through increased arginase activity, significantly impacts nitric oxide synthesis in PAH, highlighting distinct metabolic endotypes among patients that could influence therapeutic approaches25. These studies corroborate our finding of significant alterations in the arginine biosynthesis pathway and point to a shift toward altered polyamine metabolism in IPAH.
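The arginine bioavailability ratio referred to above is commonly computed as the global arginine bioavailability ratio (GABR). The expression below is given for orientation only; whether the cited study used exactly this form is an assumption, and the worked concentrations are hypothetical.

```latex
\mathrm{GABR} = \frac{[\mathrm{Arg}]}{[\mathrm{Orn}] + [\mathrm{Cit}]},
\qquad \text{e.g. } \frac{60\ \mu\mathrm{M}}{50\ \mu\mathrm{M} + 30\ \mu\mathrm{M}} = 0.75
```

A lower ratio indicates that less arginine remains relative to its products ornithine (formed by arginase) and citrulline (formed by NO synthase), as would be expected when arginase activity is increased.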
A total of 20 differentially expressed metabolites (DEMs) were identified by metabolomics analysis, using the combination of a VIP value greater than 1 and a raw P value less than 0.05 as the criteria for differential metabolites. This approach is commonly applied in metabolomics studies to balance model contribution and statistical significance. The VIP (variable importance in projection) value, derived from PLS-DA models, reflects the importance of each metabolite in distinguishing between sample groups. Using VIP values helps ensure that the selected metabolites contribute meaningfully to the model's ability to differentiate groups, while the raw P value ensures that the observed differences are statistically significant26. Combining VIP values with raw P values is a feasible and widely accepted method in metabolomics analysis27,28. It allows the inclusion of biologically relevant metabolites that may not meet the more stringent criteria of adjusted P values but still play an important role in metabolic pathways. By applying both criteria, we aimed to provide a comprehensive view of the metabolic changes while maintaining statistical rigor.
These DEMs were mainly involved in metabolic pathways including arginine biosynthesis, histidine metabolism, arginine and proline metabolism, and glycine, serine and threonine metabolism. Machine learning identified five metabolites, namely AMP, kynurenine, homoserine, tryptophan, and spermine, that significantly differentiated patients with pulmonary hypertension from healthy individuals. Adenosine monophosphate (AMP) acts as an intermediary in the energy metabolism of adenosine triphosphate (ATP) and is also generated in the urea cycle, where argininosuccinate synthetase converts ATP to AMP. AMP-activated protein kinase (AMPK) is a highly conserved serine/threonine protein kinase that has a proapoptotic function in invasive smooth muscle cells (SMCs)29. Kynurenine is a direct metabolite of tryptophan, produced through the kynurenine pathway. This pathway is essential for the catabolism of tryptophan and is implicated in the production of several bioactive compounds, particularly those involved in the immune response30. Gregory et al. identified metabolic signatures of right ventricular-pulmonary vascular dysfunction, revealing that tryptophan metabolites, particularly those produced via the indoleamine 2,3-dioxygenase (IDO) pathway, are closely associated with pulmonary hypertension and could serve as novel biomarkers24. These results add to an accumulating body of clinical and preclinical evidence indicating a role for the kynurenine pathway of tryptophan metabolism in the pathogenesis of PAH. Homoserine is an important intermediate in living organisms. As part of the amino acid synthesis pathway, homoserine metabolites play an important role in the regulation of cell proliferation and differentiation31,32. Homoserine has also been implicated in metabolic syndrome (MetS), a common health problem characterized by clustered cardiovascular-metabolic risk factors.
A study of patients with primary MetS and matched controls found significantly lower levels of homoserine in patients with MetS than in controls, along with associations between homoserine and markers of inflammation, blood glucose, blood pressure, and lipocalin33. Yang et al. concluded that elevated plasma spermine promotes pulmonary vascular remodeling in pulmonary arterial hypertension and that targeting spermine synthase may offer a novel therapeutic approach for the disease34.
Three hub genes were identified through machine learning, and their expression was further validated in additional PAH datasets, demonstrating robust diagnostic value. MAPK6 is a member of the MAPK signaling pathway, which plays an important role in how cells respond to external stimuli. MAPK6 is implicated in a number of cellular processes, including proliferation, differentiation, invasion, and immune response35–37. A large body of literature supports the involvement of kynurenine and tryptophan in inflammatory responses and immune regulation38. Both metabolites are key players in the kynurenine pathway, the primary route of tryptophan catabolism, which is critical for immune regulation, particularly in the control of inflammation. Kynurenine and its derivatives have been shown to have immunosuppressive and anti-inflammatory properties with implications for a variety of immune-related diseases39. Dysregulation of this pathway, especially under conditions of immune stress, can lead to pathological conditions such as cardiovascular disease, autoimmune syndromes, and neurodegenerative disorders40. MAPK6 is involved in pathways that regulate cellular responses to inflammation, and the interaction between MAPK6 dysregulation and kynurenine or tryptophan metabolism may contribute to the inflammatory processes seen in diseases such as pulmonary arterial hypertension, where both immune regulation and cellular metabolism are important.
SLC7A11, also known as xCT, is a cystine/glutamate antiporter that plays a primary role in the cystine metabolic pathway41. The imported cystine is reduced to cysteine, the rate-limiting precursor for glutathione synthesis. Glutathione is a crucial antioxidant that protects cells from oxidative stress by neutralizing reactive oxygen species (ROS) and maintaining cellular redox homeostasis. SLC7A11 therefore plays a pivotal role in regulating cellular redox homeostasis and in resisting cell death, such as ferroptosis, an iron-dependent form of cell death42.
The SLC7A11 gene is overexpressed in many human malignancies, where its expression correlates with tumor growth, proliferation, dissemination, the tumor microenvironment and resistance to treatment43,44; it is therefore regarded as an important target for cancer therapy. The metabolite spermine, a polyamine, is known to be involved in the regulation of oxidative stress and polyamine metabolism45. Polyamine metabolism is essential for cell growth and survival under stress conditions, and altered polyamine metabolism, as reflected by elevated spermine levels, may exacerbate the effects of SLC7A11 dysregulation. CDC42BPA is a serine/threonine protein kinase that binds to the small GTPase CDC42 and is involved in the regulation of cytoskeletal reorganization, cell migration and shape change. CDC42BPA affects cell cycle-related proteins and signaling pathways and has been shown to promote cell proliferation46. Energy metabolism, particularly ATP use, is intimately linked to cytoskeletal dynamics, because cytoskeletal rearrangements are energy-intensive processes. Studies indicate that around 50% of cellular ATP is consumed in maintaining the structure and function of the cytoskeleton, suggesting that changes in AMP and ATP levels can directly influence these dynamics47.
The present study takes a multifaceted approach to elucidating the expression landscape of metabolites and metabolism-related genes in PAH, combining targeted metabolomic differential analysis with multiple machine learning algorithms to identify potential signature biomarkers. Given the large amount of metabolomic information generated in PAH, intelligent algorithms are essential for mining informative variables. In this study, five distinctive DEMs were identified in the plasma of PAH patients using the LASSO, SVM, and RF algorithms, and three metabolism-related genes were also identified. The diagnostic efficacy of the five metabolites was then evaluated by ROC curve analysis, and they were identified as potential new markers for the diagnosis of PAH. The expression of the metabolism-related genes was validated in lung tissue or PBMCs from three human datasets. However, given the exploratory nature of this study and the limited number of subjects, we elected to use completely healthy individuals as controls. This choice may lead to an overestimation of the sensitivity and specificity of the metabolic biomarkers. In future large-sample studies, we will use disease controls as the control group to validate the diagnostic efficacy of these biomarkers.
Conclusions
In this study, five metabolites and three metabolism-related genes associated with the diagnosis of PAH were identified using metabolomics, machine learning algorithms and bioinformatics. Investigating the association between metabolic traits and PAH provides new insights into the underlying biological mechanisms, which could potentially improve the diagnosis and treatment of PAH.
Methods
Study design and samples
This case-control study included 17 patients diagnosed with IPAH at the China-Japan Union Hospital of Jilin University, along with 20 age- and sex-matched healthy controls. All participants provided fasting plasma samples, collected at a consistent time of day into EDTA tubes, which were then frozen at -80 °C until metabolomic analysis. The diagnosis of PAH was based on the criteria outlined in the 2015 European Society of Cardiology (ESC) and European Respiratory Society (ERS) guidelines for the diagnosis and treatment of pulmonary hypertension48, as defined by the consensus at the time of cohort enrollment: a mean pulmonary artery pressure (mPAP) ≥ 25 mmHg, pulmonary vascular resistance (PVR) ≥ 3 Wood units, and pulmonary artery wedge pressure (PAWP) ≤ 15 mmHg. The clinical characteristics are shown in Table 1.
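The hemodynamic criteria above can be written as a small check. These are hypothetical helpers for illustration. PVR is computed with the standard formula (mPAP - PAWP) / cardiac output, and cardiac output is an input not reported in this study. Meeting these thresholds is necessary but not sufficient for an IPAH diagnosis, which also requires excluding other causes of pulmonary hypertension.

```python
def pvr_wood_units(mpap_mmhg, pawp_mmhg, cardiac_output_l_min):
    """Pulmonary vascular resistance in Wood units: (mPAP - PAWP) / cardiac output."""
    return (mpap_mmhg - pawp_mmhg) / cardiac_output_l_min


def meets_2015_pah_hemodynamics(mpap_mmhg, pawp_mmhg, pvr_wu):
    """True when all three 2015 ESC/ERS hemodynamic thresholds quoted in the text are met."""
    return mpap_mmhg >= 25 and pawp_mmhg <= 15 and pvr_wu >= 3
```

For example, a hypothetical patient with mPAP 45 mmHg, PAWP 10 mmHg and cardiac output 5 L/min has a PVR of 7 Wood units and meets all three thresholds.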
Table 1.
Clinical characteristics of the study population.
| Characteristics | IPAH (n = 17) | Control (n = 20) | P-value |
|---|---|---|---|
| Age (years) | 65.82 (16.09) | 58.60 (14.27) | 0.157 |
| BMI (kg/m2) | 23.21 (2.18) | 23.33 (3.68) | 0.903 |
| Male, n (%) | 9 (52.9%) | 10 (50.0%) | - |
| NT-proBNP (pg/ml) | 3549.71 (453.12) | - | - |
| 6MWD (m) | 355.09 (174.67) | - | - |
| mPAP (mmHg) | 68.96 (22.76) | - | - |
| PVR (Wood units) | 9.89 (5.51) | - | - |
Data are expressed as mean (SD) or number (%).
BMI, body mass index; 6MWD, 6-minute walking distance; IPAH, idiopathic pulmonary arterial hypertension; mPAP, mean pulmonary arterial pressure; NT-proBNP, N-terminal pro-B-type natriuretic peptide; PVR, pulmonary vascular resistance.
Sample preparation and metabolite identification
A targeted metabolomic panel was used to analyze 120 small-molecule metabolites spanning core networks of energy, amino acid, amine, nucleotide, lipid and organic acid metabolism, among others (details in Supplementary Fig. 1). Initially, 100 µL of plasma was combined with 20 µL of a reducing agent and 20 µL of polar metabolite internal standards. The mixture was vortexed for 10 s, after which 400 µL of acetonitrile was added. The sample was vortexed for 5 min, allowed to stand for 15 min, and then centrifuged at 13,000 rpm for 10 min at 4 °C. The supernatant was transferred to a glass tube and evaporated under a gentle nitrogen stream at room temperature.
The dried residue was reconstituted in 100 µL of a methanol/acetonitrile mixture (75:25, v/v) by sonication and centrifuged again at 13,000 rpm for 10 min at 4 °C. The clear supernatant was collected for targeted metabolite analysis on an Agilent 6490 Triple Quadrupole LC-MS. Chromatographic separation was conducted on a Waters XBridge Amide column (2.1 × 100 mm, 3.5 μm particle size) maintained at 35 °C. Mobile phase A consisted of 50% acetonitrile with 15 mM ammonium acetate and 0.2% ammonium hydroxide, and mobile phase B was acetonitrile/water (95:5, v/v) with the same additives. The gradient program was as follows: the mobile phase was held at 100% B for 10 min, gradually decreased to 0% B over the next 13 min, returned to 100% B over the following 1 min, and finally held at 100% B for the remaining 10 min. The flow rate was 0.3 mL/min, and the injection volume was 5 µL at an injection temperature of 4 °C.
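The gradient program described above can be tabulated as time/composition breakpoints with linear ramps between them. The sketch assumes that "gradually" means a linear ramp and that the program totals 34 min (10 + 13 + 1 + 10); `percent_b` is a hypothetical helper.

```python
import numpy as np

# (time in min, % mobile phase B) breakpoints taken from the text;
# linear ramps between breakpoints are an assumption.
GRADIENT = [(0.0, 100.0), (10.0, 100.0), (23.0, 0.0), (24.0, 100.0), (34.0, 100.0)]


def percent_b(t_min):
    """Return the % of mobile phase B at time t_min (minutes after injection)."""
    times, comp = zip(*GRADIENT)
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time is outside the 34-min program")
    return float(np.interp(t_min, times, comp))
```

Under these assumptions, halfway through the 13-min ramp (16.5 min) the eluent is 50% B.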
Identification of differential metabolites and functional enrichment between different groups
To identify the metabolites most relevant to the pathophysiology of PAH, we used the online analytical tool MetaboAnalyst 6.0, which facilitates the exploration of potential metabolites and the pathways involved49. Orthogonal partial least squares discriminant analysis (OPLS-DA) was used to visually discriminate between groups. To reduce the influence of noise and artifacts in the metabolomic data, all measured concentrations were mean-centered and auto-scaled. The quality and predictive ability of the PLS-DA model were then evaluated using the R2Y (cum) and Q2 (cum) values, respectively. Variables with a variable importance in projection (VIP) value > 1 and a corresponding p < 0.05 were considered to discriminate between groups. Metabolite set enrichment analysis was conducted to identify biologically meaningful patterns significantly enriched in the quantitative metabolomic data. The top 5 metabolic pathways were used as keywords for searches in GeneCards (https://www.genecards.org), and genes with a relevance score > 8 were defined as metabolically associated genes (MAGs). Machine learning was subsequently used to identify the pivotal metabolites.
Further insight into the gene expression profiles of PAH
Gene expression profiles of PAH patients were also analyzed. All microarray data were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) as standardized, quality-controlled gene expression matrices. The GSE113439 dataset, comprising 26 lung samples (15 PAH and 11 control samples), was used as the training set. Two GEO datasets were used for validation: GSE131793, a microarray expression matrix of human peripheral blood mononuclear cells containing 12 PAH and 11 normal control samples, and GSE53408, containing 12 PAH and 11 normal control samples of human lung tissue. Genes with |logFC| > 1 and p < 0.05 were considered differentially expressed genes (DEGs), and the intersection of MAGs and DEGs was defined as the set of critical metabolism-related genes. To investigate the pathways potentially involved, disease ontology (DO) and KEGG enrichment analyses50 of the critical metabolism-related genes were performed using the R packages ‘org.Hs.eg.db’, ‘clusterProfiler’51 and ‘DOSE’52, with p < 0.05 considered statistically significant. Machine learning was then used to identify the hub genes among these genes.
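The gene-selection rule above (DEGs with |logFC| > 1 and p < 0.05, intersected with MAGs whose GeneCards relevance score exceeds 8) reduces to two filters and a set intersection. A minimal sketch, assuming the differential expression results and the GeneCards hits have already been exported as simple records; the gene symbols and function names are hypothetical placeholders, not data from this study.

```python
def select_degs(de_results, logfc_cut=1.0, alpha=0.05):
    """de_results: iterable of (gene, logFC, p_value); keep |logFC| > cut and p < alpha."""
    return {gene for gene, logfc, p in de_results if abs(logfc) > logfc_cut and p < alpha}


def select_mags(genecards_hits, min_score=8.0):
    """genecards_hits: iterable of (gene, relevance_score) pooled over the pathway keywords."""
    return {gene for gene, score in genecards_hits if score > min_score}


def critical_metabolism_genes(de_results, genecards_hits):
    """Intersection of DEGs and MAGs, sorted for reproducible output."""
    return sorted(select_degs(de_results) & select_mags(genecards_hits))


# Hypothetical example records (not study data).
de_results = [
    ("GENE_A", 1.5, 0.01),   # up-regulated and significant
    ("GENE_B", -2.0, 0.03),  # down-regulated and significant
    ("GENE_C", 0.5, 0.001),  # significant but |logFC| too small
    ("GENE_D", 3.0, 0.20),   # large change but not significant
]
genecards_hits = [("GENE_A", 12.0), ("GENE_B", 5.0), ("GENE_C", 20.0)]
print(critical_metabolism_genes(de_results, genecards_hits))  # ['GENE_A']
```

In this sketch, a gene returned under several pathway keywords is kept as a MAG if any one of its relevance scores exceeds 8.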
Identification of disease signature metabolites and hub genes using machine learning
In this study, three machine learning algorithms, LASSO, SVM and RF, were used jointly to select diagnostic features. LASSO is a penalized regression method suited to high-dimensional data. It estimates regression coefficients by minimizing the residual sum of squares plus an L1 penalty on the coefficients, with a regularization parameter λ that balances goodness of fit against sparsity; the coefficients of uninformative variables are shrunk to exactly zero, which reduces dimensionality, retains the most informative variables and helps avoid overfitting. The LASSO model was fitted with the “glmnet” package, and the optimal λ was chosen by tenfold cross-validation as the value that minimized the cross-validation error53. Support vector machine recursive feature elimination (SVM-RFE) ranks features by repeatedly training an SVM and removing the least important feature; using the R package “e1071”, features were eliminated recursively in this way to extract the optimal variables for diagnosing PAH54. Random forest (RF) is an ensemble learning method that trains multiple decision trees and combines their predictions. The RF model was built using random forest software, and candidate genes and metabolites were ranked by their importance in distinguishing PAH from controls55. Each of the three methods was applied both to the differentially expressed genes and to the differential metabolites, and a Venn diagram was used to identify the DEGs and DEMs selected by all three methods. The diagnostic performance in distinguishing PAH from control samples was assessed by the area under the receiver operating characteristic curve (AUC), with AUC > 0.7 taken to indicate relatively satisfactory diagnostic performance.
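The LASSO objective described above can be written as minimizing (1/2n)‖y − Xβ‖² + λ‖β‖₁. The sketch below solves it by cyclic coordinate descent with soft-thresholding, the same general strategy that glmnet uses, but it is a simplified stand-in rather than glmnet itself: it uses the squared-error loss named in the text (a binary PAH/control outcome would often be modelled with a logistic loss instead), it only centres the columns rather than standardizing them, and it omits the tenfold cross-validation used to choose λ. The function names are hypothetical.

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, max_sweeps=1000, tol=1e-10):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of sample rows. Columns of X and y are mean-centred here,
    so no intercept is estimated.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xb for b = 0
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature: coefficient stays at zero
            # Correlation of feature j with the partial residual (own term added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Two orthogonal features; y depends strongly on the first and weakly on the second.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.2, -2.8, 2.8, -3.2]
print(lasso_cd(X, y, lam=0.5))  # the weak second coefficient is shrunk to exactly zero
print(lasso_cd(X, y, lam=0.1))  # a smaller penalty keeps both features
```

Running the solver over a decreasing grid of λ values and scoring each fit on held-out folds would give the cross-validated choice of λ described in the text.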
Statistical analysis
Multivariate statistical analyses of the targeted metabolomics data were performed using the online tool MetaboAnalyst 6.0. Statistical analyses were also performed using R software (version 3.6.3) and GraphPad Prism (version 8.0.0). Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was used to assess the efficacy of each model. Student’s t-test and one-way ANOVA were used to compare two groups and multiple groups, respectively. P < 0.05 was considered statistically significant. All data are expressed as the mean ± SEM.
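The AUC used above has a direct interpretation: it equals the probability that a randomly chosen PAH sample receives a higher score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). The sketch below computes this, together with the mean ± SEM used to report the data. It is a minimal illustration, not the R or GraphPad Prism routines used in the study, and the scores are made-up values.

```python
import math
import statistics


def roc_auc(case_scores, control_scores):
    """AUC = P(case score > control score) + 0.5 * P(tie), over all case/control pairs."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def mean_sem(values):
    """Return (mean, standard error of the mean), using the sample SD (n - 1)."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Made-up scores for illustration only.
cases = [0.9, 0.8, 0.7]
controls = [0.1, 0.2, 0.75]
auc = roc_auc(cases, controls)
print(round(auc, 3), auc > 0.7)  # 8 of 9 case/control pairs are correctly ordered
print(mean_sem([1.0, 2.0, 3.0]))
```

The pairwise loop is quadratic in sample size, which is negligible for cohorts of a few dozen samples such as those analysed here.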
Author contributions
Z.H.K. conceived the study, designed the experiments, acquired the funding and drafted the article. Y.C. performed the bioinformatics analyses, interpreted the data and wrote the manuscript. L.Y.H. was responsible for bioinformatics analysis. All authors Z.H.K., Y.C. and L.Y.H. contributed to revising the manuscript critically for important intellectual content and have given final approval for the version to be published. Each author agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding
This work was supported by grants from the Natural Science Foundation of Jilin Province (YDZJ202301ZYTS525).
Data availability
The gene expression profiles of GSE131793, GSE113439 and GSE53408 were downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). The metabolite datasets are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Ethical approval for research involving human subjects was obtained from the Ethics Committee of the China-Japan Union Hospital of Jilin University. All methods were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all subjects and/or their legal guardians.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Schermuly, R. T., Ghofrani, H. A., Wilkins, M. R. & Grimminger, F. Mechanisms of disease: Pulmonary arterial hypertension. Nat. Rev. Cardiol. 8, 443–455. 10.1038/nrcardio.2011.87 (2011).
- 2. Lau, E. M. T., Giannoulatou, E., Celermajer, D. S. & Humbert, M. Epidemiology and treatment of pulmonary arterial hypertension. Nat. Rev. Cardiol. 14, 603–614. 10.1038/nrcardio.2017.84 (2017).
- 3. Benza, R. L. et al. An evaluation of long-term survival from time of diagnosis in pulmonary arterial hypertension from the REVEAL Registry. Chest 142, 448–456. 10.1378/chest.11-1460 (2012).
- 4. Li, M. et al. Metabolic reprogramming regulates the proliferative and inflammatory phenotype of adventitial fibroblasts in pulmonary hypertension through the transcriptional corepressor C-terminal binding protein-1. Circulation 134, 1105–1121. 10.1161/CIRCULATIONAHA.116.023171 (2016).
- 5. Sutendra, G. & Michelakis, E. D. The metabolic basis of pulmonary arterial hypertension. Cell. Metab. 19, 558–573. 10.1016/j.cmet.2014.01.004 (2014).
- 6. Zhao, Y. D. et al. De novo synthesize of bile acids in pulmonary arterial hypertension lung. Metabolomics 10, 1169–1175. 10.1007/s11306-014-0653-y (2014).
- 7. Reel, P. S., Reel, S., Pearson, E., Trucco, E. & Jefferson, E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol. Adv. 49, 107739. 10.1016/j.biotechadv.2021.107739 (2021).
- 8. Pi, H. et al. Metabolomic signatures associated with pulmonary arterial hypertension outcomes. Circ. Res. 132, 254–266. 10.1161/CIRCRESAHA.122.321923 (2023).
- 9. Chen, C. et al. Metabolomics reveals metabolite changes of patients with pulmonary arterial hypertension in China. J. Cell. Mol. Med. 24, 2484–2496. 10.1111/jcmm.14937 (2020).
- 10. Alotaibi, M. et al. Sex-related differences in eicosanoid levels in chronic thromboembolic pulmonary hypertension. Am. J. Respir. Cell. Mol. Biol. 68, 228–231. 10.1165/rcmb.2022-0272LE (2023).
- 11. Buergel, T. et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 28, 2309–2320. 10.1038/s41591-022-01980-3 (2022).
- 12. Albaradei, S. et al. Machine learning and deep learning methods that use omics data for metastasis prediction. Comput. Struct. Biotechnol. J. 19, 5008–5018. 10.1016/j.csbj.2021.09.001 (2021).
- 13. Rhodes, C. J. et al. Plasma metabolomics implicates modified transfer RNAs and altered bioenergetics in the outcomes of pulmonary arterial hypertension. Circulation 135, 460–475. 10.1161/CIRCULATIONAHA.116.024602 (2017).
- 14. Sharma, S. et al. Altered carnitine homeostasis is associated with decreased mitochondrial function and altered nitric oxide signaling in lambs with pulmonary hypertension. Am. J. Physiol. Lung Cell. Mol. Physiol. 294, L46–56. 10.1152/ajplung.00247.2007 (2008).
- 15. Zhao, Y. et al. Metabolomic heterogeneity of pulmonary arterial hypertension. PLoS One 9, e88727. 10.1371/journal.pone.0088727 (2014).
- 16. Correale, M. et al. Circulating biomarkers in pulmonary arterial hypertension: An update. Biomolecules 14. 10.3390/biom14050552 (2024).
- 17. Bujak, R. et al. New biochemical insights into the mechanisms of pulmonary arterial hypertension in humans. PLoS One 11, e0160505. 10.1371/journal.pone.0160505 (2016).
- 18. Galal, A., Talal, M. & Moustafa, A. Applications of machine learning in metabolomics: Disease modeling and classification. Front. Genet. 13, 1017340. 10.3389/fgene.2022.1017340 (2022).
- 19. Liebal, U. W., Phan, A. N. T., Sudhakar, M., Raman, K. & Blank, L. M. Machine learning applications for mass spectrometry-based metabolomics. Metabolites 10. 10.3390/metabo10060243 (2020).
- 20. Deutelmoser, H. et al. Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data. Brief. Bioinform. 22. 10.1093/bib/bbaa230 (2021).
- 21. Sampson, J. N., Chatterjee, N., Carroll, R. J. & Müller, S. Controlling the local false discovery rate in the adaptive Lasso. Biostatistics 14, 653–666. 10.1093/biostatistics/kxt008 (2013).
- 22. Barberis, E. et al. Precision medicine approaches with metabolomics and artificial intelligence. Int. J. Mol. Sci. 23. 10.3390/ijms231911269 (2022).
- 23. Vallance, P. & Chan, N. Endothelial function and nitric oxide: Clinical relevance. Heart 85, 342–350. 10.1136/heart.85.3.342 (2001).
- 24. Lewis, G. D. et al. Metabolic profiling of right ventricular-pulmonary vascular function reveals circulating biomarkers of pulmonary hypertension. J. Am. Coll. Cardiol. 67, 174–189. 10.1016/j.jacc.2015.10.072 (2016).
- 25. Kao, C. C. et al. Arginine metabolic endotypes in pulmonary arterial hypertension. Pulm. Circ. 5, 124–134. 10.1086/679720 (2015).
- 26. Anwardeen, N. R., Diboun, I., Mokrab, Y., Althani, A. A. & Elrayess, M. A. Statistical methods and resources for biomarker discovery using metabolomics. BMC Bioinform. 24, 250. 10.1186/s12859-023-05383-0 (2023).
- 27. Zhang, Y. et al. Human serum metabolomic analysis reveals progression for high blood pressure in type 2 diabetes mellitus. BMJ Open. Diabetes Res. Care 9. 10.1136/bmjdrc-2021-002337 (2021).
- 28. Chen, W. et al. Using an untargeted metabolomics approach to analyze serum metabolites in COVID-19 patients with nucleic acid turning negative. Front. Pharmacol. 13, 964037. 10.3389/fphar.2022.964037 (2022).
- 29. Igata, M. et al. Adenosine monophosphate-activated protein kinase suppresses vascular smooth muscle cell proliferation through the inhibition of cell cycle progression. Circ. Res. 97, 837–844. 10.1161/01.Res.0000185823.73556.06 (2005).
- 30. Bartoli, F. et al. The association of kynurenine pathway metabolites with symptom severity and clinical features of bipolar disorder: An overview. Eur. Psychiatry 65, e82. 10.1192/j.eurpsy.2022.2340 (2022).
- 31. Tao, S. et al. N-(3-oxododecanoyl)-l-homoserine lactone modulates mitochondrial function and suppresses proliferation in intestinal goblet cells. Life Sci. 201, 81–88. 10.1016/j.lfs.2018.03.049 (2018).
- 32. Guo, J. et al. N-(3-oxododecanoyl)-homoserine lactone regulates osteoblast apoptosis and differentiation by mediating intracellular calcium. Cell. Signal. 75, 109740. 10.1016/j.cellsig.2020.109740 (2020).
- 33. Patel, A., Abdelmalek, L., Thompson, A. & Jialal, I. Decreased homoserine levels in metabolic syndrome. Diabetes Metab. Syndr. 14, 555–559. 10.1016/j.dsx.2020.04.052 (2020).
- 34. He, Y. Y. et al. Spermine promotes pulmonary vascular remodelling and its synthase is a therapeutic target for pulmonary arterial hypertension. Eur. Respir. J. 56. 10.1183/13993003.00522-2020 (2020).
- 35. Bogucka, K. et al. ERK3/MAPK6 controls IL-8 production and chemotaxis. Elife 9. 10.7554/eLife.52511 (2020).
- 36. Bogucka-Janczi, K. et al. ERK3/MAPK6 dictates CDC42/RAC1 activity and ARP2/3-dependent actin polymerization. Elife 12. 10.7554/eLife.85167 (2023).
- 37. Tan, J., Yang, L., Liu, C. & Yan, Z. MicroRNA-26a targets MAPK6 to inhibit smooth muscle cell proliferation and vein graft neointimal hyperplasia. Sci. Rep. 7, 46602. 10.1038/srep46602 (2017).
- 38. Ambrosio, L. F. et al. Association between altered tryptophan metabolism, plasma aryl hydrocarbon receptor agonists, and inflammatory Chagas disease. Front. Immunol. 14, 1267641. 10.3389/fimmu.2023.1267641 (2023).
- 39. Xue, C. et al. Tryptophan metabolism in health and disease. Cell. Metab. 35, 1304–1326. 10.1016/j.cmet.2023.06.004 (2023).
- 40. Seo, S. K. & Kwon, B. Immune regulation through tryptophan metabolism. Exp. Mol. Med. 55, 1371–1379. 10.1038/s12276-023-01028-7 (2023).
- 41. Jyotsana, N., Ta, K. T. & DelGiorno, K. E. The role of cystine/glutamate antiporter SLC7A11/xCT in the pathophysiology of cancer. Front. Oncol. 12, 858462. 10.3389/fonc.2022.858462 (2022).
- 42. Lee, J. & Roh, J. L. SLC7A11 as a gateway of metabolic perturbation and ferroptosis vulnerability in cancer. Antioxid. (Basel) 11. 10.3390/antiox11122444 (2022).
- 43. Zeng, C. et al. SHARPIN promotes cell proliferation of cholangiocarcinoma and inhibits ferroptosis via p53/SLC7A11/GPX4 signaling. Cancer Sci. 113, 3766–3775. 10.1111/cas.15531 (2022).
- 44. Hu, P. et al. The mechanism of the imbalance between proliferation and ferroptosis in pulmonary artery smooth muscle cells based on the activation of SLC7A11. Eur. J. Pharmacol. 928, 175093. 10.1016/j.ejphar.2022.175093 (2022).
- 45. Leonetti, A. et al. Epileptic seizures and oxidative stress in a mouse model over-expressing spermine oxidase. Amino Acids 52, 129–139. 10.1007/s00726-019-02749-8 (2020).
- 46. He, P. Y. et al. Inhibition of cell migration and invasion by miR-29a-3p in a colorectal cancer cell line through suppression of CDC42BPA mRNA expression. Oncol. Rep. 38, 3554–3566. 10.3892/or.2017.6037 (2017).
- 47. DeWane, G., Salvi, A. M. & DeMali, K. A. Fueling the cytoskeleton - links between cell metabolism and actin remodeling. J. Cell. Sci. 134. 10.1242/jcs.248385 (2021).
- 48. Galiè, N. et al. 2015 ESC/ERS guidelines for the diagnosis and treatment of pulmonary hypertension: The joint task force for the diagnosis and treatment of pulmonary hypertension of the European Society of Cardiology (ESC) and the European Respiratory Society (ERS): Endorsed by: Association for European Paediatric and Congenital Cardiology (AEPC), International Society for Heart and Lung Transplantation (ISHLT). Eur. Heart J. 37, 67–119. 10.1093/eurheartj/ehv317 (2016).
- 49. Chong, J., Wishart, D. S. & Xia, J. Using MetaboAnalyst 4.0 for comprehensive and integrative metabolomics data analysis. Curr. Protoc. Bioinform. 68, e86. 10.1002/cpbi.86 (2019).
- 50. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. 10.1093/nar/28.1.27 (2000).
- 51. Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: An R package for comparing biological themes among gene clusters. Omics 16, 284–287. 10.1089/omi.2011.0118 (2012).
- 52. Yu, G., Wang, L. G., Yan, G. R. & He, Q. Y. DOSE: An R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609. 10.1093/bioinformatics/btu684 (2015).
- 53. Huang, H. Controlling the false discoveries in LASSO. Biometrics 73, 1102–1110. 10.1111/biom.12665 (2017).
- 54. Lin, X. et al. Selecting feature subsets based on SVM-RFE and the overlapping ratio with applications in bioinformatics. Molecules 23. 10.3390/molecules23010052 (2017).
- 55. Breiman, L. Random forests. Mach. Learn. 45, 5–32. 10.1023/A:1010933404324 (2001).