PLOS Digital Health. 2025 Dec 19;4(12):e0001125. doi: 10.1371/journal.pdig.0001125

Towards precision psychiatry: Metabolomics identifies three biological subtypes of depression

Simeng Ma 1, Zhaowen Nie 1, Mengyuan Zhang 2, Junhua Mei 3, Enqi Zhou 1, Zhiyi Hu 2, Honggang Lv 1, Qian Gong 1, Gaohua Wang 1, Huiling Wang 1, Bo Du 4, Jun Yang 2,*,#, Zhongchun Liu 1,5,6,*,#
Editor: Hadi Ghasemi 7
PMCID: PMC12716697  PMID: 41417750

Abstract

Depression is clinically and biologically heterogeneous, mandating classification strategies for personalized medicine. This study explored depression subtypes using metabolomics data from the UK Biobank and validated the subtypes in the Whitehall II cohort. The five-step analysis included: (1) identification of distinct subtypes using non-negative matrix factorization (NMF) and four machine learning algorithms; (2) genome-wide association studies (GWAS) to examine associations across subtypes and controls; (3) comparison of clinical characteristics across subtypes; (4) development of 24 subtype-specific diagnostic models and validation in an independent cohort; and (5) construction and comparison of metabolic networks across subtypes. Cluster analysis of 249 metabolomic indicators in individuals with current depressive episodes (n = 7,945) identified three metabolic subtypes of depression. Subtype 1 was characterized by fatty acid dysregulation, subtype 3 had a hyperlipidemia phenotype, while subtype 2 displayed an intermediate phenotype. Metabolic subtypes were not associated with SNPs. Diagnostic models built using the 249 metabolic indicators yielded an area under the curve (AUC) of 0.644 for the total depression sample and 0.785, 0.817, and 0.942 for subtypes 1, 2, and 3, respectively. Twenty-three additional diagnostic models based on combinations of metabolic indicators improved performance by 12.8–39.6% over a binary classification model. Metabolic networks significantly differed between each subtype and healthy controls but not between the total depressed group and controls. This study defines distinct metabolic subtypes of depression. Future research should combine high-throughput metabolomics with prospectively established depression cohorts and tailored interventions to explore subtype-specific diagnostic and therapeutic biomarkers.

Author summary

Depression is clinically and biologically heterogeneous, mandating classification strategies for personalized medicine. Can metabolomic profiling identify biologically distinct subtypes of depression and improve diagnostic accuracy? This study explored depression biological subtypes using metabolomics data. Analysis of 249 metabolic markers in 7,945 depressed patients using non-negative matrix factorization and machine learning revealed three subtypes: Subtype 1 (fatty acid dysregulation), Subtype 3 (hyperlipidemia), and Subtype 2 (intermediate phenotype). Diagnostic models incorporating these metabolic markers, validated in the Whitehall II cohort, outperformed binary classification, with AUC improvements of 12.8–39.6%. This study identifies three biological subtypes of depression, each demonstrating unique dysregulation patterns. Machine learning models incorporating these metabolic indicators show enhanced diagnostic accuracy. These findings provide clues for future development of precision diagnostics and therapeutics for depression.

1. Introduction

Depression, a complex disease influenced by genetic, environmental, psychological, and biological factors [1], is primarily diagnosed based on subjective clinical symptoms [2,3]. Despite progress in precision medicine, the absence of objective biomarkers for depression hampers clinical outcomes. Potential biomarkers for depression could originate from various dysregulated biological processes, including alterations in brain structure and function and peripheral changes in inflammatory, neurotransmitter, neurotrophic, neuroendocrine, and metabolic systems [1]. However, the complex etiology of depression cannot be fully captured by any single biological factor, and the interplay between these processes suggests that isolated examination may not suffice for clinical advancement.

Omics-based research, particularly metabolomics, offers a promising approach by capturing the intricate interactions between an organism and its environment [4]. Given the heterogeneity of depression, a binary diagnostic approach cannot accurately classify all patients. Instead, subtyping patients based on molecular or phenotypic data could enhance diagnosis, prediction, and treatment [5]. The metabolome, being the closest to the patient phenotype, is particularly suitable for molecular phenotyping [6].

Although multiple metabolic alterations have been identified in depression [7–11], their utility as diagnostic biomarkers remains limited. Our previous machine learning analysis of the UK Biobank (n = 123,459) identified metabolic biomarkers associated with depression [12]. Diagnostic models incorporating these metabolites along with traditional risk factors achieved AUCs of 0.658 and 0.716 for lifetime and current depression, respectively. Here we build on this study to define depression subtypes based on the UK Biobank, the largest metabolomics database at present, and validate them in the Whitehall II cohort. We perform a five-step analysis: (1) identification of distinct subtypes using non-negative matrix factorization (NMF) and four machine learning algorithms; (2) genome-wide association studies (GWAS) to examine associations across subtypes and controls; (3) comparison of clinical characteristics across subtypes; (4) development of 24 subtype-specific diagnostic models and validation in an independent cohort; and (5) construction and comparison of metabolic networks across subtypes.

2. Results

2.1. Demographic characteristics

The detailed data analysis process is illustrated in Fig 1. Details of the metabolomic indicators are shown in Table A in S2 File. Of 183,926 participants included in the study, 7,945 were diagnosed with current depression. Table B in S2 File presents the demographic and clinical characteristics of the two groups.

Fig 1. Data analysis process.

Participants with depression were defined based on fields 130895 and 130896, which indicated the source and initial diagnosis date of depression, respectively. Exclusion criteria for the depression cohort were: absence of metabolomics data; a history of schizophrenia or bipolar disorder; a diagnosis of cancer, cerebrovascular disease, or substance dependence; a missing depression diagnosis date; a depression diagnosis made after the baseline assessment; and a PHQ-2 score < 2 (indicating no current depressive episode). Healthy controls (HC) were included if they had metabolomics data, no psychiatric diagnosis, and a PHQ-2 score < 2.

A correlation heatmap of metabolomics data for individuals with current depression and healthy controls (HC) is shown in Fig 2A, which reveals strong correlations between lipoprotein subclasses and between various lipoprotein components. Compared to HC, individuals with current depression exhibited significantly more correlations between metabolic indicators (Fig 2B).

Fig 2. Application of metabolomics data to identify depression subtypes.

Correlation heatmap of metabolomic data for individuals with current depression (A) and healthy controls (B). (C) UMAP dimensionality reduction and visualization of identified clusters. (D) t-SNE dimensionality reduction and visualization of identified clusters. (E) Heatmap showing differential metabolites between the identified depression subtypes. In both A and B, correlations are shown in the bottom left, while the corresponding P-values are shown in the top right. These heatmaps represent -log10 transformed P-values for correlations between metabolic indicators.

2.2. Metabolic subtypes of depression

By analyzing 249 metabolic features, we identified three distinct metabolic subtypes (Table C in S2 File). As shown in Table B in S2 File, the three subtypes of depression included 3,086, 3,379, and 1,480 individuals, respectively. Compared to subtype 1, subtypes 2 and 3 had higher proportions of males and of individuals with obesity, as well as more co-morbid chronic diseases and metabolism-related disorders. However, no significant differences were observed among the three subgroups in terms of immune-related diseases.

The metabolomic differences between the three subtypes are depicted in Fig 2C and Table D in S2 File. The differences were particularly marked between subtypes 3 and 1, especially with respect to lipoprotein subclasses and fatty acid metabolism.

Subtype 3 showed upregulation of components associated with low density lipoprotein (LDL), very low density lipoprotein (VLDL), and intermediate density lipoprotein (IDL) subclasses, while high density lipoprotein (HDL) related components (excluding triglycerides) were downregulated. Subtype 1 showed completely opposite trends in these components. Similar trends were observed for fatty acid components: total fatty acids, omega-3, omega-6, polyunsaturated fatty acids (PUFA), monounsaturated fatty acids (MUFA), saturated fatty acids (SFA), and linoleic acid (LA) were significantly downregulated in subtype 1 and significantly upregulated in subtype 3. By contrast, the degree of unsaturated PUFA to MUFA ratio (PUFA by MUFA) and docosahexaenoic acid to total fatty acids percentage (DHA%) were significantly upregulated in subtype 1 and significantly downregulated in subtype 3.

Notably, subtype 3 exhibited characteristics consistent with hyperlipidemia, including elevated total cholesterol, triglycerides, and LDL cholesterol, as well as reduced HDL cholesterol, accompanied by increased apolipoprotein B, decreased apolipoprotein A1, and elevated apolipoprotein B to apolipoprotein A1 ratio.

Based on these results, we categorized the subtypes as follows: depression with fatty acid dysregulation (subtype 1), depression with an intermediate phenotype (subtype 2), and depression with hyperlipidemia (subtype 3).
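The NMF step behind this subtyping can be illustrated with a minimal sketch. It uses the standard multiplicative update rules for the squared-error objective on a small made-up matrix. The toy data, the rank `k = 2`, the iteration count, and the rule of assigning each sample to its largest factor loading are all assumptions for illustration; the study's actual pipeline (which also involved four machine learning algorithms) is not reproduced here.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared reconstruction error ||X - WH||^2."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative X (samples x features) as W (samples x k) times H (k x features)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def assign_clusters(w):
    """Assumed labeling rule: each sample joins the factor with its largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical toy matrix: 6 samples x 4 non-negative "metabolite" features.
toy = [
    [5.0, 4.8, 0.2, 0.1],
    [4.9, 5.1, 0.1, 0.3],
    [5.2, 4.7, 0.3, 0.2],
    [0.2, 0.1, 4.9, 5.0],
    [0.1, 0.3, 5.1, 4.8],
    [0.3, 0.2, 4.7, 5.2],
]
w_toy, h_toy = nmf(toy, k=2)
labels = assign_clusters(w_toy)
```

In practice the input would be the 249 metabolic indicators rescaled to be non-negative, and the number of factors would be chosen by a stability or fit criterion rather than fixed in advance.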

2.3. GWAS results

GWAS data were present for 105,044 samples and 5,410,382 SNPs after quality control. We conducted four GWAS analyses to compare patients with depression (all depression cases and the three metabolic subtypes) with HC to explore the genetic basis of these subtypes. The results are shown in Fig 3A, and QQ-plots are provided in Fig A in S1 File.

Fig 3. Characteristics of identified metabolic subtypes.

(A) GWAS comparing HC with the total depression sample and three subtypes. (B) Comparison of depressive symptoms among subgroups. (C) Comparison of polygenic risk scores among subgroups. (D) Comparison of C-reactive protein (CRP) levels among subgroups. (E) Comparison of GlycA levels among subgroups. * P < 0.05, ** P < 0.01, *** P < 0.001.

No genome-wide significant SNP associations were identified. Further linkage disequilibrium score regression (LDSR) analyses were conducted to assess genetic correlations between depression subtypes and 34 diseases/traits, as summarized in Table E in S2 File. The total depression sample and all three metabolic subtypes showed significant genetic correlations with broad depression, neuroticism, and subjective well-being (PFDR < 0.05). The total depression sample and subtypes 1 and 2 showed significant genetic correlations with anxiety disorder, schizophrenia, bipolar disorder, anorexia nervosa, and insomnia, while subtype 3 did not. Other traits significantly correlated with the total depression sample, such as somatic pain, C-reactive protein (CRP), body mass index (BMI), visceral fat, lifestyle factors, and chronic diseases, showed differential patterns across subtypes. Notably, these genetic correlations were no longer significant in subtype 3.
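The PFDR threshold applied to the 34 genetic correlations implies a multiple-testing adjustment. The text does not name the procedure, so the sketch below assumes the Benjamini-Hochberg method, which is the most common choice; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A trait would then be reported as significantly correlated when its adjusted value falls below 0.05.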

2.4. Subgroup comparisons

We further compared depressive symptoms between the three metabolic subtypes (Fig 3B), observing a stepwise pattern (subtype 1 < subtype 2 < subtype 3) in Patient Health Questionnaire (PHQ)-2 scores, PHQ-4 scores, and four depressive symptoms. The polygenic risk score (PRS) of subtype 2 was consistently intermediate between those of the other two subtypes across the PRS calculation thresholds tested (Fig 3C).
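For readers unfamiliar with PRS thresholds, the following minimal sketch assumes the common p-value thresholding scheme: a score is the sum of risk-allele dosages weighted by GWAS effect sizes, restricted to SNPs whose discovery p-value is below a chosen cutoff. The scheme, the function name, and the example numbers are assumptions; the study's exact PRS method is not described in this section.

```python
def polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold):
    """Weighted sum of allele dosages over SNPs with discovery p-value below the threshold."""
    return sum(
        dosage * beta
        for dosage, beta, p in zip(dosages, effect_sizes, p_values)
        if p < p_threshold
    )


# Hypothetical individual with three SNPs (dosage = copies of the effect allele).
example_dosages = [0, 1, 2]
example_betas = [0.1, -0.2, 0.3]
example_pvals = [1e-8, 0.01, 0.2]
scores_by_threshold = {
    t: polygenic_risk_score(example_dosages, example_betas, example_pvals, t)
    for t in (0.05, 0.5)
}
```

Repeating the calculation over several cutoffs, as in the dictionary above, is what yields one score per threshold for each participant.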

There were also significant associations between depression subtypes and inflammatory markers. As shown in Fig 3D and 3E, individuals with subtypes 2 and 3 exhibited significantly higher circulating CRP and glycoprotein acetyls (GlycA) compared with subtype 1 and HC. For a more comprehensive analysis, see Table F in S2 File.

2.5. Machine learning algorithm for diagnostic prediction

We next constructed a model using 249 metabolomic indicators and evaluated its diagnostic performance in distinguishing HC, the total depression cohort, and subtypes 1, 2, and 3 (Fig 4A). Model performance was validated using 5-fold cross-validation. As shown in Fig 4C, the AUC values of the model for distinguishing depression, subtype 1, subtype 2, and subtype 3 in the test set were 0.644, 0.785, 0.817, and 0.942, respectively. The top 20 important features output by the model for subtypes 1–3 are shown in Fig 4B, with some overlap in important features between the different subtypes. The top 20 features for the three models included 34 metabolic indicators, of which 27 were related to lipoproteins, 5 to fatty acids, and GlycA. Notably, 18 features were related to triglyceride-rich lipoproteins (TRLs).
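As a reminder of what these AUC values measure, the AUC equals the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. A minimal pairwise sketch follows; the function name and the toy scores are illustrative and not taken from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

On this reading, an AUC of 0.644 means a depressed participant outscores a control in roughly 64 of 100 random pairings, while 0.942 for subtype 3 means about 94 of 100. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for cohorts of this scale.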

Fig 4. Predictive performance of machine learning models for depression diagnosis.

(A) The top 20 important features identified by the models for the total depression sample and each subtype. (B) Venn diagram showing the overlap of important features among the three subtypes. (C) Performance of 24 machine learning models in the UK Biobank.

Given the significant differences in lipoprotein-related indicators between the different subtypes, we further constructed 23 additional models, as shown in Fig 4C. The metabolic indicators used in each model are listed in Table G in S2 File.

The diagnostic performance based on subtype classification demonstrated superior efficacy compared to the traditional binary classification approach, with an average increase in AUC ranging from 12.8% to 39.6% in the test set (Fig 4C). Specifically, the AUC for subtype 1 showed an improvement of 13.2% to 22.6%, for subtype 2 by 12.8% to 24.6%, and for subtype 3 by 20.6% to 39.6%. To further evaluate the robustness of the machine learning-based diagnostic performance, gender-stratified analyses were conducted. The results revealed that the AUC improved by 8.9% to 39.0% in female participants (Fig 5A) and by 1.9% to 37.3% in male participants (Fig 5B). Additionally, considering the potential influence of chronic diseases and BMI on metabolomics, sensitivity analyses were performed. After excluding participants with chronic diseases, the AUC increased by 11.9% to 40.5% (Fig 5C). Similarly, after excluding participants with abnormal BMI values, the AUC improved by 11.6% to 40.9% (Fig 5D).
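The text does not state whether these percentages are absolute differences in AUC (percentage points) or relative gains over the binary model. The short calculation below, applied to the AUCs reported for the full 249-indicator model, computes both. The absolute differences (14.1, 17.3, and 29.8 points) all fall inside the reported per-subtype ranges, whereas two of the relative gains do not, which suggests, but does not confirm, that the figures are percentage-point differences. This reading is an inference, not a statement from the source.

```python
BINARY_AUC = 0.644  # total depression sample vs HC, 249-indicator model
SUBTYPE_AUC = {"subtype 1": 0.785, "subtype 2": 0.817, "subtype 3": 0.942}


def absolute_gain_points(auc_value, baseline):
    """Difference in AUC expressed in percentage points."""
    return round((auc_value - baseline) * 100, 1)


def relative_gain_percent(auc_value, baseline):
    """Gain relative to the baseline AUC, in percent."""
    return round((auc_value - baseline) / baseline * 100, 1)


absolute = {k: absolute_gain_points(v, BINARY_AUC) for k, v in SUBTYPE_AUC.items()}
relative = {k: relative_gain_percent(v, BINARY_AUC) for k, v in SUBTYPE_AUC.items()}
```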

Fig 5. Sensitivity analysis of the machine learning model’s diagnostic performance in the UK Biobank dataset.

Stratified by: (A) Female participants. (B) Male participants. (C) Participants without chronic diseases. (D) Participants with normal BMI (18.5–30).

As shown in Fig 6, to ensure the robustness of the results, four additional diagnostic models were constructed in the UK Biobank dataset. After sequentially excluding individuals with immune-related diseases (Fig 6A), metabolism-related diseases (Fig 6B), those taking chronic medications (Fig 6C), and those with all of the above conditions (Fig 6D), the AUC increased by 10.1–39.6%, 11.5–40.4%, 12.6–40.3%, and 13.6–41.9%, respectively.

Fig 6. Sensitivity analysis of the machine learning model’s diagnostic performance in the UK Biobank dataset.

Stratified by: (A) Participants without immune-related diseases. (B) Participants without metabolism-related diseases. (C) Participants without taking chronic medications. (D) Participants without all of the above conditions.

In these analyses, models based on residual cholesterol, IDLs, and size performed slightly worse, whereas models including TRL and different lipoprotein components performed better.

The robustness of the identified metabolic subtypes was validated in the independent Whitehall II cohort. As depicted in Fig 7, the classification model based on subtypes significantly outperformed the traditional diagnostic model in the Whitehall II cohort, with AUC improvements ranging from 10.4% to 40.0%. Subgroup analyses further corroborated these results, showing AUC improvements of 8.2% to 42.7% in females (Fig 8A) and 9.4% to 40.5% in males (Fig 8B). In additional sensitivity analyses, after excluding participants with chronic diseases, the AUC improved by 11.9% to 39.1% (Fig 8C), and after excluding those with abnormal BMI, the AUC improved by 8.6% to 38.6% (Fig 8D).

Fig 7. Performance of machine learning models in predicting depression in the Whitehall II cohort.

Fig 8. Sensitivity analysis of the machine learning model’s diagnostic performance in the Whitehall II cohort.

Stratified by: (A) Female participants. (B) Male participants. (C) Participants without chronic diseases. (D) Participants with normal BMI (18.5–30).

2.6. Comparison of metabolic networks between different subtypes

Given the complex interdependencies between metabolites, we applied network analysis to 34 machine learning-selected metabolic indicators in HC, total depression samples, and subtypes 1–3 to identify central metabolic indicators in different subgroups and to compare metabolic patterns across these subgroups. The resulting metabolic networks for different subgroups are visualized in Fig B1 in S1 File. Centrality indices within these networks are depicted in Fig B2 in S1 File. To further explore the influence of potential confounders, covariates such as sex, age, BMI, and the number of chronic diseases were included in each model (Fig 9A). Centrality indices within these networks are depicted in Fig 9B. Based on the expected influence metric, we screened the top three metabolic indicators for each subgroup (Table H in S2 File). A total of eight metabolites were identified: GlycA, triglycerides to total lipids in large HDL percentage [L_HDL_TG (%)], triglycerides in small HDL (S_HDL_TG), concentration of very large VLDL particles (XL_VLDL_P), cholesteryl esters in chylomicrons and extremely large VLDL (XXL_VLDL_CE), MUFA, MUFA (%), and PUFA (%). Except for PUFA (%), the remaining seven metabolites showed consistent trends across the three depression subtypes, being lower in subtype 1 and higher in subtype 3. PUFA (%) exhibited an opposite trend.
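The expected influence metric used for this screening can be sketched as follows, assuming the usual one-step definition: the sum of signed edge weights attached to a node, in contrast to strength, which sums absolute weights. The edge weights below are made up for illustration, and `PUFA_pct` is a hypothetical stand-in label for PUFA (%).

```python
def centrality(edges, nodes):
    """Return (strength, expected_influence) dictionaries for an undirected weighted network."""
    strength = {node: 0.0 for node in nodes}
    expected_influence = {node: 0.0 for node in nodes}
    for (a, b), weight in edges.items():
        for node in (a, b):
            strength[node] += abs(weight)       # ignores the sign of the edge
            expected_influence[node] += weight  # keeps the sign of the edge
    return strength, expected_influence


def top_nodes(scores, k=3):
    """Names of the k nodes with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical partial-correlation edges between four indicators.
toy_nodes = ["GlycA", "MUFA", "PUFA_pct", "S_HDL_TG"]
toy_edges = {
    ("GlycA", "MUFA"): 0.4,
    ("GlycA", "PUFA_pct"): -0.3,
    ("MUFA", "S_HDL_TG"): 0.5,
}
toy_strength, toy_ei = centrality(toy_edges, toy_nodes)
```

Because negative edges reduce expected influence but increase strength, the two metrics can rank nodes differently; in the toy network GlycA has a strength of 0.7 but an expected influence of only 0.1.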

Fig 9. Comparison of metabolic networks across depression subgroups.

(A) Metabolic networks for each subgroup. (B) Comparison of centrality indices, global connectivity, and network structure among subgroups.

Network stability and accuracy were deemed satisfactory. We examined differences in global connectivity and network structure between different depression subtypes and HC. As anticipated, there were significant network connectivity and structure differences between different subtypes and HC but not between the overall depression group and the HC group (P > 0.05). Detailed metrics are presented in Fig 9B.

3. Discussion

Here we applied cluster analysis to 249 metabolomic indicators in a large cohort of individuals with current depressive episodes. In doing so, we identify three metabolic subtypes of depression. Subtype 1 is characterized primarily by fatty acid dysregulation and subtype 3 by hyperlipidemia. Subtype-based diagnostic models performed significantly better than a binary classification model, with an average increase in AUC ranging from 12.8% to 39.6% in the test set. The results of gender-stratified analyses, sensitivity analyses, and validation in an independent cohort collectively underscore the robustness of the identified metabolic subtypes.

Subtype-based analysis suggests the presence of abnormal metabolic networks in depression, which may be masked by its heterogeneity, especially when using a binary classification approach.

Over the past decade, many studies have investigated the relationship between depression and metabolic dysregulation [12,13]. Metabolic syndrome is present in up to one-third of all individuals with depression, although the reported prevalence varies across studies [14–17]. Moreover, 20–30% of individuals with depression exhibit immunometabolic dysregulation [18]. This immunometabolic depression is characterized by atypical, energy-related depressive symptoms, systemic low-grade inflammation, and metabolic abnormalities [19]. In our study, subtype 3 demonstrated a phenotype indicative of immunometabolic dysregulation, characterized by hyperlipidemia and elevated peripheral inflammatory markers. However, genetic correlation analysis did not reveal significant associations with CRP or chronic diseases. This finding supports a polygenic architecture model, suggesting that this dysregulation may be associated with genes of small effect size rather than any dominant genetic variant [19], and that it is more likely attributable to external environmental changes, including inflammation and chronic diseases. Targeted interventions for inflammation, metabolism, or lifestyle in this homogeneous group of individuals with depression may be effective treatment options [18,20,21].

The metabolic profile of subtype 3 is consistent with clinical hyperlipidemia. Although dyslipidemia is associated with various factors, including chronic brain injury, aging, and mental health, the precise pathophysiological mechanisms remain incompletely understood [22–24]. One plausible mechanism involves cholesterol’s modulation of neurotransmitter signaling, specifically affecting serotonergic, GABAergic, and glutamatergic neurotransmission, coupled with potential disruptions in synaptic plasticity and myelination, which may contribute to alterations in mental health [25]. As a key precursor for steroid hormone biosynthesis, cholesterol also plays a critical role in the hypothalamic-pituitary-adrenal axis, a crucial regulator of circadian rhythms, stress responses, and neuropsychiatric disorders [25]. Furthermore, elevated circulating lipids may compromise blood-brain barrier integrity, potentially promoting neuroinflammatory processes that could impact mental health. Critically, however, the blood-brain barrier effectively restricts the transport of cholesterol-rich lipoproteins into the brain parenchyma, thereby limiting the direct influence of circulating cholesterol on neuronal function in the context of an intact blood-brain barrier [24]. This observation is a key consideration when postulating a causal link between systemic hypercholesterolemia and brain pathology. Therefore, further investigation of the precise causal relationship between hyperlipidemia and depression and the specific pathophysiological mechanisms involved is warranted. While there is increasing evidence of a potential role for cholesterol in the pathophysiology of depression, its feasibility as a therapeutic target remains an open question. Exploiting this approach is clearly not a one-size-fits-all solution for depression, as patients in this study with subtype 1 did not exhibit any abnormalities in cholesterol or other lipoproteins. A lipoprotein-targeted intervention appears more appropriate for subtype 3, highlighting the need for personalized treatment approaches. Clinical research incorporating metabolomics is necessary to clarify this point.

Previous research has established a link between dysregulation of fatty acid metabolism and depression, with PUFA receiving the most attention. These essential fatty acids, crucial for normal brain development, are found in dietary sources such as fish (omega-3 PUFAs) and vegetable oils (omega-6 PUFAs) [26]. Depleted levels of these lipids have been strongly implicated in the pathophysiology of depression, potentially through mechanisms involving dysregulated inflammatory responses, reduced antioxidant capacity, and disruptions in neurotransmission [27]. It has been shown that omega-6 PUFAs are pro-inflammatory, whereas omega-3 PUFAs exert anti-inflammatory effects [28]. This balance of inflammation, mediated by eicosanoid derivatives, serves as a signaling mechanism in both the central and peripheral nervous systems, regulating inflammatory processes. Some studies have reported decreased levels of omega-3 PUFAs in patients diagnosed with depression, prompting numerous clinical trials to examine the effects of various omega-3 PUFA supplementation strategies on depression [29–31]. However, the findings from these investigations have been inconsistent, potentially due to relatively small effects being masked by heterogeneity. A more promising avenue lies in personalized medicine approaches targeting specific subgroups. In this study, subtype 1 exhibits a metabolic profile primarily characterized by fatty acid dysregulation, suggesting that omega-3 PUFA intervention may be particularly efficacious in this subgroup. Our findings provide valuable insights for future clinical trial design, advocating pre-trial participant stratification based on metabolomic profiling to enable personalized therapeutic strategies tailored to distinct metabolic profiles.

Although the robustness of our model was maintained after adjusting for multiple potential confounders—including sex, BMI, chronic diseases, and medication use—it remains unclear whether the biological subtypes identified in this study, particularly the hyperlipidemia subtype, primarily reflect metabolic consequences of behavioral factors (such as sedentary behavior or poor dietary patterns) rather than depression-specific biological mechanisms. This important distinction warrants further investigation. Notably, the observed significant elevation of inflammatory markers within this subtype suggests that both biological and behavioral factors may interact in a bidirectional manner. Specifically, depressive symptoms may promote behavioral changes that exacerbate metabolic dysfunction, while metabolic alterations may in turn aggravate depressive symptoms through disruptions in energy homeostasis and neuroendocrine regulation. Given these complex interactions, future studies should explore whether tailored interventions—such as physical activity and dietary modifications—could serve as effective adjunctive therapies for specific depression subtypes, potentially mitigating both metabolic and mood-related symptoms.

Network analysis is a useful means to understand and visualize the heterogeneous nature of depression [32–34]. To further explore our identified biological subtypes, we employed network analysis to construct metabolic networks for each subgroup. Interestingly, the metabolic network of the overall depression sample was not significantly different from that of HC. However, the networks of our identified subtypes differed significantly from HC and from each other, further demonstrating the existence of distinct metabolic network patterns within depression. A recent analysis of data from the Netherlands Study of Depression and Anxiety (N = 2498) investigated associations between 30 depressive symptoms and 46 metabolites, finding that the somatic symptoms of fatigue and hypersomnia, along with cholesterol and fatty acids, were central nodes in the network [35]. Drawing on this research, establishing a comprehensive network of individual depressive symptoms and metabolomics could facilitate the discovery of metabolic networks strongly associated with specific symptoms, thereby enabling targeted interventions [32]. However, given the limitations of the current data, which included only four depressive symptoms, we did not incorporate symptoms in our analysis.

The heterogeneity of depression has long impeded the discovery of biomarkers and the development of precise therapeutic strategies. This study demonstrates that stratifying patients into biologically homogeneous subtypes based on metabolomic profiles significantly enhances diagnostic precision. Unlike traditional binary classifications, which have shown limited utility in biomarker discovery, our approach identifies metabolically distinct subgroups, allowing for more accurate phenotypic differentiation. TRL and related lipoprotein components may help characterize specific metabolic subtypes of depression. As the terminal phase of systems biology research, metabolomics comprehensively reflects the dynamic changes of metabolites under pathophysiological conditions, thereby providing a critical biological basis for disease subtyping. Based on biologically defined subtypes derived from metabolomics, researchers can achieve finer classification of mental disorders and further integrate high-throughput multi-omics technologies (such as genomics, proteomics, and transcriptomics) to systematically elucidate underlying pathogenic mechanisms. This strategy is analogous to the clinical subcategorization of depression into subtypes such as “with anhedonia” and “without anhedonia,” aiming to uncover the molecular foundations of various subtypes and advance the implementation of individualized treatment and precision psychiatry. Future studies could explore whether these metabolic features contribute to subtype-specific diagnostic or intervention strategies, particularly when combined with other clinical and omics biomarkers. Indeed, metabolic disturbances are also commonly observed in other psychiatric disorders, such as schizophrenia, anxiety disorders, and bipolar disorder [36–38]. Data-driven approaches could similarly be applied to identify biologically distinct subtypes within these conditions and promote transdiagnostic research in psychiatry.

Most importantly, these metabolically defined subgroups open new avenues for targeted and subtype-specific interventions. By aligning treatment strategies with specific dysregulated metabolic pathways, we move closer to the vision of precision psychiatry, where therapy is tailored to an individual’s biological subtype. This represents a critical step forward from traditional one-size-fits-all diagnostic and therapeutic models.

This study has several limitations that should be acknowledged. First, the cross-sectional design prevents us from establishing causal relationships between the identified metabolic subtypes and clinical outcomes. Although multiple sensitivity analyses were conducted, residual confounding from lifestyle factors—such as dietary habits and physical activity—cannot be fully ruled out. The observational nature of the data also limits our capacity to infer the direction of causality between metabolic dysregulation and depression. Future studies should adopt longitudinal designs and incorporate causal inference methods to verify the stability and biological foundations of these subtypes. Second, the scope of metabolomic coverage was limited by the number of metabolites available, which may affect the biological interpretability of the identified subtypes. We are currently addressing this issue in ongoing research through the use of high-throughput metabolomic platforms to uncover more precise and subtype-specific biomarkers. Third, although the UK Biobank currently offers the largest metabolomic dataset available, the dynamic characteristics of metabolic profiles necessitate careful consideration of disease state and sample collection timing. While the PHQ-2 was used for case identification, this instrument does not provide a comprehensive clinical assessment. Future work should employ more detailed clinical evaluations to better characterize subtype-specific symptomatology and its associations with metabolite profiles.

4. Methods

4.1. Data source and research design

The UK Biobank, a prospective cohort study of ~500,000 UK adults aged 40–69, provides a rich resource of genetic and phenotypic data [39]. The UK Biobank received ethical approval from the North West Multicenter Research Ethics Committee (reference number 11/NW/0382). To explore the relationship between metabolites and depression, we selected individuals experiencing a current depressive episode from this cohort. Participants with depression were defined based on fields 130895 and 130896, which indicated the source and initial diagnosis date of depression, respectively. The exclusion criteria for the depression cohort were: absence of metabolomics data; a history of schizophrenia, bipolar disorder, or cancer; a missing date of depression diagnosis; a diagnosis of depression after the baseline assessment; and a PHQ-2 score < 2 (indicating no current depressive episode) [40]. HC were included if they had metabolomics data, no psychiatric diagnosis or cancer, and a PHQ-2 score < 2.
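The inclusion and exclusion rules above can be read as a simple per-participant filter. The following Python sketch is illustrative only: every attribute name (`has_metabolomics`, `depression_dx_date`, `phq2`, and so on) is a hypothetical placeholder rather than a UK Biobank field identifier, and treating a diagnosis dated exactly at baseline as eligible is an assumption not stated in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Participant:
    # All attribute names are hypothetical placeholders for illustration.
    has_metabolomics: bool
    has_depression_record: bool
    depression_dx_date: Optional[date]  # None when the diagnosis date is missing
    baseline_date: date
    phq2: int
    schizophrenia: bool = False
    bipolar: bool = False
    cancer: bool = False
    other_psychiatric_dx: bool = False


def is_current_depression_case(p: Participant) -> bool:
    """Apply the exclusion criteria listed for the depression cohort."""
    if not p.has_depression_record or not p.has_metabolomics:
        return False
    if p.schizophrenia or p.bipolar or p.cancer:
        return False
    if p.depression_dx_date is None:
        return False
    # Assumption: a diagnosis dated on the baseline day itself is kept.
    if p.depression_dx_date > p.baseline_date:
        return False
    # A PHQ-2 score below 2 is taken to mean no current depressive episode.
    return p.phq2 >= 2


def is_healthy_control(p: Participant) -> bool:
    """Controls: metabolomics data, no psychiatric diagnosis or cancer, PHQ-2 < 2."""
    has_psychiatric_dx = (p.has_depression_record or p.schizophrenia
                          or p.bipolar or p.other_psychiatric_dx)
    return (p.has_metabolomics and not has_psychiatric_dx
            and not p.cancer and p.phq2 < 2)
```

Writing the two predicates separately makes it explicit that some participants (for example, those with a depression record but a PHQ-2 score below 2) fall into neither group.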

4.2. Plasma biomarker profiling by NMR

Metabolomic profiling was conducted at Nightingale Health (Finland) on EDTA plasma samples from approximately 280,000 UK Biobank participants. A total of 251 metabolic markers were quantified using eight high-throughput spectrometers. To align with validation cohorts and previous studies, glucose-lactate and spectrometer-corrected alanine were excluded from the analysis. Details of the metabolomic indicators are shown in Table A in S2 File. The indicators cover a wide range of metabolic pathways, including 14 subclasses of lipoprotein lipids (210), fatty acids and fatty acid constituents (18), as well as various small molecule metabolites such as amino acids (10), ketone bodies (4), glycolytic metabolites (4), and those involved in fluid balance (2) and inflammation (1). Among these, 92 indicators are associated with TRLs, macromolecules composed of a large neutral lipid core (TG) and polar components including phospholipids, free cholesterol, and apolipoproteins. TRLs originate from the intestine and the liver and include chylomicrons, VLDLs and IDLs, and they serve as the primary source of fatty acids for energy production in peripheral tissues or storage in adipose tissue [41–43].

4.3. Determining the optimal number of subtypes

Following dimensionality reduction via NMF [44,45], the resulting features were visualized using both Uniform Manifold Approximation and Projection (UMAP) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to illustrate underlying cluster structures in a low-dimensional space. We performed clustering analysis on the metabolomics data of participants with current depressive episodes using four machine learning algorithms: k-means, bisecting k-means, spectral, and agglomerative clustering. The optimal number of subtypes was determined by systematically varying the number of dimensions and subtypes from 2 to 8 using three clustering evaluation metrics: silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index.
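The grid search over dimensions and subtype counts described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: it covers only k-means (of the four algorithms named), and the min-max scaling step, the NMF initialisation, and the function name `scan_cluster_solutions` are assumptions introduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import MinMaxScaler


def scan_cluster_solutions(X, dims=range(2, 9), ks=range(2, 9), seed=0):
    """Score k-means solutions on NMF embeddings for each (dimension, k) pair."""
    # NMF needs non-negative input; min-max scaling is an assumed preprocessing step.
    X_nonneg = MinMaxScaler().fit_transform(np.asarray(X, dtype=float))
    results = []
    for d in dims:
        embedding = NMF(n_components=d, init="nndsvda", max_iter=500,
                        random_state=seed).fit_transform(X_nonneg)
        for k in ks:
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=seed).fit_predict(embedding)
            results.append({
                "n_dims": d,
                "n_clusters": k,
                # Higher is better for silhouette and Calinski-Harabasz,
                # lower is better for Davies-Bouldin.
                "silhouette": silhouette_score(embedding, labels),
                "calinski_harabasz": calinski_harabasz_score(embedding, labels),
                "davies_bouldin": davies_bouldin_score(embedding, labels),
            })
    return results
```

Because the three metrics can disagree, a reasonable way to use the output is to tabulate all three per (dimension, k) pair and look for a solution that is favoured consistently rather than ranking on a single index.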

4.4. Genome-wide association study analysis

Analysis in this study was conducted with version 3 of the UKBB imputed data, with 487,409 samples imputed and available for analysis following UKBB centrally performed QC filtering.

After excluding participants of non-White ancestry and those with high missingness, relatedness, or discordant genetic and self-reported sex, and after restricting to SNPs with minor allele frequency (MAF) ≥ 0.05 and imputation INFO score > 0.9, 105,044 samples were retained for subsequent analyses. All association tests were performed in PLINK v1.90b6.21 using logistic regression, adjusted for age, sex, genotyping batch, genotyping array, and 5 population principal components. Genetic correlations between depression subgroups and 34 diseases/traits were computed using LD score regression (LDSR), implemented in the ‘ldsc’ package [46]. GWAS summary statistics used in this study are available at https://zenodo.org/records/10515792 and in the GWAS Catalog. P-values were adjusted for multiple testing using Benjamini-Hochberg false discovery rate (FDR) correction. This analysis is described in further detail in Supplemental Method A in S1 File.
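For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following standard-library Python sketch shows the step-up procedure on a list of p-values. It is a generic illustration of the method, not the authors' code; in practice a library routine such as `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` performs the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down: each adjusted value is the minimum
    # of p * m / rank over the current rank and all larger ranks, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant at FDR level q when its adjusted p-value is at most q.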

4.5. Polygenic risk scores

To evaluate the cumulative genetic risk for depression, we calculated a PRS based on summary statistics derived from the iPSYCH depression GWAS, excluding data from the UK Biobank and 23andMe datasets [47]. As these summary statistics are based on people of European descent, we restricted PRS calculations to the European samples in the UK Biobank. Following Collister et al. [48], we calculated the PRS for depression across a range of SNP inclusion thresholds.
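The idea of scoring across several SNP inclusion thresholds can be sketched as a weighted sum of effect-allele dosages. This is a simplified illustration under stated assumptions: it omits LD clumping and other steps of the published procedure, and the function names and thresholds are hypothetical.

```python
def polygenic_risk_score(dosages, betas, pvalues, p_threshold):
    """Weighted dosage sum over SNPs whose GWAS p-value is <= p_threshold.

    dosages: one row per person, one effect-allele dosage (0 to 2) per SNP.
    betas, pvalues: per-SNP effect sizes and p-values from the discovery GWAS.
    """
    keep = [j for j, p in enumerate(pvalues) if p <= p_threshold]
    return [sum(row[j] * betas[j] for j in keep) for row in dosages]


def prs_across_thresholds(dosages, betas, pvalues, thresholds):
    """Compute one score vector per inclusion threshold (thresholds are illustrative)."""
    return {t: polygenic_risk_score(dosages, betas, pvalues, t) for t in thresholds}
```

Stricter thresholds keep fewer, more strongly associated SNPs, while looser thresholds add many weakly associated ones, which is why scores are usually compared across the whole range.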

4.6. Network analyses

To explore the complex relationships between metabolites across different subgroups, we constructed metabolic networks using network analysis. This approach offers the advantage of identifying associations within the system while simultaneously considering the influence of all other variables [35,49].

In network analysis, nodes represent variables, and edges represent the partial correlation coefficient between two nodes. Network analysis provides quantitative centrality indices for each node. Three common methods for measuring centrality indices are betweenness, closeness, and strength [50]: betweenness centrality measures the number of times a node lies on the shortest path between two other nodes; closeness centrality measures the average distance of a node to all other nodes in the network; and strength centrality represents the sum of the weights (e.g., correlation coefficients) of the edges connected to a node. We introduced expected influence (EI) to assess the nature and strength of the cumulative influence of nodes in the network. Centrality stability refers to the consistency of the ranking of centrality indices (e.g., betweenness, closeness, and strength) after re-estimating the network with fewer cases or nodes. Centrality stability was calculated using the correlation stability coefficient (CS-coefficient), which should ideally be > 0.5 to interpret centrality differences. 95% bootstrap confidence intervals (CIs) were used to estimate the accuracy of edge weights, where larger CIs indicate a lower accuracy of edge estimates and narrower CIs indicate a more reliable network. Finally, we examined differences in global connectivity and network structure between the metabolic networks of different subgroups.
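As a small worked illustration of two of the indices described above, the sketch below computes strength and one-step expected influence from a symmetric edge-weight matrix. It uses the conventional definitions (strength sums absolute edge weights, expected influence sums signed edge weights); that the study's R packages follow exactly these conventions is assumed here, and the function name is hypothetical.

```python
def strength_and_expected_influence(weights):
    """Return (strength, expected_influence) lists for a square weight matrix.

    weights[i][j] is the edge weight (e.g., a partial correlation) between
    nodes i and j; diagonal entries are ignored.
    """
    n = len(weights)
    strength, expected_influence = [], []
    for i in range(n):
        edges = [weights[i][j] for j in range(n) if j != i]
        strength.append(sum(abs(w) for w in edges))  # magnitude only
        expected_influence.append(sum(edges))        # sign is retained
    return strength, expected_influence
```

The two indices coincide when all edges are positive; they diverge for nodes with negative edges, which is the situation expected influence is meant to capture.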

4.7. Validation in an independent cohort

The Whitehall II cohort (WHII), a prospective study of 10,308 UK civil servants aged 35–55, has been followed up every 2–5 years since 1985 [51]. Given the availability of metabolomics data from phase 5, our analysis was restricted to participants enrolled in phase 5 or earlier. Participants with current depression were selected using the same criteria as in our previous study: history of depression in phase 4 and a General Health Questionnaire depression subscale score ≥4 [12].

4.8. Statistical analysis

Common demographic characteristics and clinical variables were compared between different subtypes of depression and HC: sex, age, education level, BMI, economic status, physical activity level, smoking status, alcohol consumption, family history of depression, depressive symptoms, neuroticism, and number of chronic diseases (diabetes mellitus, heart diseases, and hypertensive diseases). Additionally, comparisons were made regarding the use of antidepressant medications, antipsychotic agents, and medications for common chronic conditions, as well as the prevalence of comorbid immune- and metabolism-related diseases across subgroups. A detailed description can be found in Supplemental Method B in S1 File.

Depressive symptoms were assessed using the first four items of the PHQ-9, focusing on frequency of depressed mood, unenthusiasm/disinterest, tenseness/restlessness, and tiredness/lethargy in the past two weeks. Neuroticism was assessed using an external 12-item scale derived from the Eysenck Personality Inventory neuroticism scale by Smith et al. [52], where higher scores (0–12) indicate greater neuroticism.

CatBoost was used for diagnostic prediction, as in our previous work [12]. We used 5-fold cross-validation and evaluated model performance with receiver operating characteristic (ROC) curves, sensitivity, and specificity. Shapley additive explanations (SHAP) were employed to assess the contribution of each feature to the model’s predictive performance. To mitigate class imbalance, we incorporated class weighting within CatBoost by setting the class_weights parameter to values inversely proportional to the frequency of each class. For each subtype-specific model, a distinct control group was constructed using a one-vs-rest partitioning strategy, wherein cases of the target subtype were compared against all other participants (including other subtypes and non-depressed controls).
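The one-vs-rest labelling and inverse-frequency class weighting described above can be sketched in a few lines of standard-library Python. The group labels and function names are hypothetical, and the particular normalisation (total count divided by the number of classes times the class count) is one common choice, not necessarily the one the authors used.

```python
from collections import Counter


def one_vs_rest_labels(groups, target):
    """Label the target subtype 1 and everyone else (other subtypes, controls) 0."""
    return [1 if g == target else 0 for g in groups]


def inverse_frequency_weights(labels):
    """Class weights inversely proportional to class frequency.

    Uses n / (n_classes * count), so a perfectly balanced sample gets weight 1
    for every class. This normalisation is an assumption for illustration.
    """
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * c) for cls, c in counts.items()}
```

The resulting mapping is the kind of value that would be supplied to the class_weights parameter mentioned above, so that errors on the rarer target subtype count for more during training.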

Given the significant differences in the prevalence of depression between genders, subgroup analyses were conducted separately for males and females. Additionally, considering the potential influence of chronic diseases and BMI on metabolomic profiles, sensitivity analyses were performed by excluding individuals with chronic diseases or abnormal BMI (<18.5 or ≥30). The same analytical procedures were subsequently replicated in the Whitehall II cohort.

Furthermore, to account for potential confounding effects of comorbid immune- and metabolism-related diseases, as well as the use of chronic medications (such as cholesterol-lowering medication, blood pressure medication, and insulin), we constructed four additional models to ensure the robustness of the findings. These models sequentially excluded participants with immune-related diseases, metabolism-related diseases, those taking chronic medications, and those with all of the above conditions.

Spearman correlation analysis was performed separately in the depression and HC groups. To compare variables between HC and different depression subtypes, we employed the Kruskal-Wallis test and chi-squared test for post hoc comparisons. This analysis investigated potential group differences across various domains, including demographic characteristics, psychosocial measures, PRS, and metabolite levels. In addition, we compared CRP and GlycA levels across subgroups to assess systemic inflammation. CRP is one of 30 blood biomarkers measured in the UK Biobank [53]. Statistical significance was set at a two-sided P-value threshold of 0.05.

All data were analyzed using Python v3.11. We utilized pandas v1.4 and numpy v1.26 for data preprocessing, while statistical analysis was conducted using scipy v1.12 and statsmodels v0.14. For machine learning, we employed scikit-learn v1.5, with experimental results depicted using matplotlib v3.8 and seaborn v0.13. For network analyses, we used R v3.6.3 with the package mgm v1.2.12 to estimate the network, qgraph v1.6.9 to visualize the network, bootnet v1.4.3 for stability and accuracy analysis, and the Network Comparison Test package v2.2.1 for network comparison.

Supporting information

S1 File

Supplemental Method A. Genome-wide association study analysis. Supplemental Method B. Supplementary methods for sensitivity analysis. Fig A. QQ plots of GWAS analysis for the total depression sample and three depression subtypes compared to healthy controls. Fig B. Comparison of metabolic networks across depression subgroups.

(DOCX)

pdig.0001125.s001.docx (1.2MB, docx)
S2 File

Table A. 249 metabolomic indicators. Table B. Demographic Characteristics. Table C. Cluster evaluation index. Table D. The metabolomic differences between the three subtypes. Table E. Genetic correlations between depression subgroups and 34 diseases/traits. Table F. Comparison of the differences between different subgroups. Table G. The metabolic indicators used in each model. Table H. Centrality indices of different subgroups.

(XLSX)

pdig.0001125.s002.xlsx (107.2KB, xlsx)

Data Availability

This research has been conducted using the UK Biobank Resource under Application Number 87530. Researchers can apply for data access via the UK Biobank Access Management System: https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=220. The Whitehall II Data was provided by Dementia Platforms UK (DPUK). All analyses were performed using the DPUK analysis portal. Researchers can explore and apply for access to this dataset through the DPUK Data Catalogue: https://data.dpuk.ukserp.ac.uk/CohortDirectory/Item?fingerPrintID=Whitehall%20II. The analytic code used in this study is fully available at: https://github.com/xiaogahuo/NMR_MDD_Subtype_Clustering.git.

Funding Statement

This work was supported by grants from the National Natural Science Foundation of China (Grant No. 823B2031 to SM) and the National Key Research and Development Project of China (Grant No. 2024YFC3308400 to ZL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Marx W, Penninx BWJH, Solmi M, Furukawa TA, Firth J, Carvalho AF, et al. Major depressive disorder. Nat Rev Dis Primers. 2023;9(1):44. doi: 10.1038/s41572-023-00454-1
2. World Health Organization. International Statistical Classification of Diseases and Related Health Problems. 2022. [cited 23 Aug 2024]. Available from: https://www.who.int/standards/classifications/classification-of-diseases
3. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th Edition: DSM-5. 5th edition. American Psychiatric Association; 2022.
4. Beger RD, Dunn W, Schmidt MA, Gross SS, Kirwan JA, Cascante M, et al. Metabolomics enables precision medicine: “A White Paper, Community Perspective”. Metabolomics. 2016;12(10):149. doi: 10.1007/s11306-016-1094-6
5. Johansson Å, Andreassen OA, Brunak S, Franks PW, Hedman H, Loos RJF, et al. Precision medicine in complex diseases-Molecular subgrouping for improved prediction and treatment stratification. J Intern Med. 2023;294(4):378–96. doi: 10.1111/joim.13640
6. Wishart DS. Metabolomics for Investigating Physiological and Pathophysiological Processes. Physiol Rev. 2019;99(4):1819–75. doi: 10.1152/physrev.00035.2018
7. Singh P, Vasundhara B, Das N, Sharma R, Kumar A, Datusalia AK. Metabolomics in Depression: What We Learn from Preclinical and Clinical Evidences. Mol Neurobiol. 2024;62(1):718–41. doi: 10.1007/s12035-024-04302-5
8. Amin N, Liu J, Bonnechere B, MahmoudianDehkordi S, Arnold M, Batra R, et al. Interplay of Metabolome and Gut Microbiome in Individuals With Major Depressive Disorder vs Control Individuals. JAMA Psychiatry. 2023;80(6):597–609. doi: 10.1001/jamapsychiatry.2023.0685
9. Milaneschi Y, Allers KA, Beekman ATF, Giltay EJ, Keller S, Schoevers RA, et al. The association between plasma tryptophan catabolites and depression: The role of symptom profiles and inflammation. Brain Behav Immun. 2021;97:167–75. doi: 10.1016/j.bbi.2021.07.007
10. Pu J, Liu Y, Zhang H, Tian L, Gui S, Yu Y, et al. An integrated meta-analysis of peripheral blood metabolites and biological functions in major depressive disorder. Mol Psychiatry. 2021;26(8):4265–76. doi: 10.1038/s41380-020-0645-4
11. Brydges CR, Bhattacharyya S, Dehkordi SM, Milaneschi Y, Penninx B, Jansen R, et al. Metabolomic and inflammatory signatures of symptom dimensions in major depression. Brain Behav Immun. 2022;102:42–52. doi: 10.1016/j.bbi.2022.02.003
12. Ma S, Xie X, Deng Z, Wang W, Xiang D, Yao L, et al. A Machine Learning Analysis of Big Metabolomics Data for Classifying Depression: Model Development and Validation. Biol Psychiatry. 2024;96(1):44–56. doi: 10.1016/j.biopsych.2023.12.015
13. Bot M, Milaneschi Y, Al-Shehri T, Amin N, Garmaeva S, Onderwater GLJ, et al. Metabolomics Profile in Depression: A Pooled Analysis of 230 Metabolic Markers in 5283 Cases With Depression and 10,145 Controls. Biol Psychiatry. 2020;87(5):409–18. doi: 10.1016/j.biopsych.2019.08.016
14. Moradi Y, Albatineh AN, Mahmoodi H, Gheshlagh RG. The relationship between depression and risk of metabolic syndrome: a meta-analysis of observational studies. Clin Diabetes Endocrinol. 2021;7(1):4. doi: 10.1186/s40842-021-00117-8
15. Vancampfort D, Correll CU, Wampers M, Sienaert P, Mitchell AJ, De Herdt A, et al. Metabolic syndrome and metabolic abnormalities in patients with major depressive disorder: a meta-analysis of prevalences and moderating variables. Psychol Med. 2014;44(10):2017–28. doi: 10.1017/S0033291713002778
16. Pan A, Keum N, Okereke OI, Sun Q, Kivimaki M, Rubin RR, et al. Bidirectional association between depression and metabolic syndrome: a systematic review and meta-analysis of epidemiological studies. Diabetes Care. 2012;35(5):1171–80. doi: 10.2337/dc11-2055
17. Vancampfort D, Stubbs B, Mitchell AJ, De Hert M, Wampers M, Ward PB, et al. Risk of metabolic syndrome and its components in people with schizophrenia and related psychotic disorders, bipolar disorder and major depressive disorder: a systematic review and meta-analysis. World Psychiatry. 2015;14(3):339–47. doi: 10.1002/wps.20252
18. Penninx BWJH, Lamers F, Jansen R, Berk M, Khandaker GM, De Picker L, et al. Immuno-metabolic depression: from concept to implementation. Lancet Reg Health Eur. 2024;48:101166. doi: 10.1016/j.lanepe.2024.101166
19. Milaneschi Y, Lamers F, Berk M, Penninx BWJH. Depression Heterogeneity and Its Biological Underpinnings: Toward Immunometabolic Depression. Biol Psychiatry. 2020;88(5):369–80. doi: 10.1016/j.biopsych.2020.01.014
20. Zwiep JC, Bet PM, Rhebergen D, Nurmohamed MT, Vinkers CH, Penninx BWJH, et al. Efficacy of celecoxib add-on treatment for immuno-metabolic depression: Protocol of the INFLAMED double-blind placebo-controlled randomized controlled trial. Brain Behav Immun Health. 2022;27:100585. doi: 10.1016/j.bbih.2022.100585
21. Vreijling SR, Chin Fatt CR, Williams LM, Schatzberg AF, Usherwood T, Nemeroff CB, et al. Features of immunometabolic depression as predictors of antidepressant treatment outcomes: pooled analysis of four clinical trials. Br J Psychiatry. 2024;224(3):89–97. doi: 10.1192/bjp.2023.148
22. Zhang M, Zhu Y-H, Zhu Z-Q. Research advances in the influence of lipid metabolism on cognitive impairment. Ibrain. 2022;10(1):83–92. doi: 10.1002/ibra.12018
23. Schneider M, Levant B, Reichel M, Gulbins E, Kornhuber J, Müller CP. Lipids in psychiatric disorders and preventive medicine. Neurosci Biobehav Rev. 2017;76(Pt B):336–62. doi: 10.1016/j.neubiorev.2016.06.002
24. Martín MG, Pfrieger F, Dotti CG. Cholesterol in brain disease: sometimes determinant and frequently implicated. EMBO Rep. 2014;15(10):1036–52. doi: 10.15252/embr.201439225
25. Cheon SY. Impaired Cholesterol Metabolism, Neurons, and Neuropsychiatric Disorders. Exp Neurobiol. 2023;32(2):57–67. doi: 10.5607/en23010
26. Parekh A, Smeeth D, Milner Y, Thure S. The Role of Lipid Biomarkers in Major Depression. Healthcare (Basel). 2017;5(1):5. doi: 10.3390/healthcare5010005
27. Greenwood CE, Young SN. Dietary fat intake and the brain: a developing frontier in biological psychiatry. J Psychiatry Neurosci. 2001;26(3):182–4.
28. Schmitz G, Ecker J. The opposing effects of n-3 and n-6 fatty acids. Prog Lipid Res. 2008;47(2):147–55. doi: 10.1016/j.plipres.2007.12.004
29. Okereke OI, Vyas CM, Mischoulon D, Chang G, Cook NR, Weinberg A, et al. Effect of Long-term Supplementation With Marine Omega-3 Fatty Acids vs Placebo on Risk of Depression or Clinically Relevant Depressive Symptoms and on Change in Mood Scores: A Randomized Clinical Trial. JAMA. 2021;326(23):2385–94. doi: 10.1001/jama.2021.21187
30. Wolters M, von der Haar A, Baalmann A-K, Wellbrock M, Heise TL, Rach S. Effects of n-3 Polyunsaturated Fatty Acid Supplementation in the Prevention and Treatment of Depressive Disorders-A Systematic Review and Meta-Analysis. Nutrients. 2021;13(4):1070. doi: 10.3390/nu13041070
31. Rapaport MH, Nierenberg AA, Schettler PJ, Kinkead B, Cardoos A, Walker R, et al. Inflammation as a predictive biomarker for response to omega-3 fatty acids in major depressive disorder: a proof-of-concept study. Mol Psychiatry. 2016;21(1):71–9. doi: 10.1038/mp.2015.22
32. Borsboom D, Deserno MK, Rhemtulla M, Epskamp S, Fried EI, McNally RJ, et al. Network analysis of multivariate data in psychological science. Nat Rev Methods Primers. 2021;1(1). doi: 10.1038/s43586-021-00055-w
33. Borsboom D, Cramer AOJ. Network analysis: an integrative approach to the structure of psychopathology. Annu Rev Clin Psychol. 2013;9:91–121. doi: 10.1146/annurev-clinpsy-050212-185608
34. Boschloo L, van Borkulo CD, Borsboom D, Schoevers RA. A Prospective Study on How Symptoms in a Network Predict the Onset of Depression. Psychother Psychosom. 2016;85(3):183–4. doi: 10.1159/000442001
35. Rydin AO, Milaneschi Y, Quax R, Li J, Bosch JA, Schoevers RA, et al. A network analysis of depressive symptoms and metabolomics. Psychol Med. 2023;53(15):7385–94. doi: 10.1017/S0033291723001009
36. Salari N, Asgharpour M, Daneshkhah A, Zarei H, Asgharpoor B, Faghihi SH, et al. Global prevalence of metabolic syndrome among individuals with bipolar disorder: A systematic review and meta-analysis. Arch Psychiatr Nurs. 2025;58:151942. doi: 10.1016/j.apnu.2025.151942
37. Manta A, Georganta A, Roumpou A, Zoumpourlis V, Spandidos DA, Rizos E, et al. Metabolic syndrome in patients with schizophrenia: Underlying mechanisms and therapeutic approaches (Review). Mol Med Rep. 2025;31(5):114. doi: 10.3892/mmr.2025.13479
38. Tang F, Wang G, Lian Y. Association between anxiety and metabolic syndrome: A systematic review and meta-analysis of epidemiological studies. Psychoneuroendocrinology. 2017;77:112–21. doi: 10.1016/j.psyneuen.2016.11.025
39. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. doi: 10.1038/s41586-018-0579-z
40. Levis B, Sun Y, He C, Wu Y, Krishnan A, Bhandari PM, et al. Accuracy of the PHQ-2 Alone and in Combination With the PHQ-9 for Screening to Detect Major Depression: Systematic Review and Meta-analysis. JAMA. 2020;323(22):2290–300. doi: 10.1001/jama.2020.6504
41. Baratta F, Cocomello N, Coronati M, Ferro D, Pastori D, Angelico F, et al. Cholesterol Remnants, Triglyceride-Rich Lipoproteins and Cardiovascular Risk. Int J Mol Sci. 2023;24(5):4268. doi: 10.3390/ijms24054268
42. Borén J, Taskinen M-R, Björnson E, Packard CJ. Metabolism of triglyceride-rich lipoproteins in health and dyslipidaemia. Nat Rev Cardiol. 2022;19(9):577–92. doi: 10.1038/s41569-022-00676-y
43. Ginsberg HN, Packard CJ, Chapman MJ, Borén J, Aguilar-Salinas CA, Averna M, et al. Triglyceride-rich lipoproteins and their remnants: metabolic insights, role in atherosclerotic cardiovascular disease, and emerging therapeutic strategies-a consensus statement from the European Atherosclerosis Society. Eur Heart J. 2021;42(47):4791–806. doi: 10.1093/eurheartj/ehab551
44. Yamada R, Okada D, Wang J, Basak T, Koyama S. Interpretation of omics data analyses. J Hum Genet. 2021;66(1):93–102. doi: 10.1038/s10038-020-0763-5
45. Devarajan K. Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Comput Biol. 2008;4(7):e1000029. doi: 10.1371/journal.pcbi.1000029
46. Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5. doi: 10.1038/ng.3211
47. Als TD, Kurki MI, Grove J, Voloudakis G, Therrien K, Tasanko E, et al. Depression pathophysiology, risk prediction of recurrence and comorbid psychiatric disorders using genome-wide analyses. Nat Med. 2023;29(7):1832–44. doi: 10.1038/s41591-023-02352-1
48. Collister JA, Liu X, Clifton L. Calculating Polygenic Risk Scores (PRS) in UK Biobank: A Practical Guide for Epidemiologists. Front Genet. 2022;13:818574. doi: 10.3389/fgene.2022.818574
49. McNally RJ. Can network analysis transform psychopathology? Behav Res Ther. 2016;86:95–104. doi: 10.1016/j.brat.2016.06.006
50. Ma S, Yang J, Xu J, Zhang N, Kang L, Wang P, et al. Using network analysis to identify central symptoms of college students’ mental health. J Affect Disord. 2022;311:47–54. doi: 10.1016/j.jad.2022.05.065
51. Marmot M, Brunner E. Cohort Profile: the Whitehall II study. Int J Epidemiol. 2005;34(2):251–6. doi: 10.1093/ije/dyh372
52. Smith DJ, Nicholl BI, Cullen B, Martin D, Ul-Haq Z, Evans J, et al. Prevalence and characteristics of probable major depression and bipolar disorder within UK biobank: cross-sectional study of 172,751 participants. PLoS One. 2013;8(11):e75362. doi: 10.1371/journal.pone.0075362
53. Allen NE, Arnold M, Parish S, Hill M, Sheard S, Callen H, et al. Approaches to minimising the epidemiological impact of sources of systematic and random variation that may affect biochemistry assay data in UK Biobank. Wellcome Open Res. 2021;5:222. doi: 10.12688/wellcomeopenres.16171.2
PLOS Digit Health. 2025 Dec 19;4(12):e0001125. doi: 10.1371/journal.pdig.0001125.r001

Author response to Decision Letter 0


Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

22 Jun 2025

PLOS Digit Health. doi: 10.1371/journal.pdig.0001125.r002

Decision Letter 0

Hadi Ghasemi

19 Aug 2025

Reviewers' Comments:

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: I don't know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #1: This is a well-designed and clinically relevant study that leverages metabolomic data from two large cohorts (UK Biobank and Whitehall II) to identify biologically distinct subtypes of depression. The identification of three metabolic subtypes—fatty acid dysregulated, hyperlipidemic, and intermediate—provides a valuable framework for advancing precision psychiatry. The study addresses a critical gap in depression research by moving beyond symptom-based classifications to biologically defined subtypes. This aligns with the NIH RDoC framework and could inform future targeted therapies. The study builds meaningfully on the authors’ prior work (Ma et al., 2024) by introducing clustering methods and improving diagnostic performance. While the work is strong, minor revisions would further strengthen its impact.

1. Please provide comparative analyses between depressed individuals and healthy controls (potentially as supplementary material) to assess whether these metabolic clusters are unique to depression or represent more general metabolic variations.

2. A discussion of whether similar clustering patterns occur in other psychiatric disorders (e.g., bipolar disorder, anxiety disorders) or metabolic conditions (e.g., metabolic syndrome, diabetes) would help contextualize the findings. Are these patterns more strongly associated with depression-specific pathophysiological mechanisms (e.g., neuroinflammation, HPA axis dysregulation)?

While the stability of results after adjusting for sex and BMI is commendable, please discuss whether the observed metabolic changes might reflect secondary consequences of depression-related behaviors (e.g., altered appetite, physical activity) rather than primary disease mechanisms. For example, could the "hyperlipidemic" subtype represent metabolic changes associated with sedentary behavior rather than depression-specific biology?

3. Please provide a more detailed characterization of the "intermediate" subtype. Is this a distinct biological group or more of a residual category? What are its defining metabolic or clinical features?

4. The manuscript should more explicitly highlight how the current clustering approach advances beyond the authors' previous machine learning analysis (Ma et al., 2024):

What specific improvements does the current methodology offer in terms of diagnostic accuracy, biological interpretability, or clinical utility?

Are there novel metabolic pathways identified in this study that were not apparent in the prior work?

5. The discussion should more carefully consider alternative interpretations of the metabolic findings: To what extent might the observed metabolic changes reflect lifestyle factors (e.g., diet, physical activity) rather than primary disease mechanisms?

While inflammatory markers (CRP, GlycA) and BMI differences are noted, please discuss whether these could be consequences rather than causes of the metabolic alterations.

These revisions would strengthen the manuscript's conclusions and provide a more nuanced interpretation of the important findings. The study represents a significant contribution to the field of precision psychiatry, and these suggested modifications would further enhance its impact.

Reviewer #2: In this manuscript, the authors attempt to identify subtypes of depression using plasma metabolomics data from the UK Biobank and subsequently build machine learning-based diagnostic models. While the goal of advancing precision psychiatry is commendable, this study suffers from several critical and fundamental flaws in its design, subject selection, and data analysis. These issues are pervasive and significantly undermine the validity of the findings. In my assessment, these flaws are not correctable through revision, and therefore, I must recommend the rejection of this manuscript.

Below are my detailed comments outlining these major concerns.

Major Concerns:

1. Critical Bias in Patient and Control Selection: The methodology for cohort selection is fundamentally flawed.

1.1 Heterogeneous Definition of Depression: The authors define cases using UK Biobank Data-Field #130895, which includes both formal ICD-10 diagnoses and self-reported depression. Self-reports are not equivalent to a clinical diagnosis of Major Depressive Disorder (MDD) and may simply reflect depressive symptoms secondary to other conditions.

1.2 Insufficient Exclusion Criteria: The manuscript states that the exclusion criteria include a history of schizophrenia or bipolar disorder. However, it fails to exclude patients with other major medical or psychiatric conditions that are known to profoundly alter metabolic profiles. For example, patients with a primary diagnosis of cancer, autoimmune disorders, or other chronic inflammatory diseases may be included in the depression cohort, making it impossible to disentangle the metabolic signature of depression from that of the comorbidity. The authors also did not exclude patients with anxiety disorders, a common comorbidity that shares metabolic and inflammatory pathways with depression.

1.3 Lack of Comorbidity Analysis: The study fails to stratify or at least compare the distribution of major comorbidities (e.g., cancer, cardiovascular disease) across the identified subtypes. This is a significant omission, as the "hyperlipidemia" subtype, for instance, could simply be a cluster of patients with metabolic syndrome rather than a true biological subtype of depression.

2. Lack of Temporal Alignment Between Diagnosis and Blood Draw for Metabolomics: A crucial flaw is the failure to ensure that the metabolomics data were collected during a current depressive episode that was diagnosed prior to the blood draw. The Methods section notes an exclusion for those "diagnosed with depression after the baseline assessment," but the blood samples for metabolomics were collected over a wide window (2019-2022). The manuscript does not confirm that for each patient, the blood draw date was cross-referenced with the diagnosis date (Field #130896) to establish a clear temporal link. Without this confirmation, the metabolic state measured may precede the onset of depression or occur long after remission, rendering the entire analysis invalid for identifying state-dependent biomarkers.
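
The cross-check requested in point 2 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Participant` record, its field names `diagnosis_date` and `blood_draw_date`, and the example dates are all hypothetical stand-ins for the diagnosis date (Field #130896) and the sample collection date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout; not the UK Biobank schema.
    participant_id: str
    diagnosis_date: Optional[date]  # stand-in for the date in Field #130896
    blood_draw_date: date           # stand-in for the sample collection date


def diagnosed_before_draw(p: Participant) -> bool:
    """Return True only if a diagnosis date exists and is on or before the blood draw."""
    return p.diagnosis_date is not None and p.diagnosis_date <= p.blood_draw_date


# Invented example records, for illustration only.
participants = [
    Participant("A", date(2005, 3, 1), date(2008, 6, 15)),   # diagnosed before draw
    Participant("B", date(2010, 1, 20), date(2008, 6, 15)),  # diagnosed after draw
    Participant("C", None, date(2008, 6, 15)),               # no diagnosis date
]

eligible = [p.participant_id for p in participants if diagnosed_before_draw(p)]
print(eligible)
```

A check of this kind establishes only the ordering of the two dates. Whether the episode was still current at the time of the blood draw, which the comment also raises, would need additional symptom information.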

3. Inherent Bias in Metabolomics Data: The study's conclusions are constrained by the severe limitations of the UK Biobank's metabolomics panel.

3.1 Limited and Biased Coverage: The panel measures only 249 metabolic indicators, which represents a tiny fraction (< 0.2%) of the known human metabolome (over 20,000).

3.2 Overrepresentation of Lipids: Critically, the panel is heavily biased towards lipid metabolism. As the authors note, 210 of the 249 markers are related to lipoprotein lipids and another 18 are fatty acids. With ~92% of the input data related to lipids, it is statistically inevitable that any clustering algorithm would identify subtypes based on lipid dysregulation. The study, therefore, does not discover novel biological subtypes so much as it confirms the input bias of its dataset. The "fatty acid dysregulation" and "hyperlipidemia" subtypes are likely artifacts of the measurement panel rather than distinct depression biotypes.
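
The "~92%" figure in comment 3.2 follows directly from the counts quoted there. As a quick worked check (only the three counts come from the comment; the variable names are illustrative):

```python
# Counts quoted in comment 3.2 above.
total_markers = 249
lipoprotein_lipid_markers = 210
fatty_acid_markers = 18

lipid_related = lipoprotein_lipid_markers + fatty_acid_markers
lipid_share = lipid_related / total_markers

print(f"{lipid_related} of {total_markers} markers ({lipid_share:.1%}) are lipid-related")
# prints "228 of 249 markers (91.6%) are lipid-related"
```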

4. Uncontrolled Confounding Variables (e.g., the use of antidepressants and statins): The analysis fails to account for key confounders that directly influence metabolic profiles. Most notably, the use of antidepressant medications, statins, or other psychotropic and metabolic drugs was not compared between subtypes or controlled for in the analysis. These medications have well-documented effects on lipid, fatty acid, and inflammatory markers, and their differential use among patients could easily explain the metabolic differences observed between the proposed subtypes.

5. Flawed Machine Learning Methodology: The approach to building and evaluating the diagnostic models is unsatisfactory and lacks rigor.

5.1 Handling of Imbalanced Data: The dataset is extremely imbalanced, with 7,945 depression cases compared to over 175,000 healthy controls. The Methods section makes no mention of how this severe class imbalance was handled. Standard machine learning practice requires techniques like oversampling (e.g., SMOTE), undersampling (e.g., ENN), or a hybrid approach (e.g., SMOTE-ENN) to prevent the model from simply learning to predict the majority class (healthy controls). This omission is a critical methodological flaw that invalidates the reported performance metrics (AUCs).

5.2 Unsatisfactory Model Performance and Selection: While the subtype models show higher AUCs than the binary model, the performance for Subtype 1 (AUC 0.785) and Subtype 2 (AUC 0.817) is modest and may be inflated due to the data imbalance issue. Furthermore, the reliance on a single algorithm (CatBoost) is not well-justified. A robust study would involve testing and optimizing a suite of models, including potentially more powerful deep learning architectures designed for tabular data (e.g., FT-Transformer, GANDALF), to ensure the best possible model is selected.
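
To give a sense of the imbalance described in comment 5.1, the sketch below computes inverse-frequency class weights, n_samples / (n_classes * class_count). Class weighting is a common alternative to the resampling techniques the comment names, and this formula is the heuristic behind scikit-learn's `class_weight="balanced"` option. The control count of 175,000 is an assumption (the comment says only "over 175,000"), the helper `balanced_class_weights` is hypothetical, and nothing here reflects how the authors actually trained their models.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


n_cases = 7_945        # depression cases, as quoted in comment 5.1
n_controls = 175_000   # assumed; the comment says only "over 175,000"

labels = [1] * n_cases + [0] * n_controls
weights = balanced_class_weights(labels)

print(f"controls per case: {n_controls / n_cases:.1f}")
print(f"weight for cases: {weights[1]:.2f}, weight for controls: {weights[0]:.2f}")
```

With these weights, the summed weight of the cases equals the summed weight of the controls, so each class contributes equally to a weighted loss.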

Minor Concerns:

1. Missing Methodological Details: Figure 2 presents visualizations using UMAP and t-SNE. However, these dimensionality reduction techniques are not described anywhere in the Methods section.

2. Ambiguity of Control Group: It is not explicitly clear whether the same healthy control (HC) group was used for all comparisons against the different depression subtypes in the various analyses (GWAS, machine learning, etc.). This should be clarified.

Reviewer #3: The study employed various methods, including biological assessment, cohort study, and machine learning analysis, to provide valuable insights into specific biomarkers of depression. Providing data on depression subtypes and their implications for intervention is valuable and practical. The study used a large sample of UK Biobank data for analysis, which made it a more rigorous study.

Some points should be considered for further clarification as follows:

1. In the introduction section, metabolomics research is contrasted with binary, categorical diagnosis. It would be more appropriate to discuss a dimensional approach to diagnosis and treatment and to compare it with the main approach of the current study.

2. In the introduction (ref #12), a study on the UK Biobank sample is mentioned. The biomarkers achieved a diagnostic AUC of 0.6-0.7, which is insufficient for accurate diagnosis. How can this be justified and explained?

3. Why did the "methodology" section come after the discussion section?

4. What exactly is meant by "depression": minor, major, dysthymia …? Which type of depression? Why was PHQ-2 used instead of PHQ-9 for case finding? The enrolled cases may not be considered as "major depressive disorder". Major depressive disorder, as mentioned by the article, is a heterogeneous disorder, which should not be considered an exact binary diagnosis by using precise diagnostic criteria.

5. This study helps to define subtypes of depressive disorder and has clinical implications; however, it does not add value to accurate diagnosis, as written in lines 273-274. How can this be justified?

6. The statement "...This suggests that future research could develop streamlined diagnostic models for depression utilizing TRL and different lipoprotein components…" (lines 281-282), again, does not explain the biomarker for diagnosis of depression. TRL may be involved in many normal and abnormal biological processes; it may just provide a feature of MDD. As mentioned in ref 33, the metabolic network associated with specific depressive symptoms may have intervention implications.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLOS Digit Health. doi: 10.1371/journal.pdig.0001125.r004

Decision Letter 1

Hadi Ghasemi

20 Nov 2025

Towards Precision Psychiatry: Metabolomics Identifies Three Biological Subtypes of Depression

PDIG-D-25-00377R1

Dear Dr Liu,

We are pleased to inform you that your manuscript 'Towards Precision Psychiatry: Metabolomics Identifies Three Biological Subtypes of Depression' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Hadi Ghasemi

Academic Editor

PLOS Digital Health

***********************************************************

Additional Editor Comments (if provided):

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

Reviewer #5: All comments have been addressed

**********

2. Does this manuscript meet PLOS Digital Health's publication criteria?

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: I don't know

Reviewer #4: Yes

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy

Reviewer #3: No

Reviewer #4: Yes

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

Reviewer #3: (No Response)

Reviewer #4: (No Response)

Reviewer #5: Minor comments:

Page 96 (Line 286-288): Please provide reference(s) for the following: Specifically, depressive symptoms may promote behavioral changes that exacerbate metabolic dysfunction, while metabolic alterations may in turn aggravate depressive symptoms through disruptions in energy homeostasis and neuroendocrine regulation.

Page 117 (Line 686): It is written as "Figure length"; please change it to "Figure Legends".

Please define all abbreviations at first use (TRL, GlycA, MUFA, etc.).

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

Reviewer #5: None

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    Supplemental Method A. Genome-wide association study analysis. Supplemental Method B. Supplementary methods for sensitivity analysis. Fig A. QQ plots of GWAS analysis for the total depression sample and three depression subtypes compared to healthy controls. Fig B. Comparison of metabolic networks across depression subgroups.

    (DOCX)

    pdig.0001125.s001.docx (1.2MB, docx)
    S2 File

    Table A. 249 metabolomic indicators. Table B. Demographic Characteristics. Table C. Cluster evaluation index. Table D. The metabolomic differences between the three subtypes. Table E. Genetic correlations between depression subgroups and 34 diseases/traits. Table F. Comparison of the differences between different subgroups. Table G. The metabolic indicators used in each model. Table H. Centrality indices of different subgroups.

    (XLSX)

    pdig.0001125.s002.xlsx (107.2KB, xlsx)
    Attachment

    Submitted filename: Response to reviewers.docx

    pdig.0001125.s004.docx (1.1MB, docx)

    Data Availability Statement

    This research has been conducted using the UK Biobank Resource under Application Number 87530. Researchers can apply for data access via the UK Biobank Access Management System: https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=220. The Whitehall II Data was provided by Dementia Platforms UK (DPUK). All analyses were performed using the DPUK analysis portal. Researchers can explore and apply for access to this dataset through the DPUK Data Catalogue: https://data.dpuk.ukserp.ac.uk/CohortDirectory/Item?fingerPrintID=Whitehall%20II. The analytic code used in this study is fully available at: https://github.com/xiaogahuo/NMR_MDD_Subtype_Clustering.git.


    Articles from PLOS Digital Health are provided here courtesy of PLOS
