Abstract
Depression is a heterogeneous and complex psychological syndrome with highly variable manifestations, which poses difficulties for treatment and prognosis. Patients with depression are prone to developing various comorbidities, which stem from different pathophysiological mechanisms that remain largely understudied. The current study focused on identifying comorbidity-specific phenotypes and on determining whether these clustered phenotypes are associated with different treatment patterns, clinical manifestations, physiological characteristics, and prognosis. We conducted a 10-year retrospective observational cohort study using electronic medical records (EMR) for 11,818 patients diagnosed with depression and hospitalized at a large academic medical center in Chengdu, China. K-means clustering and visualization methods were used to identify phenotypic categories. The association between phenotypic categories and clinical outcomes was evaluated using an adjusted Cox proportional hazards model. We classified patients with depression into five stable phenotypic categories, including 15 statistically driven clusters, in the discovery cohort (n = 9925) and the validation cohort (n = 1893). The categories are: (Category A) the lowest incidence of comorbidity, with prominent suicide, psychotic, and somatic symptoms (n = 3493/9925); (Category B) moderate comorbidity rate, with prominent anhedonia and anxious symptoms (n = 1795/9925); (Category C) the highest incidence of comorbidity of endocrine/metabolic and digestive system diseases (n = 1702/9925); (Category D) the highest incidence of comorbidity of neurological, mental and behavioral diseases (n = 881/9925); (Category E) other diseases comorbid with depression (n = 2054/9925).
Patients in Category E had the lowest risk of psychiatric rehospitalization within 60-day follow-up, followed by Category C (HR, 1.57; 95% CI, 1.07–2.30), Category B (HR, 1.61; 95% CI, 1.10–2.40), Category A (HR, 1.82; 95% CI, 1.28–2.60), and Category D (HR, 2.38; 95% CI, 1.59–3.60) with P < 0.05, after adjustment for comorbidities, medications, and age. Over the longer observation windows (90-day, 180-day and 365-day), patients in Category D consistently showed the highest rehospitalization risk, whereas the rankings of Categories A, B and C shifted notably over time. The results indicate that, across the five phenotypic categories, the higher the severity of mental illness, the greater the risk of rehospitalization. These phenotypes are associated with various pathways, including the cardiometabolic system, chronic inflammation, digestive system, neurological system, and mental and behavioral disorders. These pathways play a crucial role in connecting depression with other psychiatric and somatic diseases. The identified phenotypes exhibit notable distinctions in terms of comorbidity patterns, symptomatology, biological characteristics, treatment approaches, and clinical outcomes.
Subject terms: Depression, Pathogenesis
Introduction
Depression is the most prevalent and disabling mental health condition, affecting about 15–18% of the general population worldwide [1–3]. According to the World Health Organization (WHO), depression has been ranked as the third leading contributor to the overall global burden of disease since 2004 [4]. Unlike in other branches of medicine, the categorical or syndromal diagnosis of depression (DSM-5/ICD-10) rests entirely on descriptive phenomenology. The assumption of homogeneity in depression is recognized as a remarkable challenge that drives both comorbidity and heterogeneity issues [5]. Therefore, genetics, epigenetics, gene-environment interactions [6–11], early-life adversity [12], inflammation and other immune system dysfunction [13], and brain imaging [14] have been extensively investigated to identify the neurobiological mechanisms that contribute to heterogeneity in depression.
There is increasing evidence that depression is driven by multiple underlying mechanisms, such as the hypothalamic-pituitary-adrenal (HPA) axis, heart-brain axis, and gut-brain axis, among others. This suggests that depression is a heterogeneous syndrome rather than a singular condition [15]. The heterogeneity of depression reflects the significant variation in symptom presentation, severity, course, treatment response, genetics, and neurobiology among different patients [16]. Patients may present with diverse symptom profiles, and diagnostic criteria for depression can be met by up to 10,377 unique combinations of symptoms [17]. This results in a wide range of possibilities, where two patients diagnosed with depression may share only a single core symptom. A multifactorial approach tackles the heterogeneity problem by assuming that disorders comprise multiple phenotypes driven by multiple discrete factors [5]. Conventional phenotypes of depression have been proposed based on a single factor, such as symptom (atypical, psychotic, melancholic depression), age at onset (seasonal affective disorder, postpartum, early versus late in life), course (single, recurrent, chronic), or severity [18, 19]. However, there is limited evidence supporting the validity of these theoretically derived depression phenotypes, which constrains their clinical applicability [18, 20]. Recent efforts to define depression phenotypes have focused on identifying symptom-specific clusters and examining neurophysiological correlates [15, 19]. Other research has attempted to group patients based on neuroimaging and other biological measures, and to validate the clusters through their correlation with clinical symptoms, treatment outcomes, or other clinical variables [21]. To date, determining whether there are phenotypes of patients that result from different etiological sources of depression remains an ongoing challenge.
Depression is associated with an increased risk of various comorbid diseases [22–31], although the explanations for this are not always clear. Longitudinal studies have reported elevated risks of asthma [22], diabetes [23], cardiovascular diseases [24–27], Parkinson’s disease [28], dementia [29], thyroid diseases [30], generalized anxiety disorder, post-traumatic stress disorder, various phobias, substance use disorders, obsessive compulsive disorder, attention-deficit hyperactivity disorder and personality disorders [31] among individuals diagnosed with depression. A recent study identified three main disease clusters after depression [32]. The high comorbidity rates suggest that transdiagnostic mechanisms may contribute to the biology underlying heterogeneous symptom presentations not only in depression, but also in these closely related disorders. A shared neural basis was recently proposed underlying psychiatric comorbidity [33]. While recent decades have seen significant advancements in understanding the pathophysiological and psychopathological mechanisms underlying the relationship between these medical conditions and depression, many physicians find it difficult to treat patients with psycho-somatic comorbidities. Distinct mechanisms could warrant different types of treatment. Treating all individuals with depression as a unitary disease entity may be a major reason for a sizable percentage of patients who remain nonresponsive or poorly responsive to available treatments.
A significant comorbidity exists across diagnostic labels thought to separate internalizing and externalizing disorders, with high phenotypic correlations observed between pairs of diagnoses [34, 35]. However, few studies have directly shed light on the association between phenotype of depression and multiple medical conditions (multimorbidity). Previous work, largely hypothesis-driven, focused on select comorbidities associated with depression, without considering a general pathogenic basis underlying both mental and somatic diseases. It has been suggested that depression is a systemic illness that affects the brain and the body, with the latter effects associated with increased vulnerability to, and poor prognosis of, several medical disorders [30]. There is an urgent need for studies aimed at assessing the underlying structure of systemic multimorbidity, to discriminate depression phenotypes and provide insight into the cause of comorbidity. Thus, based on the heterogeneity of depression and the assumption of the same pathogenic mechanism in comorbidity, multiple possible subtypes (comorbidity-specific phenotypes) of depression can be identified, to better understand the phenotype-specific mechanisms of depression. These can be validated via multi-omics analysis on patient populations and animal models in the future (Fig. 1).
Fig. 1.
Study overview of comorbidity-phenotype of depression.
Due to the complex etiology of depression, it is challenging to define its phenotypes based on clinical knowledge and empirical evidence. A viable alternative is to use data-driven approaches that leverage the wealth of information stored in EMR [36–40]. The wide availability of EMR has created a continuously growing repository of clinical data, including patient demographics, diagnoses, chief complaints, medications, laboratory measurements and procedures. This repository provides opportunities for large-scale population-based studies at low cost [36, 38]. The integration of these rich data and clustering methods (e.g., Latent class analysis, K-means clustering, and hierarchical agglomerative clustering) makes it possible to measure differences between individuals based on their characteristics and to find clusters of patients who are more similar to each other than to the patients in other clusters, wherein each cluster corresponds to a unique phenotype [38–40]. Multiple statistical testing methods (e.g., t-test, Kruskal-Wallis H-test and Chi-square test) [41] can be performed to find discriminative characteristics across different phenotypes, and visualization methods (e.g., Chord diagram, Radar chart and Boxplot) provide interpretation for the computationally derived phenotypes. While detractors may suggest that the subtypes derived from EMR heavily depend on the chosen methodology and cohort characteristics [15, 42], proponents of data-driven phenotyping assert that such limitations can be mitigated with expert input at each decision node [43]. In the realm of depression, clinicians and psychotherapists play a crucial role in overcoming these challenges. Notably, there has been a significant surge in the integration of data-driven approaches with expert panels in recent years [44]. This fusion offers promising insights and holds considerable potential for clinical application.
In this study, we seek to identify distinct clusters of patients with depression who exhibit similar phenotypes. We also aim to examine whether these phenotypes are linked to varying patterns of comorbidity, treatment approaches, clinical presentations (i.e., symptomatology), biological characteristics, and outcomes (i.e., psychiatric rehospitalization). By identifying multiple depression comorbidity-phenotype pairs, our objective is to establish a foundation for future research into the diverse pathogenic mechanisms contributing to the heterogeneity and comorbidity of depression. While this study focuses on the identification and characterization of phenotypic patterns, these insights are intended to guide subsequent investigations into the underlying biological, genetic, or environmental factors.
Methods
Setting and sample selection
This retrospective cohort study was conducted at West China Hospital (WCH) of Sichuan University. WCH is an academic medical center-based health system in Chengdu that serves patients from across southwest China. The hospital has 4300 beds and over 10,000 medical staff. The psychiatric specialty at WCH serves a large number of patients with psychiatric disorders (e.g., depression, bipolar disorder, schizophrenia), with over 300 thousand outpatient visits and more than 6 thousand discharges annually. These patients come from over 33 provinces, municipalities, and autonomous regions across China, with notable concentrations from Sichuan, Tibet, Chongqing, Guizhou, Yunnan, Gansu, Shandong, and other regions. Therefore, the studied sample can be considered a representative sample of China. In 2009, an EMR system integrated with the Health Information System (HIS) and the Laboratory Information System (LIS) was adopted in all departments throughout the hospital, marking the starting point for our data extraction.
Details of the study setting have been described in our recent publication [45]. In summary, the initial sample was obtained from an EMR research database derived from WCH. This database comprises 36,780 admission records of 21,964 inpatients. The inclusion criteria involved patients who had a diagnosis name containing either “depression,” “mania,” or “bipolar disorder” and had been hospitalized and discharged at least once between January 1, 2009, and December 31, 2018. From the database, we extracted the index admission record of each patient, specifically their first record indicating a diagnosis of depression [45]. (We excluded one patient with an exceptionally long length of stay.) Consequently, for this study, we recruited a study population of 13,176 patients with a recorded discharge diagnosis of depression based on the International Classification of Diseases, Tenth Revision (ICD-10, Clinical Modification Codes F32 and F33). Here, the ICD-10 code serves as the gold standard for characterizing diagnostic categories, and the determination of the code is derived from direct assessments by trained psychiatrists. These assessments utilize both structured instruments, such as the Composite International Diagnostic Interview (CIDI), the Diagnostic Interview Schedule (DIS), and the Mini International Neuropsychiatric Interview (MINI), as well as semi-structured diagnostic interviews, exemplified by the Structured Clinical Interview for DSM (SCID) and Schedules for Clinical Assessment in Neuropsychiatry (SCAN) [46]. These comprehensive evaluations are performed at admission and undergo subsequent confirmation, and at times revision, through repeated assessments during the hospitalization period. These conditions make the discharge diagnosis code the optimal choice for ensuring data quality. Therefore, the discharge diagnosis codes represent all the diseases of each patient during their hospitalization.
To ensure the integrity of our dataset, we excluded certain patients based on specific criteria. First, we excluded 436 patients with missing demographic information to minimize potential bias caused by missing data, which may compromise the accuracy and reliability of our findings. To the best of our knowledge, the absence of demographic information was due to external factors; for example, some patients forgot to fill in the information, or some data were lost during the early stage of the development of the information system. These factors are not related to the issues under study. Hence, the absence of demographic information is identified as Missing Completely At Random (MCAR), justifying the exclusion of these patients. Additionally, we excluded 922 patients who did not receive antidepressant treatment (as shown in Fig. 2). This decision was made because the choice to undergo antidepressant treatment serves as a direct indicator of a patient’s current depressive episode. As this study focuses on categorizing patients currently experiencing a depressive episode and comparing different antidepressant treatment options among distinct patient phenotypes, excluding those who did not receive treatment is necessary. By implementing these exclusion criteria, we obtained a valid sample of 11,818 patients for our analysis. This approach ensures that our findings are based on a reliable and representative cohort. To evaluate the performance of data-driven phenotyping on a distinct dataset and to ensure an unbiased estimation of the model’s generalization to new patients, we created separate datasets to identify and validate depression phenotypes based on the respective time period of the data. A total of 9925 patients from the primary study population hospitalized between 2010 and 2017 were selected to identify phenotypes, while 1893 patients admitted in 2018 were chosen to validate the derived phenotypes.
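The cohort flow described above can be checked with simple arithmetic. The following is a minimal sketch using only the counts reported in the text; the variable names and the `assign_cohort` helper are illustrative assumptions, not part of the study's code.

```python
# Counts reported in the text; names are illustrative only.
initial_cohort = 13_176        # discharge diagnosis of depression (F32/F33)
missing_demographics = 436     # excluded: missing demographic information
no_antidepressant = 922        # excluded: no antidepressant treatment

analytic_sample = initial_cohort - missing_demographics - no_antidepressant

discovery_n = 9_925            # used to identify phenotypes
validation_n = 1_893           # admitted in 2018, used for validation


def assign_cohort(admission_year, validation_year=2018):
    """Temporal split (hypothetical helper): the final year is held out."""
    return "validation" if admission_year >= validation_year else "discovery"


print(analytic_sample)                                # 11818
print(discovery_n + validation_n == analytic_sample)  # True
```

Splitting by calendar period rather than at random means the validation cohort contains only patients admitted after every patient in the discovery cohort.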
Fig. 2.
Flow chart of data set construction.
Following the procedure just described, we identified patients with depression by the presence of an F32 or F33 ICD-10 code in either the main or a supplementary position. It should be noted that the ICD-10 code in the main diagnostic position represents the principal condition being treated, while the supplementary codes indicate comorbidities that contribute to the overall episode. In this setting, the phenotypes of patients with depression could be identified whether they were hospitalized for an acute episode of depression or for an acute episode of another comorbid condition. The comorbidity-based phenotyping tool can therefore be utilized in a broader range of clinical scenarios.
Outcome measures
In this analysis, the primary outcome measures are psychiatric readmission within 60 days, 90 days, 180 days, and 365 days of the initial depression hospitalization. For each time window, readmission is defined as a binary indicator of whether an individual had a subsequent admission record within that window after the initial hospitalization, with a principal psychiatric diagnosis (ICD-10 code in the range F00-F99). For each patient, follow-up began on the date of discharge, and ended on the earliest instance of readmission within each time window (if at all), or on the end date of follow-up.
Objectively assessing therapeutic efficacy in patients with mental illness remains a global technical challenge. Currently, the predominant method for evaluating therapeutic outcomes relies on scale-based assessments, which involve calculating the variation between pre-treatment and post-treatment scale scores. Some studies have explored the use of functional magnetic resonance imaging (fMRI) technology to scrutinize structural or functional changes in patients’ brains before and after treatment, aiming to gauge therapeutic effects. However, methods centered on fMRI data have been constrained by limited sample sizes and high costs, hindering widespread adoption within China’s mental health clinical practices. Mental health care primarily relies on scale assessments, but issues such as self-reporting of symptoms by patients can introduce subjectivity into therapeutic evaluations. To address these concerns, we engaged in extensive communication and collaboration with the psychiatrists who partnered in this study. Subsequently, we adopted the criterion of readmission at different time intervals as an objective proxy of therapeutic outcomes. Although this outcome indicator constitutes an indirect evaluation of therapeutic efficacy, it offers a more objective perspective.
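The readmission definition above can be sketched as follows. This is a minimal illustration, assuming each later admission is available as a pair of admission date and principal ICD-10 code; the function names, the input layout, and the choice to count only admissions strictly after the discharge date are assumptions made for the example.

```python
from datetime import date

WINDOWS_DAYS = (60, 90, 180, 365)


def is_psychiatric(icd10_code):
    """True for codes in ICD-10 chapter F (F00-F99), e.g. 'F33.2'."""
    code = icd10_code.strip().upper()
    return len(code) >= 3 and code[0] == "F" and code[1:3].isdigit()


def readmission_flags(discharge_date, later_admissions, windows=WINDOWS_DAYS):
    """Binary readmission indicator per window (hypothetical helper).

    later_admissions: iterable of (admission_date, principal_icd10_code).
    """
    days_to_readmission = [
        (admission_date - discharge_date).days
        for admission_date, code in later_admissions
        if is_psychiatric(code) and admission_date > discharge_date
    ]
    first = min(days_to_readmission) if days_to_readmission else None
    return {w: int(first is not None and first <= w) for w in windows}


flags = readmission_flags(
    date(2015, 1, 1),
    [(date(2015, 2, 1), "I10"), (date(2015, 3, 15), "F33.2")],
)
print(flags)  # {60: 0, 90: 1, 180: 1, 365: 1}
```

In the example, the non-psychiatric admission is ignored and the psychiatric readmission occurs 73 days after discharge, so it counts for every window except the 60-day one.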
Analytic process
Our objective is to classify patients with depression who share similar phenotypes through the application of clustering analysis (K-means clustering). Figure 3 depicts the six steps we followed to accomplish this objective. The data preprocessing and machine learning modeling are carried out using the R programming language (R 3.6.3). The code is available upon request.
Fig. 3.
Description of basic steps of analysis.
Data set construction
The data extracted from the EMR system encompass the hospital’s digital abstracts of inpatient stays, containing a range of information including patient sociodemographic details collected at admission, laboratory results, medication prescriptions, physiotherapy and psychotherapy treatments, vital signs, diagnoses, length of stay, procedure codes, patient-reported data (chief complaints, past illness history, lifestyle behaviors, current medical history), doctor’s medical advice, discharge summaries, and other data typically used for billing purposes (Fig. 3 Step 1).
Clinically guided feature processing
Prior studies have shown that the heterogeneity of depression is driven by the interaction of various factors, including genetic, neurobiological, clinical, psychological, behavioral, pharmacological, social, and environmental factors [7]. The database includes a comprehensive set of items to record all relevant information generated during a patient’s hospitalization.
Specifically, we recorded:
Sociodemographic information: gender, age, marital status, employment status, ethnicity, method of payment, and province of origin.
Hospitalization information: month/season/year of admission, department/specialty where the patient was admitted, whether the patient was transferred to another unit, diagnoses, severity of depression, whether the depression was recurrent, and whether the patient had co-occurring medical comorbidities (other mental disorders, endocrine diseases, nervous diseases, digestive diseases, circulatory diseases, respiratory diseases, and cancer). We also recorded the number of comorbidities in each disease system and the number of surgeries performed, as well as the length of stay.
Past history and lifestyle behaviors: histories of surgery, allergy, blood transfusion, medication use, smoking, alcoholism.
Physical examination at admission: presence of subcutaneous bleeding, cachexia, facial expression, nutrition status, cooperation, consciousness level, gait, body position, body temperature, pulse rate, respiratory rate, and systolic and diastolic blood pressure.
Text data (main complaints, history of present illness).
Laboratory test data (routine blood, urine, stool, and biochemical tests).
Treatment-related data (prescribed drugs, physiotherapies, and psychotherapies) available at the time of the patient’s index admission for depression.
By analyzing the dataset of doctors’ orders, we have captured and extracted variables related to the treatment patterns of each patient from the EMR system, including inpatient medication prescriptions (the type of drugs and usage frequency of common antidepressants, antipsychotics, anxiolytics, mood stabilizers, anti-side effects drugs, new hypnotics, β receptor blockers, hormonal drugs, Chinese patent medicines), physiotherapy patterns (modified electroconvulsive therapy, biofeedback therapy, transcranial magnetic stimulation, and electroencephalographic biofeedback therapy), and psychotherapy patterns. We solely extracted from the medical order data two key pieces of information regarding psychotherapy: whether a doctor prescribed psychotherapy for the patient and the total number of psychotherapy sessions conducted during the patient’s hospital stay. Specifically, within the medical order data, there is a field labeled “Order Item Name”, which records the specific orders given by doctors to patients, including medication names, psychotherapy, physiotherapy names, and so on. Based on this, we created two variables related to psychotherapy: one indicating whether the patient received psychotherapy (PSY_type, with a binary value of 0 or 1, where 0 represents no psychotherapy and 1 represents receiving psychotherapy), and the other representing the total number of psychotherapy sessions the patient received (PSY_SUM, with a numerical value).
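The two psychotherapy variables can be derived from the “Order Item Name” field along the following lines. This is a minimal sketch: the English keyword match and the assumption that each matching order row corresponds to one session are illustrative, and the actual field values and matching rules in the EMR are not given in the text.

```python
def psychotherapy_features(order_item_names, keyword="psychotherapy"):
    """Derive PSY_type and PSY_SUM from a patient's order item names.

    Assumes (for illustration) one order row per psychotherapy session.
    """
    sessions = sum(1 for name in order_item_names if keyword in name.lower())
    return {"PSY_type": int(sessions > 0), "PSY_SUM": sessions}


orders = ["Sertraline 50 mg", "Psychotherapy", "Biofeedback therapy", "Psychotherapy"]
print(psychotherapy_features(orders))  # {'PSY_type': 1, 'PSY_SUM': 2}
```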
Notably, our primary objective is to establish clinically meaningful depression phenotypes that can aid clinicians in optimizing patient management and treatment decisions. Only the information obtained promptly at admission is considered for the cluster analysis, while additional data generated during hospitalization are analyzed to identify distinctive characteristics across different phenotypes (Fig. 3 Step 2). Overall, 87 features are included as inputs for the cluster analysis, and 125 features are analyzed to compare key characteristics of patients among multiple phenotypes (Supplementary Information 1.2, Table S1, Table S2 and Table S3). For unstructured data (chief complaints) and treatment-related data processing, please refer to our previous publication (Supplementary Information 1.3 and 1.4, Table S1) [45]. Prior to commencing the analysis, each record’s missing values are checked and imputed using data from the EMR. Outliers are also detected and removed from the dataset.
Feature filtering
The large number of features in our dataset presents a significant challenge for K-means clustering methods [47]. To address this issue, we aim to reduce the dimensionality of our feature set by selecting a smaller subset of the most influential features. Subsequently, we will perform the clustering algorithm solely on the selected features. It is widely acknowledged that traditional K-means algorithms do not possess the ability to automatically select features during the clustering process [48]. We employed a novel approach to measure feature importance in K-means clustering [49]. This approach is model-agnostic and solely relies on a function called “FeatureImpCluster” that computes cluster assignments for new data points. By utilizing this function, we can identify the features that significantly impact cluster assignments and assign them relevance scores accordingly.
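One model-agnostic way to score features using only a cluster-assignment function is permutation importance: shuffle one feature at a time, reassign every observation to its nearest centroid, and record how often the assignment changes. The sketch below illustrates that idea in plain Python. It is written under the assumption that this resembles the approach referenced above; it is not the code of the FeatureImpCluster package, and all function names are illustrative.

```python
import random


def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids
    ]
    return distances.index(min(distances))


def permutation_importance(data, centroids, n_repeats=10, seed=0):
    """Mean share of observations whose cluster changes when a feature is shuffled."""
    rng = random.Random(seed)
    baseline = [nearest_centroid(row, centroids) for row in data]
    importance = []
    for j in range(len(data[0])):
        rates = []
        for _ in range(n_repeats):
            column = [row[j] for row in data]
            rng.shuffle(column)
            changed = 0
            for row, value, label in zip(data, column, baseline):
                permuted = list(row)
                permuted[j] = value
                if nearest_centroid(permuted, centroids) != label:
                    changed += 1
            rates.append(changed / len(data))
        importance.append(sum(rates) / n_repeats)
    return importance


# Toy data: feature 0 separates the two clusters, feature 1 is constant.
toy_data = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (10.0, 5.0), (11.0, 5.0), (12.0, 5.0)]
toy_centroids = [(1.0, 5.0), (11.0, 5.0)]
scores = permutation_importance(toy_data, toy_centroids)
```

A feature whose permutation never changes any assignment receives a score of zero, which is what happens to the constant feature in the toy data.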
K-means is a partitional clustering approach that aims to group observations into K clusters in a way that minimizes the total sum of squared Euclidean distances between each observation and its closest cluster centroid [48]. The number of clusters K is a prerequisite input for this algorithm. Thus, a pre-processing step is required to determine the optimal number of clusters before performing feature selection. Although several methods are available to estimate the optimal number of clusters for a given dataset, only a few provide reliable and accurate results, such as the Elbow method [50], the Average Silhouette method [51], and the Gap Statistic method [52]. We compare these three methods to determine the optimal value of K. The optimal K value is then provided as input to initialize the K-means algorithm, followed by an assessment of feature importance using the FeatureImpCluster function. We also evaluate multiple combinations of the number of top important features and the number of clusters to balance the performance and interpretability of the clustering model. Model performance is evaluated by calculating the ratio of the between-cluster sum of squared Euclidean distances (BSS) to the total sum of squared Euclidean distances (TSS), denoted BSS/TSS. Model interpretability is measured using correlation analysis and principal component analysis (Fig. 3 Step 3).
In applying the K-means clustering method, we have placed significant emphasis on integrating clinicians’ expertise with data-driven outcomes while iteratively refining the latter. Notably, in selecting the optimal number of clusters and determining the ranking of factor importance, we actively engage clinical psychiatrists to guide our decision-making process. Ultimately, we opt for the factor importance ranking result that best aligns with collective clinical expertise, using it as the foundation for subsequent clustering algorithms. The same expert panel pipeline is consistently applied to the K-means algorithm used for data record clustering.
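The BSS/TSS ratio can be computed from the data and the cluster labels alone, using the identity TSS = WSS + BSS, where WSS is the within-cluster sum of squares. The following is a minimal sketch in plain Python; the function names and toy data are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def bss_over_tss(data, labels):
    """Share of total variation explained by the cluster structure."""
    grand_mean = mean_point(data)
    tss = sum(sq_dist(row, grand_mean) for row in data)
    clusters = {}
    for row, label in zip(data, labels):
        clusters.setdefault(label, []).append(row)
    wss = 0.0
    for rows in clusters.values():
        centroid = mean_point(rows)
        wss += sum(sq_dist(row, centroid) for row in rows)
    return (tss - wss) / tss  # BSS = TSS - WSS


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
ratio = bss_over_tss(points, [0, 0, 1, 1])
print(round(ratio, 4))  # 0.9615
```

In the toy example TSS is 104 and WSS is 4, so BSS/TSS is 100/104. Putting every observation in a single cluster gives a ratio of 0.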
Clustering analysis and performance evaluation
We have implemented the K-means clustering method by inputting the combination of the selected features and the number of clusters as the initialization data. To evaluate the goodness of the clustering model, we use BSS/TSS and the Silhouette coefficient as metrics. The BSS/TSS metric ranges from 0 to 1, where a value of 1 indicates clusters are well-separated and clearly distinct, while a value of 0 indicates clusters are indistinguishable. Similarly, the Silhouette coefficient ranges from −1 to 1, with a value of 1 indicating clusters are clearly distinct, and a value of −1 indicating that observations are assigned to the wrong clusters. Additionally, we employ visualization methods (Boxplot to present differences among all clusters for each feature, Heatmap to visualize the distance matrix among all clusters) to aid in assessing and optimizing the clustering results for a non-prespecified value of K (Fig. 3 Step 4). Finally, we integrate clinician insights and our analytical results to propose clinically meaningful phenotypes.
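For reference, the Silhouette coefficient of an observation is (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes the average over all observations in plain Python. It is an illustration under these standard definitions, with singleton clusters scored as 0 by convention, and is not the study's implementation.

```python
import math


def euclid(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(data, labels):
    """Average Silhouette coefficient over all observations."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the Silhouette coefficient needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(euclid(data[i], data[j]) for j in own) / len(own)
        b = min(
            sum(euclid(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


line = [[0.0], [1.0], [10.0], [11.0]]
good = mean_silhouette(line, [0, 0, 1, 1])  # compact, well-separated clusters
bad = mean_silhouette(line, [0, 1, 0, 1])   # each point grouped with a distant one
```

With the toy data, the sensible labeling yields a value close to 1, while the deliberately mixed labeling yields a negative value.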
Visualizing patient phenotypes and interpretation of clustering results
In this step (Fig. 3 Step 5), three types of clustering results are visualized using various graphs and charts. First, bar plots are used to compare and visualize symptom and treatment patterns among different phenotypes. Second, comorbidity networks are employed to visualize the comorbidity pattern associated with each phenotype. Finally, we utilize Chord diagrams and Radar charts to visualize the weighted relationships between the different features driving the clustering and to highlight the characteristics of each phenotype. The clustering algorithm assigns weights to each feature, impacting the clustering results. In conjunction with the visualizations, we develop a straightforward algorithm to quantify weight differences among features within each phenotype. The algorithm involves assessing the contribution of each feature towards the formation and separation of the different phenotypes and is detailed in the Supplementary Information (Section 1.3). These visualizations and the algorithm provide valuable insights into the clustering results and reveal distinct characteristics for each phenotype.
To compare the symptom patterns of each phenotype, we calculate the incidence of various symptom categories for each phenotype. These categories include Depressed, Decreased Interests, Anxious, Cognitive Symptoms, Retardation or Psychomotor Impairment, Psychotic Symptoms, Self-Accusation, Suicide Attempts, Sleep Problems, Aspecific Somatization, Decreased Appetite or Significant Weight Loss, Lack of Energy, Irritable, Movement Impairment, Addictive Behavior, and Eating Disorder (Supplementary Information 2.1, Table S4). Furthermore, we calculate the incidence of medication use, physiotherapy, and psychotherapy for each phenotype. Specifically, we examine the usage of 11 antidepressants (ADP), 11 antipsychotics (AP), 10 anxiolytics (AA), 4 mood stabilizers (MSB), 8 anti-side effect drugs (ASE), 1 new hypnotic (HYP), 3 β receptor blockers (OT), hormonal drugs (T3), Chinese patent medicines (CM), as well as 4 physiotherapies (PHY), and psychotherapies (PSY). Additionally, we calculate the total number of medical orders for each corresponding drug type (ADP_SUM, AP_SUM, AA_SUM, MSB_SUM, ASE_SUM, OT_SUM, T3_SUM, CM_SUM, PHY_SUM, PSY_SUM) for each phenotype (Supplementary Information 2.2, Table S5 and Table S6). By comparing the incidence rates of symptoms and medication usage patterns across phenotypes, we gain valuable insights into the distinct characteristics and differences among them.
We then investigate the relationship between comorbidities and the identified phenotypes. Note that the number of diagnoses for each patient ranges from 1 to 7, with a value of 1 indicating a sole diagnosis of depression, and values of 2 to 7 representing the presence of additional comorbid conditions associated with depression. These comorbid medical conditions are obtained from the main and supplementary diagnoses recorded using the ICD-10 coding system. We specifically utilize the first-level ‘phecodes,’ which are 3-character codes [53], resulting in a defined set of included medical conditions (Supplementary Information 2.3, Table S7). Subsequently, we identify all possible pairs of diagnoses involving depression and its comorbidities, which we refer to as depression-specific comorbidities. Comorbidity networks are then constructed based on these depression-specific comorbidities, with the prevalence of co-occurrence serving as a measure of the strength of comorbidity associations for each diagnosis pair. Each network in our analysis consists of nodes and edges. Nodes represent the diseases encompassed within each phenotype, with the color of each node indicating the disease system classified by the ICD-10. The size of each node reflects the prevalence of the corresponding disease within the phenotype. Edges are used to visually represent each diagnosis pair, with the width of each edge corresponding to the prevalence of co-occurrence for that particular diagnosis pair. For each edge, the source disease is designated as the main diagnosis, while the target disease is considered the comorbid diagnosis.
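The per-phenotype incidence described above is the share of patients in a phenotype who present a given symptom category (or receive a given treatment). The following is a minimal sketch, assuming each patient record is a pair of phenotype label and set of observed categories; the record layout and the function name are illustrative.

```python
from collections import defaultdict


def incidence_by_phenotype(records):
    """Share of patients per phenotype that present each category.

    records: iterable of (phenotype, set_of_categories).
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for phenotype, categories in records:
        totals[phenotype] += 1
        for category in categories:
            hits[phenotype][category] += 1
    return {
        phenotype: {cat: count / totals[phenotype] for cat, count in hits[phenotype].items()}
        for phenotype in totals
    }


example_records = [
    ("A", {"Anxious"}),
    ("A", {"Anxious", "Sleep Problems"}),
    ("B", set()),
]
rates = incidence_by_phenotype(example_records)
print(rates["A"]["Sleep Problems"])  # 0.5
```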
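The node and edge weights of such a network can be derived as in the sketch below. It assumes each patient is represented by a list of ICD-10 codes with the main diagnosis first, truncates codes to three characters, and keeps only the main-to-supplementary pairs that involve a depression code. These choices, including the handling of pairs, are an interpretation of the description above rather than the study's exact procedure.

```python
from collections import Counter

DEPRESSION_CODES = {"F32", "F33"}


def comorbidity_network(patients):
    """Node prevalence and directed edge prevalence within one phenotype.

    patients: list of diagnosis-code lists, main diagnosis first.
    Edges run from the main diagnosis (source) to a comorbid diagnosis (target).
    """
    n = len(patients)
    node_counts, edge_counts = Counter(), Counter()
    for diagnoses in patients:
        # Truncate to 3-character codes and drop duplicates, keeping order.
        codes = list(dict.fromkeys(code[:3].upper() for code in diagnoses))
        if not codes:
            continue
        node_counts.update(codes)
        main = codes[0]
        for other in codes[1:]:
            if main in DEPRESSION_CODES or other in DEPRESSION_CODES:
                edge_counts[(main, other)] += 1
    nodes = {code: count / n for code, count in node_counts.items()}
    edges = {pair: count / n for pair, count in edge_counts.items()}
    return nodes, edges


toy_patients = [
    ["F32.1", "E11"],
    ["F32", "E11.9", "K29"],
    ["I10", "F33"],
    ["F32"],
]
nodes, edges = comorbidity_network(toy_patients)
print(edges[("F32", "E11")])  # 0.5
```

Node prevalence would map to node size and edge prevalence to edge width in the plotted network.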
Validation using the hold-out test data set
To evaluate the robustness of our analysis, we have performed validation using an independent data set (Fig. 2). This data set has a similar structure to the discovery data set but is obtained from a different patient population. We have applied the same analytical process to the patients in this data set to evaluate the stability of the results obtained through data-driven phenotyping.
Statistical analysis
The baseline characteristics, laboratory markers, and treatment patterns of all identified phenotypes are compared. Continuous variables are presented as medians with interquartile ranges (IQR) or as means with standard deviations, and categorical variables are reported as counts and percentages. Differences among phenotypes are tested with the Chi-square test for categorical variables and the Kruskal-Wallis H-test for continuous variables with non-normal distributions, using the tableone package in R 3.6.3. The Kolmogorov-Smirnov test is used to determine whether quantitative data from different phenotypes are normally distributed, and pairwise post-hoc tests (Dunn test with Bonferroni adjustment) are performed on continuous variables with significant differences. A significance level of 0.05 is used for drawing the main conclusions. Data analysis was completed between April 2022 and March 2024.
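To make the continuous-variable comparison concrete, the sketch below computes the tie-corrected Kruskal-Wallis H statistic and a Bonferroni-adjusted threshold for pairwise post-hoc comparisons. The study itself used the tableone package in R; this pure-Python version is only an illustration, the function names are hypothetical, and it stops at the test statistic (a p-value would come from the chi-square distribution with k - 1 degrees of freedom, for example via `scipy.stats.kruskal`).

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples (one per group)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def bonferroni_threshold(labels, alpha=0.05):
    """All unordered pairs of labels and the per-comparison alpha."""
    pairs = list(combinations(labels, 2))
    return pairs, alpha / len(pairs)


# Toy samples, invented for illustration only.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
pairs, per_test_alpha = bonferroni_threshold(["A", "B", "C", "D", "E"])
print(h_stat, len(pairs), per_test_alpha)
```

If all 15 clusters were compared pairwise, the same function would yield 105 comparisons; whether the post-hoc tests were run across clusters or across the five categories is not stated in the text, so either use is an assumption.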
The association between phenotypes and clinical outcomes is evaluated using adjusted Cox proportional hazards models, with adjustment for medications, physiotherapy, psychotherapy, comorbidities, and age. For each phenotype, the hazard ratio (HR) is reported with its 95% confidence interval (95% CI) and corresponding P-value. Kaplan-Meier (KM) curves are also derived and compared using the log-rank test. The proportional hazards assumption is evaluated using both the KM curves and Schoenfeld residual tests.
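Two pieces of the survival analysis above lend themselves to a short illustration: the Kaplan-Meier product-limit estimate and the conversion of a Cox coefficient into a hazard ratio with a Wald 95% confidence interval. The sketch below is a pure-Python illustration under stated assumptions, not the study's code: the function names, the toy follow-up data, and the coefficient values are hypothetical, and fitting the Cox model itself is left to a statistics package.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate as [(time, survival)] at each event time.

    times:  follow-up time per patient (for example, days to rehospitalization)
    events: 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed, leaving = 0, 0
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


def hazard_ratio_ci(beta, se, z=1.96):
    """HR and Wald CI from a Cox log-hazard coefficient and its standard error.

    z = 1.96 is the usual normal approximation for a 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy follow-up data and coefficient, invented for illustration only.
curve = kaplan_meier([5, 10, 10, 20, 30], [1, 1, 0, 1, 0])
hr, lower, upper = hazard_ratio_ci(beta=0.5, se=0.2)
print(curve)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at the 0.05 level under the same normal approximation.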
Results
Baseline characteristics
Detailed breakdowns of the features of the identified phenotypes can be found in Table 1 and Table S8 (Supplementary Information 2.4). To analyze the etiology of the phenotypes and develop treatment strategies tailored to each, the 15 phenotypes are grouped into 5 phenotypic categories (Category A to Category E) according to their commonalities in age, marital status, occupation, symptom presentation, and comorbidity patterns, drawing on the clinical expertise of the psychiatrists who co-authored this paper (the mapping is illustrated in Fig. 4). Figure 5A–E highlights the characteristics of the phenotypes, taking Category A as an example; the remaining details for Category B to Category E are given in Supplementary Information Figure S1-B1 to Figure S1-E4. Variations in symptom patterns across phenotypes are visualized in Fig. 6, which is based on Table S4 of Supplementary Information 2.1. Figure 7 presents an overview of the comorbidity patterns associated with depression across phenotypes. Detailed comorbidity networks are presented in Table S7 and Figure S2 in Supplementary Information 2.3.
Table 1. Characteristics of Patient Demographics, Behavior, and Medical History, Stratified by Clusters (a).
Characteristic | Overall n = 9925 | Cluster A1 n = 1053 | Cluster A2 n = 1305 | Cluster A3 n = 600 | Cluster A4 n = 535 | Cluster B1 n = 1233 | Cluster B2 n = 562 | Cluster C1 n = 879 | Cluster C2 n = 242 | Cluster C3 n = 581 | Cluster D1 n = 390 | Cluster D2 n = 491 | Cluster E1 n = 95 | Cluster E2 n = 420 | Cluster E3 n = 280 | Cluster E4 n = 1259 | p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gender = Female (%) | 6575 (66.2) | 656 (62.3) | 965 (73.9) | 410 (68.3) | 368 (68.8) | 857 (69.5) | 392 (69.8) | 624 (71.0) | 165 (68.2) | 393 (67.6) | 228 (58.5) | 289 (58.9) | 60 (63.2) | 248 (59.0) | 139 (49.6) | 781 (62.0) | < 0.001 |
age group (%) | | | | | | | | | | | | | | | | | < 0.001 |
0–17 | 768 (7.7) | 469 (44.5) | 0 (0.0) | 38 (6.3) | 102 (19.1) | 0 (0.0) | 33 (5.9) | 11 (1.3) | 0 (0.0) | 6 (1.0) | 7 (1.8) | 95 (19.3) | 2 (2.1) | 0 (0.0) | 2 (0.7) | 3 (0.2) | |
18–35 | 2299 (23.2) | 583 (55.4) | 43 (3.3) | 218 (36.3) | 422 (78.9) | 194 (15.7) | 126 (22.4) | 92 (10.5) | 19 (7.9) | 78 (13.4) | 37 (9.5) | 311 (63.3) | 16 (16.8) | 5 (1.2) | 12 (4.3) | 143 (11.4) | |
36–59 | 4222 (42.5) | 1 (0.1) | 966 (74.0) | 313 (52.2) | 11 (2.1) | 624 (50.6) | 316 (56.2) | 418 (47.6) | 123 (50.8) | 326 (56.1) | 126 (32.3) | 82 (16.7) | 43 (45.3) | 70 (16.7) | 102 (36.4) | 701 (55.7) | |
>= 60 | 2636 (26.6) | 0 (0.0) | 296 (22.7) | 31 (5.2) | 0 (0.0) | 415 (33.7) | 87 (15.5) | 358 (40.7) | 100 (41.3) | 171 (29.4) | 220 (56.4) | 3 (0.6) | 34 (35.8) | 345 (82.1) | 164 (58.6) | 412 (32.7) | |
marital status (%) | | | | | | | | | | | | | | | | | < 0.001 |
divorced | 432 (4.4) | 0 (0.0) | 126 (9.7) | 39 (6.5) | 0 (0.0) | 52 (4.2) | 26 (4.6) | 40 (4.6) | 10 (4.1) | 28 (4.8) | 13 (3.3) | 8 (1.6) | 4 (4.2) | 10 (2.4) | 8 (2.9) | 68 (5.4) | |
married | 6941 (69.9) | 18 (1.7) | 1088 (83.4) | 436 (72.7) | 217 (40.6) | 1110 (90.0) | 428 (76.2) | 728 (82.8) | 211 (87.2) | 482 (83.0) | 308 (79.0) | 179 (36.5) | 75 (78.9) | 340 (81.0) | 242 (86.4) | 1079 (85.7) | |
unknown | 16 (0.2) | 7 (0.7) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 1 (0.1) | 0 (0.0) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 1 (1.1) | 0 (0.0) | 1 (0.4) | 3 (0.2) | |
single/unmarried | 2121 (21.4) | 1028 (97.6) | 14 (1.1) | 125 (20.8) | 317 (59.3) | 24 (1.9) | 93 (16.5) | 58 (6.6) | 12 (5.0) | 44 (7.6) | 32 (8.2) | 303 (61.7) | 10 (10.5) | 5 (1.2) | 9 (3.2) | 47 (3.7) | |
widowed | 415 (4.2) | 0 (0.0) | 77 (5.9) | 0 (0.0) | 0 (0.0) | 46 (3.7) | 15 (2.7) | 52 (5.9) | 9 (3.7) | 27 (4.6) | 37 (9.5) | 0 (0.0) | 5 (5.3) | 65 (15.5) | 20 (7.1) | 62 (4.9) | |
job status (%) | | | | | | | | | | | | | | | | | < 0.001 |
civil servant | 911 (9.2) | 16 (1.5) | 165 (12.6) | 54 (9.0) | 38 (7.1) | 121 (9.8) | 53 (9.4) | 82 (9.3) | 21 (8.7) | 51 (8.8) | 39 (10.0) | 28 (5.7) | 9 (9.5) | 73 (17.4) | 39 (13.9) | 122 (9.7) | |
farmer | 1393 (14.0) | 38 (3.6) | 219 (16.8) | 93 (15.5) | 35 (6.5) | 241 (19.5) | 83 (14.8) | 142 (16.2) | 34 (14.0) | 132 (22.7) | 64 (16.4) | 43 (8.8) | 11 (11.6) | 31 (7.4) | 37 (13.2) | 190 (15.1) | |
freelancers | 474 (4.8) | 32 (3.0) | 56 (4.3) | 39 (6.5) | 23 (4.3) | 69 (5.6) | 44 (7.8) | 40 (4.6) | 8 (3.3) | 24 (4.1) | 11 (2.8) | 28 (5.7) | 3 (3.2) | 9 (2.1) | 9 (3.2) | 79 (6.3) | |
retired | 1371 (13.8) | 1 (0.1) | 189 (14.5) | 35 (5.8) | 1 (0.2) | 194 (15.7) | 67 (11.9) | 193 (22.0) | 66 (27.3) | 111 (19.1) | 107 (27.4) | 8 (1.6) | 17 (17.9) | 122 (29.0) | 73 (26.1) | 187 (14.9) | |
staff | 1258 (12.7) | 67 (6.4) | 204 (15.6) | 136 (22.7) | 77 (14.4) | 155 (12.6) | 80 (14.2) | 103 (11.7) | 34 (14.0) | 84 (14.5) | 38 (9.7) | 62 (12.6) | 12 (12.6) | 22 (5.2) | 25 (8.9) | 159 (12.6) | |
student | 1345 (13.6) | 753 (71.5) | 2 (0.2) | 73 (12.2) | 191 (35.7) | 7 (0.6) | 60 (10.7) | 29 (3.3) | 1 (0.4) | 11 (1.9) | 12 (3.1) | 179 (36.5) | 11 (11.6) | 0 (0.0) | 4 (1.4) | 12 (1.0) | |
unemployed | 842 (8.5) | 75 (7.1) | 103 (7.9) | 75 (12.5) | 51 (9.5) | 98 (7.9) | 49 (8.7) | 62 (7.1) | 29 (12.0) | 63 (10.8) | 36 (9.2) | 65 (13.2) | 12 (12.6) | 16 (3.8) | 7 (2.5) | 101 (8.0) | |
unknown | 1843 (18.6) | 62 (5.9) | 270 (20.7) | 70 (11.7) | 105 (19.6) | 268 (21.7) | 94 (16.7) | 177 (20.1) | 38 (15.7) | 80 (13.8) | 62 (15.9) | 57 (11.6) | 14 (14.7) | 131 (31.2) | 77 (27.5) | 338 (26.8) | |
worker | 488 (4.9) | 9 (0.9) | 97 (7.4) | 25 (4.2) | 14 (2.6) | 80 (6.5) | 32 (5.7) | 51 (5.8) | 11 (4.5) | 25 (4.3) | 21 (5.4) | 21 (4.3) | 6 (6.3) | 16 (3.8) | 9 (3.2) | 71 (5.6) | |
main diagnosis (%) | | | | | | | | | | | | | | | | | < 0.001 |
Depression | 7738 (78.0) | 1053 (100) | 1305 (100) | 600 (100) | 535 (100) | 1231 (99.8) | 504 (89.7) | 692 (78.7) | 205 (84.7) | 534 (91.9) | 374 (95.9) | 296 (60.3) | 13 (13.7) | 21 (5.0) | 6 (2.1) | 369 (29.3) | |
Diseases of the circulatory system | 341 (3.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 30 (3.4) | 1 (0.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 23 (24.2) | 109 (26.0) | 59 (21.1) | 118 (9.4) | |
Diseases of the digestive system | 74 (0.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 8 (0.9) | 3 (1.2) | 2 (0.3) | 0 (0.0) | 0 (0.0) | 5 (5.3) | 9 (2.1) | 7 (2.5) | 40 (3.2) | |
Diseases of the musculoskeletal system | 154 (1.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 16 (1.8) | 1 (0.4) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 15 (15.8) | 23 (5.5) | 20 (7.1) | 78 (6.2) | |
Diseases of the nervous system | 378 (3.8) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 2 (0.4) | 31 (3.5) | 1 (0.4) | 4 (0.7) | 0 (0.0) | 6 (1.2) | 3 (3.2) | 77 (18.3) | 75 (26.8) | 179 (14.2) | |
Diseases of the respiratory system | 122 (1.2) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 7 (0.8) | 5 (2.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 2 (2.1) | 47 (11.2) | 21 (7.5) | 40 (3.2) | |
Mental and behavioral disorders due to psychoactive substance use | 91 (0.9) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 9 (1.0) | 6 (2.5) | 11 (1.9) | 6 (1.5) | 26 (5.3) | 0 (0.0) | 0 (0.0) | 4 (1.4) | 28 (2.2) | |
Neoplasms | 70 (0.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 8 (0.9) | 1 (0.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 14 (14.7) | 12 (2.9) | 4 (1.4) | 31 (2.5) | |
Neurotic, stress-related and somatoform disorders | 334 (3.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 49 (8.7) | 27 (3.1) | 9 (3.7) | 26 (4.5) | 8 (2.1) | 79 (16.1) | 1 (1.1) | 10 (2.4) | 5 (1.8) | 119 (9.5) | |
Organic/symptomatic mental disorders | 43 (0.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 6 (0.7) | 1 (0.4) | 0 (0.0) | 1 (0.3) | 2 (0.4) | 0 (0.0) | 17 (4.0) | 5 (1.8) | 11 (0.9) | |
other | 477 (4.8) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 39 (4.4) | 6 (2.5) | 0 (0.0) | 0 (0.0) | 20 (4.1) | 18 (18.9) | 94 (22.4) | 74 (26.4) | 225 (17.9) | |
Other mental and behavioral disorders | 72 (0.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 4 (0.7) | 5 (0.6) | 2 (0.8) | 3 (0.5) | 1 (0.3) | 44 (9.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 12 (1.0) | |
Schizophrenia, schizotypal and delusional disorders | 31 (0.3) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 1 (0.4) | 1 (0.2) | 0 (0.0) | 17 (3.5) | 1 (1.1) | 1 (0.2) | 0 (0.0) | 9 (0.7) | |
total number of diagnoses (%) | | | | | | | | | | | | | | | | | < 0.001 |
1 | 3901 (39.3) | 914 (86.8) | 1098 (84.1) | 509 (84.8) | 488 (91.2) | 649 (52.6) | 236 (42.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 7 (7.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
2 | 2150 (21.7) | 108 (10.3) | 148 (11.3) | 57 (9.5) | 38 (7.1) | 314 (25.5) | 169 (30.1) | 242 (27.5) | 0 (0.0) | 157 (27.0) | 83 (21.3) | 363 (73.9) | 11 (11.6) | 0 (0.0) | 0 (0.0) | 460 (36.5) | |
3 | 1261 (12.7) | 25 (2.4) | 47 (3.6) | 23 (3.8) | 7 (1.3) | 150 (12.2) | 82 (14.6) | 236 (26.8) | 25 (10.3) | 176 (30.3) | 87 (22.3) | 95 (19.3) | 15 (15.8) | 30 (7.1) | 15 (5.4) | 248 (19.7) | |
4 | 768 (7.7) | 2 (0.2) | 7 (0.5) | 8 (1.3) | 2 (0.4) | 58 (4.7) | 41 (7.3) | 160 (18.2) | 45 (18.6) | 111 (19.1) | 59 (15.1) | 28 (5.7) | 19 (20.0) | 46 (11.0) | 34 (12.1) | 148 (11.8) | |
5 | 584 (5.9) | 2 (0.2) | 4 (0.3) | 3 (0.5) | 0 (0.0) | 39 (3.2) | 24 (4.3) | 98 (11.1) | 55 (22.7) | 79 (13.6) | 57 (14.6) | 4 (0.8) | 14 (14.7) | 57 (13.6) | 45 (16.1) | 103 (8.2) | |
6 | 433 (4.4) | 2 (0.2) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 10 (0.8) | 6 (1.1) | 71 (8.1) | 42 (17.4) | 29 (5.0) | 40 (10.3) | 1 (0.2) | 8 (8.4) | 88 (21.0) | 47 (16.8) | 89 (7.1) | |
7 | 828 (8.3) | 0 (0.0) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 13 (1.1) | 4 (0.7) | 72 (8.2) | 75 (31.0) | 29 (5.0) | 64 (16.4) | 0 (0.0) | 21 (22.1) | 199 (47.4) | 139 (49.6) | 211 (16.8) | |
severity (%) | | | | | | | | | | | | | | | | | < 0.001 |
mild | 1 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | |
moderate | 12 (0.1) | 1 (0.1) | 1 (0.1) | 2 (0.3) | 0 (0.0) | 1 (0.1) | 1 (0.2) | 0 (0.0) | 1 (0.4) | 2 (0.3) | 1 (0.3) | 1 (0.2) | 0 (0.0) | 0 (0.0) | 1 (0.4) | 0 (0.0) | |
Other | 4 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 0 (0.0) | 2 (0.2) | |
severe | 1847 (18.6) | 431 (40.9) | 12 (0.9) | 598 (99.7) | 2 (0.4) | 58 (4.7) | 133 (23.7) | 129 (14.7) | 48 (19.8) | 122 (21.0) | 50 (12.8) | 201 (40.9) | 12 (12.6) | 4 (1.0) | 3 (1.1) | 44 (3.5) | |
unspecified | 8061 (81.2) | 621 (59.0) | 1292 (99.0) | 0 (0.0) | 533 (99.6) | 1173 (95.1) | 428 (76.2) | 750 (85.3) | 193 (79.8) | 457 (78.7) | 339 (86.9) | 289 (58.9) | 83 (87.4) | 415 (98.8) | 276 (98.6) | 1212 (96.3) | |
comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 3901 (39.3) | 914 (86.8) | 1098 (84.1) | 509 (84.8) | 488 (91.2) | 649 (52.6) | 236 (42.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 7 (7.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
TRUE | 6024 (60.7) | 139 (13.2) | 207 (15.9) | 91 (15.2) | 47 (8.8) | 584 (47.4) | 326 (58.0) | 879 (100.0) | 242 (100.0) | 581 (100.0) | 390 (100.0) | 491 (100.0) | 88 (92.6) | 420 (100.0) | 280 (100.0) | 1259 (100) | |
psychiatric comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 6690 (67.4) | 1053 (100) | 1305 (100) | 600 (100.0) | 535 (100.0) | 1205 (97.7) | 387 (68.9) | 614 (69.9) | 178 (73.6) | 454 (78.1) | 331 (84.9) | 0 (0.0) | 12 (12.6) | 4 (1.0) | 0 (0.0) | 12 (1.0) | |
TRUE | 3235 (32.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 28 (2.3) | 175 (31.1) | 265 (30.1) | 64 (26.4) | 127 (21.9) | 59 (15.1) | 491 (100.0) | 83 (87.4) | 416 (99.0) | 280 (100.0) | 1247 (99.0) | |
endocrine comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 8218 (82.8) | 1047 (99.4) | 1305 (100) | 600 (100) | 535 (100.0) | 1233 (100) | 510 (90.7) | 0 (0.0) | 0 (0.0) | 581 (100.0) | 291 (74.6) | 466 (94.9) | 74 (77.9) | 273 (65.0) | 198 (70.7) | 1105 (87.8) | |
TRUE | 1707 (17.2) | 6 (0.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 52 (9.3) | 879 (100.0) | 242 (100.0) | 0 (0.0) | 99 (25.4) | 25 (5.1) | 21 (22.1) | 147 (35.0) | 82 (29.3) | 154 (12.2) | |
nervous comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 9243 (93.1) | 1053 (100) | 1305 (100) | 600 (100.0) | 535 (100.0) | 1233 (100) | 561 (99.8) | 879 (100.0) | 241 (99.6) | 580 (99.8) | 0 (0.0) | 491 (100.0) | 87 (91.6) | 420 (100.0) | 0 (0.0) | 1258 (99.9) | |
TRUE | 682 (6.9) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.2) | 0 (0.0) | 1 (0.4) | 1 (0.2) | 390 (100.0) | 0 (0.0) | 8 (8.4) | 0 (0.0) | 280 (100.0) | 1 (0.1) | |
digestive comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 8532 (86.0) | 1030 (97.8) | 1305 (100) | 600 (100.0) | 535 (100.0) | 1232 (99.9) | 487 (86.7) | 878 (99.9) | 4 (1.7) | 0 (0.0) | 313 (80.3) | 491 (100.0) | 75 (78.9) | 419 (99.8) | 223 (79.6) | 940 (74.7) | |
TRUE | 1393 (14.0) | 23 (2.2) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 75 (13.3) | 1 (0.1) | 238 (98.3) | 581 (100.0) | 77 (19.7) | 0 (0.0) | 20 (21.1) | 1 (0.2) | 57 (20.4) | 319 (25.3) | |
circulatory comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 7802 (78.6) | 1049 (99.6) | 1305 (100) | 599 (99.8) | 533 (99.6) | 797 (64.6) | 484 (86.1) | 572 (65.1) | 149 (61.6) | 482 (83.0) | 227 (58.2) | 480 (97.8) | 59 (62.1) | 0 (0.0) | 120 (42.9) | 946 (75.1) | |
TRUE | 2123 (21.4) | 4 (0.4) | 0 (0.0) | 1 (0.2) | 2 (0.4) | 436 (35.4) | 78 (13.9) | 307 (34.9) | 93 (38.4) | 99 (17.0) | 163 (41.8) | 11 (2.2) | 36 (37.9) | 420 (100.0) | 160 (57.1) | 313 (24.9) | |
respiratory comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 9070 (91.4) | 1024 (97.2) | 1270 (97.3) | 586 (97.7) | 527 (98.5) | 1153 (93.5) | 539 (95.9) | 787 (89.5) | 215 (88.8) | 514 (88.5) | 329 (84.4) | 475 (96.7) | 82 (86.3) | 283 (67.4) | 210 (75.0) | 1076 (85.5) | |
TRUE | 855 (8.6) | 29 (2.8) | 35 (2.7) | 14 (2.3) | 8 (1.5) | 80 (6.5) | 23 (4.1) | 92 (10.5) | 27 (11.2) | 67 (11.5) | 61 (15.6) | 16 (3.3) | 13 (13.7) | 137 (32.6) | 70 (25.0) | 183 (14.5) | |
cancer comorbidity (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 9831 (99.1) | 1052 (99.9) | 1300 (99.6) | 600 (100.0) | 534 (99.8) | 1228 (99.6) | 560 (99.6) | 865 (98.4) | 240 (99.2) | 577 (99.3) | 389 (99.7) | 490 (99.8) | 92 (96.8) | 400 (95.2) | 275 (98.2) | 1229 (97.6) | |
TRUE | 94 (0.9) | 1 (0.1) | 5 (0.4) | 0 (0.0) | 1 (0.2) | 5 (0.4) | 2 (0.4) | 14 (1.6) | 2 (0.8) | 4 (0.7) | 1 (0.3) | 1 (0.2) | 3 (3.2) | 20 (4.8) | 5 (1.8) | 30 (2.4) | |
total number of surgeries (%) | | | | | | | | | | | | | | | | | < 0.001 |
0 | 9292 (93.6) | 1040 (98.8) | 1302 (99.8) | 596 (99.3) | 534 (99.8) | 1221 (99.0) | 557 (99.1) | 857 (97.5) | 227 (93.8) | 563 (96.9) | 380 (97.4) | 490 (99.8) | 0 (0.0) | 397 (94.5) | 214 (76.4) | 914 (72.6) | |
1 | 544 (5.5) | 13 (1.2) | 3 (0.2) | 4 (0.7) | 1 (0.2) | 12 (1.0) | 5 (0.9) | 22 (2.5) | 14 (5.8) | 18 (3.1) | 10 (2.6) | 1 (0.2) | 44 (46.3) | 23 (5.5) | 65 (23.2) | 309 (24.5) | |
2 | 59 (0.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 21 (22.1) | 0 (0.0) | 1 (0.4) | 36 (2.9) | |
3 | 13 (0.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 13 (13.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
4 | 2 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 2 (2.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
5 | 15 (0.2) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 15 (15.8) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
history of smoking (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 9480 (95.5) | 1052 (99.9) | 1302 (99.8) | 600 (100.0) | 535 (100.0) | 1217 (98.7) | 562 (100.0) | 836 (95.1) | 234 (96.7) | 572 (98.5) | 380 (97.4) | 486 (99.0) | 73 (76.8) | 327 (77.9) | 216 (77.1) | 1088 (86.4) | |
TRUE | 445 (4.5) | 1 (0.1) | 3 (0.2) | 0 (0.0) | 0 (0.0) | 16 (1.3) | 0 (0.0) | 43 (4.9) | 8 (3.3) | 9 (1.5) | 10 (2.6) | 5 (1.0) | 22 (23.2) | 93 (22.1) | 64 (22.9) | 171 (13.6) | |
history of drinking (%) | | | | | | | | | | | | | | | | | < 0.001 |
FALSE | 9594 (96.7) | 1052 (99.9) | 1302 (99.8) | 598 (99.7) | 535 (100.0) | 1221 (99.0) | 560 (99.6) | 841 (95.7) | 235 (97.1) | 572 (98.5) | 382 (97.9) | 487 (99.2) | 83 (87.4) | 367 (87.4) | 231 (82.5) | 1128 (89.6) | |
TRUE | 331 (3.3) | 1 (0.1) | 3 (0.2) | 2 (0.3) | 0 (0.0) | 12 (1.0) | 2 (0.4) | 38 (4.3) | 7 (2.9) | 9 (1.5) | 8 (2.1) | 4 (0.8) | 12 (12.6) | 53 (12.6) | 49 (17.5) | 131 (10.4) | |
history of medication use (%) | | | | | | | | | | | | | | | | | < 0.001 |
none | 7846 (79.1) | 950 (90.2) | 951 (72.9) | 498 (83.0) | 465 (86.9) | 919 (74.5) | 435 (77.4) | 638 (72.6) | 161 (66.5) | 436 (75.0) | 281 (72.1) | 379 (77.2) | 83 (87.4) | 363 (86.4) | 251 (89.6) | 1036 (82.3) | |
often | 1546 (15.6) | 56 (5.3) | 258 (19.8) | 58 (9.7) | 42 (7.9) | 242 (19.6) | 92 (16.4) | 205 (23.3) | 67 (27.7) | 113 (19.4) | 88 (22.6) | 73 (14.9) | 8 (8.4) | 49 (11.7) | 25 (8.9) | 170 (13.5) | |
sometime | 533 (5.4) | 47 (4.5) | 96 (7.4) | 44 (7.3) | 28 (5.2) | 72 (5.8) | 35 (6.2) | 36 (4.1) | 14 (5.8) | 32 (5.5) | 21 (5.4) | 39 (7.9) | 4 (4.2) | 8 (1.9) | 4 (1.4) | 53 (4.2) | |
total number of core symptoms (%) | | | | | | | | | | | | | | | | | < 0.001 |
0 | 3133 (31.6) | 149 (14.2) | 222 (17.0) | 66 (11.0) | 55 (10.3) | 226 (18.3) | 139 (24.7) | 274 (31.2) | 76 (31.4) | 128 (22.0) | 110 (28.2) | 129 (26.3) | 75 (78.9) | 384 (91.4) | 261 (93.2) | 839 (66.6) | |
1 | 5712 (57.6) | 736 (69.9) | 914 (70.0) | 437 (72.8) | 385 (72.0) | 777 (63.0) | 386 (68.7) | 521 (59.3) | 148 (61.2) | 394 (67.8) | 239 (61.3) | 333 (67.8) | 16 (16.8) | 29 (6.9) | 17 (6.1) | 380 (30.2) | |
2 | 1080 (10.9) | 168 (16.0) | 169 (13.0) | 97 (16.2) | 95 (17.8) | 230 (18.7) | 37 (6.6) | 84 (9.6) | 18 (7.4) | 59 (10.2) | 41 (10.5) | 29 (5.9) | 4 (4.2) | 7 (1.7) | 2 (0.7) | 40 (3.2) | |
total number of psychological symptoms (%) | | | | | | | | | | | | | | | | | < 0.001 |
0 | 6963 (70.2) | 667 (63.3) | 1008 (77.2) | 390 (65.0) | 373 (69.7) | 881 (71.5) | 0 (0.0) | 638 (72.6) | 186 (76.9) | 447 (76.9) | 279 (71.5) | 349 (71.1) | 81 (85.3) | 377 (89.8) | 248 (88.6) | 1039 (82.5) | |
1 | 2234 (22.5) | 329 (31.2) | 263 (20.2) | 195 (32.5) | 143 (26.7) | 305 (24.7) | 161 (28.6) | 199 (22.6) | 48 (19.8) | 118 (20.3) | 81 (20.8) | 109 (22.2) | 12 (12.6) | 41 (9.8) | 29 (10.4) | 201 (16.0) | |
2 | 676 (6.8) | 53 (5.0) | 33 (2.5) | 15 (2.5) | 19 (3.6) | 47 (3.8) | 360 (64.1) | 41 (4.7) | 8 (3.3) | 16 (2.8) | 29 (7.4) | 31 (6.3) | 2 (2.1) | 1 (0.2) | 2 (0.7) | 19 (1.5) | |
3 | 50 (0.5) | 4 (0.4) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 39 (6.9) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 1 (0.3) | 2 (0.4) | 0 (0.0) | 1 (0.2) | 1 (0.4) | 0 (0.0) | |
4 | 2 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 2 (0.4) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
total number of physical symptoms (%) | | | | | | | | | | | | | | | | | < 0.001 |
0 | 4006 (40.4) | 952 (90.4) | 28 (2.1) | 179 (29.8) | 0 (0.0) | 630 (51.1) | 193 (34.3) | 310 (35.3) | 77 (31.8) | 136 (23.4) | 108 (27.7) | 319 (65.0) | 62 (65.3) | 258 (61.4) | 158 (56.4) | 596 (47.3) | |
1 | 4141 (41.7) | 96 (9.1) | 743 (56.9) | 345 (57.5) | 425 (79.4) | 475 (38.5) | 258 (45.9) | 377 (42.9) | 122 (50.4) | 289 (49.7) | 197 (50.5) | 142 (28.9) | 28 (29.5) | 120 (28.6) | 91 (32.5) | 433 (34.4) | |
2 | 1445 (14.6) | 5 (0.5) | 406 (31.1) | 71 (11.8) | 103 (19.3) | 104 (8.4) | 91 (16.2) | 153 (17.4) | 32 (13.2) | 130 (22.4) | 70 (17.9) | 29 (5.9) | 5 (5.3) | 36 (8.6) | 26 (9.3) | 184 (14.6) | |
3 | 300 (3.0) | 0 (0.0) | 110 (8.4) | 5 (0.8) | 7 (1.3) | 23 (1.9) | 18 (3.2) | 34 (3.9) | 10 (4.1) | 22 (3.8) | 14 (3.6) | 1 (0.2) | 0 (0.0) | 6 (1.4) | 5 (1.8) | 45 (3.6) | |
4 | 32 (0.3) | 0 (0.0) | 18 (1.4) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 2 (0.4) | 4 (0.5) | 1 (0.4) | 4 (0.7) | 1 (0.3) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | |
5 | 1 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (0.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | |
vital signs, median (IQR) | |||||||||||||||||
body temperature | 36.50 [36.40,36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.60 [36.50, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.60] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.40, 36.70] | 36.50 [36.30, 36.70] | 36.50 [36.30, 36.60] | 36.50 [36.30, 36.70] | 36.50 [36.30, 36.70] | < 0.001 |
pulse | 80.00 [72.00, 88.00] | 80.00 [76.00, 90.00] | 80.00 [72.00, 86.00] | 80.00 [75.00, 90.00] | 80.00 [76.00, 89.00] | 80.00 [72.00, 87.00] | 80.00 [72.00, 87.75] | 80.00 [73.00, 87.00] | 80.00 [75.00, 89.00] | 80.00 [71.00, 88.00] | 80.00 [74.00, 87.00] | 80.00 [72.00, 90.00] | 78.00 [68.00, 84.50] | 80.00 [70.00, 85.00] | 78.00 [70.75, 86.25] | 80.00 [72.00, 88.00] | < 0.001 |
breath | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [20.00, 20.00] | 20.00 [19.00, 20.00] | 20.00 [19.75, 20.00] | 20.00 [20.00, 20.00] | 0.005 |
SBP | 120.00 [111.00, 132.00] | 116.00 [109.00, 122.00] | 120.00 [111.00, 130.00] | 120.00 [111.00, 126.00] | 115.00 [108.00, 121.00] | 122.00 [112.00, 136.00] | 120.00 [110.00, 132.00] | 125.00 [114.00, 138.00] | 125.00 [117.25, 138.75] | 120.00 [113.00, 132.00] | 126.00 [115.25, 137.00] | 115.00 [108.00, 124.00] | 120.00 [110.00, 130.00] | 130.00 [120.00, 146.00] | 128.50 [117.00, 143.00] | 123.00 [112.00, 136.00] | < 0.001 |
DBP | 77.00 [70.00, 84.00] | 76.00 [68.00, 80.00] | 76.00 [70.00, 83.00] | 77.00 [71.00, 84.00] | 75.00 [69.00, 82.00] | 77.00 [70.00, 85.00] | 77.00 [70.00, 84.00] | 77.00 [70.00, 85.00] | 78.00 [73.25, 85.00] | 77.00 [70.00, 85.00] | 76.00 [70.25, 84.00] | 76.00 [68.00, 82.00] | 74.00 [64.50, 82.00] | 77.50 [70.00, 86.00] | 79.00 [72.00, 86.00] | 78.00 [70.00, 85.00] | < 0.001 |
admission age, median (IQR) | 46.00 [31.00, 60.00] | 18.00 [16.00, 22.00] | 49.00 [42.00, 59.00] | 38.00 [29.00, 46.00] | 25.00 [19.00, 31.00] | 50.00 [40.00, 64.00] | 44.00 [33.00, 55.00] | 56.00 [45.00, 65.00] | 56.00 [46.00, 65.00] | 50.00 [40.00, 61.00] | 63.00 [48.00, 72.00] | 25.00 [19.00, 33.00] | 52.00 [39.50, 64.50] | 73.00 [63.00, 79.00] | 64.50 [51.75, 74.00] | 51.00 [43.00, 63.00] | < 0.001 |
length of stay, median (IQR) | 15.00 [11.00, 21.00] | 15.00 [11.00, 21.00] | 16.00 [13.00, 21.00] | 16.00 [12.00, 21.00] | 14.00 [11.00, 19.00] | 17.00 [12.00, 22.00] | 17.00 [13.00, 21.75] | 17.00 [11.00, 22.50] | 18.00 [13.00, 24.75] | 16.00 [13.00, 22.00] | 18.00 [13.00, 25.00] | 16.00 [12.00, 22.00] | 14.00 [9.50, 23.50] | 12.00 [8.00, 21.00] | 12.00 [8.00, 20.00] | 13.00 [8.00, 19.00] | < 0.001 |
(a) Continuous variables are presented as medians with interquartile ranges.
Fig. 4. Organization diagram illustrating the organizational structure among 5 categories and 15 clusters.
A Organization diagram illustrating the discovery cohort, along with brief descriptions highlighting their distinctive features. B Organization diagram illustrating the validation cohort.
Fig. 5. Chord diagram and Radar Chart illustrating all 15 clusters.
A Chord diagram illustrating all 15 clusters (Edges in the lower half of the diagram symbolize clusters, while those in the upper half represent variables engaged in the clustering process. All connections follow a bottom-up direction, signifying that wider upper ends denote a higher significance of a variable within a cluster. Additionally, we employ a range of colors for the connections to enhance clarity and facilitate better distinction between them). B Radar Chart for Cluster A1. C Radar Chart for Cluster A2. D Radar Chart for Cluster A3. E Radar Chart for Cluster A4.
Fig. 6. Symptom patterns for depression by phenotypes.
Fig. 7. Overview of comorbidity patterns for depression by phenotypes.
A Summary of comorbidity-specific phenotypes for depression. B–P Comorbidity network for each specific cluster, presented in Supplementary Information 2.3.
Category A: low comorbidity with suicide, psychotic and somatic symptoms
Cluster A1: younger low comorbidity depression phenotype with prominent suicide risk
In this phenotype (n = 1053), patients were notably young, with a mean age of only 19.29 years (median [IQR]: 18 [16, 22]). Among them, 44.5% were under 18 years old, and 55.4% were between 18 and 35 years old. The majority were unmarried, and 71.5% were students. Depression was the primary diagnosis, and 40.9% were identified as having severe depression. The comorbidity rate within this phenotype was relatively low (13.2%), mainly comprising respiratory and digestive system diseases, as well as signs and abnormal clinical and laboratory findings (refer to Figure S1-A1). Importantly, none of these patients had comorbid mental and behavioral disorders or neurological diseases. This group had the lowest frequency of previous medication use and the lowest incidence of somatic symptoms, such as Sleep Problems and Aspecific Somatization. Conversely, the incidence of psychological symptoms was relatively high, particularly evidenced by a suicide rate of 14.91% (Fig. 5B).
Cluster A2: middle-aged low comorbidity depression phenotype with prominent somatic symptoms
This cluster (n = 1305) was the largest in the dataset and had the highest proportion of female patients (965/1305; 73.9%). The median age was 49 years (IQR [42, 59]), with 74% of patients aged 36 to 59 years. The divorce rate was the highest among all 15 phenotypes, at 9.7%. Depression was the primary diagnosis, and the comorbidity rate was low (15.9%, as illustrated in Figure S1-A2). None of these patients had comorbid mental and behavioral disorders or endocrine/metabolic, circulatory, digestive, or neurological diseases. In contrast to Cluster A1, patients in this phenotype had the highest incidence of somatic symptoms, notably sleep problems (1173/1305; 89.89%) and aspecific somatization (833/1305; 63.83%) (Fig. 5C). This profile distinguishes the cluster from the rest of the cohort.
Cluster A3: middle-aged low comorbidity severe depression phenotype with prominent psychotic symptoms
This phenotype comprised 600 patients, with a median age of 38 years (IQR [29, 46]). The primary diagnosis for all patients in this group was depression, with 99.5% classified as major depressive disorder and 0.5% as moderate depressive disorder. As in Cluster A2, patients in this phenotype had a low comorbidity rate (15.2%, as depicted in Figure S1-A3). Similarly, 100% of the patients had no comorbidity related to mental and behavioral disorders or to endocrine/metabolic, circulatory, digestive, and neurological diseases. Unlike Cluster A2, patients in this phenotype had a relatively high incidence of psychological symptoms, particularly in the category of Psychotic Symptoms (120/600; 20.0%). Compared with Cluster A1, this cluster had a significantly higher occurrence of somatic symptoms (Fig. 5D).
Cluster A4: younger low comorbidity depression phenotype with prominent sleep problems
Patients in this phenotype (n = 535) had a median age of 25 years (IQR [19, 31]), and almost all were younger than 36. Depression was the primary diagnosis, and this phenotype had the lowest comorbidity rate (47/535; 8.79%, as illustrated in Figure S1-A4). Compared with patients from other phenotypes, an overwhelming majority of patients in this phenotype reported sleep problems (524/535; 97.94%). In contrast to Cluster A2, a lower proportion reported aspecific somatization (155/535; 28.97%) (Fig. 5E). Notably, this phenotype had the highest rates of core symptoms, including “Depressed” and “Decreased Interests.”
Category B: moderate comorbidity with anhedonia and anxious symptoms
Cluster B1: middle-older depression phenotype with prominent decreased interests or anhedonia
This cluster (n = 1233) was the third largest in the dataset, with a median age of 50 years (IQR [40, 64]). Almost all patients (99.84%) received a primary diagnosis of depression. In total, 584 patients (47.36%) had comorbidities, most commonly circulatory system diseases (436/1233; 35.36%). However, 100% of them reported no comorbidities related to the neurological, endocrine, and digestive systems. Figure S1-B1 shows that hypertension (I10), cerebral infarction/other cerebrovascular diseases (I63, I67), and chronic ischemic heart disease (I25) were the primary comorbidities in this phenotype. Compared with patients from other phenotypes, those in this phenotype had the highest incidence of the “Decreased Interests” category of symptoms (252/1233; 20.44%). Unlike in other phenotypes, neither somatic nor psychological symptoms were prominent among patients in this cluster.
Cluster B2: middle-aged depression phenotype with prominent anxious symptoms
Within this phenotype (n = 562), the median age was 44 years (IQR [33, 55]). A large majority (89.68%) of patients received a primary diagnosis of depression. In total, 326 patients (58.00%) had comorbidities, including mental and behavioral disorders (31.14%), circulatory system diseases (13.88%), and digestive system diseases (13.35%). Figure S1-B2 shows that emotional disorders with onset specific to childhood (F93), anxiety disorders (F41), hypertension (I10), and gastritis (K29) commonly co-occurred with depression in this phenotype. The distinguishing characteristic of this phenotype is that 100% of the patients reported the symptom of tension. They also had the highest rate of symptoms in the “Anxious” category, such as feeling tense, worried, upset, or afraid.
Category C: endocrine, nutritional, metabolic, and digestive comorbidity
Cluster C1: middle-older depression-endocrine, nutritional, and metabolic comorbidity phenotype
Within this phenotype (n = 879), the median age of patients was 56 years (IQR [45,65]). The majority (78.73%) of these individuals received a primary diagnosis of depression, while the other patients were hospitalized this time for non-depression diseases with depression as a supplementary diagnosis. A distinctive characteristic of these patients was the absolute prevalence (100%) of depression-endocrine, nutritional, and metabolic system disease diagnosis pairs, while being completely free from neurological and digestive disorders. Comorbidities within other disease systems were also prevalent in this phenotype, particularly in the circulatory system (307/879; 34.93%) and mental and behavioral disorders (93/879; 10.58%). Figure S1-C1 highlights that diabetes mellitus (E10-E14), disorders of the thyroid gland (E00-E07), malnutrition (E43-E46), disorders of lipoprotein metabolism and other lipidemia (E78), other disorders of fluid, electrolyte, and acid-base balance (E87), hypertension (I10), chronic ischemic heart disease (I25), and anxiety disorders (F41) were the primary comorbid conditions associated with this phenotype.
Cluster C2: middle-older depression-endocrine, nutritional, and metabolic-digestive comorbidity phenotype
Within this phenotype (n = 242), the median age of patients was 56 years (IQR [46,65]). Analogous to Cluster C1, the majority of these individuals (84.71%) received a primary diagnosis of depression. A notable characteristic of these patients was the absolute prevalence (100%) of depression-endocrine, nutritional, and metabolic-digestive system disease diagnosis pairs. Each individual in this phenotype suffered from at least three diseases, while remaining completely free of neurological disorders. Comorbidities in the circulatory system were also common in this phenotype (38.4%). Figure S1-C2 highlights representative comorbidities, including fatty liver/other specified diseases of the liver (K76), gastritis (K29), constipation (K59.0), diabetes mellitus (E10, E11, E14), disorders of lipoprotein metabolism and other lipidemias (E78), other disorders of fluid, electrolyte, and acid-base balance (E87), disorders of the thyroid gland (E03, E04, E07), and hypertension (I10). In comparison to other phenotypes, these patients exhibited the highest frequency of medication treatment (33.5%) and the longest hospitalization time (median [IQR]: 18 [13, 24.75] days).
Cluster C3: middle-older depression-digestive comorbidity phenotype
This phenotype comprised 581 individuals with a median age of 50 years (IQR [40,61]). Aligned with the characteristics of Cluster C1 and Cluster C2, a predominant 91.91% of these patients received a primary diagnosis of depression. A notable feature of this phenotype was the absolute prevalence of depression-digestive system disease pairs, reaching 100%. Remarkably, they exhibited a conspicuous absence of diseases within the endocrine, nutritional, metabolic and neurological systems. Common comorbidities with depression in this phenotype included gastritis (K29), fatty liver/other specified diseases of the liver (K76), constipation (K59.0), cholelithiasis (K80), other diseases of the stomach and duodenum (K31), and hypertension (I10) (Figure S1-C3).
Category D: neurological, mental and behavioral comorbidity
Cluster D1: older depression-neurological comorbidity phenotype
Patients within this phenotype (n = 390) exhibited a median age of 63 years (IQR [48,72]). Notably, several key characteristics distinguished this phenotype. Firstly, all individuals (100%) presented with depression-neurological system disease diagnosis pairs. Secondly, the primary diagnosis for this cohort was mental and behavioral disorders, with 95.9% specifically diagnosed with depression. For the remaining 16 individuals, who were primarily diagnosed with other mental disorders, depression served as a supplementary diagnosis. Thirdly, a higher prevalence of the “Anxious” category of symptoms was observed among these patients (97/390; 24.87%) compared to most other phenotypes, except for Cluster B2.
Comorbidities across various disease systems were notably prevalent in this phenotype, including the circulatory system (163/390; 41.79%), the endocrine system (99/390; 25.38%), and the respiratory system (61/390; 15.64%). Refer to Figure S1-D1 for a visual representation. Common comorbid conditions with depression in this phenotype encompassed degenerative disease of the nervous system (G31.9), sleep disorders (G47), Parkinson’s disease/secondary parkinsonism (G20, G21), demyelinating disease of the central nervous system, unspecified (G37.8, G37.9), other disorders of the brain (G93), epilepsy (G40), Alzheimer’s disease (G30), hypertension (I10), and cerebral infarction (I63).
Reflecting the relatively compromised health status of this phenotype, they ranked as the third oldest cluster. Additionally, they exhibited higher rates of prior frequent medication use (109/390; 28%) and the longest length of stay (median [IQR]: 18 [13,25] days) compared to other phenotypes.
Cluster D2: younger depression-mental and behavioral comorbidity phenotype
The 491 individuals within this cluster exhibited a median age of 25 years (IQR [29,46]). Predominantly diagnosed with mental and behavioral disorders, 60.29% of these patients were identified as having depression, with a notable 40.9% presenting with severe depression, similar to Cluster A1.
A distinctive feature of this phenotype was the remarkably high prevalence (94.5%) of depression-mental and behavioral disease diagnosis pairs. Importantly, they were nearly devoid of neurological, digestive, circulatory, respiratory, and other diseases, as depicted in Figure S1-D2. Common comorbidities in this phenotype included anxiety disorders (F41), obsessive-compulsive disorder (F42), mental and behavioral disorders due to the use of alcohol (F10), eating disorders (F50), persistent mood/affective disorders (F34), reaction to severe stress, and adjustment disorders (F43), as well as phobic anxiety disorders (F40).
Regarding symptomology, this phenotype exhibited the highest incidence of symptoms categorized as “Retardation or Psychomotor Impairment” (139/491; 28.31%), “Addictive Behavior” (45/491; 9.16%), and “Eating Disorder” (52/491; 10.59%). This comprehensive picture underscores the distinct clinical profile of this patient subset.
Category E: other diseases comorbid with depression
Cluster E1: middle-older surgery-depression comorbidity phenotype
The 95 individuals within this cluster displayed an average age of 51.65 years (median [IQR]: 52 [39.5, 64.5]), with 35.8% surpassing 60 years and 45.3% falling within the 36 to 59 years range. A significant characteristic of this phenotype is that every patient (100%) underwent surgery, with all of them experiencing more than 2 intraoperative procedures.
The majority of these patients (82/95; 86.32%) primarily received diagnoses related to diseases of the circulatory system (23/95; 24.21%), the musculoskeletal system and connective tissue (15/95; 15.79%), and malignant tumor diseases (14/95; 14.74%). Consequently, depression served as their supplementary diagnosis, as illustrated in Figure S1-E1. This cluster exhibited higher rates of the “Aspecific Somatization” category of symptoms (72/95; 75.79%) and also had the highest occurrences of current smoking, painful expression, and transfer to other care units among the 15 clusters. This distinct clinical profile emphasizes the intricate interplay between surgical interventions, circulatory and musculoskeletal diseases, and mental health in this patient subset.
Cluster E2: older circulatory-depression comorbidity phenotype
This patient cluster (n = 420) represents the oldest cohort, with a median age of 73 years (IQR, 63–79 years), and a notable 82.1% of individuals exceeding 60 years. In alignment with Cluster E1, the majority (95%) of these patients received primary diagnoses related to non-depression diseases, encompassing the circulatory, neurological, and respiratory systems, with depression serving as a supplementary diagnosis (refer to Figure S1-E2).
A defining characteristic of these patients was the absolute prevalence (100%) of depression-circulatory system disease diagnosis pairs, coupled with a complete absence of neurological and digestive disorders. Notably, they exhibited the highest rates of being widowed, presenting acute and chronic sick faces, possessing a moderate nutritional status, having a history of prior surgeries, and experiencing the “Aspecific Somatization” category of symptoms among the 15 clusters. Reflecting the presence of numerous somatic comorbidities and an overall compromised health status, 19.3% of these patients presented with an abnormal gait. This profile underscores the complex interplay between age, circulatory comorbidities, and mental health in this patient subset.
Cluster E3: older neurological-depression comorbidity phenotype
This patient cluster (n = 280) represents the second oldest cohort, with individuals having a median age of 64.5 years (IQR, 51.75–74). Distinguishing itself from other clusters, this group included a higher percentage of male patients (164/280; 58.57%). Similar to Clusters E1 and E2, a significant 97.86% of these patients received main diagnoses related to non-depression diseases, with depression serving as a supplementary diagnosis.
A salient characteristic of these patients was the absolute prevalence (100%) of depression-nervous system disease diagnosis pairs, with all of them being diagnosed as first-episode depression. Considerable comorbidities in other disease systems were evident in this phenotype, as depicted in Figure S1-E3.
Reflecting the presence of numerous somatic comorbidities and an overall poor health status, 25.36% of these patients exhibited abnormal expressions (apathetic, painful, worried), 45.71% presented with a sick face (acute and chronic), and they had the highest rates of abnormal gait (26.4%), passive body position (8.2%), and smoking and alcoholism status (40.36%) among all 15 clusters. Additionally, this cluster reported the highest rates of the “Movement Impairment” (68/280; 24.29%) and “Lack of Energy” (57/280; 20.36%) categories of symptoms, including stiffness, numbness, inability to walk, impaired mobility, fatigue, weakness, and low energy, compared to most other phenotypes.
Cluster E4: middle-older uncharacterized depression phenotype
This cluster (n = 1259) constituted the second-largest group, with a median age of 51 years (IQR, 43–63 years). A substantial majority, comprising 70.69% of these patients, received primary diagnoses related to non-depression diseases, with depression serving as a supplementary diagnosis (refer to Figure S1-E4). Nearly all patients primarily diagnosed with depression presented with at least one comorbidity of mental and behavioral disorders. The chief complaint within this cluster was predominantly characterized by aspecific somatization symptoms.
Phenotype differences in laboratory biomarkers
An analysis of differences in laboratory biomarkers among clusters revealed significant variations between phenotypes (refer to Table S8 and Figure S3 in the Supplementary Information). Cluster D1 exhibited higher occurrences of chlorine and calcium levels above the normal range compared to other clusters. Furthermore, their median values for red blood cell count (6.40 [5.20, 7.50]), average red blood cell hemoglobin (HGB) (30.70 [29.70, 31.60]), and average red blood cell volume (93.40 [90.60, 96.05]) surpassed those of all other clusters (p < 0.001).
While comorbidities were infrequent in Cluster A1, laboratory markers played a more prominent role across all phenotypes. This cluster demonstrated higher occurrences of absolute lymphocyte value, urobilinogen, and urine protein above the normal range compared to other clusters (p < 0.001). These patients presented with the highest median values for various laboratory biomarkers (e.g., erythrocyte count, proportion of monocytes, lymphocyte proportion and absolute value, proportion of eosinophils, hemoglobin, total protein, albumin, potassium, white blood cell ratio, uric acid, calcium, anion gap, conductivity) and the lowest median values for others (e.g., average erythrocyte HGB, average erythrocyte volume, alanine aminotransferase, urea, total bilirubin, indirect bilirubin, cholesterol, and low-density lipoprotein) (p < 0.001).
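The laboratory comparisons above rest on two per-cluster summaries: the median [IQR] of each biomarker and the share of patients whose value exceeds the normal range. A minimal sketch of both follows, using only the Python standard library. The quartile method (`inclusive`), the toy values, and the upper reference limit are assumptions for illustration; the source does not state which quartile definition or reference ranges were used.

```python
from statistics import quantiles

def median_iqr(values):
    """Return (median, q1, q3) using inclusive (linear) quartiles."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return med, q1, q3

def share_above(values, upper_limit):
    """Percentage of values strictly above a reference upper limit."""
    above = sum(1 for v in values if v > upper_limit)
    return round(100 * above / len(values), 2)

# Toy biomarker values for one hypothetical cluster (not study data)
toy_values = [1, 2, 3, 4, 5]
toy_upper_limit = 3.5  # hypothetical upper bound of the normal range

toy_summary = median_iqr(toy_values)
toy_share = share_above(toy_values, toy_upper_limit)
print(toy_summary, toy_share)
```

Different quartile conventions can shift the IQR bounds slightly in small samples, so reproducing a published IQR exactly requires knowing the convention that was used.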
Treatment patterns by phenotypes
The variations in treatment patterns across clusters are illustrated in Fig. 8, Table 2, and Supplementary Information 2.2’s Tables S5 and S6. A summary of the treatment patterns for each phenotype within the five categories is provided below.
Fig. 8. Treatment patterns for depression by phenotypes.
A The prevalence of each medication, physiotherapy, and psychotherapy by clusters. B The prevalence of each combination of drug types by clusters.
Table 2.
Differences in Treatment Patterns, Stratified by Clusters^b.
Characteristic (mean (SD)) | Cluster A1 (n = 1053) | Cluster A2 (n = 1305) | Cluster A3 (n = 600) | Cluster A4 (n = 535) | Cluster B1 (n = 1233) | Cluster B2 (n = 562) | Cluster C1 (n = 879) | Cluster C2 (n = 242) | Cluster C3 (n = 581) | Cluster D1 (n = 390) | Cluster D2 (n = 491) | Cluster E1 (n = 95) | Cluster E2 (n = 420) | Cluster E3 (n = 280) | Cluster E4 (n = 1259) | p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sum of medical orders | 497.25 (237.76) | 553.31 (235.32) | 551.72 (267.26) | 468.66 (239.60) | 604.95 (312.75) | 583.63 (266.03) | 723.06 (451.31) | 814.13 (411.17) | 667.41 (337.72) | 708.67 (314.07) | 601.03 (374.11) | 980.73 (986.80) | 907.82 (1096.21) | 786.72 (802.57) | 631.14 (663.39) | < 0.001 |
Sum of antidepressant orders | 23.42 (13.43) | 25.48 (15.20) | 23.89 (11.94) | 23.28 (16.68) | 25.78 (17.02) | 26.84 (14.48) | 23.98 (18.63) | 24.56 (17.20) | 25.54 (15.53) | 24.79 (14.83) | 24.77 (21.14) | 14.84 (17.60) | 13.18 (17.96) | 12.50 (13.03) | 16.83 (16.71) | < 0.001 |
Number of amitriptyline orders | 0.25 (2.50) | 0.80 (4.43) | 0.28 (2.50) | 0.23 (2.21) | 0.58 (3.73) | 0.08 (1.27) | 0.30 (2.24) | 0.41 (4.12) | 0.46 (3.74) | 0.41 (2.99) | 0.13 (1.32) | 0.63 (3.31) | 0.14 (1.41) | 0.25 (3.30) | 0.25 (2.11) | < 0.001 |
Number of escitalopram orders | 0.88 (4.53) | 0.80 (3.98) | 0.91 (4.35) | 0.81 (4.14) | 1.11 (4.72) | 1.65 (5.68) | 1.46 (5.51) | 2.02 (6.70) | 1.87 (6.38) | 1.63 (5.71) | 0.59 (3.42) | 1.07 (4.42) | 1.09 (4.51) | 1.40 (4.17) | 0.89 (4.12) | < 0.001 |
Number of duloxetine orders | 0.32 (3.15) | 0.22 (2.30) | 0.64 (3.70) | 0.19 (2.26) | 0.62 (4.39) | 0.55 (4.38) | 0.39 (3.07) | 0.81 (5.27) | 0.77 (4.64) | 0.38 (2.63) | 0.35 (2.92) | 1.56 (8.65) | 0.20 (1.73) | 0.26 (1.54) | 0.41 (2.61) | 0.001 |
Number of fluoxetine orders | 2.66 (7.43) | 2.05 (6.73) | 2.85 (7.79) | 3.56 (8.93) | 2.84 (8.37) | 1.21 (5.43) | 2.14 (7.26) | 1.83 (7.79) | 1.27 (5.20) | 1.51 (6.25) | 3.72 (9.08) | 0.91 (4.58) | 0.65 (3.71) | 1.01 (4.76) | 1.07 (5.67) | < 0.001 |
Number of clomipramine orders | 0.18 (2.24) | 0.09 (1.20) | 0.00 (0.00) | 0.22 (2.09) | 0.19 (2.50) | 0.34 (2.87) | 0.27 (3.43) | 0.08 (1.22) | 0.02 (0.47) | 0.07 (0.93) | 0.57 (4.69) | 0.02 (0.21) | 0.03 (0.54) | 0.00 (0.00) | 0.38 (3.68) | 0.001 |
Number of paroxetine orders | 1.85 (7.03) | 7.34 (11.86) | 3.94 (9.37) | 4.15 (8.85) | 7.05 (12.55) | 9.23 (12.46) | 6.70 (12.57) | 4.39 (9.43) | 5.08 (10.56) | 4.99 (10.57) | 4.40 (10.98) | 2.37 (7.36) | 2.81 (8.77) | 2.11 (7.53) | 4.04 (9.26) | < 0.001 |
Number of sertraline orders | 11.01 (12.99) | 3.73 (9.67) | 3.85 (9.61) | 5.32 (10.46) | 4.27 (10.32) | 5.40 (11.95) | 5.19 (11.11) | 6.11 (11.81) | 5.99 (12.27) | 8.00 (12.71) | 7.30 (12.58) | 3.66 (9.03) | 6.55 (14.00) | 5.83 (10.58) | 4.90 (10.33) | < 0.001 |
Number of venlafaxine orders | 0.37 (4.30) | 1.08 (6.05) | 0.62 (5.34) | 1.01 (5.82) | 0.78 (5.99) | 1.05 (6.92) | 0.41 (3.24) | 0.67 (4.16) | 0.42 (3.40) | 0.45 (3.54) | 0.79 (5.01) | 0.97 (3.28) | 0.51 (6.97) | 0.06 (0.74) | 0.55 (4.20) | 0.012 |
Number of citalopram orders | 0.06 (1.05) | 0.70 (4.24) | 0.19 (1.86) | 0.16 (1.65) | 0.77 (4.39) | 0.23 (2.36) | 0.67 (4.24) | 0.69 (4.05) | 0.36 (2.91) | 0.75 (4.73) | 0.08 (1.12) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.16 (1.83) | < 0.001 |
Number of doxepin orders | 0.04 (0.66) | 0.56 (3.20) | 0.07 (1.12) | 0.33 (2.97) | 0.24 (2.47) | 0.53 (3.22) | 0.35 (3.23) | 0.37 (3.10) | 0.08 (1.00) | 0.25 (2.30) | 0.08 (0.92) | 0.00 (0.00) | 0.05 (0.65) | 0.01 (0.08) | 0.27 (2.46) | < 0.001 |
Number of venlafaxine hydrochloride orders | 5.45 (11.28) | 8.08 (12.91) | 10.51 (13.49) | 7.17 (13.64) | 7.15 (12.63) | 6.38 (12.38) | 5.87 (11.99) | 7.01 (12.59) | 9.15 (14.86) | 6.15 (12.06) | 6.15 (14.68) | 3.00 (12.72) | 0.75 (4.24) | 0.88 (4.45) | 3.55 (9.69) | < 0.001 |
Sum of antipsychotic orders | 17.49 (19.78) | 17.30 (14.76) | 19.39 (16.09) | 14.89 (15.43) | 16.98 (18.50) | 15.34 (15.27) | 15.95 (18.43) | 14.45 (15.64) | 16.29 (15.35) | 16.93 (18.35) | 21.46 (31.89) | 7.15 (17.10) | 4.73 (14.24) | 3.46 (9.40) | 8.31 (14.79) | < 0.001 |
Number of olanzapine orders | 6.82 (12.23) | 8.42 (10.91) | 9.76 (11.89) | 6.99 (9.84) | 8.21 (11.81) | 7.69 (10.92) | 6.79 (12.04) | 5.83 (10.26) | 9.13 (12.63) | 6.97 (11.35) | 8.52 (15.48) | 4.88 (13.34) | 3.45 (12.99) | 2.49 (8.10) | 4.25 (9.42) | < 0.001 |
Number of aripiprazole orders | 0.66 (3.63) | 0.02 (0.42) | 0.40 (2.76) | 0.17 (2.40) | 0.15 (1.75) | 0.27 (2.43) | 0.19 (2.14) | 0.27 (2.26) | 0.20 (1.93) | 0.16 (1.75) | 1.10 (5.08) | 0.00 (0.00) | 0.04 (0.73) | 0.00 (0.00) | 0.15 (1.98) | < 0.001 |
Number of quetiapine fumarate orders | 4.72 (10.04) | 3.59 (8.31) | 3.63 (8.24) | 3.82 (8.65) | 2.84 (7.82) | 3.75 (8.89) | 3.27 (8.86) | 3.68 (8.65) | 3.16 (8.30) | 3.31 (8.35) | 4.28 (10.39) | 0.73 (4.13) | 0.56 (3.05) | 0.39 (2.79) | 1.65 (5.91) | < 0.001 |
Number of haloperidol for injection orders | 0.09 (0.64) | 0.02 (0.22) | 0.05 (0.87) | 0.04 (0.45) | 0.04 (0.65) | 0.00 (0.04) | 0.03 (0.38) | 0.01 (0.14) | 0.02 (0.26) | 0.02 (0.19) | 0.07 (0.42) | 0.11 (0.47) | 0.06 (0.36) | 0.06 (0.48) | 0.08 (1.08) | 0.064 |
Number of tiapride orders | 0.15 (1.68) | 0.73 (3.86) | 0.22 (1.90) | 0.21 (2.25) | 0.77 (3.86) | 0.27 (2.54) | 0.78 (3.92) | 1.01 (4.96) | 0.73 (3.77) | 1.14 (5.18) | 0.46 (2.57) | 0.00 (0.00) | 0.12 (1.54) | 0.09 (0.88) | 0.43 (3.00) | < 0.001 |
Number of clozapine orders | 0.71 (5.31) | 1.15 (5.64) | 1.01 (5.22) | 0.72 (5.02) | 0.82 (5.59) | 0.55 (3.33) | 0.93 (5.49) | 0.61 (3.89) | 0.34 (2.36) | 0.81 (4.76) | 2.22 (16.02) | 0.00 (0.00) | 0.02 (0.34) | 0.04 (0.60) | 0.41 (4.07) | < 0.001 |
Number of risperidone orders | 1.29 (6.67) | 0.07 (1.02) | 1.16 (6.73) | 0.33 (3.34) | 0.34 (3.07) | 0.27 (2.93) | 0.54 (4.67) | 0.35 (3.15) | 0.09 (1.15) | 0.26 (2.53) | 1.14 (6.40) | 0.32 (2.18) | 0.06 (1.09) | 0.00 (0.00) | 0.19 (2.54) | < 0.001 |
Number of sulpiride orders | 0.31 (4.22) | 0.14 (1.46) | 0.19 (2.01) | 0.16 (2.30) | 0.36 (2.71) | 0.21 (3.49) | 0.19 (1.87) | 0.22 (1.98) | 0.04 (0.81) | 0.36 (3.99) | 0.38 (4.78) | 0.00 (0.00) | 0.05 (0.93) | 0.00 (0.00) | 0.11 (1.60) | 0.183 |
Number of sulpiride tablets (from another manufacturer) orders | 0.14 (1.80) | 0.04 (0.78) | 0.14 (1.76) | 0.17 (2.31) | 0.12 (1.89) | 0.06 (1.06) | 0.17 (1.97) | 0.15 (1.46) | 0.18 (2.24) | 0.20 (2.08) | 0.09 (1.06) | 0.00 (0.00) | 0.05 (0.66) | 0.00 (0.00) | 0.01 (0.26) | 0.272 |
Number of sulpiride for injection orders | 0.16 (1.47) | 0.07 (0.89) | 0.22 (1.80) | 0.07 (1.22) | 0.16 (1.54) | 0.05 (0.91) | 0.11 (1.51) | 0.08 (0.91) | 0.13 (1.41) | 0.00 (0.00) | 0.13 (1.32) | 0.00 (0.00) | 0.00 (0.10) | 0.00 (0.00) | 0.09 (1.49) | 0.166 |
Number of quetiapine orders | 1.84 (6.11) | 3.03 (7.72) | 2.11 (5.92) | 2.03 (6.33) | 3.07 (8.75) | 1.73 (5.70) | 2.82 (8.30) | 1.89 (6.88) | 2.18 (6.74) | 3.55 (9.21) | 1.79 (6.30) | 0.53 (4.07) | 0.30 (2.80) | 0.22 (2.23) | 0.85 (4.23) | < 0.001 |
Sum of anxiolytic orders | 18.12 (17.18) | 36.43 (21.66) | 26.60 (19.89) | 26.62 (22.01) | 35.03 (24.47) | 38.88 (21.18) | 32.62 (26.21) | 33.92 (24.46) | 36.47 (23.77) | 35.65 (26.51) | 26.62 (29.23) | 14.32 (16.96) | 14.67 (23.94) | 12.80 (15.24) | 20.93 (22.68) | < 0.001 |
Number of alprazolam orders | 10.43 (13.34) | 14.07 (18.47) | 11.69 (15.57) | 10.91 (17.29) | 15.22 (19.13) | 15.74 (17.98) | 15.86 (19.82) | 14.92 (18.33) | 16.86 (18.75) | 17.20 (21.28) | 13.05 (18.66) | 7.63 (12.38) | 8.56 (18.81) | 6.84 (10.94) | 9.87 (15.61) | < 0.001 |
Number of estazolam orders | 0.10 (1.04) | 0.15 (1.54) | 0.15 (1.13) | 0.23 (1.73) | 0.17 (1.64) | 0.24 (2.29) | 0.24 (2.36) | 0.13 (1.06) | 0.30 (2.31) | 0.56 (3.24) | 0.09 (0.98) | 1.05 (5.38) | 1.79 (11.63) | 1.36 (7.69) | 0.51 (3.22) | < 0.001 |
Number of diazepam orders | 0.11 (1.89) | 0.06 (1.06) | 0.09 (1.08) | 0.05 (1.03) | 0.09 (1.60) | 0.16 (1.97) | 0.41 (2.97) | 0.74 (4.22) | 0.57 (3.54) | 0.66 (3.53) | 2.18 (10.75) | 0.13 (0.51) | 0.23 (2.09) | 0.39 (2.81) | 0.68 (3.83) | < 0.001 |
Number of diazepam for injection orders | 0.08 (0.45) | 0.01 (0.18) | 0.04 (0.57) | 0.04 (0.34) | 0.03 (0.31) | 0.02 (0.17) | 0.03 (0.20) | 0.08 (0.52) | 0.06 (0.52) | 0.08 (0.94) | 0.08 (0.60) | 0.20 (0.82) | 0.04 (0.25) | 0.13 (1.00) | 0.10 (0.74) | < 0.001 |
Number of lorazepam orders | 1.56 (5.65) | 2.98 (8.64) | 1.39 (5.12) | 1.36 (4.81) | 3.69 (9.80) | 2.25 (8.22) | 3.49 (10.31) | 3.83 (10.60) | 3.04 (8.30) | 4.75 (11.32) | 1.63 (5.68) | 2.05 (7.82) | 1.53 (7.19) | 0.87 (4.48) | 1.97 (7.37) | < 0.001 |
Number of clonazepam orders | 5.00 (11.02) | 17.30 (18.34) | 11.48 (15.73) | 13.29 (16.14) | 13.65 (18.36) | 18.05 (20.50) | 10.23 (17.27) | 10.57 (16.70) | 12.41 (18.42) | 10.04 (17.75) | 8.51 (15.63) | 1.98 (5.67) | 1.85 (7.08) | 2.14 (6.70) | 6.74 (14.23) | < 0.001 |
Number of clonazepam for injection orders | 0.10 (0.59) | 0.14 (1.18) | 0.14 (0.93) | 0.19 (1.34) | 0.13 (1.05) | 0.07 (0.71) | 0.11 (0.94) | 0.06 (0.50) | 0.07 (0.56) | 0.05 (0.35) | 0.19 (1.33) | 0.00 (0.00) | 0.02 (0.26) | 0.00 (0.00) | 0.03 (0.25) | < 0.001 |
Number of midazolam for injection orders | 0.01 (0.08) | 0.01 (0.12) | 0.01 (0.11) | 0.01 (0.11) | 0.03 (0.30) | 0.03 (0.18) | 0.02 (0.14) | 0.16 (0.39) | 0.18 (0.43) | 0.08 (0.98) | 0.02 (0.15) | 0.84 (3.91) | 0.09 (0.83) | 0.14 (0.99) | 0.11 (0.48) | < 0.001 |
Number of tandospirone orders | 0.56 (3.28) | 0.75 (3.92) | 1.18 (4.92) | 0.21 (1.77) | 1.12 (5.14) | 1.83 (6.30) | 1.26 (5.05) | 2.35 (6.67) | 1.78 (6.02) | 1.01 (4.50) | 0.55 (3.32) | 0.15 (1.44) | 0.13 (1.61) | 0.14 (1.56) | 0.44 (3.14) | < 0.001 |
Number of eszopiclone orders | 0.16 (1.58) | 0.52 (3.35) | 0.36 (2.73) | 0.19 (1.77) | 0.49 (3.19) | 0.24 (2.16) | 0.62 (3.74) | 0.88 (3.94) | 1.07 (4.76) | 0.98 (4.20) | 0.11 (1.19) | 0.17 (1.23) | 0.25 (1.75) | 0.49 (2.62) | 0.22 (1.88) | < 0.001 |
Sum of mood stabilizer orders | 3.19 (8.11) | 0.97 (4.88) | 1.86 (6.09) | 2.17 (6.95) | 1.66 (6.71) | 1.71 (6.07) | 1.56 (5.84) | 1.50 (5.85) | 1.44 (5.98) | 2.11 (6.92) | 4.84 (10.83) | 0.33 (1.94) | 0.64 (5.68) | 2.06 (7.43) | 1.40 (6.24) | < 0.001 |
Number of sodium valproate orders | 1.52 (5.23) | 0.40 (3.07) | 1.12 (4.44) | 1.08 (4.41) | 0.71 (4.87) | 0.87 (4.40) | 0.78 (4.14) | 0.67 (3.79) | 0.56 (3.48) | 1.02 (4.31) | 2.51 (6.70) | 0.08 (0.82) | 0.41 (5.48) | 0.91 (4.00) | 0.82 (5.08) | < 0.001 |
Number of lamotrigine orders | 0.77 (4.72) | 0.26 (3.00) | 0.26 (2.84) | 0.74 (4.52) | 0.46 (3.29) | 0.41 (3.06) | 0.20 (2.16) | 0.14 (1.73) | 0.14 (1.66) | 0.19 (1.77) | 0.51 (4.23) | 0.00 (0.00) | 0.04 (0.73) | 0.00 (0.00) | 0.03 (0.69) | < 0.001 |
Number of topiramate orders | 0.01 (0.18) | 0.04 (0.91) | 0.07 (1.11) | 0.00 (0.00) | 0.03 (0.71) | 0.17 (1.94) | 0.16 (2.01) | 0.39 (2.80) | 0.23 (1.99) | 0.39 (3.12) | 0.66 (4.98) | 0.00 (0.00) | 0.00 (0.10) | 0.48 (2.74) | 0.23 (2.22) | < 0.001 |
Number of lithium carbonate orders | 0.80 (3.95) | 0.23 (2.14) | 0.36 (2.49) | 0.30 (2.99) | 0.37 (2.83) | 0.21 (1.87) | 0.23 (2.06) | 0.07 (1.09) | 0.27 (2.81) | 0.14 (1.61) | 0.55 (3.98) | 0.07 (0.72) | 0.03 (0.68) | 0.00 (0.00) | 0.03 (0.82) | < 0.001 |
Sum of anti-side effects drugs orders | 3.88 (10.37) | 4.88 (10.74) | 4.67 (9.24) | 2.91 (10.18) | 6.51 (13.88) | 5.08 (11.88) | 9.23 (17.54) | 9.29 (14.44) | 7.91 (12.88) | 9.92 (16.65) | 5.71 (14.39) | 8.65 (17.24) | 17.99 (35.04) | 11.91 (19.59) | 6.48 (13.59) | < 0.001 |
Number of aspirin orders | 0.00 (0.06) | 0.24 (2.22) | 0.03 (0.73) | 0.00 (0.04) | 1.07 (4.89) | 0.39 (4.04) | 1.60 (6.06) | 0.95 (4.03) | 0.29 (2.23) | 1.93 (6.76) | 0.02 (0.37) | 2.47 (6.96) | 4.84 (10.62) | 2.67 (7.01) | 1.50 (5.64) | < 0.001 |
Number of atorvastatin calcium orders | 0.01 (0.28) | 0.10 (1.41) | 0.01 (0.37) | 0.00 (0.00) | 0.76 (4.06) | 0.41 (3.48) | 2.11 (7.26) | 1.89 (5.72) | 0.31 (2.19) | 1.53 (6.09) | 0.12 (1.48) | 1.69 (6.49) | 5.67 (13.94) | 3.19 (8.36) | 1.29 (5.10) | < 0.001 |
Number of benhexol orders | 2.33 (7.47) | 0.25 (2.08) | 1.59 (6.03) | 0.71 (4.00) | 0.67 (4.13) | 0.83 (4.23) | 0.96 (4.87) | 0.42 (2.63) | 0.41 (2.63) | 1.19 (5.24) | 2.90 (10.00) | 0.45 (3.53) | 0.23 (1.68) | 0.43 (2.72) | 0.31 (2.57) | < 0.001 |
Number of bisacodyl orders | 0.80 (3.37) | 2.11 (5.84) | 1.72 (4.65) | 1.19 (5.78) | 1.78 (5.21) | 1.46 (4.44) | 1.86 (5.19) | 1.74 (4.91) | 2.58 (6.54) | 2.81 (7.37) | 1.29 (5.37) | 0.18 (1.19) | 0.17 (1.32) | 0.11 (1.10) | 0.78 (3.59) | < 0.001 |
Number of polyethylene glycol orders | 0.45 (2.19) | 0.96 (3.64) | 0.72 (2.89) | 0.46 (2.45) | 0.99 (3.96) | 1.12 (3.91) | 1.08 (4.26) | 1.78 (6.31) | 1.78 (4.56) | 1.15 (3.95) | 0.52 (2.44) | 1.24 (4.54) | 2.69 (10.04) | 2.81 (8.79) | 0.98 (4.03) | < 0.001 |
Number of glycerin enema orders | 0.08 (0.43) | 0.15 (0.54) | 0.14 (0.62) | 0.09 (0.61) | 0.18 (0.66) | 0.14 (0.51) | 0.28 (2.19) | 0.29 (1.04) | 0.24 (0.71) | 0.19 (0.67) | 0.15 (0.59) | 0.16 (0.57) | 1.83 (15.17) | 0.76 (5.30) | 0.29 (1.49) | < 0.001 |
Number of Maren Maru orders | 0.12 (1.20) | 0.23 (1.75) | 0.12 (1.02) | 0.09 (0.90) | 0.36 (2.48) | 0.22 (1.86) | 0.37 (2.44) | 0.21 (1.74) | 0.25 (1.90) | 0.47 (2.75) | 0.09 (1.10) | 0.71 (2.56) | 0.25 (3.94) | 0.38 (3.16) | 0.22 (1.78) | 0.006 |
Number of mosapride orders | 0.09 (1.06) | 0.84 (4.25) | 0.34 (2.37) | 0.36 (2.48) | 0.70 (3.93) | 0.51 (3.12) | 0.96 (4.41) | 2.00 (6.57) | 2.05 (6.23) | 0.65 (3.26) | 0.62 (3.40) | 1.75 (6.30) | 2.31 (8.29) | 1.55 (5.59) | 1.10 (4.88) | < 0.001 |
Number of zopiclone orders | 0.14 (1.51) | 1.17 (4.49) | 0.66 (3.44) | 0.49 (2.68) | 0.90 (4.62) | 0.29 (2.10) | 1.07 (4.67) | 1.31 (5.11) | 0.59 (3.31) | 0.86 (4.26) | 0.33 (2.71) | 0.01 (0.10) | 0.76 (3.92) | 0.66 (2.80) | 0.37 (2.63) | < 0.001 |
Sum of β receptor blocker orders | 2.40 (6.79) | 3.96 (9.29) | 3.17 (7.60) | 3.07 (8.01) | 4.31 (9.88) | 5.37 (9.97) | 3.87 (9.36) | 3.57 (9.25) | 3.10 (8.12) | 3.79 (9.85) | 3.58 (10.46) | 2.46 (6.42) | 5.05 (15.68) | 1.15 (4.69) | 3.41 (10.11) | < 0.001 |
Number of metoprolol succinate orders | 0.03 (0.73) | 0.14 (1.48) | 0.24 (2.28) | 0.07 (1.09) | 0.58 (3.67) | 0.43 (2.83) | 0.77 (3.95) | 1.05 (4.95) | 0.40 (2.96) | 0.82 (4.83) | 0.15 (1.47) | 1.45 (4.68) | 2.11 (8.04) | 0.55 (3.19) | 0.63 (3.64) | < 0.001 |
Number of metoprolol tartrate orders | 0.02 (0.50) | 0.35 (2.72) | 0.11 (1.33) | 0.15 (1.81) | 0.52 (3.57) | 0.21 (2.32) | 0.44 (3.16) | 0.42 (4.31) | 0.27 (2.70) | 0.08 (1.15) | 0.26 (4.42) | 0.09 (0.92) | 1.64 (12.24) | 0.19 (2.21) | 0.46 (5.96) | < 0.001 |
Number of propranolol orders | 2.35 (6.75) | 3.47 (8.69) | 2.82 (7.18) | 2.85 (7.65) | 3.20 (8.80) | 4.74 (9.49) | 2.65 (8.09) | 2.11 (6.40) | 2.43 (7.13) | 2.85 (8.63) | 3.18 (9.51) | 0.38 (3.39) | 0.35 (2.58) | 0.27 (2.54) | 2.18 (7.31) | < 0.001 |
Sum of hormonal drugs orders | 0.09 (1.29) | 0.23 (2.05) | 0.09 (1.34) | 0.21 (2.26) | 0.21 (2.41) | 0.26 (2.59) | 2.02 (7.61) | 1.91 (6.47) | 0.35 (2.96) | 0.24 (2.14) | 0.38 (3.65) | 0.54 (2.76) | 1.03 (7.94) | 0.80 (4.37) | 0.32 (2.72) | < 0.001 |
Sum of Chinese patent medicines orders | 0.57 (3.43) | 4.03 (9.05) | 1.03 (4.48) | 1.29 (4.88) | 3.74 (9.00) | 4.04 (8.79) | 3.77 (9.49) | 4.49 (10.12) | 3.15 (8.05) | 3.67 (9.14) | 1.03 (4.51) | 0.12 (1.03) | 0.69 (3.74) | 0.55 (2.74) | 1.58 (6.19) | < 0.001 |
Sum of physiotherapy orders | 15.31 (33.30) | 19.83 (23.62) | 27.28 (40.50) | 15.48 (25.70) | 20.01 (28.08) | 23.07 (27.11) | 17.30 (29.14) | 22.21 (33.77) | 23.69 (33.81) | 20.59 (27.22) | 21.40 (39.15) | 18.78 (48.66) | 5.64 (28.18) | 1.65 (8.76) | 9.59 (20.92) | < 0.001 |
Number of multi-parameter biofeedback therapy orders | 0.32 (2.08) | 0.32 (2.06) | 0.40 (2.07) | 0.28 (1.73) | 0.31 (2.61) | 0.80 (3.35) | 0.23 (1.79) | 0.48 (1.78) | 0.56 (2.60) | 0.40 (2.21) | 0.53 (2.32) | 0.00 (0.00) | 0.01 (0.20) | 0.00 (0.00) | 0.31 (2.37) | < 0.001 |
Number of modified electroconvulsive therapy orders | 8.56 (30.76) | 1.99 (14.74) | 13.89 (37.25) | 3.48 (21.24) | 4.53 (21.24) | 2.86 (17.27) | 3.47 (20.79) | 4.25 (24.72) | 4.84 (26.81) | 2.46 (17.46) | 10.17 (35.29) | 15.09 (41.74) | 0.00 (0.00) | 0.35 (4.90) | 1.20 (11.82) | < 0.001 |
Number of transcranial magnetic stimulation therapy orders | 0.04 (0.33) | 0.12 (0.58) | 0.12 (0.53) | 0.06 (0.39) | 0.11 (0.58) | 0.12 (0.54) | 0.05 (0.34) | 0.16 (0.65) | 0.15 (0.69) | 0.13 (0.69) | 0.10 (0.51) | 0.08 (0.50) | 0.02 (0.27) | 0.01 (0.24) | 0.15 (0.79) | < 0.001 |
Number of electroencephalographic biofeedback therapy orders | 6.34 (13.11) | 17.30 (19.24) | 12.87 (16.49) | 11.54 (16.54) | 14.94 (18.58) | 19.21 (20.23) | 13.49 (20.03) | 17.27 (21.93) | 18.10 (20.29) | 17.40 (21.56) | 10.56 (16.87) | 3.60 (13.04) | 5.52 (28.00) | 1.29 (7.22) | 7.83 (15.44) | < 0.001 |
Sum of psychotherapy orders | 5.80 (5.43) | 2.40 (3.37) | 3.43 (4.20) | 4.89 (4.75) | 2.46 (3.66) | 4.14 (4.85) | 1.90 (3.37) | 2.30 (3.52) | 2.26 (3.53) | 2.00 (3.07) | 4.74 (5.60) | 0.91 (3.75) | 0.13 (0.90) | 0.07 (0.57) | 1.14 (2.84) | < 0.001 |
^b Continuous variables are presented as mean and standard deviation.
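Each cell of Table 2 is a per-cluster mean (SD) of order counts. The sketch below shows one way such cells could be derived from patient-level records, assuming a simple list of `(cluster, n_orders)` pairs and the sample standard deviation. The function name, the record layout, and the toy numbers are illustrative assumptions, and the omnibus test behind the p-value column is not reproduced because the source does not name it here.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_orders(records):
    """Return {cluster: 'mean (SD)'} for (cluster, n_orders) pairs."""
    by_cluster = defaultdict(list)
    for cluster, n_orders in records:
        by_cluster[cluster].append(n_orders)
    summary = {}
    for cluster, counts in sorted(by_cluster.items()):
        # Sample SD needs at least two observations
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[cluster] = f"{mean(counts):.2f} ({sd:.2f})"
    return summary

# Toy order counts for two hypothetical clusters (not study data)
toy_records = [("A1", 0), ("A1", 12), ("A1", 30),
               ("B2", 0), ("B2", 0), ("B2", 9)]
toy_table = summarize_orders(toy_records)
print(toy_table)
```

In many rows of Table 2 the SD exceeds the mean, which is what one would expect when most patients have zero orders of a given drug and a minority have many, as in the toy `B2` column.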
Category A
In Cluster A1, a predominant majority of patients utilized a combination of ADP with AA, AP, and PSY. Notably, sertraline, quetiapine fumarate, risperidone, lamotrigine, lithium carbonate, and psychotherapy emerged as the most frequently prescribed treatments within this phenotype, surpassing other clusters in utilization rates. In Cluster A2, AA, CM, clonazepam, doxepin hydrochloride, amitriptyline hydrochloride, and clozapine exhibited the highest utilization rates compared to other phenotypes. A distinct treatment profile emerged for this cluster, showcasing unique preferences in medication utilization. Similarly, Cluster A3 mirrored Cluster A1 in utilizing ADP in combination with AA, AP, and PSY. Noteworthy variations included the highest utilization rates of AP, olanzapine, sulpiride for injection, venlafaxine hydrochloride, and modified electroconvulsive therapy within this cluster. Cluster A4’s treatment pattern closely resembled that of Cluster A3, with a shared preference for ADP in combination with AA, AP, and PSY. However, patients in Cluster A4 demonstrated the second-highest rate of utilization of psychotherapy among the clusters, differentiating their treatment approach within this phenotype.
Category B
The treatment profile for Cluster B1 involved the widespread use of AA and ADP, combined with AP, PHY, and PSY. However, no specific treatment stood out with the highest utilization rate among these patients. In parallel, Cluster B2 mirrored Cluster B1 in employing AA and ADP, along with AP, PHY, and PSY. Notably, Cluster B2 held the highest rate of ADP, OT, and PHY utilization among all phenotypes. This included medications such as paroxetine hydrochloride, propranolol hydrochloride, and therapeutic approaches like electroencephalographic biofeedback therapy and multi-parameter biofeedback therapy.
Category C
Within Cluster C1, levothyroxine sodium exhibited the highest utilization rate among patients. For Cluster C2, almost all individuals utilized AA and ADP in combination with AP and PSY, and this cluster had the highest rate of usage for citalopram hydrobromide, tandospirone citrate, and zopiclone tartrate. The treatment pattern of Cluster C3 closely resembled that of Cluster B1 and Cluster B2, with alprazolam, bisacodyl, and transcranial magnetic stimulation therapy most commonly employed. This similarity suggests a shared therapeutic approach among these clusters.
Category D
In Cluster D1, nearly all patients relied on AA and ADP, with over 70% of them incorporating AP into their treatment regimen. Moreover, more than 50% of these patients underwent electroencephalographic biofeedback therapy and psychotherapy. This phenotype exhibited the highest utilization rates for several drugs compared to other phenotypes, including tiapride hydrochloride, lorazepam, sulpiride, eszopiclone, and quetiapine. As for Cluster D2, MSB, sodium valproate, aripiprazole, fluoxetine hydrochloride, clomipramine hydrochloride, diazepam, clonazepam for injection, and benhexol hydrochloride emerged as prominent treatments within this phenotype, distinguishing it from other clusters. This delineation highlights the distinct therapeutic approach adopted for patients in Cluster D2.
Category E
In Cluster E1, duloxetine hydrochloride enteric-coated capsules, venlafaxine hydrochloride capsules, diazepam for injection, midazolam for injection, haloperidol for injection, and metoprolol succinate were notably more prevalent than in other phenotypes. For Cluster E2, patients exhibited the highest utilization of ASE (aspirin, atorvastatin calcium, polyethylene glycol, mosapride) and metoprolol tartrate. Cluster E3 mirrored Cluster E2 closely, with sertraline and ASE being the most frequently employed treatments. As for the uncharacterized depression phenotype (Cluster E4), there is no discernible characteristic that distinguishes its treatment pattern, suggesting a lack of specific trends or preferences within this particular cluster.
Moreover, Fig. 8B illustrates a compilation of frequently used drug combinations, specifically within the categories of ADP, AA, AP, MSB, ASE, OT, HYP, and CM, consistently appearing in the top 10 across different clusters. Broadly, the most prevalent drug combinations included antidepressants paired with anxiolytics and antipsychotics, as well as combinations of antidepressants with anxiolytics. Notably, Cluster A1, Cluster A3, Cluster A4, and Cluster D1 exhibited the highest utilization of the combination of antidepressants and antipsychotics. This observation supports the presence of prominent psychotic symptoms or comorbidities with other psychiatric disorders in these phenotypes. Additionally, mood stabilizers were extensively used in these four clusters. For phenotypes primarily diagnosed with depression and comorbid with somatic illness (Cluster D1, Cluster B1, Cluster C1, Cluster C2, and Cluster C3), or those reporting prominent physical symptoms (Cluster A2), combinations of antidepressants, anxiolytics, antipsychotics, and anti-side effects drugs were frequently employed. This underscores the nuanced treatment approaches tailored to the distinct characteristics of each cluster.
Association with clinical outcomes
The Kaplan-Meier curves in Fig. 9 and Figures S4–S6, together with Table 3, show the occurrence of psychiatric readmission within the four follow-up windows (60, 90, 180, and 365 days) across the 5 phenotypic categories and 15 clusters. Among the clusters, Cluster D1 showed the highest rate of 60-day psychiatric readmission, followed by Cluster D2, Cluster A3, and Cluster A1 (Table 3). Patients in Category D had the highest risk of psychiatric rehospitalization in every follow-up window. Even after adjusting for comorbidities, medications, and age, differences in outcomes across phenotype categories and clusters remained evident (P < 0.001; see Table S9 in Supplementary Information 2.5). Compared with Category E, the adjusted risks of 60-day psychiatric readmission were significantly higher in the other four phenotype categories (Category C: HR, 1.57; 95% CI, 1.07–2.31; Category B: HR, 1.61; 95% CI, 1.10–2.36; Category A: HR, 1.82; 95% CI, 1.28–2.59; and Category D: HR, 2.38; 95% CI, 1.59–3.56; P < 0.05). Similar results were observed for the adjusted risks of psychiatric readmission within the other follow-up windows (90, 180, and 365 days), as outlined in Table 3. Among the five categories, the risk of psychiatric rehospitalization for Category C rose from fourth place at 60 days to third place at 90 days, and then to second place at 180 and 365 days. In contrast, the risk for Category A ranked second at 60 and 90 days, dropped to third at 180 days, and dropped to fourth at 365 days.
Fig. 9. Kaplan-Meier curves for 60-day psychiatric rehospitalization.
(A) 60-day psychiatric rehospitalization stratified by 5 phenotype categories. (B) 60-day psychiatric rehospitalization stratified by 15 clusters. (C) Forest plot adjusted by treatment patterns.
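For readers less familiar with the curves in Fig. 9, the Kaplan-Meier (product-limit) estimator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `kaplan_meier` and the six toy follow-up records are hypothetical and are not study data.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the event-free proportion.

    durations: follow-up time in days for each patient
    events:    1 if the patient was readmitted at that time, 0 if censored
    Returns a list of (time, event_free_proportion) at each event time.
    """
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    event_free = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            event_free *= 1 - n_events / n_at_risk
            curve.append((t, event_free))
        # Both readmitted and censored patients leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy cohort (NOT study data): six patients, 60-day window.
toy_durations = [10, 25, 25, 40, 60, 60]
toy_events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(toy_durations, toy_events)
for day, proportion in curve:
    print(f"day {day}: event-free proportion = {proportion:.3f}")
```

The cumulative readmission rate at a given day is one minus the event-free proportion; censored patients contribute to the risk set only until their follow-up ends.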
Table 3. Crude clinical outcome event rates and hazard ratio values, stratified by phenotype categories and clusters.
Event Rates | 60-day psychiatric readmission | 90-day psychiatric readmission | 180-day psychiatric readmission | 365-day psychiatric readmission |
---|---|---|---|---|
A: Low Comorbidity | 5.38% | 6.10% | 8.56% | 12.00% |
B: Moderate Comorbidity | 4.96% | 6.07% | 8.41% | 12.59% |
C: Endocrine/Digestive Comorbidity | 4.70% | 5.99% | 8.75% | 12.51% |
D: Neurological/Mental Comorbidity | 6.92% | 7.83% | 11.01% | 15.55% |
E: Other Diseases Comorbid Depression | 2.19% | 2.58% | 3.94% | 6.23% |
p-value | 0.003207 | 0.002618 | 0.002110 | 0.001497 |
Cluster A1 | 5.60% | 6.27% | 7.98% | 10.92% |
Cluster A2 | 5.52% | 5.98% | 9.66% | 13.79% |
Cluster A3 | 6.17% | 7.33% | 8.33% | 11.67% |
Cluster A4 | 3.74% | 4.67% | 7.29% | 10.09% |
Cluster B1 | 5.43% | 6.73% | 9.33% | 13.63% |
Cluster B2 | 3.91% | 4.63% | 6.41% | 10.32% |
Cluster C1 | 4.89% | 6.03% | 8.19% | 11.15% |
Cluster C2 | 4.96% | 6.20% | 8.68% | 11.98% |
Cluster C3 | 4.30% | 5.85% | 9.64% | 14.80% |
Cluster D1 | 7.18% | 7.69% | 10.26% | 13.85% |
Cluster D2 | 6.72% | 7.94% | 11.61% | 16.90% |
Cluster E1 | 3.16% | 4.21% | 4.21% | 7.37% |
Cluster E2 | 1.90% | 2.38% | 3.10% | 4.52% |
Cluster E3 | 1.07% | 1.07% | 2.14% | 2.50% |
Cluster E4 | 2.46% | 2.86% | 4.61% | 7.55% |
p-value | 1.22E-07 | 6.63E-08 | 6.13E-08 | 4.22E-08 |
HR Values (95% CI) | 60-day psychiatric readmission | 90-day psychiatric readmission | 180-day psychiatric readmission | 365-day psychiatric readmission |
E: Other Diseases Comorbid Depression | Reference | Reference | Reference | Reference |
A: Low Comorbidity | 1.82 (1.28, 2.59) | 1.75 (1.26, 2.42) | 1.63 (1.25, 2.12) | 1.42 (1.15, 1.77) |
B: Moderate Comorbidity | 1.61 (1.10, 2.36) | 1.68 (1.18, 2.38) | 1.53 (1.15, 2.05) | 1.46 (1.15, 1.84) |
C: Endocrine/Digestive Comorbidity | 1.57 (1.07, 2.31) | 1.71 (1.20, 2.42) | 1.67 (1.26, 2.22) | 1.50 (1.19, 1.89) |
D: Neurological/Mental Comorbidity | 2.38 (1.59, 3.56) | 2.28 (1.57, 3.32) | 2.13 (1.56, 2.90) | 1.89 (1.47, 2.44) |
p-value | 5e-14 | 3e-16 | < 2e-16 | < 2e-16 |
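The ranking shifts described in the text can be reproduced directly from the adjusted hazard ratios in Table 3. The short Python sketch below sorts the categories by point estimate in each follow-up window; the names `HR` and `rank_categories` are illustrative only. It ignores the confidence intervals, which overlap substantially for Categories A, B, and C, so the rank order among those three should be read with caution.

```python
# Adjusted hazard ratios copied from Table 3 (Category E is the reference, HR = 1).
HR = {
    "60-day":  {"A": 1.82, "B": 1.61, "C": 1.57, "D": 2.38, "E": 1.00},
    "90-day":  {"A": 1.75, "B": 1.68, "C": 1.71, "D": 2.28, "E": 1.00},
    "180-day": {"A": 1.63, "B": 1.53, "C": 1.67, "D": 2.13, "E": 1.00},
    "365-day": {"A": 1.42, "B": 1.46, "C": 1.50, "D": 1.89, "E": 1.00},
}


def rank_categories(hr_by_category):
    """Return category labels ordered from highest to lowest hazard ratio."""
    return sorted(hr_by_category, key=hr_by_category.get, reverse=True)


rankings = {window: rank_categories(values) for window, values in HR.items()}
for window, order in rankings.items():
    print(f"{window}: {' > '.join(order)}")
```

Category D ranks first and Category E last in every window, while Category C moves from fourth to second and Category A from second to fourth as the window lengthens.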
Based on physicians’ clinical observations, we interpret these findings as follows. Category C, the phenotype of depression comorbid with endocrine, nutritional, metabolic, and digestive diseases, predominantly manifests as physical comorbidities in the early follow-up period (60 and 90 days), with relatively mild psychiatric symptoms, resulting in a low risk of psychiatric readmission. However, as follow-up lengthens, the psychiatric symptoms of patients within this phenotype continue to worsen under the influence of physical comorbidities, increasing the risk of psychiatric readmission. Conversely, Category A, the phenotype of depression with primarily psychiatric symptoms and fewer physical comorbidities, exhibits a high risk of psychiatric readmission in the early follow-up period. With continued psychiatric intervention and treatment, however, the risk of psychiatric readmission for patients within this phenotype steadily decreases over time. This underscores the potential association between phenotypic categories and patient prognosis and calls for more refined management of patients with depression that takes the different phenotypic presentations into account. Management should be tailored to the characteristics of each phenotypic cluster to enhance the efficacy of clinical interventions and optimize clinical outcomes.
Validation
Details of the 5 phenotypic categories and 15 clusters in the validation data set, and their mapping to the clusters of the discovery cohort, are given in Fig. 4B, Figure S7, and Table S10 in the Supplementary Information. The clusters identified in the validation cohort were either almost identical to those identified in the main analysis of the discovery cohort or shared specific features with them. This result indicates that the depression phenotypes identified in the discovery data set are robust in an independent population.
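This section does not state how validation clusters were matched to discovery clusters. One common approach, shown here purely as an assumption, is to assign each validation cluster to the discovery cluster with the nearest centroid. In the Python sketch below, the function name, the two-feature centroids, and the labels V1 to V3 are all hypothetical and are not study data.

```python
import math


def nearest_centroid_mapping(discovery, validation):
    """Map each validation cluster to the closest discovery cluster.

    Both arguments are dicts of {cluster_label: centroid_tuple};
    closeness is measured by Euclidean distance between centroids.
    """
    mapping = {}
    for v_label, v_centroid in validation.items():
        mapping[v_label] = min(
            discovery,
            key=lambda d_label: math.dist(discovery[d_label], v_centroid),
        )
    return mapping


# Hypothetical centroids (NOT study data):
# (proportion with any comorbidity, median age in years / 100).
discovery = {"A1": (0.05, 0.25), "C1": (1.00, 0.55), "D1": (1.00, 0.63)}
validation = {"V1": (0.08, 0.27), "V2": (0.97, 0.64), "V3": (0.99, 0.52)}

mapping = nearest_centroid_mapping(discovery, validation)
for v_label, d_label in mapping.items():
    print(f"{v_label} -> {d_label}")
```

In practice the features would need to be placed on a common scale before distances are computed, and a one-to-one assignment may be preferable when several validation clusters compete for the same discovery cluster.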
Discussion
Main findings
The current study was one of the first to investigate comorbidity-specific phenotypes of depression. We applied cluster analysis to a large, region-based cohort of 11,818 inpatients with depression to identify alternative and informative clinical phenotypes. The study produced four major findings. First, cluster analysis identified 5 clinically relevant and recognizable phenotypic categories, which included 15 multifactorial-driven clusters. Second, conventional measures of depression, such as course-based subtypes and core symptoms included in the diagnostic system (depressed mood, interest loss, afraid, upset, worry, flustered, physical discomfort, dizzy, hallucination, etc.), did not drive cluster formation. Instead, novel variables encompassing various aspects of patients proved useful. We found that comorbidity across multiple brain and body systems (nervous, psychiatric, digestive, endocrine, circulatory, respiratory, and cancer), combined with age at onset, severity, sleep problems, suicide risk, and psychotic symptoms, selectively drove the formation of the 15 clusters. Third, treatment patterns (prescribed drugs, physiotherapies, psychotherapies, and combinations of drug types) and clinical presentations (i.e., symptomatology and physiological and biological characteristics) varied among the clusters. Finally, these distinct phenotypic categories and clusters were associated with different risks of psychiatric readmission within 60-day, 90-day, 180-day, and 365-day follow-up.
Using a phenotype-mapping algorithm, we leveraged the rich patient-level variables in our depression cohort to find patterns of association, which allowed a novel and robust grouping of patients with depression. Surprisingly, conventional measures of depression did not emerge as defining characteristics of these clusters; rather, the clusters were often defined by variables overlooked in previous classification tasks, such as multimorbidity across multiple brain and body systems, including the brain, kidneys, heart, lungs, liver, skin, blood, and musculoskeletal system. We derived the first whole-body multimorbidity characterization of depression phenotypes. It is now established that depression manifests in highly diverse ways among individuals, particularly in terms of comorbid diseases, and that its heterogeneity is not always fully captured by conventional classification systems. A number of studies have documented the relatively poor treatment response in depressed patients with comorbid medical disorders, particularly in the elderly [54]. Our comorbidity-specific phenotypes, combined with age and symptom characteristics, enabled us to illuminate unique transdiagnostic mechanisms and the underlying etiologies for each phenotype, paving the way for potentially phenotype-specific treatment strategies. These phenotypes predicted clinical outcomes even after controlling for age, comorbidities, and medications. They could therefore be used to identify, at admission and before treatment allocation, individuals who may benefit from early effective interventions aimed at improving clinical outcomes (i.e., treatment response rate and rehospitalization risk).
Association between depression phenotypes and a wide range of psychiatric and somatic diseases
Our analyses, for the first time, identified 5 stable phenotypic categories based on the similarities of the affected disease systems, revealing relationships between depression phenotypes and various psychiatric and somatic diseases. Specifically, patients in Category A had the least comorbidities, with only a few individuals having comorbidities with diseases of the metabolic system (e.g., fatty liver, hypothyroidism), chronic inflammation (e.g., chronic superficial gastritis, asthma, COPD, chronic rhinitis, ethmoidal sinusitis), the musculoskeletal system and connective tissue (e.g., spondylosis, cervical disc disorders, intervertebral disc disorders). None of the 4 clusters in Category A had neurological, mental and behavioral comorbidities. Patients in Category B suffered from moderate comorbidities, mainly including diseases of the cardiometabolic system (e.g., hypertension, chronic ischemic heart disease, diabetes mellitus, cerebral infarction/other cerebrovascular diseases), mental and behavioral disorders (e.g., childhood emotional disorders, anxiety disorders), and chronic inflammation (e.g., COPD, gastritis).
Category C, Category D and Category E had the most comorbidities. All patients in Category C had comorbidities, mainly including diseases of the cardiometabolic system (e.g., diabetes, hypertension, chronic ischemic heart disease, disorders of lipoprotein metabolism and other lipidemia, other disorders of fluid, electrolyte, and acid-base balance, disorders of thyroid gland), digestive system (e.g., fatty liver, gastrointestinal diseases like gastritis and enteritis, constipation, cholelithiasis). A small number of patients also had comorbidities of mental and behavioral disorders (e.g., anxiety disorders). None of the three clusters in Category C had neurological comorbidities. All patients in Category D had comorbidities, mainly including diseases of the neurological system (e.g., degenerative disease of nervous system, sleep disorders, Parkinson’s disease, demyelinating disease of central nervous system, epilepsy, Alzheimer’s disease), and mental and behavioral disorders (e.g., anxiety disorders, obsessive-compulsive disorder, mental and behavioral disorders due to use of alcohol, eating disorders, persistent mood/affective disorders, reaction to severe stress, and adjustment disorders, and phobic anxiety disorders). A few patients also had comorbidities related to the cardiometabolic system (e.g., hypertension, cerebral infarction). Patients in Category E were mainly diagnosed with non-depression diseases, and depression was a supplementary diagnosis. Neurological, circulatory, and surgery-related diseases were the main comorbid conditions for this category.
Previous studies have shown that depression was two to three times more likely in individuals with multimorbidity compared to those without multimorbidity or those with no chronic physical conditions [55]. Depression is also associated with an increased risk of various comorbid diseases [22–31]. Our analysis found that diseases of the cardiometabolic system, chronic inflammation, digestive system, neurological system, along with mental and behavioral disorders, may be key pathways linking depression with a wide range of other psychiatric and somatic diseases and conditions. Also, distinct disease systems and their combinations drive the formation of depression phenotypes. Our results are consistent with previous reports [22–32]. For example, a large community-based cohort study using UK Biobank data identified three main clusters of diseases after depression (i.e., cardiometabolic diseases, chronic inflammatory diseases, and diseases related to tobacco abuse) [32]. An EHRs-based clustering study in New York City, USA derived three depression phenotypes, reporting a cardiometabolic-comorbid phenotype (i.e., hyperlipidemia, hypertension, and diabetes), a chronic inflammatory and pain-comorbid phenotype (i.e., asthma, fibromyalgia, and chronic pain and fatigue), and a phenotype related to anxiety and tobacco use [38]. Moreover, a previous study using UK Biobank data demonstrated causal links between depression and 22 phenotypically associated medical conditions, including anxiety, sleep disorders, inflammatory and hemorrhagic gastrointestinal diseases, the urinary system, asthma and painful respiration, lipid metabolism and ischemic heart disease [56].
Notably, although our results agree with the consistently reported associations between depression and many common diseases, they should not be directly compared with those reports, because our study aimed to phenotype depression based on multimorbidity heterogeneity and to visualize the whole-body comorbidities of distinct phenotypes, rather than to identify causal links between depression and other medical conditions. Our analysis also revealed novel relationships between depression and the brain, skin, blood, and musculoskeletal systems. A recent study reported that advanced age of the pulmonary system leads to faster cardiovascular aging, which in turn results in faster aging of the brain, musculoskeletal, and renal systems; faster musculoskeletal aging is a common sequela of aging across multiple organ systems [57]. Given the recognized diversity of comorbidity in patients with depression, these findings call for future investigation of the transdiagnostic mechanisms or etiologies underlying these key pathways that may explain the heterogeneity of depression.
Association between depression phenotypes and laboratory biomarkers
In addition to the heterogeneity of comorbidities, different phenotypic categories are dominated by different key factors. For example, Category A, which includes four clusters, is a phenotype of depression with young to middle age at onset, prominent suicidal, psychotic, and somatic symptoms, and few other comorbidities. Although comorbidities were rare in this category, our analysis of phenotypic differences in laboratory biomarkers found significant deviations in many biomarkers, especially in Cluster A1. For example, above-normal absolute lymphocyte count, urobilinogen, and urinary protein occurred more frequently in this cluster than in any other, which may be related to liver dysfunction and infectious diseases. In addition, these patients exhibited the highest median values of many laboratory biomarkers, such as red blood cell count, monocyte proportion and absolute value, lymphocyte proportion and absolute value, eosinophil proportion, hemoglobin, total protein, albumin, potassium, white blood cell ratio, uric acid, calcium, anion gap, and conductivity. The median values of other indicators (such as mean corpuscular hemoglobin, mean corpuscular volume, alanine aminotransferase, urea, total bilirubin, indirect bilirubin, cholesterol, and low-density lipoprotein) were the lowest. Many biomarkers (such as complete blood count, monocyte, kidney and liver function, lipids, and glucose) can inform the health status of individuals’ organs and specific diseases [57, 58], indicating that patients in this category may be at risk of future disease and may benefit from early intervention and prevention. These biomarkers also reflect the biological basis of heterogeneity in the presentation of depression symptoms. Therefore, our study provides evidence for the existence of biologically distinct phenotypes in patients with depression.
Notably, all patients in Category C had comorbidities; the 3 clusters in this category shared similar affected disease systems and a middle to older age of onset. They exhibited the highest rates of comorbid diseases of the cardiometabolic and digestive systems. This is in line with the results of previous phenotypic studies linking depression with many cardiometabolic and digestive diseases. Previous longitudinal disease trajectory analyses have reported that depression may be a risk factor for diseases of the cardiometabolic system, with initial presentations of chronic ischemic heart disease, angina pectoris, hypertension, diabetes, and disorders of lipoprotein metabolism and other lipidemias [32]. Mendelian randomization analyses suggested a causal association between depression and ischemic heart disease, lipid metabolism, and gastrointestinal diseases [56]. Beyond these common diseases, Category D, which included 2 clusters, showed strong associations between depression and neurological diseases and other mental and behavioral disorders, such as Parkinson’s disease [28], dementia [29], anxiety disorder, post-traumatic stress disorder, various phobias, substance use disorders, obsessive compulsive disorder, attention-deficit hyperactivity disorder, and personality disorders [31]. Within Category D, patients in Cluster D1 (median age 63 years) were primarily diagnosed with depression and all had comorbid neurological diseases, whereas patients in Cluster D2 (median age 25 years) were all free from neurological diseases but had comorbid mental and behavioral disorders. Possible underlying mechanisms include the hypothalamic-pituitary-adrenal (HPA) axis [59], the heart-brain axis [60], and the gut-brain axis [61], for which chronic inflammation acts as a major risk factor. Previous studies have shown that the chronic inflammation observed among individuals with depression can accelerate the development or progression of cardiometabolic diseases [62].
Interestingly, chronic inflammatory and cardiometabolic diseases were found in most categories, suggesting that chronic inflammation may be an important shared factor across phenotypes. These observed associations support the notion that depression may induce exaggerated or prolonged inflammatory responses, as reported in both animal and human studies [63]. As a result, anti-inflammatory treatments may have the potential to prevent a general health decline after depression.
Association between depression phenotypes and treatment patterns
Our investigation of treatment patterns by phenotype showed significant differences among clusters. Category A mainly used ADP combined with AA, AP, and PSY, aiming to alleviate the prominent suicide risk, psychotic symptoms, sleep problems, and nonspecific somatization in these patients. Almost all patients in Category B used AA and ADP, combined with AP, PHY, and PSY, while no specific treatment had the highest rate of use among these patients. For Category C, patients in Cluster C1 had the highest rate of use of levothyroxine sodium; almost all patients in Cluster C2 used AA and ADP combined with AP and PSY and had the highest rates of use of citalopram hydrobromide, tandospirone citrate, and zopiclone tartrate; and in Cluster C3, alprazolam, bisacodyl, and transcranial magnetic stimulation therapy were used most. Within each cluster, the specific drugs, physiotherapies, and psychotherapies currently used in clinical practice differed considerably according to patients’ age, symptoms, and comorbidities. Given the unique phenotypes and patient characteristics within heterogeneous populations with depression, phenotype-specific treatment strategies could be developed in the future.
Association between depression phenotypes and clinical outcomes
In addition, we compared the risk of psychiatric readmission within multiple time windows across the 5 phenotype categories and 15 clusters. Patients in Category E had the lowest risk of psychiatric rehospitalization within 60-day follow-up, followed by Category C (HR, 1.57; 95% CI, 1.07–2.31), Category B (HR, 1.61; 95% CI, 1.10–2.36), Category A (HR, 1.82; 95% CI, 1.28–2.59), and Category D (HR, 2.38; 95% CI, 1.59–3.56), with P < 0.05, after adjustment for comorbidities, medications, and age. Similar results for the adjusted risks of psychiatric readmission were observed within the follow-up windows of 90, 180, and 365 days. Clarifying the identifying characteristics and key components of each phenotype may help us understand how to intervene clinically within these phenotypes to improve outcomes. For example, even after adjustment for comorbidities, medications, and age, Category D still had the highest risk of psychiatric readmission. Risk factor modification and/or phenotype-specific treatment may therefore be the most suitable clinical approach for this category. For patients in Category B, alleviating anhedonia and anxiety symptoms may be the most effective way to improve outcomes. Moreover, anti-inflammatory treatments may have the potential to improve outcomes for all phenotypes. Future work should test these hypotheses and evaluate potential treatment heterogeneity.
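As a rough consistency check on the 60-day estimates, the standard error of each log hazard ratio can be recovered from its reported 95% CI. The Python sketch below assumes the intervals are Wald-type (symmetric on the log scale), which is the usual default for Cox models but is not stated in this section. It uses the values from Table 3; the helper name `wald_from_ci` is hypothetical, and the resulting p-values are approximations rather than results reported by the study.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# 60-day adjusted HR, lower and upper 95% CI bounds versus Category E (Table 3).
HR_60_DAY = {
    "A": (1.82, 1.28, 2.59),
    "B": (1.61, 1.10, 2.36),
    "C": (1.57, 1.07, 2.31),
    "D": (2.38, 1.59, 3.56),
}


def wald_from_ci(hr, lower, upper):
    """Back-compute SE(log HR), z, and two-sided p, assuming a Wald-type 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


results = {cat: wald_from_ci(*values) for cat, values in HR_60_DAY.items()}
for cat, (se, z, p) in results.items():
    print(f"Category {cat}: SE = {se:.3f}, z = {z:.2f}, approx. p = {p:.4f}")
```

Under this assumption, all four approximate p-values fall below 0.05, in line with the significance level reported for the 60-day comparisons.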
Strengths
The major strengths of our study are: 1) the application of cluster analysis to visualize the heterogeneity of depression from multiple perspectives, including comorbidity patterns, treatment patterns, clinical presentations, biological characteristics, and outcomes. This multifaceted approach offers a holistic understanding of depression, especially of multimorbidity heterogeneity and the whole-body comorbidities of distinct phenotypes; 2) the effective use of electronic medical record (EMR) data within a large, integrated care delivery system. Leveraging real-world data enhances the robustness of our findings and allows a more nuanced exploration of depression phenotypes in a practical healthcare setting. This discovery-driven analysis complements previous studies, which often examined causal links between depression and common medical diseases, by illustrating the whole picture of comorbidity patterns by depression phenotype. Further targeted studies are needed to clarify the common disease clusters and the temporal order of diseases that follow depression. Subsequent studies should also directly explore novel patient groups that share similar comorbidity-specific phenotypes. Identifying these phenotypes holds promise for future research, facilitating the exploration of underlying transdiagnostic mechanisms and etiologies. Moreover, these insights can inform the development of phenotype-specific treatment strategies, contributing to a more personalized and effective approach to the management of depression.
Limitations
This research is subject to several limitations. Firstly, the reliance on EMR diagnostic codes without structured interviews introduces potential diagnostic uncertainty for depression. Although previous research suggests that EMR-based diagnostic data can demonstrate high specificity and predictive value compared to clinical diagnostic interviews [64], the absence of structured interviews remains a limitation. Efforts were made to address this limitation by analyzing unstructured clinical text, particularly narrative notes on chief complaints, to capture nuanced symptom features. Secondly, it is important to acknowledge that our approach is data-driven and hypothesis-generating. While the objective was to showcase the application of a discovery-driven approach to a large depression patient population, resulting in clinically relevant patient phenotypes, the clustering results are contingent upon the specific population and available clinical variables. Although internal validation using an independent patient cohort demonstrated robust clustering, external validation with larger, multi-site samples could further enhance the stability of identified phenotypes. Unfortunately, due to challenges in obtaining comparable clinical cohorts, especially concerning laboratory test data and treatment prescriptions from other centers, the generalizability and portability of our results remain unverified. The absence of neuroimaging and other biological data is another limitation, precluding the exploration of neurophysiological correlates for the identified phenotypes. These limitations collectively underscore the need for caution in generalizing our findings beyond the confines of the Chinese health system, emphasizing the necessity for future research with broader, diverse populations and comprehensive datasets for validation.
Conclusions
This study used clustering-based models to identify distinct groups of patients with depression who share similar characteristics, aiming to evaluate whether these clusters exhibit different comorbidity patterns, treatment patterns, clinical presentations, physiological and biological characteristics, and outcomes. A data set of 11,818 inpatients with depression from a large academic medical center-based health system in Chengdu, China was used for analysis. These clusters demonstrated significant differences in underlying comorbidity patterns, biological characteristics, treatment patterns, and clinical outcomes. Specifically, the cardiometabolic system, chronic inflammation, the digestive system, and the neurological system, together with mental and behavioral disorders, may be key pathways linking depression with a wide range of other psychiatric and somatic diseases and conditions. The good performance in validation suggests that combining unsupervised clustering techniques with EMR data may improve clinical depression classification and prognostic certainty. The findings on comorbidity-specific phenotypes and the heterogeneity of laboratory biomarkers among patients with depression provide directions for further study of underlying transdiagnostic mechanisms and treatment strategies at the molecular and psychopharmacological levels.
Author contributions
T.Z. and W.Z. designed the study and wrote the protocol. T.Z. and Y.H. established the database for this study. T.Z. and D.M. analyzed the data. T.Z., D.M. and H.Y. wrote the manuscript. Y.C., M.Y., Y.H., J.X., H.Y. and W.Z. provided critical review and revisions of the manuscript. All authors contributed to and have approved the final manuscript.
Funding
This study was supported by the National Natural Science Foundation of China (Grant Nos. 72474146, 72104160, 81871061); and Department of Science and Technology of Sichuan Province (Grant Nos. 2024NSFSC1078, 2020YFS0582); 1.3.5 projects for disciplines of excellence, West China Hospital, Sichuan University (Grant No. ZYJC21004); and Research Grants Council of Hong Kong (Grant No. N_PolyU590/22).
Data availability
The data used to support the findings of this manuscript are restricted by the West China Hospital, to protect patient privacy and avoid legal and ethical risks. Data are available from West China Hospital for researchers who meet the criteria for access to confidential data.
Code availability
Code is available upon request from the corresponding author (Wei Zhang, email: weizhanghx@163.com).
Competing interests
The authors declare no competing interests.
Ethical approval
The Ethics Committee of the West China Hospital, Sichuan University approved the research. The protection and treatment of patient data in our research comply with the Declaration of Helsinki. All information was de-identified and extracted from WCH’s EMR database by the hospital information technology department; thus, written informed consent from patients was waived.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Ting Zhu, Di Mu.
Contributor Information
Heng-Qing Ye, Email: hq.ye@polyu.edu.hk.
Wei Zhang, Email: weizhanghx@163.com.
Supplementary information
The online version contains supplementary material available at 10.1038/s41398-024-03213-2.
References
- 1. Greenberg PE, Fournier A-A, Sisitsky T, Simes M, Berman R, Koenigsberg SH, et al. The Economic Burden of Adults with Major Depressive Disorder in the United States (2010 and 2018). PharmacoEcon. 2021;39:653–65.
- 2. Kessler RC, Bromet EJ. The Epidemiology of Depression Across Cultures. Annu Rev Public Health. 2013;34:119–38.
- 3. Bromet E, Andrade LH, Hwang I, Sampson NA, Alonso J, De Girolamo G, et al. Cross-national epidemiology of DSM-IV major depressive episode. BMC Med. 2011;9:90.
- 4. World Health Organization. The global burden of disease: 2004 update. 2008;1–146.
- 5. Feczko E, Fair DA. Methods and Challenges for Assessing Heterogeneity. Biol Psychiatry. 2020;88:9–17.
- 6. Shohat S, Amelan A, Shifman S. Convergence and Divergence in the Genetics of Psychiatric Disorders From Pathways to Developmental Stages. Biol Psychiatry. 2021;89:32–40.
- 7. Kendall KM, Rees E, Bracher-Smith M, Legge S, Riglin L, Zammit S, et al. Association of Rare Copy Number Variants With Risk of Depression. JAMA Psychiatry. 2019;76:818.
- 8. Cross-Disorder Group of the Psychiatric Genomics Consortium. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45:984–94.
- 9. Howard DM, Adams MJ, Clarke TK, Hafferty JD, Gibson J, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22:343–52.
- 10. CONVERGE consortium, Cai N, Bigdeli TB, Kretzschmar W, Li Y, Liang J, et al. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature. 2015;523:588–91.
- 11. Ressler KJ, Bradley B, Mercer KB, Deveau TC, Smith AK, Gillespie CF, et al. Polymorphisms in CRHR1 and the serotonin transporter loci. Gene × Gene × Environment interactions on depressive symptoms. Am J Med Genet B Neuropsychiatr Genet. 2010;153B:812–24.
- 12. Popovic D, Ruef A, Dwyer DB, Antonucci LA, Eder J, Sanfelici R, et al. Traces of Trauma: A Multivariate Pattern Analysis of Childhood Trauma, Brain Structure, and Clinical Phenotypes. Biol Psychiatry. 2020;88:829–42.
- 13. Berk M, Williams LJ, Jacka FN, O’Neil A, Pasco JA, Moylan S, et al. So depression is an inflammatory disease, but where does the inflammation come from? BMC Med. 2013;11:200.
- 14. Spellman T, Liston C. Toward Circuit Mechanisms of Pathophysiology in Depression. Am J Psychiatry. 2020;177:381–90.
- 15. Van Loo HM, De Jonge P, Romeijn J-W, Kessler RC, Schoevers RA. Data-driven subtypes of major depressive disorder: a systematic review. BMC Med. 2012;10:156.
- 16. Fried EI, Nesse RM. Depression is not a consistent syndrome: An investigation of unique symptom patterns in the STAR*D study. J Affect Disord. 2015;172:96–102.
- 17. Fried EI, Coomans F, Lorenzo-Luaces L. The 341 737 ways of qualifying for the melancholic specifier. Lancet Psychiatry. 2020;7:479–80.
- 18. Rush AJ. The varied clinical presentations of major depressive disorder. J Clin Psychiatry. 2007;68:4.
- 19. Harald B, Gordon P. Meta-review of depressive subtyping models. J Affect Disord. 2012;139:126–40.
- 20. Uher R, Muthén B, Souery D, Mors O, Jaracz J, Placentino A, et al. Trajectories of change in depression severity during treatment with antidepressants. Psychol Med. 2010;40:1367–77.
- 21. Buch AM, Liston C. Dissecting diagnostic heterogeneity in depression by integrating neuroimaging and genetics. Neuropsychopharmacology. 2021;46:156–75.
- 22. Alonso J, De Jonge P, Lim CCW, Aguilar-Gaxiola S, Bruffaerts R, Caldas-de-Almeida JM, et al. Association between mental disorders and subsequent adult onset asthma. J Psychiatr Res. 2014;59:179–88.
- 23. De Jonge P, Alonso J, Stein DJ, Kiejna A, Aguilar-Gaxiola S, Viana MC, et al. Associations between DSM-IV mental disorders and diabetes mellitus: a role for impulse control disorders and depression. Diabetologia. 2014;57:699–709.
- 24. Wium‐Andersen MK, Wium‐Andersen IK, Jørgensen MB, McGue M, Jørgensen TSH, Christensen K, et al. The association between depressive mood and ischemic heart disease: a twin study. Acta Psychiatr Scand. 2019;140:265–74.
- 25. Seligman F, Nemeroff CB. The interface of depression and cardiovascular disease: therapeutic implications. Ann N Y Acad Sci. 2015;1345:25–35.
- 26. Tang B, Yuan S, Xiong Y, He Q, Larsson SC. Major depressive disorder and cardiometabolic diseases: a bidirectional Mendelian randomisation study. Diabetologia. 2020;63:1305–11.
- 27. Li GH-Y, Cheung C-L, Chung AK-K, Cheung BM-Y, Wong IC-K, Fok MLY, et al. Evaluation of bi-directional causal association between depression and cardiovascular diseases: a Mendelian randomization study. Psychol Med. 2022;52:1765–76.
- 28. Wang S, Mao S, Xiang D, Fang C. Association between depression and the subsequent risk of Parkinson’s disease: A meta-analysis. Prog Neuropsychopharmacol Biol Psychiatry. 2018;86:186–92.
- 29. Kessing LV. Depression and the risk for dementia. Curr Opin Psychiatry. 2012;25:457–61.
- 30. Nemeroff CB. The State of Our Understanding of the Pathophysiology and Optimal Treatment of Depression: Glass Half Full or Half Empty? Am J Psychiatry. 2020;177:671–85.
- 31. Rovner B, Casten R. The epidemiology of major depressive disorder. Evidence-Based Ophthalmology. 2003;4:186–7.
- 32. Han X, Hou C, Yang H, Chen W, Ying Z, Hu Y, et al. Disease trajectories and mortality among individuals diagnosed with depression: a community-based cohort study in UK Biobank. Mol Psychiatry. 2021;26:6736–46.
- 33. Xie C, Xiang S, Shen C, Peng X, Kang J, Li Y, et al. A shared neural basis underlying psychiatric comorbidity. Nat Med. 2023;29:1232–42.
- 34. Lichtenstein P, Yip BH, Björk C, Pawitan Y, Cannon TD, Sullivan PF, et al. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. The Lancet. 2009;373:234–9.
- 35. Acosta MT, Castellanos FX, Bolton KL, Balog JZ, Eagen P, Nee L, et al. Latent Class Subtyping of Attention-Deficit/Hyperactivity Disorder and Comorbid Conditions. J Am Acad Child Adolesc Psychiatry. 2008;47:797–807.
- 36. Smoller JW. The use of electronic health records for psychiatric phenotyping and genomics. Am J Med Genet B Neuropsychiatr Genet. 2018;177:601–12.
- 37. Maglanoc LA, Landrø NI, Jonassen R, Kaufmann T, Córdova-Palomera A, Hilland E, et al. Data-Driven Clustering Reveals a Link Between Symptoms and Functional Brain Connectivity in Depression. Biol Psychiatry Cogn Neurosci Neuroimaging. 2019;4:16–26.
- 38. Xu Z, Wang F, Adekkanattu P, Bose B, Vekaria V, Brandt P, et al. Subphenotyping depression using machine learning and electronic health records. Learn Health Syst. 2020;4:e10241.
- 39. Grant RW, McCloskey J, Hatfield M, Uratsu C, Ralston JD, Bayliss E, et al. Use of Latent Class Analysis and k-Means Clustering to Identify Complex Patient Profiles. JAMA Netw Open. 2020;3:e2029068.
- 40. Inohara T, Shrader P, Pieper K, Blanco RG, Thomas L, Singer DE, et al. Association of Atrial Fibrillation Clinical Phenotypes With Treatment Patterns and Outcomes: A Multicenter Registry Study. JAMA Cardiol. 2018;3:54.
- 41. Inferential Statistics. http://vassarstats.net/textbook/. Accessed 25 Nov 2023.
- 42. Beijers L, Wardenaar KJ, Van Loo HM, Schoevers RA. Data-driven biological subtypes of depression: systematic review of biological approaches to depression subtyping. Mol Psychiatry. 2019;24:888–900.
- 43. Kung B, Chiang M, Perera G, Pritchard M, Stewart R. Identifying subtypes of depression in clinician-annotated text: a retrospective cohort study. Sci Rep. 2021;11:22426.
- 44. Maj M, Stein DJ, Parker G, Zimmerman M, Fava GA, De Hert M, et al. The clinical characterization of the adult patient with depression aimed at personalization of management. World Psychiatry. 2020;19:269–93.
- 45. Zhu T, Jiang J, Hu Y, Zhang W. Individualized prediction of psychiatric readmissions for patients with major depressive disorder: a 10-year retrospective cohort study. Transl Psychiatry. 2022;12:1–18.
- 46. Zhu T, Liu X, Wang J, Kou R, Hu Y, Yuan M, et al. Explainable machine-learning algorithms to differentiate bipolar disorder from major depressive disorder using self-reported symptoms, vital signs, and blood-based markers. Comput Methods Programs Biomed. 2023;240:107723.
- 47. Boutsidis C, Drineas P, Mahoney MW. Unsupervised feature selection for the k-means clustering problem. Adv Neural Info Processing Syst. 2009;22:1–9.
- 48. Huang JZ, Ng MK, Rong H, Li Z. Automated variable weighting in k-means type clustering. IEEE Trans Pattern Anal Mach Intell. 2005;27:657–68.
- 49. Ikotun AM, Ezugwu AE, Abualigah L, Abuhaija B, Heming J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inform Sciences. 2023;622:178–210.
- 50. D’Silva J, Sharma U. Unsupervised Automatic Text Summarization of Konkani Texts using K-means with Elbow Method. Int J Eng Res Technol. 2020;13:2380.
- 51. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.
- 52. Tibshirani R, Walther G, Hastie T. Estimating the Number of Clusters in a Data Set Via the Gap Statistic. J R Stat Soc Ser B Stat Methodol. 2001;63:411–23.
- 53. Wu P, Gifford A, Meng X, Li X, Campbell H, Varley T, et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med Inform. 2019;7:e14325.
- 54. Xia CH, Ma Z, Ciric R, Gu S, Betzel RF, Kaczkurkin AN, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nat Commun. 2018;9:3003.
- 55. Plana-Ripoll O, Pedersen CB, Holtz Y, Benros ME, Dalsgaard S, De Jonge P, et al. Exploring Comorbidity Within Mental Disorders Among a Danish National Population. JAMA Psychiatry. 2019;76:259.
- 56. Mulugeta A, Zhou A, King C, Hyppönen E. Association between major depressive disorder and multiple disease outcomes: a phenome-wide Mendelian randomisation study in the UK Biobank. Mol Psychiatry. 2020;25:1469–76.
- 57. Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29:1221–31.
- 58. Cathomas F, Lin H-Y, Chan KL, Li L, Parise LF, Alvarez J, et al. Circulating myeloid-derived MMP8 in stress susceptibility and depression. Nature. 2024;626:1108–15.
- 59. Pariante CM, Lightman SL. The HPA axis in major depression: classical theories and new developments. Trends Neurosci. 2008;31:464–8.
- 60. Tahsili-Fahadan P, Geocadin RG. Heart–Brain Axis: Effects of Neurologic Injury on Cardiovascular Function. Circ Res. 2017;120:559–72.
- 61. Foster JA, McVey Neufeld K-A. Gut–brain axis: how the microbiome influences anxiety and depression. Trends Neurosci. 2013;36:305–12.
- 62. Kuijpers PMJC, Hamulyak K, Strik JJMH, Wellens HJJ, Honig A. Beta-thromboglobulin and platelet factor 4 levels in post-myocardial infarction patients with major depression. Psychiatry Res. 2002;109:207–10.
- 63. Kiecolt-Glaser JK, Derry HM, Fagundes CP. Inflammation: Depression Fans the Flames and Feasts on the Heat. Am J Psychiatry. 2015;172:1075–91.
- 64. Castro VM, Minnier J, Murphy SN, Kohane I, Churchill SE, Gainer V, et al. Validation of Electronic Health Record Phenotyping of Bipolar Disorder Cases and Controls. Am J Psychiatry. 2015;172:363–72.