Abstract
Objective:
Depression is a prevalent mental health disorder that significantly impacts well-being and quality of life. This study investigates the relationship between depression and cardiovascular function, exploring time-series features derived from electrocardiogram (ECG) and photoplethysmogram (PPG) data as potential biomarkers for depression prescreening.
Approach:
As part of a comprehensive psycholinguistic experiment, we collected data from 60 individuals, including both healthy participants and those with varying levels of depression, assessed using the Beck Depression Inventory-II (BDI-II) and the Patient Health Questionnaire-9 (PHQ-9).
Bimodal features derived from both ECG and PPG data were used to develop machine learning models for depression risk classification, employing classifiers such as Random Forest, XGBoost, Logistic Regression, and Support Vector Machines (SVM). Additionally, regression models were built to predict depression severity based on ECG- and PPG-derived biomarkers.
Main Results:
Key findings indicate that short-term variability (SD1) features in the ECG RR interval, peripheral systolic and diastolic phases from the PPG, and pulse duration significantly differ between healthy individuals and those at risk of depression. SVM achieved the best classification performance, with an AUROC of 0.83 ± 0.11 for BDI-II-based classification and 0.78 ± 0.11 for PHQ-9-based classification. SHAP analysis consistently identified systolic-SD1 and RR-SD1 as key predictors. Regression analysis further supported the role of cardiovascular features in assessing depression severity, yielding a mean absolute error (MAE) of 10.18 for BDI-II and 5.27 for PHQ-9 score regression.
Significance:
This study demonstrates the feasibility of using wearable ECG and PPG technologies for depression prescreening. The findings suggest that cardiac activity-based biomarkers can contribute to the development of cost-effective, objective, and non-invasive tools for mental health assessment, complementing traditional diagnostic methods.
Keywords: Depression, Electrocardiogram, Photoplethysmogram, Cardiovascular timing, Heart rate variability, BDI-II, PHQ-9
1. Introduction
Mental health disorders, such as depression, are a significant global concern, with approximately 5% of adults worldwide suffering from this condition [1]. In the United States alone, it is reported that suicide, which is associated with depression in the majority of cases, claimed 49,476 lives in 2022, equating to one death every 11 minutes [2]. Despite the critical need for effective mental health assessments, traditional methods such as self-reported questionnaires and clinical interviews can potentially be limited by biases, underreporting, and a reliance on conscious self-reflection. These limitations hinder their ability to predict and address depression and suicidality reliably.
To address this challenge, the development of objective, reliable, accessible, and affordable mental health assessment tools has become essential. Such technologies can enable earlier and possibly more reliable detection and intervention, particularly in underserved or resource-limited settings. Advances in biosensing technologies, when combined with artificial intelligence, hold the potential to revolutionize mental health assessments by evaluating autonomic nervous system activity as well as preconscious and subconscious mental processes.
Cardiovascular dynamics have been correlated with mental health disorders, particularly depression, suggesting that cardiovascular signal characteristics may be potentially useful for prescreening depression, both at rest and in response to stress and fatigue [3–5].
Metrics such as heart rate variability (HRV) and heart rate fragmentation (HRF), which quantify variations in the time intervals between heartbeats, have emerged as robust indicators of mental health [6, 7]. Although cardiovascular biomarkers are not specific to depression and may be influenced by cardiovascular conditions, studies have shown that individuals with depression and anxiety exhibit lower HRV compared to healthy controls, reflecting autonomic nervous system dysregulation [8,9]. HRV metrics have also been found to correlate with depression severity and treatment response, highlighting their potential as tools for monitoring therapeutic progress [10].
Previous studies indicate that HRV decreases significantly during acute mental stress, accompanied by a shift toward sympathetic activation [11]. This association has been observed across diverse demographic groups and varying levels of depression severity, reinforcing the potential of HRV and HRF as promising physiological markers of mental health [12]. Despite these advances, the relationship between heart rate dynamics, stress reactivity, and depression remains an area of active investigation [13, 14].
Traditional cardiac monitoring relies on electrocardiogram (ECG) recordings, but advancements in wearable technology have facilitated the use of other modalities, such as photoplethysmogram (PPG), as complementary methods for capturing cardiovascular activity [15, 16]. Together, ECG and PPG provide a complementary view of the heart and vascular system, with ECG capturing electrical activity and PPG measuring peripheral blood flow dynamics. These methods facilitate the extraction of features such as wave morphology, inter-modal timings between the electrical impulses in the ECG, and peripheral systolic/diastolic responses captured by PPG, as well as metrics like pulse arrival time, providing insights into cardiovascular and autonomic function beyond heart rate, HRV, and HRF [17, 18].
Recent studies have investigated cardiovascular data, specifically ECG and PPG, for identifying depression and anxiety [19]. Machine learning models have shown promising outcomes in classifying depressive states [20, 21]. These studies utilized time-frequency analysis and feature extraction techniques alongside neural networks or support vector machines for classification. Additionally, heart rate responses to cognitive load have been reported as potential indicators of depression and anxiety, with strong links identified between heart rate metrics and psychological conditions [22]. These results indicate that ECG and PPG-based techniques could be useful tools for screening and monitoring mental health disorders in clinical settings and wearable healthcare applications.
In this study, we explore the predictive value of ECG and PPG data in prescreening depression and suicidality. The data were collected as part of the Preconscious Signal Compilation for Robust and Individualized Belief Evaluation (PRESCRIBE) project, which explored the relationship between psycholinguistic stimuli (vignettes displayed on a screen) and physiological responses to identify biomarkers for depression and suicidality. The study involved individuals diagnosed with major depressive disorder (MDD) and individuals without a current diagnosis, who underwent extensive psychological prescreening followed by a psycholinguistic experiment with simultaneous multimodal physiological data collection.
This research focuses on leveraging ECG and PPG data from PRESCRIBE to develop accessible, wearable technologies for preliminary mental health assessments outside clinical settings. These tools could support early detection and facilitate timely referrals for individuals at risk of depression. Despite their simplicity and accessibility, we demonstrate that ECG and PPG have strong potential as cost-effective, reliable methods for prescreening mental health conditions outside clinical settings.
In Section 2, an overview of the PRESCRIBE study design is provided, including signal recording procedures, participant demographics, and questionnaire-based depression assessments. Section 3 details our methodology for analyzing the ECG and PPG, emphasizing cardiac activity time intervals and their transformation into bimodal cardiac features for machine learning. In Section 4, we present our findings, statistical analyses, and the machine learning models used to evaluate the predictive accuracy of the cardiac biomarkers for depression. Finally, Sections 5 and 6 discuss the implications of the findings for depression prescreening, their contribution to current research, and future directions for integrating this technology into mental health evaluations.
2. Study Design
PRESCRIBE was conducted under the DARPA Neural Evidence Aggregation Tool (NEAT) program [23, 24], which aimed to transform mental health assessment by integrating advances in neuroscience, biosensing, and artificial intelligence. The PRESCRIBE project was specifically designed to leverage psycholinguistic stimuli and multimodal physiological sensing to detect preconscious processes associated with symptoms of depression and suicidality. This collaborative effort included Charles River Analytics, Tufts University, Georgia Institute of Technology (Georgia Tech), and Emory University. The cardiovascular data used in the current study were collected at the Emory and Georgia Tech sites.
The study was conducted in accordance with the ethical principles of the Declaration of Helsinki. It was approved by the Institutional Review Boards (IRBs) at Emory University under study number STUDY00006938, Georgia Tech under protocol H23151, Tufts University under protocol STUDY00003388, and collectively by the Navy’s Human Research Protection Office (HRPO). All subjects provided written informed consent prior to participation. Participants were recruited under the approval of each of the university’s IRBs and HRPO. For a full overview of the study protocol, refer to [25].
2.1. Participants and Psychological Prescreening
Participants were recruited to Emory University and Georgia Tech through public announcements, including flyers and digital platforms, via a two-step process. Initially, volunteers were remotely evaluated against IRB-approved inclusion and exclusion criteria, and eligible volunteers provided written consent to participate in the study.
Inclusion criteria required participants to be aged 18 to 75 with over 50% exposure to English before age five. They needed to either meet the diagnostic criteria for Major Depressive Disorder (MDD) or be healthy controls with no current psychiatric diagnosis.
Exclusion criteria included positive pregnancy tests or breastfeeding, inadequate English exposure, history of meningitis or traumatic brain injury, significant substance use disorders, head trauma with loss of consciousness over one minute, recent benzodiazepine or opioid use, history of cardiovascular diseases, and several specific psychiatric conditions. Individuals with cognitive impairments and non-English speakers were also excluded.
Eligible participants completed psychological questionnaires remotely (for Emory) or in person (for Georgia Tech), with healthy controls primarily sourced from the local communities surrounding Georgia Tech and Emory University (Midtown Atlanta, GA, USA). MDD participants were recruited from Emory’s psychiatric outpatient clinic. All participants underwent assessments using the Beck Depression Inventory-II (BDI-II) [26], and Patient Health Questionnaire-9 (PHQ-9) [27], to evaluate depressive symptomatology. Emory participants additionally completed a structured evaluation through the Mini-International Neuropsychiatric Interview (MINI) for psychiatric conditions [28], ensuring compliance with inclusion criteria.
After prescreening, participants were scheduled for data collection sessions.
2.2. Experimental Procedure
Tufts University designed a psycholinguistic experiment using PsychoPy [29,30], which Georgia Tech and Emory modified to simultaneously collect multimodal physiological data. During data collection, participants read vignettes on a computer screen, displayed either one word or one sentence at a time. The vignettes varied in predictability and emotional tone. Two types of stimuli were used: self-relevant (SR), exploring mental health beliefs in the first person, and non-self-relevant (NSR), serving as neutral controls in the third person. Neural and physiological responses were time-locked to the onset of a critical word, always the last word of the vignette.
Participants sat in a quiet room wearing the sensor suite described below. Each session began with a short baseline recording (only at Emory) to help participants adjust and capture resting-state data. Trials were organized into eight 40-trial blocks, with short breaks for rest and device recalibration. PsychoPy managed stimulus presentation and response recording. The stimuli included periodic yes-no questions to assess engagement. Using their right hand, participants operated a three-button keypad labeled YES, NO, and GO (for block transitions). Button positions were randomized across subjects to mitigate left-right-button click biases. Participants were instructed to maintain gaze on the screen and minimize movement to improve data quality and reduce errors in eye-tracking and pupillometry.
A multimodal sensor suite captured physiological and neurophysiological signals, including electrocardiogram (ECG), photoplethysmography (PPG), electroencephalogram (EEG), respiration, seismocardiogram (SCG) via triaxial accelerometers, electrodermal activity (EDA), continuous blood pressure, and eye tracking. EEG was recorded using the BioSemi system, eye movements and pupil dilation with the EyeLink 1000 Plus system, and other physiological data with a Biopac MP160 device. ECG was recorded using a three-lead chest configuration with a wireless BioNomadix module (BIOPAC Systems Inc.), with two electrodes across the heart and a reference lead on the hip. PPG data were collected from the ring finger of the left hand using the Berry reusable SpO2 sensor (BerryMedical Inc.), ensuring no interference with keypad use. Both signals were sampled at 2 kHz.
The synchronization of stimulus presentation and physiological data collection was accomplished using precise triggers sent from a computer running PsychoPy to the acquisition systems. This ensured accurate alignment between the timing of stimuli and the recorded physiological signals. Data from the Biopac system were recorded in real-time with AcqKnowledge software and saved for later processing.
Session durations ranged from 74 to 180 minutes, averaging 120±22 minutes, with variations due to preparation time, practice, response speed, and inter-block breaks. Each block lasted 6 to 22 minutes, with an average duration of 11.4±2.4 minutes.
For further details regarding other data modalities considered in related work, see [25].
While the psycholinguistic environment may have influenced physiological responses, for a more objective assessment, this study uses only the BDI-II and PHQ-9 depression scores as ground truth. In terms of predictors, we focus exclusively on the ECG and PPG data collected in PRESCRIBE, selected over other physiological modalities for their non-invasive nature, cost-effectiveness, portability, and compatibility with wearable devices. Together, ECG and PPG provide a complementary view of cardiovascular function, capturing both central cardiac activity and peripheral vascular responses—both known to be altered in depressive states [22]—and are linked to autonomic nervous system dysfunction commonly associated with depression [11,20,21]. Their accessibility also makes them well-suited for out-of-clinic depression prescreening.
Recent studies also support this approach. Wrist-worn PPG devices have been shown to track day-to-day drops in heart rate variability (HRV) that align with degradation of PHQ-9 scores in adults with mild-to-moderate depression [31]. A 2023 network meta-analysis of over 6,000 recordings found that patients with major depressive disorder have significantly lower time- and frequency-domain HRV than healthy controls [32]. A prolonged QT interval on a standard 12-lead ECG independently predicted the co-occurrence of depressive and anxious symptoms in first-episode patients [33]. Finally, [34] showed that a composite Electrical Risk Score—based on QTc, T-peak-to-T-end, heart rate, and frontal QRS-T angle—is elevated in patients with major depressive disorder and correlates with both Hamilton scores and illness duration.
3. Method
3.1. Dataset
Data collected from 60 participants (32 from Emory and 28 from Georgia Tech) were used in this study. The demographics of the study population are summarized in Table 1. The cohort included 29 males (48%) and 31 females (52%). Thirty individuals were between ages 20 and 30.
Table 1:
Demographics of the study participants (N = 60) along with their breakdown by BDI-II and PHQ-9 scores [26, 27].
| Variable | Category | Frequency | Percentage (%) |
|---|---|---|---|
| Gender | Male | 29 | 48 |
| Gender | Female | 31 | 52 |
| Age | 20–30 yrs | 30 | 50 |
| Age | 31–40 yrs | 13 | 22 |
| Age | 41–50 yrs | 9 | 15 |
| Age | 51–70 yrs | 8 | 13 |
| BDI-II Score | Healthy: BDI-II ≤ 13 | 29 | 48 |
| BDI-II Score | Depressed: BDI-II ≥ 14 | 31 | 52 |
| PHQ-9 Score | Healthy: PHQ-9 ≤ 4 | 20 | 33 |
| PHQ-9 Score | Depressed: PHQ-9 ≥ 5 | 40 | 67 |
Participants were grouped by their risk of depression based on BDI-II and PHQ-9 scores using the thresholds defined in Table 2. While various binary and multilabel classification problems can be explored, here, we focused on distinguishing healthy individuals—those with minimal depression (defined as BDI-II ≤ 13 or PHQ-9 ≤ 4)—from those at risk of depression with varying severity levels (mild, moderate, or severe, as listed in Table 2). Accordingly, based on BDI-II scores, n = 29 participants were labeled as healthy and n = 31 as depressed. Using PHQ-9 scores, n = 20 participants were labeled as healthy and n = 40 as depressed.
Table 2:
3.1.1. Concordance between BDI-II and PHQ-9 Scores:
Fig. 1 shows a scatter plot of BDI-II and PHQ-9 scores per subject. Dots closer to the origin (in red) represent healthier individuals, while points farther away (in blue) indicate higher depression scores. Although the Pearson coefficient of 0.90 (p-value ≤ 10⁻⁶) suggests general agreement, discrepancies exist between the two instruments.
Figure 1:

Scatter plot of BDI-II and PHQ-9 scores [26, 27] for 60 participants, color-coded by distance from the origin, with thresholds for healthy and depressed groups as defined in Table 2. The Pearson coefficient is 0.90, the Kendall τ is 0.75, and the Spearman rank coefficient is 0.91.
Given the definition of depression scores in Table 2, we also tested the monotonic relationship between the participants’ BDI-II and PHQ-9 scores. The rationale is that if subject A has a higher or equal BDI-II score compared to subject B, i.e., BDI-II(A) ≥ BDI-II(B), then we expect the same ordering in their PHQ-9 scores, PHQ-9(A) ≥ PHQ-9(B). Such agreement would indicate that the two instruments rank depression severity consistently. To assess this, we calculated Kendall’s and Spearman’s rank correlation coefficients [36]. A Kendall’s τ of 0.75 and a Spearman’s correlation of 0.91 were obtained (both with p-values ≤ 10⁻⁶), confirming a significant yet imperfect positive correlation in rankings between BDI-II and PHQ-9 scores. This finding is consistent with the literature on the sensitivity and specificity of BDI-II [37] and PHQ-9 [38] in identifying depression, which shows that neither instrument is perfect. It underscores the need for multiple screening instruments and the complementary roles of BDI-II and PHQ-9 in evaluating depressive symptoms. The discrepancy also affects the “ground truth” in machine learning analyses, which use these scores to label the subjects.
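The concordance analysis above can be illustrated with a minimal sketch, assuming SciPy is available. The function name `score_concordance` and the demo scores are hypothetical and are not study data; the sketch only shows how the three coefficients are typically obtained.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def score_concordance(bdi, phq):
    """Linear and rank agreement between two questionnaire score vectors."""
    bdi = np.asarray(bdi, dtype=float)
    phq = np.asarray(phq, dtype=float)
    r, p_r = pearsonr(bdi, phq)        # linear association
    rho, p_rho = spearmanr(bdi, phq)   # monotonic association on ranks
    tau, p_tau = kendalltau(bdi, phq)  # concordant vs. discordant pairs (tau-b)
    return {"pearson": (r, p_r), "spearman": (rho, p_rho), "kendall": (tau, p_tau)}


# Hypothetical scores for illustration only; these are not study data.
bdi_demo = [2, 5, 9, 14, 20, 27, 33, 41]
phq_demo = [1, 3, 4, 8, 7, 15, 19, 22]
demo = score_concordance(bdi_demo, phq_demo)
```

In the demo, a single pair of subjects is ranked in opposite order by the two instruments, which is enough to pull Kendall's τ below 1 while the Pearson coefficient stays high.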
3.2. Data Analysis
The ECG and PPG analysis pipeline is summarized in Fig. 2. Each step is detailed below.
Figure 2:

Signal processing block diagram for extracting fiducial points and features from ECG and PPG data
3.2.1. Preprocessing:
Baseline wander in the ECG and PPG channels was corrected using a two-stage filtering approach [39–41], consisting of a moving median filter (1 s for ECG, 4 s for PPG) followed by a moving average filter (0.5 s for ECG, 2 s for PPG). To eliminate 60 Hz power-line interference, a second-order IIR notch filter with a Q-factor of 45 was applied to the ECG using zero-phase forward-backward filtering. The PPG signals did not contain power-line interference, so no notch filtering was required for them.
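The preprocessing steps can be sketched as follows. This is a minimal illustration assuming NumPy and SciPy, not the authors' implementation: the function names, the edge handling (`mode="nearest"`), and the choice to subtract the estimated baseline are assumptions. The window lengths and Q-factor are passed in from the values stated above.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter1d
from scipy.signal import filtfilt, iirnotch


def remove_baseline(x, fs, median_sec, mean_sec):
    """Estimate the baseline with a moving median followed by a moving
    average, then subtract it from the signal."""
    x = np.asarray(x, dtype=float)
    k_med = int(round(median_sec * fs)) | 1  # force an odd window length
    k_avg = int(round(mean_sec * fs)) | 1
    baseline = median_filter(x, size=k_med, mode="nearest")
    baseline = uniform_filter1d(baseline, size=k_avg, mode="nearest")
    return x - baseline


def notch_60hz(x, fs, f0=60.0, q=45.0):
    """Second-order IIR notch applied forward and backward (zero phase)."""
    b, a = iirnotch(f0, q, fs=fs)
    return filtfilt(b, a, x)


# Synthetic check (assumed test signal, not study data): a 5 Hz component
# plus slow drift and 60 Hz interference, sampled at 500 Hz for 10 s.
fs_demo = 500.0
t_demo = np.arange(5000) / fs_demo
noisy = (np.sin(2 * np.pi * 5 * t_demo)
         + 2.0 * np.sin(2 * np.pi * 0.1 * t_demo)
         + 0.5 * np.sin(2 * np.pi * 60 * t_demo))
cleaned = notch_60hz(remove_baseline(noisy, fs_demo, 1.0, 0.5), fs_demo)
```

For the ECG settings described above one would call `remove_baseline(ecg, fs, 1.0, 0.5)` followed by `notch_60hz`, and for the PPG `remove_baseline(ppg, fs, 4.0, 2.0)` without the notch.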
3.2.2. R-peak detection:
To detect ECG R-peaks, we used an efficient R-peak detector from the open-source electrophysiological toolbox (OSET) [42] (peak_det_likelihood_long_recs.m). This R-peak detector, inspired by the Pan-Tompkins algorithm [43], applies a bandpass FIR filter, followed by hyperbolic tangent amplitude saturation to mitigate spike noise and motion artifacts. The power envelope is then computed using a sliding window, and R-peaks are detected as local maxima within adaptive windows based on heart rate, and corrected by multiple rule-based heuristics on the ECG amplitude and rhythm. The function has been computationally optimized for long recordings.
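The main stages of such a detector can be sketched in a simplified form. The code below is not the OSET function; it is an illustrative approximation assuming NumPy and SciPy, with an assumed passband, saturation level, and envelope window, and with a fixed refractory period standing in for the adaptive, heart-rate-dependent windows and rule-based corrections described above.

```python
import numpy as np
from scipy.signal import filtfilt, find_peaks, firwin


def detect_r_peaks(ecg, fs, band=(8.0, 40.0), sat_k=5.0,
                   env_sec=0.1, min_rr_sec=0.3):
    """Simplified R-peak detector: FIR bandpass, tanh saturation,
    power envelope, and local maxima with a fixed refractory period."""
    ecg = np.asarray(ecg, dtype=float)
    taps = firwin(int(0.2 * fs) | 1, band, pass_zero=False, fs=fs)
    x = filtfilt(taps, [1.0], ecg)              # zero-phase bandpass
    scale = sat_k * np.std(x)
    x = scale * np.tanh(x / scale)              # soft-limit spikes and artifacts
    win = max(1, int(env_sec * fs))
    env = np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="same"))
    cand, _ = find_peaks(env, distance=int(min_rr_sec * fs),
                         height=0.3 * np.max(env))
    peaks = []
    for c in cand:
        # Refine each candidate to the largest |ECG| sample nearby.
        lo, hi = max(0, c - win), min(len(ecg), c + win + 1)
        peaks.append(lo + int(np.argmax(np.abs(ecg[lo:hi]))))
    return np.array(peaks, dtype=int)


# Synthetic spike train (assumed test signal): one narrow pulse every 0.8 s
# on top of a slow drift, sampled at 500 Hz.
fs_demo = 500.0
n_demo = np.arange(5000)
true_peaks = 250 + 400 * np.arange(12)
ecg_demo = 0.2 * np.sin(2 * np.pi * 0.3 * n_demo / fs_demo)
for p in true_peaks:
    ecg_demo = ecg_demo + np.exp(-0.5 * ((n_demo - p) / 5.0) ** 2)
found = detect_r_peaks(ecg_demo, fs_demo)
```

The refinement step uses the absolute ECG amplitude, which is adequate for a sketch but would need the polarity and rhythm checks of a production detector on real recordings.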
3.2.3. PPG enhancement:
To reduce artifacts and noise in the PPG, we applied a bandpass filter with a passband of 1–20 Hz, followed by an enhancement step for the dicrotic notch (DN) as proposed in [44], which is provided in OSET [42]. The DN serves as a crucial reference point for identifying peripheral systolic and diastolic events. However, it is not always visible in the raw PPG and requires enhancement. We created a DN enhancer inspired by [45]. This method applies a high-pass filter with a signal-dependent cutoff frequency. To determine the cutoff, we sequentially applied a high-pass filter with a cutoff frequency sweeping from 1 Hz to 2 Hz in 0.2 Hz increments until the output signal contained less than 25% of its total power in frequencies below 2 Hz. To prevent phase distortion and preserve signal fidelity, we employed forward-backward filtering.
While this filter can generally be implemented adaptively and in real-time, our analysis was conducted offline. Therefore, we determined a single high-pass cutoff frequency for each PPG record (participant). This DN enhancement method has demonstrated robust performance, as validated on a large dataset [45]. Additionally, we reviewed each record by visual inspection to ensure its accurate performance for each participant.
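The cutoff sweep can be sketched as below. This is a minimal illustration assuming NumPy and SciPy; the Butterworth filter type, its order, and the Welch-based power estimate are assumptions, since the text specifies only the sweep range, the step, the 25% criterion, and the use of forward-backward filtering.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch


def low_band_fraction(x, fs, f_split=2.0):
    """Fraction of total signal power located below f_split (in Hz)."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), int(8 * fs)))
    return pxx[f < f_split].sum() / pxx.sum()


def enhance_dicrotic_notch(ppg, fs, cutoffs=(1.0, 1.2, 1.4, 1.6, 1.8, 2.0),
                           max_low_frac=0.25, order=2):
    """Sweep the high-pass cutoff upward and stop at the first value for
    which less than max_low_frac of the output power lies below 2 Hz."""
    ppg = np.asarray(ppg, dtype=float)
    for fc in cutoffs:
        sos = butter(order, fc, btype="highpass", fs=fs, output="sos")
        y = sosfiltfilt(sos, ppg)  # forward-backward: zero phase
        if low_band_fraction(y, fs) < max_low_frac:
            break
    return y, fc


# Synthetic pulse-like signal (assumed): 1.2 Hz fundamental plus two harmonics.
fs_demo = 100.0
t_demo = np.arange(6000) / fs_demo
ppg_demo = (np.sin(2 * np.pi * 1.2 * t_demo)
            + 0.5 * np.sin(2 * np.pi * 2.4 * t_demo)
            + 0.25 * np.sin(2 * np.pi * 3.6 * t_demo))
enhanced, fc_used = enhance_dicrotic_notch(ppg_demo, fs_demo)
```

Because a single cutoff is chosen per record, the function returns the selected cutoff alongside the filtered signal so that it can be logged and checked during visual review.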
3.3. ECG-PPG Fiducial Point Extraction
Accurate fiducial point detection is crucial for ECG beat annotation and cardiac time interval extraction. We implemented robust algorithms for this purpose, as detailed below.
3.3.1. ECG Fiducial Point Detection:
Our primary ECG waveforms of interest (the P-wave, QRS complex, and T-wave) are used to derive cardiovascular events, as illustrated in Fig. 3. To extract these waveforms, we implemented an algorithm based on the Latent Structure Influence Model (LSIM) to identify the onset and offset of the P-wave, QRS complex, and T-wave [46,47]. Referred to as the LSIM-FD block in Fig. 2, this algorithm takes the ECG and R-peaks as inputs and outputs the fiducial points. The source code for this LSIM-based fiducial detection algorithm is provided in OSET [42].
Figure 3:

Illustrations of ECG (top, blue) and PPG (bottom, red) fiducial points and their bimodal inter-relationship, along with the fiducial points and the electro-vascular intervals.
After extracting the beat-wise fiducial points, we calculated several key ECG-based intervals: P-wave width, QRS complex width, T-wave width, PQ interval, PT interval, QT interval, and RR interval (see Table 3). These beat-wise parameters resulted in time-series for each parameter, across each data collection session.
Table 3:
ECG- and PPG-based electro-vascular time intervals (top) along with the 52 aggregated features used for building machine learning models (bottom)
| No. | Time Interval | Electro-vascular Interval Description |
|---|---|---|
| 1 | P-wave duration | P-wave onset to offset |
| 2 | QRS complex duration | QRS complex onset to offset |
| 3 | T-wave duration | T-wave onset to offset |
| 4 | PQ interval | P-wave onset to QRS-complex onset |
| 5 | PT interval | P-wave onset to T-wave offset |
| 6 | QT interval | QRS-complex onset to T-wave offset |
| 7 | RR interval | Time between successive R-peaks |
| 8 | Systolic interval | PPG waveform onset to the Dicrotic notch |
| 9 | Diastolic interval | Dicrotic notch to PPG waveform offset |
| 10 | Systolic peak time | PPG waveform onset to systolic peak |
| 11 | Pulse interval | Time between two consecutive PPG onsets |
| 12 | Pulse arrival time foot | Time from ECG R-peak to PPG foot onset |
| 13 | Pulse arrival time peak | Time from ECG R-peak to PPG systolic peak |

| No. | Features | Feature Description |
|---|---|---|
| 1–13 | Interval Median | Median of 13 electro-vascular intervals over 1-min segments |
| 14–26 | Interval SD1 | Poincaré SD1 for 13 intervals over 1-min segments |
| 27–39 | Interval SD2 | Poincaré SD2 for 13 intervals over 1-min segments |
| 40–52 | Interval ρ | Correlation coefficient of successive samples for 13 intervals over 1-min segments |
3.3.2. PPG Fiducial Point Detection:
The following PPG-based fiducial points were extracted: peripheral systolic onset (ON), peripheral systolic peak (SP), and the dicrotic notch (DN), as shown in Fig. 3. Since ECG and PPG data were recorded simultaneously, PPG beats were segmented using the ECG R-peak as a reference (Fig. 2). For fiducial point detection, we adapted methods from PyPPG and PPGFeat [48, 49], modifying them to use the ECG R-peak for beat segmentation. Accordingly, the most dominant PPG peak between consecutive ECG R-peaks was identified as the peripheral systolic peak. The onset was determined as the deepest valley between the R-peak and the systolic peak, and the DN was found as a local minimum between the systolic peak and the next R-peak. Our PPG delineator is implemented as the function fiducial_det_ppg.m in OSET [42].
After identifying the PPG fiducial points, we derived four key beat-wise time interval series: the systolic interval, diastolic interval, pulse interval, and systolic peak time, as illustrated in Fig. 3 and Table 3.
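The R-peak-guided delineation rules can be sketched as follows. This is a simplified illustration assuming NumPy and SciPy, not the OSET `fiducial_det_ppg.m` function. Taking the first local minimum after the systolic peak as the dicrotic notch is an assumption, and the function names and the synthetic waveform are hypothetical.

```python
import numpy as np
from scipy.signal import argrelmin


def ppg_fiducials(ppg, r_peaks):
    """Return one row (R, ON, SP, DN) of sample indices per usable beat."""
    ppg = np.asarray(ppg, dtype=float)
    rows = []
    for r0, r1 in zip(r_peaks[:-1], r_peaks[1:]):
        sp = r0 + int(np.argmax(ppg[r0:r1]))   # dominant peak between R-peaks
        if sp == r0:
            continue                            # no rise found in this beat
        on = r0 + int(np.argmin(ppg[r0:sp]))   # deepest valley before the peak
        minima = argrelmin(ppg[sp:r1])[0]      # local minima after the peak
        if minima.size == 0:
            continue                            # notch not visible
        dn = sp + int(minima[0])               # assumption: first local minimum
        rows.append((r0, on, sp, dn))
    return np.array(rows, dtype=int).reshape(-1, 4)


def ppg_intervals(fid, fs):
    """Beat-wise PPG intervals in seconds. The pulse and diastolic intervals
    pair each row with the next one, so they assume no beats were skipped."""
    on, sp, dn = fid[:, 1], fid[:, 2], fid[:, 3]
    return {
        "systolic": (dn - on) / fs,
        "systolic_peak_time": (sp - on) / fs,
        "pulse": np.diff(on) / fs,
        "diastolic": (on[1:] - dn[:-1]) / fs,
    }


# Synthetic piecewise-linear pulse (assumed shape): onset, systolic peak, and
# notch located 20, 40, and 60 samples after each R-peak, at 100 Hz.
fs_demo = 100.0
n_demo = np.arange(1000)
ppg_demo = np.interp((n_demo - 20) % 100, [0, 20, 40, 50, 100],
                     [0.0, 1.0, 0.4, 0.5, 0.0])
r_demo = np.arange(0, 1000, 100)
fid_demo = ppg_fiducials(ppg_demo, r_demo)
ivl_demo = ppg_intervals(fid_demo, fs_demo)
```

Beats without a detectable notch are dropped rather than interpolated, which keeps the sketch short; a real pipeline would need to track such gaps before forming beat-to-beat differences.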
3.3.3. Bimodal Time Intervals:
With the ECG and PPG fiducial points synchronized based on the R-peak, we derived hybrid bimodal time intervals, such as Pulse Arrival Time (PAT) [17, 18, 50, 51]. PAT represents the time taken for a pulse wave to travel from the heart to a peripheral site, such as the fingertip, where the PPG is recorded. We focused on two specific PAT measurements: the interval between the ECG R-peak and the PPG onset (PATfoot) and the interval between the ECG R-peak and the PPG systolic peak (PATpeak), as shown in Fig. 3 and Table 3.
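Given beat-aligned fiducial indices, the two PAT variants reduce to simple differences. The sketch below assumes NumPy; the function name and the demo indices are hypothetical, and the inputs are assumed to be already paired beat by beat.

```python
import numpy as np


def pulse_arrival_times(r_peaks, onsets, sys_peaks, fs):
    """PAT_foot and PAT_peak in seconds from beat-aligned sample indices."""
    r = np.asarray(r_peaks, dtype=float)
    pat_foot = (np.asarray(onsets, dtype=float) - r) / fs
    pat_peak = (np.asarray(sys_peaks, dtype=float) - r) / fs
    # Sanity check: the pulse foot follows the R-peak and precedes the peak.
    if np.any(pat_foot < 0) or np.any(pat_peak < pat_foot):
        raise ValueError("fiducial points are not ordered as R <= ON <= SP")
    return pat_foot, pat_peak


# Hypothetical indices at the 2 kHz sampling rate used in the study.
pat_foot_demo, pat_peak_demo = pulse_arrival_times(
    r_peaks=[0, 1600, 3200],
    onsets=[400, 2010, 3590],
    sys_peaks=[700, 2300, 3900],
    fs=2000.0,
)
```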
3.4. Feature Dynamics and Poincaré Representations
The detailed fiducial point extraction algorithms provide multiple beat-wise time-series features. These features can be used to derive various time, frequency, and statistical characteristics from the ECG and PPG [52]. Motivated by research on HRV/HRF and their relationship with depression, we focus on simple, interpretable features that can facilitate further exploration of the dynamics of cardiac biomarkers in connection with depression. To achieve this, we emphasize the Poincaré representation of the extracted intervals, highlighting its clinical relevance and simplicity [53, 54].
The Poincaré plot of RR intervals is a graphical representation of heart rate variations, depicting each RR interval versus the previous RR interval [55], also referred to as the phase space in dynamic system analysis. Herein, we extend the concept of Poincaré plot analysis to all time intervals extracted from the ECG and PPG. Fig. 4 shows the Poincaré plots for RR, systolic, and diastolic intervals across all heartbeats for our study participants, grouped by their BDI-II scores. Further details are provided in the Results section.
Figure 4:

Poincaré plots for all (410,000) heartbeats across all subjects. Blue and red contours show the 75th percentiles for healthy and depressed individuals based on BDI-II scores. ‘x’ markers denote the average of each group.
Quantitatively, Poincaré plots can be characterized by their spread along the major and minor axes of the scatter plot. For a time interval of interest $x_n$, denoting each point in the two-dimensional Poincaré plot as $\mathbf{p}_n = [x_n, x_{n+1}]^T$, the covariance matrix of the phase space scatter can be expressed as:

$$\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{p}_n - \mu\mathbf{1})(\mathbf{p}_n - \mu\mathbf{1})^T = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \tag{1}$$

where $\sigma^2$ is the sample variance, $\mu$ is the sample mean, $\mathbf{1} = [1, 1]^T$, and $\rho$ is the correlation coefficient between the successive samples of $x_n$. We can show that the square root of the minor and major eigenvalues of $\mathbf{C}$, denoted by SD1 and SD2, respectively, are:

$$\mathrm{SD1} = \sigma\sqrt{1 - \rho}, \qquad \mathrm{SD2} = \sigma\sqrt{1 + \rho} \tag{2}$$

Due to the way the Poincaré plots are formed, the sample scatters are symmetric around the diagonal (excluding the very first and last sample points); therefore, the major eigenvector of $\mathbf{C}$ aligns with the identity line, and the minor eigenvector is perpendicular to the identity line. SD2 (scatter along the identity line) has been associated with long-term variability, while SD1 (scatter perpendicular to the identity line) has been linked to short-term variability, both of which link to autonomic regulation mechanisms [54]. SD1 and the root mean square of successive differences (RMSSD), which is common in ECG analysis, are equivalent metrics [56]. In Appendix A, we derive the relationship between $\rho$ and the spectrum of the time series $x_n$, showing the relationship between the HRV spectrum and Poincaré plots.
In summary, the ECG and PPG processing yielded 13 beat-wise time intervals: seven ECG-based, four PPG-based, and two hybrid bimodal intervals, represented as beat-wise time series across each session. For subsequent machine learning, each time series was further summarized into four features over 1-minute intervals: the median, SD1, SD2, and ρ, resulting in 52 features per 1-minute period for each subject, as listed at the bottom of Table 3. Note that, from a purely machine learning perspective, it is possible to extract other generic and abstract (less explainable) ECG- and PPG-based features. However, to ensure the results are explainable from a physiological standpoint, we focus on the fiducial points and time intervals listed in Table 3, which have proven physiological value.
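The four per-minute summary features can be sketched as below, assuming NumPy. The function names, the `min_beats` guard, and the use of the unbiased standard deviation are assumptions. SD1 and SD2 are computed by projecting the successive-sample pairs onto the axes perpendicular to and along the identity line, which matches the eigenvalue interpretation given above.

```python
import numpy as np


def poincare_features(x):
    """Median, SD1, SD2, and successive-sample correlation of one series."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]                         # (x_n, x_{n+1}) pairs
    sd1 = np.std((b - a) / np.sqrt(2), ddof=1)   # spread across the identity line
    sd2 = np.std((b + a) / np.sqrt(2), ddof=1)   # spread along the identity line
    rho = np.corrcoef(a, b)[0, 1]
    return {"median": float(np.median(x)), "sd1": float(sd1),
            "sd2": float(sd2), "rho": float(rho)}


def minute_features(beat_times_s, intervals, min_beats=10):
    """Group beat-wise intervals into 1-minute segments and summarize each.
    Segments with fewer than min_beats beats (an assumed guard) are skipped."""
    beat_times_s = np.asarray(beat_times_s, dtype=float)
    intervals = np.asarray(intervals, dtype=float)
    minute = np.floor(beat_times_s / 60.0).astype(int)
    out = {}
    for m in np.unique(minute):
        seg = intervals[minute == m]
        if seg.size >= min_beats:
            out[int(m)] = poincare_features(seg)
    return out


# Synthetic RR series (assumed, not study data): about 160 s of beats.
rng = np.random.default_rng(0)
rr_demo = 0.8 + 0.05 * rng.standard_normal(200)
feat_demo = minute_features(np.cumsum(rr_demo), rr_demo)
alt_demo = poincare_features([1, 2, 1, 2, 1, 2])  # strictly alternating series
```

The alternating series is a useful sanity check: successive samples are perfectly anti-correlated, so all of the scatter lies perpendicular to the identity line and SD2 collapses to zero.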
4. Results
4.1. Feature Visualization
Before presenting the quantitative results, we begin by visualizing some of the key features of healthy and depressed individuals.
4.1.1. Poincaré Plots:
Fig. 4 illustrates Poincaré plots for subjects with a BDI-II score below 14 (red) and those with a score equal to or above 14 (blue) for RR, systolic, and diastolic intervals across 410,000 heartbeats from all participants. The solid blue and red lines represent the 75th percentile contours for each group, while the markers indicate the average values for each group. The contour plots suggest differences between the two groups; however, quantitative analysis is required to confirm the statistical significance of these differences (as presented later). Based on the BDI-II grouping, the average RR interval is 841 ms for the healthy group and 757 ms for the depressed group. The depressed group therefore has a shorter average RR interval, reflecting a higher heart rate.
According to Fig. 4, the SD1 features, which characterize short-term variability in RR, systolic, and diastolic intervals, show lower values for depressed individuals across all three intervals. This indicates a decrease in short-term variability of heart activity in this group, which is consistent with the literature [57,58]. Notably, Fig. 4 aggregates the Poincaré plot across all participants. A statistical analysis of SD1 and SD2 across each subject is required to determine the significance of the results.
4.2. Statistical Significance and Hypothesis Testing
To assess the differences between ECG- and PPG-derived features in healthy and depressed groups, we conducted tests to determine whether these differences are statistically significant. For each subject, we aggregated features by taking the median of 1-minute features before running the statistical tests.
We use nonparametric statistical methods for hypothesis testing, avoiding assumptions about sample distributions. Specifically, we employ the Kolmogorov-Smirnov (KS) test and the Wilcoxon Rank-Sum test [59, 60]. The KS test evaluates overall distribution differences without assuming a specific distribution, while the Wilcoxon Rank-Sum test assesses rank-based median differences.
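For illustration, the two test statistics can be sketched in plain Python. This is a simplified sketch and not the software used in the study: the rank-sum p-value uses the normal approximation without a tie correction, and only the KS statistic (not its p-value) is computed. In practice, library routines such as SciPy's `ks_2samp` and `ranksums` would typically be preferred.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def rank_sum_test(a, b):
    """Wilcoxon rank-sum z statistic and two-sided p-value.

    Uses average ranks for ties and the normal approximation, with no
    tie correction of the variance (a simplification for illustration).
    """
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

Here `a` and `b` would hold the subject-level medians of one feature for the healthy and depressed groups, respectively.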
Table 4 summarizes the statistically significant features identified under an alpha level of 0.01 for groups defined by BDI-II and PHQ-9 scores. The significance of these features is determined using at least one of the two statistical tests.
Table 4:
Kolmogorov-Smirnov (KS) and Wilcoxon Rank-Sum (WRS) tests identify ECG-PPG features that significantly differ (p < 0.01) between healthy and depressed subjects based on BDI-II and PHQ-9 labels. Statistically significant features are marked with asterisks. AUPRC and AUROC indicate the predictive power of each feature for classification using feature thresholding and without any training process (see Section 4.4.1).
| Analysis | Signal | Feature | KS p-value | WRS p-value | AUPRC | AUROC |
|---|---|---|---|---|---|---|
| BDI-II | ECG | RR SD1* | 0.0001 | 0.003 | 0.78 | 0.73 |
| BDI-II | ECG | P-wave ρ | 0.022 | 0.008 | 0.72 | 0.70 |
| BDI-II | PPG | Pulse SD1* | 0.0001 | 0.003 | 0.77 | 0.73 |
| BDI-II | PPG | Diastolic SD1* | 0.0001 | 0.008 | 0.77 | 0.70 |
| BDI-II | PPG | Systolic SD1* | 0.024 | 0.008 | 0.67 | 0.70 |
| BDI-II | PPG | Pulse SD2 | 0.009 | 0.022 | 0.74 | 0.67 |
| PHQ-9 | ECG | RR SD1* | 0.003 | 0.025 | 0.83 | 0.68 |
| PHQ-9 | PPG | Pulse SD1* | 0.003 | 0.021 | 0.83 | 0.68 |
| PHQ-9 | PPG | Diastolic SD1* | 0.003 | 0.038 | 0.83 | 0.67 |
| PHQ-9 | PPG | Systolic SD1* | 0.003 | 0.003 | 0.87 | 0.74 |
| PHQ-9 | PPG | Systolic ρ | 0.006 | 0.014 | 0.82 | 0.70 |
For both BDI-II-based and PHQ-9-based groups, the SD1 features of RR, systolic, pulse duration, and diastolic intervals exhibit significant differences between the healthy and depressed groups. Table 4 also highlights the significance of the correlation coefficient (ρ) for the P-wave duration interval, indicating that ECG-based morphological characteristics, particularly those related to atrial activity, may provide valuable insights into group differences. Additionally, Table 4 shows that the feature ρ for the systolic interval is significant for the PHQ-9-based grouping, suggesting that systolic interval variability also contributes meaningfully to classifying these groups.
4.3. Low-Dimensional Feature Visualization
According to Table 4, six ECG- and PPG-based features were identified as significant for the BDI-II-based grouping and five for the PHQ-9-based grouping. We use Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize these features in two dimensions in an unsupervised manner (without considering labels). PCA aims to minimize reconstruction error in the low-dimensional space, while t-SNE preserves local relationships by computing pairwise similarities and embedding the data into a lower-dimensional space [61], both independent of labels.
Fig. 5a and Fig. 5b display the PCA- and t-SNE-based two-dimensional projections of the average subject-wise features (60 subjects), visually illustrating the separability of healthy and depressed individuals in BDI-II-based projections. Fig. 5c and Fig. 5d show PHQ-9-based projections, where the separation is less distinct, suggesting weaker differentiation. Notably, the two-dimensional projections are not linearly separable, highlighting the need for more advanced machine-learning techniques for classification.
Figure 5:

Two-dimensional PCA projection and t-SNE embedding of the significant features from Table 4, for healthy (red) and depressed (blue) groups based on BDI-II and PHQ-9 scores.
4.4. Healthy vs Depressed Classification
To assess the discriminative capability of the features identified in Table 4, we study various machine learning models to distinguish between healthy and depressed groups, as well as their levels of depression severity, based on BDI-II and PHQ-9 scores outlined in Table 1.
4.4.1. Basic Feature Thresholding:
As a preliminary attempt, we use basic feature thresholding on the features listed in Table 4 to classify healthy versus depressed individuals. While this is a basic approach, it requires no training and provides insights into the usefulness of each individual feature and their ranking [62,63], setting a baseline for comparison with more advanced machine learning models. The procedure is similar to a standard detection problem: for each individual, we average their ECG/PPG-based features across their entire record. Next, we sweep a threshold ranging from the minimum to the maximum of each feature and associate lower/higher values with the healthy/depressed groups. At each decision level, we count the correctly assigned healthy and depressed labels. This provides us with data points for standard receiver operating characteristic (ROC) and Precision-Recall (PR) curves for each individual feature [64]. For reference, the resulting area under the ROC curve (AUROC) and area under the PR curve (AUPRC) for each individual feature is reported in the last two columns of Table 4. The corresponding ROC and PR curves are also illustrated with faint colors in Fig. 6. Accordingly, RR-SD1 yields the highest AUROC of 0.73 for BDI-II-based grouping, while Systolic-SD1 achieves the highest AUROC of 0.74 for the PHQ-9-based grouping. These results set baselines for comparison of more advanced machine learning models that involve training.
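The threshold sweep described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that values at or above the threshold are assigned to the depressed group, so a feature that is lower in depression would need to be negated first.

```python
def roc_points(healthy, depressed):
    """ROC points from sweeping a threshold over one subject-level feature.

    Values >= threshold are labeled 'depressed' (direction is an assumption).
    """
    thresholds = sorted(set(healthy) | set(depressed)) + [float("inf")]
    points = []
    for t in thresholds:
        tpr = sum(v >= t for v in depressed) / len(depressed)  # sensitivity
        fpr = sum(v >= t for v in healthy) / len(healthy)      # 1 - specificity
        points.append((fpr, tpr))
    return sorted(points)

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

No model is fitted at any point, which is why the resulting AUROC can serve as a training-free baseline for each individual feature.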
Figure 6:

ROC and PR curves for significant features (thin lines) based on BDI-II and PHQ-9 groups from Table 4, along with the average ROC and PR curves of various classifiers utilizing these features. The corresponding average operating points at a 75% sensitivity threshold for the training set are indicated by markers of the same color with black edges.
4.4.2. Classification Results:
Next, we test standard classification schemes involving training and validation, including Random Forest (RF), XGBoost (XGB), Logistic Regression (LR), and Support Vector Machine (SVM). We apply a stratified subject-wise 5-fold cross-validation procedure with depressed-healthy stratification based on BDI-II and PHQ-9 scores to maintain consistent healthy-to-depressed ratios in both training and test folds while ensuring that no subject appears in both sets.
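A subject-wise stratified split of this kind can be sketched as below. This is an illustrative helper (the name `stratified_subject_folds` is hypothetical) rather than the study's implementation; scikit-learn's `StratifiedGroupKFold` offers comparable off-the-shelf functionality.

```python
import random
from collections import defaultdict

def stratified_subject_folds(subject_labels, n_folds=5, seed=0):
    """Assign each subject to exactly one test fold, balancing labels.

    subject_labels: dict mapping subject id -> 0 (healthy) or 1 (depressed).
    Returns a list of (train_subjects, test_subjects) pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_label):
        subjects = by_label[label]
        rng.shuffle(subjects)
        # deal subjects of this class round-robin so each fold gets its share
        for k, subject in enumerate(subjects):
            folds[(k + offset) % n_folds].append(subject)
        offset += len(subjects)
    splits = []
    for test in folds:
        test_set = set(test)
        train = [s for s in subject_labels if s not in test_set]
        splits.append((train, sorted(test)))
    return splits
```

Because the split is made over subjects, all 1-minute feature vectors of a subject fall on the same side of each split.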
For each feature, values of the 1-minute features falling in the lower or upper 5% tails are clipped to the corresponding quantile (the 5th or 95th percentile) to mitigate the effects of outliers. A key challenge is the variable length of records (i.e., the number of 1-minute features per subject). To address this, during training, we use a balanced sampling approach, randomly subsampling 1-minute features per subject to match the subject with the fewest available samples, ensuring equal representation of subjects across the entire data collection sessions in the training set. This random selection procedure is repeated 10 times, which ensures with high probability that all 1-minute segments of all subjects are used for training and testing. Considering the duration of the sessions (ranging from 74 to 180 minutes), for the subject with the shortest session length, all 1-minute segments are used across the 10 random repetitions (in the folds in which the subject is in the training set). For the subject with the longest session length of 180 minutes, the probability of a given 1-minute segment not being randomly selected in a single repetition is approximately 0.58, and the probability of it not being selected in any of the 10 random repetitions is $0.58^{10} \approx 4.3 \times 10^{-3}$, which is negligible. This probability is even smaller for the shorter sessions. Therefore, using this subsampling approach, with a probability close to one, all 1-minute segments of all subjects appear in both the training and test procedures, while each subject has an equal representation in the training set.
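The balanced subsampling step can be sketched as follows. The function names are hypothetical, and `miss_probability` simply reproduces the arithmetic quoted in the text under the assumption that the repetitions are independent.

```python
import random

def balanced_subsample(segments_by_subject, seed=0):
    """Draw the same number of 1-minute feature rows from every subject.

    segments_by_subject: dict mapping subject id -> list of feature rows.
    The per-subject count equals that of the subject with the fewest rows.
    """
    rng = random.Random(seed)
    n_keep = min(len(rows) for rows in segments_by_subject.values())
    return {
        subject: rng.sample(rows, n_keep)  # sampling without replacement
        for subject, rows in segments_by_subject.items()
    }

def miss_probability(p_miss_once, repetitions=10):
    """Chance that a given segment is left out of every repetition,
    assuming independent draws across repetitions."""
    return p_miss_once ** repetitions
```

Calling `balanced_subsample` with a different `seed` for each of the 10 repetitions yields a different random selection of segments for the longer sessions, while the shortest session always contributes all of its segments.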
Each classification model undergoes hyperparameter optimization using nested stratified group 4-fold cross-validation on the training set, allocating 25% of the training set for nested validation and 75% for nested training. A grid search over the set of hyperparameters listed in Table 5, with stratified subject-wise cross-validation, maximizes the AUROC score while preventing overfitting.
Table 5:
Hyperparameter search space for optimizing machine learning models.
| Model | Hyperparameter | Search Space |
|---|---|---|
| LR | C (inverse regularization strength) | {0.001, 0.01, 0.1, 1, 10, 100} |
| SVM | Kernel type | {‘linear’, ‘rbf’} |
| SVM | C (inverse regularization strength) | {0.001, 0.01, 0.1, 1, 10, 100} |
| XGB | Number of trees | {10, 50, 100, 200} |
| XGB | Maximum depth | {3, 5, 7, 10} |
| XGB | Observation subsampling | {0.5, 1} |
| XGB | Feature subsampling | {0.5, 1} |
| RF | Number of trees | {10, 50, 100, 200} |
| RF | Maximum depth | {‘None’, 5, 10, 20} |
The cross-validation process is repeated 10 times with different random seeds to mitigate any biases due to initial conditions across training and test folds. Performance metrics are calculated on the test fold for each repetition, with means and standard deviations reported across all ten repetitions. The average ROC and PR curves are generated using the test set for all classifiers.
Fig. 6a and Fig. 6b show the overlaid ROC curves for all four models in BDI-II and PHQ-9 classification. For BDI-II-based classification, tree-based models (Random Forest and XGBoost) and SVM perform robustly, each achieving an average AUROC of at least 0.81, while Logistic Regression (as a Generalized Linear Model) performs slightly lower but remains acceptable. This suggests that both linear and non-linear classifiers can effectively distinguish between groups, with non-linear methods offering modest advantages.
For PHQ-9-based results, classification performance is generally lower in terms of AUROC, with SVM attaining the highest AUROC of 0.78. Fig. 6c and Fig. 6d further illustrate the PR curves, highlighting SVM’s strong performance, while LR’s effectiveness declines as recall increases.
Table 6 presents the classification performance metrics for BDI-II and PHQ-9 scores, including AUROC, AUPRC, accuracy, sensitivity, specificity, and F1-score. The operating points are selected to achieve 75% sensitivity on the training ROC plots, in accordance with the performance milestones of the PRESCRIBE project.
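Selecting an operating point on the training set and applying it to held-out data can be sketched as below. This is an illustrative sketch with hypothetical function names; it assumes that classifier scores increase with depression risk and that label 1 denotes the depressed group.

```python
import math

def threshold_for_sensitivity(train_scores, train_labels, target=0.75):
    """Largest score threshold whose training sensitivity reaches the target."""
    positives = sorted(
        (s for s, y in zip(train_scores, train_labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("no depressed subjects in the training data")
    # number of depressed examples that must score at or above the threshold
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold mean 'depressed'.

    Assumes both classes are present in `labels`.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg
```

Since the threshold is fixed on the training data, the sensitivity measured on the test fold can deviate from the 75% target, which is consistent with the spread of test sensitivities reported in Table 6.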
Table 6:
Classification performances for various models using BDI-II and PHQ-9 scores, reported at a sensitivity threshold of 75% (on the training set). The best-performing models are highlighted in bold.
| Analysis | Model | AUROC | AUPRC | Accuracy | Sensitivity | Specificity | F1-score |
|---|---|---|---|---|---|---|---|
| BDI-II | LR | 0.72 ± 0.15 | 0.80 ± 0.13 | 0.64 ± 0.12 | 0.77 ± 0.16 | 0.52 ± 0.24 | 0.61 ± 0.13 |
| BDI-II | SVM | 0.83 ± 0.11 | 0.86 ± 0.11 | 0.72 ± 0.11 | 0.76 ± 0.18 | 0.67 ± 0.23 | 0.69 ± 0.13 |
| BDI-II | XGB | 0.81 ± 0.12 | 0.81 ± 0.14 | 0.73 ± 0.09 | 0.78 ± 0.19 | 0.68 ± 0.18 | 0.71 ± 0.10 |
| BDI-II | RF | 0.81 ± 0.12 | 0.81 ± 0.15 | 0.72 ± 0.10 | 0.78 ± 0.18 | 0.67 ± 0.21 | 0.70 ± 0.11 |
| PHQ-9 | LR | 0.67 ± 0.14 | 0.85 ± 0.10 | 0.62 ± 0.13 | 0.74 ± 0.18 | 0.39 ± 0.29 | 0.53 ± 0.14 |
| PHQ-9 | SVM | 0.78 ± 0.11 | 0.89 ± 0.08 | 0.72 ± 0.13 | 0.78 ± 0.14 | 0.64 ± 0.27 | 0.68 ± 0.14 |
| PHQ-9 | XGB | 0.74 ± 0.12 | 0.87 ± 0.09 | 0.72 ± 0.10 | 0.80 ± 0.15 | 0.55 ± 0.24 | 0.65 ± 0.11 |
| PHQ-9 | RF | 0.76 ± 0.12 | 0.87 ± 0.09 | 0.73 ± 0.10 | 0.80 ± 0.14 | 0.59 ± 0.27 | 0.67 ± 0.13 |
For BDI-II scores, SVM achieved the highest AUROC and AUPRC, at 0.83 ± 0.11 and 0.86 ± 0.11, respectively. XGB demonstrated the highest accuracy (0.73 ± 0.09), sensitivity (0.78 ± 0.19), specificity (0.68 ± 0.18), and F1-score (0.71 ± 0.10) at the chosen operating point. XGB and RF had identical performance in terms of AUROC (0.81 ± 0.12), both outperforming LR across all metrics. For PHQ-9 scores, SVM again achieved the highest AUROC (0.78 ± 0.11) and AUPRC (0.89 ± 0.08), followed closely by RF, which had an AUROC of 0.76 ± 0.12 and the highest accuracy (0.73 ± 0.10). LR showed notably lower performance across all metrics compared to the other models. As noted, the target performance for the PRESCRIBE project was to achieve a sensitivity of 75%. As with any imperfect classifier, there is a tradeoff between sensitivity and specificity [65], so the classification results will change if a different operating point is selected, as seen in Fig. 6.
The substantial performance gap between all classifiers and the random chance baseline confirms the discriminative power of our features, though this gap is more pronounced for BDI-II than PHQ-9. A notable trend across all models is the higher AUPRC but lower specificity for PHQ-9 classification compared to BDI-II, with sensitivity also higher for all models except LR. This suggests that models trained on PHQ-9 scores are more effective at identifying depression cases but also more prone to false positives. This difference could be associated with the varying focus and structure of the BDI-II and PHQ-9 questionnaires, with BDI-II potentially capturing depression aspects more strongly reflected in cardiovascular measures. The F1-score further highlights this gap; for instance, XGB’s F1-score decreases from 0.71 ± 0.10 for BDI-II to 0.65 ± 0.11 for PHQ-9. This decline suggests that PHQ-9-based classification models may struggle more with false positives or negatives, resulting in a less balanced trade-off between precision and recall.
4.4.3. Classification Feature Importance:
We conducted SHAP (SHapley Additive exPlanations) analysis to rank feature importance across different classifiers for both BDI-II- and PHQ-9-based depression assessments.
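To make the underlying attribution concrete, exact Shapley values can be computed by subset enumeration when only a handful of features are involved. The sketch below is not the SHAP library and not the procedure used in the study: it assumes a simple value function in which features outside a coalition are replaced by a fixed baseline (for example, the feature means), and the function name `shapley_values` is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    predict: function taking a list of feature values and returning a number.
    Features absent from a coalition are set to their baseline value
    (an assumed value function, chosen for simplicity).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi
```

For a linear model this reduces to the weight of each feature times its deviation from the baseline, and the attributions always sum to the difference between the prediction and the baseline prediction. The cost grows exponentially with the number of features, so this enumeration is only practical for small feature sets such as the five or six features in Table 4.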
For the BDI-II-based classification task, the SVM with an RBF kernel, which achieved the highest AUROC, identified Systolic-SD1 and P-wave ρ as the most influential features.
A similar pattern is observed in the PHQ-9 classification results. Systolic-SD1 remained the most prominent feature in the RBF-based SVM model, which achieved both the highest AUROC and F1-score. This consistency across both depression metrics (BDI-II and PHQ-9) reinforces the reliability of these features as potential biomarkers for depression.
SHAP analysis further suggests that higher values of these key features (shown in red in Fig. 7) generally correspond to an increased likelihood of depression classification, particularly evident in the broader distribution patterns of systolic-SD1 and RR-SD1. This aligns with the preceding discriminative power analyses, where both RR-SD1 and systolic-SD1 demonstrated strong statistical significance in distinguishing between depressed and non-depressed individuals.
These findings suggest that a consistent subset of heart rate variability and heart rate dynamics parameters—particularly systolic-SD1, RR-SD1, pulse-SD1, and P-wave ρ—serve as reliable indicators of depression across different classification approaches and assessment metrics.
In Appendix B, we further investigate the effectiveness of regression models in predicting depression severity scores (BDI-II and PHQ-9) from ECG- and PPG-based markers, where Random Forest demonstrates the best overall performance, though all models exhibit a tendency toward regression to the mean [66].
5. Discussion
The results of this study provide compelling insights into the relationship between cardiovascular activity-based features extracted from ECG-PPG and depression, using BDI-II and PHQ-9 scores to label the participants. This aligns with recent research using independently collected datasets, which has demonstrated the feasibility of using ECG and PPG data to classify individuals with MDD [20,21], identify depression based on ECG [67], or associate depression with metabolic alterations (which, in turn, affect cardiovascular function) [68]. Additionally, other lines of research have explored the genetic basis of depression, further underscoring its multifactorial nature [69].
For comparison, [21] reported an accuracy of 85.7% in classifying healthy versus MDD participants using time-frequency features and LSTM classifiers. However, their MDD group had higher PHQ-9 scores (average 19) than our depressed group (average 12.8), and this baseline difference may explain the performance difference between clinical MDD classification and our prescreening approach. Another study extracted more abstract features such as statistical entropy and coherence from RR, respiratory, and PAT intervals, generating 20 features for healthy-MDD classification using random forest and logistic regression [20]. Herein, we focused on the prescreening capacity of ECG and PPG, using physiologically interpretable features that contribute to healthy-depressed classification. We also employed subject-wise evaluation to ensure real-world generalizability.
The statistical analysis and feature visualization of our results highlight several key findings discussed below.
The Poincaré plots in Fig. 4 expand traditional RR interval analysis to include systolic and diastolic intervals, showing significant differences between healthy (BDI-II < 14) and depressed (BDI-II ≥ 14) groups. Notably, the group with depression exhibited a lower average RR interval and a reduced phase-space scatter width (lower SD1), which indicates a higher resting heart rate and decreased HRV. This aligns with previous studies that have explored the relationship between depression and autonomic dysregulation [70].
Based on Table 4, the significance of SD1 across various heart rate intervals in distinguishing between healthy and depressed individuals underscores the importance of short-term HRV as a potential biomarker for depression. This aligns with existing research on autonomic nervous system dysfunction in depression, particularly reduced parasympathetic activity [70]. The consistent identification of SD1 features across both BDI-II and PHQ-9 groupings further supports the reliability of this metric as a robust indicator of depressive states, independent of the specific assessment tool used. Additionally, the ρ of P-wave in BDI-II analysis suggests that depression may impact (or be correlated with) cardiac electrical conduction patterns.
PCA and t-SNE visualizations showed clear separation between healthy and depressed groups, especially with BDI-II scores (Fig. 5a, Fig. 5b), reinforcing the discriminative power of cardiac features. In contrast, PHQ-9 scores showed less distinct separation (Fig. 5c, Fig. 5d), aligning with classification results (Table 6) and suggesting a stronger link between BDI-II scores and cardiovascular activity.
The SHAP value analysis highlighted key features in classification. Systolic-SD1 emerged as the most important feature, particularly for SVM, in both BDI-II and PHQ-9-based classifications, supporting its potential role in depression screening and aligning with literature linking altered cardiovascular dynamics to depression [71]. RR-SD1 and Pulse-SD1 also consistently ranked as important in SVM for both classification models, in line with research associating short-term HRV measures, such as SD1, with parasympathetic nervous system activity, which is often disrupted in depression [70].
The stronger association of cardiovascular features with depression symptoms measured by BDI-II, compared to PHQ-9, suggests that BDI-II may be more sensitive to the physiological aspects of depression, which aligns with previous studies [72].
The regression analysis presented in Appendix B highlights the potential of cardiovascular features as indicators of depression severity. However, severity regression is generally more challenging than the classification problem and requires further investigation on a larger population.
5.1. Limitations of the Study and Future Work
While the research findings highlight the link between cardiovascular health and mental well-being, there are limitations that require further investigation in future work.
This study focused solely on cardiovascular-related modalities—ECG and PPG. Future research should incorporate additional modalities from the PRESCRIBE study, particularly EEG and its N400 responses to psycholinguistic stimuli. PRESCRIBE emphasized the diversity of the sensor suite and the interactions between different modalities and the stimuli. Future studies may focus on a reduced set of wearable sensors, allowing for a larger and more diverse participant pool across varying levels of health and depression.
The data collection sessions in PRESCRIBE lasted a few hours. Hypothetically, patterns of fatigue and stress, which have proven impacts on cardiac biomarkers such as the QT interval, may differ between healthy and depressed individuals. While our features were aggregated across the entire data collection session, in future work, we may study and model the temporal patterns of cardiac and non-cardiac modalities used in PRESCRIBE across healthy and depressed individuals.
The age distribution of the PRESCRIBE study participants was skewed towards younger adults (with 50% of the participants aged between 20 and 30). To ensure the generalization of the proposed framework, future studies should be conducted on larger and more diverse age groups.
While our models did not explicitly incorporate users’ interactions and responses to the psycholinguistic stimuli, it remains unclear whether the observed results were entirely independent of the experimental context. Factors such as the experimental ambiance—including potential stress, cognitive load, and fatigue—may have influenced participants’ physiological responses. This raises questions about the generalizability of our findings beyond the controlled laboratory setting. Future research could address this question by incorporating sham-like experimental scenarios, where control groups are exposed to generic, neutral vignettes rather than depression-relevant stimuli. This would help disentangle the effects of the experimental setup from intrinsic physiological patterns associated with depression.
The inherent mismatches between BDI-II and PHQ-9 scores highlight the need for a more objective depression assessment. This mismatch also impacts the performance of machine learning models trained on these scores as labels. In future research, more specific tests, such as the MINI (which was only conducted at Emory in our study) or a psychologist’s assessment, may be used to adjudicate the disparities between BDI-II- and PHQ-9-based labels.
Importantly, in this work, we emphasized the “prescreening potential” of cardiac modalities for depression rather than their use as a definitive screening tool. We envision that, using the findings of this research, we can develop a mental health prescreening technology that enables individuals to be accurately prescreened remotely in ambulatory settings (e.g., at home). This technology could use a wearable set of sensors alongside an online questionnaire conducted on mobile phones or tablets to refer individuals to psychiatric clinics for a more accurate diagnosis or for further evaluation using additional physiological measurements or psychiatric instruments. To accomplish this, the generalizability of our machine learning models should be further examined using commercial-grade ECG and PPG wearable sensors.
Although our recruitment criteria excluded individuals with a history of cardiovascular disease, cardiovascular biomarkers of depression can be generally confounded with other cardiac conditions. Further investigation is needed to assess the specificity of these biomarkers against other cardiovascular conditions or clinical conditions such as anxiety, ensuring the generalizability of our findings.
While many additional features could be extracted from ECG and PPG signals, the current study focused on standard single and bimodal physiological time intervals—such as QT, RR, systolic, and diastolic durations—that have established clinical and psychological relevance to mental states. This hypothesis-driven approach was adopted to ensure electrophysiological explainability and generalizability, rather than pursuing a purely machine learning-driven strategy aimed at optimizing performance metrics. Given the relatively small sample size, avoiding high-dimensional or arbitrarily defined features also helped reduce the risk of overfitting and prevented information leakage between training and testing sets. Future work may expand this approach to explore a broader range of features and even deep learning models that operate directly on time-series data.
In the current study, due to the limited number of participants, we did not perform a sex-specific analysis, as the sample size was inadequate for making statistically significant conclusions regarding the impact of biological sex on the performance of the models. However, recent studies have reported sex-specific differences in ECG characteristics [73–75], suggesting that biological sex may serve as an additional input feature for depression detection/scoring classifiers. We acknowledge this as a limitation of our study, as the current findings cannot be considered generalizable without accounting for potential sex-based variations. Future research involving larger and more balanced cohorts will be essential to develop and validate sex-specific models or to incorporate biological sex as a feature for these models.
6. Conclusion
This study demonstrates the feasibility of using cost-effective, accessible ECG and PPG technologies for preliminary depression prescreening through heart activity-based data collection modalities in a psycholinguistic experiment. Key predictors such as Systolic-SD1, Pulse-SD1, and RR-SD1 are consistently linked to depression, aligning with existing research on autonomic dysfunction. Integrating these physiological markers into mental health assessments could enhance early detection and monitoring, particularly in non-clinical settings.
Future research may incorporate additional physiological modalities, expand participant diversity, and investigate the specificity of these biomarkers between depression and cardiovascular diseases.
Figure 7:

SHAP summary plots of one-minute features showing feature importance and impact for the SVM classifier using BDI-II- and PHQ-9-based classification results. Colors indicate feature values (red: high, blue: low), and SHAP values represent the impact on model output, with positive values indicating an increased likelihood of depression classification.
Acknowledgment
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract N6600123C4002. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or NIWC Pacific. R. Sameni also acknowledges support from the American Heart Association through the Innovative Project Award #23IPA1054351 on Developing Multimodal Cardiac Biomarkers for Cardiovascular-related Health Assessment.
Appendix A. Spectral Interpretation of Poincaré Plot Features
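In Section 3.4, the general properties of Poincaré plots were discussed for a random process $x_n$, corresponding to any of the discussed cardiovascular time-intervals. Further insights can be gained from a spectral perspective. Defining the zero-mean version of $x_n$ as $\tilde{x}_n = x_n - \mathbb{E}\{x_n\}$, a first-order autoregressive model for $\tilde{x}_n$ can be expressed as $\tilde{x}_n = \rho\,\tilde{x}_{n-1} + w_n$, where $w_n$ is zero-mean process noise independent from $\tilde{x}_{n-1}$ with variance $\sigma_w^2 = (1 - \rho^2)\,\sigma_x^2$, $\sigma_x^2$ is the variance of $x_n$, and $\rho$ is the correlation coefficient between $x_n$ and $x_{n-1}$ (as defined in Section 3.4). Therefore, the autocorrelation function of $\tilde{x}_n$ at lag $k$ fulfills $r_x(k) = \rho\, r_x(k-1)$ for $k \geq 1$, or $r_x(k) = \sigma_x^2\, \rho^{|k|}$. This relationship can also be shown in the spectral domain:

$$S_x(\omega) = \sum_{k=-\infty}^{\infty} r_x(k)\, e^{-j\omega k} = \frac{\sigma_x^2\,(1 - \rho^2)}{1 - 2\rho\cos\omega + \rho^2} \qquad \text{(A.1)}$$

Therefore, the Poincaré features SD1 and SD2, derived in (2), are closely related to ρ and the spectral characteristics of $x_n$. Higher correlation coefficients result in a narrower spectrum.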
Appendix B. Depression Severity Prediction
We briefly investigated whether machine learning models could predict depression severity scores (BDI-II and PHQ-9) from ECG- and PPG-based markers. The regression pipeline followed the classification pipeline detailed in Section 4, with the key difference being the optimization criterion in the grid search: instead of maximizing AUROC, we minimized the mean squared error for the validation fold.
Table B1:
Regression performance across different models for BDI-II and PHQ-9 scores
| Analysis | Model | RMSE | MAE | Corr |
|---|---|---|---|---|
| BDI-II | LR | 13.39 ± 1.05 | 11.56 ± 0.87 | 0.32 ± 0.19 |
| BDI-II | SVM | 13.17 ± 1.98 | 11.19 ± 1.83 | 0.50 ± 0.13 |
| BDI-II | XGB | 12.25 ± 1.55 | 10.29 ± 1.43 | 0.53 ± 0.19 |
| BDI-II | RF | 12.24 ± 1.51 | 10.18 ± 1.38 | 0.53 ± 0.17 |
| PHQ-9 | LR | 6.56 ± 1.07 | 5.61 ± 0.99 | 0.39 ± 0.20 |
| PHQ-9 | SVM | 6.45 ± 1.56 | 5.38 ± 1.58 | 0.48 ± 0.24 |
| PHQ-9 | XGB | 6.27 ± 1.45 | 5.31 ± 1.30 | 0.47 ± 0.22 |
| PHQ-9 | RF | 6.31 ± 1.42 | 5.27 ± 1.24 | 0.47 ± 0.21 |
Table B1 presents regression performance metrics, including root mean square error (RMSE), mean absolute error (MAE), and the Pearson correlation coefficient across all models. For BDI-II prediction, Random Forest demonstrated the best performance in both error metrics and correlation. It achieved an RMSE of 12.24 ± 1.51, an MAE of 10.18±1.38, and a correlation coefficient of 0.53±0.17. We also tested other regression models. XGBoost showed similar performance, with only marginally higher errors than Random Forest. SVM exhibited lower performance.
For PHQ-9 prediction, while the absolute error metrics were lower than for BDI-II, this primarily reflected differences in scoring scales. Random Forest achieved an RMSE of 6.31±1.42, an MAE of 5.27±1.24, and a correlation coefficient of 0.47±0.21. Notably, correlation values were lower for PHQ-9 prediction compared to BDI-II.
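For reference, the three reported metrics can be computed as in the following minimal sketch (the function name `regression_metrics` is hypothetical). Inputs are the true and predicted questionnaire scores for the subjects in a test fold.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, Pearson correlation) for paired score lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    corr = cov / math.sqrt(var_t * var_p)  # undefined if either list is constant
    return rmse, mae, corr
```

Because RMSE and MAE are expressed in questionnaire points, they are not directly comparable between BDI-II and PHQ-9, whereas the correlation coefficient is scale-free.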
Fig. B1 illustrates scatter plots of predicted versus true BDI-II and PHQ-9 scores for Random Forest as the best model, with the identity line representing perfect prediction and dashed lines marking clinical healthy-depressed thresholds. These plots could be interpreted similarly to confusion matrices: points in the lower left and upper right quadrants represented correct classifications of healthy and depressed states, respectively, while points in the upper left and lower right quadrants indicated false positives and false negatives. Across both BDI-II and PHQ-9 predictions, models tended to underestimate high scores and overestimate low scores, suggesting a possible “regression-to-the-mean” effect [66], which was more pronounced in PHQ-9 predictions. This pattern indicated potential limitations in capturing the full range of depression severity, particularly for extreme cases, and warrants further investigation.
Figure B1:

Regression results for predicting BDI-II and PHQ-9 scores using Random Forest using ECG-PPG features. The identity line represents perfect predictions; dashed lines indicate thresholds from Table 1.
References
- [1].World Health Organization (WHO), “Depression,” https://www.who.int/news-room/fact-sheets/detail/depression, 2024, accessed January 16, 2025.
- [2].Centers for Disease Control and Prevention (CDC), “Suicide facts,” https://www.cdc.gov/suicide/facts/index.html, 2024, last modified July 23, 2024; accessed January 16, 2025.
- [3].Cheng Y-C, Su M-I, Liu C-W, Huang Y-C, and Huang W-L, “Heart rate variability in patients with anxiety disorders: A systematic review and meta-analysis,” Psychiatry and Clinical Neurosciences, vol. 76, no. 7, pp. 292–302, 2022.
- [4].Guendelman S, Kaltwasser L, Bayer M, Gallese V, and Dziobek I, “Brain mechanisms underlying the modulation of heart rate variability when accepting and reappraising emotions,” Scientific reports, vol. 14, no. 1, p. 18756, 2024.
- [5].Goldstein BI, Carnethon MR, Matthews KA, McIntyre RS, Miller GE, Raghuveer G, Stoney CM, Wasiak H, and McCrindle BW, “Major depressive disorder and bipolar disorder predispose youth to accelerated atherosclerosis and early cardiovascular disease: a scientific statement from the American Heart Association,” Circulation, vol. 132, no. 10, pp. 965–986, 2015.
- [6].Umair M, Chalabianloo N, Sas C, and Ersoy C, “HRV and stress: a mixed-methods approach for comparison of wearable heart rate sensors for biofeedback,” IEEE Access, vol. 9, pp. 14005–14024, 2021.
- [7].Omoto ACM, Lataro RM, Silva TM, Salgado HC, Fazan R, and Silva LEV, “Heart rate fragmentation, a novel approach in heart rate variability analysis, is altered in rats 4 and 12 weeks after myocardial infarction,” Medical & biological engineering & computing, vol. 59, pp. 2373–2382, 2021.
- [8].Chalmers JA, Quintana DS, Abbott MJ-A, and Kemp AH, “Anxiety disorders are associated with reduced heart rate variability: a meta-analysis,” Frontiers in psychiatry, vol. 5, p. 80, 2014.
- [9].Koch C, Wilhelm M, Salzmann S, Rief W, and Euteneuer F, “A meta-analysis of heart rate variability in major depression,” Psychological Medicine, vol. 49, no. 12, pp. 1948–1957, 2019.
- [10].Kemp AH, Quintana DS, Gray MA, Felmingham KL, Brown K, and Gatt JM, “Impact of depression and antidepressant treatment on heart rate variability: a review and meta-analysis,” Biological psychiatry, vol. 67, no. 11, pp. 1067–1074, 2010.
- [11].Castaldo R, Melillo P, Bracale U, Caserta M, Triassi M, and Pecchia L, “Acute mental stress assessment via short term HRV analysis in healthy adults: A systematic review with meta-analysis,” Biomedical Signal Processing and Control, vol. 18, pp. 370–377, 2015.
- [12].Blood JD, Wu J, Chaplin TM, Hommer R, Vazquez L, Rutherford HJ, Mayes LC, and Crowley MJ, “The variable heart: High frequency and very low frequency correlates of depressive symptoms in children and adolescents,” Journal of affective disorders, vol. 186, pp. 119–126, 2015.
- [13].Sharma R and Meena HK, “Machine learning-based prediction of depression and anxiety using ECG signals,” in Signal Processing Driven Machine Learning Techniques for Cardiovascular Data Processing. Elsevier, 2024, pp. 65–80.
- [14].Shaw V, Ngo QC, Pah ND, Oliveira G, Khandoker AH, Mahapatra PK, Pankaj D, and Kumar DK, “Screening major depressive disorder in patients with obstructive sleep apnea using single-lead ECG recording during sleep,” Health Informatics Journal, vol. 30, no. 4, p. 14604582241300012, 2024.
- [15].Neha, Sardana H, Kanwade R, and Tewary S, “Arrhythmia detection and classification using ECG and PPG techniques: A review,” Physical and Engineering Sciences in Medicine, vol. 44, no. 4, pp. 1027–1048, 2021.
- [16].Chowdhury MR, Madanu R, Abbod MF, Fan S-Z, and Shieh J-S, “Deep learning via ECG and PPG signals for prediction of depth of anesthesia,” Biomedical Signal Processing and Control, vol. 68, p. 102663, 2021.
- [17].Liu S, Huang Z, Zhu J, Liu B, and Zhou P, “Continuous blood pressure monitoring using photoplethysmography and electrocardiogram signals by random forest feature selection and gwo-gbrt prediction model,” Biomedical Signal Processing and Control, vol. 88, p. 105354, 2024.
- [18].Shao J, Shi P, Hu S, Liu Y, and Yu H, “An optimization study of estimating blood pressure models based on pulse arrival time for continuous monitoring,” Journal of Healthcare Engineering, vol. 2020, no. 1, p. 1078251, 2020.
- [19].Zitouni MS, Lih Oh S, Vicnesh J, Khandoker A, and Acharya UR, “Automated recognition of major depressive disorder from cardiovascular and respiratory physiological signals,” Frontiers in Psychiatry, vol. 13, p. 970993, 2022.
- [20].Alzate M, Torres R, De la Roca J, Quintero-Zea A, and Hernandez M, “Machine Learning Framework for Classifying and Predicting Depressive Behavior Based on PPG and ECG Feature Extraction,” Applied Sciences, vol. 14, no. 18, p. 8312, 2024.
- [21].Zitouni MS and Khandoker A, “Depressed patients identification using cardiovascular signals,” in 2022 Computing in Cardiology (CinC), vol. 498. IEEE, 2022, pp. 1–4.
- [22].Alshanskaia EI, Zhozhikashvili NA, Polikanova IS, and Martynova OV, “Heart rate response to cognitive load as a marker of depression and increased anxiety,” Frontiers in Psychiatry, vol. 15, p. 1355846, 2024.
- [23].Defense Advanced Research Projects Agency (DARPA), “NEAT: Neural Evidence Aggregation Tool,” 2023, accessed January 16, 2025. [Online]. Available: https://www.darpa.mil/research/programs/neural-evidence-aggregation-tool
- [24].——, “Suicide Screening Tool,” 2023, accessed January 16, 2025. [Online]. Available: https://www.darpa.mil/news/2023/suicide-screening-tool
- [25].Sameni R, Cestero GI, Nateghi M, Sharpe VP, Chen C, Yang Y, Vallampati A, Chitadze L, Choi A, Bouzid Z, not provided D, Vyas J, Shallenberger L, Murray C, Choi J, Vollmer I, Karimi S, Bull R, Winder A, Stone BT, Kuperberg GR, Lynn SK, Bracken BK, Douglas Bremner J, and Inan OT, “A Psycholinguistics Protocol with Simultaneous Multimodal Physiological Data Collection for Individualized Pre-Screening Depressive Disorders,” Jun. 2025. [Online]. Available: 10.17504/protocols.io.dm6gpme3dgzp/v1
- [26].Beck AT, Steer RA, Ball R, and Ranieri WF, “Comparison of Beck Depression Inventories-IA and-II in Psychiatric Outpatients,” Journal of Personality Assessment, vol. 67, no. 3, pp. 588–597, Dec. 1996. [Online]. Available: 10.1207/s15327752jpa670313
- [27].Kroenke K, Spitzer RL, and Williams JBW, “The PHQ-9: Validity of a brief depression severity measure,” Journal of General Internal Medicine, vol. 16, no. 9, pp. 606–613, Sep. 2001. [Online]. Available: 10.1046/j.1525-1497.2001.016009606.x
- [28].Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, Hergueta T, Baker R, Dunbar GC et al., “The Mini-International Neuropsychiatric Interview (MINI): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10,” Journal of clinical psychiatry, vol. 59, no. 20, pp. 22–33, 1998.
- [29].Peirce J, Gray JR, Simpson S, MacAskill M, Höchenberger R, Sogo H, Kastman E, and Lindeløv JK, “PsychoPy2: Experiments in behavior made easy,” Behavior Research Methods, vol. 51, no. 1, pp. 195–203, Feb. 2019. [Online]. Available: 10.3758/s13428-018-01193-y
- [30].Peirce J, Hirst R, and MacAskill M, Building experiments in PsychoPy. Sage, 2022.
- [31].Jo YT, Lee SW, Park S, and Lee J, “Association between heart rate variability metrics from a smartwatch and self-reported depression and anxiety symptoms: a four-week longitudinal study,” Frontiers in Psychiatry, vol. 15, p. 1371946, 2024.
- [32].Wang Z, Luo Y, Zhang Y, Chen L et al., “Heart rate variability in generalized anxiety disorder, major depressive disorder and panic disorder: A network meta-analysis and systematic review,” Journal of Affective Disorders, vol. 330, pp. 259–266, 2023.
- [33].Tang M, Xi J, and Fan X, “QT interval is correlated with and can predict the comorbidity of depression and anxiety: A cross-sectional study on outpatients with first-episode depression,” Frontiers in Cardiovascular Medicine, vol. 9, p. 915539, 2022.
- [34].Atilan Fedai U, Fedai H, and Tanriverdi Z, “Cardiac Clues in Major Depressive Disorder: Evaluating Electrical Risk Score as a Predictive Electrocardiography Biomarker,” Medicina, vol. 61, no. 6, p. 1026, 2025.
- [35].Titov N, Dear BF, McMillan D, Anderson T, Zou J, and Sunderland M, “Psychometric Comparison of the PHQ-9 and BDI-II for Measuring Response during Treatment of Depression,” Cognitive Behaviour Therapy, vol. 40, no. 2, pp. 126–136, Jun. 2011. [Online]. Available: 10.1080/16506073.2010.550059
- [36].Kendall MG, Rank Correlation Methods. London: Charles Griffin, 1948.
- [37].Park K, Jaekal E, Yoon S, Lee S-H, and Choi K-H, “Diagnostic Utility and Psychometric Properties of the Beck Depression Inventory-II Among Korean Adults,” Frontiers in Psychology, vol. 10, Jan. 2020. [Online]. Available: 10.3389/fpsyg.2019.02934
- [38].Muñoz-Navarro R, Cano-Vindel A, Medrano LA, Schmitz F, Ruiz-Rodríguez P, Abellán-Maeso C, Font-Payeras MA, and Hermosilla-Pasamar AM, “Utility of the PHQ-9 to identify major depressive disorder in adult patients in Spanish primary care centres,” BMC Psychiatry, vol. 17, no. 1, Aug. 2017. [Online]. Available: 10.1186/s12888-017-1450-8
- [39].Karimi S, Karimi S, Shah AJ, Clifford GD, and Sameni R, “Electromechanical Dynamics of the Heart: A Study of Cardiac Hysteresis During Physical Stress Test,” arXiv preprint arXiv:2410.19667, 2024.
- [40].Kazemnejad A, Karimi S, Gordany P, Clifford GD, and Sameni R, “An open-access simultaneous electrocardiogram and phonocardiogram database,” Physiological Measurement, vol. 45, no. 5, p. 055005, 2024.
- [41].Jamshidian-Tehrani F and Sameni R, “Fetal ECG extraction from time-varying and low-rank noninvasive maternal abdominal recordings,” Physiological measurement, vol. 39, no. 12, p. 125008, 2018.
- [42].Sameni R, The Open-Source Electrophysiological Toolbox (OSET), version 4.0, 2006–2025. [Online]. Available: https://github.com/alphanumericslab/OSET.git
- [43].Pan J and Tompkins WJ, “A Real-Time QRS Detection Algorithm,” Biomedical Engineering, IEEE Transactions on, vol. BME-32, no. 3, pp. 230–236, 1985.
- [44].Liang Y, Elgendi M, Chen Z, and Ward R, “An optimal filter for short photoplethysmogram signals,” Scientific data, vol. 5, no. 1, pp. 1–12, 2018.
- [45].Pal R, Rudas A, Kim S, Chiang JN, Barney A, and Cannesson M, “An algorithm to detect dicrotic notch in arterial blood pressure and photoplethysmography waveforms using the iterative envelope mean method,” Computer Methods and Programs in Biomedicine, vol. 254, p. 108283, 2024.
- [46].Karimi S and Shamsollahi MB, “Tractable inference and observation likelihood evaluation in latent structure influence models,” IEEE Transactions on Signal Processing, vol. 68, pp. 5736–5745, 2020.
- [47].Karimi S and Shamsollahi MB, “Tractable maximum likelihood estimation for latent structure influence models with applications to eeg & ecog processing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 8, pp. 10466–10477, 2023.
- [48].Goda MÁ, Charlton PH, and Behar JA, “pyPPG: A Python toolbox for comprehensive photoplethysmography signal analysis,” Physiological Measurement, vol. 45, no. 4, p. 045001, 2024.
- [49].Abdullah S, Hafid A, Folke M, Lindén M, and Kristoffersson A, “PPGFeat: a novel MATLAB toolbox for extracting PPG fiducial points,” Frontiers in Bioengineering and Biotechnology, vol. 11, p. 1199604, 2023.
- [50].Rajala S, Ahmaniemi T, Lindholm H, and Taipalus T, “Pulse arrival time (PAT) measurement based on arm ECG and finger PPG signals-comparison of PPG feature detection methods for PAT calculation,” in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2017, pp. 250–253.
- [51].Sun S, Bezemer R, Long X, Muehlsteff J, and Aarts R, “Systolic blood pressure estimation using PPG and ECG during physical exercise,” Physiological measurement, vol. 37, no. 12, p. 2154, 2016.
- [52].Li F, Xu P, Zheng S, Chen W, Yan Y, Lu S, and Liu Z, “Photoplethysmography based psychological stress detection with pulse rate variability feature differences and elastic net,” International Journal of Distributed Sensor Networks, vol. 14, no. 9, p. 1550147718803298, 2018.
- [53].Nickel M and Kiela D, “Poincaré embeddings for learning hierarchical representations,” Advances in neural information processing systems, vol. 30, 2017.
- [54].Brennan M, Palaniswami M, and Kamen P, “Poincaré plot interpretation using a physiological model of HRV based on a network of oscillators,” American Journal of Physiology-Heart and Circulatory Physiology, vol. 283, no. 5, pp. H1873–H1886, 2002.
- [55].Khandoker AH, Karmakar C, Brennan M, Palaniswami M, and Voss A, Poincaré plot methods for heart rate variability analysis. Springer, 2013.
- [56].Ciccone AB, Siedlik JA, Wecht JM, Deckert JA, Nguyen ND, and Weir JP, “Reminder: RMSSD and SD1 are identical heart rate variability metrics,” Muscle & nerve, vol. 56, no. 4, pp. 674–678, 2017.
- [57].Liao L-M, Al-Zaiti SS, and Carey MG, “Depression and heart rate variability in firefighters,” SAGE open medicine, vol. 2, p. 2050312114545530, 2014.
- [58].Ha JH, Park S, Yoon D, and Kim B, “Short-term heart rate variability in older patients with newly diagnosed depression,” Psychiatry research, vol. 226, no. 2–3, pp. 484–488, 2015.
- [59].Lilliefors HW, “On the Kolmogorov-Smirnov test for normality with mean and variance unknown,” Journal of the American statistical Association, vol. 62, no. 318, pp. 399–402, 1967.
- [60].Wilcoxon F, Katti S, Wilcox RA et al., “Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test,” Selected tables in mathematical statistics, vol. 1, pp. 171–259, 1970.
- [61].Van der Maaten L and Hinton G, “Visualizing data using t-SNE,” Journal of machine learning research, vol. 9, no. 11, 2008.
- [62].Chen X.-w. and Wasikowski M, “Fast: a ROC-based feature selection metric for small samples and imbalanced data classification problems,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008, pp. 124–132.
- [63].Van Hulse J, Khoshgoftaar TM, and Napolitano A, “A comparative evaluation of feature ranking methods for high dimensional bioinformatics data,” in 2011 IEEE International Conference on Information Reuse & Integration. IEEE, 2011, pp. 315–320.
- [64].Bradley AP, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
- [65].Sameni R, “On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves,” arXiv, Apr. 2025. [Online]. Available: https://arxiv.org/abs/2504.02169
- [66].Barnett AG, “Regression to the mean: what it is and how to deal with it,” International Journal of Epidemiology, vol. 34, no. 1, pp. 215–220, Aug. 2004. [Online]. Available: 10.1093/ije/dyh299
- [67].Zang X, Li B, Zhao L, Yan D, and Yang L, “End-to-End Depression Recognition Based on a One-Dimensional Convolution Neural Network Model Using Two-Lead ECG Signal,” Journal of Medical and Biological Engineering, vol. 42, no. 2, pp. 225–233, Feb. 2022. [Online]. Available: 10.1007/s40846-022-00687-7
- [68].Singh P, Vasundhara B, Das N, Sharma R, Kumar A, and Datusalia AK, “Metabolomics in Depression: What We Learn from Preclinical and Clinical Evidences,” Molecular Neurobiology, vol. 62, no. 1, pp. 718–741, Jun. 2024. [Online]. Available: 10.1007/s12035-024-04302-5
- [69].Zhong X, Chen Y, Chen W, Liu Y, Gui S, Pu J, Wang D, He Y, Chen X, Chen X, Qiao R, and Xie P, “Identification of Potential Biomarkers for Major Depressive Disorder: Based on Integrated Bioinformatics and Clinical Validation,” Molecular Neurobiology, vol. 61, no. 12, pp. 10355–10364, May 2024. [Online]. Available: 10.1007/s12035-024-04217-1
- [70].Schiweck C, Piette D, Berckmans D, Claes S, and Vrieze E, “Heart rate and high frequency heart rate variability during stress as biomarker for clinical depression. A systematic review,” Psychological medicine, vol. 49, no. 2, pp. 200–211, 2019.
- [71].Costa T, Taylor A, Black F, Hill S, McAllister-Williams RH, Gallagher P, and Watson S, “Autonomic dysregulation, cognition and fatigue in people with depression and in active and healthy controls: observational cohort study,” BJPsych Open, vol. 9, no. 4, p. e106, 2023.
- [72].Titov N, Dear BF, McMillan D, Anderson T, Zou J, and Sunderland M, “Psychometric comparison of the PHQ-9 and BDI-II for measuring response during treatment of depression,” Cognitive behaviour therapy, vol. 40, no. 2, pp. 126–136, 2011.
- [73].Perez Alday EA, Rad AB, Reyna MA, Sadr N, Gu A, Li Q, Dumitru M, Xue J, Albert D, Sameni R, and Clifford GD, “Age, sex and race bias in automated arrhythmia detectors,” Journal of Electrocardiology, vol. 74, pp. 5–9, Sep. 2022. [Online]. Available: 10.1016/j.jelectrocard.2022.07.007
- [74].Kittnar O, “Sex Related Differences in Electrocardiography,” Physiological Research, vol. 72, no. Suppl 2, p. S127, 2023.
- [75].Shah A, She H, elon l., Li Q, Roberts T, Stefanos L, Haddad G, Gupta S, Tarlapally N, Lampert R, Raggi P, Pearce B, Quyyumi A, Sameni R, Lewis T, Sullivan S, Bremner JD, Clifford G, and Vaccarino V, “Abstract 4145127: Sex Differences in Everyday Mood States and its Association with Autonomic Physiology,” Circulation, vol. 150, no. Suppl 1, p. A4145127–A4145127, Nov. 2024.
