
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

Research Square
[Preprint]. 2024 Feb 15:rs.3.rs-3952163. [Version 1] doi: 10.21203/rs.3.rs-3952163/v1

Blood-based DNA methylation and exposure risk scores predict PTSD with high accuracy in military and civilian cohorts

Agaz Wani 1, Seyma Katrinli 2, Xiang Zhao 3, Nikolaos Daskalakis 4, Anthony Zannas 5, Allison Aiello 6, Dewleen Baker 7, Marco Boks 8, Leslie Brick 9, Chia-Yen Chen 10, Shareefa Dalvie 11, Catherine Fortier 12, Elbert Geuze 13, Jasmeet Hayes 14, Ronald Kessler 15, Anthony King 16, Nastassja Koen 17, Israel Liberzon 18, Adriana Lori 19, Jurjen Luykx 20, Adam Maihofer 21, William Milberg 22, Mark Miller 23, Mary Mufford 24, Nicole Nugent 25, Sheila Rauch 26, Kerry Ressler 27, Victoria Risbrough 28, Bart Rutten 29, Dan Stein 30, Murray Stein 31, Robert Ursano 32, Mieke Verfaellie 33, Erin Ware 34, Derek Wildman 35, Erika Wolf 36, Caroline Nievergelt 37, Mark Logue 38, Alicia Smith 39, Monica Uddin 40, Eric Vermetten 41, Christiaan Vinkers 42
PMCID: PMC10896387  PMID: 38410438

Abstract

Background

Incorporating genomic data into risk prediction has become an increasingly useful approach for rapid identification of individuals most at risk for complex disorders such as PTSD. Our goal was to develop and validate Methylation Risk Scores (MRS) using machine learning to distinguish individuals who have PTSD from those who do not.

Methods

Elastic Net was used to develop three risk score models using a discovery dataset (n = 1226; 314 cases, 912 controls) comprising five diverse cohorts with blood-derived DNA methylation (DNAm) measured on the Illumina MethylationEPIC BeadChip. The first, the exposure and methylation risk score (eMRS), used cumulative and childhood trauma exposure and DNAm variables; the second, the methylation-only risk score (MoRS), was based solely on DNAm data; and the third, the methylation-only risk score with adjusted exposure variables (MoRSAE), used DNAm data adjusted for the two exposure variables. We also assessed the potential of these risk scores to predict future PTSD from pre-deployment data. External validation of the risk scores was conducted in four independent cohorts.

Results

The eMRS model showed the highest accuracy (92%), precision (91%), recall (87%), and F1-score (89%) in classifying PTSD, using 3730 features. Although still highly accurate, the MoRS (accuracy = 89%; 3728 features) and MoRSAE (accuracy = 84%; 4150 features) showed a decline in classification power. The eMRS significantly predicted PTSD in one of the four independent cohorts, the BEAR cohort (beta = 0.6839, p = 0.003), but not in the remaining three. Pre-deployment risk scores from all models (eMRS, beta = 1.92; MoRS, beta = 1.99; MoRSAE, beta = 1.77) significantly (p < 0.001) predicted post-deployment PTSD.

Conclusion

Results, especially those from the eMRS, reinforce earlier findings that methylation and trauma are interconnected and can be leveraged to increase the correct classification of those with vs. without PTSD. Moreover, our models can potentially be a valuable tool in predicting the future risk of developing PTSD. As more data become available, including additional molecular, environmental, and psychosocial factors in these scores may enhance their accuracy in predicting the condition and, relatedly, improve their performance in independent cohorts.

Keywords: DNA methylation, Machine learning, PTSD, Risk scores

Background

Posttraumatic stress disorder (PTSD) is a psychiatric disorder that can develop after experiencing or witnessing a life-threatening event such as war/combat, natural disaster, violence, or a serious accident. PTSD occurs in 5–10% of the population, and females are twice as likely as males to experience PTSD [1]. PTSD commonly co-occurs with other psychiatric disorders [2, 3, 4, 5] and has also been associated with other health conditions such as accelerated aging [6, 7], cardiovascular and metabolic disorders [8, 9], and poor physical health [10]. Consequently, the individual and societal burden of PTSD is high [11, 12]. Identifying individuals at elevated risk of PTSD would enhance the ability to develop timely preventive strategies and therapies for this disorder, which would in turn help to reduce the associated disease burden.

Incorporating genomic data into risk prediction has become an increasingly useful approach for rapid identification of individuals most at risk for complex disorders such as PTSD. In particular, polygenic risk scores (PRS) have been evaluated in both research and clinical contexts to estimate the risk of developing complex disorders, including coronary artery disease, breast cancer, Type 2 diabetes, and Alzheimer’s Disease (reviewed in [13]). These genetically based risk scores are attractive because they assess lifetime risk for a particular disorder and leverage variation across hundreds to thousands of variants. However, PRS typically explain only a small proportion of the variance in risk for a particular disorder and capture neither environmental factors that influence risk nor the effects of disease progression itself [14], both of which may be important for identifying individuals at highest risk for disease.

In contrast, risk scores based on DNA methylation levels, which are modifiable and dynamic, can potentially convey more information about disease risk. A growing literature has shown that approaches originally developed for generating PRS can be adapted to DNA methylation data (reviewed in [15, 16]). The resulting methylation risk scores (MRS) have in some cases been shown to be more indicative of current disease state [17] and health-related phenotypes [18], as well as more predictive of future disease risk [19], than PRS-based approaches. For PTSD in particular, whose diagnosis requires exposure to a traumatic or shocking event, MRS that capture the differential effects of this exposure may be especially informative for identifying the trauma-exposed individuals most at risk for the disorder.

To this end, here we leverage a large, ancestrally diverse set of cohorts to take a first step toward developing methylation risk scores for PTSD. We focus specifically on developing scores that distinguish between those with vs. without the disorder (i.e., a diagnostic MRS that correctly classifies current cases vs. trauma-exposed controls), and attempt to replicate these MRS in multiple external validation cohorts. We further test whether these diagnostic risk scores have prognostic value, i.e., can predict future PTSD among individuals prior to trauma exposure. Finally, to gain insight into potential mechanisms, we investigate the biological significance associated with the CpGs that comprise the MRS.

Methods

Cohorts

To maximize the data available for developing risk scores with machine learning approaches, we created a discovery cohort of 1226 individuals drawn from five cohorts (Table 1). Two of these cohorts are civilian (the Detroit Neighborhood Health Study [DNHS] and the Grady Trauma Project [GTP]) and three are military (the Army Study to Assess Risk and Resilience in Servicemembers [Army STARRS], the Marine Resilience Study [MRS I&II], and Prospective Research in Stress-related Military Operations [PRISMO]). Details about each cohort are given in the supplementary file. The overall workflow for pre-processing and combining data from the five cohorts is shown in the supplementary file (Figure S1).

Table 1:

Demographic and clinical characteristics of the studies included in the discovery cohort.

| Characteristic | Current PTSD cases | Controls | P value | Total |
|---|---|---|---|---|
| N | | | | |
| Army STARRS | 42 | 111 | - | 153 |
| DNHS | 31 | 385 | - | 416 |
| GTP | 161 | 323 | - | 484 |
| MRS I&II | 63 | 60 | - | 123 |
| PRISMO | 17 | 33 | - | 50 |
| All | 314 | 912 | - | 1226 |
| Gender, male, N (%) | | | | |
| Army STARRS | 42 (27) | 111 (73) | - | 153 (100) |
| DNHS | 10 (2) | 161 (39) | - | 171 (41) |
| GTP | 25 (5) | 107 (22) | - | 132 (27) |
| MRS I&II | 63 (51) | 60 (49) | - | 123 (100) |
| PRISMO | 17 (34) | 33 (66) | - | 50 (100) |
| All | 157 (50) | 472 (51.8) | - | 629 (51.3) |
| Age, mean (SD) | | | | |
| Army STARRS | 25.8 (5.1) | 25.5 (5.2) | 7.54E-01 | 25.6 (5.2) |
| DNHS | 51.6 (11.1) | 55.6 (17.1) | 7.66E-02 | 55.3 (16.8) |
| GTP | 41.7 (11.4) | 42.4 (12.5) | 5.48E-01 | 42.2 (12.1) |
| MRS I&II | 23.3 (2.3) | 22.9 (1.9) | 3.59E-01 | 23.1 (2.1) |
| PRISMO | 28.1 (10.1) | 27.5 (9.1) | 8.29E-01 | 27.7 (9.3) |
| All | 36.1 (13.4) | 44.1 (18) | 5.89E-16 | 42.1 (17.3) |
| PTSD symptom severity, mean (SD) | | | | |
| Army STARRS | 56.9 (9.6) | 22.4 (5.8) | 4.83E-28 | 32 (17) |
| DNHS | 63 (16) | 32.7 (11.4) | 1.89E-11 | 34.9 (14.2) |
| GTP | 70.4 (18.6) | 25.1 (16.9) | 2.17E-32 | 38.5 (27.1) |
| MRS I&II | 65.4 (14.8) | 13.6 (11.8) | 1.30E-42 | 40.2 (29.2) |
| PRISMO | 42 (4.4) | 27 (4.8) | 6.72E-13 | 32.1 (8.5) |
| All | 63.1 (16.7) | 27.7 (13.3) | 5.88E-88 | 35.8 (20.5) |
| Self-reported race/ethnicity, N (%) | | | | |
| Army STARRS: African American | 3 (2) | 12 (7.8) | - | 15 (9.8) |
| Army STARRS: White | 29 (19) | 88 (57.5) | - | 117 (76.5) |
| Army STARRS: Other | 10 (6.5) | 11 (7.2) | - | 21 (13.7) |
| DNHS: African American | 28 (6.7) | 381 (91.6) | - | 409 (98.3) |
| DNHS: Other | 3 (0.7) | 4 (1) | - | 7 (1.7) |
| GTP: African American | 153 (31.6) | 307 (63.4) | - | 460 (95) |
| GTP: Other | 8 (1.7) | 16 (3.3) | - | 24 (5) |
| MRS I&II: African American | 2 (1.6) | 2 (1.6) | - | 4 (3.3) |
| MRS I&II: White | 53 (43.1) | 53 (43.1) | - | 106 (86.2) |
| MRS I&II: Other | 8 (6.5) | 5 (4.1) | - | 13 (10.6) |
| PRISMO: African American | 1 (2) | 1 (2) | - | 2 (4) |
| PRISMO: White | 11 (22) | 27 (54) | - | 38 (76) |
| PRISMO: Other | 5 (10) | 5 (10) | - | 10 (20) |
| All: African American | 187 (59.6) | 703 (77.1) | - | 890 (72.6) |
| All: White | 93 (29.6) | 168 (18.4) | - | 261 (21.3) |
| All: Other | 34 (10.8) | 41 (4.5) | - | 75 (6.1) |
| Smoking score, mean (SD) | | | | |
| Army STARRS | −5.4 (18.4) | −7.8 (18) | 4.75E-01 | −7.1 (18.1) |
| DNHS | 3.8 (30.5) | −0.6 (33) | 4.45E-01 | −0.3 (32.8) |
| GTP | −4.1 (35.4) | −2.8 (35.4) | 7.05E-01 | −3.3 (35.4) |
| MRS I&II | −8.5 (17) | −10.8 (15) | 4.43E-01 | −9.6 (16) |
| PRISMO | 1.3 (16.9) | 2 (21.3) | 9.06E-01 | 1.7 (19.7) |
| All | −4.1 (29.3) | −2.9 (31.3) | 5.21E-01 | −3.2 (30.8) |
| Childhood trauma, mean (SD) | | | | |
| Army STARRS | 7.1 (3.3) | 6.3 (2.2) | 1.44E-01 | 6.5 (2.5) |
| DNHS | 7.6 (5.7) | 4.4 (3.4) | 4.44E-03 | 4.7 (3.7) |
| GTP | 56.1 (20.1) | 37.7 (13.4) | 1.67E-21 | 43.8 (18.1) |
| MRS I&II | 41.7 (12.2) | 37.5 (10.4) | 4.10E-02 | 39.6 (11.5) |
| PRISMO | 5.5 (2.6) | 2.8 (2.2) | 1.03E-03 | 3.7 (2.7) |
| All | 39.1 (26.2) | 18.5 (18.5) | 3.3E-32 | 23.8 (22.6) |
| Cumulative trauma, mean (SD) | | | | |
| Army STARRS | 1 (0) | 1 (0) | - | 1 (0) |
| DNHS | 12.2 (7) | 6.1 (4.1) | 3.63E-05 | 6.6 (4.7) |
| GTP | 7 (3.1) | 4.4 (2.8) | 2.33E-17 | 5.3 (3.1) |
| MRS I&II | 11.2 (2.9) | 10.3 (3.8) | 0.125 | 10.8 (3.4) |
| PRISMO | 6.5 (3.1) | 5.9 (3.8) | 0.557 | 6.1 (3.5) |
| All | 7.6 (4.8) | 5.2 (4) | 9.45E-15 | 5.8 (4.3) |

Quality Control (QC) Procedures

DNAm in whole blood was measured using the Illumina MethylationEPIC BeadChip following the manufacturer’s recommended protocol. Raw DNAm β values were obtained, and a sex check was conducted using the minfi R package [20] to eliminate sex-discordant samples. Quality control (QC) was performed on each cohort separately using a standardized pipeline as previously described [21], yielding 818,691 probes that passed QC. Normalization was carried out using the single-sample Noob (ssNoob) method in the minfi R package [20]. ComBat adjustment, an empirical Bayes framework implemented in the SVA R package [22, 23], was then performed to reduce the likelihood of bias due to known batch effects (chip and position) while preserving variation associated with age, sex (if applicable), and PTSD. The resulting QC’d data were used in subsequent analyses.

Estimation of Covariates

Smoking scores.

Studies have linked methylation at many genomic loci to smoking status [24, 25, 26, 27, 28, 29]. Therefore, to adjust for smoking-related DNA methylation differences, we calculated smoking scores from the DNA methylation data using weights for 39 CpGs located at 27 loci, as previously described [30].

Cell proportions.

It is important to consider cellular heterogeneity in epigenome-wide association studies (EWAS) [31] because whole blood contains various cell types, each with its own DNA methylation profile [32, 33]. To address this, proportions of CD4+ T cells, CD8+ T cells, natural killer (NK) cells, B cells, monocytes, and neutrophils were estimated using reference data [34] and the Robust Partial Correlation (RPC) method implemented in the EpiDISH R package [35].
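To illustrate the idea behind reference-based deconvolution, the sketch below follows our understanding of RPC: each sample's β-value profile is regressed on a reference matrix of cell-type-specific methylation (one column per cell type) with robust Huber regression, negative coefficients are set to zero, and the rest are rescaled to sum to one. This is a simplified numpy re-implementation written for illustration, not the EpiDISH code; the reference matrix, the function names, and the Huber constant (1.345, a conventional default) are assumptions.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimation of y ~ X via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale: median absolute deviation, rescaled for normal errors
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid / scale)
        # Huber weights: 1 for small residuals, c/|u| for large ones
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted normal equations
    return beta


def estimate_cell_fractions(sample_beta, reference):
    """RPC-style estimate: robust fit with intercept, clip negatives, renormalize."""
    X = np.column_stack([np.ones(reference.shape[0]), reference])
    coef = huber_irls(X, sample_beta)[1:]  # drop the intercept
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()


# Synthetic example: 300 marker CpGs x 6 cell types (hypothetical reference)
rng = np.random.default_rng(0)
reference = rng.uniform(0.05, 0.95, size=(300, 6))
true_fractions = np.array([0.15, 0.10, 0.05, 0.05, 0.10, 0.55])
sample = reference @ true_fractions + rng.normal(0, 0.01, size=300)
estimated = estimate_cell_fractions(sample, reference)
print(np.round(estimated, 2))
```

Robust regression is used so that a minority of CpGs that fit the reference poorly do not dominate the estimate.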

Ancestry principal components.

Several DNA methylation studies have found differences in DNA methylation levels among populations (race/ethnicity) at certain CpG sites [36, 37, 38, 39, 40, 41]. Therefore, to account for population stratification, ancestry principal components (PCs) were generated from the methylation data using a subset of CpGs in close proximity to SNPs in data from the 1000 Genomes Project [42, 43]. As previously reported [42, 43], PCs 2 and 3 were the components most correlated with ancestry and were thus used to adjust for population stratification in this study.
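As an illustration, ancestry PCs of this kind can be obtained by running PCA on β-values restricted to SNP-proximal CpGs and keeping the components of interest (PC2 and PC3, per the text above). The sketch assumes a samples × CpGs matrix and a hypothetical boolean mask marking SNP-proximal probes; it uses scikit-learn's PCA on synthetic data and does not reproduce the published probe list.

```python
import numpy as np
from sklearn.decomposition import PCA


def ancestry_pcs(beta, snp_proximal_mask, keep=(2, 3), n_components=10):
    """Return selected ancestry PCs (1-based labels) computed from SNP-proximal CpGs."""
    subset = beta[:, snp_proximal_mask]                            # samples x SNP-proximal CpGs
    scores = PCA(n_components=n_components).fit_transform(subset)  # PCA centers each CpG
    cols = [k - 1 for k in keep]                                   # 1-based labels -> 0-based columns
    return scores[:, cols]


# Synthetic data: 200 samples x 1000 CpGs; the first 400 are treated as SNP-proximal (hypothetical)
rng = np.random.default_rng(1)
beta = rng.uniform(0, 1, size=(200, 1000))
mask = np.zeros(1000, dtype=bool)
mask[:400] = True
pcs = ancestry_pcs(beta, mask)
print(pcs.shape)  # (200, 2)
```

The two returned columns would then enter the covariate matrix alongside cell proportions, smoking score, sex, and age.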

Covariate adjustment

Missing DNAm values were imputed using the mean method [44, 45]. We then adjusted the DNAm data for potential confounders, including cell composition, ancestry, smoking score, sex (if applicable), and age, for Models 1 and 2 (described below). For each CpG, the covariates were regressed out using linear regression, and the CpG values were replaced with the corresponding residuals [46]. For Model 3 (described below), we additionally adjusted for the two exposure variables of interest, cumulative trauma and childhood trauma, to account for exposure-related differences among individual cohorts.
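The imputation and residualization steps described above can be sketched as follows: missing β-values are replaced by each CpG's mean, every CpG is regressed on the covariates (plus an intercept) by ordinary least squares, and the residuals replace the original values. This minimal numpy sketch assumes categorical covariates such as sex are already numerically encoded; the variable names and synthetic data are hypothetical.

```python
import numpy as np


def impute_and_residualize(beta, covariates):
    """Mean-impute each CpG, then replace it with residuals from an OLS fit on covariates.

    beta: samples x CpGs array (may contain NaN); covariates: samples x k numeric array.
    """
    beta = np.array(beta, dtype=float)                 # work on a float copy
    col_means = np.nanmean(beta, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(beta))
    beta[nan_rows, nan_cols] = col_means[nan_cols]      # mean imputation per CpG

    design = np.column_stack([np.ones(len(covariates)), covariates])
    # Fit all CpGs at once: coef has shape (k + 1) x CpGs
    coef, *_ = np.linalg.lstsq(design, beta, rcond=None)
    return beta - design @ coef                          # residuals = adjusted DNAm


rng = np.random.default_rng(2)
n = 100
age = rng.uniform(20, 60, n)
smoking = rng.normal(0, 20, n)
covs = np.column_stack([age, smoking])
beta = 0.3 + 0.004 * age[:, None] + rng.normal(0, 0.02, size=(n, 50))
beta[0, 0] = np.nan                                      # one missing value to impute
adjusted = impute_and_residualize(beta, covs)
```

Because residuals are centered at zero and orthogonal to the covariates, the adjusted values are no longer on the β-value scale; this is expected when they are used only as model inputs.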

Analysis

Overall Approach.

Our goal was to develop a series of models, based on the most informative methylation features and (in some models) exposure-related features, to classify PTSD; these models would then be used to derive risk scores with which to predict PTSD. Here, "most informative" refers to the feature set yielding the best classification accuracy. To train the models, we used unique, trauma-exposed participants from the discovery cohort in a cross-sectional design. Model 1 was designed to classify PTSD using two exposure variables, cumulative trauma (number of traumatic events experienced) and childhood trauma (trauma experienced before 18 years of age), along with DNAm data, because greater exposure is known to substantially increase the risk of developing PTSD [47, 48, 49, 50]; these variables were therefore hypothesized to contribute high predictive power to the model. Model 2 was designed to classify PTSD using only DNAm data, without relying on the discriminatory power of cumulative or childhood trauma, so that it could be applied to cohorts in which only DNAm data are available. Model 3 served a different purpose from Model 1: it was designed to account for variation in exposure variables among individual cohorts. In this model, the exposure variables were not used as predictors because they had already been used as covariates in the DNAm adjustment. Although Model 3 addresses cohort-specific variation in exposure, it does not have the same predictive power as Model 1, which includes the exposure variables as predictors. The adjusted data were then analyzed as follows.

Feature Selection and Scaling

We used SelectKBest in Scikit-learn [51], a univariate feature selection approach that ranks features by their ANOVA F-values with respect to the phenotype of interest (here, PTSD). For Models 1 and 2, features were selected from the DNAm and exposure variables based on the highest scores (Model 2 then used only the selected DNAm features); for Model 3, features were selected solely from the DNAm data. Feature selection was repeated 500 times, with the number of selected features ranging from 10 to 5000 in increments of 10, to identify the feature set that gave the best Elastic Net accuracy. Because different studies/cohorts used different instruments to measure cumulative trauma and childhood trauma, we normalized these variables using min-max scaling to the range [0, 1].
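A minimal sketch of the feature-count search with scikit-learn's SelectKBest (ANOVA F-test via f_classif) and MinMaxScaler is shown below. Several details are assumptions rather than the authors' settings: the classifier is a logistic regression with an elastic-net penalty and placeholder hyperparameters (l1_ratio=0.5, C=1.0); scaling is applied to all features rather than only to the exposure variables; scaling and selection are fitted inside a Pipeline so they are learned on the training split only; and a synthetic dataset with a coarse grid (10 to 100 in steps of 10) stands in for the 10 to 5000 grid used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for (adjusted DNAm + exposure) features; 1 = PTSD, 0 = control
X, y = make_classification(n_samples=400, n_features=500, n_informative=20,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

results = {}
for k in range(10, 101, 10):  # the study searched 10..5000 in steps of 10
    pipe = Pipeline([
        ("scale", MinMaxScaler()),                   # rescale each feature to [0, 1]
        ("select", SelectKBest(f_classif, k=k)),     # keep the k highest ANOVA F-scores
        ("clf", LogisticRegression(penalty="elasticnet", solver="saga",
                                   l1_ratio=0.5, C=1.0, max_iter=2000)),
    ])
    pipe.fit(X_train, y_train)
    results[k] = pipe.score(X_test, y_test)          # accuracy on held-out data

best_k = max(results, key=results.get)
print(best_k, round(results[best_k], 3))
```

Choosing k by held-out accuracy, as in this loop, can give an optimistic estimate of performance; nested cross-validation is a common alternative when data allow.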

Training and Testing

To identify the best model for classifying PTSD and deriving risk scores, we trained three widely used machine learning models (Random Forest, Lasso, and Elastic Net) on 75% of the data and tested them on the remaining 25% using the Scikit-learn [51] framework. We also conducted 10-fold cross-validation on the training and testing data to evaluate model performance (Figure S1.1). After selecting the most accurate model (Model 1), we used its important features (methylation and exposure variables), identified during feature selection, to classify PTSD. For Model 3, we re-ran feature selection after the additional covariate adjustment to identify its important features. Model performance was assessed using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC).
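The comparison of the three learners can be sketched as below. Assumptions: "Lasso" and "Elastic Net" are taken to mean L1- and elastic-net-penalized logistic regression (scikit-learn's Lasso and ElasticNet classes are regressors, not classifiers), hyperparameters are placeholders, and the data are synthetic. With 1226 samples, test_size=0.25 yields a 307-sample test set, which matches the N = 307 reported for the test data in the figure captions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1226, n_features=60, n_informative=15,
                           weights=[0.74, 0.26], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Lasso (L1 logistic)": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    "Elastic Net (logistic)": LogisticRegression(penalty="elasticnet", solver="saga",
                                                 l1_ratio=0.5, max_iter=5000),
}

for name, model in models.items():
    model.fit(X_train, y_train)                     # fit on the 75% training split
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]
    # 10-fold cross-validated AUC on all data, as in the figure captions
    cv_auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: acc={accuracy_score(y_test, pred):.2f} "
          f"prec={precision_score(y_test, pred):.2f} rec={recall_score(y_test, pred):.2f} "
          f"f1={f1_score(y_test, pred):.2f} auc={roc_auc_score(y_test, prob):.2f} "
          f"cv_auc={cv_auc:.2f}")

print(len(y_test))  # 307
```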

Risk scores

Risk scores, defined as the weighted sum of the important features, were calculated for the discovery cohort test data (25%) using feature weights (effect sizes) from the training data (75%), in order to test for an association between risk scores and PTSD. Model 1 yielded the exposure and methylation risk score (eMRS), Model 2 the methylation-only risk score (MoRS), and Model 3 the methylation-only risk score with adjusted exposure variables (MoRSAE).
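In code, a risk score of this kind reduces to a dot product between each test participant's feature vector and the weights learned on the training split. The sketch assumes, for illustration only, that the weights are the coefficients of a fitted scikit-learn logistic model; because the text defines the score as a weighted sum of features, no intercept is added, and the final check shows that the score equals the model's log-odds minus that constant intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def risk_scores(X, weights):
    """Weighted sum of features per participant: score_i = sum_j w_j * x_ij."""
    return np.asarray(X) @ np.asarray(weights)


rng = np.random.default_rng(3)
X_train = rng.normal(size=(300, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(0, 1, 300) > 0).astype(int)
X_test = rng.normal(size=(100, 5))

model = LogisticRegression().fit(X_train, y_train)
weights = model.coef_.ravel()                  # effect sizes learned on the training split
scores = risk_scores(X_test, weights)          # applied to unseen test participants

# Without the intercept, the score equals the model's log-odds minus a constant
assert np.allclose(scores + model.intercept_[0], model.decision_function(X_test))
```

Omitting the intercept shifts every score by the same amount, so case/control comparisons and rank-based tests are unaffected.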

A logistic model was used to test for association between each risk score (eMRS, MoRS, and MoRSAE) and PTSD, and the Nagelkerke approach was used to compute the models’ R-squared (R²) values. Further, in cohorts with available pre-deployment DNA methylation data (Army STARRS and MRS I&II), a logistic model was used to predict post-deployment PTSD from risk scores calculated from pre-deployment DNAm and exposure data; note that these participants’ post-deployment DNAm data were included in the discovery cohort analyses described above. For all analyses, a Wilcoxon rank-sum test was used to assess differences in risk scores between cases and controls. Pearson’s and point-biserial correlations were used, as appropriate, to examine correlations among study variables in the discovery and independent cohorts.
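The association statistics named above can be sketched with numpy, scipy, and scikit-learn. Nagelkerke's R² rescales the Cox–Snell R², computed from the log-likelihoods of the null (intercept-only, ℓ0) and fitted (ℓ1) logistic models, so that its maximum is 1: R²_CS = 1 − exp(2(ℓ0 − ℓ1)/n) and R²_N = R²_CS / (1 − exp(2ℓ0/n)). The Wilcoxon rank-sum test is computed with scipy's mannwhitneyu (the equivalent Mann–Whitney U form) and the point-biserial correlation with pointbiserialr. The near-unpenalized logistic fit (large C) and the synthetic scores are assumptions for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu, pointbiserialr
from sklearn.linear_model import LogisticRegression


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R^2 from null and fitted log-likelihoods for n observations."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - np.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


rng = np.random.default_rng(4)
score = rng.normal(size=307)                                   # hypothetical risk scores
ptsd = (score + rng.normal(0, 1, 307) > 0.6).astype(int)       # hypothetical 0/1 diagnoses

X = score.reshape(-1, 1)
fit = LogisticRegression(C=1e6).fit(X, ptsd)                   # large C ~ unpenalized fit
beta = fit.coef_[0, 0]
ll_model = bernoulli_loglik(ptsd, fit.predict_proba(X)[:, 1])
ll_null = bernoulli_loglik(ptsd, np.full(len(ptsd), ptsd.mean()))  # intercept-only model
r2 = nagelkerke_r2(ll_null, ll_model, len(ptsd))

u_stat, p_wilcoxon = mannwhitneyu(score[ptsd == 1], score[ptsd == 0],
                                  alternative="two-sided")
r_pb, p_pb = pointbiserialr(ptsd, score)
print(f"beta={beta:.2f} R2={r2:.2f} Wilcoxon p={p_wilcoxon:.2g} r_pb={r_pb:.2f}")
```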

Independent Validation

To validate the risk scores, we tested their ability to distinguish those with vs. without PTSD in four independent external cohorts, using the same pre-processing and covariate adjustment pipeline as in the discovery cohort. Brief descriptions of the external cohorts (NCPTSD-TRACTS, BEAR, DCHS, and PROGrESS) are provided in the Supplementary file. We used the weights of the significant features identified in Models 1, 2, and 3 of the discovery cohort to generate risk scores (eMRS, MoRS, and MoRSAE, respectively) in the external cohorts. As in the discovery cohort, we conducted Pearson and point-biserial correlation tests, logistic regression association tests, and Wilcoxon rank-sum tests in the external cohorts.
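When discovery weights are applied to an external cohort, features have to be matched by CpG identifier rather than by column position, because external data may be ordered differently or may lack some probes after QC. The sketch below shows one way to do this with pandas; the probe IDs and weights are hypothetical placeholders, and skipping probes absent from the external data (rather than imputing them) is an assumption, not necessarily what the authors did.

```python
import pandas as pd


def apply_weights(external, weights):
    """Score an external cohort with discovery weights, matching features by name.

    external: DataFrame, rows = participants, columns = CpG IDs / exposure variables.
    weights: Series indexed by the same feature names.
    Returns the scores and the list of discovery features missing from the cohort.
    """
    shared = weights.index.intersection(external.columns)
    missing = weights.index.difference(external.columns).tolist()
    scores = external[shared] @ weights[shared]   # weighted sum over shared features only
    return scores, missing


weights = pd.Series({"cg00000001": 0.8, "cg00000002": -0.5, "cg00000003": 1.2})
external = pd.DataFrame(
    {"cg00000002": [0.1, 0.4], "cg00000001": [0.5, 0.2], "cg99999999": [0.3, 0.3]},
    index=["P1", "P2"],
)
scores, missing = apply_weights(external, weights)
print(scores.round(2).to_dict(), missing)
```

Reporting the missing features alongside the scores makes it visible when an external cohort can only partially reproduce a discovery score.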

Enrichment Analysis

To investigate the biological significance of the important CpGs identified in the feature selection step, we performed Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using missMethyl [52]. Gene ontologies and KEGG pathways that reached a nominal significance level of p < 0.05 were considered important.

Results

Description of Discovery Cohort

Table 1 summarizes the demographic characteristics and clinical information of all participants (n = 1226) in the discovery cohort, by current PTSD status. Additional information about cumulative and childhood trauma is provided in Table S1. A slight majority of participants were male (n = 629; 51.3%). Two cohorts, DNHS and GTP, were composed mostly of African American participants, whereas the remaining three cohorts were predominantly of European ancestry. PTSD symptom severity differed significantly between cases and controls in all cohorts (p < 0.05). Childhood trauma also differed significantly between cases and controls in all cohorts except Army STARRS (p < 0.05). Finally, cumulative trauma differed significantly between cases and controls in DNHS and GTP (p < 0.001).

Development of Methylation Risk Scores to distinguish those with vs. without PTSD

We developed three risk scores, using machine learning approaches, to distinguish those with vs. without PTSD. Our first model, eMRS, included both exposure and DNA methylation variables and identified 3730 important features (3728 CpGs, cumulative trauma, and childhood trauma) in the discovery cohort. Using these 3730 features, Elastic Net achieved the best accuracy (92%; Figure 1), precision (91%), recall (87%), and F1-score (89%) (Table 2; see Fig. S2 for AUCs with the Lasso and Random Forest approaches). The eMRS significantly predicted PTSD (beta = 2.64, p < 0.001, R² = 0.70), with higher eMRS values in PTSD cases than in controls (p < 0.001; Figure 2A, left plot).

Our second model, MoRS, was based solely on the 3728 methylation features from Model 1 and classified PTSD with 89% accuracy and an AUC of 95% (Figure 3; Table 2); precision, recall, and F1-score were 86%, 83%, and 84%, respectively (Table 2). As with eMRS, MoRS significantly predicted PTSD (beta = 2, p < 0.001, R² = 0.54), and MoRS values were higher in cases than in controls (p < 0.001; Figure 2A, middle plot).

Our third and final model, MoRSAE, used DNA methylation data adjusted for the two exposure variables in addition to the covariates used in Models 1 and 2. It identified 4150 significant features that classified PTSD with 84% accuracy and an AUC of 89% (Figure 4), with precision, recall, and F1-score of 80%, 77%, and 78%, respectively (Table 2). As with eMRS and MoRS, MoRSAE significantly predicted PTSD (beta = 1.20, p < 0.001, R² = 0.36), and MoRSAE values were significantly higher in PTSD cases than in controls (p < 0.001; Figure 2A, right plot).

In summary, although all three models produced risk scores that significantly predicted PTSD in the test dataset and were higher on average in cases than in controls, effect size (beta) and explanatory power (R²) declined in the order eMRS > MoRS > MoRSAE.

Figure 1.

The confusion matrix for Model 1 displays an accuracy of 92% on test data (N = 307), while the ROC curve indicates an AUC of 96% during the 10-fold cross-validation using all data (N = 1226).

Table 2:

Model performance in accuracy, precision, recall, f1-score, and AUC. Elastic net performed best with an accuracy of 92%.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|---|
| Elastic Net (Model 1) | 92 | 91 | 87 | 89 | 96 |
| Elastic Net (Model 2) | 89 | 86 | 83 | 84 | 95 |
| Elastic Net (Model 3) | 84 | 80 | 77 | 78 | 89 |
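As a consistency check on Table 2, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the rounded precision and recall values in the table reproduces the reported F1-scores.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


table2 = {  # model: (precision %, recall %, reported F1 %), values from Table 2
    "Model 1": (91, 87, 89),
    "Model 2": (86, 83, 84),
    "Model 3": (80, 77, 78),
}
for model, (p, r, reported) in table2.items():
    print(model, round(f1_from_precision_recall(p, r)), reported)
```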

Figure 2.

Distribution of, and variation in, risk scores between cases and controls in the test data (N = 307). In the figure legend, 0 is no PTSD and 1 is PTSD. A) The distribution of risk scores for Models 1, 2, and 3 in cases and controls. B) The difference in risk scores between cases and controls. Model 1 calculates exposure and methylation risk scores (eMRS), Model 2 calculates risk scores based only on methylation variables (MoRS), and Model 3 calculates risk scores based on methylation variables adjusted for exposure variables (MoRSAE). Risk scores are higher in PTSD cases than in controls; a Wilcoxon rank-sum test confirms a significant difference between cases and controls (p < 0.001) for all three models.

Figure 3.

The confusion matrix for Model 2 displays an accuracy of 89% on test data (N = 307), while the ROC curve indicates an AUC of 95% during the 10-fold cross-validation using all data (N = 1226).

Figure 4.

The confusion matrix for Model 3 displays an accuracy of 84% on test data (N = 307), while the ROC curve exhibits an AUC of 89% during the 10-fold cross-validation process using all data (N = 1226).

Intercorrelation among study variables

A significant positive point-biserial correlation was observed between eMRS and current PTSD (ρ = .72, p < 0.001; Figure S3). Cumulative trauma (ρ = .40, p < 0.001) and childhood trauma (ρ = .57, p < 0.001) were also significantly and positively correlated with eMRS. Notably, MoRS also showed a significant positive point-biserial correlation with PTSD (ρ = .62, p < 0.001), as well as significant positive correlations with cumulative trauma (ρ = .16, p < 0.01) and childhood trauma (ρ = .169, p < 0.01) (Figure S3). In contrast, although MoRSAE showed a significant positive point-biserial correlation with PTSD (ρ = 0.49, p < 0.001; Figure S3), it was negatively correlated with both cumulative trauma (ρ = −.13, p = 0.02) and childhood trauma (ρ = −.12, p = 0.03).

Validation of Risk Scores in External Cohorts

We validated the risk scores from the three models in four external cohorts: NCPTSD-TRACTS, BEAR, DCHS, and PROGrESS. In NCPTSD-TRACTS, cases and controls differed significantly (p < 0.05) in childhood trauma but not in cumulative trauma (Table S1). As in the discovery cohorts, cases and controls in BEAR differed significantly in both cumulative and childhood trauma. DCHS, on the other hand, showed a significant difference only in cumulative trauma, whereas PROGrESS showed no significant difference in either trauma variable between cases and controls.

The eMRS significantly predicted PTSD in one external cohort, BEAR (beta = 0.6839, p = 0.006; Table S2); in this cohort, eMRS was also significantly correlated with PTSD (ρ = .24, p = 0.003; Figure S4), and eMRS differed significantly between PTSD cases and controls (p = 0.02; Figure S5). The eMRS did not significantly predict PTSD in the other three independent cohorts; however, the association between eMRS and PTSD had the same (positive) direction of effect in NCPTSD-TRACTS (beta = 0.0598, p = 0.35), PROGrESS (beta = 0.1141, p = 0.53), and DCHS (beta = 0.0631, p = 0.81) (Figures S6–S11). MoRS (Model 2) did not significantly predict PTSD in any external cohort (NCPTSD-TRACTS: beta = −0.0977, p = 0.28; BEAR: beta = 0.0239, p = 0.93; PROGrESS: beta = 0.2156, p = 0.52; DCHS: beta = 0.3739, p = 0.37). MoRSAE (Model 3) showed a borderline association with PTSD in NCPTSD-TRACTS (beta = −0.1707, p = 0.05) and a significant difference in risk scores between cases and controls (p = 0.018; Figure S7); however, the direction of effect was opposite to that observed in the discovery cohort.

Testing of Pre-Deployment Risk scores to predict future PTSD

A compelling feature of risk scores is their potential to predict future disease. We tested whether the MRS derived from our diagnostic/classification models predicted prospective PTSD risk in two of our military cohorts with pre-deployment data, MRS I&II and Army STARRS (analyzed together). MRS were calculated using “unseen” DNAm data from a pre-deployment timepoint, i.e., DNAm data not included in the discovery cohort. All three risk scores calculated from pre-deployment data significantly predicted future PTSD (eMRS: beta = 1.92, p < 0.001, R² = 0.53; MoRS: beta = 1.99, p < 0.001, R² = 0.46; MoRSAE: beta = 1.77, p < 0.001, R² = 0.47), and each differed significantly between individuals who did and did not develop PTSD (Figures 5, 6, and 7).

Figure 5.

Distribution of, and difference in, risk scores (eMRS) between PTSD cases and controls pre- and post-deployment (N = 262). In the figure legend, 0 is no PTSD and 1 is PTSD. A) The distribution of risk scores shows that individuals who developed PTSD post-deployment had higher scores than those who did not, both before and after deployment. B) A Wilcoxon rank-sum test showed a significant difference (p < 0.001) in risk scores between those who did and did not develop PTSD post-deployment.

Figure 6.

Distribution of, and difference in, risk scores (MoRS) between cases and controls pre- and post-deployment (N = 262). In the figure legend, 0 is no PTSD and 1 is PTSD. A) Distribution of risk scores in cases and controls. Risk scores are higher in those who developed PTSD post-deployment than in those who did not, both pre- and post-deployment. B) Difference in risk scores between cases and controls; a Wilcoxon rank-sum test showed a significant difference (p < 0.001).

Figure 7.

Distribution of, and difference in, risk scores (MoRSAE) between cases and controls pre- and post-deployment (N = 262). In the figure legend, 0 is no PTSD and 1 is PTSD. A) MoRSAE values are higher in those who developed PTSD post-deployment. B) A Wilcoxon rank-sum test showed a significant difference (p < 0.001) in risk scores between those who did and did not develop PTSD post-deployment.

Assessment of Biological Significance among Important CpGs

Gene Ontology (GO) analysis of the 3728 CpGs from Models 1 and 2 revealed 403 nominally significant GO terms; analysis of the 4150 important CpGs from Model 3 identified 382 nominally significant GO terms. The two sets shared 115 GO terms, including regulation of muscle adaptation, positive regulation of autophagy of mitochondrion, and sucrose metabolic process. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified 47 pathways for Models 1 and 2 and 25 pathways for Model 3 at p < 0.05 (full lists of GO terms and KEGG pathways are provided in a supplementary Excel file). Fourteen pathways were common to Models 1 and 2 and Model 3, including the HIF-1 signaling pathway, mTOR signaling pathway, insulin signaling pathway, and galactose metabolism. None of the GO terms or KEGG pathways remained significant after correction for multiple testing.
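For context on the last statement, a widely used correction when hundreds of GO terms or pathways are tested is the Benjamini–Hochberg false discovery rate procedure. The source does not name the correction it applied, so BH is shown here only as an illustration, implemented in numpy on hypothetical nominal p-values.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)          # p_(i) * n / i
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(n)
    out[order] = adjusted                                # restore the original order
    return out


nominal = np.array([0.001, 0.01, 0.02, 0.04, 0.30])     # hypothetical nominal p-values
print(benjamini_hochberg(nominal))
```

A term that is nominally significant (p < 0.05) can easily exceed an adjusted threshold once the number of tests is accounted for, which is consistent with terms passing the nominal cut-off but not the corrected one.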

Discussion

It is crucial to identify individuals at higher risk of developing PTSD in order to provide timely preventive measures and effective therapeutic interventions. Methylation risk scores (MRS) offer dynamic and modifiable genomic-based insights into disease risk. In this study, we leveraged machine learning and a diverse set of cohorts to develop MRS for PTSD, with the initial aims of distinguishing those with vs. without PTSD and predicting future PTSD. MRS derived from three different models showed high precision and accuracy in classifying PTSD (i.e., identifying probable PTSD cases vs. controls) in the test dataset and, moreover, significantly predicted future PTSD. Although our MRS did not consistently predict PTSD in independent cohorts, our work leverages data from a diverse set of cohorts to develop what is, to our knowledge, the first methylation-based risk scores for PTSD. Future work building on these results will help to advance personalized preventive strategies and therapeutic interventions for PTSD and so reduce the impact of this debilitating disorder on individuals and society.

Among the three models tested, eMRS, which used both exposure and DNA methylation variables, showed the highest accuracy and precision in classifying PTSD. The inclusion of exposure variables substantially added to the predictive power of the model, consistent with literature indicating that experiencing trauma, particularly during childhood, significantly increases the likelihood of developing PTSD [47, 53, 54]. Notably, despite not using trauma exposure variables as predictors, the second (MoRS) and third (MoRSAE) models, which used only methylation data in training, still showed notable predictive ability in the test dataset. These findings suggest that, even without trauma variables, DNA methylation can provide significant predictive information about PTSD. They also emphasize the substantial impact that trauma can have on the epigenetic landscape, consistent with other studies [55, 56] reporting trauma-linked methylation differences. Overall, the decrease in classification accuracy across models in the test dataset, from eMRS to MoRSAE, highlights the predictive value of both methylation and greater trauma exposure for forecasting PTSD. In addition, the significant and strong correlations of eMRS, MoRS, and MoRSAE with PTSD indicate that these scores could be a valuable tool to aid the early identification and risk assessment of individuals susceptible to PTSD.

In external validation, our findings were supported by the BEAR cohort for eMRS and by the NCPTSD-TRACTS cohort for MoRSAE, in terms of significant association with and prediction of PTSD. This variability across external cohorts could reflect differences between cohorts, for example in the type or severity of trauma experienced. For instance, as in the discovery cohorts, BEAR was the only external cohort that showed a significant difference in both cumulative and childhood trauma between participants with and without PTSD. These differences underscore the difficulty of creating predictive models and risk scores that generalize to every cohort. Future work based on larger datasets may overcome this issue.
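
External validation means that the weights learned in the discovery data are frozen and applied as-is to each new cohort. The sketch below assumes the score is a linear combination of features, as for an elastic-net logistic model. The CpG identifiers (`cg_A` and so on), weights, and methylation values are made up, and `score_external` is a hypothetical helper.

The sketch also makes visible one possible practical source of cohort-to-cohort variability: weighted features that were not measured, or did not pass QC, in the external cohort drop out of the score. For eMRS, the exposure variables would appear as additional named features in the same dictionary.

```python
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.metrics import roc_auc_score


def score_external(betas, cpg_ids, weights, intercept=0.0):
    """Apply frozen discovery-set weights to an external cohort.

    betas    : (n_samples, n_features) array measured in the external cohort
    cpg_ids  : feature names for the columns of `betas`
    weights  : dict mapping feature name -> weight from the discovery model
    Features missing from the external data are left out of the sum (they
    contribute 0) and are returned so their loss can be reported.
    """
    column = {cid: j for j, cid in enumerate(cpg_ids)}
    present = [c for c in weights if c in column]
    missing = [c for c in weights if c not in column]
    w = np.array([weights[c] for c in present], dtype=float)
    X = betas[:, [column[c] for c in present]]
    return intercept + X @ w, missing


# Hypothetical weights and identifiers, for illustration only.
weights = {"cg_A": 0.8, "cg_B": -0.5, "cg_C": 0.3}
ext_ids = ["cg_A", "cg_B", "cg_D"]          # cg_C absent in this cohort
betas = np.array([[0.9, 0.1, 0.5],
                  [0.2, 0.8, 0.5],
                  [0.7, 0.3, 0.4],
                  [0.1, 0.9, 0.6]])
status = np.array([1, 0, 1, 0])             # 1 = probable PTSD

scores, missing = score_external(betas, ext_ids, weights)
auc = roc_auc_score(status, scores)          # discrimination
r, pval = pointbiserialr(status, scores)     # association with PTSD status
print(scores, missing, auc, round(r, 3))
```

Reporting how many weighted features are missing in each cohort is one simple check when external results diverge. It is offered here as a general practice, not as a finding of this study.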

The ability to predict PTSD prior to deployment is particularly important. Deployment is linked to a higher probability of trauma exposure than is typically observed in community samples, and higher trauma load increases risk for PTSD [57]. Notably, all three models (eMRS, MoRS, and MoRSAE) significantly predicted future development of PTSD from pre-deployment data. This is especially meaningful because these data preceded trauma exposure and were not included in the training or testing phases of MRS model development. Predictive models and risk scores of this type could be useful for identifying individuals at risk of future PTSD in populations with anticipated trauma exposure, and for guiding preventive measures to mitigate that risk.
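
One generic way to ask whether baseline scores "significantly" predict later PTSD is a permutation test on the AUC. The sketch below shuffles the post-deployment outcome labels to build a null distribution. The preprint's own test is described in its Methods and may differ (for example, a regression-based association test).

The data are simulated, and `auc_from_ranks` and `auc_permutation_test` are hypothetical helpers. The AUC is computed through its equivalence with the Mann-Whitney U statistic, so that the many permutations stay cheap.

```python
import numpy as np
from scipy.stats import rankdata


def auc_from_ranks(y, score):
    """AUC computed from the Mann-Whitney U statistic (ties get average ranks)."""
    y = np.asarray(y).astype(bool)
    ranks = rankdata(score)
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_permutation_test(y, score, n_perm=2000, seed=0):
    """Observed AUC and a one-sided permutation p-value for H0: AUC = 0.5.

    Shuffling the outcome labels breaks any link between the baseline score
    and later PTSD status while keeping the case/control counts fixed.
    """
    rng = np.random.default_rng(seed)
    observed = auc_from_ranks(y, score)
    null = np.array([auc_from_ranks(rng.permutation(y), score) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value


# Simulated example (NOT study data): scores from pre-deployment blood samples,
# produced by a model that never saw these participants during training, and
# PTSD status assessed after deployment.
rng = np.random.default_rng(1)
pre_score = rng.normal(size=200)
post_ptsd = (rng.random(200) < 1 / (1 + np.exp(-(2.0 * pre_score - 1.5)))).astype(int)

auc, p = auc_permutation_test(post_ptsd, pre_score)
print(f"AUC = {auc:.3f}, permutation p = {p:.4f}")
```

The "+1" in the numerator and denominator keeps the permutation p-value strictly positive, which is the usual conservative convention.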

Previous work has leveraged DNA methylation data as one among many biomarker types included in risk score approaches to predicting PTSD [58, 59]. An earlier study of war zone-related PTSD identified a set of 343 candidate biomarkers, of which 98 were DNA methylation values associated with particular genes [58]. Among the significant CpGs identified here (3728 in models 1 and 2), cg16335858 in GYLTL1B (glycosyltransferase-like 1B) was previously identified as a biomarker for diagnosing war zone-related PTSD [58]. Among the 4150 CpGs in model 3, one additional CpG, cg25448062 in FADS1 (fatty acid desaturase 1), was identified as a diagnostic biomarker in the same study. A subsequent study [59] predicted post-deployment PTSD symptoms with a best AUC of 88%, with CpGs cg01208318 and cg17137457 as top predictors, but neither of these was replicated in our study. More broadly, four CpGs (cg04583842, cg04987734, cg16758086, and cg19719391) have been associated with PTSD in recent PGC EWAS meta-analyses (Katrinli et al., submitted). These CpGs are annotated to BANP, CDC42BPB, CHD5, and an intergenic region, respectively. Our results build on these earlier studies, highlighting novel CpGs that, when combined in a weighted risk score, may contribute to PTSD prediction.
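
The phrase "weighted risk score" can be written out explicitly. The formulation below assumes a glmnet-style elastic-net logistic model; the Methods describe the exact settings used. Here x_ij is the methylation value of feature j in participant i, y_i ∈ {0, 1} is PTSD status, λ is the overall penalty strength, and α is the L1/L2 mixing proportion:

```latex
\begin{aligned}
\hat{p}_i &= \sigma\Big(w_0 + \sum_{j=1}^{J} w_j x_{ij}\Big), \qquad \sigma(t) = \frac{1}{1 + e^{-t}},\\
(\hat{w}_0, \hat{w}) &= \arg\min_{w_0,\,w}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\Big] + \lambda\Big(\alpha\lVert w\rVert_1 + \frac{1-\alpha}{2}\lVert w\rVert_2^2\Big),\\
\mathrm{MRS}_i &= \sum_{j=1}^{J} \hat{w}_j\, x_{ij}.
\end{aligned}
```

With α > 0, the L1 part of the penalty drives many ŵ_j to exactly zero, so only features with nonzero weights enter the score. For eMRS, the sum would also include the two exposure variables. Adding the intercept ŵ_0 shifts every score by the same constant, so it does not change rankings or the AUC.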

MRS select CpGs as important features regardless of their mechanistic contribution to the disease or phenotype in question. Even so, examining the functional significance of these features may shed light on the biological processes implicated in the disease. In this study, gene ontology results provide interesting clues about biological mechanisms that may be involved in the development of PTSD. For example, positive regulation of autophagy of mitochondrion was identified as a nominally significant biological process in all three models. This is noteworthy because prior research has suggested that autophagy plays a role in neurodegenerative illnesses [60, 61, 62], and exploring its connection to PTSD could provide insights into the disorder's neurobiological underpinnings. Additionally, the link to sucrose metabolic process is intriguing and raises questions about the relationship between energy metabolism and stress responses [63], as metabolic disorders have been associated with PTSD [64]. Our KEGG pathway analysis implicated additional pathways, including mTOR and insulin signaling. These pathways play a crucial role in cellular growth and metabolism, consistent with physiological effects of PTSD that extend beyond psychological distress [64, 65].
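
To illustrate what "enrichment" of a GO term means, the sketch below computes a one-sided hypergeometric p-value for the overlap between genes mapped from selected CpGs and genes annotated to a term. All counts are hypothetical, and `go_term_pvalue` is a hypothetical helper.

Methylation-specific tools such as missMethyl [52], which this preprint cites, additionally correct for the bias that genes covered by more CpG probes are more likely to be selected. This plain hypergeometric test does not, so it is a conceptual sketch rather than the analysis performed here.

```python
from scipy.stats import hypergeom


def go_term_pvalue(n_background, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background : genes represented on the array (the universe)
    n_term       : background genes annotated to the GO term
    n_selected   : genes mapped from the model's selected CpGs
    n_overlap    : selected genes that are annotated to the term
    """
    # scipy's hypergeom(M, n, N): M population size, n successes, N draws;
    # sf(k) = P(X > k), so P(X >= k) = sf(k - 1).
    return hypergeom.sf(n_overlap - 1, n_background, n_term, n_selected)


# Hypothetical counts: 40 term genes out of 20,000; 2,000 selected genes.
# The overlap expected by chance is 40 * 2000 / 20000 = 4.
for k in (4, 8, 12):
    print(k, go_term_pvalue(20000, 40, 2000, k))
```

The p-value falls quickly as the observed overlap rises above the chance expectation of 4 genes. In practice, many terms are tested at once, so a multiple-testing correction would be needed before calling any term significant.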

Our study is not without limitations. Chief among these are our external validation results: although we attempted to replicate the results of our models in external cohorts, some cohorts did not show significant correlations or associations. This suggests that performance may vary with population characteristics, so caution is warranted when generalizing our results. We note that, to date, attempts to validate risk scores in external, independent cohorts, as done in this study, are not common; most work focuses on reporting results based on validation in a test (i.e., internal) dataset [14]. Results from this work highlight the need to increase such efforts in order to improve the generalizability of findings.

Conclusions

We have presented three MRS that classify PTSD with high accuracy. Our models, especially the model that includes both trauma exposure variables and DNAm (eMRS), reinforce earlier findings that methylation and trauma are interconnected. These factors can be leveraged together to improve the correct classification of those with vs. without PTSD, which may offer promising tools for early diagnosis and preventive strategies. Moreover, our models could be a valuable tool for predicting the risk of developing PTSD in the future. Continued investigation into the functional significance of the identified methylation features may shed additional light on the systemic pathology involved in this disorder. Finally, as more data become available, including additional molecular, environmental, and psychosocial factors in these scores may enhance their accuracy in predicting the condition and, relatedly, improve their performance in independent cohorts.

Funding

This work was supported by the National Institutes of Health (R01MD011728 to MU, DW, AEA, R01MH108826 to AKS, MWL, CMN, MU and R01MH106595 to CMN, KCK, KJR and MBS). Army STARRS was sponsored by the Department of the Army and funded under cooperative agreement number U01MH087981 with the U.S. Department of Health and Human Services, National Institutes of Health, National Institute of Mental Health (NIH/NIMH) to co-PIs Robert J. Ursano and Murray B. Stein. NCPTSD-TRACTS was supported by Translational Research Center for TBI and Stress Disorders (TRACTS), a VA Rehabilitation Research and Development Traumatic Brain Injury National Research Center (B3001-C) to CF, and RF1AG068121 to EW. DCHS was supported by the Bill and Melinda Gates Foundation (OPP 1017641). Additional support for DJS and NK was provided by the South African Medical Research Council. The BEAR cohort was supported by R01MH105379 to NRN. The PROGrESS Cohort was supported by DOD #W81XWH-11-1-0073 to SMR and by the National Center for Advancing Translational Sciences of the NIH Award #UL1TR000433 to GAM. Data collection of PRISMO was funded by the Dutch Ministry of Defense, and DNAm analyses were funded by the VIDI Award fellowship from the Netherlands Organization for Scientific Research (NWO, grant number 917.18.336 to BPFR). SK was supported by NIH BIRCWH K12HD085850. APK was supported by K23 MH112852. VBR was supported by VA Merit Award BX005872. EW was additionally supported by Merit Review Award Number I01 CX-001276-01 from the U.S. Dept. of Veterans Affairs CSRD Service.

Additional Declarations:

Competing interests: Murray B. Stein has in the past 3 years received consulting income from Acadia Pharmaceuticals, Aptinyx, atai Life Sciences, BigHealth, Biogen, Bionomics, BioXcel Therapeutics, Boehringer Ingelheim, Clexio, Delix Therapeutics, Eisai, EmpowerPharm, Engrail Therapeutics, Janssen, Jazz Pharmaceuticals, NeuroTrauma Sciences, PureTech Health, Sage Therapeutics, Sumitomo Pharma, and Roche/Genentech. Dr. Stein has stock options in Oxeia Biopharmaceuticals and EpiVario. He has been paid for his editorial work on Depression and Anxiety (Editor-in-Chief), Biological Psychiatry (Deputy Editor), and UpToDate (Co-Editor-in-Chief for Psychiatry). He has also received research support from NIH, Department of Veterans Affairs, and the Department of Defense. He is on the scientific advisory board for the Brain and Behavior Research Foundation and the Anxiety and Depression Association of America. Dr. Chia-Yen Chen is an employee of Biogen. Dr. Nikolaos P. Daskalakis has served on scientific advisory boards for BioVie Pharma, Circular Genomics and Sentio Solutions for unrelated work. Dr. Nicole R. Nugent is a member of the scientific advisory board for Ilumivu. Dr. Sheila Rauch reports support from Wounded Warrior Project (WWP), Department of Veterans Affairs (VA), National Institute of Health (NIH), McCormick Foundation, Tonix Pharmaceuticals, Woodruff Foundation, and Department of Defense (DOD). Dr. Rauch also receives royalties from Oxford University Press and American Psychological Association Press. Dr. Ressler reported receiving personal consulting fees from Sage Therapeutics, Senseye, Boehringer Ingelheim, Jazz Pharmaceuticals, and Acer, Inc. and a sponsored research grant from Alto Neuroscience outside the submitted work.

Footnotes

Ethics approval and consent to participate

All participants included in analyses of the discovery and external validation cohorts provided consent for their participation. Details on the specific IRB approval associated with each study are provided in the Supplementary Material.

Supplementary Files

This is a list of supplementary files associated with this preprint.

PTSDMethylationRiskScoreSupplementary.docx

Contributor Information

Agaz Wani, University of South Florida College of Public Health, Genomics Program.

Seyma Katrinli, Emory University Department of Gynecology and Obstetrics.

Xiang Zhao, Boston University School of Public Health.

Nikolaos Daskalakis, Broad Institute of MIT and Harvard, Stanley Center for Psychiatric Research.

Anthony Zannas, University of North Carolina at Chapel Hill, Carolina Stress Initiative.

Allison Aiello, Robert N Butler Columbia Aging Center, Columbia University.

Dewleen Baker, University of California San Diego, Department of Psychiatry.

Marco Boks, Brain Center University Medical Center Utrecht, Department of Psychiatry.

Leslie Brick, Alpert Medical School, Brown University.

Chia-Yen Chen, Biogen Inc., Translational Sciences.

Shareefa Dalvie, University of Cape Town, Department of Pathology.

Catherine Fortier, Harvard Medical School, Department of Psychiatry.

Elbert Geuze, Netherlands Ministry of Defence, Brain Research and Innovation Centre.

Jasmeet Hayes, The Ohio State University, Department of Psychology.

Ronald Kessler, Harvard Medical School, Department of Health Care Policy.

Anthony King, The Ohio State University, College of Medicine, Institute for Behavioral Medicine Research.

Nastassja Koen, University of Cape Town, Department of Psychiatry & Mental Health.

Israel Liberzon, Texas A&M University College of Medicine, Department of Psychiatry and Behavioral Sciences.

Adriana Lori, Emory University, Department of Psychiatry and Behavioral Sciences.

Jurjen Luykx, UMC Utrecht Brain Center Rudolf Magnus, Department of Psychiatry.

Adam Maihofer, University of California, San Diego.

William Milberg, VA Boston Healthcare System, TRACTS/GRECC.

Mark Miller, Boston University School of Medicine, Psychiatry.

Mary Mufford, University of Cape Town, Neuroscience Institute.

Nicole Nugent, Alpert Medical School, Brown University, Department of Emergency Medicine.

Sheila Rauch, Emory University, Department of Psychiatry & Behavioral Sciences.

Kerry Ressler, Harvard Medical School, Department of Psychiatry.

Victoria Risbrough, University of California San Diego, Department of Psychiatry.

Bart Rutten, Maastricht Universitair Medisch Centrum, School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology.

Dan Stein, University of Cape Town, Department of Psychiatry & Mental Health.

Murray Stein, University of California San Diego, Department of Psychiatry.

Robert Ursano, Uniformed Services University, Department of Psychiatry.

Mieke Verfaellie, Boston University School of Medicine, Psychiatry.

Erin Ware, University of Michigan, Population Studies Center.

Derek Wildman, University of South Florida College of Public Health, Genomics Program.

Erika Wolf, VA Boston Healthcare System, National Center for PTSD.

Caroline Nievergelt, University of California San Diego, Department of Psychiatry.

Mark Logue, Boston University School of Public Health.

Alicia Smith, Emory University Department of Gynecology and Obstetrics.

Monica Uddin, University of South Florida College of Public Health, Genomics Program.

Eric Vermetten, Leiden University Medical Center, Department of Psychiatry.

Christiaan Vinkers, Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program.

Availability of data and materials

Owing to military cohort data sharing restrictions, data from MRS I & II, Army STARRS, PRISMO, and NCPTSD-TRACTS cannot be publicly posted. For other cohorts, individual-level data from the cohorts or cohort-level summary statistics may be made available to researchers following an approved analysis proposal through the PGC Post-traumatic Stress Disorder EWAS group with agreement of the cohort PIs. For additional information on access to these data, including PI contact information for the contributing cohorts, please contact the corresponding author.

References

1. Yehuda R, Hoge CW, McFarlane AC, Vermetten E, Lanius RA, Nievergelt CM, et al. Post-traumatic stress disorder. Nature Reviews Disease Primers. 2015;1(1):15057.
2. Kessler RC, Sonnega A, Bromet E, Hughes M, Nelson CB. Posttraumatic stress disorder in the National Comorbidity Survey. Archives of general psychiatry. 1995;52(12):1048–60.
3. Kulka RA, Schlenger WE, Fairbank JA, Hough RL, Jordan BK, Marmar CR, et al. Trauma and the Vietnam war generation: Report of findings from the National Vietnam Veterans Readjustment Study. Philadelphia, PA, US: Brunner/Mazel; 1990. xxix, 322 p.
4. Brady KT, Killeen TK, Brewerton T, Lucerini S. Comorbidity of psychiatric disorders and posttraumatic stress disorder. The Journal of clinical psychiatry. 2000;61 Suppl 7:22–32.
5. Kessler RC, Wang PS. The Descriptive Epidemiology of Commonly Occurring Mental Disorders in the United States. 2008;29(1):115–29.
6. Lohr JB, Palmer BW, Eidt CA, Aailaboyina S, Mausbach BT, Wolkowitz OM, et al. Is Post-Traumatic Stress Disorder Associated with Premature Senescence? A Review of the Literature. The American Journal of Geriatric Psychiatry. 2015;23(7):709–25.
7. Katrinli S, Stevens J, Wani AH, Lori A, Kilaru V, van Rooij SJH, et al. Evaluating the impact of trauma and PTSD on epigenetic prediction of lifespan and neural integrity. Neuropsychopharmacology : official publication of the American College of Neuropsychopharmacology. 2020;45(10):1609–16.
8. Rosenbaum S, Stubbs B, Ward PB, Steel Z, Lederman O, Vancampfort D. The prevalence and risk of metabolic syndrome and its components among people with posttraumatic stress disorder: a systematic review and meta-analysis. Metabolism. 2015;64(8):926–33.
9. Dedert EA, Calhoun PS, Watkins LL, Sherwood A, Beckham JC. Posttraumatic Stress Disorder, Cardiovascular, and Metabolic Disease: A Review of the Evidence. Annals of Behavioral Medicine. 2010;39(1):61–78.
10. Pacella ML, Hruska B, Delahanty DL. The physical health consequences of PTSD and PTSD symptoms: A meta-analytic review. Journal of Anxiety Disorders. 2013;27(1):33–46.
11. Kessler RC. Posttraumatic stress disorder: the burden to the individual and to society. The Journal of clinical psychiatry. 2000;61 Suppl 5:4–12; discussion 3–4.
12. Nichter B, Norman S, Haller M, Pietrzak RH. Physical health burden of PTSD, depression, and their comorbidity in the U.S. veteran population: Morbidity, functioning, and disability. Journal of Psychosomatic Research. 2019;124:109744.
13. Lambert SA, Abraham G, Inouye M. Towards clinical utility of polygenic risk scores. Hum Mol Genet. 2019;28(R2):R133–R42.
14. Yousefi PD, Suderman M, Langdon R, Whitehurst O, Davey Smith G, Relton CL. DNA methylation-based predictors of health: applications and statistical considerations. Nature reviews Genetics. 2022;23(6):369–83.
15. Huls A, Czamara D. Methodological challenges in constructing DNA methylation risk scores. Epigenetics. 2020;15(1–2):1–11.
16. Nabais MF, Gadd DA, Hannon E, Mill J, McRae AF, Wray NR. An overview of DNA methylation-derived trait score methods and applications. Genome Biol. 2023;24(1):28.
17. Thompson M, Hill BL, Rakocz N, Chiang JN, Geschwind D, Sankararaman S, et al. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. NPJ Genom Med. 2022;7(1):50.
18. McCartney DL, Hillary RF, Stevenson AJ, Ritchie SJ, Walker RM, Zhang Q, et al. Epigenetic prediction of complex traits and death. Genome Biol. 2018;19(1):136.
19. Clark SL, Hattab MW, Chan RF, Shabalin AA, Han LKM, Zhao M, et al. A methylation study of long-term depression risk. Molecular Psychiatry. 2020;25(6):1334–43.
20. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30(10):1363–9.
21. Katrinli S, Maihofer AX, Wani AH, Pfeiffer JR, Ketema E, Ratanatharathorn A, et al. Epigenome-wide meta-analysis of PTSD symptom severity in three military cohorts implicates DNA methylation changes in genes involved in immune system and oxidative stress. Molecular Psychiatry. 2022.
22. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2006;8(1):118–27.
23. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics (Oxford, England). 2012;28(6):882–3.
24. Ambatipudi S, Cuenin C, Hernandez-Vargas H, Ghantous A, Le Calvez-Kelm F, Kaaks R, et al. Tobacco smoking-associated genome-wide DNA methylation changes in the EPIC study. 2016;8(5):599–618.
25. Besingi W, Johansson Å. Smoke-related DNA methylation changes in the etiology of human disease. Hum Mol Genet. 2014;23(9):2290–7.
26. Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet. 2011;88(4):450–7.
27. Guida F, Sandanger TM, Castagné R, Campanella G, Polidoro S, Palli D, et al. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. 2015;24(8):2349–59.
28. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR, et al. Epigenetic signatures of cigarette smoking. 2016;9(5):436–47.
29. Zong D, Liu X, Li J, Ouyang R, Chen P. The role of cigarette smoke-induced epigenetic alterations in inflammation. Epigenetics & Chromatin. 2019;12(1):65.
30. Li S, Wong EM, Bui M, Nguyen TL, Joo J-HE, Stone J, et al. Causal effect of smoking on DNA methylation in peripheral blood: a twin and family study. Clinical epigenetics. 2018;10:18.
31. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15(2):R31.
32. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics. 2012;13:86.
33. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PloS one. 2012;7(7):e41361.
34. Salas LA, Koestler DC, Butler RA, Hansen HM, Wiencke JK, Kelsey KT, et al. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 2018;19(1):64.
35. Teschendorff AE, Breeze CE, Zheng SC, Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformatics. 2017;18(1):105.
36. Adkins RM, Krushkal J, Tylavsky FA, Thomas F. Racial differences in gene-specific DNA methylation levels are present at birth. Birth defects research Part A, Clinical and molecular teratology. 2011;91(8):728–36.
37. Heyn H, Moran S, Hernando-Herraez I, Sayols S, Gomez A, Sandoval J, et al. DNA methylation contributes to natural human variation. Genome research. 2013;23(9):1363–72.
38. Terry MB, Ferris JS, Pilsner R, Flom JD, Tehranifar P, Santella RM, et al. Genomic DNA methylation among women in a multiethnic New York City birth cohort. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2008;17(9):2306–10.
39. Nielsen DA, Hamon S, Yuferov V, Jackson C, Ho A, Ott J, et al. Ethnic diversity of DNA methylation in the OPRM1 promoter region in lymphocytes of heroin addicts. Human genetics. 2010;127(6):639–49.
40. Husquin LT, Rotival M, Fagny M, Quach H, Zidane N, McEwen LM, et al. Exploring the genetic basis of human population differences in DNA methylation and their causal impact on immune gene regulation. Genome Biol. 2018;19(1):222.
41. Kwabi-Addo B, Wang S, Chung W, Jelinek J, Patierno SR, Wang BD, et al. Identification of differentially methylated genes in normal prostate tissues from African American and Caucasian men. Clinical cancer research : an official journal of the American Association for Cancer Research. 2010;16(14):3539–47.
42. Barfield RT, Almli LM, Kilaru V, Smith AK, Mercer KB, Duncan R, et al. Accounting for population stratification in DNA methylation studies. Genet Epidemiol. 2014;38(3):231–41.
43. Ratanatharathorn A, Boks MP, Maihofer AX, Aiello AE, Amstadter AB, Ashley-Koch AE, et al. Epigenome-wide association of PTSD from heterogeneous cohorts with a common multi-site analysis pipeline. American journal of medical genetics Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics. 2017;174(6):619–30.
44. Jamshidian M, Mata M. Advances in Analysis of Mean and Covariance Structure when Data are Incomplete. In: Lee S-Y, editor. Handbook of Latent Variable and Related Models. Amsterdam: North-Holland; 2007. p. 21–44.
45. Hasan MK, Alam MA, Roy S, Dutta A, Jawad MT, Das S. Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021). Informatics in Medicine Unlocked. 2021;27:100799.
46. Manduchi E, Fu W, Romano JD, Ruberto S, Moore JH. Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses. BMC Bioinformatics. 2020;21(1):430.
47. Breslau N, Peterson EL, Schultz LR. A Second Look at Prior Trauma and the Posttraumatic Stress Disorder Effects of Subsequent Trauma: A Prospective Epidemiological Study. Archives of general psychiatry. 2008;65(4):431–7.
48. Breslau N, Chilcoat HD, Kessler RC, Davis GC. Previous Exposure to Trauma and PTSD Effects of Subsequent Trauma: Results From the Detroit Area Survey of Trauma. 1999;156(6):902–7.
49. Brady KT, Back SE. Childhood trauma, posttraumatic stress disorder, and alcohol dependence. Alcohol research : current reviews. 2012;34(4):408–13.
50. LeardMann CA, Smith B, Ryan MAK. Do adverse childhood experiences increase the risk of postdeployment posttraumatic stress disorder in US Marines? BMC Public Health. 2010;10(1):437.
51. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
52. Phipson B, Maksimovic J, Oshlack A. missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform. Bioinformatics. 2015;32(2):286–8.
53. McFarlane AC. Posttraumatic stress disorder: a model of the longitudinal course and the role of risk factors. The Journal of clinical psychiatry. 2000;61 Suppl 5:15–20; discussion 1–3.
54. Brewin CR, Andrews B, Valentine JD. Meta-analysis of risk factors for posttraumatic stress disorder in trauma-exposed adults. Journal of Consulting and Clinical Psychology. 2000;68(5):748–66.
55. Mehta D, Klengel T, Conneely KN, Smith AK, Altmann A, Pace TW, et al. Childhood maltreatment is associated with distinct genomic and epigenetic profiles in posttraumatic stress disorder. 2013;110(20):8302–7.
56. Uddin M, Aiello AE, Wildman DE, Koenen KC, Pawelec G, de Los Santos R, et al. Epigenetic and immune function profiles associated with posttraumatic stress disorder. Proc Natl Acad Sci U S A. 2010;107(20):9470–5.
57. Brewin CR, Andrews B, Valentine JD. Meta-analysis of risk factors for posttraumatic stress disorder in trauma-exposed adults. J Consult Clin Psychol. 2000;68(5):748–66.
58. Dean KR, Hammamieh R, Mellon SH, Abu-Amara D, Flory JD, Guffanti G, et al. Multi-omic biomarker identification and validation for diagnosing warzone-related post-traumatic stress disorder. Molecular Psychiatry. 2020;25(12):3337–49.
59. Schultebraucks K, Qian M, Abu-Amara D, Dean K, Laska E, Siegel C, et al. Pre-deployment risk factors for PTSD in active-duty personnel deployed to Afghanistan: a machine-learning approach for analyzing multivariate predictors. Mol Psychiatry. 2021;26(9):5011–22.
60. Guo F, Liu X, Cai H, Le W. Autophagy in neurodegenerative diseases: pathogenesis and therapy. Brain pathology (Zurich, Switzerland). 2018;28(1):3–13.
61. Nixon RA. The role of autophagy in neurodegenerative disease. Nature Medicine. 2013;19(8):983–97.
62. Mizushima N, Levine B, Cuervo AM, Klionsky DJ. Autophagy fights disease through cellular self-digestion. Nature. 2008;451(7182):1069–75.
63. Picard M, McManus MJ, Gray JD, Nasca C, Moffat C, Kopinski PK, et al. Mitochondrial functions modulate neuroendocrine, metabolic, inflammatory, and transcriptional responses to acute psychological stress. 2015;112(48):E6614–E23.
64. Michopoulos V, Vester A, Neigh G. Posttraumatic stress disorder: A metabolic disorder in disguise? Experimental neurology. 2016;284(Pt B):220–9.
65. Ni L, Xu Y, Dong S, Kong Y, Wang H, Lu G, et al. The potential role of the HCN1 ion channel and BDNF-mTOR signaling pathways and synaptic transmission in the alleviation of PTSD. Translational Psychiatry. 2020;10(1):101.


Articles from Research Square are provided here courtesy of American Journal Experts
