Abstract
Mediation analysis has been a useful tool for investigating the effect of mediators that lie in the path from the independent variable to the outcome. With the increasing dimensionality of mediators, such as in (epi)genomic studies, high-dimensional mediation models are needed. In this work, we focus on epigenetic studies with the goal of identifying important DNA methylation sites that act as mediators between an exposure and a disease outcome. Specifically, we focus on gene-based high-dimensional mediation analysis implemented with kernel principal component analysis to capture potential nonlinear mediation effects. We first review current high-dimensional mediation models and then propose two gene-based analytical approaches: gene-based high-dimensional mediation analysis based on a linearity assumption between mediators and outcome (gHMA-L) and gene-based high-dimensional mediation analysis based on a nonlinearity assumption (gHMA-NL). Since the true mediation relationship is unknown in practice, we further propose an omnibus test of gene-based high-dimensional mediation analysis (gHMA-O) that combines gHMA-L and gHMA-NL. Extensive simulation studies show that gHMA-L performs better when the mediator–outcome relationship is linear and gHMA-NL performs better when it is nonlinear, while gHMA-O, which combines the two, is powerful and robust across settings. We apply the proposed methods to two datasets to investigate genes whose methylation levels act as important mediators in the relationship (1) between alcohol consumption and epithelial ovarian cancer risk, using data from the Mayo Clinic Ovarian Cancer Case-Control Study, and (2) between childhood maltreatment and comorbid post-traumatic stress disorder and depression in adulthood, using data from the Gray Trauma Project.
Keywords: DNA methylation, high-dimensional mediation analysis, nonlinear effect, kernel principal components, omnibus test
Introduction
Mediation analysis has been an important and useful statistical approach in the social sciences for investigating the effects of intermediate variables that lie in the pathway from the independent variable to the dependent variable. It was formalized by Baron and Kenny in 1986 [1, 2] and has since been extended to account for nonlinearity, interactions, various types of mediating and outcome variables, and missing data [3–7]. In a single-mediator analysis, the mediation effect can be estimated by solving a series of regression equations following the procedure of Baron and Kenny [1]. More recently, attention has turned to mediation analysis with multiple mediators [8–11]. Because mediation analysis can support causal inference under appropriate assumptions, it has been widely applied in many scientific disciplines, including sociology, psychology, behavioral science, economics, epidemiology, public health and genetics [4, 12–14]. In genetic studies, mediation analysis has been used to dissect the mediating effect of risk factors in the pathway from an exposure (e.g. single nucleotide polymorphisms) to complex diseases, which can give insight into the causal mechanisms of these diseases [15, 16].
Most of the literature mentioned above focuses on low-dimensional mediators. With the rapid development of modern data collection techniques such as genomic sequencing, high-dimensional data are routinely collected. However, the development of statistical methods for high-dimensional mediators has lagged behind, with only a few methods reported [17–20]. In the context of multiple mediation analysis, Zhang et al. [17] proposed a high-dimensional mediation analysis model, termed HIMA, based on sure independence screening (SIS) and regularization techniques, to identify DNA methylation sites mediating the relationship between smoking and reduced lung function. Huang and Pan [18] proposed a transformation model based on spectral decomposition, which estimates the mediation effects of multiple mediators through a series of low-dimensional regression models and evaluates their significance with a Monte Carlo testing procedure. In epigenome-wide association studies, Barfield et al. [21] proposed conducting mediation analysis under a composite null hypothesis of no mediation effect and showed, through simulation studies and real data analysis, that their method has adequate type I error control and statistical power. Chen et al. [19] developed the directions of mediation approach for investigating high-dimensional mediation effects in a functional magnetic resonance imaging study; the idea is to iteratively transform high-dimensional mediators into orthogonal low-dimensional components using maximum likelihood estimation. Zhao et al. [20] introduced a sparse high-dimensional mediation analysis model based on sparse principal component analysis (PCA), which extends the approach of Huang and Pan. Wu et al. [22] proposed a causal inference test based on high-dimensional desparsified estimators to identify potential CpG sites that lie in the pathway between drinking and epithelial ovarian cancer (EOC) risk. Gao et al. [23] recently extended the method of Zhang et al. [17] by adopting high-dimensional desparsified estimators and showed improved performance when mediators are correlated.
Although many high-dimensional mediation analysis methods were not specifically proposed for epigenetic studies, some were developed to identify important epigenetic markers that act as potential mediators between an exposure and a disease outcome, e.g. Wu et al. [22], Gao et al. [23] and Zhang et al. [17]. Recent epigenetic studies show that DNA methylation plays a vital role in mediating the effect of an exposure on a disease outcome (see e.g. Bind et al. [24], Timms et al. [25], Barker et al. [26], Huang et al. [27], Bozack et al. [28, 29] and Jordahl et al. [30]). Furthermore, genes are functional units in living organisms, and a gene typically contains multiple methylation sites. Thus, it makes more biological sense to assess the mediation effect of a whole gene than to focus on individual methylation sites when performing mediation analysis in an epigenetic study. Although the aforementioned multi-mediator models can be applied to epigenetic studies to identify important methylation mediators, they mostly focus on individual sites and cannot be directly applied to assess gene-level effects. This motivates us to develop a gene-based mediation analysis method for epigenetic studies that accounts for the high dimensionality of the data.
In practice, the relationship between multiple mediators and a disease outcome can be linear or nonlinear. Under a linear relationship, we propose a gene-based high-dimensional mediation analysis method based on the linearity assumption, termed gHMA-L. When the relationship is nonlinear, we follow Schölkopf et al. and extend linear PCA to kernel PCA (KPCA), a useful tool for nonlinear feature extraction via the kernel trick [31, 32]. KPCA has been broadly applied in genomic studies to capture nonlinear relationships [33–36]. We call this gene-based mediation analysis that incorporates nonlinear effects gHMA-NL. Because the true relationship is never known in real applications, it is important to develop a method that is flexible yet robust to misspecification of the fitted model. For this purpose, we use the aggregated Cauchy association test (ACAT) [37], which applies a Cauchy transformation to combine the P-values obtained with gHMA-L and gHMA-NL. ACAT behaves like the minimum P-value approach but does not require a permutation test to obtain the null distribution of the minimum P-value. We call this gene-based omnibus test, which integrates gHMA-L and gHMA-NL, gHMA-O.
In this article, we first review the linear mediation analysis model with a single mediator and with multiple mediators, and then introduce two gene-based high-dimensional mediation analysis methods based on linear (gHMA-L) and nonlinear (gHMA-NL) assumptions. The two methods are further integrated by an omnibus test, termed gHMA-O. We carry out extensive simulation studies to evaluate the performance of the proposed methods. We then apply them to two epigenetic datasets, the EOC dataset and the Gray Trauma Project (GTP) dataset, to identify genes whose methylation levels mediate the effect of an exposure on a disease outcome. The proposed methods can be extended to other high-dimensional mediation studies in which the interest is in evaluating the effect of a gene set or a pathway.
Statistical method for mediation analysis
Linear mediation analysis
We first review the linear mediation model with a single mediator. Let X denote the independent (or exposure) variable, Y the dependent (or outcome) variable and M the mediating variable (or mediator). The relationship among the three variables is shown in Figure 1A. The mediation model posits a causal chain in which the independent variable affects the mediator, which in turn affects the dependent variable [1]. The single-mediator model can be expressed as three regression equations [1]:
$$
\begin{aligned}
Y &= c_1 + \gamma^{*} X + e_1,\\
M &= c_2 + \alpha X + e_2,\\
Y &= c_3 + \gamma X + \beta M + e_3,
\end{aligned}
\tag{1}
$$
where $\gamma^{*}$ measures the total effect of X on Y; $\alpha$ measures the relationship between X and M; and the coefficients $\gamma$ and $\beta$ in the third equation measure the partial relationships between X and Y and between M and Y, respectively. Here $c_1$, $c_2$, $c_3$ are intercepts and $e_1$, $e_2$, $e_3$ are random errors. Note that, under the procedure of Baron and Kenny [1], mediation analysis proceeds to detect mediators of the effect of X on Y only if the coefficient $\gamma^{*}$ in the first equation of model (1) is statistically significant.
Figure 1.

The path diagram of the low-dimensional mediation model, illustrating the total and direct effects of the exposure on the outcome and the mediating effect of the exposure on the outcome through the mediator: (A) the relationship among the exposure (X), the outcome (Y) and the mediator (M) in a simple mediation model; (B) the relationship among the exposure (X), the outcome (Y) and multiple independent mediators (M) in a multiple mediation model.
Under this model, the total effect ($\gamma^{*}$) of the exposure on the outcome can be decomposed into the direct effect ($\gamma$) of the exposure on the outcome and the indirect effect of the exposure on the outcome through the mediator. The indirect effect, also called the mediating effect, is denoted by $\alpha\beta$, the product of the X-to-M coefficient $\alpha$ and the M-to-Y coefficient $\beta$; for a continuous outcome fitted by ordinary least squares, the estimates satisfy $\hat\gamma^{*} = \hat\gamma + \hat\alpha\hat\beta$ exactly. The total, direct and indirect effects can all be estimated by fitting the regression equations in model (1). To test the mediation effect, Baron and Kenny [1] suggested separately testing the regression coefficients $\alpha$ and $\beta$: the mediation effect is considered significant if both $\alpha$ and $\beta$ are statistically significant. Other methods can also be used to assess the significance of the mediation effect, such as the delta method and the bootstrap, which provide the variance of the mediation-effect estimator for generalized linear models and nonlinear predictive models [38, 39].
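To make the product-of-coefficients estimate and the joint significance test concrete, the following Python sketch fits the three equations of model (1) by ordinary least squares on simulated data. The effect sizes (α = 0.5, β = 0.8, γ = 0.3), the sample size and the helper name `ols` are hypothetical choices for illustration, and the P-values use a normal approximation rather than the exact t distribution; this is not the authors' implementation.

```python
import math

import numpy as np


def ols(y, covariates):
    """Fit y on [1, covariates] by OLS; return coefficients and Wald P-values."""
    n = len(y)
    design = np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    # Two-sided P-values from a normal approximation to the t distribution
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(coef, se)])
    return coef, pvals


# Hypothetical simulated data: binary exposure, one continuous mediator
rng = np.random.default_rng(2024)
n = 500
x = rng.binomial(1, 0.5, size=n).astype(float)
m = 0.5 * x + rng.normal(size=n)            # alpha = 0.5
y = 0.3 * x + 0.8 * m + rng.normal(size=n)  # gamma = 0.3, beta = 0.8

coef_total, _ = ols(y, x)                            # Y ~ X     -> gamma*
coef_a, p_a = ols(m, x)                              # M ~ X     -> alpha
coef_full, p_full = ols(y, np.column_stack([x, m]))  # Y ~ X + M -> gamma, beta

total = coef_total[1]
alpha, direct, beta = coef_a[1], coef_full[1], coef_full[2]
indirect = alpha * beta             # product-of-coefficients estimate
p_joint = max(p_a[1], p_full[2])    # joint significance: both arms must be significant

print(f"total={total:.4f}  direct+indirect={direct + indirect:.4f}  p_joint={p_joint:.2e}")
```

Because all three fits use OLS on the same sample with intercepts, `total` and `direct + indirect` agree up to floating-point error; with a logistic model for a binary outcome this identity no longer holds exactly.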
For a model with multiple mediators, the path diagram is similar to that of the single-mediator model, as depicted in Figure 1B, where the mediators are assumed to be independent. The mediation effects are estimated and tested with the product-of-coefficients approach and the joint significance test, as in the single-mediator model: the mediating effect of X on Y through the kth mediator $M_k$ is measured by $\alpha_k\beta_k$, so the total mediating effect in Figure 1B is measured by $\sum_k \alpha_k\beta_k$. Preacher and Hayes [10] suggested that analyzing multiple mediators in one model has advantages over separate single-mediator analyses. This strategy, however, cannot be directly applied to the high-dimensional case, where the number of mediators is large and the mediators may be correlated.
High-dimensional linear mediation analysis
When the number of mediators is large, possibly exceeding the sample size, a high-dimensional mediation model is needed. Zhang et al. [17] proposed the HIMA model, which builds on the multiple-mediator model of Preacher and Hayes [10]. Besides being high-dimensional, mediators are often correlated, and the correlation structure typically varies across regions of the genome. The relationship among the exposure, mediators and outcome is shown in Figure 2, where the double arrows between mediators indicate correlation.
Figure 2.

The relationship among the exposure, mediators and outcome variable in the high-dimensional mediation model. The mediating variables M are high-dimensional and often correlated; double arrows indicate correlation between mediators. We call the left arm, from X to M, the α arm and the right arm, from M to Y, the β arm.
Under the high-dimensional linear mediation model, the following regression equations are used to estimate and test mediation effects [17]:
$$
\begin{aligned}
Y &= c_1 + \gamma^{*} X + e_1,\\
M_k &= c_{2k} + \alpha_k X + e_{2k}, \qquad k = 1, \dots, p,\\
Y &= c_3 + \gamma X + \sum_{k=1}^{p} \beta_k M_k + e_3,
\end{aligned}
\tag{2}
$$
where p is the number of candidate mediators, $\gamma^{*}$ represents the total effect of the exposure X on the outcome Y, and $\gamma$ in the third equation of model (2) represents the direct effect of X on Y adjusting for all mediators. In this model, the relationship between $\mathbf{M} = (M_1, \dots, M_p)^{\top}$ and Y is assumed to be linear, and the mediation effect of X on Y through M can be defined as $\sum_{k=1}^{p}\alpha_k\beta_k$. Zhang et al. [17] suggested applying sure independence screening (SIS) and regularization techniques to reduce the dimension of the mediators from ultra-high to moderately high and then performing further analysis in the reduced dimension. To keep the paper self-contained, we briefly summarize the three steps of HIMA [17].
Step 1. Screening: SIS [40] is used to reduce the dimension of the potential mediators.
Step 2. Regularized estimation: the minimax concave penalty (MCP) is applied to the third equation of model (2) to select important mediators according to the estimates of $\beta_k$.
Step 3. Joint significance test: the regression model is refitted with the mediators selected in Step 2, and the significance of the mediation effect $\alpha_k\beta_k$ is evaluated by $P_k = \max(P_{\alpha_k}, P_{\beta_k})$, where $P_{\alpha_k}$ and $P_{\beta_k}$ are the P-values for testing $\alpha_k = 0$ and $\beta_k = 0$, respectively.
HIMA is implemented in the R package HIMA.
Gene-based high-dimensional linear mediation analysis
Note that the HIMA method of Zhang et al. [17] focuses on identifying individual methylation sites with significant mediation effects. However, genes are the basic functional units in human growth and development, as well as in the development of complex diseases. This motivates us to treat genes, rather than individual methylation sites, as the testing units and to assess mediation effects at the gene level. The following steps describe the testing procedure.
Assess the α arm effect
Suppose there are J genes in the genome in total. Let $M_{jk}$ be the kth methylation site in gene j ($j = 1, \dots, J$; $k = 1, \dots, K_j$). For ease of notation, we drop the gene index j in what follows, so a gene has K methylation sites $M_1, \dots, M_K$. To evaluate the mediation effect at the gene level, we treat the K methylation sites in a gene as multiple traits and adopt a multi-trait analysis procedure, which has been shown to improve testing power when the traits are correlated. For each methylation site $M_k$, we obtain a P-value $p_k$ for testing $\alpha_k = 0$ in the second equation of model (2). Thus, for the K mediators in a gene, we obtain K marginal P-values $p_1, \dots, p_K$. Because the K methylation sites in a gene are typically correlated, we adopt the new Fisher combination test of Yang et al. [41], which was developed to combine association tests for multivariate correlated phenotypes; it accounts for the correlation structure among the traits and is computationally efficient. The P-values are combined into the test statistic
$$
T = -2\sum_{k=1}^{K}\log p_k. \tag{3}
$$
When the K methylation sites are independent, T follows a chi-square distribution with 2K degrees of freedom, i.e. $T \sim \chi^2_{2K}$, so the aggregated P-value can be calculated directly. When the mediators are correlated, T can be approximated by a gamma distribution with shape parameter a and scale parameter b, i.e. $T \sim \mathrm{Gamma}(a, b)$. Yang et al. [41] proposed estimating a and b by matching the first two moments of T: since a Gamma(a, b) variable has mean ab and variance $ab^2$, we obtain $a = \mu_T^2/\sigma_T^2$ and $b = \sigma_T^2/\mu_T$, where $\mu_T$ is the mean and $\sigma_T^2$ the variance of T. In the independent case, $\mu_T = 2K$ and $\sigma_T^2 = 4K$, giving a = K and b = 2, which recovers the $\chi^2_{2K}$ distribution. The combined P-value is then $P = \Pr\{\mathrm{Gamma}(a, b) \geq T\}$. The correlation among the K mediators enters through $\sigma_T^2$; details of the estimation of a and b can be found in Yang et al. [41]. The result is a gene-level P-value $P_\alpha$, which reflects the strength of association between the exposure X and the methylation sites in the gene.
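The gene-level P-value for the α arm can be sketched as follows. This is a minimal sketch under stated assumptions, not the estimator of Yang et al. [41]: we plug in Brown's method, which moment-matches T in (3) to a gamma distribution and approximates Cov(−2 log p_i, −2 log p_j) with the Kost–McDermott polynomial 3.263ρ + 0.710ρ² + 0.027ρ³, where ρ is the correlation between the underlying test statistics. Supplying, for example, the sample correlation of the K methylation sites as `corr` is our assumption. The helper names `gamma_sf` and `combine_pvalues_gamma` are hypothetical; `gamma_sf` is a self-contained stand-in for a library gamma survival function.

```python
import math

import numpy as np


def gamma_sf(x, shape, scale):
    """Upper tail P(G >= x) for G ~ Gamma(shape, scale), via the regularized
    incomplete gamma function (series / continued fraction, Numerical Recipes style)."""
    a, z = shape, x / scale
    if z <= 0.0:
        return 1.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); return Q = 1 - P
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, z)
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def combine_pvalues_gamma(pvals, corr):
    """Fisher statistic T = -2 sum log p_k with a moment-matched Gamma(a, b) null."""
    p = np.asarray(pvals, dtype=float)
    r = np.asarray(corr, dtype=float)
    k = p.size
    t_stat = -2.0 * np.sum(np.log(p))           # statistic (3)
    mean_t = 2.0 * k                             # each -2 log p_k has mean 2 under H0
    rho = r[np.triu_indices(k, 1)]
    # Kost-McDermott approximation to Cov(-2 log p_i, -2 log p_j) (Brown's method)
    cov = 3.263 * rho + 0.710 * rho**2 + 0.027 * rho**3
    var_t = 4.0 * k + 2.0 * np.sum(cov)          # each -2 log p_k has variance 4
    shape = mean_t**2 / var_t                    # a = mu^2 / sigma^2
    scale = var_t / mean_t                       # b = sigma^2 / mu
    return gamma_sf(float(t_stat), shape, scale)


# Hypothetical example: three sites with modest pairwise correlation
p_sites = [0.01, 0.04, 0.20]
corr_sites = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]])
p_alpha_gene = combine_pvalues_gamma(p_sites, corr_sites)
```

With an identity correlation matrix the result matches Fisher's χ²₂K test; with strongly correlated sites the effective number of independent tests shrinks and the combined P-value grows accordingly.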
Assess the β arm effect
For the β arm of the mediation model, we fit the third equation of model (2) and use a likelihood ratio (LR) test of $H_0: \beta_1 = \dots = \beta_K = 0$ to assess the joint linear effect of the K methylation sites, fitting a linear or logistic regression model depending on whether the disease trait is continuous or binary. We denote the LR test P-value by $P_\beta$.
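For a continuous outcome, the β-arm likelihood ratio test reduces to comparing the residual sums of squares of nested Gaussian linear models, with the statistic n·log(RSS₀/RSS₁) referred to a chi-square distribution with K degrees of freedom. The sketch below assumes a linear model (a binary outcome would need a logistic fit, not shown), omits covariates, and evaluates the chi-square tail in closed form for integer degrees of freedom; `lr_test_linear` and `chi2_sf` are hypothetical helper names and the simulated effect sizes are illustrative.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df (closed forms)."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{j < df/2} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + e^{-y} * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp((j - 0.5) * math.log(y) - y - math.lgamma(j + 0.5))
    return total


def lr_test_linear(y, x, M):
    """LR test of beta_1 = ... = beta_K = 0 in Y = c + gamma*X + sum_k beta_k M_k + e."""
    n = len(y)
    null_design = np.column_stack([np.ones(n), x])
    full_design = np.column_stack([null_design, M])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return resid @ resid

    # Gaussian log-likelihood ratio with the error variance profiled out
    lr = n * math.log(rss(null_design) / rss(full_design))
    return lr, chi2_sf(lr, M.shape[1])


# Hypothetical gene with K = 5 sites, two of which affect the outcome
rng = np.random.default_rng(5)
n = 300
x = rng.binomial(1, 0.5, size=n).astype(float)
M = rng.normal(size=(n, 5)) + 0.3 * x[:, None]
y = 0.2 * x + M @ np.array([0.4, 0.0, 0.0, 0.3, 0.0]) + rng.normal(size=n)
lr_stat, p_beta = lr_test_linear(y, x, M)
```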
Assess the total mediation effect
We then apply the intersection-union (joint significance) test and declare the mediating effect of a gene significant according to $P = \max(P_\alpha, P_\beta)$, where $P_\alpha$ and $P_\beta$ are the gene-level P-values of the α arm and the β arm, respectively. Multiple testing procedures, such as false discovery rate (FDR) control, can then be applied across the genome to declare significance. In the case studies below, we applied the method of Benjamini and Hochberg [42] (BH, also known as 'fdr') to obtain adjusted P-values.
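Combining the two arms and adjusting across genes takes only a few lines. The sketch below applies the intersection-union rule P = max(P_α, P_β) gene by gene and then the Benjamini–Hochberg step-up adjustment, which should agree with R's `p.adjust(..., method = "fdr")`; the helper names and the four genes' P-values are made up for illustration.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values (step-up, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest P-value down
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def gene_level_mediation(p_alpha, p_beta):
    """Intersection-union test per gene, then BH adjustment across genes."""
    p_gene = np.maximum(p_alpha, p_beta)
    return p_gene, bh_adjust(p_gene)


# Hypothetical gene-level P-values for four genes
p_alpha = np.array([0.001, 0.04, 0.03, 0.005])
p_beta = np.array([0.01, 0.002, 0.5, 0.001])
p_gene, q_gene = gene_level_mediation(p_alpha, p_beta)
```

A gene is reported only if both arms are significant, which is why the gene-level P-value is the larger of the two before any genome-wide adjustment.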
When a gene contains only a few methylation sites, the steps above can be applied directly. When the number of sites is large, as is typical in real applications, we first apply an SIS procedure following Step 1 of Zhang et al. [17] to reduce the number of mediators to $d = \lceil n/\log(n) \rceil$ (for a continuous outcome) or $d = \lceil n/(2\log(n)) \rceil$ (for a binary outcome), where n is the sample size. The steps above are then applied to the retained mediators. Because the effect of the mediators on the response Y is assumed to be linear, up to a link function that depends on the response distribution, we term this approach gene-based high-dimensional linear mediation analysis (gHMA-L).
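The screening step can be sketched with marginal correlations, as in the original SIS proposal of Fan and Lv. This is an assumption-laden sketch: the screening statistic used in HIMA and in our pipeline may differ (for example, by adjusting for the exposure and covariates), and the retained size d = ⌈n/log n⌉ (continuous outcome) or ⌈n/(2 log n)⌉ (binary outcome) follows the defaults of the HIMA R package as we understand them. The function name `sis_screen` and the simulated data are hypothetical.

```python
import math

import numpy as np


def sis_screen(M, y, binary_outcome=False):
    """Return sorted column indices of the d sites most correlated with y."""
    n, n_sites = M.shape
    # Retained dimension d, depending on the outcome type
    if binary_outcome:
        d = math.ceil(n / (2.0 * math.log(n)))
    else:
        d = math.ceil(n / math.log(n))
    d = min(d, n_sites)
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    # Marginal Pearson correlation between each site and the outcome
    corr = (Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc))
    keep = np.argsort(-np.abs(corr))[:d]
    return np.sort(keep)


# Hypothetical data: 500 sites, of which only sites 0-4 affect the outcome
rng = np.random.default_rng(7)
n, n_sites = 300, 500
M = rng.normal(size=(n, n_sites))
y = M[:, :5].sum(axis=1) + rng.normal(size=n)
kept = sis_screen(M, y)
```

For n = 300 the continuous-outcome rule keeps 53 sites and the binary-outcome rule keeps 27.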
Gene-based high-dimensional nonlinear mediation analysis
The joint effect of multiple mediators in a gene on disease risk is not necessarily linear. When the relationship between the mediators and the outcome is nonlinear, gHMA-L may not work well because it relies on the linearity assumption. KPCA has been shown to be a powerful tool for extracting nonlinear features, and we apply it to extract kernel principal components (KPCs) that represent the gene-level signal. KPCA is a nonlinear extension of linear PCA based on positive definite kernels [31, 32, 43]. Linear PCA captures linear variation in the data, and the resulting principal components are linear combinations of the original features. In contrast, KPCA is carried out in a feature space formed by mapping the original data (the input space) through a nonlinear map $\Phi: \mathbf{m} \mapsto \Phi(\mathbf{m})$, where M represents the original input data, $\mathbf{m}_i$ denotes the methylation vector of the ith subject, and $\Phi$ is implicitly defined by a kernel function $k(\mathbf{m}_i, \mathbf{m}_j) = \langle\Phi(\mathbf{m}_i), \Phi(\mathbf{m}_j)\rangle$. Many kernel functions can be used, such as the linear, Gaussian and polynomial kernels; Zhao et al. [33] suggested the Gaussian and linear kernels for methylation data. The Gaussian and linear kernels capture nonlinear and linear relationships in the input data, respectively, and KPCA with a linear kernel is equivalent to standard linear PCA, which captures only linear relationships. To capture the underlying nonlinear structure, we perform KPCA with the Gaussian kernel $k(\mathbf{m}_i, \mathbf{m}_j) = \exp(-\lVert\mathbf{m}_i - \mathbf{m}_j\rVert^2/\sigma)$ with bandwidth parameter $\sigma > 0$. In our analysis, we set $\sigma$ to the dimension of the input data, as recommended by Jalali-Heravi and Kyani [44]. Nonlinear structure in the input space can become linear in the feature space, where linear PCA is then performed; KPCA can thus be viewed as a nonlinear PCA method that captures nonlinear structure in the input data.
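In KPCA, assuming that the mapped data are centered in the feature space, i.e. $\sum_{i=1}^{n}\Phi(\mathbf{m}_i) = 0$ for the methylation vectors $\mathbf{m}_1, \dots, \mathbf{m}_n$ of the n subjects, we solve the eigenvalue problem $\lambda V = \bar{C}V$ with $\bar{C} = \frac{1}{n}\sum_{i=1}^{n}\Phi(\mathbf{m}_i)\Phi(\mathbf{m}_i)^{\top}$. This eigenvalue problem is equivalent to $\lambda\langle\Phi(\mathbf{m}_i), V\rangle = \langle\Phi(\mathbf{m}_i), \bar{C}V\rangle$ for all $i = 1, \dots, n$. All solutions V with $\lambda \neq 0$ lie in the span of $\Phi(\mathbf{m}_1), \dots, \Phi(\mathbf{m}_n)$, so there exist coefficients $a_1, \dots, a_n$ such that $V = \sum_{i=1}^{n} a_i\Phi(\mathbf{m}_i)$. Introducing the $n \times n$ kernel matrix $\mathbf{K}$ with entries $\mathbf{K}_{ij} = \langle\Phi(\mathbf{m}_i), \Phi(\mathbf{m}_j)\rangle = k(\mathbf{m}_i, \mathbf{m}_j)$ to replace the dot products in the feature space, the eigenvalue problem of KPCA can be rewritten as $n\lambda\mathbf{a} = \mathbf{K}\mathbf{a}$, where $\mathbf{a} = (a_1, \dots, a_n)^{\top}$. In practice, the centering assumption is enforced by replacing $\mathbf{K}$ with $\tilde{\mathbf{K}} = \mathbf{K} - \mathbf{1}_n\mathbf{K} - \mathbf{K}\mathbf{1}_n + \mathbf{1}_n\mathbf{K}\mathbf{1}_n$, where $\mathbf{1}_n$ is the $n \times n$ matrix with all entries equal to $1/n$. To obtain the KPCs of the original data, each sample is projected onto the eigenvectors. Writing the kth eigenvector as $V^{k} = \sum_{i=1}^{n} a_i^{k}\Phi(\mathbf{m}_i)$, normalized so that $\langle V^{k}, V^{k}\rangle = n\lambda_k\lVert\mathbf{a}^{k}\rVert^2 = 1$, the kth principal component of a sample $\mathbf{m}$ is
$$
\langle V^{k}, \Phi(\mathbf{m})\rangle = \sum_{i=1}^{n} a_i^{k}\, k(\mathbf{m}_i, \mathbf{m}).
$$
This kernel trick shows that, for a properly chosen kernel function, the KPCs can be obtained without knowing the mapping $\Phi$ explicitly.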
In our analysis, we use KPCs to capture potential nonlinear structure in the methylation signals of each gene. The number of KPCs, denoted by L, is determined by the proportion of variance explained: in both the simulation study and the real data analysis, we retain the leading KPCs that together explain at least 80% of the variance. We denote the kth KPC by $U_k$, $k = 1, \dots, L$. The gene-based nonlinear mediation analysis is then carried out by modifying model (2) to
$$
\begin{aligned}
Y &= c_1 + \gamma^{*} X + e_1,\\
U_k &= c_{2k} + \alpha_k X + e_{2k}, \qquad k = 1, \dots, L,\\
Y &= c_3 + \gamma X + \sum_{k=1}^{L} \beta_k U_k + e_3.
\end{aligned}
\tag{4}
$$
The analytical procedure described for gHMA-L is then applied with the methylation sites $M_k$ replaced by the KPCs $U_k$.
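The KPC extraction for one gene can be sketched in NumPy as below. It assumes the Gaussian kernel in the form exp(−‖m_i − m_j‖²/σ) with σ set to the number of sites, centers the kernel matrix, keeps the smallest number L of leading components that explain at least 80% of the variance, and returns the training-sample scores √μ_k·v_k (μ_k and v_k being the eigenvalues and unit eigenvectors of the centered kernel matrix), which equal the projections Σ_i a_i^k k(m_i, m) for the in-sample points. The function name `gaussian_kpca` is hypothetical; library implementations such as scikit-learn's `KernelPCA` parameterize the kernel width differently.

```python
import numpy as np


def gaussian_kpca(M, var_explained=0.80, bandwidth=None):
    """Kernel PCA scores for an n x K matrix M with k(u, v) = exp(-||u - v||^2 / bandwidth)."""
    n, n_sites = M.shape
    if bandwidth is None:
        bandwidth = float(n_sites)  # width set to the input dimension (assumed form)
    sq_norms = np.sum(M**2, axis=1)
    sq_dist = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * M @ M.T, 0.0)
    K = np.exp(-sq_dist / bandwidth)
    # Center the kernel matrix in feature space: H K H with H = I - 11^T / n
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)           # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    positive = eigvals > 1e-10 * eigvals[0]
    eigvals, eigvecs = eigvals[positive], eigvecs[:, positive]
    # Smallest L whose leading components explain >= var_explained of the variance
    cum = np.cumsum(eigvals) / np.sum(eigvals)
    L = int(np.searchsorted(cum, var_explained) + 1)
    # Training-sample scores: sqrt(eigenvalue) * unit eigenvector
    return eigvecs[:, :L] * np.sqrt(eigvals[:L])


# Hypothetical gene with 6 sites measured on 100 subjects
rng = np.random.default_rng(3)
M_gene = rng.normal(size=(100, 6))
U = gaussian_kpca(M_gene)   # columns play the role of U_1, ..., U_L in model (4)
```

The returned scores have zero column means and are mutually orthogonal, so they can be fed to the gHMA-L steps in place of the raw sites.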
An omnibus test of gene-based high-dimensional mediation analysis
In practice, the underlying relationship between the mediators and the outcome is generally unknown a priori. When the underlying relationship is nonlinear, a linear model may lose power, and vice versa. We therefore propose gHMA-O based on the ACAT of Liu et al. [37]. ACAT is a flexible and computationally efficient method for combining P-values: it behaves similarly to the minimum P-value approach but does not require permutation to obtain the null distribution of the minimum P-value, and it works well under various correlation structures. Let $p_{\mathrm{L}}$ and $p_{\mathrm{NL}}$ denote the gene-level P-values from gHMA-L and gHMA-NL, respectively. We first transform each P-value via $\tan\{(0.5 - p)\pi\}$, which follows a standard Cauchy distribution when p is uniform on (0, 1) under the null hypothesis. We then construct the test statistic $T_{\mathrm{O}} = \frac{1}{2}\tan\{(0.5 - p_{\mathrm{L}})\pi\} + \frac{1}{2}\tan\{(0.5 - p_{\mathrm{NL}})\pi\}$. The P-value of gHMA-O is approximated by $p_{\mathrm{O}} = \frac{1}{2} - \arctan(T_{\mathrm{O}})/\pi$.
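The ACAT combination is short enough to write directly. The sketch below uses equal weights of 1/2 for the two P-values, as in the statistic above, and a 1/(pπ) approximation for extremely small P-values to avoid floating-point loss; the function name `acat` and the input P-values are hypothetical.

```python
import math


def acat(pvals, weights=None):
    """Aggregated Cauchy association test (ACAT) for combining P-values."""
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)   # equal weights summing to 1
    stat = 0.0
    for p, w in zip(pvals, weights):
        if p < 1e-15:
            # tan((0.5 - p) * pi) is approximately 1 / (p * pi) for tiny p
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        return 1.0 / (stat * math.pi)   # Cauchy tail approximation for huge statistics
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical gHMA-L and gHMA-NL P-values for one gene
p_omnibus = acat([0.003, 0.21])
```

When both inputs are equal the combined P-value returns that same value, and a single very small input dominates the result, which is the minimum-P-like behavior described above.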
Simulation studies
To evaluate the performance of the proposed gene-based mediation methods, we carried out a series of simulation studies of the type I error and empirical power of gHMA-L, gHMA-NL and gHMA-O. To check type I error control, we considered three scenarios. In Scenario 1, the exposure X is independent of M, but M is associated with Y, i.e. $\alpha = 0$ and $\beta \neq 0$; in Scenario 2, X is related to M, but M is independent of Y, i.e. $\alpha \neq 0$ and $\beta = 0$; in Scenario 3, X is independent of M and M is independent of Y, i.e. $\alpha = \beta = 0$. To assess empirical power, we considered Scenario 4 with $\alpha \neq 0$ and $\beta \neq 0$.
We simulated gene-level methylation signals using
, where
and
. Here,
represents the sample size and we set
, which is the percentage of drinking in our first real data analysis. In our simulation study, we assumed that a gene has K methylation sites, and methylation signals within a gene are correlated. The methylation signals were simulated from a multivariate normal distribution with mean
and covariance matrix
. The intercept
is generated from
and the covariance matrix
follows an AR(1) structure with $\Sigma_{kl} = \rho^{|k-l|}$
. In the simulations, we set
. The response variable Y was simulated according to the model,
where
,
and
.
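$f(\mathbf{M})$ indicates the relationship between M and Y, where $f(\mathbf{M}) = w_1 f_1(\mathbf{M}) + w_2 f_2(\mathbf{M})$, with $f_1$ a linear and $f_2$ a nonlinear function of M. The weighting parameters $w_1$ and $w_2$ control the relationship between the mediators M and the outcome Y: when $w_1 \neq 0$ and $w_2 = 0$, $f(\mathbf{M})$ is linear; when $w_1 = 0$ and $w_2 \neq 0$, $f(\mathbf{M})$ is nonlinear; and when both are nonzero, $f(\mathbf{M})$ is a mix of linear and nonlinear functions. In the scenarios with $\alpha \neq 0$, the first 10 elements of $\alpha$ ($\alpha_k$, $k = 1, \dots, 10$) are set to nonzero values and the rest of $\alpha$ to 0. Analogously, the scenarios with $\beta \neq 0$ set a subset of the mediator effects on Y to nonzero values and the rest to 0. Different sample sizes (n = 300 and 600) and data dimensions (K = 500 and 1000) were considered. Scenarios 1–3 correspond to the null hypothesis of no mediation effect. In all scenarios, 1000 simulation runs were performed.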
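For readers who want to reproduce the general shape of the simulation design, the sketch below generates a binary exposure and K correlated methylation sites with an AR(1) covariance. Every numeric value here (exposure rate 0.4, ρ = 0.5, α_k = 0.5 for the first 10 sites, standard normal intercepts) is a hypothetical placeholder rather than the setting used in our simulations, the helper names are hypothetical, and the outcome model is not reproduced.

```python
import numpy as np


def ar1_cov(n_sites, rho):
    """AR(1) covariance matrix: Sigma[k, l] = rho ** |k - l|."""
    idx = np.arange(n_sites)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_methylation(n, n_sites, p_exposed, alpha, rho, rng):
    """Hypothetical generator: X ~ Bernoulli(p_exposed), M = c + alpha * X + e, e ~ MVN(0, Sigma)."""
    x = rng.binomial(1, p_exposed, size=n).astype(float)
    intercept = rng.normal(size=n_sites)                 # placeholder intercept c
    errors = rng.multivariate_normal(np.zeros(n_sites), ar1_cov(n_sites, rho), size=n)
    M = intercept + np.outer(x, alpha) + errors
    return x, M


rng = np.random.default_rng(11)
n, n_sites = 300, 500
alpha = np.zeros(n_sites)
alpha[:10] = 0.5     # first 10 sites respond to X (illustrative value)
x, M = simulate_methylation(n, n_sites, p_exposed=0.4, alpha=alpha, rho=0.5, rng=rng)
```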
Table 1 shows the empirical type I error rates. In all cases, the type I error is reasonably controlled at the 0.05 significance level. In Scenario 1, although gHMA-NL is slightly conservative, gHMA-O controls the size well. In Scenario 2, the type I error of gHMA-L is slightly inflated, while gHMA-NL is conservative; the omnibus test gHMA-O nevertheless shows reasonable type I error control. In Scenario 3, gHMA-NL is slightly conservative, but again gHMA-O robustly controls the type I error. Overall, the omnibus test gHMA-O, which combines gHMA-L and gHMA-NL, controls the type I error well under different settings.
Table 1.
Simulation results for the type I error control under the three scenarios
| Scenario | N | K | M–Y relationship (0: no effect) | gHMA-L | gHMA-NL | gHMA-O |
|---|---|---|---|---|---|---|
| 1 | 300 | 500 | Linear | 0.053 | 0.038 | 0.044 |
| 1 | 300 | 1000 | Linear | 0.057 | 0.036 | 0.053 |
| 1 | 600 | 500 | Linear | 0.046 | 0.043 | 0.051 |
| 1 | 600 | 1000 | Linear | 0.062 | 0.034 | 0.046 |
| 1 | 300 | 500 | Nonlinear | 0.055 | 0.041 | 0.054 |
| 1 | 300 | 1000 | Nonlinear | 0.048 | 0.025 | 0.049 |
| 1 | 600 | 500 | Nonlinear | 0.061 | 0.035 | 0.054 |
| 1 | 600 | 1000 | Nonlinear | 0.051 | 0.041 | 0.045 |
| 1 | 300 | 500 | Mixed | 0.059 | 0.040 | 0.050 |
| 1 | 300 | 1000 | Mixed | 0.048 | 0.041 | 0.047 |
| 1 | 600 | 500 | Mixed | 0.056 | 0.038 | 0.054 |
| 1 | 600 | 1000 | Mixed | 0.053 | 0.033 | 0.046 |
| 2 | 300 | 500 | 0 | 0.064 | 0.047 | 0.056 |
| 2 | 300 | 1000 | 0 | 0.044 | 0.042 | 0.044 |
| 2 | 600 | 500 | 0 | 0.064 | 0.042 | 0.053 |
| 2 | 600 | 1000 | 0 | 0.051 | 0.042 | 0.053 |
| 3 | 300 | 500 | 0 | 0.046 | 0.034 | 0.042 |
| 3 | 300 | 1000 | 0 | 0.057 | 0.037 | 0.049 |
| 3 | 600 | 500 | 0 | 0.056 | 0.048 | 0.053 |
| 3 | 600 | 1000 | 0 | 0.049 | 0.032 | 0.043 |
Figure 3 shows the empirical power under Scenario 4. Power increases with the sample size and decreases as the data dimension increases, since the additional dimensions essentially add noise. When the true relationship is linear, gHMA-L outperforms gHMA-NL; when it is nonlinear, gHMA-NL outperforms gHMA-L. This shows that optimal power is achieved only when the true underlying function is properly captured. Notably, in all cases the power of gHMA-O is close to that of the better of the two methods.
Figure 3.

Empirical power of gHMA-O, gHMA-NL and gHMA-L under Scenario 4.
In summary, across all simulations, gHMA-L works well under linear effects and gHMA-NL works well under nonlinear effects, while the omnibus test gHMA-O performs nearly as well as the better of the two. This indicates the robustness of the proposed omnibus test; it is therefore generally safe to apply gHMA-O in real applications, regardless of the underlying true functional relationship.
Case studies
In this section, we demonstrate the utility of the proposed gene-based mediation analysis methods by applying them to two DNA methylation (DNAm) datasets: one from the Mayo Clinic Ovarian Cancer Case-Control Study (EOC data) and the other from the GTP.
Application to the EOC data
The Mayo Clinic Ovarian Cancer Case-Control Study data contain general information (such as age at enrollment and smoking status), genome-wide methylation data and EOC status. Ovarian cancer is the most lethal gynecologic cancer and the fifth leading cause of cancer death among American women, posing a serious threat to women's health [45–47]. Given that genetic and environmental factors play important roles in the development of EOC, we focus on identifying important epigenetic markers that mediate the causal pathway from alcohol consumption to EOC risk.
Drinking status was measured by a questionnaire item on current consumption of alcoholic beverages (yes or no). Leukocyte-derived DNA methylation levels were obtained with the Illumina Infinium HumanMethylation27 BeadChip array and expressed as M values (logit-transformed beta values) for statistical analysis, after correcting for batch effects and blood cell subtype [48]. Details about the data can be found in [22, 47]. After excluding subjects with missing drinking status, a total of 398 women of European ancestry who participated in the study between 1999 and 2007 (196 EOC cases and 202 controls) remained for analysis. Our analysis also adjusts for age, current smoking status, study enrollment year, location of residence (MN versus other), the first principal component representing within-European population substructure and leukocyte subtype proportions. Before applying the proposed approaches, we mapped DNAm sites to genes and formed CpG sets by grouping CpGs located in the same gene. This yielded 14 157 gene-level CpG sets comprising 25 917 DNAm sites; 3414 genes contain a single CpG site and 10 743 genes contain at least two.
We applied gene-level mediation analysis with the proposed gHMA-O method to identify genes that mediate the effect of drinking on EOC risk. We extracted Gaussian-kernel KPCs that account for at least 80% of the total variation; only genes containing at least two CpGs were used in the gHMA-NL analysis. The gHMA-O results are shown in Table 2. After false discovery rate (FDR) correction, no gene was significant at the 0.05 FDR level, so we list the top six genes as suggestive findings. The weak signal might be due to the low number of methylation sites in this dataset.
Table 2.
Top six genes identified by gHMA-O that mediate the drinking risk for EOC
| Gene name | No. of CpG sites | Chr | P-value |
|---|---|---|---|
| ZFYVE19 | 2 | 15 | 0.0038 |
| KRAS | 6 | 12 | 0.0085 |
| FAM167B | 1 | 1 | 0.0102 |
| MAN2C1 | 2 | 15 | 0.0141 |
| RASSF4 | 2 | 10 | 0.0148 |
| TFPT | 2 | 19 | 0.0159 |
As shown in Table 2, gHMA-O identified ZFYVE19, KRAS, FAM167B, MAN2C1, RASSF4 and TFPT as the top six mediating genes. The genes listed in Table 2 have been reported to be associated with immune function and cancer, and three of them, KRAS, MAN2C1 and RASSF4, have been implicated in ovarian cancer. KRAS is a well-studied oncogene encoding a member of the small GTPase superfamily and is implicated in a variety of cancers. Several mutations in this gene have been related to ovarian carcinomas [49, 50], and Quitadamo et al. [51] linked KRAS to ovarian cancer by constructing an integrated network of miRNA and gene expression. MAN2C1 (mannosidase alpha class 2C member 1) encodes a cytosolic α-mannosidase involved in glycan catabolism and is widely expressed in fat and ovary. MAN2C1 reportedly plays a role in regulating apoptotic signaling, and its upregulation can promote tumorigenesis in various cancer types [52]. It has been identified as a PTEN-binding protein [53] that attenuates the function of PTEN, a potent tumor suppressor gene, by binding to it. Various studies have shown that PTEN downregulation induced by promoter methylation is associated with ovarian cancer [54]. RASSF4 (Ras association domain family member 4) has been identified as a tumor suppressor gene involved in apoptotic signaling in synergy with KRAS. Downregulation of RASSF4 in many human tumors has been reported to correlate with methylation of its promoter region [55]. This gene has been shown to belong to an ovarian cancer module regulated by miR-185 [56], and its expression was at least 2-fold higher in ovarian tumors than in normal ovary [57].
Application to the GTP data
The second data set was obtained from the GTP, a cross-sectional study dedicated to investigating the role of environmental and genetic risk factors in the development of stress-related psychiatric disorders [e.g. post-traumatic stress disorder (PTSD), depressive disorder and anxiety disorder], which confer a heavy burden on public health [58, 59]. Childhood trauma exposures have been reported to be related to PTSD and major depressive disorder in adulthood [59, 60], and previous studies have also associated them with changes in gene expression and DNAm [58, 61]. Existing studies have demonstrated that changes in blood-derived DNA methylation profiles are associated with PTSD and depression [58, 62, 63], which are often comorbid [64]. However, little is known about which epigenetic markers mediate the causal pathway from childhood maltreatment to comorbid current PTSD and depression symptoms in adulthood.
In this study, childhood maltreatment was measured using the Childhood Trauma Questionnaire [65]. The exposure of interest was childhood sexual/physical abuse, dichotomized using the 'moderate to extreme' criterion. Current PTSD and depression symptoms were assessed with the modified PTSD Symptom Scale (PSS) [66] and the Beck Depression Inventory (BDI) [67], respectively. In our analysis, cases with comorbid current PTSD and depression symptoms were defined as having a PSS score ≥ 14 and a BDI score ≥ 14, and controls were defined as trauma-exposed subjects with no current PTSD or depressive symptoms, i.e. a PSS score ≤ 7 and a BDI score ≤ 7 [68]. DNA methylation levels were measured using the Illumina Infinium HumanMethylation450 BeadChip array, and the β-values were logit-transformed into M-values for statistical analysis. The methylation data are available from the Gene Expression Omnibus (GEO) under accession number GSE72680. More details about the GTP data can be found in [59, 69]. After excluding non-African American subjects to minimize heterogeneity, 112 cases with comorbid current PTSD and depression symptoms and 63 controls remained for further analysis. We further adjusted for age, sex, body mass index and blood cell type proportions in our analysis.
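The subject selection rule and the β-to-M conversion described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, subjects meeting neither the case nor the control definition are simply treated as excluded, trauma exposure of controls is assumed to have been checked upstream, and clipping β to [ε, 1 − ε] before the transform is an assumption added here to avoid taking the logarithm of zero. The conversion itself is the base-2 logit, M = log2(β/(1 − β)) [48].

```python
import math

CASE_MIN = 14     # cases: PSS >= 14 and BDI >= 14
CONTROL_MAX = 7   # trauma-exposed controls: PSS <= 7 and BDI <= 7

def classify_subject(pss, bdi):
    """Return 'case', 'control', or None (excluded) from PSS and BDI scores."""
    if pss >= CASE_MIN and bdi >= CASE_MIN:
        return "case"
    if pss <= CONTROL_MAX and bdi <= CONTROL_MAX:
        return "control"
    return None  # intermediate symptom levels are not analysed in this sketch

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value to an M-value, log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # clipping is an assumption added here
    return math.log2(b / (1.0 - b))

print(classify_subject(20, 16))  # case
print(classify_subject(3, 5))    # control
print(classify_subject(10, 3))   # None
print(beta_to_m(0.5))            # 0.0
```

M-values are unbounded and closer to homoscedastic than β-values, which is why they are usually preferred as inputs to regression-based mediation models.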
We first mapped DNAm sites to genes and formed CpG sets by grouping CpGs located within the same gene, which yielded 19 127 gene-based CpG sets comprising 249 527 CpG sites. We then applied the same procedure as in the first application; the results are shown in Table 3. Again, no methylation gene was statistically significant at the 0.05 FDR level after FDR adjustment, so we list only the top 10 genes as suggestive findings.
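The CpG grouping and the FDR screening step can be sketched as below. This is an illustrative sketch only: the CpG-to-gene annotation passed in is hypothetical, a CpG annotated to several genes is allowed to enter each of their sets (an assumption), and we assume the FDR adjustment is the Benjamini–Hochberg procedure [42]. The gene-level P-values themselves would come from gHMA-O and are not computed here.

```python
from collections import defaultdict

def group_cpgs_by_gene(cpg_to_genes):
    """Build gene-based CpG sets: gene -> sorted list of CpG IDs annotated to it.

    `cpg_to_genes` is a hypothetical annotation mapping each CpG ID to the
    gene(s) it lies in; CpGs without a gene annotation map to an empty list.
    """
    gene_sets = defaultdict(list)
    for cpg, genes in cpg_to_genes.items():
        for gene in genes:
            gene_sets[gene].append(cpg)
    return {gene: sorted(cpgs) for gene, cpgs in gene_sets.items()}

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def summarize(gene_pvalues, fdr_level=0.05, k=10):
    """Return (genes significant at fdr_level, top-k genes ranked by raw P-value)."""
    genes = list(gene_pvalues)
    qvals = bh_adjust([gene_pvalues[g] for g in genes])
    significant = [g for g, q in zip(genes, qvals) if q <= fdr_level]
    top = sorted(genes, key=lambda g: gene_pvalues[g])[:k]
    return significant, top

# Toy input, not study data
sets = group_cpgs_by_gene({"cg1": ["GENE_A"], "cg2": ["GENE_A", "GENE_B"], "cg3": ["GENE_B"]})
print(sets)  # {'GENE_A': ['cg1', 'cg2'], 'GENE_B': ['cg2', 'cg3']}
```

When `significant` comes back empty, as in both applications here, the ranked `top` list is what gets reported as suggestive.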
Table 3.
List of top 10 genes identified by gHMA-O that mediate the pathway from childhood maltreatment to PTSD and depression
| Gene name | No. of CpG sites | Chromosome | P-value |
|---|---|---|---|
| GPX6 | 24 | 6 | 0.0002 |
| KHDRBS2 | 14 | 6 | 0.0002 |
| WDR1 | 24 | 4 | 0.0004 |
| MLIP | 11 | 6 | 0.0004 |
| RAB32 | 19 | 6 | 0.0005 |
| FAAP20 | 42 | 1 | 0.0007 |
| PRDM16 | 475 | 1 | 0.0008 |
| PPM1L | 30 | 3 | 0.0009 |
| FTLP10 | 7 | 4 | 0.0012 |
| P3H2 | 24 | 3 | 0.0014 |
As shown in Table 3, the proposed gHMA-O method identified 10 genes, GPX6, KHDRBS2, WDR1, MLIP, RAB32, FAAP20, PRDM16, PPM1L, FTLP10 and P3H2, as potential epigenetic markers mediating the effect of childhood sexual/physical abuse on comorbid PTSD and depression in adulthood. Most of these genes have been reported to be associated with immune function, inflammation and neoplasms, and four of them, GPX6, WDR1, MLIP and RAB32, have been implicated in psychiatric disorders including schizophrenia, autism spectrum disorders, bipolar disorder and major depression. GPX6 (glutathione peroxidase 6) encodes a member of the glutathione peroxidase family, which plays an important role in the glutathione metabolism pathway. This gene has been identified as a significant pleiotropic gene for psychiatric disorders [70], and its methylation level has been related to the treatment of PTSD [71]. Variants in GPX6 have also been associated with depression in a genome-wide association study in the UK Biobank [72]. WDR1 (WD repeat domain 1) encodes a protein containing 9 WD repeats, which is involved in protein–protein interactions. This gene has been reported to be significantly associated with age at onset of schizophrenia in a genome-wide association analysis of a European–American sample [73]. MLIP (muscular LMNA interacting protein), also known as CIP and C6orf142, encodes the cardiac ISL1-interacting protein, which is involved in the negative regulation of cardiac muscle hypertrophy. This gene has been identified as differentially expressed between psychotic and wild-type mice by RNA sequencing [74]. RAB32, a member of the RAS oncogene family, encodes an A-kinase-anchoring protein involved in endosome-to-melanosome transport, phagosome maturation, and antigen processing and presentation. RAB32 has been shown to be differentially expressed in the superior temporal gyrus in schizophrenia [75]. Again, these top methylation genes should be interpreted only as suggestive findings. The lack of FDR-significant results may be partly due to the small sample size of this study (112 cases and 63 controls).
Conclusion and discussion
In this article, we first reviewed the existing literature on high-dimensional mediation analysis and then proposed a gene-based omnibus high-dimensional mediation analysis method for identifying potential epigenetic markers that mediate the causal effect of an exposure on an outcome at the gene level. The method integrates kernel PCA and P-value combination techniques to account for both linear and nonlinear mediation effects. In practice, the underlying relationship between mediators and outcome is generally unknown. The omnibus test gHMA-O combines the strengths of gHMA-L, which assumes a linear relationship, and gHMA-NL, which assumes a nonlinear relationship; it is therefore robust against misspecification of the underlying analytical model. Extensive simulation studies show that the proposed omnibus test is generally safe to apply regardless of the underlying relationship between mediators and the outcome. Unlike traditional genomic mediation analysis, which tests individual markers, the proposed methods treat the methylation signals of the CpG sites within a gene as a single testing unit. Biologically, gene-based mediation analysis is thus more attractive than single marker-based analysis. The same idea can be extended to pathway-based mediation analysis.
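To illustrate the P-value combination step, the sketch below uses the Cauchy combination test of ACAT [37], one standard choice; it is assumed here for illustration and is not necessarily the exact rule implemented in gHMA-O. With weights $w_i$ summing to one, the statistic is $T = \sum_i w_i \tan\{(0.5 - p_i)\pi\}$ and the combined P-value is $0.5 - \arctan(T)/\pi$.

```python
import math

def cauchy_combine(pvalues, weights=None):
    """Combine P-values with the Cauchy combination (ACAT-style) test.

    Weights default to equal and are normalized to sum to one.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in (0, 1]")
        w = w / total
        if p < 1e-16:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        # Tail approximation of the standard Cauchy survival function
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical gene-level P-values from the linear and nonlinear tests
p_linear, p_nonlinear = 0.001, 0.5
p_omnibus = cauchy_combine([p_linear, p_nonlinear])
```

A Cauchy-based combination is attractive in this setting because its tail is approximately insensitive to dependence among the combined tests, and the linear and nonlinear tests are computed on the same data and are therefore correlated.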
It is broadly recognized that aberrant DNA methylation can lead to abnormal gene expression, which can contribute to the onset and progression of many diseases. We applied the proposed methods to the EOC data to investigate genes whose methylation levels may mediate the relationship between alcohol consumption and EOC risk, and to the GTP data to detect methylation genes that may mediate the effect of childhood maltreatment on comorbid PTSD and depression symptoms in adulthood. Although several methylation genes were suggested as potential mediators, the causal relationships among the exposures (alcohol consumption and childhood maltreatment), DNA methylation and the outcomes (EOC status and comorbid PTSD and depression symptoms in adulthood) need to be investigated and validated by further biological experiments.
Our proposed method can be extended in several directions in future research. For example, only one exposure is considered in this work, whereas in reality DNA methylation can be regulated by many exposures simultaneously. Multiple-exposure models could be developed by adopting a two-stage regularization method. In addition, we performed KPCA at the gene level using a single Gaussian kernel function to capture the complex relationship among the methylation features. To make the method more powerful and flexible, we could adopt the composite kernel idea proposed by Yang et al. [76], which uses a linear combination of a set of candidate kernels, each with its own strength in capturing the underlying relationship. This will be investigated in our future work. R code implementing the proposed methods can be downloaded at https://github.com/RuilingFang/gHMA.
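Gene-level KPCA with a Gaussian kernel, together with the composite-kernel extension, can be sketched in NumPy as follows. The bandwidth parameterization exp(−‖x − y‖²/(2σ²)), the fixed number of components, the toy data and all function names are assumptions for illustration; they are not taken from the gHMA code. The composite kernel is a nonnegative weighted sum of candidate kernel matrices, which keeps it positive semidefinite.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (subjects x CpG sites)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma**2))

def linear_kernel(X):
    return X @ X.T

def composite_kernel(kernels, weights):
    """Linear combination of candidate kernels; nonnegative weights keep it PSD."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_pca_scores(K, n_components):
    """Scores of the training subjects on the leading kernel principal components."""
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    Kc = J @ K @ J                             # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)
    # Projection of subject i on component k equals sqrt(lambda_k) * u_ik
    return eigvecs[:, idx] * np.sqrt(lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # toy data: 50 subjects, 6 CpG sites (M-values)
K = composite_kernel([gaussian_kernel(X, sigma=2.0), linear_kernel(X)], [0.7, 0.3])
scores = kernel_pca_scores(K, n_components=3)
print(scores.shape)  # (50, 3)
```

Setting the weights to `[1.0, 0.0]` recovers the single Gaussian-kernel case; the resulting scores are low-dimensional, uncorrelated summaries of a gene's CpG sites that can stand in for the raw CpGs as mediators.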
Key Points
We focused on gene-based high-dimensional mediation analysis by treating the methylation signals of CpG sites within a gene as a testing unit. Gene-based mediation analysis is biologically more attractive and meaningful than single marker-based analysis.
We proposed two gene-based methods: one assuming a linear relationship between mediators and an outcome (gHMA-L) and one assuming a nonlinear relationship (gHMA-NL).
An omnibus test of gene-based high-dimensional mediation analysis (gHMA-O) is proposed to integrate gHMA-L and gHMA-NL. gHMA-O is robust against misspecification of the underlying analytical model.
Application to two data sets identified promising gene-level methylation signals acting as potential mediators between an exposure and a disease outcome.
The proposed method can be extended to other high-dimensional mediation studies that focus on evaluating the effect of a gene set or a pathway.
Ruiling Fang is a PhD candidate at Shanxi Medical University. Her research is focused on developing statistical methods for mediation analysis with applications to epigenetic studies.
Haitao Yang is an associate professor in the School of Public Health at Hebei Medical University. His research interest is to develop and apply novel statistical methods to solve genetics and genomics questions related to complex diseases.
Yuzhao Gao is a PhD candidate at Shanxi Medical University. His research is focused on developing statistical methods for mediation analysis with applications to epigenetic studies.
Hongyan Cao is an associate professor at Shanxi Medical University. Her research is focused on developing statistical methods on genetic/genomic data analysis.
Ellen L. Goode is a professor of epidemiology at Mayo Clinic in Rochester, MN. Her research is focused on ovarian cancer genetics.
Yuehua Cui is a professor in the Department of Statistics and Probability at Michigan State University. His research focuses on developing novel statistical and computational methods to disentangle the genetic basis of complex traits by integrating various tools and sources in genetics, (epi)genomics and statistics.
References
- 1. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51:1173–82.
- 2. MacKinnon D. Introduction to Statistical Mediation Analysis. New York, Routledge, 2012.
- 3. Pearl J. Direct and indirect effects. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc, San Francisco, 2001, 411–20.
- 4. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods 2010;15:309–34.
- 5. Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci 2010;25:51–71.
- 6. Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 2010;172:1339–48.
- 7. Zhang Z, Wang L. Methods for mediation analysis with missing data. Psychometrika 2013;78:154–84.
- 8. Daniel RM, De Stavola BL, Cousens SN, et al. Causal mediation analysis with multiple mediators. Biometrics 2015;71:1–14.
- 9. Boca SM, Sinha R, Cross AJ, et al. Testing multiple biological mediators simultaneously. Bioinformatics 2014;30:214–20.
- 10. Preacher KJ, Hayes AF. Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav Res Methods 2008;40:879–91.
- 11. Pearl J. The causal mediation formula--a guide to the assessment of pathways and mechanisms. Prev Sci 2012;13:426–36.
- 12. Yuan Y, Mackinnon DP. Robust mediation analysis based on median regression. Psychol Methods 2014;19:1–20.
- 13. Wang W, Nelson S, Albert JM. Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula. Stat Med 2013;32:4211–28.
- 14. Shrout PE, Bolger N. Mediation in experimental and nonexperimental studies: new procedures and recommendations. Psychol Methods 2002;7:422–45.
- 15. Wang J, Spitz MR, Amos CI, et al. Mediating effects of smoking and chronic obstructive pulmonary disease on the relation between the CHRNA5-A3 genetic locus and lung cancer risk. Cancer 2010;116:3458–62.
- 16. Wang J, Spitz MR, Amos CI, et al. Method for evaluating multiple mediators: mediating effects of smoking and COPD on the association between the CHRNA5-A3 variant and lung cancer risk. PLoS One 2012;7:e47705.
- 17. Zhang H, Zheng Y, Zhang Z, et al. Estimating and testing high-dimensional mediation effects in epigenetic studies. Bioinformatics 2016;32:3150–4.
- 18. Huang YT, Pan WC. Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics 2016;72:402–13.
- 19. Chen OY, Crainiceanu C, Ogburn EL, et al. High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics 2018;19:121–36.
- 20. Zhao Y, Lindquist MA, Caffo BS. Sparse principal component based high-dimensional mediation analysis. Comput Stat Data Anal 2020;142:106835.
- 21. Barfield R, Shen J, Just AC, et al. Testing for the indirect effect under the null for genome-wide mediation analyses. Genet Epidemiol 2017;41:824–33.
- 22. Wu D, Yang H, Winham SJ, et al. Mediation analysis of alcohol consumption, DNA methylation, and epithelial ovarian cancer. J Hum Genet 2018;63:339–48.
- 23. Gao Y, Yang H, Fang R, et al. Testing mediation effects in high-dimensional epigenetic studies. Front Genet 2019;10:1195.
- 24. Bind M-A, Lepeule J, Zanobetti A, et al. Air pollution and gene-specific methylation in the normative aging study. Epigenetics 2014;9:448–58.
- 25. Timms JA, Relton CL, Rankin J, et al. DNA methylation as a potential mediator of environmental risks in the development of childhood acute lymphoblastic leukemia. Epigenomics 2016;8:519–36.
- 26. Barker ED, Walton E, Cecil CA. Annual research review: DNA methylation as a mediator in the association between risk exposure and child and adolescent psychopathology. J Child Psychol Psychiatry 2018;59:303–22.
- 27. Huang JV, Cardenas A, Colicino E, et al. DNA methylation in blood as a mediator of the association of mid-childhood body mass index with cardio-metabolic risk score in early adolescence. Epigenetics 2018;13:1072–87.
- 28. Bozack AK, Cardenas A, Quamruzzaman Q, et al. DNA methylation in cord blood as mediator of the association between prenatal arsenic exposure and gestational age. Epigenetics 2018;13:923–40.
- 29. Bozack AK, Cardenas A, Geldhof J, et al. Cord blood DNA methylation of DNMT3A mediates the association between in utero arsenic exposure and birth outcomes: results from a prospective birth cohort in Bangladesh. Environ Res 2020;183:109134.
- 30. Jordahl KM, Phipps AI, Randolph TW, et al. Differential DNA methylation in blood as a mediator of the association between cigarette smoking and bladder cancer risk among postmenopausal women. Epigenetics 2019;14:1065–73.
- 31. Schölkopf B, Smola A, Müller K-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998;10:1299–319.
- 32. Smola AJ, Schölkopf B. Learning with Kernels. Department of Computer Science. Berlin: Technical University, 1998.
- 33. Zhao N, Zhan X, Huang YT, et al. Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies. Genet Epidemiol 2017;42:156.
- 34. Liu Z, Chen D, Bensmail H. Gene expression data classification with kernel principal component analysis. Biomed Res Int 2005;2005:155–9.
- 35. Gao Q, He Y, Yuan Z, et al. Gene-or region-based association study via kernel principal component analysis. BMC Genet 2011;12:75.
- 36. Mariette J, Villa-Vialaneix N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics 2017;34:1009–15.
- 37. Liu Y, Chen S, Li Z, et al. ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am J Hum Genet 2019;104:410–21.
- 38. Huang B, Sivaganesan S, Succop P, et al. Statistical assessment of mediational effects for logistic mediational models. Stat Med 2004;23:2713–28.
- 39. Schluchter MD. Flexible approaches to computing mediated effects in generalized linear models: generalized estimating equations and bootstrapping. Multivar Behav Res 2008;43:268–88.
- 40. Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Series B Stat Methodology 2008;70:849–911.
- 41. Yang JJ, Li J, Williams LK, et al. An efficient genome-wide association test for multivariate phenotypes based on the fisher combination function. BMC Bioinformatics 2016;17:19.
- 42. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol 1995;57:289–300.
- 43. Wu W, Massart D, De Jong S. The kernel PCA algorithms for wide data. Part I: theory and algorithms. Chemometr Intell Lab Syst 1997;36:165–72.
- 44. Jalali-Heravi M, Kyani A. Application of genetic algorithm-kernel partial least square as a novel nonlinear feature selection method: activity of carbonic anhydrase II inhibitors. Eur J Med Chem 2007;42:649–59.
- 45. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin 2018;68:7–30.
- 46. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474:609–15.
- 47. Koestler DC, Chalise P, Cicek MS, et al. Integrative genomic analysis identifies epigenetic marks that mediate genetic risk for epithelial ovarian cancer. BMC Med Genomics 2014;7:8.
- 48. Du P, Zhang X, Huang CC, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010;11:587.
- 49. Jones S, Wang T-L, Shih I-M, et al. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science 2010;330:228–31.
- 50. Bast RC, Jr, Hennessy B, Mills GB. The biology of ovarian cancer: new opportunities for translation. Nat Rev Cancer 2009;9:415.
- 51. Quitadamo A, Tian L, Hall B, et al. An integrated network of microRNA and gene expression in ovarian cancer. BMC Bioinformatics 2015;S5.
- 52. Wang L, Suzuki T. Dual functions for cytosolic α-mannosidase (Man2C1) its down-regulation causes mitochondria-dependent apoptosis independently of its α-mannosidase activity. J Biol Chem 2013;288:11887–96.
- 53. Hopkins BD, Hodakoski C, Barrows D, et al. PTEN function: the long and the short of it. Trends Biochem Sci 2014;39:183–90.
- 54. Bermúdez Brito M, Goulielmaki E, Papakonstanti EA. Focus on PTEN regulation. Front Oncol 2015;5:166.
- 55. van der Weyden L, Adams DJ. The Ras-association domain family (RASSF) members and their role in human tumourigenesis. Biochim Biophys Acta 2007;1776:58–85.
- 56. Jin D, Lee H. A computational approach to identifying gene-microRNA modules in cancer. PLoS Comput Biol 2015;56:e1004042.
- 57. Fedorowicz G, Guerrero S, Wu TD, et al. Microarray analysis of RNA extracted from formalin-fixed, paraffin-embedded and matched fresh-frozen ovarian adenocarcinomas. BMC Med Genomics 2009;2:23.
- 58. Uddin M, Ratanatharathorn A, Armstrong D, et al. Epigenetic meta-analysis across three civilian cohorts identifies NRG1 and HGS as blood-based biomarkers for post-traumatic stress disorder. Epigenomics 2018;10:1585–601.
- 59. Gillespie CF, Bradley B, Mercer K, et al. Trauma exposure and stress-related disorders in inner city primary care patients. Gen Hosp Psychiatry 2009;31:505–14.
- 60. Lowe SR, Quinn JW, Richards CA, et al. Childhood trauma and neighborhood-level crime interact in predicting adult posttraumatic stress and major depression symptoms. 2016;51:212–22.
- 61. Mehta D, Klengel T, Conneely KN, et al. Childhood maltreatment is associated with distinct genomic and epigenetic profiles in posttraumatic stress disorder. Proc Natl Acad Sci 2013;110:8302–7.
- 62. Uddin M, Aiello AE, Wildman DE, et al. Epigenetic and immune function profiles associated with posttraumatic stress disorder. Proc Natl Acad Sci 2010;107:9470–5.
- 63. Kuan P, Waszczuk M, Kotov R, et al. An epigenome-wide DNA methylation study of PTSD and depression in world trade center responders. Transl Psychiatry 2017;7:e1158.
- 64. Caramanica K, Brackbill RM, Liao T, et al. Comorbidity of 9/11-related PTSD and depression in the world trade center health registry 10–11 years postdisaster. J Trauma Stress 2014;27:680–8.
- 65. Bernstein DP, Ahluvalia T, Pogge D, et al. Validity of the childhood trauma questionnaire in an adolescent psychiatric population. J Am Acad Child Adolesc Psychiatry 1997;36:340–8.
- 66. Foa EB, Riggs DS, Dancu CV, et al. Reliability and validity of a brief instrument for assessing post-traumatic stress disorder. J Trauma Stress 1993;6:459–73.
- 67. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561–71.
- 68. Wingo AP, Velasco ER, Florido A, et al. Expression of the PPM1F gene is regulated by stress and associated with anxiety and depression. Biol Psychiatry 2018;83:284–95.
- 69. Binder EB, Bradley RG, Liu W, et al. Association of FKBP5 polymorphisms and childhood abuse with risk of posttraumatic stress disorder symptoms in adults. JAMA 2008;299:1291–305.
- 70. Jia X, Yang Y, Chen Y, et al. Multivariate analysis of genome-wide data to identify potential pleiotropic genes for five major psychiatric disorders using MetaCCA. J Affect Disord 2019;242:234–43.
- 71. Vinkers CH, Geuze E, van Rooij SJH, et al. Successful treatment of post-traumatic stress disorder reverses DNA methylation marks. Mol Psychiatry 2019. doi: 10.1038/s41380-019-0549-3.
- 72. Howard DM, Adams MJ, Shirali M, et al. Genome-wide association study of depression phenotypes in UK biobank identifies variants in excitatory synaptic pathways. Nat Commun 2018;9:1470.
- 73. Wang K-S, Liu X, Zhang Q, et al. Genome-wide association analysis of age at onset in schizophrenia in a European-American sample. Am J Med Genet B Neuropsychiatr Genet 2011;156B:671–80.
- 74. Hegde S, Ji H, Oliver D, et al. PDE11A regulates social behaviors and is a key mechanism by which social experience sculpts the brain. Neuroscience 2016;335:151–69.
- 75. Bowden NA, Scott RJ, Tooney PA. Altered gene expression in the superior temporal gyrus in schizophrenia. BMC Genomics 2008;9:199.
- 76. Yang H, Li S, Cao H, et al. Predicting disease trait with genomic data: a composite kernel approach. Brief Bioinformatics 2016;18:bbw043.