Author manuscript; available in PMC: 2023 Aug 1.
Published in final edited form as: Stat Med. 2016 Feb 16;35(16):2831–2844. doi: 10.1002/sim.6900

Integrative genomic testing of cancer survival using semiparametric linear transformation models

Yen-Tsung Huang *, Tianxi Cai , Eunhee Kim
PMCID: PMC10392002  NIHMSID: NIHMS1911466  PMID: 26887583

Abstract

The wide availability of multi-dimensional genomic data has spurred increasing interest in integrating multi-platform genomic data. Integrative analysis of the cancer genome landscape can potentially lead to a deeper understanding of the biological process of cancer. We integrate epigenetic data (DNA methylation and microRNA expression) and gene expression data from the tumor genome to delineate the association between different aspects of the biological processes and brain tumor survival. To model the association, we employ a flexible semi-parametric linear transformation model that incorporates both the main effects of these genomic measures and the possible interactions among them. We develop variance component tests to examine different coordinated effects by testing various subsets of model coefficients for the genomic markers. A Monte-Carlo perturbation procedure is constructed to estimate the null distribution of the proposed test statistics. We further propose omnibus testing procedures that synthesize information from fitting various parsimonious sub-models to improve power. Simulation results suggest that our proposed testing procedures maintain proper size under the null and outperform standard score tests. We further illustrate the utility of our procedure in two genomic analyses of survival in glioblastoma multiforme patients.

Keywords: integrative genomics, linear transformation model, survival analysis, variance component test

1. Introduction

With advances in high-throughput biotechnology, genomic studies on a wide range of platforms have been performed to identify disease susceptibility loci or biomarkers for various phenotypic traits. Successful examples include gene expression microarray studies, genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS). Despite the success of existing single-platform studies, a significant amount of genomic information is lost if one focuses only on a single platform. A new hypothesis has been advocated that the biological process of complex phenotypic traits such as cancer survival can be better characterized by multiple types of genetic, epigenetic and genomic alterations, with each platform providing a different and complementary view of the phenotype [1, 2].

This paper is motivated by The Cancer Genome Atlas (TCGA), a research project with a rich collection of multi-platform genomic data that maps the tumor genomes of many types of cancer. We focus on a genomic study of glioblastoma multiforme (GBM), in which an association between the DNA methylation and gene expression profile of the GRB10 gene and the overall survival of GBM patients was reported [3]. It was also established that the GRB10 gene is a target of the microRNA miR-633 [4]. Both DNA methylation and microRNA are forms of epigenetic regulation of gene expression and have been found to be associated with gene expression [3, 5]. The example suggests that multiple genomic data types are interrelated, e.g., DNA methylation, microRNA and gene expression, and may jointly affect cancer survival, as illustrated by the causal diagram [6] in Figure 1. We are interested in 1) the effect of DNA methylation of the GRB10 gene on GBM survival mediated through mRNA expression of the gene (the dashed path in Figure 1), 2) the effect of DNA methylation mediated through microRNA expression (the solid path), and 3) the effect of DNA methylation on cancer survival independent of mRNA or microRNA expression and perhaps acting through other biological mechanisms (the dotted path).

Figure 1:

Causal diagram of a set of DNA methylations (S), microRNA expression (M), gene expression (G) and outcome of interest (Y=H(T)). Three path-specific effects are in different line styles: ΔSY, effect of methylation on outcome independent of microRNA and mRNA gene expression is in dotted line; ΔSGY, effect of methylation mediated through gene expression but not through microRNA is in dashed lines; ΔSMY, effect mediated through microRNA is in solid lines.

Hypothesis testing methods for the effect of multiple genetic markers on a survival outcome have been developed [7, 8]. These methods largely focus on a single genomic platform such as genetic markers. Moreover, they examine the overall effect and cannot decompose it into the separate components illustrated in Figure 1. With rich collections of tumor genomic data such as TCGA, there is a pressing need to analyze multi-platform genomic data to understand the respective contribution of each platform to cancer survival. Statistical methods have been proposed under the mediation framework [9, 10, 11, 12] to integrate multi-platform genomic data when the outcome is dichotomous [13, 14]. It has also been shown that the three pathways illustrated in Figure 1 correspond to different sets of coefficients in regression models, and a hypothesis testing method has been developed to examine their effects on dichotomous outcomes [15]. However, the current integrative methods cannot analyze time-to-event data because of censoring, and they require further development before they can be applied to the TCGA data. To bridge these gaps, we develop in this paper a new testing procedure for survival data that integrates multi-platform genomic data.

The Cox proportional hazards (PH) model is the most popular model for analyzing survival data [16, 17]. Efficient estimation and testing procedures have been developed under the PH model [18]. However, since the PH assumption may be violated in real applications, alternative survival models such as the proportional odds (PO) model [19] can be useful. Both the PH and PO models are special cases of a broader class of linear transformation models, which relate a nonparametric transformation of the failure time to covariates and a parametric random error in a linear form [18]. Various estimating procedures have been proposed for linear transformation models [20, 21], and Zeng and Lin further proposed a non-parametric maximum likelihood estimator for a more general setting [22]. As most existing work has focused primarily on estimation, Tzeng et al. recently proposed an efficient testing procedure to examine the effects of multiple genetic markers [8]. Although Tzeng et al.'s method also concerns multivariate testing, it is not readily applicable to our motivating example. First, it is not clear how to use the existing method to analyze multi-platform genomic data. It has been shown that a single-platform method is subject to power loss because it fails to account for signals from other platforms [13]. Second, by focusing on the overall effect, the current method cannot examine the specific effects illustrated in Figure 1. Third, it is not clear how to balance robustness against model misspecification with statistical power while incorporating potential interactions among the platforms. To address these limitations, we propose a testing procedure based on estimating equations that extends Tzeng et al.'s work to integrative genomics.

The rest of the paper is organized as follows. In Section 2, we introduce a semiparametric linear transformation model that relates DNA methylation, microRNA and gene expression jointly to failure time, and propose a variance component score testing procedure for an arbitrary set of regression coefficients. We also construct an omnibus test to accommodate different underlying disease models. In Section 3, we provide a mechanistic interpretation for various subsets of coefficients in the joint survival model. In Section 4, we conduct numerical studies to examine path-specific effects. In Section 5, we illustrate the utility of our methods with two data applications. We conclude with a discussion in Section 6.

2. A multivariate test for the transformation model

2.1. The model

Our overall goal is to understand whether and how a survival time $T$ depends on a $p$-dimensional vector of DNA methylation markers $S$ within a gene, a microRNA expression $M$, and a gene expression $G$, after adjusting for a $q$-dimensional vector of covariates $X$. We assume a fixed number of DNA methylation markers $p$, but $p$ may not be small relative to the sample size $n$ in a finite sample. Due to censoring, $T$ is only observable up to a bivariate vector $(\tilde T, \delta)$, where $\tilde T = \min(T, C)$, $\delta = I(T \le C)$ and $C$ is the censoring time. Suppose the data for analysis consist of $n$ independent and identically distributed random vectors $\{(\tilde T_i, \delta_i, Z_i^T), i = 1, \dots, n\}$, where $i$ indexes subjects and $Z_i = (G_i, M_iG_i, G_iS_i^T, M_iG_iS_i^T, X_i^T, S_i^T, M_i, M_iS_i^T)^T$.

We model the relationship through a flexible semi-parametric transformation model allowing for interactions among S, M and G:

$$H(T_i) = -\gamma^T Z_i + \epsilon_i, \quad \epsilon_i \perp Z_i, \tag{1}$$

where $\gamma = (\beta_G, \beta_{MG}, \beta_{SG}^T, \beta_{SMG}^T, \beta_X^T, \beta_S^T, \beta_M, \beta_{SM}^T)^T$ is the vector of unknown regression parameters representing the effects of the covariates, the genomic markers and their interactions, $\epsilon_i$ has a specified parametric distribution, and $H(\cdot)$ is an unspecified strictly increasing smooth transformation function. The advantage of our proposed model is that, after the transformation $H(\cdot)$ of the survival time $T$, the survival model is linear: the outcome $H(T)$ relates to the predictors in a linear form. Under model (1), the survival function given $Z$ is

$$S_T(t, Z) \equiv P(T \ge t \mid Z) = S_{\epsilon^*}\{\Lambda(t)e^{\gamma^T Z}\},$$

where $S_{\epsilon^*}(\cdot)$ is the survival function of $\epsilon^* = e^{\epsilon}$ and $\Lambda(\cdot) = e^{H(\cdot)}$. It follows that the cumulative hazard and hazard functions are, respectively, $\Lambda(T \mid Z) = \mathcal{G}\{e^{\gamma^T Z}\Lambda(T)\}$ and $d\Lambda(T \mid Z) = \mathcal{G}'\{e^{\gamma^T Z}\Lambda(T)\}e^{\gamma^T Z}d\Lambda(T)$, where $\mathcal{G}(\cdot) = -\log S_{\epsilon^*}(\cdot)$. We denote $d\Lambda(\tilde T_i) = \Lambda_i$. A noteworthy feature of our proposal is that we start from a very general model that incorporates all possible interactions among genomic markers, and then accommodate more parsimonious models to improve the power of the proposed tests.
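To make the link between $\mathcal{G}$ and familiar survival models concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the survival function $\exp[-\mathcal{G}\{\Lambda(t)e^{\gamma^T Z}\}]$ for the Box-Cox family of transformations used later in Section 4, for which $\rho = 1$ gives the PH model and the limit $\rho \to 0$ gives the PO model. The function names and the numerical inputs are hypothetical.

```python
import numpy as np

def G_box_cox(x, rho):
    """Box-Cox family G(x) = {(1 + x)^rho - 1} / rho.

    rho = 1 gives G(x) = x (proportional hazards);
    rho = 0 is taken as the limit log(1 + x) (proportional odds).
    """
    x = np.asarray(x, dtype=float)
    if rho == 0:
        return np.log1p(x)
    return ((1.0 + x) ** rho - 1.0) / rho

def survival(Lambda_t, lin_pred, rho):
    """S_T(t | Z) = exp[-G{Lambda(t) * exp(gamma^T Z)}].

    Lambda_t : value of Lambda(t) = exp(H(t)) at the time of interest
    lin_pred : linear predictor gamma^T Z (hypothetical input)
    """
    return np.exp(-G_box_cox(Lambda_t * np.exp(lin_pred), rho))

# Same baseline and linear predictor, two members of the family
s_ph = survival(0.5, 0.2, rho=1.0)   # Cox PH form: exp(-Lambda * e^{eta})
s_po = survival(0.5, 0.2, rho=0)     # PO form: 1 / (1 + Lambda * e^{eta})
```

Writing the model through $\mathcal{G}$ rather than through the error distribution keeps the PH and PO cases in a single code path, which is convenient when the analysis model is varied as in the simulations.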

2.2. Testing procedure for an arbitrary subset of regression parameters

We develop a variance component score-based testing procedure for an arbitrary set of regression coefficients in model (1). We also provide mechanistic interpretation of various subsets of regression coefficients under the framework of causal mediation modeling in Section 3. For illustration, we focus on the testing of whether gene expression G is associated with survival given other markers. This corresponds to testing the hypothesis

$$H_0: \beta \equiv (\beta_G, \beta_{MG}, \beta_{SG}^T, \beta_{SMG}^T)^T = 0, \tag{2}$$

but note that a test for any arbitrary set of regression coefficients can be developed similarly. Since $\beta$ corresponds to the effect of $V = (G, MG, GS^T, MGS^T)^T$, which contains all contributions from $G$, testing (2) can be used to assess the total effect of $G$ on survival.

2.2.1. Derivation of the test statistic

To test for H0 in (2), we first rewrite the model (1) as

$$H(T_i) = -(\tilde X_i^T\alpha + V_i^T\beta) + \epsilon_i, \tag{3}$$

where $\tilde X_i^T = (X_i^T, S_i^T, M_i, M_iS_i^T)$ and $\alpha^T = (\beta_X^T, \beta_S^T, \beta_M, \beta_{SM}^T)$. Components of $V$ may be highly correlated with each other because of correlation within $S$ and among $G$, $M$ and $S$. Conventional approaches such as the likelihood ratio test or the Wald test may not work well because fitting model (1), which has a large number, $4p+3+q$, of potentially highly correlated predictors, is unstable, especially when $p$ is not small. Alternatively, one may employ a standard score test, which only requires fitting the null model. However, the type I error of the standard score test is not protected according to our simulation studies in Section 4, probably because of the relatively large number of degrees of freedom (DF), $2p+2$.

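To overcome the problem, we propose a score test for $\beta$ by imposing a working assumption that the parameters $\{\beta_{SGj}, j = 1, \dots, p\}$ and $\{\beta_{SMGj}, j = 1, \dots, p\}$ are $2p$ independent zero-mean random variables with $\mathrm{var}(\beta_{SGj}) = \tau_{SG}$ and $\mathrm{var}(\beta_{SMGj}) = \tau_{SMG}$. The hypothesis test for the null (2) becomes a joint test for the variance components ($\tau_{SG}$ and $\tau_{SMG}$) [23] and two scalar regression coefficients ($\beta_G$ and $\beta_{MG}$):

$$H_0: \tau_{SG} = \tau_{SMG} = \beta_G = \beta_{MG} = 0. \tag{4}$$

By assuming $\beta_{SGj} \sim F(0, \tau_{SG})$, where $F$ is an arbitrary distribution, one can substantially reduce the degrees of freedom by testing $H_0: \tau_{SG} = 0$ in place of $H_0: \beta_{SG1} = \dots = \beta_{SGp} = 0$. The score vector for $\beta_{SG}$, $U_{\beta_{SG}}$, is asymptotically $p$-variate normal, and the standard score test based on $U_{\beta_{SG}}$ is a $p$-DF test. The score test for $\tau_{SG}$ based on $U_{\tau_{SG}} = \|U_{\beta_{SG}}\|^2$, which follows a mixture of chi-square distributions under the null, has an effective DF that is typically much lower than $p$. In finite samples, the distribution of $U_{\tau_{SG}}$ can be better approximated than that of $U_{\beta_{SG}}$. One can show that the scores for $\tau_{SG}$, $\tau_{SMG}$, $\beta_G$ and $\beta_{MG}$ are:

$$U_{\tau_{SG}} = \|U_{\beta_{SG}}\|^2 = \Big\|n^{-1}\sum_i \kappa_i G_i S_i\Big\|^2, \quad U_{\tau_{SMG}} = \|U_{\beta_{SMG}}\|^2 = \Big\|n^{-1}\sum_i \kappa_i M_i G_i S_i\Big\|^2, \quad U_{\beta_G} = n^{-1}\sum_i \kappa_i G_i, \quad U_{\beta_{MG}} = n^{-1}\sum_i \kappa_i M_i G_i,$$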

where

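$$\kappa_i = \left\{\delta_i\frac{\mathcal{G}''(e^{\alpha^T\tilde X_i}\Lambda(\tilde T_i))}{\mathcal{G}'(e^{\alpha^T\tilde X_i}\Lambda(\tilde T_i))} - \mathcal{G}'(e^{\alpha^T\tilde X_i}\Lambda(\tilde T_i))\right\}e^{\alpha^T\tilde X_i}\Lambda(\tilde T_i) + \delta_i,$$

$\|U_{\beta_{SG}}\|^2 = \sum_{j=1}^p U_{\beta_{SGj}}^2$, $U_{\beta_{SGj}} = n^{-1}\sum_{i=1}^n \kappa_i G_i S_{ji}$, $\|U_{\beta_{SMG}}\|^2 = \sum_{j=1}^p U_{\beta_{SMGj}}^2$ and $U_{\beta_{SMGj}} = n^{-1}\sum_{i=1}^n \kappa_i M_i G_i S_{ji}$. To combine information from $U_{\tau_{SG}}$, $U_{\tau_{SMG}}$, $U_{\beta_G}$ and $U_{\beta_{MG}}$, we propose a composite score statistic formed as a weighted sum of $U_{\tau_{SG}}$, $U_{\tau_{SMG}}$, $U_{\beta_G}^2$ and $U_{\beta_{MG}}^2$:

$$Q = n\left(w_1U_{\beta_G}^2 + w_2U_{\beta_{MG}}^2 + w_3U_{\tau_{SG}} + w_4U_{\tau_{SMG}}\right) = \Big\|n^{-1/2}\sum_i \kappa_i V_{wi}\Big\|^2, \tag{5}$$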

where $V_{wi}^T = (\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i, \sqrt{w_3}G_iS_i^T, \sqrt{w_4}M_iG_iS_i^T)$. Different weighting schemes for $\{w_1, w_2, w_3, w_4\}$ can be implemented to reflect prior knowledge regarding the relative contributions of various genomic effects. If no such knowledge is available, we propose to weight each term using the inverse of its standard deviation. The asymptotic variances of $U_{\tau_{SG}}$, $U_{\tau_{SMG}}$, $U_{\beta_G}^2$ and $U_{\beta_{MG}}^2$ can be estimated from the Monte-Carlo perturbation procedure described in Section 2.2.2. Equal weighting, $w_1 = w_2 = w_3 = w_4$, is equivalent to testing $H_0: \tau = 0$, where $\tau$ is a common variance of all elements in $\beta^T = (\beta_G, \beta_{MG}, \beta_{SG}^T, \beta_{SMG}^T)$. This is still a valid test but may not be powerful in practice, since the information from different genomic markers may not be comparable because of their different scales.
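The composite statistic in (5) can be sketched in a few lines once the null-model quantities are available. The code below is an illustrative sketch under stated assumptions, not the authors' software: it assumes that the null linear predictor $\alpha^T\tilde X_i$ (`eta0`) and the baseline $\Lambda(\tilde T_i)$ (`Lam`) have already been estimated, and the function and argument names are hypothetical.

```python
import numpy as np

def kappa(delta, eta0, Lam, dG, d2G):
    """kappa_i = {delta_i * G''(x_i) / G'(x_i) - G'(x_i)} * x_i + delta_i,
    with x_i = exp(eta0_i) * Lam_i.

    delta : event indicators; eta0 : null linear predictor alpha^T X~_i;
    Lam   : Lambda evaluated at the observed times (assumed already estimated);
    dG, d2G : first and second derivatives of G, passed as callables.
    """
    x = np.exp(eta0) * Lam
    return (delta * d2G(x) / dG(x) - dG(x)) * x + delta

def composite_Q(kap, G, M, S, w):
    """Q = n * (w1 U_G^2 + w2 U_MG^2 + w3 ||U_SG||^2 + w4 ||U_SMG||^2).

    G, M : length-n arrays; S : (n, p) array of methylation markers;
    w    : four non-negative weights (w1, w2, w3, w4).
    """
    n = len(kap)
    U_G = np.mean(kap * G)              # scalar score for beta_G
    U_MG = np.mean(kap * M * G)         # scalar score for beta_MG
    U_SG = (kap * G) @ S / n            # p-vector score for beta_SG
    U_SMG = (kap * M * G) @ S / n       # p-vector score for beta_SMG
    return n * (w[0] * U_G**2 + w[1] * U_MG**2
                + w[2] * U_SG @ U_SG + w[3] * U_SMG @ U_SMG)
```

Under the PH model, where $\mathcal{G}'(x) = 1$ and $\mathcal{G}''(x) = 0$, `kappa` reduces to $\delta_i - e^{\alpha^T\tilde X_i}\Lambda(\tilde T_i)$, the usual martingale-type residual, which gives a quick sanity check on an implementation.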

To calculate Q, one needs to estimate α and Λ() under H0 by fitting the null model:

$$H(T_i) = -\tilde X_i^T\alpha + \epsilon_i. \tag{6}$$

Procedures for estimating $\alpha$ and $\Lambda$, such as the Expectation-Maximization (EM) algorithm for obtaining the nonparametric maximum likelihood estimate (NPMLE), have been proposed [22]. However, estimating $\alpha$ remains challenging because its dimension, $q+1+2p$, is large. We use ridge regression to stabilize the estimation by introducing an $L_2$ penalty on the coefficients corresponding to methylation-related components. The penalized log-likelihood under the null model (6) is $l_p(\psi) = l_n(\psi) - \frac{1}{2}\lambda\beta_S^T\beta_S - \frac{1}{2}\lambda\beta_{SM}^T\beta_{SM}$, where $l_n(\psi) = \sum_{i=1}^n l_i(\psi)$, $l_i$ is the unit log-likelihood under the null model (6), $\lambda$ is a tuning parameter and $\psi^T = (\alpha^T, \Lambda^T)$. The estimate of $\psi$ is obtained by solving the estimating equation $U_\psi(\psi) - \lambda I_2\psi = 0$, where $U_\psi^T(\psi) = (U_\alpha^T, U_\Lambda^T)$, $U_\alpha$ and $U_{\Lambda_j}$ are provided in the Appendix, and $I_2$ is a $(q+1+2p+m)\times(q+1+2p+m)$ block diagonal matrix whose top $(q+1+2p)\times(q+1+2p)$ block is the identity matrix $I_{(q+1+2p)\times(q+1+2p)}$ and whose bottom $m\times m$ block is 0, with $m$ being the number of events. To select the tuning parameter, we use generalized cross-validation (GCV) [24, 25] and estimate $\lambda$ as the minimizer of the GCV function $-l_n(\hat\psi)/[n\{1 - n^{-1}\mathrm{tr}(H)\}^2]$, where $H = (U_{\psi\psi} + \lambda I_2)^{-1}U_{\psi\beta}$. We search for $\lambda$ within the range $[0, \sqrt{n}/\log(n)]$ to ensure $\hat\lambda = o(\sqrt{n})$, an assumption that we later use to derive the asymptotic distribution of $\hat Q$, the estimate of $Q$. By plugging in the estimates of $\alpha$ and $\Lambda$, one obtains $\hat Q = Q(\hat\psi)$.

2.2.2. Distribution of Q(ψ^)

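Denote $\theta^T = (\beta^T, \psi^T)$, and let $\psi_0$, $\beta_0 (= 0)$ and $\theta_0$ be the true values of $\psi$, $\beta$ and $\theta$ under the null (4). $\hat Q$ can be re-expressed as the squared $L_2$ norm of the score for $\beta$:

$$\hat Q = \|n^{-1/2}U_\beta(\beta_0, \hat\psi)\|^2.$$

Note that the weight $w$ is involved in the test statistic. As expressed in $V_{wi}$, the weighting scheme can be viewed as a pre-determined variable standardization applied before fitting the model. We show in the Appendix that

$$n^{-1/2}U_\beta(\beta_0, \hat\psi) = n^{-1/2}AU_\zeta(\theta_0) + o_p(1)J. \tag{7}$$

By the continuous mapping theorem, the asymptotic distribution of $\hat Q$ is a function of the estimating equation $U_\zeta$:

$$\hat Q \to_d \|n^{-1/2}AU_\zeta(\theta_0)\|^2. \tag{8}$$

$n^{-1/2}AU_\zeta(\theta_0)$ can be approximated by a perturbation procedure [26, 27] using $n^{-1/2}\hat A\sum_i U_{\zeta i}(\hat\psi)\mathcal{N}_i$, where $\mathcal{N} = (\mathcal{N}_1, \dots, \mathcal{N}_n)^T$ is a vector of $n$ independent standard normal random variables; $\hat A$ is the empirical version of $A$ obtained by plugging in $\hat\psi$, the estimate under the null model (6) with the $L_2$ penalty; $A = [I_{(2p+2)\times(2p+2)}, -U_{\beta\psi}^{\psi^*}(U_{\psi\psi}^{\psi^*} + \lambda I_2)^{-1}]$ with $\psi^*$ between $\hat\psi$ and $\psi_0$; and $U_{\beta\psi}$, $U_{\psi\psi}$ and $U_\zeta = \sum_i U_{\zeta i}$ are provided in the Appendix.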
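The resampling step can be sketched as follows, under the assumption that the per-subject contributions $\hat A U_{\zeta i}(\hat\psi)$ have already been computed and stacked row-wise into a matrix (`contrib`, a hypothetical name). This is an illustration of the perturbation idea rather than the authors' implementation.

```python
import numpy as np

def perturbation_pvalue(contrib, Q_obs, B=2000, seed=0):
    """Approximate the null distribution of Q by perturbation.

    contrib : (n, d) array whose i-th row is A_hat @ U_zeta_i(psi_hat)
              (assumed to be computed elsewhere)
    Q_obs   : observed test statistic
    Returns the estimated p-value and the B perturbed statistics.
    """
    rng = np.random.default_rng(seed)
    n = contrib.shape[0]
    # One row of n independent standard normal multipliers per replicate
    N = rng.standard_normal((B, n))
    # ||n^{-1/2} sum_i contrib_i * N_i||^2 for each replicate
    Q_pert = np.sum((N @ contrib) ** 2, axis=1) / n
    return float(np.mean(Q_pert > Q_obs)), Q_pert
```

Only the multipliers $\mathcal{N}_i$ are redrawn; the null model is fitted once, so the cost of each replicate is a single matrix product.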

2.2.3. The omnibus test

While testing procedures derived under the three-way interaction model are robust to model misspecification, power may be compromised when the true underlying model does not involve certain interactions. Hence, it is desirable to develop a test that can accommodate different models to optimize statistical power. We propose an omnibus test that combines multiple p-values from tests under a range of models that incorporate different layers of interactions yet are all correct under the null. Specifically, we compute the minimum of the p-values from the candidate models and compare the observed minimum p-value with its null distribution, approximated by a resampling perturbation procedure. The test statistic $Q$ in (5) is derived under the outcome model (1), which assumes all possible two-way and three-way interactions. In this section, we denote the test statistic (5) as $Q_4$. Suppose that the outcome $Y$ does not depend on the three-way interaction ($\beta_{SMG} = 0$), or that it does not depend on the three-way interaction, the methylation-by-microRNA interaction or the methylation-by-expression interaction ($\beta_{SMG} = \beta_{SM} = \beta_{SG} = 0$), or that it depends only on the main effect of gene expression ($\beta_{SMG} = \beta_{SM} = \beta_{SG} = 0$ and $\beta_{MG} = 0$). Then it is more powerful to test $H_0: \beta_G = \beta_{MG} = 0, \beta_{SG} = \beta_{SMG} = 0$ using the test statistics $Q_3$, $Q_2$ and $Q_1$, respectively, with corresponding $V_{wi}^T = (\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i, \sqrt{w_3}G_iS_i^T)$, $(\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i)$ and $(\sqrt{w_1}G_i)$. $Q_1$-$Q_4$ all provide valid tests under the null. Under the more parsimonious models, the test statistic $Q_4$ loses power because it tests for unnecessary parameters. However, if the outcome model truly involves all two-way and three-way interactions as in (1), $Q_1$-$Q_3$ will lose power compared with $Q_4$.

As shown in Section 2.2.2, the null distribution of $Q$ can be estimated from the empirical distribution of the perturbed statistics $\|n^{-1/2}\hat A\sum_i U_{\zeta i}(\hat\psi)\mathcal{N}_i\|^2$ conditional on the observed data. By generating independent $\mathcal{N}$ repeatedly, perturbed realizations of $Q$ can be obtained, denoted by $\{\hat Q^{(b)}, b = 1, \dots, B\}$, where $B$ is the number of perturbations. The p-value can be approximated as the tail probability obtained by comparing $\{\hat Q^{(b)}\}$ with the observed $\hat Q$. Hence one can calculate the p-values of the four candidate models by inputting $U_{\theta i}$ with $V_{wi}^T = (\sqrt{w_1}G_i)$, $(\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i)$, $(\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i, \sqrt{w_3}G_iS_i^T)$ and $(\sqrt{w_1}G_i, \sqrt{w_2}M_iG_i, \sqrt{w_3}G_iS_i^T, \sqrt{w_4}M_iG_iS_i^T)$, respectively, for $Q_1$-$Q_4$, generating the perturbed realizations of the null counterpart for candidate model $k$ as $\{\hat Q_k^{(b)}\}$, and comparing them with the corresponding observed values $\hat Q_k$ $(k = 1, \dots, 4)$. Note that for each perturbation $b$, the random normal perturbation variable $\mathcal{N}^{(b)}$ is the same across the four tests. Let $\hat P_k = \mathcal{S}_k(\hat Q_k)$ be the p-value for candidate model $k$, where $\mathcal{S}_k(q) = \mathrm{pr}\{\hat Q_k^{(b)} > q\}$. The null distribution of the minimum p-value, $\hat P_{\min} = \min_k \hat P_k$, can be approximated by the empirical distribution of $\{\hat P_{\min}^{(b)} = \min_k \mathcal{S}_k(\hat Q_k^{(b)}), b = 1, \dots, B\}$ given the observed data. The p-value of the omnibus test can then be calculated by comparing $\hat P_{\min}$ with $\{\hat P_{\min}^{(b)}\}$.
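The minimum p-value calculation described above can be sketched as below. The sketch assumes that the perturbed statistics of the candidate models have been generated with the same normal multipliers and stored column-wise; the function name and array layout are hypothetical, and ties among perturbed statistics are ignored.

```python
import numpy as np

def omnibus_pvalue(Q_obs, Q_pert):
    """Minimum-p omnibus test from shared perturbations.

    Q_obs  : (K,) observed statistics for the K candidate models
    Q_pert : (B, K) perturbed statistics; row b uses the same
             normal multipliers for all K models
    Returns the omnibus p-value and the K per-model p-values.
    """
    Q_obs = np.asarray(Q_obs, dtype=float)
    Q_pert = np.asarray(Q_pert, dtype=float)
    B, K = Q_pert.shape
    # Per-model p-values: S_k(q) = pr{Q_k^(b) > q}, evaluated at the observed Q_k
    p_obs = (Q_pert > Q_obs).mean(axis=0)
    # S_k evaluated at every perturbed statistic: descending rank within each
    # column (0 = largest) counts the replicates that are strictly larger
    rank_desc = np.argsort(np.argsort(-Q_pert, axis=0), axis=0)
    p_pert = rank_desc / B
    # Compare the observed minimum p-value with its perturbation distribution
    p_min_obs = p_obs.min()
    p_min_pert = p_pert.min(axis=1)
    return float(np.mean(p_min_pert <= p_min_obs)), p_obs
```

Because each row of `Q_pert` shares its multipliers across models, the dependence among $Q_1$-$Q_4$ is carried into the reference distribution of the minimum p-value, so no separate multiplicity correction is applied.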

3. Implication of testing a subset of coefficients

In this section, we provide mechanistic interpretation of our testing procedure. The effect on Y contributed by G can be examined by testing all the parameters related to G, as null (2). Similarly, those contributed by S and M, respectively, can be evaluated by testing

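$$H_0: \beta_S = \beta_{SM} = \beta_{SG} = \beta_{SMG} = 0 \quad \text{and} \quad H_0: \beta_M = \beta_{MG} = 0,\ \beta_{SM} = \beta_{SMG} = 0.$$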

By testing different subsets of regression coefficients, we are able to examine the significance of various genomic effects on the survival outcome. The proposed integrative testing procedure helps identify useful biomarkers across multiple genomic data, which can also be potential therapeutic targets.

Furthermore, we can interpret the results under the framework of causal mediation modeling. In our data example, there are three path-specific effects (Figure 1): 1) the effect of DNA methylation on the outcome mediated through gene expression but not through microRNA, denoted by $\Delta_{SGY}$; 2) the effect of methylation mediated through microRNA and possibly through gene expression, denoted by $\Delta_{SMY}$; and 3) the alternative effect of DNA methylation on the outcome, not through microRNA or mRNA gene expression, denoted by $\Delta_{SY}$. Under the identifiability assumptions discussed in the Supplementary Materials [28], it has been shown that, under the structure in which $M$ is determined by $S$, $G$ is determined by $M$, and $G$ is also determined by $S$ independently of $M$, $\Delta_{SGY}$ corresponds to all regression coefficients for $G$: $\beta_G$, $\beta_{MG}$, $\beta_{SG}$ and $\beta_{SMG}$; $\Delta_{SMY}$ corresponds to all regression coefficients for $M$ and $G$: $\beta_M$, $\beta_G$, $\beta_{MG}$, $\beta_{SM}$, $\beta_{SG}$ and $\beta_{SMG}$; $\Delta_{SY}$ corresponds to all regression coefficients for $S$: $\beta_S$, $\beta_{SM}$, $\beta_{SG}$ and $\beta_{SMG}$; and the overall effect $\Delta_{overall}$ corresponds to all regression coefficients: $\beta_S$, $\beta_M$, $\beta_G$, $\beta_{MG}$, $\beta_{SM}$, $\beta_{SG}$ and $\beta_{SMG}$ [15]. With these results, the testing procedures in Section 2.2 can be used to examine path-specific effects and thus have mechanistic implications. For example, the test for $H_0: \beta_G = \beta_{MG} = 0, \beta_{SG} = \beta_{SMG} = 0$ is equivalent to that for $H_0: \Delta_{SGY} = 0$, and the test statistic (5) assesses the effect of methylation $S$ on the survival time $T$ mediated through gene expression $G$. Further discussion of path-specific effects under mediation analyses can be found in the Supplementary Materials.
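The correspondence between path-specific effects and coefficient subsets stated above can be written down as a small lookup table, which also makes the dimension of each null hypothesis explicit. The dictionary keys and the helper function are hypothetical names introduced only for illustration.

```python
# Coefficient subsets tested for each effect, as listed in the text.
# Labels drop the "beta_" prefix: "SMG" stands for beta_SMG, and so on.
PATH_COEFFICIENTS = {
    "SY":      ["S", "SM", "SG", "SMG"],
    "SGY":     ["G", "MG", "SG", "SMG"],
    "SMY":     ["M", "G", "MG", "SM", "SG", "SMG"],
    "overall": ["S", "M", "G", "MG", "SM", "SG", "SMG"],
}

def n_coefficients(effect, p):
    """Number of regression coefficients set to zero under the null.

    Any coefficient involving S is a p-vector (one entry per methylation
    marker); the remaining coefficients are scalars.
    """
    return sum(p if "S" in name else 1 for name in PATH_COEFFICIENTS[effect])

# With p = 12 methylation markers, as in the GRB10 example
sizes = {effect: n_coefficients(effect, 12) for effect in PATH_COEFFICIENTS}
```

For $p = 12$ this gives $4p$, $2p+2$, $3p+3$ and $4p+3$ coefficients for the four effects, which are the dimensions a standard score test would have to handle.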

4. Simulation

We have conducted extensive simulation studies to evaluate the performance of the proposed methods and to compare them with the conventional score test. We investigate $p = 12$ DNA methylation markers of GRB10, microRNA miR-633 and mRNA expression of GRB10 in $n = 271$ simulated subjects. To mimic the motivating survival study of glioblastoma multiforme (GBM), we simulate data focusing on the GRB10 gene. We obtain 12 DNA methylation markers at GRB10 from 271 GBM patients in the TCGA data and simulate microRNA miR-633 expression, mRNA expression of GRB10 and failure time based on the real methylation data. We assume cg25915982 at 50.85 Mb of chromosome 7 to be the causal methylation marker $S_{causal}$. MicroRNA miR-633 expression, mRNA expression of GRB10 and survival time are generated using the causal marker, but analyses are based on all 12 methylation markers, as if the causal marker were unknown. The miR-633 expression value $M$ is generated by the model $M_i = 5.75 + S_{causal,i}\times\delta_S + \epsilon_{M,i}$, where $\epsilon_{M,i}$ follows a normal distribution with mean zero and standard deviation 0.05. The mRNA expression of GRB10, $G$, is generated by the model $G_i = 10 + S_{causal,i}\times\alpha_S + M_i\times\alpha_M + \epsilon_{GM,i}$, where $\epsilon_{GM,i}$ follows a standard normal distribution. The survival time $T$ is generated by the model $\log T_i = S_{causal,i}\beta_S + M_i\beta_M + G_i\beta_G + M_iS_{causal,i}\beta_{SM} + G_iS_{causal,i}\beta_{SG} + M_iG_i\beta_{MG} + M_iG_iS_{causal,i}\beta_{SMG} + \epsilon_{Ti}$, where $\epsilon_{Ti}$ follows a standard normal distribution. The censoring time $C$ is selected to control the censoring proportion at 70%. The observed follow-up time $\tilde T$ is the minimum of $T$ and $C$, and the survival status is death if $T \le C$ or censored if $T > C$. For the $\mathcal{G}(\cdot)$ transformation in the analyses, we consider the Box-Cox transformation $\mathcal{G}(x) = \{(1+x)^\rho - 1\}/\rho$ with $\rho = 1.2$. We also conduct simulation studies in which data are generated with the Box-Cox transformation with $\rho = 1.2$ or 1.0 and the analyses are performed with a correctly specified model (see Supplementary Materials, Tables S2-S7).
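The data-generating mechanism described above can be sketched as follows. Two parts of the sketch are assumptions rather than the authors' procedure: the causal methylation value is drawn from a uniform distribution as a stand-in for the real TCGA methylation data, and censoring is imposed with a single fixed censoring time placed at the empirical quantile that yields the target censoring proportion. The default values of $\delta_S$, $\alpha_S$ and $\alpha_M$ are those of the first condition reported in this section; all function and argument names are hypothetical.

```python
import numpy as np

def simulate(n=271, beta=None, delta_S=0.04, alpha_S=2.5, alpha_M=2.0,
             censor_prop=0.7, seed=0):
    """Simulate (S, M, G, observed time, status) for one data set."""
    rng = np.random.default_rng(seed)
    b = dict(S=0.0, M=0.0, G=0.0, SM=0.0, SG=0.0, MG=0.0, SMG=0.0)
    b.update(beta or {})
    # ASSUMPTION: uniform draws replace the real methylation values
    S = rng.uniform(0.0, 1.0, n)
    M = 5.75 + S * delta_S + rng.normal(0.0, 0.05, n)
    G = 10.0 + S * alpha_S + M * alpha_M + rng.normal(0.0, 1.0, n)
    log_T = (S * b["S"] + M * b["M"] + G * b["G"]
             + M * S * b["SM"] + G * S * b["SG"] + M * G * b["MG"]
             + M * G * S * b["SMG"] + rng.normal(0.0, 1.0, n))
    # ASSUMPTION: one fixed censoring time at the (1 - censor_prop) quantile
    log_C = np.quantile(log_T, 1.0 - censor_prop)
    status = (log_T <= log_C).astype(int)        # 1 = death, 0 = censored
    obs_time = np.exp(np.minimum(log_T, log_C))  # observed follow-up time
    return dict(S=S, M=M, G=G, time=obs_time, status=status)

# One data set under a main-effects-only alternative
data = simulate(beta={"S": 0.4, "M": 0.3, "G": 0.3}, seed=1)
```

Working on the log-time scale until the last step keeps the comparison between $T$ and $C$ numerically stable when the interaction terms make $\log T$ large.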

By setting different configurations of δs and αs, we are able to generate data according to different DNA methylation-microRNA-mRNA expression relationships illustrated in Table S1. But here we will focus on the first condition in Table S1: δS=0.04, αS=2.5 and αM=2.0 since the testing procedures under other conditions are the same or just special cases. We study the performance of tests under various configurations of βs. Empirical size and power are estimated as percentage of p-value < 0.05 in 2000 simulations.
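To help read the empirical sizes reported below, the following sketch computes a rejection rate from a set of p-values together with its Monte Carlo standard error under a binomial model. The helper names are hypothetical. With 2000 simulations and a nominal level of 0.05, the standard error is roughly 0.49 percentage points, so empirical sizes between about 4% and 6% lie within two standard errors of the nominal level.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated p-values below the significance level."""
    return sum(p < alpha for p in p_values) / len(p_values)

def monte_carlo_se(rate, n_sim):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_sim)

# Standard error of the empirical size at the nominal level with 2000 runs
se_nominal = monte_carlo_se(0.05, 2000)
```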

4.1. Size and power of ΔSY, ΔSGY and ΔSMY

Empirical size and power for testing $H_0: \Delta_{SY} = 0$ are presented in Table 1. Empirical sizes are correct under different null models: all $\beta$s are zero; all $\beta$s are zero except $\beta_M (= 0.3)$; and all $\beta$s are zero except $\beta_G (= 0.3)$. Under the alternatives, the test with the correct model specification has optimal power, and the omnibus test nearly attains the optimal power across different settings. For example, under the setting with only main effects ($\beta_S = 0.4$, $\beta_M = \beta_G = 0.3$), the proposed test focusing on main effects has the optimal power of 86.5%; under the setting with main effects and two-way interactions ($\beta_S = 0.1$, $\beta_M = \beta_G = \beta_{MG} = \beta_{SM} = \beta_{SG} = 0.3$, $\beta_{SMG} = 0$), the test under the correct model has the optimal power of 67.2%; and the omnibus tests are very close to the two optimal tests, with power 80.4% and 55.1%, respectively (Table 1). The type I error of the standard score test with $4p = 48$ DF is substantially inflated, probably because of the large DF and the high correlation among the markers.
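To illustrate how correlation among markers lowers the effective DF of a variance-component statistic relative to the nominal DF of a standard score test, the sketch below uses a Satterthwaite-type moment-matching approximation: for a quadratic form distributed as $\sum_j \lambda_j\chi^2_1$, the effective DF is $(\sum_j\lambda_j)^2/\sum_j\lambda_j^2$. Both the approximation and the exchangeable correlation structure are assumptions made for illustration; they are not the procedure used in the paper, which relies on perturbation.

```python
import numpy as np

def effective_df(cov):
    """Satterthwaite-type effective DF of ||U||^2 when U ~ N(0, cov)."""
    lam = np.linalg.eigvalsh(cov)          # mixture weights of the chi-squares
    return lam.sum() ** 2 / (lam ** 2).sum()

def exchangeable(p, r):
    """Hypothetical correlation matrix with common pairwise correlation r."""
    return (1.0 - r) * np.eye(p) + r * np.ones((p, p))

df_independent = effective_df(exchangeable(48, 0.0))   # equals the nominal 48
df_correlated = effective_df(exchangeable(48, 0.8))    # far below 48
```

Under independence the effective DF equals the number of scores, whereas strong common correlation concentrates the quadratic form on a few eigen-directions and the effective DF drops sharply.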

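Table 1:

Empirical size and power (%) of testing ΔSY. Q1: model with only main effects; Q2: model with main effects and microRNA-by-expression interaction; Q3: model with main effects and two-way interactions; Q4: model with main effects, two-way and three-way interactions; Omnibus: the omnibus test for Q1-Q4; Score test: the classic score test for βs.

| | Null | Null | Null | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| βS | 0 | 0 | 0 | 0.4 | 0.2 | 0.4 | 0.6 | 0.1 | 0.1 | 0 | 0 |
| βM | 0 | 0.3 | 0 | 0 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0 | 0 |
| βG | 0 | 0 | 0.3 | 0 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0 | 0 |
| βMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.3 | 0 | 0 |
| βSM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.3 | 0 | 0 |
| βSG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.3 | 0 | 0 |
| βSMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 1.0 |
| Q1 | 4.10 | 3.90 | 3.85 | 89.3 | 25.8 | 85.0 | 99.9 | 4.25 | 3.85 | 5.90 | 7.00 |
| Q2 | 4.10 | 4.20 | 3.85 | 89.3 | 26.5 | 84.9 | 99.9 | 4.20 | 4.10 | 6.25 | 8.50 |
| Q3 | 3.65 | 3.95 | 4.35 | 70.5 | 14.3 | 59.6 | 95.8 | 19.7 | 55.8 | 7.10 | 19.8 |
| Q4 | 2.95 | 3.30 | 3.25 | 62.1 | 11.0 | 49.4 | 91.9 | 16.1 | 46.5 | 61.9 | 95.6 |
| Omnibus | 3.90 | 4.10 | 3.75 | 83.4 | 21.3 | 78.0 | 99.6 | 13.2 | 43.4 | 45.7 | 90.0 |
| Score test | 48.3 | 49.9 | 51.9 | | | | | | | | |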

Empirical size and power for testing $H_0: \Delta_{SGY} = 0$ are presented in Table 2. Empirical sizes are correct under different null settings: all $\beta$s are zero; all $\beta$s except $\beta_S (= 0.2)$ are zero; and all $\beta$s except $\beta_M (= 0.2)$ are zero. Under the alternatives, tests assuming the correct models perform the best, and the omnibus test nearly attains the optimal power with limited power loss, similar to the results for $\Delta_{SY}$. For instance, under the setting with $\beta_S = 0.2$, $\beta_M = 0.2$, $\beta_G = 0.3$ and all other $\beta$s equal to zero, the test for main effects performs optimally with power 86.9%, and the omnibus test has power 80.7%. The type I error of the conventional score test with $2p+2$ $(= 26)$ DF is again substantially inflated.

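Table 2:

Empirical size and power (%) of testing ΔSGY. Q1: model with only main effects; Q2: model with main effects and microRNA-by-expression interaction; Q3: model with main effects and two-way interactions; Q4: model with main effects, two-way and three-way interactions; Omnibus: the omnibus test for Q1-Q4; Score test: the classic score test for βs. The tuning parameter λ was chosen using GCV.

| | Null | Null | Null | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| βS | 0 | 0.2 | 0 | 0 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 | 0 | 0 |
| βM | 0 | 0 | 0.2 | 0 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 | 0 | 0 |
| βG | 0 | 0 | 0 | 0.3 | 0.2 | 0.3 | 0.4 | 0.1 | 0.1 | 0 | 0 |
| βMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.8 |
| Q1 | 4.70 | 4.40 | 4.65 | 90.2 | 56.2 | 88.5 | 98.8 | 6.30 | 3.35 | 7.55 | 11.3 |
| Q2 | 5.85 | 4.95 | 4.90 | 82.5 | 44.3 | 81.1 | 97.0 | 9.60 | 8.45 | 9.60 | 13.4 |
| Q3 | 5.00 | 5.20 | 5.00 | 73.5 | 36.8 | 72.8 | 94.1 | 60.5 | 93.5 | 10.9 | 20.7 |
| Q4 | 5.10 | 4.90 | 4.60 | 66.9 | 31.6 | 66.7 | 91.5 | 53.3 | 88.7 | 63.8 | 88.7 |
| Omnibus | 5.35 | 4.70 | 4.80 | 84.1 | 47.2 | 82.8 | 97.6 | 45.8 | 85.1 | 45.2 | 78.5 |
| Score test | 45.3 | 45.0 | 48.3 | | | | | | | | |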

Similarly, type I error of our proposed methods for H0:ΔSMY=0 is protected under the null (Table 3). In contrast, type I error of the conventional score test with 3p+3 DF is inflated. Under the alternatives, tests assuming the correct models perform optimally, and the omnibus test approaches the optimal power across a wide range of settings.

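Table 3:

Empirical size and power (%) of testing ΔSMY. Q1: model with only main effects; Q2: model with main effects and microRNA-by-expression interaction; Q3: model with main effects and two-way interactions; Q4: model with main effects, two-way and three-way interactions; Omnibus: the omnibus test for Q1-Q4; Score test: the classic score test for βs. The tuning parameter λ was chosen using GCV.

| | Null | Null | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| βS | 0 | 0.4 | 0 | 0 | 0.3 | 0.3 | 0.3 | 0.1 | 0.1 | 0 | 0 |
| βM | 0 | 0 | 0.3 | 0 | 0.1 | 0.2 | 0.3 | 0 | 0 | 0 | 0 |
| βG | 0 | 0 | 0 | 0.3 | 0.2 | 0.2 | 0.2 | 0 | 0 | 0 | 0 |
| βMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0.5 | 0 | 0 |
| βSMG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 1.0 |
| Q1 | 4.65 | 5.00 | 89.8 | 83.0 | 62.9 | 87.8 | 98.2 | 4.90 | 5.15 | 47.2 | 78.3 |
| Q2 | 5.25 | 5.35 | 86.0 | 75.8 | 57.0 | 83.5 | 96.6 | 7.45 | 9.65 | 43.4 | 73.6 |
| Q3 | 5.55 | 4.80 | 76.9 | 65.4 | 44.9 | 72.8 | 92.9 | 49.1 | 93.3 | 36.3 | 67.3 |
| Q4 | 4.30 | 4.60 | 73.9 | 59.0 | 41.7 | 69.6 | 91.3 | 41.0 | 87.3 | 80.6 | 99.0 |
| Omnibus | 5.30 | 4.65 | 86.4 | 76.6 | 55.9 | 83.2 | 96.6 | 33.9 | 85.5 | 69.6 | 97.4 |
| Score test | 51.0 | 49.0 | | | | | | | | | |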

The test size is also protected at type I error rate of 0.005 and 0.0005 (Table S8). Additional simulation studies with multiple causal methylation loci (Tables S9-S14) and different combinations of sample size, the number of methylation markers and censoring proportion (Tables S15-17) are presented and discussed in Supplementary Materials (Section 2).

5. Data Applications

We present two data application examples, both assessing the genomic contribution to overall survival in GBM. GBM is the most common malignant brain tumor and is rapidly fatal, with a median survival time of 15 months [29]. Because of its poor prognosis and the lack of well-established environmental risk factors, it is important to identify genomic markers for outcome prognostication, which also helps to understand the progression mechanism of this fatal disease. Multiple sets of genomic data as well as survival information have been archived in TCGA. Here we exploit the multi-platform genomic data to investigate the mechanism of epigenetic effects on GBM mortality.

5.1. GRB10 gene and GBM survival

We integrate epigenetic DNA methylation of GRB10, expression of microRNA miR-633 and gene expression of GRB10 to jointly model overall survival in GBM. There are 271 patients with complete level 3 data on the methylation, microRNA and gene expression arrays. We combine 12 methylation loci at GRB10 from the Illumina 27K array, the GRB10 expression value from the Agilent G4502A expression array, and the expression of the microRNA miR-633 to perform a gene-based integrated analysis. We have shown that DNA methylation of the GRB10 gene is significantly associated with overall survival in GBM, and that GRB10 expression is regulated by its methylation [3], which is also supported by the existing literature [5]. We have found that two methylation sites of GRB10 are associated with the expression of miR-633 (p-values 0.017 and 0.012), and that the expression of miR-633 is also highly associated with the expression of GRB10 (p-value 0.0031), based on Wald-type univariate hypothesis tests for least squares estimators. Furthermore, the literature has shown that the GRB10 gene is a target of miR-633 [4] and that microRNA expression can be regulated by methylation [30]. Therefore, based on the evidence from the literature and statistical analyses, we set up a model as in Figure 1, with $S$, $M$ and $G$ being the 12 DNA methylation loci of GRB10, miR-633 expression and GRB10 expression, respectively.

The results of the proposed integrated analyses for GRB10 are provided in Table 4. The effects of DNA methylation of GRB10 mediated through GRB10 expression (ΔSGY: omnibus p-value = 0.0045) or miR-633 expression (ΔSMY: omnibus p-value = 0.0081) are prominent compared with the effect independent of the two expression values (ΔSY: omnibus p-value = 0.14). The overall effect of methylation on survival is also significant (omnibus p-value = 0.012). In contrast, the likelihood ratio test (LRT) cannot be performed because model (1) fails to converge, and the score test does not protect the type I error, as shown in the simulation studies. We conclude that GRB10 methylation has a significant effect on overall survival in GBM, which is mostly mediated by miR-633 expression or GRB10 expression.

Table 4:

p-values for three path-specific effects of GRB10 gene and miR-633 on GBM survival. Q1-Q4 correspond to the models mentioned in Section 2.2.3.

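| Model | ΔSY | ΔSGY | ΔSMY | Δoverall |
|---|---|---|---|---|
| Q1 | 0.10 | 0.0047 | 0.0047 | 0.0142 |
| Q2 | 0.09 | 0.0047 | 0.0052 | 0.0086 |
| Q3 | 0.14 | 0.0035 | 0.0150 | 0.0163 |
| Q4 | 0.21 | 0.0045 | 0.0170 | 0.0159 |
| Omnibus | 0.14 | 0.0045 | 0.0081 | 0.0119 |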

5.2. miR-223 and GBM survival

In the second example, we apply our proposed procedures to examine the effect of miR-223 on GBM survival, accounting for expression values of 16 mediation genes. Our previous work suggests that the prognostic effect of miR-223 expression is mediated by expression levels of the 16 genes [31]. We set up an integrated analysis as illustrated in Figure 2. It can be viewed as a simplified case of Figure 1, with S being the scalar expression value of miR-223 and M=G being the expression values of the 16 mediation genes. It follows that there are only two path-specific effects: ΔSGY, the effect of miR-223 expression on GBM survival mediated through expression of the 16 mediation genes, and ΔSY, the effect of miR-223 expression independent of the 16 mediation genes.

Figure 2:


Causal diagram of microRNA miR-223 expression (S), 16 gene expression values (G) and outcome of interest (Y=H(T)). Two path-specific effects are in different line styles: ΔSY, effect of miR-223 on outcome independent of 16 mRNA gene expression is in dotted line; ΔSGY, effect of miR-223 expression mediated through mRNA expression of the p(=16) genes is in solid lines.

There are 504 GBM patients with complete level 3 data on microRNA and gene expression arrays. Both path-specific effects of miR-223 are highly significant, as shown in Table 5. The omnibus p-value for the effect of miR-223 mediated through the 16 genes is < 10^{-6}, and the omnibus p-value for the effect of miR-223 independent of the 16 genes is 0.0009. The p-value of the overall effect is 0.0008. We conclude that miR-223 may be a promising prognostic marker for GBM patients, and that the mechanisms mediated through gene expression and through other pathways are both highly significant and deserve further research.

Table 5:

p-values for two path-specific effects of miR-223 (S) and 16 mediation genes (G) on GBM survival. Q1 corresponds to the main-effect model, and Q2 corresponds to the model with both main and interactive effects.

| Model | ΔSY | ΔSGY | Δoverall |
|---|---|---|---|
| Q1 | 0.0007 | < 10^{-6} | 0.0045 |
| Q2 | 0.0052 | < 10^{-6} | 0.0009 |
| Omnibus | 0.0009 | < 10^{-6} | 0.0008 |

6. Discussion

In this paper, we propose a testing procedure for path-specific effects of genomic markers on survival outcome through a semiparametric linear transformation modeling framework. We are able to decompose the genomic effect into molecule-specific components using the path-specific effect approach. In addition to shedding light on the mechanism of disease etiology, the path-specific effect may have translational utility. Epigenetic alterations such as microRNA expression and DNA methylation are potentially reversible [32, 33, 34], and microRNA regulation has specificity in target genes. The findings from our path-specific effect analyses provide more specific hypotheses and mechanisms for biologists to validate, compared to conventional epigenome-wide association studies. Furthermore, the path-specific effect can also highlight biomarkers for which therapeutic devices may be developed. For example, we observe a significant effect of DNA methylation of GRB10 mediated through miR-633 (ΔSMY) and through its mRNA expression (ΔSGY) (Table 4); one may thus design a gene-specific intervention on mRNA expression of GRB10 through miR-633 or other small RNA to improve GBM survival, even though few gene- or locus-specific interventions are available for DNA methylation.

We note that carrying out the NPMLE and the resampling perturbation procedures is computationally intensive but not prohibitive. For the analyses of GBM survival data in Section 5.1, performed on a laptop with an Intel i5-3380M 2.90 GHz CPU and 8.00 GB RAM, the proposed testing procedure with 1000 resampling perturbations takes 3.95 seconds if the tuning parameter λ is pre-specified and 30.30 seconds if λ is selected via GCV. All simulation studies (n=271 and p=12; 1000 resampling perturbations and 2000 replicates) are performed on a computer cluster with 2 - 8core Intel Xeon CPUs running at 2.53 GHz, 24.00 GB RAM and a Linux environment. The total time for completing each simulation is 2.58 hours with pre-specified λ and 15.50 hours with GCV-selected λ. The Matlab code is available in the Supplementary Materials.

The proposed test is a score test for the variance component of the parameters of interest. Instead of fitting a large model as in (1), one only needs to fit the model under the null, which makes the method numerically stable. The non-parametric maximum likelihood estimator proposed by Zeng and Lin [22] for the null model is computed iteratively by a Newton-Raphson or EM algorithm; as initial values we use α=0 and set Λ to the inverse of the number of events. In our simulation studies, the convergence rates are extremely high: 99.8% for ΔSGY and 100% for ΔSY and ΔSMY. One alternative would be to obtain initial values from a consistent estimator [20] to further improve convergence and stabilize the estimating procedure. On the other hand, because the proposed method relies on a resampling-based perturbation procedure to approximate the tail probability, it remains difficult to approximate a very small p-value precisely in practice.
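The resolution limit of a resampling-based p-value can be illustrated with a short Python sketch. This is not the authors' Matlab code: the function name `perturbation_p_value`, the chi-square stand-in for the null draws, and the add-one correction are all assumptions made for illustration. The point is only that with B perturbations, the estimated tail probability cannot fall below roughly 1/B.

```python
import numpy as np

def perturbation_p_value(q_obs, q_null):
    """Monte Carlo tail probability with an add-one correction (an assumption,
    chosen here so that the estimate is never exactly zero)."""
    q_null = np.asarray(q_null, dtype=float)
    exceed = int(np.sum(q_null >= q_obs))
    return (exceed + 1) / (q_null.size + 1)

rng = np.random.default_rng(0)
B = 1000                                   # number of perturbations, as in the paper's analyses
q_null = rng.chisquare(df=3, size=B)       # stand-in null draws, NOT the real statistic
q_obs = q_null.max() + 1.0                 # an observed value beyond every null draw
p_floor = perturbation_p_value(q_obs, q_null)
print(p_floor)                             # smallest p-value this B can report: 1 / (B + 1)
```

Under these assumptions, reporting a p-value on the order of 10^{-6} requires on the order of a million perturbations, which is why very small p-values are hard to pin down with this kind of procedure.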

Our approach extends the previous work for genetic analyses [7, 8] to facilitate integrated genomic analyses, and the proposed omnibus test synthesizes information from various candidate models to boost statistical power as well as to preserve the robustness to model misspecification. The linear transformation model has also been extended to incorporate dependent failure time, repeated measurement as well as time-varying covariates [22]. Based on our current work, its flexibility may facilitate future directions for big data sciences. For instance, the model (1) can be easily extended to incorporate time-varying genomic markers. As the genomic profile is dynamic during cancer development, ‘time-varying integrative genomics’ may better reveal the biological mechanisms behind this fatal disease.

Because α in (6) is estimated by an L2-penalized (ridge) regression, the estimate is biased, and the bias is a function of the tuning parameter λ. We address this in our theoretical development as well as in numerical studies. It should be noted that here we focus on hypothesis testing rather than estimation, and our testing procedure is developed under the null. To ensure its validity, one has to derive the distribution of the test statistic $Q(\hat\psi)$ under the null in a way that incorporates λ. We show in Appendix 7.2 and Section 2.2.2 that with a tuning parameter $\lambda=o(\sqrt{n})$, the asymptotic distribution of $\hat\psi$ is a function of the score $U_\zeta$ and λ in (8). In real applications, one still has to approximate $A(\psi)$ and $U_\zeta(\psi)$ in (8) by plugging in $\hat\psi=(\hat\alpha^T,\hat\Lambda^T)^T$. Therefore, we also evaluate the validity of our testing procedure in simulation studies with empirical estimates under finite samples. As shown in the first three columns of Table 2 (Null), our proposed testing procedures Q1-Q4 and the omnibus test control the type I error rate at the 5% level.
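The dependence of the ridge bias on λ is easiest to see in an ordinary linear model, which is used below as a simplified analogue and is not the transformation model of this paper. In that setting the ridge estimator has expectation $(X^TX+\lambda I)^{-1}X^TX\alpha$, so the bias is $-\lambda(X^TX+\lambda I)^{-1}\alpha$. The design matrix, coefficient vector and function name `ridge_bias` are hypothetical; the dimensions n=271 and p=12 merely echo the GRB10 analysis.

```python
import numpy as np

def ridge_bias(X, alpha, lam):
    """Bias E[alpha_hat] - alpha of the ridge estimator in a linear model
    with mean X @ alpha (a linear-model analogue, not model (1))."""
    p = X.shape[1]
    gram = X.T @ X
    # E[alpha_hat] = (X'X + lam I)^{-1} X'X alpha, hence
    # bias = -lam (X'X + lam I)^{-1} alpha
    return -lam * np.linalg.solve(gram + lam * np.eye(p), alpha)

rng = np.random.default_rng(1)
X = rng.normal(size=(271, 12))             # simulated design (assumption)
alpha = np.ones(12)                        # hypothetical true coefficients
lams = [0.0, 1.0, 10.0, 100.0]
bias_norms = [float(np.linalg.norm(ridge_bias(X, alpha, lam))) for lam in lams]
for lam, b in zip(lams, bias_norms):
    print(f"lambda={lam:6.1f}  ||bias||={b:.4f}")
```

In this analogue the bias is zero at λ=0 and its norm grows with λ, which is consistent with the need to carry λ through the null distribution of the test statistic rather than ignore it.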

Supplementary Material

Supplementary Materials

Acknowledgments

The authors are grateful to the editor, the associate editor and two anonymous referees for their insightful comments that improved the presentation of the paper. This study is supported by National Institutes of Health grants CA182937 and AG048825.

7. Appendix

7.1. Estimating equation of model (1)

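The log-likelihood can be written as $l_n=\sum_i\{\delta_i\log d\Lambda(T_i\mid Z_i)-\Lambda(T_i\mid Z_i)\}$, where $\delta_i=1$ if subject i died and 0 otherwise, and $\Lambda(T_i)=\sum_j I(T_j\le T_i)\Lambda_j$. It follows that the scores for $\gamma$ and $\Lambda_j$ are:

$$U_\gamma=\sum_i\left[\left\{\delta_i\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))\right\}e^{\gamma^TZ_i}\Lambda(T_i)Z_i+\delta_iZ_i\right],$$

$$U_{\Lambda_j}=\frac{1}{\Lambda_j}+\sum_i\left[\left\{\delta_i\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))\right\}e^{\gamma^TZ_i}I(T_j\le T_i)\right].$$

The scores for $\gamma$ and $\Lambda_j$ can be re-expressed as a set of estimating equations:

$$U_\gamma=\sum_iU_{\gamma i},\qquad U_{\Lambda_j}=\sum_iU_{\Lambda_ji},\quad j=1,\ldots,m,$$

and

$$U_{\gamma i}=\begin{bmatrix}U_{\beta i}\\U_{\alpha i}\end{bmatrix}=\left[\left\{\delta_i\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))\right\}e^{\gamma^TZ_i}\Lambda(T_i)+\delta_i\right]Z_i,$$

$$U_{\Lambda_ji}=\left[\left\{\delta_i\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))\right\}e^{\gamma^TZ_i}\Lambda^{(j)}(T_i)+\delta_iI(T_j\ge T_i)\right],$$

where $\Lambda^{(j)}(T_i)=\Lambda(T_i)I(T_i\le T_j)+\Lambda(T_j)I(T_i>T_j)$. We can denote $U_\zeta^T=(U_\gamma^T,U_\Lambda^T)=(U_\beta^T,U_\psi^T)$ and $U_\Lambda^T=(U_{\Lambda_1},\ldots,U_{\Lambda_m})$.

The derivatives of the estimating equations are:

$$U_{\zeta\theta}=\begin{bmatrix}U_{\gamma\gamma}&U_{\gamma\Lambda}\\U_{\Lambda\gamma}&U_{\Lambda\Lambda}\end{bmatrix}=\begin{bmatrix}U_{\beta\beta}&U_{\beta\psi}\\U_{\psi\beta}&U_{\psi\psi}\end{bmatrix}.$$

The elements of $U_{\zeta\theta}$ can be expressed as follows:

$$U_{\gamma\gamma}=\sum_i\left(d_{1i}\Lambda^2(T_i)+d_{0i}\Lambda(T_i)\right)Z_iZ_i^T,\qquad U_{\Lambda_j\Lambda_k}=\sum_i\left(d_{1i}\Lambda^{(j)}(T_i)+d_{0i}I(T_k\le T_j)\right)I(T_k\le T_i),$$

$$U_{\gamma\Lambda_j}=\sum_i\left(d_{1i}\Lambda(T_i)+d_{0i}\right)Z_iI(T_j\le T_i),\qquad U_{\Lambda_j\gamma}=\sum_i\left(d_{1i}\Lambda(T_i)+d_{0i}\right)\Lambda^{(j)}(T_i)Z_i^T,$$

where

$$d_{1i}=\left[\left\{\frac{\mathcal{G}'''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\left(\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}\right)^2\right\}\delta_i-\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))\right]e^{2\gamma^TZ_i}$$

and

$$d_{0i}=\left\{\delta_i\frac{\mathcal{G}''(e^{\gamma^TZ_i}\Lambda(T_i))}{\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))}-\mathcal{G}'(e^{\gamma^TZ_i}\Lambda(T_i))\right\}e^{\gamma^TZ_i},$$

and the (j, k)-th element of $U_{\Lambda\Lambda}$ is $U_{\Lambda_j\Lambda_k}$.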
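As a quick sanity check on these expressions, consider the special case $\mathcal{G}(x)=x$, which corresponds to the proportional hazards model. This specialization is added here for illustration and is not part of the original derivation. It assumes the bracketed term in the scores reads $\delta_i\mathcal{G}''/\mathcal{G}'-\mathcal{G}'$ and that $d_{1i}$ is its derivative multiplied by $e^{2\gamma^TZ_i}$. With $\mathcal{G}'=1$ and $\mathcal{G}''=\mathcal{G}'''=0$:

$$d_{0i}=-e^{\gamma^TZ_i},\qquad d_{1i}=0,$$

$$U_\gamma=\sum_i\left\{\delta_i-e^{\gamma^TZ_i}\Lambda(T_i)\right\}Z_i,\qquad U_{\gamma\gamma}=-\sum_ie^{\gamma^TZ_i}\Lambda(T_i)Z_iZ_i^T.$$

The score reduces to a covariate-weighted sum of martingale-type residuals $\delta_i-e^{\gamma^TZ_i}\Lambda(T_i)$, and $U_{\gamma\gamma}$ is negative semidefinite, as expected for the second derivative of a log-likelihood.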

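7.2. Distribution of $Q(\hat\psi)$

Let $\alpha_0$, $\Lambda_0$, $\psi_0$, $\beta_0$ (= 0) and $\theta_0$ denote the true values under the null (4) of $\alpha$, $\Lambda$, $\psi$, $\beta$ and $\theta$, respectively. A simple Taylor series expansion shows

$$n^{-1/2}\hat U_\beta(\beta_0)=n^{-1/2}U_\beta(\beta_0,\hat\psi)=n^{-1/2}U_\beta(\beta_0,\psi_0)+n^{-1/2}U_{\beta\psi}^{\psi^*}(\hat\psi-\psi_0),\qquad\text{(A.1)}$$

where $\psi^*$ is between $\hat\psi$ and $\psi_0$. Another Taylor expansion shows that

$$\begin{aligned}0=n^{-1/2}\hat U_\psi^\lambda(\beta_0,\hat\psi)&=n^{-1/2}U_\psi(\beta_0,\hat\psi)-n^{-1/2}\lambda I_2\hat\psi\\&=n^{-1/2}U_\psi(\beta_0,\psi_0)+n^{-1/2}U_{\psi\psi}^{\psi^*}(\hat\psi-\psi_0)-n^{-1/2}\lambda I_2\hat\psi\\&=n^{-1/2}U_\psi(\beta_0,\psi_0)+n^{-1/2}\left(U_{\psi\psi}^{\psi^*}-\lambda I_2\right)(\hat\psi-\psi_0)-n^{-1/2}\lambda I_2\psi_0,\end{aligned}$$

where $I_2$ is a $(q+m)\times(q+m)$ block diagonal matrix whose top $q\times q$ block is $I_{q\times q}$ and whose bottom $m\times m$ block is 0. Since $\lambda=o(\sqrt{n})$, it follows that

$$\sqrt{n}(\hat\psi-\psi_0)=\left[n^{-1}\left(-U_{\psi\psi}^{\psi^*}+\lambda I_2\right)\right]^{-1}n^{-1/2}U_\psi(\beta_0,\psi_0)+o_p(1)J,$$

where J is a vector of 1's with length the same as β. By plugging it into (A.1), one can obtain

$$\begin{aligned}n^{-1/2}U_\beta(\beta_0,\hat\psi)&=n^{-1/2}U_\beta(\beta_0,\psi_0)+n^{-1/2}U_{\beta\psi}^{\psi^*}\left(-U_{\psi\psi}^{\psi^*}+\lambda I_2\right)^{-1}U_\psi(\beta_0,\psi_0)+o_p(1)J\\&=n^{-1/2}\left[U_\beta(\beta_0,\psi_0)+U_{\beta\psi}^{\psi^*}\left(-U_{\psi\psi}^{\psi^*}+\lambda I_2\right)^{-1}U_\psi(\beta_0,\psi_0)\right]+o_p(1)J.\end{aligned}$$

Thus (A.1) becomes

$$n^{-1/2}U_\beta(\beta_0,\hat\psi)=n^{-1/2}AU_\zeta(\theta_0)+o_p(1)J.\qquad\text{(A.2)}$$

Recall that $A=\left[I_{(2p+2)\times(2p+2)},\;U_{\beta\psi}^{\psi^*}\left(-U_{\psi\psi}^{\psi^*}+\lambda I_2\right)^{-1}\right]$, and that $U_{\beta\psi}$, $U_{\psi\psi}$ and $U_\zeta$ are provided in the section above.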
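Representation (A.2) writes the score as a sum of per-subject terms, which is what makes a Monte Carlo perturbation approximation natural. The Python sketch below illustrates one common way to do this under loudly stated assumptions: the test statistic is taken to be the squared norm of $n^{-1/2}U_\beta$, the perturbation weights are standard normal, and both `A` and the per-subject scores `U_zeta_i` are simulated stand-ins rather than quantities computed from a fitted model. The function name `perturbed_null_draws` is hypothetical, and the exact statistic and weights used in the paper are defined in Section 2.2, not here.

```python
import numpy as np

def perturbed_null_draws(A, U_zeta_i, n_perturb, rng):
    """Draw n_perturb perturbed statistics ||n^{-1/2} sum_i A U_zeta_i xi_i||^2
    with xi_i ~ N(0, 1) (assumed form of the statistic and weights)."""
    n = U_zeta_i.shape[0]
    contrib = U_zeta_i @ A.T                    # n x d_beta; row i is A U_zeta_i
    xi = rng.standard_normal((n_perturb, n))    # one weight per subject per draw
    U_star = xi @ contrib / np.sqrt(n)          # n_perturb x d_beta perturbed scores
    return np.sum(U_star ** 2, axis=1)

rng = np.random.default_rng(2)
n, d_beta, d_psi = 271, 4, 6                    # illustrative dimensions only
U_zeta_i = rng.standard_normal((n, d_beta + d_psi))   # simulated per-subject scores
A = np.hstack([np.eye(d_beta),                  # identity block acting on U_beta
               0.1 * rng.standard_normal((d_beta, d_psi))])  # stand-in for the U_psi block

q_star = perturbed_null_draws(A, U_zeta_i, 1000, rng)
q_obs = float(np.sum((A @ U_zeta_i.sum(axis=0) / np.sqrt(n)) ** 2))
p_value = float(np.mean(q_star >= q_obs))       # empirical tail probability
print(q_obs, p_value)
```

Because the weights are independent of the data, each draw has conditional mean $n^{-1}\sum_i\|AU_{\zeta i}\|^2$, so the perturbed statistics mimic the null variability of the score without refitting the model.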

References

  • [1]. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA and Visscher PM. Finding the missing heritability of complex diseases. Nature 2009; 461(7265): 747–753. DOI: 10.1038/nature08494.
  • [2]. Wang W, Baladandayuthapani V, Morris JS, Boom BM, Manyam G and Do KA. iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data. Bioinformatics 2013; 29(2): 149–159. DOI: 10.1093/bioinformatics/bts655.
  • [3]. Smith AA, Huang YT, Eliot M, Houseman EA, Marsit JK, Wiencke JK and Kelsey KT. A novel approach to the discovery of survival biomarkers in glioblastoma using a joint analysis of DNA methylation and gene expression. Epigenetics 2014; 9(6): 873–883. DOI: 10.4161/epi.28571.
  • [4]. Jia P, Sun J, Guo AY and Zhao Z. SZGR: a comprehensive schizophrenia gene resource. Molecular Psychiatry 2010; 15(5): 453–462. DOI: 10.1038/mp.2009.93.
  • [5]. Turan N, Ghalwash MF, Katari S, Coutifaris C, Obradovic Z and Sapienza C. DNA methylation differences at growth related genes correlate with birth weight: a molecular signature linked to developmental origins of adult disease? BMC Medical Genomics 2012; 5: 10. DOI: 10.1186/1755-8794-5-10.
  • [6]. Robins JM. Semantics of causal DAG models and the identification of direct and indirect effects. Oxford University Press, New York, 2003.
  • [7]. Cai T, Tonini G and Lin X. Kernel machine approach to testing the significance of multiple genetic markers for risk prediction. Biometrics 2011; 67(3): 975–986. DOI: 10.1111/j.1541-0420.2010.01544.x.
  • [8]. Tzeng JY, Lu W and Hsu FC. Gene-level pharmacogenetic analysis on survival outcomes using gene-trait similarity regression. The Annals of Applied Statistics 2014; 8(2): 1232–1255. DOI: 10.1214/14-AOAS735.
  • [9]. Robins JM and Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992; 3(2): 143–155.
  • [10]. Pearl J. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, 2001; 411–420.
  • [11]. VanderWeele TJ and Vansteelandt S. Conceptual issues concerning mediation, intervention and composition. Statistics and its Interface 2009; 2: 457–468. DOI: 10.4310/SII.2009.v2.n4.a7.
  • [12]. Imai K, Keele L and Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 2010; 25(1): 51–71. DOI: 10.1214/10-STS321.
  • [13]. Huang YT, VanderWeele TJ and Lin X. Joint analysis of SNP and expression data in genetic association studies of complex diseases. Annals of Applied Statistics 2014; 8(1): 352–376. DOI: 10.1214/13-AOAS690.
  • [14]. Zhao SD, Cai TT and Li H. More powerful genetic association testing via a new statistical framework for integrative genomics. Biometrics 2014; 70(4): 881–890. DOI: 10.1111/biom.12206.
  • [15]. Huang YT. Integrative modeling of multiplatform genomic data under the framework of mediation analysis. Statistics in Medicine 2015; 34(1): 162–178. DOI: 10.1002/sim.6326.
  • [16]. Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34(2): 187–220.
  • [17]. Andersen PK and Gill RD. Cox’s regression model for counting processes: a large sample study. Annals of Statistics 1982; 10(4): 1100–1120. DOI: 10.1214/aos/1176345976.
  • [18]. Kalbfleisch JD and Prentice RL. The Statistical Analysis of Failure Time Data, 2nd Edition, Hoboken: Wiley, 2002.
  • [19]. Bennett S. Analysis of survival data by the proportional odds model. Statistics in Medicine 1983; 2(2): 273–277. DOI: 10.1002/sim.4780020223.
  • [20]. Cheng SC, Wei LJ and Ying Z. Analysis of transformation models with censored data. Biometrika 1995; 82(4): 835–845. DOI: 10.1093/biomet/82.4.835.
  • [21]. Cai T, Cheng SC and Wei LJ. Semiparametric mixed-effects models for clustered failure time data. Journal of the American Statistical Association 2002; 97(458): 514–522. DOI: 10.1198/016214502760047041.
  • [22]. Zeng D and Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B 2007; 69(4): 507–564. DOI: 10.1111/j.1369-7412.2007.00606.x.
  • [23]. Lin X. Variance component test in generalised linear models with random effects. Biometrika 1997; 84(2): 309–326. DOI: 10.1093/biomet/84.2.309.
  • [24]. Craven P and Wahba G. Smoothing noisy data with spline functions. Numerische Mathematik 1979; 31(4): 377–403. DOI: 10.1007/BF01404567.
  • [25]. O’Sullivan F, Yandell BS and Raynor WJ Jr. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association 1986; 81(393): 96–103. DOI: 10.1080/01621459.1986.10478243.
  • [26]. Parzen M, Wei LJ and Ying Z. A resampling method based on pivotal estimating functions. Biometrika 1994; 81(2): 341–350. DOI: 10.2307/2336964.
  • [27]. Cai T, Wei LJ and Wilcox M. Semiparametric regression analysis for clustered failure time data. Biometrika 2000; 87(4): 867–878. DOI: 10.1093/biomet/87.4.867.
  • [28]. Huang YT and Cai T. Mediation analysis for survival data using semiparametric probit models. Biometrics 2016; DOI: 10.1111/biom.12445.
  • [29]. Stupp R, Mason WP, van den Bent MJ, Weller M, Fisher B, Taphoorn MJ, Belanger K, Brandes AA, Marosi C, Bogdahn U, Curschmann J, Janzer RC, Ludwin SK, Gorlia T, Allgeier A, Lacombe D, Cairncross JG, Eisenhauer E, Mirimanoff RO, European Organisation for Treatment of Cancer Brain Tumor and Radiotherapy Groups and National Cancer Institute of Canada Clinical Trials Group. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. New England Journal of Medicine 2005; 352(10): 987–996. DOI: 10.1056/NEJMoa043330.
  • [30]. Suzuki H, Maruyama R, Yamamoto E and Kai M. DNA methylation and microRNA dysregulation in cancer. Molecular Oncology 2012; 6(6): 567–578. DOI: 10.1016/j.molonc.2012.07.007.
  • [31]. Huang YT, Hsu T, Kelsey KT and Lin CL. Integrative analysis of micro-RNA, gene expression and survival of glioblastoma multiforme. Genetic Epidemiology 2015; 39(2): 134–143. DOI: 10.1002/gepi.21875.
  • [32]. Issa JP, Gharibyan V, Cortes J, Jelinek J, Morris G, Verstovsek S, Talpaz M, Garcia-Manero G and Kantarjian HM. Phase II study of low-dose decitabine in patients with chronic myelogenous leukemia resistant to imatinib mesylate. Journal of Clinical Oncology 2005; 23(17): 3948–3956. DOI: 10.1200/JCO.2005.11.981.
  • [33]. Kaminskas E, Farrell A, Abraham S, Baird A, Hsieh LS, Lee SL, Leighton JK, Patel H, Rahman A, Sridhara R, Wang YC and Pazdur R. Approval summary: azacitidine for treatment of myelodysplastic syndrome subtypes. Clinical Cancer Research 2005; 11(10): 3604–3608. DOI: 10.1158/1078-0432.CCR-04-2135.
  • [34]. Garcia-Manero G, Kantarjian HM, Sanchez-Gonzalez B, Yang H, Rosner G, Verstovsek S, Rytting M, Wierda WG, Ravandi F, Koller C, Xiao L, Faderl S, Estrov Z, Cortes J, O’Brien S, Estey E, Bueso-Ramos C, Fiorentino J, Jabbour E and Issa JP. Phase 1/2 study of the combination of 5-aza-2’-deoxycytidine with valproic acid in patients with leukemia. Blood 2006; 108(10): 3271–3279. DOI: 10.1182/blood-2006-03-009142.
