Generated effect modifiers (GEM’s) in randomized clinical trials

Eva Petkova; Thaddeus Tarpey; Zhe Su; R Todd Ogden

doi:10.1093/biostatistics/kxw035

Biostatistics. 2016 Jul 27;18(1):105–118.

PMCID: PMC5255046 PMID: 27465235

Abstract

In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an “effect modifier”. Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration.

Keywords: Biosignature, Moderator, Precision medicine, Treatment decision, Value

1. Introduction

Precision medicine focuses on making treatment decisions for an individual patient based on the patient’s measures (e.g., clinical and biological features). The idea underlies a long history of attempts to identify characteristics that exhibit interaction with treatment assignment in a regression model for the outcome of interest. Such baseline characteristics, called “treatment effect modifiers”, indicate that the outcome under one treatment compared to another treatment depends on these characteristics. Measures with such interactions can aid decisions about which treatment to prescribe (Gail and Simon, 1985; Wellek, 1997; Song and Pepe, 2004; Wang and Ware, 2013).

Interest in precision medicine is growing rapidly, both in clinical research and in statistical methodology. An important component of precision medicine is the notion of an “optimal treatment regime”, first formalized by Murphy (2003) and Robins (2004). Given a vector x of baseline covariates, a treatment decision can be based on a decision function d that maps x to a treatment indicator, say d(x). Treatment decisions can be compared using the “value” of a decision d, denoted V(d). The value of a decision is the expected value of an outcome variable y (with respect to the joint distribution of x and y) when all patients are treated according to the decision function d. Qian and Murphy (2011) show that the value can be expressed as

V (d) = E [E (y | x, A = d (x))],

(1.1)

where y is the outcome of a patient with covariates x given treatment A. Here we consider outcome variables that are continuous, where higher values of y are preferred, as per convention. Determining optimal individual treatment decisions using data from RCTs is the subject of active research (see Robins and others, 2008; Zhao and others, 2012; Zhang and others, 2012b; Kang and others, 2014; Zhao and others, 2015, among others). The “optimal treatment decision” is the one that, when applied to the target population, has the largest value.

It has long been recognized that features that are important for predicting outcome might not necessarily be useful for making treatment decisions (e.g., Wellek, 1997; Song and Pepe, 2004). Much recent research has focused on identification of individual baseline covariates related to the treatment effect (i.e., variables that exhibit interactions with the treatment indicator in predicting treatment outcome) in contrast to being important in the baseline model. A major challenge in precision medicine is that most baseline measures typically have small moderating effects and individually contribute little to informed treatment decisions. Unconstrained regression models with p predictors (plus treatment and predictor-by-treatment interactions) become unwieldy, unstable and difficult to interpret when p is moderate to large. Various strategies have been proposed to deal with the problem (see Qian and Murphy, 2011; Gunter and others, 2011; Lu and others, 2011, among others). Extensions of the methodology that allow functional data objects to be incorporated as baseline features have also been developed (e.g., McKeague and Qian, 2014; Ciarleglio and others, 2015).

A parsimonious alternative to these previous methods that has received little attention is to use a simple model with only a single “composite” predictor. Herein, a methodology is developed for combining several baseline predictors into a single treatment effect modifier in the context of the classic linear model, which we call a generated effect modifier (GEM). Given a vector of p predictors x = (x_{1}, \dots, x_{p})^{'}, we consider linear combinations α^{'} x of the predictors, for p-dimensional coefficient vectors α, as potential GEMs. The idea of combining covariates was proposed by Tukey (1991, 1993) for balancing and increasing the precision of the estimates of treatment effect in RCTs. A closely related approach was proposed by Tian and Tibshirani (2011), who developed a method of constructing binary “markers” from continuous variables (via cut-off values) and forming an index to detect treatment–marker interactions. Emura and others (2012) introduced a compound covariate approach for predicting survival time when there are too many covariates, for example, with gene expression data. In contrast to this work, we propose to combine covariates with the goal of obtaining a single moderating variable, a GEM, that would aid in deciding which treatment is appropriate for any particular patient. Although the GEM model is more restrictive than an unconstrained model, it provides a parsimonious single index approach for making individualized treatment decisions.

Alternative approaches to optimal treatment decision estimation have been proposed that fall in the realm of machine learning and can often be framed in the context of classification problems (Zhang and others, 2012a). Examples are the outcome weighted learning (OWL) (e.g., Zhao and others, 2015; Song and others, 2015) based on support vector machines, tree-based classification (e.g. Laber and Zhao, 2015), and the Kang and others (2014) method based on adaptive boosting. Although these approaches can be appealing options in many settings, we base our general approach on the linear model as it is most frequently utilized in practice and lends itself very well to interpretability. This paper fulfills the practical need of providing a simple treatment effect modifier methodology in the classic linear model setting for making precision medicine decisions. Also, the GEM approach provides the benefit of a visual presentation that is familiar to clinicians.

In efficacy studies, after the primary analysis of treatment efficacy has been performed, the usual practice is to seek individual effect modifiers (single patient baseline characteristics) with the ultimate goal of informing treatment decisions. When no single variable has a strong modifying effect, the GEM is an appealing and novel approach for secondary exploratory analysis to find a strong treatment effect modifier. The GEM can be particularly useful for analysis of studies designed to discover biosignatures for treatment response.

2. Criteria for choosing a GEM

Here we introduce several optimality criteria for defining a GEM α^{'} x. For notational simplicity, we present the model in terms of the centered (at zero within treatment group) outcomes y_{k} and predictor matrices X_{k}. The unrestricted linear model for the K treatment groups is

E (y_{k} | X_{k}) = X_{k} β_{k}, with β_{k} = (β_{k 1}, \dots, β_{k p})^{'}, for k = 1, \dots, K,

(2.2)

while the GEM model under consideration can be parameterized as

(\begin{matrix} y_{1} \\ y_{2} \\ ⋮ \\ y_{K} \end{matrix}) = (\begin{matrix} X_{1} & 0 & \dots & 0 \\ 0 & X_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & X_{K} \end{matrix}) γ \otimes α + (\begin{matrix} ϵ_{1} \\ ϵ_{2} \\ ⋮ \\ ϵ_{K} \end{matrix}),

(2.3)

where \otimes denotes the Kronecker product. The vector γ = (γ_{1}, \dots, γ_{K})^{'} is the vector of the scaling coefficients for the GEM model (2.3). Because the predictors might be measured on different scales, a natural constraint that ensures identifiability is that the GEM has unit variance:

α^{'} Ψ_{x} α = 1,

(2.4)

where Ψ_{x} denotes the predictor covariance matrix (assumed equal across treatment groups, as in a RCT). An unrestricted multiple regression model for K treatment groups (e.g., model (2.2)) with p predictors and all interactions between treatment indicators and predictors has Kp regression coefficients (not counting intercepts), whereas the restricted GEM model (2.3) is more parsimonious, with only K + p − 1 parameters (constraint (2.4) reduces the number of free parameters in α by one). Model (2.3) was considered by Follmann and Proschan (1999), but from a different perspective, where the vectors of regression coefficients from (2.2) are all equal under the null hypothesis and (2.3) is the alternative hypothesis model. In addition to being more parsimonious and providing an intuitive interpretation with easy visualization, GEMs can also be used for making straightforward treatment decisions. When K = 2, for a new subject with covariates x, the estimated treatment decision based on an unrestricted regression model is d(x) = 1 + I(\hat{β}_{20} + x^{'} \hat{β}_{2} > \hat{β}_{10} + x^{'} \hat{β}_{1}), where I(·) is an indicator function and \hat{β}_{k0} and \hat{β}_{k} are the LS estimates of the intercepts and regression coefficients of model (2.2) written for the uncentered outcomes and predictors. Under a GEM model, the treatment decision is d(x) = 1 + I(\hat{γ}_{20} + \hat{γ}_{2} \hat{α}^{'} x > \hat{γ}_{10} + \hat{γ}_{1} \hat{α}^{'} x), where \hat{γ}_{k0} and \hat{γ}_{k} are the LS estimates of the intercepts and scaling coefficients in model (2.3) for non-centered outcomes and predictors.
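For K = 2, the GEM treatment decision amounts to comparing two fitted regression lines in the single index α' x and picking the treatment with the larger predicted outcome. The following is a minimal sketch of that comparison, not the authors' code. The function name and the arguments `gamma0` (group intercepts) and `gamma` (group scaling coefficients) are hypothetical, and treatments are assumed to be coded 1 and 2.

```python
import numpy as np

def gem_decision(x, alpha, gamma0, gamma):
    """Return 1 or 2: the treatment with the larger predicted outcome.

    x      : (p,) covariates of a new subject (uncentered)
    alpha  : (p,) estimated GEM coefficients
    gamma0 : (2,) intercepts of the two fitted GEM regressions (assumed name)
    gamma  : (2,) GEM scaling coefficients, i.e. slopes in alpha'x
    """
    index = float(np.dot(alpha, x))  # the GEM score alpha'x
    pred = np.asarray(gamma0, float) + np.asarray(gamma, float) * index
    return 1 + int(pred[1] > pred[0])  # 1 + I(prediction 2 > prediction 1)
```

Because the rule depends on x only through the scalar α' x, it can be read off a two-line plot of the fitted regressions against the GEM score.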

Since the GEM is defined as a linear combination of predictors, the GEM model lends itself most naturally to continuous predictors. In the results that follow, there is nothing that precludes the use of discrete predictors; only care must be taken in how discrete predictors are coded and how the corresponding GEM is to be interpreted. It is very common in clinical practice that categorical variables are actually discretized versions of continuous variables. If this is the case, we recommend that the original variable is used in the GEM instead of its discretized version.

There are several principled criteria one can use for choosing α for optimizing the GEM. A natural choice, in terms of moderator analysis, is to maximize the magnitude of interaction in the GEM model. Alternatively, α can be chosen to provide the best fit to the data using a GEM model, which is consistent with the classic goal in linear models of minimizing the error sum of squares. A third approach, also consistent with the linear model framework, is to determine the α that maximizes the statistical significance of the interaction effects via an F-test. Summarizing, we consider the following three criteria, which we refer to as the “numerator” (N), “denominator” (D) and “F-ratio” (F) criteria, respectively:

(N) Maximizing the interaction effect: Maximize the variability in the GEM scaling coefficients γ_{k} in (2.3), corresponding to maximizing the Numerator of an F-test for significance of interaction effects. When there are K = 2 treatment groups, this is the same as maximizing the squared difference between the scaling coefficients γ_{1} and γ_{2} in the GEM model.
(D) Fidelity to the data: Minimize the sum of squared residuals in the GEM model (2.3). This corresponds to the Denominator of an F-test for significance of interaction effects.
(F) F-ratio: Combine the first two criteria and maximize the ratio of the variability of the GEM scaling coefficients relative to the sum of squared residuals for the GEM model. This criterion corresponds to choosing α to maximize the F-test statistic when testing significance of interactions in the GEM model.

The method of LS is used to estimate the parameters of models (2.2) and (2.3). The common covariance matrix Ψ_{x} can be estimated by the pooled estimate of the predictor covariance matrix:

{\hat{Ψ}}_{x} = \sum_{k = 1}^{K} X_{k}^{'} X_{k} / (N - K),

(2.5)

where N = \sum_{k = 1}^{K} n_{k} and n_{k} is the sample size in group k. The following notation will be used: let Ψ_{x y_{k}} denote the vector of covariances between x and y_{k}, and let σ_{y_{k}}^{2} denote the variance of y in the kth group. Then the usual unconstrained vector of slope coefficients in the kth treatment group in terms of population parameters and the weighted average coefficient vector are respectively

β_{k} = Ψ_{x}^{- 1} Ψ_{x y_{k}} and \bar{β} = \sum_{k = 1}^{K} π_{k} β_{k} .

(2.6)

With a randomized experiment, equal weights (π_{k} = 1 / K) are used for \bar{β}, and that is the convention used in this article (although more flexible choices for weights are also possible). The GEM scaling coefficients γ_{k} in (2.3) can be expressed equivalently, using (2.4), as

γ_{k} = \frac{cov (X_{k} α, y_{k})}{var (X_{k} α)} = \frac{α^{'} Ψ_{x y_{k}}}{α^{'} Ψ_{x} α} = \frac{α^{'} Ψ_{x} Ψ_{x}^{- 1} Ψ_{x y_{k}}}{α^{'} Ψ_{x} α} = α^{'} Ψ_{x} β_{k} .
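The plug-in quantities above can be computed directly from the data. The sketch below is illustrative only: the function names are hypothetical, and it assumes each treatment group is supplied as NumPy arrays, which are centered within group inside the functions.

```python
import numpy as np

def pooled_covariance(X_groups):
    """Pooled predictor covariance as in (2.5): sum_k X_k'X_k / (N - K)."""
    K = len(X_groups)
    N = sum(X.shape[0] for X in X_groups)
    total = 0.0
    for X in X_groups:
        Xc = X - X.mean(axis=0)  # center within treatment group
        total = total + Xc.T @ Xc
    return total / (N - K)

def unrestricted_slopes(X_groups, y_groups):
    """Per-group LS slope vectors for model (2.2), as a K x p array."""
    betas = []
    for X, y in zip(X_groups, y_groups):
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        betas.append(beta)
    return np.vstack(betas)

def gem_scaling(alpha, Psi, betas):
    """Scaling coefficients gamma_k = alpha' Psi beta_k (requires alpha' Psi alpha = 1)."""
    return betas @ (Psi @ alpha)
```

The last function uses the closed form for γ_{k} rather than refitting a regression, so it is only valid when α already satisfies the unit variance constraint (2.4).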

2.1. The “numerator” criterion: maximizing the interaction effect

This section derives the expression for α in the GEM model that maximizes the variance of a discrete random variable taking values γ_{1}, \dots, γ_{K} with respective probabilities π_{1}, \dots, π_{K} (i.e., the variance of the GEM slopes), which is given by

\sum_{k = 1}^{K} π_{k} {(\frac{α^{'} Ψ_{x} (β_{k} - \bar{β})}{α^{'} Ψ_{x} α})}^{2} = \frac{α^{'} Ψ_{x} [\sum_{k = 1}^{K} π_{k} (β_{k} - \bar{β}) (β_{k} - \bar{β})^{'}] Ψ_{x} α}{(α^{'} Ψ_{x} α)^{2}} .

(2.7)

Denote the “between” group covariance matrix for the unconstrained slope coefficients as

B = \sum_{k = 1}^{K} π_{k} (β_{k} - \bar{β}) (β_{k} - \bar{β})^{'} .

(2.8)

Using (2.4), we seek α that maximizes α^{'} Ψ_{x} B Ψ_{x} α = (Ψ_{x}^{1 / 2} α)^{'} Ψ_{x}^{1 / 2} B Ψ_{x}^{1 / 2} (Ψ_{x}^{1 / 2} α), where Ψ_{x}^{1 / 2} is the symmetric square-root of Ψ_{x}. The solution is α^{N} = Ψ_{x}^{- 1 / 2} e_{1}, where e_{1} is the eigenvector of Ψ_{x}^{1 / 2} B Ψ_{x}^{1 / 2} that is associated with the largest eigenvalue. To obtain an estimator \hat{α}^{N}, we can apply the plug-in principle: use the pooled estimator \hat{Ψ}_{x} from (2.5) and the usual unrestricted LS estimators \hat{β}_{k} in place of the β_{k}’s. The GEM γ_{k}’s and intercepts can be estimated via LS.

In the case of K = 2 groups, B is a rank one matrix with eigenvector proportional to β_{1} − β_{2}, in which case

α^{N} = \frac{β_{1} - β_{2}}{\sqrt{(β_{1} - β_{2})^{'} Ψ_{x} (β_{1} - β_{2})}} .

(2.9)

Section 1.1 of the supplementary material shows that for K = 2, in terms of population parameters, the treatment decision based on the unrestricted regression is equivalent to the treatment decision based on the numerator GEM model. Minor differences in the empirical decision rules from these two methods are due to differences in the LS estimates using the GEM predictor versus using the original predictors in the unrestricted model.
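The numerator criterion reduces to one symmetric eigenproblem. The sketch below is one way to compute it with NumPy, under the assumption that the maximizer of (2.7) subject to (2.4) is obtained from the leading eigenvector of Ψ_x^{1/2} B Ψ_x^{1/2}; the function names are hypothetical and equal weights are used by default. With the Case 1 and Case 2 coefficients of Section 3 it returns (0.283, −0.707)' and (1, 0)' up to sign, which agrees with the values of α^{N} listed there and with the closed form (2.9).

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def gem_numerator(betas, Psi, weights=None):
    """Numerator-criterion GEM coefficients, scaled so that alpha' Psi alpha = 1.

    betas : (K, p) unrestricted slope vectors, one row per treatment group
    Psi   : (p, p) common predictor covariance matrix
    """
    betas = np.asarray(betas, float)
    K = betas.shape[0]
    pi = np.full(K, 1.0 / K) if weights is None else np.asarray(weights, float)
    dev = betas - pi @ betas              # beta_k minus the weighted mean
    B = (dev * pi[:, None]).T @ dev       # between-group matrix (2.8)
    R = sym_sqrt(Psi)
    _, U = np.linalg.eigh(R @ B @ R)      # eigenvalues in ascending order
    alpha = np.linalg.solve(R, U[:, -1])  # map the leading eigenvector back
    return alpha / np.sqrt(alpha @ Psi @ alpha)
```

The sign of an eigenvector is arbitrary, so α and −α describe the same GEM; the fitted scaling coefficients simply change sign with it.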

2.2. The “denominator” criterion: minimizing the residual error

This subsection gives the LS expression for α that minimizes the sum of squared residuals in a GEM model, that is, that provides the best fitting GEM model. Under the assumption of normality, the LS estimator coincides with the maximum likelihood estimator in the GEM linear model.

The sum of squared residuals from a standard linear model using LS can be written as y^{'} (I − H) y, where H is the hat matrix and I is an identity matrix. This sum of squared residuals (when divided by its associated degrees of freedom) is an estimate of the error variance. In the GEM model (2.3) with K treatment arms, the hat matrix in the kth group is H_{k} = X_{k} α (α^{'} X_{k}^{'} X_{k} α)^{- 1} α^{'} X_{k}^{'}. Letting D = \sum_{k = 1}^{K} π_{k} β_{k} β_{k}^{'}, Section 1.2 of the supplementary material available at Biostatistics online shows that the α minimizing the “denominator” criterion is given by α^{D} = Ψ_{x}^{- 1 / 2} e_{1}, where e_{1} is the leading eigenvector of Ψ_{x}^{1 / 2} D Ψ_{x}^{1 / 2}. As before, α^{D} can be estimated by plugging in the LS estimators for β_{k} in the expression for D and using the sample covariance matrix of the pooled predictors (2.5) to estimate Ψ_{x}.
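The denominator criterion can be computed in the same way as the numerator criterion. The sketch below assumes that the matrix D is the weighted sum of outer products \sum_k π_k β_k β_k' and that the solution is the leading eigenvector of Ψ_x^{1/2} D Ψ_x^{1/2} mapped back by Ψ_x^{-1/2}. That assumption reproduces the values of α^{D} listed in Section 3, (0.160, −0.714)' for Case 1 and (0.143, −0.714)' for Case 2, up to sign. The function name is hypothetical.

```python
import numpy as np

def gem_denominator(betas, Psi, weights=None):
    """Denominator-criterion GEM coefficients, scaled so that alpha' Psi alpha = 1.

    betas : (K, p) unrestricted slope vectors, one row per treatment group
    Psi   : (p, p) common predictor covariance matrix
    """
    betas = np.asarray(betas, float)
    K = betas.shape[0]
    pi = np.full(K, 1.0 / K) if weights is None else np.asarray(weights, float)
    D = (betas * pi[:, None]).T @ betas   # assumed form: sum_k pi_k beta_k beta_k'
    w, V = np.linalg.eigh(Psi)
    R = (V * np.sqrt(w)) @ V.T            # symmetric square root of Psi
    _, U = np.linalg.eigh(R @ D @ R)      # eigenvalues in ascending order
    alpha = np.linalg.solve(R, U[:, -1])
    return alpha / np.sqrt(alpha @ Psi @ alpha)
```

Maximizing the variance explained by the single index is the same as minimizing the residual sum of squares, which is why the leading eigenvector is the one that is used.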

2.3. The “F-criterion”: maximizing the F-statistic

This section determines α that maximizes the strength of the statistical evidence for the interaction effect in the GEM model (2.3) via an F-test. With γ = (γ_{1}, \dots, γ_{K})^{'}, we can consider the general linear hypothesis H_{0}: C γ = 0. If K = 2 and C = (1, − 1), the null hypothesis above states that the two groups have the same coefficients with respect to the GEM α^{'} x (i.e., no interaction). Thus, the goal is to determine α that maximizes the F-ratio for testing H_{0}. From the two previous subsections, the F-ratio is proportional to the ratio with (2.7) in the numerator and a denominator corresponding to the residual sum-of-squares. The value of α satisfying the “F-ratio” criterion is α^{F} = Ψ_{x}^{- 1 / 2} e_{1}, where e_{1} is the leading eigenvector of

{[(\sum_{k = 1}^{K} π_{k} σ_{y_{k}}^{2} I_{p}) - Ψ_{x}^{1 / 2} D Ψ_{x}^{1 / 2}]}^{- 1} Ψ_{x}^{1 / 2} B Ψ_{x}^{1 / 2} .

(2.10)

The derivation is in Section 1.3 of the supplementary material. Once again, α^{F} can be estimated by plugging parameter estimates into (2.10) and extracting the leading eigenvector.
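The matrix in (2.10) is not symmetric, so its leading eigenvector is most safely obtained from the equivalent symmetric generalized eigenproblem. The sketch below does this with a Cholesky factor. It assumes D = \sum_k π_k β_k β_k', that the bracketed matrix in (2.10) is positive definite, and that the per-group outcome variances are passed in as `var_y`; all names are hypothetical. For the Case 2 coefficients of Section 3 it returns (1, 0)' up to sign, matching the α^{F} reported there.

```python
import numpy as np

def gem_f_ratio(betas, Psi, var_y, weights=None):
    """F-ratio-criterion GEM coefficients, scaled so that alpha' Psi alpha = 1.

    betas : (K, p) unrestricted slope vectors
    Psi   : (p, p) common predictor covariance matrix
    var_y : (K,) outcome variances sigma^2_{y_k} in each group
    """
    betas = np.asarray(betas, float)
    K, p = betas.shape
    pi = np.full(K, 1.0 / K) if weights is None else np.asarray(weights, float)
    dev = betas - pi @ betas
    B = (dev * pi[:, None]).T @ dev                    # matrix (2.8)
    D = (betas * pi[:, None]).T @ betas                # assumed form of D
    w, V = np.linalg.eigh(Psi)
    R = (V * np.sqrt(w)) @ V.T                         # Psi^{1/2}
    A = R @ B @ R
    M = float(pi @ np.asarray(var_y, float)) * np.eye(p) - R @ D @ R
    L = np.linalg.cholesky(M)                          # fails if M is not positive definite
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)    # L^{-1} A L^{-T}, symmetric
    _, U = np.linalg.eigh(C)
    u = np.linalg.solve(L.T, U[:, -1])                 # leading eigenvector of M^{-1} A
    alpha = np.linalg.solve(R, u)
    return alpha / np.sqrt(alpha @ Psi @ alpha)
```

The outcome variances enter only through their weighted average, which is how the error variance influences α^{F}.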

3. Fitting a GEM when the GEM model is misspecified

The GEM model allows us to combine several predictors into a single linear combination that has good treatment effect moderator properties. Generally, we do not expect the GEM model to be the true data generating model and (based on the above expressions) the “true” α for the three criteria would differ. Consider two cases with K = 2 groups and p = 2 predictors from a Gaussian distribution with variances 1 and 2, respectively, and a covariance of 0.2:

Case 1: β_{1} = (\begin{array}{r} - 0.4 \\ 2.0 \end{array}), β_{2} = (\begin{array}{r} - 0.6 \\ 2.5 \end{array}); Case 2: β_{1} = (\begin{array}{r} 1.5 \\ 2.5 \end{array}), β_{2} = (\begin{array}{r} - 2.5 \\ 2.5 \end{array}) .

The deviation from a GEM model is measured by the angle θ between the coefficient vectors β_{1} and β_{2}. In Case 1 the angle is small and in Case 2 it is large, so Case 1 is very “close” to a GEM model (θ near 0°), while Case 2 is almost as far away from GEM as possible (θ near 90°). The “true” α’s are:

Case 1: \begin{matrix} α^{N} & = & (0.283, - 0.707)^{'} \\ α^{D} & = & (0.160, - 0.714)^{'} \\ α^{F} & = & (0.160, - 0.714)^{'} \end{matrix} Case 2: \begin{matrix} α^{N} & = & (1.000, 0.000)^{'} \\ α^{D} & = & (0.143, - 0.714)^{'} \\ α^{F} & = & (1.000, 0.000)^{'} \end{matrix} .

From (2.10), α^{F} depends on the error variance; the results above are for one fixed coefficient of determination R^{2}. As expected, α^{F} is closer to α^{D} when the data are from a GEM model, since the GEM regression fits the data well in this case, while when the model is far from a GEM model, α^{F} is closer to α^{N}. This observation, together with results from simulations, suggests the use of the F-criterion in practice.
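The angle between the two coefficient vectors is easy to check numerically. The sketch below assumes the ordinary Euclidean angle, which may differ from the exact definition used for the figures in this section; the function name is hypothetical. Under that assumption the Case 1 vectors are about 2.2 degrees apart and the Case 2 vectors about 76 degrees apart.

```python
import numpy as np

def angle_degrees(b1, b2):
    """Euclidean angle, in degrees, between two coefficient vectors (assumed definition)."""
    b1 = np.asarray(b1, float)
    b2 = np.asarray(b2, float)
    cosine = b1 @ b2 / (np.linalg.norm(b1) * np.linalg.norm(b2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))
```

An angle of 0 degrees means the two vectors are proportional, so the data generating model is exactly a GEM model.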

4. Permutation testing for the interaction in a GEM model

The GEM model estimation seeks to determine a linear combination of predictors that maximizes the evidence of an interaction effect using one of the three criteria described above. If there are no interaction effects between predictors and treatment indicators, then the GEM procedure would tend to generate anti-conservative p-values. A straightforward remedy to this problem is to fit the GEM model on many data sets with permuted treatment labels. A permutation p-value for testing an interaction effect can then be calculated as

Permutation p value = {Proportion of “permuted” p values < original p value} .

Theoretical details for using permutation tests for interaction effects in the presence of possible main effects have been investigated previously in the literature (e.g., Wang and others, 2015, p. 2046).
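The permutation procedure can be written as a thin wrapper around whatever routine fits the GEM and returns the interaction p-value. The sketch below is generic: `fit_p_value` is a hypothetical user-supplied callable taking (X, y, treatment), and the number of permutations and the seed are illustrative defaults rather than values from the article.

```python
import numpy as np

def permutation_p_value(fit_p_value, X, y, treatment, n_perm=1000, seed=0):
    """Proportion of permuted-label p-values that fall below the original p-value.

    fit_p_value : callable (X, y, treatment) -> interaction p-value (assumed interface)
    """
    rng = np.random.default_rng(seed)
    treatment = np.asarray(treatment)
    p_original = fit_p_value(X, y, treatment)
    p_permuted = np.array([
        fit_p_value(X, y, rng.permutation(treatment))  # shuffle labels only
        for _ in range(n_perm)
    ])
    return float(np.mean(p_permuted < p_original))
```

Only the treatment labels are shuffled, so the whole GEM estimation, including the search for α, is repeated on each permuted data set.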

5. Simulation studies

An appealing feature of the GEM model is its utility for making individual treatment decisions, especially when p is large. In this section we investigate the value (1.1) of treatment decisions based on the three GEM criteria for both GEM and non-GEM generating models. Data sets were simulated under a variety of parameter settings. We varied the coefficient of determination R^{2} to be small (0.2), medium (0.5), and large (0.8). Another useful measure is the “effect size” (ES) of a moderator. For a regression model y = γ_{0} + γ_{1} A + γ_{2} x + γ_{3} A x + ϵ, with var(x) = 1 and a treatment indicator A (A = 0, 1), the ES (Kraemer, 2013) of x as an effect modifier is the proportion of the outcome variance (after removing the variance due to treatment) that is explained by the different relationships between x and y in the two treatment groups, that is,

ES = \sqrt{\frac{(γ_{3} / 2)^{2}}{(γ_{2} + γ_{3} / 2)^{2} + (γ_{3} / 2)^{2} + σ^{2}}},

(5.11)

where σ^{2} is the error variance (assuming equal error variances for all values of A). The simulations are similar for the GEM and non-GEM settings, except that the GEM models are characterized with respect to the effect size of α^{'} x (using ES = 0.1 and 0.3), while the non-GEM cases are characterized with respect to the angle between the vectors of regression coefficients as described in Section 3; we use a small and a large deviation from GEM.
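Formula (5.11) is a one-line computation once the regression coefficients and the error variance are available. The sketch below is a direct transcription; the function and argument names are hypothetical, and it assumes, as the formula implies, a unit-variance predictor and a 0/1 treatment indicator.

```python
import math

def moderator_effect_size(gamma2, gamma3, sigma2):
    """Effect size (5.11) of a predictor as a moderator.

    gamma2 : coefficient of the predictor
    gamma3 : coefficient of the predictor-by-treatment interaction
    sigma2 : error variance
    """
    half = gamma3 / 2.0  # half the difference between the two group slopes
    return math.sqrt(half ** 2 / ((gamma2 + half) ** 2 + half ** 2 + sigma2))
```

For instance, with no interaction (`gamma3 = 0`) the effect size is 0, whatever the main effect of the predictor.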

Several sample sizes n per treatment group are considered, mimicking typical situations in medical research. For each sample size, two numbers of predictors p were used (with an exception for one of the sample sizes). The predictors are generated from p-variate normal distributions with mean zero and variances equal to 1, and small pairwise correlations (from −0.2 to 0.2) randomly selected, while ensuring a positive definite correlation matrix. Under GEM, the coefficient vectors β_{k} are computed to satisfy the respective R^{2} and ES. Under non-GEM, β_{2} is obtained by adding random noise to the coefficients in β_{1} and computing the angle between β_{1} and β_{2}. More details about the values of β_{k} are given in Section 3.2 of the supplementary material. For each combination of p and the β_{k}’s, a large sample is generated with known outcome values under both treatments and it is used to evaluate the “true” optimal population average outcome, which is the highest achievable value of any decision.

For each simulation configuration, data sets are generated and estimates of α^{N}, α^{D} and α^{F} are computed, as well as the coefficients of the unrestricted regression model (2.2). These estimates are used to define treatment decisions as described in Section 2. These decisions are applied to the cases in the large data set to obtain the estimated values of the respective decisions (unrestricted, numerator, denominator and F). For the sake of comparison, these values are expressed as a proportion of the “true” optimal average outcome, also taking into account the worst average outcome, which is obtained by choosing the worst (lower) outcome for each subject in the large data set. For example, the value of the treatment decision based on the “numerator” GEM approach is reported as its excess over the worst average outcome, divided by the excess of the optimal average outcome over the worst average outcome.
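When outcomes under both treatments are known for every subject in the large evaluation sample, the rescaled value of a decision can be computed directly. The sketch below assumes the rescaling is (value − worst) / (optimal − worst), which is one reading of the description above and not a formula quoted from the article; the names are hypothetical and treatments are coded 1 and 2.

```python
import numpy as np

def normalized_value(decision, y1, y2):
    """Value of a decision rescaled to [0, 1] between the worst and optimal rules.

    decision : (n,) array of 1s and 2s, the treatment chosen for each subject
    y1, y2   : (n,) known outcomes under treatment 1 and treatment 2
    """
    decision = np.asarray(decision)
    y1 = np.asarray(y1, float)
    y2 = np.asarray(y2, float)
    value = np.where(decision == 1, y1, y2).mean()  # average outcome under the rule
    v_opt = np.maximum(y1, y2).mean()               # always pick the better treatment
    v_worst = np.minimum(y1, y2).mean()             # always pick the worse treatment
    return float((value - v_worst) / (v_opt - v_worst))
```

On this scale 1 means the decision is as good as the optimal rule and 0 means it is as bad as the worst possible rule.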

Figure 1 shows the means and the 95% Monte Carlo (MC) confidence intervals for the value of the decisions in the case of data generation from GEM models. A general observation is that for small ES of the GEM, the estimated decisions produce values that are about 10–20% lower than the “true” optimal value for the smaller number of predictors and still lower for the larger one. How much worse the estimated decisions are compared with the “true” optimal average population outcome depends on the sample size and R^{2} (performance improves with increasing sample size and R^{2}). The “denominator” method is superior to the other two approaches, especially for larger p’s and smaller ES’s, which is not surprising since the denominator criterion is equivalent to the MLE objective when the error is normal and the true model is a GEM, as is the case here.

Figure 1. GEM data generation model. Mean and 95% Monte Carlo (MC) confidence intervals (based on the MC runs) of the values of the decisions, expressed as a proportion of the optimal value, for the smaller number of predictors p (left half of panels) and p = 200 (right half of panels), and for ES = 0.1 (top half of panels) and ES = 0.3 (bottom half of panels). The three panels per (p, ES) combination correspond to the three values of R^{2} (left, middle and right). The method based on the unrestricted regression and the three GEM approaches are denoted as: (i) unrestricted: red, leftmost; (ii) numerator criterion: green, second from left; (iii) denominator criterion: blue, third from left; (iv) F-criterion: purple, rightmost. The “Number of observations” on the bottom horizontal axis is the sample size per group.

Figure 2 presents information similar to that in Figure 1, but here the data are generated under a general linear non-GEM model (2.2). It shows that even when the data are not generated from a GEM model, the criteria perform quite well for a relatively small number of covariates p. For larger p, larger sample sizes and larger R^{2} are needed to achieve good performance. The values of the decisions based on the denominator criterion are meaningfully inferior to the values of the decisions from the other methods as the deviation from GEM becomes large. The denominator’s inferiority becomes more pronounced as p, n and R^{2} increase. Regardless of the data generating model, the values produced by the F method are either the best or very close to the best values produced by either of the other methods compared here. Additionally, simulations were run using the non-GEM generating model except that a subset of predictors were discretized to be binary (5 out of 10 for p = 10 and 20 out of 200 when p = 200); the results are very similar to those when all predictors are continuous, and details are provided in the supplementary material.

Section 4 of the supplementary material available at Biostatistics online presents results on the performance of the GEM methods in the case when the data generation is not from the linear model (2.2). There we show simulation results based on a doubly-robust estimation procedure using an augmented inverse probability weighted estimator (AIPWE) of the value V(d) (Robins and others, 1994; Zhang and others, 2012b). Although the GEM approach based on the AIPWE does marginally worse than the unrestricted approach described in Zhang and others (2012b) in an example with a small number of predictors, their approach becomes computationally infeasible for larger values of p. In cases with large p, the GEM reduces the dimensionality of the predictor space to one, making the AIPWE approach fast and feasible.

6. Application to data from a RCT

We illustrate the three GEM procedures using data from a RCT for the treatment of depression comparing antidepressants of the class of selective serotonin reuptake inhibitors (SSRI) against placebo. In addition to establishing the overall efficacy of the SSRI, the investigators were interested in finding biosignatures for SSRI treatment response. The investigators defined “biosignature” as a baseline patient characteristic or a combination of such characteristics, that constitutes a moderator of the treatment effect of SSRI vs. placebo.

Data from 76 and 72 subjects randomized to placebo and SSRI, respectively, were available. The outcome was the change from baseline (week 0) to week 8 of treatment on the Hamilton Rating Scale for Depression (HRSD). High values of HRSD indicate higher depression severity and thus a positive change (week 0 minus week 8) indicates reduction of depression. The following baseline clinical measures were proposed as potential moderators: (i) level of anxiety; (ii) severity of anger attack; (iii) suicidal risk; (iv) medical comorbidity score; and (v) experience of pleasure score.

Outcome was modeled as a linear function of a baseline measure, a treatment indicator (SSRI vs. placebo) and the interaction between them, for each measure individually. None of the interaction terms were statistically significant (see Table 1). A comparison of a full unrestricted model with all five predictors and their interactions with treatment against a reduced model without the interactions yielded a non-significant F-test for the interactions. Thus, the usual approaches of treating each predictor separately or fitting a full unrestricted model for all predictors fail to find evidence for a heterogeneous effect of SSRI and consequently fail to identify patients who stand to benefit from or be harmed by it.

Table 1.

SSRI clinical biosignature: potential moderators of the efficacy of treatment with SSRI vs. placebo with respect to change in HRSD from baseline to week 8. The 3rd column gives the p-values for the predictor-by-treatment interaction term and the 4th column gives the effect size of the predictor as a moderator of treatment effect from a regression model with only that variable as a predictor in addition to treatment. The last two columns give the regression coefficients from models with all five baseline measures as predictors, for placebo and SSRI respectively.

	Mean	St. dev.	Interaction p value	Effect size	Reg. coef. (placebo)	Reg. coef. (SSRI)
Anxiety	5.36	1.80	0.797	0.020	1.06	1.44
Anger attack	3.05	2.12	0.671	0.034	0.59	0.09
Suicide risk	5.42	2.37	0.155	0.113	1.00	0.38
Medical comorbidity score	2.01	2.78	0.092	0.140	0.11	0.58
Life pleasure score	33.17	5.51	0.065	0.148	0.20	0.04


Next, the linear combinations α^{'} x for the 3 GEM criteria were estimated (see Table 2). The numerator and F-criteria give similar results, but only the F-criterion has a statistically significant permutation p value (p = 0.048). Note that the effect sizes for the GEMs based on the numerator and the F-criterion (which are very similar, both 0.27) are nearly double that of the strongest individual predictor (0.148 in Table 1). The denominator GEM, on the other hand, does not produce a significant interaction p value (and also has a very small estimated ES), which is consistent with the observation that, since the angle between the unrestricted regression coefficient vectors is relatively large, the model deviates quite a bit from a true GEM model.

Table 2.

GEM model for SSRI clinical biosignature. The estimated GEMs of the SSRI treatment effect on change in HRSD. The bottom rows give the GEM effect sizes (row 6), permutation-adjusted p-values (row 7); the estimated value (1.1) of the decision based on GEM criteria along with a 95% cross-validated bootstrap confidence interval (CI) (row 8); the difference in value and 95% cross-validated bootstrap CI for the difference between the decision based on the respective GEM and the decision (i) give everyone placebo (row 9), (ii) give everyone SSRI (row 10), and (iii) give everyone SSRI or placebo at random (row 11).

	Estimated α^{N}	Estimated α^{D}	Estimated α^{F}
Anxiety	0.12	0.55	0.12
Anger attack	0.15	0.15	0.15
Suicide risk	0.42	0.14	0.42
Medical comorbidity score	0.21	0.10	0.21
Life pleasure score	0.07	-0.04	0.07
Effect size	0.27	0.01	0.27
Permutation p-value	0.061	0.895	0.048
Value of GEM	8.03	7.60	8.03
(95% CI)	(6.28, 9.78)	(5.62, 9.43)	(6.21, 9.68)
Value of GEM − Value of placebo	2.02	1.57	2.00
(95% CI)	(1.97, 2.06)	(1.52, 1.62)	(1.96, 2.05)
Value of GEM − Value of SSRI	0.52	0.07	0.50
(95% CI)	(0.48, 0.55)	(0.04, 0.10)	(0.46, 0.54)
Value of GEM − Value of random	1.29	0.84	1.27
(95% CI)	(1.25, 1.32)	(0.80, 0.87)	(1.24, 1.31)


For the sake of comparison, estimates of the value for the three GEM criteria were obtained using an Inverse Probability Weighted Estimator (IPWE) of V(d), which weights the outcome y_{i} of subject i by C_{i} / π, where C_{i} = 1 if the treatment assignment and treatment decision coincide for subject i with covariates x_{i}, and C_{i} = 0 otherwise. Here, π is the probability of treatment assignment, which is a constant for a RCT and is 0.5 in this example. Row 8 of Table 2 gives a 95% cross-validation bootstrap confidence interval (using 1000 bootstrap samples) for the value of each GEM criterion. The CIs were computed using a 10-fold cross-validation on each bootstrap sample: treatment decisions were estimated by applying the respective GEM approach to 9 of 10 non-overlapping subsamples of equal size and then applied to the remaining 10th subsample to obtain an estimate of the value of the treatment decision, and those estimates were finally averaged across the 10 folds of the cross-validation. As Table 2 shows, the F and numerator approaches produce very similar bootstrap confidence intervals for the value of the decision, while the denominator criterion results in a lower decision value that has a wider 95% CI. The last three rows of Table 2 show the differences between the values of the treatment decisions derived from each of the three GEM approaches and the values of three commonly used comparison decisions: (i) give everyone placebo; (ii) give everyone SSRI; and (iii) give placebo and SSRI at random. These differences were estimated by the same cross-validation approach based on 1000 bootstrap samples.
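An inverse probability weighted estimate of the value uses only the subjects whose randomized treatment agrees with the decision being evaluated. The sketch below assumes the normalized (ratio) form of the estimator, which may differ from the exact expression used for Table 2; the function and argument names are hypothetical.

```python
import numpy as np

def ipwe_value(y, assigned, decided, prob=0.5):
    """Normalized IPWE of the value of a decision (assumed form).

    y        : (n,) observed outcomes
    assigned : (n,) treatment actually assigned in the trial
    decided  : (n,) treatment recommended by the decision rule
    prob     : probability of the assigned treatment (constant in a RCT)
    """
    y = np.asarray(y, float)
    concordant = (np.asarray(assigned) == np.asarray(decided)).astype(float)
    weights = concordant / prob  # C_i / pi
    return float(np.sum(weights * y) / np.sum(weights))
```

With a constant assignment probability this ratio form reduces to the mean outcome among subjects whose assigned treatment matches the recommendation.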

The results from the GEM approaches are presented visually in Figure 3. The GEM analysis using the F-ratio criterion (similar to the numerator criterion) leads to the conclusion that 30.4% of the target population (those to the left of the vertical lines in Figure 3) does not benefit from SSRI treatment. The decision based on the GEM could be not to prescribe SSRI to those subjects; alternatively, one might choose to give SSRI only to patients with GEM scores in the range where the 95% CIs for the placebo and SSRI GEM regressions do not overlap. These results are consistent with the fact that many antidepressant trials fail to show efficacy or show only small benefits, for example, a difference of about 25–30% in response rates between antidepressants and placebo (60–65% vs. 30–35%, respectively).
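A decision rule of this kind can be illustrated with a short Python sketch. It assumes, as a simplification, that the outcome under each treatment is modeled as a straight line in the GEM score, so that the cutoff is the score at which the two fitted lines cross. The function names and the numbers in the example are hypothetical and are not taken from the trial; which side of the cutoff favors which treatment depends on the fitted slopes and on whether larger outcomes are better.

```python
import numpy as np


def crossing_point(intercept_a, slope_a, intercept_b, slope_b):
    """GEM score at which two fitted regression lines predict the same outcome."""
    if slope_a == slope_b:
        raise ValueError("parallel lines never cross, so there is no cutoff")
    # Solve intercept_a + slope_a * z = intercept_b + slope_b * z for z.
    return (intercept_b - intercept_a) / (slope_a - slope_b)


def recommend(gem_scores, cutoff, below, above):
    """Label scores under the cutoff with `below` and all others with `above`."""
    scores = np.asarray(gem_scores, dtype=float)
    return np.where(scores < cutoff, below, above)


def fraction_below(gem_scores, cutoff):
    """Share of subjects whose GEM score lies to the left of the cutoff."""
    scores = np.asarray(gem_scores, dtype=float)
    return float(np.mean(scores < cutoff))


# Hypothetical fitted lines: 1 + 2z under one treatment, 3 + z under the other.
cutoff = crossing_point(1.0, 2.0, 3.0, 1.0)
scores = [0.5, 1.5, 2.5, 3.5]
print(cutoff)                                         # 2.0
print(recommend(scores, cutoff, "placebo", "SSRI"))
print(fraction_below(scores, cutoff))                 # 0.5
```

The more conservative alternative mentioned above, treating only where the confidence bands of the two regressions do not overlap, would replace the single crossing point with the boundary of that non-overlap region.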

7. Discussion

This article has shown how to combine several baseline characteristics into a single generated effect moderator in the context of the classic linear model. Closed-form expressions have been derived for these GEMs that do not require complex iterative computations. The GEM offers a straightforward approach to determine beneficial treatments for patients. From this perspective, GEMs can be viewed as indices for treatment decisions. Of the three criteria, we generally recommend the F-criterion, because it simultaneously maximizes the interaction effect (the numerator) and minimizes the prediction error (the denominator) in the class of GEM models. Additionally, in our results, the F-criterion’s performance is either optimal or very close to optimal with respect to producing treatment decision rules with the highest values.
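To make the numerator and denominator interpretation concrete, the following Python sketch computes a standard partial F-statistic for the treatment-by-score interaction, given a candidate weight vector. It is an illustration under stated assumptions and does not reproduce the closed-form GEM expressions: the function name `interaction_f_statistic`, the 0/1 coding of treatment and the simulated data are all hypothetical.

```python
import numpy as np


def _rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def interaction_f_statistic(x, y, treated, alpha):
    """Partial F-statistic for arm-specific slopes on the score z = x @ alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(treated, dtype=float)      # assumed coding: 0 or 1
    z = x @ np.asarray(alpha, dtype=float)    # composite (candidate GEM) score
    ones = np.ones_like(z)
    reduced = np.column_stack([ones, t, z])       # common slope in both arms
    full = np.column_stack([ones, t, z, t * z])   # separate slope per arm
    rss_full = _rss(full, y)
    df_resid = len(y) - full.shape[1]
    # Numerator: variation explained by the interaction (1 degree of freedom).
    # Denominator: residual mean square of the full model.
    return (_rss(reduced, y) - rss_full) / (rss_full / df_resid)


# Simulated example in which only the first covariate modifies the effect.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
t = np.arange(200) % 2
y = 2.0 * t * x[:, 0] + rng.normal(size=200)
print(interaction_f_statistic(x, y, t, [1.0, 0.0]))
print(interaction_f_statistic(x, y, t, [0.0, 1.0]))
```

In this simulation the weight vector aligned with the true modifier gives a much larger statistic than the orthogonal one, which is the sense in which a criterion based on this ratio rewards both a strong interaction and a small residual error.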

In practice, after conducting the main hypotheses testing in efficacy studies, investigators attempt to discover baseline patient features that moderate the effect of treatment. Given that (if present) variables with large moderating effects of treatments for most illnesses have already been discovered, it is not surprising that researchers regularly fail to discover other moderators in studies where the primary goal is to establish efficacy. The proposed methods show that combining patient characteristics with little to no moderating effects of a treatment can result in a strong treatment effect modifier that can help with making treatment decisions. Of course, any treatment decision has to be validated in properly designed studies; for example, a 3-arm RCT where the experimental treatment, the control treatment and treatment according to the investigated treatment decision are compared. The proposed methodology is expected to be of particular utility in studies specifically designed to discover biosignatures for response to treatment, as discussed in the Introduction.

Several generalizations of the GEM procedure are currently under development, such as extending the GEM to generalized linear models and longitudinal outcomes. Work is also underway to allow the outcome to depend on nonparametric functions of GEMs, similar to generalized additive models. It will be useful to compare the linear GEM model developed here and a more flexible nonparametric GEM model to other methods for precision medicine for providing guidance in making treatment decisions.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org

Acknowledgements

The authors are thankful to the editors and three reviewers whose feedback has greatly improved this article. Conflict of interest: None declared.

Funding

National Institutes of Health grant R01 MH099003.

References

Ciarleglio A., Petkova E., Tarpey T. and Ogden R. T. (2015). Treatment decisions based on scalar and functional baseline covariates. Biometrics 71, 884–894.
Emura T., Chen Y.-H. and Chen H.-Y. (2012). Survival prediction based on compound covariate under Cox proportional hazards models. PLoS One 7, e47627.
Follmann D. A. and Proschan M. A. (1999). A multivariate test of interaction for use in clinical trials. Biometrics 55, 1151–1155.
Gail M. and Simon R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 41, 361–372.
Gunter L., Zhu J. and Murphy S. A. (2011). Variable selection for qualitative interactions in personalized medicine while controlling the family-wise error rate. Journal of Biopharmaceutical Statistics 21, 1063–1078.
Kang C., Janes H. and Huang Y. (2014). Combining biomarkers to optimize patient treatment recommendations. Biometrics 70, 695–707.
Kraemer H. C. (2013). Discovering, comparing, and combining moderators of treatment on outcome after randomized clinical trials: a parametric approach. Statistics in Medicine 32, 1964–1973.
Laber E. B. and Zhao Y.-Q. (2015). Tree-based methods for individualized treatment regimes. Biometrika 102, 501–514.
Lu W., Zhang H. H. and Zeng D. (2011). Variable selection for optimal treatment decision. Statistical Methods in Medical Research 22, 493–504.
McKeague I. W. and Qian M. (2014). Estimation of treatment policies based on functional predictors. Statistica Sinica 24, 1461–1485.
Murphy S. A. (2003). Optimal dynamic treatment regimes (with discussion). Journal of the Royal Statistical Society, Series B 65, 331–366.
Qian M. and Murphy S. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics 39, 1180–1210.
Robins J. M. (2004). Optimal structural nested models for optimal sequential decisions. In: Heagerty P. J. and Lin D. Y. (editors), Proceedings of the Second Seattle Symposium on Biostatistics. New York: Springer, pp. 189–326.
Robins J., Orellana L. and Rotnitzky A. (2008). Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine 27, 4678–4721.
Robins J. M., Rotnitzky A. and Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.
Song R., Kosorok M., Zeng D., Zhao Y., Laber E. B. and Yuan M. (2015). On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat 4, 59–68.
Song X. and Pepe M. S. (2004). Evaluating markers for selecting a patient’s treatment. Biometrics 60, 874–883.
Tian L. and Tibshirani R. J. (2011). Adaptive index models for marker-based risk stratification. Biostatistics 12, 68–86.
Tukey J. W. (1991). Use of many covariates in clinical trials. International Statistical Review 59, 123–137.
Tukey J. W. (1993). Tightening the clinical trial. Controlled Clinical Trials 14, 266–285.
Wang R., Schoenfeld D. A., Hoeppner B. and Evins A. E. (2015). Detecting treatment-covariate interactions using permutation methods. Statistics in Medicine 34, 2035–2047.
Wang R. and Ware J. H. (2013). Detecting moderator effects using subgroup analyses. Prevention Science 14, 111–120.
Wellek S. (1997). Testing for absence of qualitative interactions between risk factors and treatment effect. Biometrical Journal 39, 809–821.
Zhang B., Tsiatis A. A., Davidian M., Zhang M. and Laber E. (2012a). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103–114.
Zhang B., Tsiatis A. A., Laber E. B. and Davidian M. (2012b). A robust method for estimating optimal treatment regimes. Biometrics 68, 1010–1018.
Zhao Y., Zeng D., Rush A. J. and Kosorok M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107, 1106–1118.
Zhao Y., Zeng D., Laber E. B. and Kosorok M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association 110, 583–598.
