Biostatistics (Oxford, England). 2016 Jul 27;18(1):105–118. doi: 10.1093/biostatistics/kxw035

Generated effect modifiers (GEM’s) in randomized clinical trials

Eva Petkova 1,*, Thaddeus Tarpey 2, Zhe Su 3, R Todd Ogden 4
PMCID: PMC5255046  PMID: 27465235

Abstract

In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an “effect modifier”. Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration.

Keywords: Biosignature, Moderator, Precision medicine, Treatment decision, Value

1. Introduction

Precision medicine focuses on making treatment decisions for an individual patient based on the patient’s measures (e.g., clinical and biological features). The idea underlies a long history of attempts to identify characteristics that exhibit interaction with treatment assignment in a regression model for the outcome of interest. Such baseline characteristics, called “treatment effect modifiers”, indicate that the outcome under one treatment compared to another treatment depends on these characteristics. Measures with such interactions can aid decisions about which treatment to prescribe (Gail and Simon, 1985; Wellek, 1997; Song and Pepe, 2004; Wang and Ware, 2013).

Interest in precision medicine is growing rapidly, both in clinical research and in statistical methodology. An important component of precision medicine is the notion of an “optimal treatment regime”, first formalized by Murphy (2003) and Robins (2004). Given a vector $x$ of baseline covariates, a treatment decision can be based on a decision function $d(x)$ that maps $x$ to a treatment indicator, say Inline graphic. Treatment decisions can be compared using the “value” of a decision $d$, denoted $V(d)$. The value of a decision is the expected value of an outcome variable $y$ (with respect to the joint distribution of Inline graphic) when all patients are treated according to a decision function $d$. Qian and Murphy (2011) show that the value can be expressed as

$$V(d) = E\left[\, E\{\, y \mid x, A = d(x) \,\} \right], \tag{1.1}$$

where Inline graphic is the outcome of a patient given treatment Inline graphic with covariates Inline graphic. Here we consider outcome variables Inline graphic that are continuous, where higher values of Inline graphic are preferred, as per convention. Determining optimal individual treatment decisions using data from RCTs is a topic that is the subject of active research (see Robins and others, 2008; Zhao and others, 2012; Zhang and others, 2012b; Kang and others, 2014; Zhao and others, 2015, among others). The “optimal treatment decision” is the one that, when applied to the target population, has the largest value.
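As a rough illustration of (1.1), the value of a decision can be approximated by averaging the conditional mean outcome under the assigned treatment over a large sample of covariates. The sketch below is not taken from the article: the linear conditional means, the coefficient values in `beta`, and the function names are all hypothetical choices made for the example.

```python
import numpy as np

def value_of_decision(decide, mean_outcome, x):
    """Monte Carlo estimate of V(d) = E[E(y | x, A = d(x))] over the rows of x."""
    a = decide(x)                                 # treatment assigned to each patient
    return float(np.mean(mean_outcome(x, a)))     # average conditional mean outcome

# Hypothetical population: two covariates, two treatments with different slopes.
rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 2))
beta = {1: np.array([0.5, 1.0]), 2: np.array([-0.5, 1.0])}

def mean_outcome(x, a):
    """E(y | x, A = a) under the assumed linear model (no intercepts)."""
    return np.where(a == 1, x @ beta[1], x @ beta[2])

def treat_all_1(x):
    """Decision that ignores covariates and gives everyone treatment 1."""
    return np.ones(len(x), dtype=int)

def treat_by_rule(x):
    """Decision that gives each patient the treatment with the larger conditional mean."""
    return np.where(x @ beta[1] > x @ beta[2], 1, 2)

v_all_1 = value_of_decision(treat_all_1, mean_outcome, x)
v_rule = value_of_decision(treat_by_rule, mean_outcome, x)
print(f"value, everyone on treatment 1: {v_all_1:.3f}")
print(f"value, covariate-based rule:    {v_rule:.3f}")
```

Under these assumed coefficients the covariate-based rule has the larger value, because higher outcomes are preferred and the rule picks the better conditional mean for each patient.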

It has long been recognized that features that are important for predicting outcome might not necessarily be useful for making treatment decisions (e.g., Wellek, 1997; Song and Pepe, 2004). Much recent research has focused on identification of individual baseline covariates related to the treatment effect (i.e., variables that exhibit interactions with the treatment indicator in predicting treatment outcome) in contrast to being important in the baseline model. A major challenge in precision medicine is that most baseline measures typically have small moderating effects and individually contribute little to informed treatment decisions. Unconstrained regression models with $p$ predictors (plus treatment and predictor-by-treatment interactions) become unwieldy, unstable and difficult to interpret when $p$ is moderate to large. Various strategies have been proposed to deal with the problem (see Qian and Murphy, 2011; Gunter and others, 2011; Lu and others, 2011, among others). Extensions of the methodology that allow functional data objects to be incorporated as baseline features have also been developed (e.g., McKeague and Qian, 2014; Ciarleglio and others, 2015).

A parsimonious alternative to these previous methods that has received little attention is to use a simple model with only a single “composite” predictor. Herein, a methodology is developed for combining several baseline predictors into a single treatment effect modifier in the context of the classic linear model, which we call a generated effect modifier (GEM). Given a vector of Inline graphic predictors Inline graphic, we consider linear combinations of the predictors Inline graphic for Inline graphic as potential GEMs. The idea of combining covariates was proposed by Tukey (1991, 1993) for balancing and increasing the precision of the estimates of treatment effect in RCTs. A closely related approach was proposed by Tian and Tibshirani (2011) who developed a method of constructing binary “markers” from continuous variables (via cut-off values) and forming an index to detect treatment–marker interactions. Emura and others (2012) introduced a compound covariate approach for predicting survival time in the case when there are too many covariates, for example, gene expression data. In contrast to this work, we propose to combine covariates with the goal of obtaining a single moderating variable, a GEM, that would aid in deciding which treatment is appropriate for any particular patient. Although the GEM model is more restrictive than an unconstrained model, it provides a parsimonious single index approach for making individualized treatment decisions.

Alternative approaches to optimal treatment decision estimation have been proposed that fall in the realm of machine learning and can often be framed in the context of classification problems (Zhang and others, 2012a). Examples are the outcome weighted learning (OWL) (e.g., Zhao and others, 2015; Song and others, 2015) based on support vector machines, tree-based classification (e.g. Laber and Zhao, 2015), and the Kang and others (2014) method based on adaptive boosting. Although these approaches can be appealing options in many settings, we base our general approach on the linear model as it is most frequently utilized in practice and lends itself very well to interpretability. This paper fulfills the practical need of providing a simple treatment effect modifier methodology in the classic linear model setting for making precision medicine decisions. Also, the GEM approach provides the benefit of a visual presentation that is familiar to clinicians.

In efficacy studies, after the primary analysis of treatment efficacy has been performed, the usual practice is to seek individual effect modifiers (single patient baseline characteristics) with the ultimate goal of informing treatment decisions. When no single variable has a strong modifying effect, the GEM is an appealing and novel approach for secondary exploratory analysis to find a strong treatment effect modifier. The GEM can be particularly useful for analysis of studies designed to discover biosignatures for treatment response.

2. Criteria for choosing a GEM

Here we introduce several optimality criteria for defining a GEM $\alpha^\top x$. For notational simplicity, we present the model in terms of the centered (at zero within treatment group) outcomes $y_k$ and predictor matrices $X_k$. The unrestricted linear model for the $K$ treatment groups is

$$E(y_k \mid X_k) = X_k \beta_k, \quad \text{with } \beta_k = (\beta_{k1}, \ldots, \beta_{kp})^\top, \quad \text{for } k = 1, \ldots, K, \tag{2.2}$$

while the GEM model under consideration can be parameterized as

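$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_K \end{pmatrix} (\gamma \otimes \alpha) + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_K \end{pmatrix}, \tag{2.3}$$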

where $\otimes$ denotes the Kronecker product. The vector $\gamma = (\gamma_1, \ldots, \gamma_K)^\top$ is the vector of the scaling coefficients for the GEM model (2.3). Because the predictors might be measured on different scales, a natural constraint that ensures identifiability is that the GEM $\alpha^\top x$ has unit variance:

$$\alpha^\top \Psi_x \alpha = 1, \tag{2.4}$$

where Inline graphic denotes the predictor covariance matrix (assumed equal across treatment groups as in a RCT). An unrestricted multiple regression model for Inline graphic treatment groups (e.g. model (2.2)) with Inline graphic predictors and all interactions between treatment indicators and predictors, has Inline graphic regression coefficients (not counting intercepts), whereas the restricted GEM model (2.3) is more parsimonious with only Inline graphic parameters (constraint (2.4) reduces the number of free parameters in Inline graphic by one). Model (2.3) was considered by Follmann and Proschan (1999), but from a different perspective, where the vectors of regression coefficients Inline graphic from (2.2) are all equal under the null hypothesis and (2.3) is the alternative hypothesis model. In addition to being more parsimonious and providing an intuitive interpretation with easy visualization, GEMs can also be used for making straightforward treatment decisions. When Inline graphic, for a new subject with covariates Inline graphic, the estimated treatment decision based on an unrestricted regression model is Inline graphic where Inline graphic is an indicator function and Inline graphic and Inline graphic are the least squares (LS) estimates of the regression coefficients of model (2.2) written for the uncentered outcomes and predictors. Under a GEM model, the treatment decision is Inline graphic where Inline graphic are the LS estimates of the scaling coefficients in model (2.3) for non-centered outcomes and predictors.

Since the GEM is defined as a linear combination of predictors, the GEM model lends itself most naturally to continuous predictors. In the results that follow, there is nothing that precludes the use of discrete predictors; only care must be taken in how discrete predictors are coded and how the corresponding GEM is to be interpreted. It is very common in clinical practice that categorical variables are actually discretized versions of continuous variables. If this is the case, we recommend that the original variable is used in the GEM instead of its discretized version.

There are several principled criteria one can use for choosing $\alpha$ to optimize the GEM. In terms of moderator analysis, a natural choice is to maximize the magnitude of the interaction in the GEM model. Alternatively, $\alpha$ can be chosen to provide the best fit to the data using a GEM model, which is consistent with the classic goal in linear models of minimizing the error sum of squares. A third approach, also consistent with the linear model framework, is to determine the $\alpha$ that maximizes the statistical significance of the interaction effects via an $F$-test. Summarizing, we consider the following three criteria, which we refer to as the “numerator” (N), “denominator” (D) and “$F$-ratio” (F) criteria, respectively:

  • (N) Maximizing the interaction effect: Maximize the variability in the GEM scaling coefficients $\gamma_k$ in (2.3), corresponding to maximizing the Numerator of an $F$-test for significance of interaction effects. When there are $K = 2$ treatment groups, this is the same as maximizing the squared difference between the scaling coefficients $\gamma_1$ and $\gamma_2$ in the GEM model.

  • (D) Fidelity to the data: Minimize the sum of squared residuals in the GEM model (2.3). This corresponds to the Denominator of an $F$-test for significance of interaction effects.

  • (F) $F$-ratio: Combine the first two criteria and maximize the ratio of the variability of the GEM scaling coefficients relative to the sum of squared residuals for the GEM model. This criterion corresponds to choosing $\alpha$ to maximize the $F$-test statistic when testing significance of interactions in the GEM model.

The method of LS is used to estimate the parameters of models (2.2) and (2.3). The common covariance matrix Inline graphic can be estimated by the pooled estimate of the predictor covariance matrix:

$$\hat{\Psi}_x = \sum_{k=1}^{K} X_k^\top X_k / (N - K), \tag{2.5}$$

where $N = \sum_{k=1}^{K} n_k$ and $n_k$ is the sample size in group $k$. The following notation will be used: let $\Psi_{xy_k}$ denote the vector of covariances between the predictors $x$ and the outcome $y_k$, and let $\sigma^2_{y_k}$ denote the variance of $y$ in the $k$th group. Then the usual unconstrained vector of slope coefficients in the $k$th treatment group in terms of population parameters and the weighted average coefficient vector are respectively

$$\beta_k = \Psi_x^{-1} \Psi_{xy_k} \quad \text{and} \quad \bar{\beta} = \sum_{k=1}^{K} \pi_k \beta_k. \tag{2.6}$$

With a randomized experiment, equal weights ($\pi_k = 1/K$) are used for $\bar{\beta}$ and that is the convention used in this article (although more flexible choices for weights are also possible). The GEM scaling coefficients $\gamma_k$ in (2.3) can be expressed equivalently, using (2.4), as

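$$\gamma_k = \frac{\operatorname{cov}(X_k \alpha, y_k)}{\operatorname{var}(X_k \alpha)} = \frac{\alpha^\top \Psi_{xy_k}}{\alpha^\top \Psi_x \alpha} = \frac{\alpha^\top \Psi_x \Psi_x^{-1} \Psi_{xy_k}}{\alpha^\top \Psi_x \alpha} = \alpha^\top \Psi_x \beta_k.$$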

2.1. The “numerator” criterion: maximizing the interaction effect

This section derives the expression for $\alpha$ in the GEM model that maximizes the variance of a discrete random variable taking values $\gamma_1, \ldots, \gamma_K$ with respective probabilities $\pi_1, \ldots, \pi_K$ (i.e., the variance of the GEM slopes), which is given by

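$$\sum_{k=1}^{K} \pi_k \left( \frac{\alpha^\top \Psi_x (\beta_k - \bar{\beta})}{\alpha^\top \Psi_x \alpha} \right)^2 = \frac{\alpha^\top \Psi_x \left[ \sum_{k=1}^{K} \pi_k (\beta_k - \bar{\beta})(\beta_k - \bar{\beta})^\top \right] \Psi_x \alpha}{(\alpha^\top \Psi_x \alpha)^2}. \tag{2.7}$$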

Denote the “between” group covariance matrix for the unconstrained slope coefficients as

$$B = \sum_{k=1}^{K} \pi_k (\beta_k - \bar{\beta})(\beta_k - \bar{\beta})^\top. \tag{2.8}$$

Using (2.4), we seek $a = \Psi_x^{1/2} \alpha$ that maximizes $a^\top \Psi_x^{1/2} B \Psi_x^{1/2} a$, where $\Psi_x^{1/2}$ is the symmetric square-root of $\Psi_x$. The solution is $\alpha_N = \Psi_x^{-1/2} e$, where $e$ is the eigenvector of $\Psi_x^{1/2} B \Psi_x^{1/2}$ that is associated with the largest eigenvalue. To obtain an estimator $\hat{\alpha}_N$, we can apply the plug-in principle: use the pooled estimator $\hat{\Psi}_x$ from (2.5) and the usual unrestricted LS estimators $\hat{\beta}_k$ in place of the $\beta_k$’s. The GEM $\gamma_k$’s and intercepts can be estimated via LS.

In the case of $K = 2$ groups, $B$ is a rank one matrix with eigenvector proportional to $\beta_1 - \beta_2$, in which case

$$\alpha_N = \frac{\beta_1 - \beta_2}{\sqrt{(\beta_1 - \beta_2)^\top \Psi_x (\beta_1 - \beta_2)}}. \tag{2.9}$$

Section 1.1 of the supplementary material shows that for $K = 2$, in terms of population parameters, the treatment decision based on the unrestricted regression is equivalent to the treatment decision based on the numerator GEM model. Minor differences in the empirical decision rules from these two methods are due to differences in the LS estimates using the GEM predictor versus using the original predictors in the unrestricted model.
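For two treatment groups, the plug-in estimator of the numerator GEM can be sketched in a few lines: estimate the pooled covariance as in (2.5), fit the unrestricted LS slopes in each group, and rescale their difference as in (2.9). The function names and the simulated data below are illustrative assumptions, not the authors' software.

```python
import numpy as np

def center(a):
    """Center columns at zero (applied within each treatment group)."""
    return a - a.mean(axis=0)

def pooled_cov(xs):
    """Pooled predictor covariance as in (2.5), from within-group centered X_k."""
    n_total = sum(len(x) for x in xs)
    return sum(center(x).T @ center(x) for x in xs) / (n_total - len(xs))

def ls_slopes(x, y):
    """Unrestricted LS slopes for one group (centered data, so no intercept)."""
    return np.linalg.lstsq(center(x), center(y), rcond=None)[0]

def alpha_numerator_k2(x1, y1, x2, y2):
    """Plug-in version of (2.9): direction beta_1 - beta_2, scaled to unit GEM variance."""
    psi = pooled_cov([x1, x2])
    diff = ls_slopes(x1, y1) - ls_slopes(x2, y2)
    return diff / np.sqrt(diff @ psi @ diff)

# Hypothetical data: only the first predictor interacts with treatment.
rng = np.random.default_rng(1)
beta1, beta2 = np.array([1.0, 0.5, 0.0]), np.array([0.0, 0.5, 0.0])
x1, x2 = rng.normal(size=(500, 3)), rng.normal(size=(500, 3))
y1 = x1 @ beta1 + rng.normal(size=500)
y2 = x2 @ beta2 + rng.normal(size=500)

alpha_n = alpha_numerator_k2(x1, y1, x2, y2)
print(np.round(alpha_n, 2))   # weight should concentrate on the first predictor
```

The sign of the estimated vector is arbitrary in general; here it follows the sign of the difference between the estimated slope vectors.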

2.2. The “denominator” criterion: minimizing the residual error

This subsection gives the LS expression for Inline graphic that minimizes the sum of squared residuals in a GEM model, that is, that provides the best fitting GEM model. Under the assumption of normality, the LS estimator coincides with the maximum likelihood estimator in the GEM linear model.

The sum of squared residuals from a standard linear model using LS can be written as Inline graphic where Inline graphic is the hat matrix and Inline graphic is an identity matrix. This sum of squared residuals (when divided by its associated degrees of freedom) is an estimate of the quantity Inline graphic In the GEM model (2.3) with Inline graphic treatment arms, the hat matrix in the Inline graphicth group is Inline graphic Letting Inline graphic Section 1.2 of the supplementary material available at Biostatistics online shows that the Inline graphic minimizing the “denominator” criterion is given by Inline graphic where Inline graphic is the leading eigenvector of Inline graphic. As before, Inline graphic can be estimated by plugging in the LS estimators for Inline graphic in the expression for Inline graphic and using the sample covariance matrix of the pooled predictors (2.5) to estimate Inline graphic.
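The eigenvector recipe for the denominator criterion can be sketched as follows. The definition of $D$ is not legible in the text above, so the code assumes $D = \sum_k \pi_k \beta_k \beta_k^\top$, which is what the least-squares algebra suggests when the GEM has unit variance; treat that as an assumption of this sketch. The function names are hypothetical.

```python
import numpy as np

def sym_power(m, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals**power) @ vecs.T

def alpha_denominator(betas, psi, weights=None):
    """Denominator-criterion direction: Psi^{-1/2} times the leading eigenvector
    of Psi^{1/2} D Psi^{1/2}, with D ASSUMED to be sum_k pi_k beta_k beta_k^T."""
    betas = np.asarray(betas, dtype=float)               # shape (K, p)
    k = len(betas)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    d = (betas * w[:, None]).T @ betas                   # assumed form of D
    root, inv_root = sym_power(psi, 0.5), sym_power(psi, -0.5)
    vals, vecs = np.linalg.eigh(root @ d @ root)         # eigenvalues in ascending order
    return inv_root @ vecs[:, -1]                        # leading eigenvector, mapped back

# Hypothetical check: slope vectors that are exactly proportional (a true GEM model).
psi = np.array([[1.0, 0.3], [0.3, 2.0]])
betas = [[1.0, 2.0], [2.0, 4.0]]
alpha_d = alpha_denominator(betas, psi)
print(alpha_d, alpha_d @ psi @ alpha_d)
```

When the group slope vectors are proportional, the returned direction is proportional to them (up to sign), and the unit-variance constraint (2.4) holds by construction because the eigenvector has unit length.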

2.3. The “F-criterion”: maximizing the F-statistic

This section determines Inline graphic that maximizes the strength of the statistical evidence for the interaction effect in the GEM model (2.3) via an Inline graphic-test. With Inline graphic, we can consider the general linear hypothesis of Inline graphic If Inline graphic and Inline graphic, the null hypothesis above states that the two groups have the same coefficients with respect to the GEM Inline graphic (i.e., no interaction). Thus, the goal is to determine Inline graphic that maximizes the Inline graphic-ratio for testing Inline graphic. From the two previous subsections, the Inline graphic-ratio is proportional to the ratio with (2.7) in the numerator and a denominator corresponding to the residual sum-of-squares. The value of Inline graphic satisfying the “Inline graphic-ratio” criterion is Inline graphic where Inline graphic is the leading eigenvector of

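$$\left[ \left( \sum_{k=1}^{K} \pi_k \sigma^2_{y_k} I_p \right) - \Psi_x^{1/2} D \Psi_x^{1/2} \right]^{-1} \Psi_x^{1/2} B \Psi_x^{1/2}. \tag{2.10}$$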

The derivation is in Section 1.3 of the supplementary material. Once again, Inline graphic can be estimated by plugging parameter estimates into (2.10) and extracting the leading eigenvector.
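A minimal sketch of extracting the leading eigenvector of the matrix in (2.10) is given below. As in the denominator criterion, the form $D = \sum_k \pi_k \beta_k \beta_k^\top$ is an assumption of this sketch, as is the final mapping $\alpha_F = \Psi_x^{-1/2} \times$ (leading eigenvector), chosen by analogy with the numerator criterion. The function names and example values are hypothetical.

```python
import numpy as np

def sym_power(m, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals**power) @ vecs.T

def alpha_f_criterion(betas, psi, var_y, weights=None):
    """Leading eigenvector of the matrix in (2.10), mapped back by Psi^{-1/2}."""
    betas = np.asarray(betas, dtype=float)               # shape (K, p)
    k, p = betas.shape
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    dev = betas - w @ betas                              # beta_k - beta_bar
    b = (dev * w[:, None]).T @ dev                       # between-group matrix B, (2.8)
    d = (betas * w[:, None]).T @ betas                   # ASSUMED form of D
    root, inv_root = sym_power(psi, 0.5), sym_power(psi, -0.5)
    denom = (w @ np.asarray(var_y, dtype=float)) * np.eye(p) - root @ d @ root
    m = np.linalg.solve(denom, root @ b @ root)          # the matrix in (2.10)
    vals, vecs = np.linalg.eig(m)                        # m is not symmetric in general
    lead = np.real(vecs[:, np.argmax(np.real(vals))])    # unit-length leading eigenvector
    return inv_root @ lead

# Hypothetical inputs: outcome variance = explained variance + error variance of 1.
psi = np.array([[1.0, 0.3], [0.3, 2.0]])
betas = np.array([[1.0, 0.5], [0.2, 1.5]])
var_y = np.array([bk @ psi @ bk for bk in betas]) + 1.0
alpha_f = alpha_f_criterion(betas, psi, var_y)
print(alpha_f, alpha_f @ psi @ alpha_f)
```

The result depends on the outcome variances, in line with the remark that the F-criterion solution depends on the error variance. In this sketch, when the outcome variances are made very large the bracketed matrix approaches a multiple of the identity and the direction approaches the numerator solution.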

3. Fitting a GEM when the GEM model is misspecified

The GEM model allows us to combine several predictors into a single linear combination that has good treatment effect moderator properties. Generally, we do not expect the GEM model to be the true data generating model and (based on the above expressions), the “true” Inline graphic for the three criteria would differ. Consider two cases with Inline graphic groups and Inline graphic predictors Inline graphic from a Gaussian distribution with means Inline graphic, variances 1 and 2, respectively, and a covariance 0.2:

$$\text{Case 1: } \beta_1 = (0.4, 2.0)^\top, \ \beta_2 = (0.6, 2.5)^\top; \qquad \text{Case 2: } \beta_1 = (1.5, 2.5)^\top, \ \beta_2 = (2.5, 2.5)^\top.$$

The deviation from a GEM model is measured by the angle Inline graphic between the coefficient vectors Inline graphic: Inline graphic. In Case 1, Inline graphic, and in Case 2, Inline graphic, so Case 1 is very “close” to a GEM model (Inline graphic), while Case 2 is almost as far away from GEM as possible (Inline graphic). The “true” Inline graphic’s are:

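$$\text{Case 1: } \alpha_N = (0.283, 0.707)^\top, \quad \alpha_D = (0.160, 0.714)^\top, \quad \alpha_F = (0.160, 0.714)^\top;$$

$$\text{Case 2: } \alpha_N = (1.000, 0.000)^\top, \quad \alpha_D = (0.143, 0.714)^\top, \quad \alpha_F = (1.000, 0.000)^\top.$$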

From (2.10), Inline graphic depends on the error variance; the results above are for a coefficient of determination Inline graphic. As expected, the Inline graphic is closer to Inline graphic when the data is from a GEM model since the GEM regression fits the data well in this case, while when the model is far from a GEM model, Inline graphic is closer to Inline graphic. This observation together with results from simulations suggest the use of the Inline graphic-criterion in practice.
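The angle used in this section to measure deviation from a GEM model can be computed directly from two coefficient vectors. The sketch below assumes the ordinary Euclidean angle, since the exact expression is not legible above; the vectors in the example are hypothetical and are not the Case 1 or Case 2 values.

```python
import numpy as np

def angle_degrees(b1, b2):
    """Euclidean angle (in degrees) between two coefficient vectors."""
    b1, b2 = np.asarray(b1, dtype=float), np.asarray(b2, dtype=float)
    cos = b1 @ b2 / (np.linalg.norm(b1) * np.linalg.norm(b2))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

proportional = angle_degrees([1.0, 2.0], [2.0, 4.0])   # slopes differ only by scale
orthogonal = angle_degrees([1.0, 0.0], [0.0, 1.0])     # slopes share no direction
print(proportional, orthogonal)
```

Proportional slope vectors correspond to an exact GEM model (angle near zero), while orthogonal vectors are as far from a GEM model as this measure allows.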

4. Permutation testing for the interaction in a GEM model

The GEM model estimation seeks to determine a linear combination of predictors that maximizes the evidence of an interaction effect using one of the three criteria described above. Consequently, even if there are no interaction effects between predictors and treatment indicators, the GEM procedure would tend to generate anti-conservative $p$-values. A straightforward remedy to this problem is to fit the GEM model on many data sets with permuted treatment labels. A permutation $p$-value for testing an interaction effect can then be calculated as

Permutation p value ={Proportion of “permuted” p values < original p value}.

Theoretical details for using permutation tests for interaction effects in the presence of possible main effects have been investigated previously in the literature (e.g., Wang and others, 2015, p. 2046).
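The permutation procedure can be sketched as below. The key point is that the whole GEM fit is repeated inside every permutation, so the search over linear combinations is accounted for. The helper `gem_p_value` is a simplified stand-in invented for this example: it uses a numerator-type GEM for two arms and a normal-approximation test for equal slopes, not the $F$-test of the article. The simulated data are hypothetical.

```python
import math
import numpy as np

def gem_p_value(x, y, a):
    """Stand-in p-value: fit a numerator-type GEM for arms 0/1, then test equal
    slopes on the generated index with a normal approximation (a simplification)."""
    slopes, parts = [], []
    for g in (0, 1):
        xg = x[a == g] - x[a == g].mean(axis=0)      # center within group
        yg = y[a == g] - y[a == g].mean()
        slopes.append(np.linalg.lstsq(xg, yg, rcond=None)[0])
        parts.append((xg, yg))
    alpha = slopes[0] - slopes[1]                    # direction of (2.9), unscaled
    gammas, sxx, rss = [], [], 0.0
    for xg, yg in parts:
        z = xg @ alpha                               # generated index in this group
        gam = (z @ yg) / (z @ z)                     # LS slope on the index
        gammas.append(gam)
        sxx.append(z @ z)
        rss += np.sum((yg - gam * z) ** 2)
    se = math.sqrt(rss / (len(y) - 4) * (1.0 / sxx[0] + 1.0 / sxx[1]))
    stat = abs(gammas[0] - gammas[1]) / se
    return math.erfc(stat / math.sqrt(2.0))          # two-sided normal p-value

def permutation_p_value(x, y, a, n_perm=200, seed=0):
    """Proportion of permuted-label p-values below the original p-value."""
    rng = np.random.default_rng(seed)
    original = gem_p_value(x, y, a)
    permuted = np.array([gem_p_value(x, y, rng.permutation(a)) for _ in range(n_perm)])
    return original, float(np.mean(permuted < original))

# Hypothetical data with a strong interaction on the first predictor.
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 3))
a = np.repeat([0, 1], 100)
slope = np.where(a == 0, 1.0, -1.0)
y = slope * x[:, 0] + rng.normal(size=200)
orig_p, perm_p = permutation_p_value(x, y, a)
print(orig_p, perm_p)
```

In practice the stand-in would be replaced by whichever GEM criterion and test are actually being used; the permutation wrapper itself does not depend on that choice.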

5. Simulation studies

An appealing feature of the GEM model is its utility for making individual treatment decisions, especially when $p$ is large. In this section we investigate the value (1.1) of treatment decisions based on the three GEM criteria for both GEM and non-GEM generating models. Data sets were simulated under a variety of parameter settings. We varied the coefficient of determination $R^2$ to be small (0.2), medium (0.5), and large (0.8). Another useful measure is the “effect size” (ES) of a moderator. For a regression model Inline graphic, with Inline graphic and a treatment indicator Inline graphic (Inline graphic), the ES (Kraemer, 2013) of $x$ as an effect modifier is the proportion of the outcome variance (after removing the variance due to treatment) that is explained by the different relationships between $x$ and $y$ in the two treatment groups, that is,

$$\mathrm{ES} = \frac{(\gamma_3/2)^2}{(\gamma_2 + \gamma_3/2)^2 + (\gamma_3/2)^2 + \sigma^2}, \tag{5.11}$$

where Inline graphic is the error variance (assuming equal error variances for all values of Inline graphic). The simulations are similar for the GEM and non-GEM settings, except that the GEM models are characterized with respect to the effect size of Inline graphic (using ES = 0.1 and 0.3), while the non-GEM cases are characterized with respect to the angle between the vectors of regression coefficients as described in Section 3; we use a small (Inline graphic) and a large (Inline graphic) deviation from GEM.
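The effect size in (5.11) is simple to compute once the coefficients and error variance are known. The sketch below just transcribes the formula; the argument names are hypothetical labels for $\gamma_2$, $\gamma_3$ and $\sigma^2$ as they appear in (5.11), and the numerical inputs are made up for illustration.

```python
def moderator_effect_size(gamma2, gamma3, sigma2):
    """Effect size of a moderator as written in (5.11)."""
    half = gamma3 / 2.0                      # half of the gamma_3 coefficient
    return half**2 / ((gamma2 + half) ** 2 + half**2 + sigma2)

# Hypothetical values: no interaction gives ES = 0; a larger gamma_3 raises the ES.
es_none = moderator_effect_size(gamma2=1.0, gamma3=0.0, sigma2=1.0)
es_some = moderator_effect_size(gamma2=1.0, gamma3=2.0, sigma2=1.0)
print(es_none, es_some)
```

Because the error variance enters only the denominator, a target ES (such as the 0.1 and 0.3 used in the simulations) could be reached by adjusting either the coefficients or the error variance.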

The sample sizes per treatment group considered are Inline graphic, mimicking typical situations in medical research. For each sample size, the number of predictors used were Inline graphic and Inline graphic (except when Inline graphic, namely Inline graphic and Inline graphic). The predictors are generated from Inline graphic-variate normal distributions with mean zero and variances equal to 1, and small pairwise correlations (from Inline graphic0.2 to 0.2) randomly selected, while ensuring a positive definite correlation matrix. For each Inline graphic, Inline graphic. Under GEM, Inline graphic is computed to satisfy the respective Inline graphic and Inline graphic. Under non-GEM, Inline graphic is obtained by adding random noise to the Inline graphic coefficients in Inline graphic and computing the angle Inline graphic between Inline graphic and Inline graphic. More details about the values of Inline graphic are given in Section 3.2 of the supplementary material. For each combination of Inline graphic and the Inline graphic’s (Inline graphic), a large sample (Inline graphic) is generated with known outcome values under both treatments and it is used to evaluate the “true” optimal population average outcome Inline graphic, which is the highest achievable value of any decision.

For each simulation configuration (Inline graphic, Inline graphic and ES), Inline graphic data sets are generated and estimates of Inline graphic are computed, as well as Inline graphic and Inline graphic coefficients of the unrestricted regression model (2.2). These estimates are used to define treatment decisions as described in Section 2. These decisions are applied to the Inline graphic cases in the large data set to obtain the estimated values Inline graphic of the respective decisions Inline graphic, Inline graphic, Inline graphic and Inline graphic. For the sake of comparison, these values are expressed as a proportion of the “true” optimal average outcome Inline graphic, and also take into account the worst average outcome Inline graphic, which is obtained by choosing the worst (lower) outcome for each subject in the large data set. For example, the values of the treatment decision based on the “numerator” GEM approach are reported as Inline graphic.

Figure 1 shows the means and the 95% Monte Carlo (MC) confidence intervals for the value of the decisions in the case of data generation from GEM models. A general observation is that for small ES of the GEM, the estimated decisions produce values that are about 10–20% lower than the “true” optimal value Inline graphic for Inline graphic and still lower for Inline graphic. How much worse the estimated decisions are compared with the “true” optimal average population outcome depends on the sample size and Inline graphic (performance improves with increasing sample size and Inline graphic). The “denominator” method is superior to the other two approaches, especially for larger Inline graphic’s and smaller ES’s, which is not surprising since the denominator criterion is equivalent to the MLE objective when the error is normal and the true model is a GEM, as is the case here.

Figure 1.


GEM data generation model. Mean and 95% Monte Carlo (MC) confidence intervals (based on the Inline graphic MC runs) of the values Inline graphic of the decisions, as a proportion Inline graphic, for Inline graphic (left half of panels) and 200 (right half of panels), and for ES = 0.1 (top half of panels) and ES = 0.3 (bottom half of panels). The three panels per (Inline graphic, ES) combination correspond to Inline graphic on the left, Inline graphic in the middle and Inline graphic on the right. The method based on the unrestricted regression and the three GEM approaches are denoted as: (i) unrestricted: red, leftmost; (ii) numerator criterion: green, second from left; (iii) denominator criterion: blue, third from left; (iv) $F$ criterion: purple, rightmost. The “Number of observations” on the bottom horizontal axis is the sample size per group.

Figure 2 presents information similar to that in Figure 1, but here the data are generated under a general linear non-GEM model (2.2). It shows that even when the data are not generated from a GEM model, the criteria perform quite well for a relatively small number of covariates Inline graphic. For larger Inline graphic, larger sample sizes and larger Inline graphic are needed to achieve good performance. The values of the decisions based on the denominator criterion are meaningfully inferior to the values of the decisions from the other methods as the deviation from GEM becomes large. The denominator’s inferiority becomes more pronounced as Inline graphic, and Inline graphic increase. Regardless of the data generating model, the values produced by the $F$ method are either the best or very close to the best values produced by either of the other methods compared here. Additionally, simulations were run using the non-GEM generating model except that a subset of predictors was discretized to be binary (5 out of 10 for Inline graphic and 20 out of 200 when Inline graphic); the results are very similar to those when all predictors are continuous, and details are provided in the supplementary material.

Figure 2.


Non-GEM data generation model. Mean and 95% Monte Carlo (MC) confidence intervals (based on the Inline graphic MC runs) of the values Inline graphic of the decisions, as a proportion Inline graphic, for Inline graphic (left half of panels) and 200 (right half of panels), and for small deviation from GEM (top half of panels) and large deviation from GEM (bottom half of panels). The three panels per (Inline graphic, deviation from GEM) combination correspond to Inline graphic on the left, Inline graphic in the middle and Inline graphic on the right. The method based on the unrestricted regression and the three GEM approaches are denoted as: (i) unrestricted: red, leftmost; (ii) numerator criterion: green, second from left; (iii) denominator criterion: blue, third from left; (iv) $F$ criterion: purple, rightmost. The “Number of observations” on the bottom horizontal axis is the sample size per group.

Section 4 of the supplementary material available at Biostatistics online presents results on the performance of the GEM methods in the case when the data generation is not from the linear model (2.2). There we show simulation results based on a doubly-robust estimation procedure using an augmented inverse probability weighted estimator (AIPWE) of the value Inline graphic (Robins and others, 1994; Zhang and others, 2012b). Although the GEM approach based on the AIPWE does marginally worse than the unrestricted approach described in Zhang and others (2012b) using an example with Inline graphic predictors, their approach becomes computationally infeasible for larger values of Inline graphic. In cases with large Inline graphic, the GEM reduces the dimensionality of the predictor space to Inline graphic making the AIPWE approach fast and feasible.

6. Application to data from a RCT

We illustrate the three GEM procedures using data from a RCT for the treatment of depression comparing antidepressants of the class of selective serotonin reuptake inhibitors (SSRI) against placebo. In addition to establishing the overall efficacy of the SSRI, the investigators were interested in finding biosignatures for SSRI treatment response. The investigators defined “biosignature” as a baseline patient characteristic or a combination of such characteristics, that constitutes a moderator of the treatment effect of SSRI vs. placebo.

Data from 76 and 72 subjects randomized to placebo and SSRI, respectively, were available. The outcome was the change from baseline (week 0) to week 8 of treatment on the Hamilton Rating Scale for Depression (HRSD). High values of HRSD indicate higher depression severity, and thus a positive change (week 0 minus week 8) indicates a reduction of depression. The following baseline clinical measures were proposed as potential moderators: (i) level of anxiety; (ii) severity of anger attack; (iii) suicidal risk; (iv) medical comorbidity score; and (v) experience of pleasure score.

Outcome was modeled as a linear function of a baseline measure, treatment indicator (SSRI Inline graphic vs. placebo Inline graphic) and the interaction between them, for each measure individually. None of the interaction terms was statistically significant (see Table 1). A comparison of a full unrestricted model with all five predictors and their interactions with treatment against a reduced model without the interactions yielded a non-significant $F$-test for the interactions (Inline graphic). Thus, the usual approaches of treating each predictor separately or fitting a full unrestricted model for all predictors fail to find evidence for a heterogeneous effect of SSRI and consequently fail to identify patients who stand to benefit from or be harmed by it.

Table 1.

SSRI Clinical biosignature: potential moderators of the efficacy of treatment with SSRI vs. placebo with respect to change in HRSD from baseline to week 8. The 3rd column gives the Inline graphic-values for the interaction predictor-by-treatment term and the 4th column gives the effect size of the predictor as a moderator of treatment effect from a regression model with only that variable as a predictor in addition to treatment. The last two columns give the regression coefficients from models with all five baseline measures as predictors for treatment Inline graphic (placebo) and Inline graphic (SSRI) respectively

| Measure | Mean | St. dev. | Interaction p value | Effect size | Reg. coef. (placebo) | Reg. coef. (SSRI) |
|---|---|---|---|---|---|---|
| Anxiety | 5.36 | 1.80 | 0.797 | 0.020 | 1.06 | 1.44 |
| Anger attack | 3.05 | 2.12 | 0.671 | 0.034 | −0.59 | −0.09 |
| Suicide risk | 5.42 | 2.37 | 0.155 | 0.113 | 1.00 | −0.38 |
| Medical comorbidity score | 2.01 | 2.78 | 0.092 | 0.140 | 0.11 | −0.58 |
| Life pleasure score | 33.17 | 5.51 | 0.065 | 0.148 | −0.20 | 0.04 |

Next, the linear combinations of the baseline measures for the 3 GEM criteria were estimated; see Table 2. The numerator and F-criteria give similar results, but only the F-criterion has a statistically significant permutation p value (p = 0.048, compared with 0.061 for the numerator criterion). Note that the effect sizes for the GEMs based on the numerator and the F-criterion (which are very similar, both 0.27) are nearly double that of even the strongest individual predictor (0.148 in Table 1). The denominator GEM, on the other hand, does not produce a significant interaction p value and also has a very small estimated effect size, which is consistent with the observation that, since the angle between the unrestricted regression coefficient vectors is relatively large (Inline graphic), the model deviates quite a bit from a true GEM model.
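To make the remark about the angle concrete, the sketch below computes the angle between the two unrestricted coefficient vectors using the values printed in the last two columns of Table 1. It assumes those printed coefficients can be compared directly; the angle reported in the article may rest on a different scaling or on unrounded estimates, so the result is illustrative only.

```python
import math

# Regression coefficients copied from the last two columns of Table 1.
beta_placebo = [1.06, -0.59, 1.00, 0.11, -0.20]
beta_ssri = [1.44, -0.09, -0.38, -0.58, 0.04]


def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clip to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


angle = angle_degrees(beta_placebo, beta_ssri)
print(f"angle between coefficient vectors: {angle:.1f} degrees")
```

An angle far from 0° or 180° indicates that the two coefficient vectors are not close to proportional, which is the sense in which the data depart from a single-index structure.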

Table 2.

GEM model for SSRI clinical biosignature: the estimated GEMs of the SSRI treatment effect on change in HRSD. The bottom rows give the GEM effect sizes (row 6); the permutation-adjusted p-values (row 7); the estimated value (1.1) of the decision based on GEM criteria, along with a 95% cross-validated bootstrap confidence interval (CI) (row 8); and the difference in value, with a 95% cross-validated bootstrap CI for the difference, between the decision based on the respective GEM and the decisions (i) give everyone placebo (row 9), (ii) give everyone SSRI (row 10), and (iii) give everyone SSRI or placebo at random (row 11).

                                  Estimated GEM coefficients, by criterion
                                  Numerator      Denominator    F
Anxiety                           0.12           0.55           0.12
Anger attack                      0.15           −0.15          0.15
Suicide risk                      −0.42          0.14           −0.42
Medical comorbidity score         −0.21          −0.10          −0.21
Life pleasure score               0.07           −0.04          0.07
Effect size                       0.27           0.01           0.27
Permutation p-value               0.061          0.895          0.048
Value of GEM                      8.03           7.60           8.03
(95% CI)                          (6.28, 9.78)   (5.62, 9.43)   (6.21, 9.68)
Value of GEM − Value of placebo   2.02           1.57           2.00
(95% CI)                          (1.97, 2.06)   (1.52, 1.62)   (1.96, 2.05)
Value of GEM − Value of SSRI      0.52           0.07           0.50
(95% CI)                          (0.48, 0.55)   (0.04, 0.10)   (0.46, 0.54)
Value of GEM − Value of random    1.29           0.84           1.27
(95% CI)                          (1.25, 1.32)   (0.80, 0.87)   (1.24, 1.31)

For the sake of comparison, estimates of the value for the three GEM criteria were obtained using an Inverse Probability Weighted Estimator (IPWE) Inline graphic where Inline graphic if the treatment assignment Inline graphic and treatment decision Inline graphic coincide for subject Inline graphic with covariates Inline graphic. Here, Inline graphic is the probability of treatment assignment, which is constant in an RCT and equals 0.5 in this example. Row 8 of Table 2 gives the estimated value of each GEM criterion together with a 95% cross-validated bootstrap confidence interval (using 1000 bootstrap samples). The CIs were computed using 10-fold cross-validation on each bootstrap sample: the sample was split into 10 non-overlapping subsamples of equal size, the treatment decision was estimated by applying the respective GEM approach to 9 of the subsamples and then applied to the remaining subsample to estimate the value of the decision, and the estimates were averaged across the 10 folds. As Table 2 shows, the F and numerator approaches produce very similar bootstrap confidence intervals for the value of the decision, while the denominator criterion results in a lower decision value with a wider 95% CI. The last three rows of Table 2 show the differences between the values of the treatment decisions derived from each of the three GEM approaches and the values of three commonly used comparison decisions, (i) give everyone placebo; (ii) give everyone SSRI; and (iii) give placebo and SSRI at random, estimated by the same cross-validation approach based on 1000 bootstrap samples.
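The idea behind the IPWE can be sketched in a few lines of Python. The exact formula used in the article did not survive extraction, so the version below is an assumption: it uses a common normalized form, in which outcomes of subjects whose randomized arm matches the rule's recommendation are weighted by the inverse assignment probability. The function name `ipwe_value` and the toy data are hypothetical.

```python
def ipwe_value(outcomes, assigned, decided, prob_assigned=0.5):
    """Normalized IPW estimate of the value of a treatment decision rule."""
    num = 0.0
    den = 0.0
    for y, a, d in zip(outcomes, assigned, decided):
        c = 1.0 if a == d else 0.0  # 1 when assignment and decision coincide
        num += c * y / prob_assigned
        den += c / prob_assigned
    if den == 0.0:
        raise ValueError("no subject was assigned the arm the rule recommends")
    return num / den


# Toy data (hypothetical): change in HRSD, randomized arm, rule's recommendation.
outcomes = [10.0, 4.0, 6.0, 8.0]
assigned = ["ssri", "placebo", "ssri", "placebo"]
decided = ["ssri", "ssri", "placebo", "placebo"]

# Only the first and last subjects are concordant, so the estimate is their mean.
print(ipwe_value(outcomes, assigned, decided))
```

With a constant assignment probability, as in an RCT, the weights cancel in this normalized form and the estimate reduces to the mean outcome among subjects whose assignment agrees with the rule.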

The results from the GEM approaches are presented visually in Figure 3. The GEM analysis using the F-ratio criterion (similar to the numerator criterion) leads to the conclusion that 30.4% of the target population (to the left of the vertical lines at GEM Inline graphic) does not benefit from SSRI treatment. The decision based on the F-ratio GEM could be not to prescribe SSRI to those subjects with Inline graphic; alternatively, one might choose to give SSRI only to patients with Inline graphic scores in the range where the 95% CIs for the placebo and SSRI GEM regressions do not overlap, that is, GEM Inline graphic. These results are consistent with the fact that many antidepressant trials fail to show efficacy or show only small benefits, for example, a difference of about 25–30 percentage points in response rates between antidepressants and placebo (60–65% vs. 30–35%, respectively).
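A decision rule of this kind can be sketched as follows, assuming the outcome is modeled as a straight line in the GEM score within each arm and that a larger change in HRSD is better. The intercepts, slopes and scores are hypothetical values chosen for illustration, not estimates from the trial, and the function names are not from the article.

```python
def crossing_point(a_pbo, b_pbo, a_ssri, b_ssri):
    """GEM score at which the two arm-specific regression lines intersect."""
    if b_ssri == b_pbo:
        raise ValueError("parallel lines: no crossing point")
    return (a_pbo - a_ssri) / (b_ssri - b_pbo)


def recommend(gem_score, a_pbo, b_pbo, a_ssri, b_ssri):
    """Recommend the arm with the larger predicted improvement (ties go to placebo)."""
    pred_pbo = a_pbo + b_pbo * gem_score
    pred_ssri = a_ssri + b_ssri * gem_score
    return "ssri" if pred_ssri > pred_pbo else "placebo"


# Hypothetical intercepts and slopes, not estimates from the trial.
lines = dict(a_pbo=6.0, b_pbo=0.5, a_ssri=5.0, b_ssri=1.5)
scores = [-1.0, 0.0, 0.5, 2.0, 3.0]

cut = crossing_point(**lines)
share_placebo = sum(recommend(g, **lines) == "placebo" for g in scores) / len(scores)
print(f"cut-off = {cut}, share not recommended SSRI = {share_placebo:.0%}")
```

The more conservative alternative mentioned in the text, treating only where the pointwise confidence bands do not overlap, would replace the comparison of predicted means with a comparison of band limits.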

Figure 3.

The relationship between the GEMs obtained from the three criteria and the change in depression (HRSD) from baseline to week 8 for the SSRI (blue) and placebo (red) interventions. The GEMs corresponding to each of the criteria are plotted on the horizontal axis. The lines are the LS lines and the shaded areas indicate the 95% pointwise CIs. The densities of the respective GEMs for the two treatment groups are indicated in the lower part of each panel. The vertical lines indicate the cut-off point on the linear combinations of predictors above which a depressed patient would benefit from treatment with SSRI.

7. Discussion

This article has shown how to combine several baseline characteristics into a single generated effect moderator in the context of the classic linear model. Closed-form expressions have been derived for these GEMs that do not require complex iterative computations. The GEM offers a straightforward approach to determine beneficial treatments for patients. From this perspective, GEMs can be viewed as indices for treatment decisions. Of the three criteria, we generally recommend the F-criterion, because it simultaneously maximizes the interaction effect (the numerator) and minimizes the prediction error (the denominator) in the class of GEM models. Additionally, in our results, the F-criterion’s performance is either optimal or very close to optimal with respect to producing treatment decision rules with the highest values.

In practice, after conducting the main hypothesis tests in efficacy studies, investigators attempt to discover baseline patient features that moderate the effect of treatment. Given that variables with large moderating effects of treatments, where they exist, have already been discovered for most illnesses, it is not surprising that researchers regularly fail to discover other moderators in studies where the primary goal is to establish efficacy. The proposed methods show that combining patient characteristics that individually have little to no moderating effect can result in a strong treatment effect modifier that can help with making treatment decisions. Of course, any treatment decision has to be validated in properly designed studies; for example, a 3-arm RCT that compares the experimental treatment, the control treatment, and treatment according to the investigated treatment decision. The proposed methodology is expected to be of particular utility in studies specifically designed to discover biosignatures for response to treatment, as discussed in the Introduction.

Several generalizations of the GEM procedure are currently under development, such as extending the GEM to generalized linear models and longitudinal outcomes. Work is also underway to allow the outcome to depend on nonparametric functions of GEMs, similar to generalized additive models. It will be useful to compare the linear GEM model developed here and a more flexible nonparametric GEM model to other methods for precision medicine for providing guidance in making treatment decisions.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org

Acknowledgements

The authors are thankful to the editors and three reviewers whose feedback has greatly improved this article. Conflict of interest: None declared.

Funding

National Institutes of Health grant R01 MH099003.

References

  1. Ciarleglio A., Petkova E., Tarpey T. and Ogden R. T. (2015). Treatment decisions based on scalar and functional baseline covariates. Biometrics 71, 884–894.
  2. Emura T., Chen Y.-H. and Chen H.-Y. (2012). Survival prediction based on compound covariate under Cox proportional hazards models. PLoS One 7, e47627.
  3. Follmann D. A. and Proschan M. A. (1999). A multivariate test of interaction for use in clinical trials. Biometrics 55, 1151–1155.
  4. Gail M. and Simon R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 41, 361–372.
  5. Gunter L., Zhu J. and Murphy S. A. (2011). Variable selection for qualitative interactions in personalized medicine while controlling the family-wise error rate. Journal of Biopharmaceutical Statistics 21, 1063–1078.
  6. Kang C., Janes H. and Huang Y. (2014). Combining biomarkers to optimize patient treatment recommendations. Biometrics 70, 695–707.
  7. Kraemer H. C. (2013). Discovering, comparing, and combining moderators of treatment on outcome after randomized clinical trials: a parametric approach. Statistics in Medicine 32, 1964–1973.
  8. Laber E. B. and Zhao Y.-Q. (2015). Tree-based methods for individualized treatment regimes. Biometrika 102, 501–514.
  9. Lu W., Zhang H. H. and Zeng D. (2011). Variable selection for optimal treatment decision. Statistical Methods in Medical Research 22, 493–504.
  10. McKeague I. W. and Qian M. (2014). Estimation of treatment policies based on functional predictors. Statistica Sinica 24, 1461–1485.
  11. Murphy S. A. (2003). Optimal dynamic treatment regimes (with discussion). Journal of the Royal Statistical Society, Series B 65, 331–366.
  12. Qian M. and Murphy S. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics 39, 1180–1210.
  13. Robins J. M. (2004). Optimal structural nested models for optimal sequential decisions. In: Heagerty P. J. and Lin D. Y. (editors) Proceedings of the Second Seattle Symposium on Biostatistics. New York: Springer, pp. 189–326.
  14. Robins J., Orellana L. and Rotnitzky A. (2008). Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine 27, 4678–4721.
  15. Robins J. M., Rotnitzky A. and Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.
  16. Song R., Kosorok M., Zeng D., Zhao Y., Laber E. B. and Yuan M. (2015). On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat 4, 59–68.
  17. Song X. and Pepe M. S. (2004). Evaluating markers for selecting a patient’s treatment. Biometrics 60, 874–883.
  18. Tian L. and Tibshirani R. J. (2011). Adaptive index models for marker-based risk stratification. Biostatistics 12, 68–86.
  19. Tukey J. W. (1991). Use of many covariates in clinical trials. International Statistical Review 59, 123–137.
  20. Tukey J. W. (1993). Tightening the clinical trial. Controlled Clinical Trials 14, 266–285.
  21. Wang R., Schoenfeld D. A., Hoeppner B. and Evins A. E. (2015). Detecting treatment-covariate interactions using permutation methods. Statistics in Medicine 34, 2035–2047.
  22. Wang R. and Ware J. H. (2013). Detecting moderator effects using subgroup analyses. Prevention Science 14, 111–120.
  23. Wellek S. (1997). Testing for absence of qualitative interactions between risk factors and treatment effect. Biometrical Journal 39, 809–821.
  24. Zhang B., Tsiatis A. A., Davidian M., Zhang M. and Laber E. (2012a). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103–114.
  25. Zhang B., Tsiatis A. A., Laber E. B. and Davidian M. (2012b). A robust method for estimating optimal treatment regimes. Biometrics 68, 1010–1018.
  26. Zhao Y., Zeng D., Rush A. J. and Kosorok M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107, 1106–1118.
  27. Zhao Y., Zeng D., Laber E. B. and Kosorok M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association 110, 583–598.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
