Biostatistics (Oxford, England). 2018 Dec 26;21(3):545–560. doi: 10.1093/biostatistics/kxy074

Inference on treatment effect modification by biomarker response in a three-phase sampling design

Michal Juraska, Ying Huang, Peter B Gilbert
PMCID: PMC7308066  PMID: 30590450

Summary

An objective in randomized clinical trials is the evaluation of “principal surrogates,” which consists of analyzing how the treatment effect on a clinical endpoint varies over principal strata subgroups defined by an intermediate response outcome under both or one of the treatment assignments. The latter effect modification estimand has been termed the marginal causal effect predictiveness (mCEP) curve. This objective was addressed in two randomized placebo-controlled Phase 3 dengue vaccine trials for an antibody response biomarker whose three-phase sampling design rendered previously developed inferential methods highly inefficient. In this design, the biomarker was measured in a case-cohort sample, and a key baseline auxiliary strongly associated with the biomarker (the “baseline surrogate measure”) was measured only in a further sub-sample. We propose a novel approach to estimating the mCEP curve in such three-phase sampling designs that avoids the restrictive “placebo structural risk” modeling assumption common to past methods and that further improves robustness by using non-parametric kernel smoothing for biomarker density estimation. Additionally, we develop bootstrap-based procedures for pointwise and simultaneous confidence intervals and for testing four relevant hypotheses about the mCEP curve. We investigate the finite-sample properties of the proposed methods and compare them with those of an alternative method that makes the placebo structural risk assumption. Finally, we apply the novel and alternative procedures to the two dengue vaccine trial data sets.

Keywords: Biomarker, Dengue, Principal stratification, Principal surrogate endpoint, Three-phase sampling design, Treatment effect modification, Vaccine

1. Introduction

Over the past 20 years, a sizable body of work has accumulated on estimation and inference for principal stratification estimands, with application to “principal surrogate” evaluation. The goal of principal surrogate evaluation is to study how a clinical treatment effect in a randomized trial varies over principal strata subgroups defined by the potential outcome biomarker response under both treatment assignments (Frangakis and Rubin, 2002) or by the potential outcome biomarker response under one of the treatment assignments (Gilbert and Hudgens, 2008). These effect modification estimands have been called the causal effect predictiveness surface and the marginal causal effect predictiveness (mCEP) curve, respectively. Both estimands are useful; in this work, we focus on the mCEP curve because it more directly informs decision-making and is more readily identified. Gilbert and Hudgens (2008) studied fully parametric and fully non-parametric estimated maximum likelihood (EML) methods for this estimand, Huang and Gilbert (2011) relaxed the fully parametric EML methods to semi-parametric EML methods, and Huang and others (2013) and Huang (2018) replaced EML with pseudo-score (PS) estimation, which improved efficiency compared with EML and provided analytic variance estimation that was not possible via EML. These methods considered a binary clinical endpoint; Gabriel and Gilbert (2014) and Gabriel and others (2015) extended the work to accommodate a time-to-event clinical endpoint subject to right-censoring. Moreover, Li and others (2010) and other papers from the same group studied full-likelihood Bayesian methods, as did Zigler and Belin (2011).

The present work is motivated by two randomized placebo-controlled Phase 3 trials of a dengue vaccine, in which the primary clinical endpoint was symptomatic virologically confirmed dengue (VCD) occurring between the month 13 and month 25 visits. Overall vaccine efficacy [one minus the relative risk (vaccine/placebo) of VCD, times 100%] to prevent VCD over this follow-up period was estimated at 56.5% (95% CI 43.8–66.4) in the trial in Asia (Capeding and others, 2014) and at 60.8% (95% CI 52.0–68.0) in the trial in Latin America (Villar and others, 2015). In these trials, which had harmonized study designs and protocols, the biomarker $S$ of interest as a modifier of vaccine efficacy (in participants free of VCD through month 13) was a participant’s average $\log_{10}$ neutralizing antibody titer against the four dengue virus strains represented in the vaccine construct (one of each serotype), measured from a serum sample taken at the month 13 study visit (Moodie and others, 2018). A particular three-phase case-cohort sampling design was used for measuring $S$ from month 13 samples; this design rendered the previously developed EML and PS methods highly inefficient, but it also opened an opportunity for the new approach studied here, which better exploits the data structure. (This work does not consider a Bayesian approach, which could also be fruitful in this setting.) Specifically, baseline serum samples were collected from a random sample of approximately 10% (20%) of all participants in the trial in Asia (Latin America), whereas month 13 serum samples were collected from all participants in each trial. With the covariate $S_b$ defined as $S$ but measured at baseline, both $S$ and $S_b$ were measured in all participants randomly sampled at entry into the subcohort, and $S$ was measured in all participants who experienced the VCD primary endpoint. Because $S$ and $S_b$ are the same variable measured at different times, they are highly correlated, making $S_b$ an ideal “baseline immunogenicity predictor” (Follmann, 2006; Gilbert and Hudgens, 2008) [referred to as a “baseline surrogate measure (BSM)” in Gabriel and Gilbert (2014)]. Such a predictor is a key ingredient of all of the EML and PS methods for reasonably precise estimation of the mCEP curve. However, all of these methods require that $S_b$ be measured in every vaccine recipient with $S$ measured; applied to these data, they would therefore discard approximately 90% and 80% of the VCD endpoint cases in the vaccine group of the two trials, respectively.

However, with $S(z)$ the potential outcome of $S$ under assignment to vaccine ($z=1$) or placebo ($z=0$), the new opportunity generated by the dengue Phase 3 trials is that $S(0)$, baseline covariates, and Inline graphic may plausibly contain all information about VCD risk in the placebo group, without the need to also condition on $S(1)$, because neutralizing antibody titers are the key known predictor of VCD in both vaccinated and unvaccinated individuals [e.g., analysis and references in Katzelnick and others (2017)]. This novel assumption ((A4) below) is where the method departs from previous methods. Although the assumption can be challenged, it removes the need for the “placebo structural risk” modeling assumption, common to all past EML and PS methods, that links the conditional risk of VCD under placebo assignment to the candidate surrogate under vaccine assignment; that quantity is not identifiable from the observed data unless close-out placebo vaccination is used (Follmann, 2006), which was not done in the dengue vaccine trials. The other novel assumption underlying the approach exploits the fact that the baseline immunogenicity predictor is a BSM, which permits a time-constancy assumption about the distribution of $S(1)$ conditional on baseline covariates and either $S(0)$ or $S_b$ ((A5) below). By estimating the density of $S(1)$ conditional on baseline covariates and $S_b$ with non-parametric kernel smoothing, and by not making the placebo structural risk modeling assumption, the new method estimates the mCEP curve more flexibly than previous methods, with advantages in bias, efficiency, and confidence interval (CI) coverage, as illustrated in Sections 4 and 5.

The remainder of this article is organized as follows. In Sections 1.1–1.4, we introduce notation, define the estimand of interest, state identifiability assumptions, establish identifiability under these assumptions, and discuss the plausibility of the assumptions and their potential violations. In Section 2, we describe the estimation method under a modeling assumption, including the construction of simultaneous CIs, and in Section 3 we describe procedures for testing hypotheses of interest. The design and findings of the simulation study are presented in Section 4. We apply the proposed estimation and inference procedures to data from the two dengue vaccine trials in Section 5.

1.1. Notation

We consider a randomized placebo-controlled trial with treatment assignment $Z$ ($Z=1$, treatment; $Z=0$, placebo), and a discrete or continuous univariate biomarker $S$ measured at a fixed time $\tau$ after randomization. In many vaccine trials, placebo recipients have variable levels of $S$ reflecting pre-existing immunity arising from past exposure (e.g., natural infection and/or prior vaccination) to the disease-causing pathogen (e.g., dengue virus, Plasmodium falciparum, or influenza virus). It is of interest to evaluate $S$ as a modifier of the treatment effect on a binary clinical endpoint $Y$ ($Y=1$, disease; $Y=0$, no disease) measured after $\tau$. To this end, $S$ needs to be measured prior to $Y$; therefore, we restrict the analysis to trial participants who are observed to be endpoint-free at $\tau$ and denote this status as $Y^{\tau}=0$. If $Y^{\tau}=1$, $S$ is undefined, and we set $S=*$.

We consider a three-phase outcome-dependent case-cohort sampling design as follows. $Z$, $Y^{\tau}$, $Y$, and a baseline covariate vector $X$ are measured in all randomized participants (phase 1). Next, at baseline, Bernoulli sampling of all randomized participants is used to determine a subcohort $\mathcal{V}$, and the biomarker $S$ is measured at time $\tau$ in the subset of this subcohort with $Y^{\tau}=0$, as well as in all or almost all cases (those with $Y=1$) with $Y^{\tau}=0$, regardless of their membership in $\mathcal{V}$ (phase 2) [this is a classic case-cohort sampling design (Prentice, 1986)]. Finally, the baseline biomarker $S_b$ is measured only in the members of $\mathcal{V}$ with $Y^{\tau}=0$ (phase 3); $S_b$ is the BSM in the Gabriel and Gilbert (2014) design. Let $\Delta$ and $V$ indicate, respectively, that $S$ and $S_b$ are measured in phases 2 and 3. The sampling design is illustrated graphically in Supplementary Figure 1 available at Biostatistics online and may be expressed as $V$ sampled from the same Bernoulli distribution for all participants and then, conditional on $Y^{\tau}=0$, $\Delta$ sampled according to the sampling probability

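$$P(\Delta = 1 \mid V, Y, Y^{\tau}=0) = \begin{cases} 1, & V = 1,\\ \kappa, & V = 0,\ Y = 1,\\ 0, & V = 0,\ Y = 0, \end{cases}$$

for a fixed constant $\kappa \in (0, 1]$.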
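The sampling scheme described above can be simulated in a few lines. The sketch below is a minimal illustration, not the authors' code: `p` (the subcohort sampling probability) and `kappa` (the probability that a case outside the subcohort has $S$ measured) are our own labels, and the `Participant` fields are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Participant:
    z: int                # treatment assignment: 1 = treatment, 0 = placebo
    y_tau: int            # 1 if the endpoint occurred by time tau
    y: int                # binary clinical endpoint
    v: int = 0            # subcohort membership, drawn at baseline
    delta: int = 0        # 1 if S (measured at tau) is available (phase 2)
    sb_measured: int = 0  # 1 if the baseline biomarker S_b is available (phase 3)


def three_phase_sample(people, p, kappa, rng):
    """Assign sampling indicators under a Bernoulli case-cohort design.

    p     -- subcohort sampling probability (same for every participant)
    kappa -- probability that a case outside the subcohort has S measured
    """
    for person in people:
        person.v = int(rng.random() < p)       # Bernoulli subcohort draw
        if person.y_tau == 1:
            continue                            # S undefined; nothing measured
        if person.v == 1:
            person.delta = 1                    # subcohort: S measured ...
            person.sb_measured = 1              # ... and S_b measured
        elif person.y == 1:
            person.delta = int(rng.random() < kappa)  # cases outside subcohort
    return people


rng = random.Random(2018)
cohort = [Participant(z=rng.randint(0, 1), y_tau=0, y=int(rng.random() < 0.05))
          for _ in range(5000)]
three_phase_sample(cohort, p=0.1, kappa=1.0, rng=rng)
n_cases = sum(c.y for c in cohort)
n_cases_with_sb = sum(c.y for c in cohort if c.sb_measured)
```

With `p=0.1`, only about one in ten cases has $S_b$, which illustrates why methods that require $S_b$ for every case with $S$ measured would discard most vaccine-group cases.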

Our causal estimand of interest is the mCEP curve, which has been studied in several previous papers. To define it, let $S(z)$, $Y^{\tau}(z)$, $Y(z)$, and Inline graphic be the potential outcomes of $S$, $Y^{\tau}$, $Y$, and Inline graphic under treatment assignment $z$. If $Y^{\tau}(z)=1$, then $Y(z)=1$ and $S(z)$ is undefined; we therefore set $S(z)=*$.

1.2. Estimand of interest

Let $\mathrm{CE} = h\{P(Y(1)=1), P(Y(0)=1)\}$ measure the overall causal effect of treatment on the clinical endpoint $Y$, where $h$ is a known contrast function such that $h(x,y)=0$ if and only if $x=y$. Denote $\mathrm{risk}_z(s_1) = P\{Y(z)=1 \mid S(1)=s_1, Y^{\tau}(1)=Y^{\tau}(0)=0\}$ for $z=0,1$. We define the causal estimand of interest as

$$\mathrm{mCEP}(s_1) = h\{\mathrm{risk}_1(s_1), \mathrm{risk}_0(s_1)\}; \qquad (1.1)$$

this estimand is termed the marginal CEP curve in the literature. If $S(1)$ is continuous (assumed henceforth), then (1.1) abuses notation for expositional simplicity; the formal definition is provided in the supplementary material available at Biostatistics online. The contrast $h(x,y)=1-x/y$ gives $\mathrm{mCEP}(s_1)$ an interpretation as the percent reduction in clinical endpoint risk among treatment recipients with biomarker level $s_1$ compared with if they had been assigned placebo, whereas an additive-difference contrast (e.g., $h(x,y)=y-x$) gives an attributable risk interpretation for treatment recipients with biomarker level $s_1$.
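The two contrasts can be written as plain functions. The sketch below evaluates $\mathrm{mCEP}(s_1)$ for user-supplied risk curves; the risk curves in the example are hypothetical, and the sign convention for the additive difference (positive values indicating benefit) is our assumption.

```python
def h_ratio(risk1, risk0):
    """Multiplicative contrast 1 - risk1/risk0 (proportion of risk reduced)."""
    return 1.0 - risk1 / risk0


def h_difference(risk1, risk0):
    """Additive-difference contrast risk0 - risk1 (attributable-risk scale)."""
    return risk0 - risk1


def mcep(s1, risk1_fn, risk0_fn, h=h_ratio):
    """mCEP(s1) = h(risk_1(s1), risk_0(s1)) for given conditional risk curves."""
    return h(risk1_fn(s1), risk0_fn(s1))


def risk1(s1):
    """Hypothetical treatment-arm risk that falls as the biomarker rises."""
    return 0.04 / (1.0 + s1)


def risk0(s1):
    """Hypothetical placebo-arm risk, flat in s1."""
    return 0.04


te_at_1 = mcep(1.0, risk1, risk0)                  # 1 - 0.02/0.04 = 0.5
ar_at_1 = mcep(1.0, risk1, risk0, h=h_difference)  # 0.04 - 0.02 = 0.02
```

Both contrasts return 0 whenever the two risks are equal, matching the requirement that $h(x,y)=0$ if and only if $x=y$.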

1.3. Identifiability assumptions

We suppose that Inline graphic, $i=1,\ldots,n$, are independent and identically distributed, and we assume no drop-out for simplicity. We consider the following commonly made identifiability assumptions (Gilbert and Hudgens, 2008; Huang and Gilbert, 2011; Huang and others, 2013; Huang, 2018):

  • (A1) Stable Unit Treatment Value Assumption (SUTVA): Inline graphic of the $i$-th subject is independent of $Z_j$, $j \neq i$.

  • (A2) Ignorable treatment assignment: $Z$ is conditionally independent of Inline graphic given $X$.

  • (A3) Equal early clinical risk: Inline graphic.

(A1) implies “consistency,” i.e., Inline graphic. SUTVA may be violated in vaccine trials because of herd immunity and other factors, but it may be approximately valid if trial participants do not interact with each other and the study sites are in large, geographically dispersed regions. (A2) holds by randomization, which may be stratified by $X$. (A3) is more credible when $\tau$ is near baseline relative to the length of follow-up and the effect of the intervention takes time to develop. For simplicity of exposition, henceforth all conditional and unconditional probabilities of $Y$ and densities of $S$ implicitly condition on Inline graphic.

We additionally consider the following identifiability assumptions:

  • (A4) $P\{Y(0)=1 \mid S(1), S(0), X\} = P\{Y(0)=1 \mid S(0), X\}$, i.e., the risk of $Y(0)=1$ is conditionally independent of $S(1)$ given $\{S(0), X\}$.

  • (A5) Time constancy: $f_{S(1)\mid S(0),X}(s_1 \mid s_0, x) = f_{S(1)\mid S_b,X}(s_1 \mid s_0, x)$ for all $(s_1, s_0, x)$, where $f_{S(1)\mid S(0),X}$ and $f_{S(1)\mid S_b,X}$ are conditional density functions of $S(1)$.

In a standard trial design for evaluating biomarker effect modification, $S(1)$ is unobserved in all placebo recipients, and thus the validity of (A4) cannot be tested in general. (A4) may be violated if, within subgroups defined by $\{S(0), X\}$, the occurrence of the event $Y(0)=1$ is correlated with $S(1)$ because of unmeasured factors; we discuss its plausibility in the analysis of the dengue vaccine trials in Section 5. (A5) may be plausible because both $S(0)$ and $S_b$ measure pre-existing natural and/or vaccine-induced immunity, except that $S(0)$ is measured $\tau$ time units later than $S_b$. Section 2.2 describes a calibration technique that relaxes this assumption.

1.4. Establishing identifiability

Each of the two conditional risks $\mathrm{risk}_1(s_1)$ and $\mathrm{risk}_0(s_1)$ in $\mathrm{mCEP}(s_1)$, defined in (1.1), is identified by the observed data under assumptions (A1)–(A5), as sketched here. Bayes’ theorem and (A4) yield

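$$\mathrm{risk}_0(s_1) = \iint \mathrm{risk}_0(s_0, x)\, dF_{S(0),X \mid S(1)}(s_0, x \mid s_1) = \frac{\iint \mathrm{risk}_0(s_0, x)\, f_{S(1)\mid S(0),X}(s_1 \mid s_0, x)\, f_{S(0)\mid X}(s_0 \mid x)\, g_X(x)\, ds_0\, dx}{f_{S(1)}(s_1)}, \qquad (1.2)$$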

where $\mathrm{risk}_0(s_0, x) = P\{Y(0)=1 \mid S(0)=s_0, X=x\}$, $F_{S(0),X\mid S(1)}$ is a conditional joint cumulative distribution function of $\{S(0), X\}$ given $S(1)$, $f_{S(1)\mid S(0),X}$ a conditional density of $S(1)$ given $\{S(0), X\}$, $f_{S(0)\mid X}$ a conditional density of $S(0)$ given $X$, $g_X$ a marginal density/probability function of $X$, and $f_{S(1)}$ a marginal density of $S(1)$. The decomposition in (1.2) is advantageous because it enables us to identify (and estimate) $\mathrm{risk}_0(s_1)$ by separately identifying each component in (1.2).

First, $f_{S(0)\mid X}$ is identified from placebo recipients with $\Delta=1$ and the sampling design, and $g_X$, which involves phase 1 covariates only, is identified from all randomized participants. Second, to identify $f_{S(1)\mid S(0),X}$, note that, under the sampling design,

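$$f_{S\mid S_b,X,Z=1,V=1}(s_1\mid s_0,x) = f_{S(1)\mid S_b,X,V=1}(s_1\mid s_0,x) = f_{S(1)\mid S_b,X}(s_1\mid s_0,x) = f_{S(1)\mid S(0),X}(s_1\mid s_0,x),$$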

where the first equality holds because $Z=1$ implies $S=S(1)$ (together with randomization, (A2)), the second equality holds because participants with $V=1$ are a random sample of all randomized participants, and the third equality holds by (A5). Because $f_{S\mid S_b,X,Z=1,V=1}$ is identified from the observed data, the needed term $f_{S(1)\mid S(0),X}$ is identified. Lastly, $\mathrm{risk}_0(s_0, x)$ in (1.2) is identified from placebo recipients with $\Delta=1$ and the sampling design, and the denominator $f_{S(1)}(s_1)$ follows as the integral of $f_{S(1)\mid S(0),X}(s_1\mid s_0,x)\, f_{S(0)\mid X}(s_0\mid x)\, g_X(x)$ over $(s_0, x)$. As with $\mathrm{risk}_0(s_0, x)$, the conditional risk $\mathrm{risk}_1(s_1)$ in (1.1) is identified from treatment recipients with $\Delta=1$ and the sampling design; this conditional risk is straightforward to identify because both $S(1)$ and $Y(1)$ are observable in the same treatment recipient.
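For a discrete toy version of (1.2), with sums in place of integrals, the identity can be checked numerically. The sketch below uses made-up probability tables (all values are hypothetical, and $X$ and $S(0)$ are discretized only for illustration). It computes $\mathrm{risk}_0(s_1)$ from the decomposition and compares it with brute-force conditioning on the full joint distribution of $\{X, S(0), S(1), Y(0)\}$ implied by (A4).

```python
from itertools import product

# Hypothetical tables: X in {0, 1}; S(0) and S(1) in {0, 1, 2}.
g_x = {0: 0.6, 1: 0.4}                                   # g_X(x)
f_s0_given_x = {0: {0: 0.5, 1: 0.3, 2: 0.2},             # f_{S(0)|X}(s0 | x)
                1: {0: 0.2, 1: 0.3, 2: 0.5}}
f_s1_given_s0x = {                                       # f_{S(1)|S(0),X}(s1 | s0, x)
    (0, 0): {0: 0.5, 1: 0.3, 2: 0.2}, (1, 0): {0: 0.2, 1: 0.5, 2: 0.3},
    (2, 0): {0: 0.1, 1: 0.2, 2: 0.7}, (0, 1): {0: 0.4, 1: 0.4, 2: 0.2},
    (1, 1): {0: 0.1, 1: 0.5, 2: 0.4}, (2, 1): {0: 0.05, 1: 0.15, 2: 0.8},
}
risk0_s0x = {(0, 0): 0.10, (1, 0): 0.06, (2, 0): 0.03,   # P(Y(0)=1 | S(0)=s0, X=x)
             (0, 1): 0.12, (1, 1): 0.07, (2, 1): 0.04}


def risk0_via_decomposition(s1):
    """Discrete analogue of (1.2): weighted average of risk0(s0, x)."""
    num = den = 0.0
    for x, s0 in product(g_x, (0, 1, 2)):
        w = f_s1_given_s0x[(s0, x)][s1] * f_s0_given_x[x][s0] * g_x[x]
        num += risk0_s0x[(s0, x)] * w
        den += w                          # den accumulates f_{S(1)}(s1)
    return num / den


def risk0_via_joint(s1):
    """Brute force over the joint of (X, S(0), S(1), Y(0)) under (A4)."""
    p_s1 = p_s1_and_y1 = 0.0
    for x, s0, s1_val, y0 in product(g_x, (0, 1, 2), (0, 1, 2), (0, 1)):
        r = risk0_s0x[(s0, x)]            # under (A4), S(1) does not enter
        p = (g_x[x] * f_s0_given_x[x][s0] * f_s1_given_s0x[(s0, x)][s1_val]
             * (r if y0 == 1 else 1.0 - r))
        if s1_val == s1:
            p_s1 += p
            if y0 == 1:
                p_s1_and_y1 += p
    return p_s1_and_y1 / p_s1


results = {s1: (risk0_via_decomposition(s1), risk0_via_joint(s1))
           for s1 in (0, 1, 2)}
```

The decomposition never needs the joint law of $Y(0)$ and $S(1)$ directly, which is what makes each factor separately estimable.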

2. Estimation method

2.1. Modeling assumption

We develop an estimation and inference method under the following modeling assumption:

  • (A6) The risk of $Y(z)=1$ conditional on $S(z)$ and $X$ follows a generalized linear model (GLM) for $z=0,1$.

(A6) for $z=1$ has been made in previous papers and constitutes a standard regression problem without identifiability issues. (A6) for $z=0$ is novel; it replaces the untestable “placebo structural risk” assumption of previous papers that the risk of $Y(0)=1$ conditional on $S(1)$ and $X$ follows a GLM. The GLMs specified in (A6) are estimated in participants with $\Delta=1$ using methods for case-cohort designs.

2.2. Estimation of the causal estimand

We propose a plug-in estimator for $\mathrm{mCEP}(s_1)$ that separately estimates $\mathrm{risk}_1(s_1)$ and $\mathrm{risk}_0(s_1)$. We estimate $\mathrm{risk}_1(s_1)$ by fitting the GLM specified in (A6), accounting for the case-cohort sampling of $S$. To estimate $\mathrm{risk}_0(s_1)$, leveraging the identifiability results, we estimate $f_{S(1)\mid S(0),X}$ by an estimate of $f_{S(1)\mid S_b,X}$, which we obtain via non-parametric kernel smoothing. Because participants with $V=1$ constitute a random sample of all randomized participants, $f_{S(1)\mid S_b,X}$ is estimated by an estimate of $f_{S\mid S_b,X,Z=1,V=1}$ computed from treatment recipients with $V=1$.
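A kernel estimate of a conditional density such as $f_{S\mid S_b,Z=1,V=1}$ can be formed as the ratio of a joint to a marginal kernel density estimate. The sketch below omits the covariates $X$, uses Gaussian kernels with fixed, hand-picked bandwidths, and simulates hypothetical data; the NP-TE estimator in Section 4 instead uses the generalized product kernel method of Hall and others (2004), with bandwidths chosen by likelihood cross-validation.

```python
import math
import random


def gauss_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def cond_density(s1, sb, pairs, h_s1, h_sb):
    """Kernel estimate of f(s1 | sb) from observed (sb_i, s1_i) pairs.

    Ratio of the joint product-kernel KDE to the marginal KDE of sb;
    the 1/h_sb scaling cancels between numerator and denominator.
    """
    num = 0.0
    den = 0.0
    for sb_i, s1_i in pairs:
        w = gauss_kernel((sb - sb_i) / h_sb)
        num += w * gauss_kernel((s1 - s1_i) / h_s1) / h_s1
        den += w
    return num / den


# Hypothetical treatment-group subcohort data: S positively related to S_b.
rng = random.Random(7)
pairs = []
for _ in range(200):
    sb_i = rng.gauss(2.0, 0.5)
    pairs.append((sb_i, 1.0 + 0.8 * sb_i + rng.gauss(0.0, 0.3)))

# For a fixed sb, the estimate should integrate to about 1 over s1.
step = 0.01
grid = [-5.0 + step * k for k in range(1501)]
mass = sum(cond_density(s, 2.0, pairs, h_s1=0.3, h_sb=0.3) for s in grid) * step
```

For a fixed $s_b$, the estimate is a weighted mixture of Gaussian bumps centred at the observed $S$ values, with weights favouring participants whose $S_b$ is close to $s_b$.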

The above approach to estimating $f_{S(1)\mid S(0),X}$ assumes (A5), whose veracity may be supported by regression modeling indicating that the association between $S$ and $S_b$ is close to the identity among placebo recipients with measured $S_b$. Otherwise, (A5) could be violated, and an estimated regression model, e.g., Inline graphic, could be employed for calibration to estimate $f_{S(1)\mid S(0),X}$.

We estimate $f_{S(0)\mid X}$ by estimating $f_{S\mid X,Z=0}$ via non-parametric kernel smoothing after random proportional deletion of a subset of cases among participants with measured $S$, so as to attain the same case:control ratio as in the target population, which is represented by randomized participants observed to be endpoint-free at $\tau$ (i.e., $Y^{\tau}=0$). An alternative, potentially more efficient approach would use inverse probability weighting; however, we conjecture that the efficiency gain would be minimal in the rare-endpoint setting common in vaccine trials. The multivariate density/probability function $g_X$ is estimated using phase 1 covariate data from all randomized participants. In Section 4, we also consider an alternative parametric estimator for the density functions in (1.2) based on maximum likelihood estimation in the Gaussian family of distributions.
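The random proportional deletion of cases can be sketched as follows. The record layout (dictionaries with a `"y"` key) and the counts in the example are hypothetical, and the target case:control ratio is supplied by the caller from the phase 1 participants with $Y^{\tau}=0$.

```python
import random


def proportional_case_deletion(measured, n_cases_cohort, n_controls_cohort, rng):
    """Randomly drop cases so that the case:control ratio among participants
    with measured S matches the ratio in the phase 1 cohort."""
    cases = [r for r in measured if r["y"] == 1]
    controls = [r for r in measured if r["y"] == 0]
    target_ratio = n_cases_cohort / n_controls_cohort
    n_keep = min(len(cases), round(target_ratio * len(controls)))
    return controls + rng.sample(cases, n_keep)


# Hypothetical arm: 2400 controls and 100 cases in the cohort; a 10%
# subcohort yields 240 measured controls, while all 100 cases are measured.
measured = [{"y": 0} for _ in range(240)] + [{"y": 1} for _ in range(100)]
kept = proportional_case_deletion(measured, 100, 2400, random.Random(1))
```

Here the cohort ratio is 1:24, so 10 of the 100 measured cases are retained alongside the 240 controls before the density is smoothed.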

Pointwise and simultaneous Wald-type CIs for $\mathrm{mCEP}(s_1)$ are obtained by evaluating $\widehat{\mathrm{mCEP}}(s_1)$ in $B$ bootstrap samples, in which cases and controls are resampled separately so that each bootstrap sample has a fixed number of cases and controls. We construct the CIs in two steps: first, CIs for the $g$-transformed estimand (defined below) are constructed, and then the inverse transformation is applied to the confidence bounds.
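A single bootstrap replicate with separate resampling of cases and controls might look like the sketch below; in practice, the whole estimator, including the kernel density estimates, would be recomputed on each replicate. The record format is hypothetical.

```python
import random


def case_control_bootstrap(records, rng):
    """One bootstrap sample that resamples cases and controls separately,
    so the numbers of cases and controls are fixed across replicates."""
    cases = [r for r in records if r["y"] == 1]
    controls = [r for r in records if r["y"] == 0]
    return (rng.choices(cases, k=len(cases)) +
            rng.choices(controls, k=len(controls)))


records = [{"id": i, "y": int(i % 20 == 0)} for i in range(1000)]  # 50 cases
rng = random.Random(3)
boot_samples = [case_control_bootstrap(records, rng) for _ in range(5)]
```

Fixing the case count matters when endpoints are rare: an ordinary bootstrap could draw unusually few cases and inflate the variability of the fitted risk models.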

2.3. Simultaneous confidence interval for the mCEP curve

Let $\theta(s_1) = g\{\mathrm{mCEP}(s_1)\}$ be a “symmetrizing” transformation of the mCEP curve estimand that helps make Wald-type CIs more accurate in finite samples; e.g., if $h(x,y)=1-x/y$, we consider $g(x)=\log(1-x)$, or, if $h$ is an additive difference, we may consider the identity transformation or Inline graphic. For an arbitrary subset $\mathcal{S}$ of the support of $S(1)$, denote

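$$T_{\mathcal{S}} = \sup_{s_1 \in \mathcal{S}} \frac{\left|\hat\theta(s_1) - \theta(s_1)\right|}{\widehat{\mathrm{se}}\{\hat\theta(s_1)\}},$$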

where $\hat\theta(s_1) = g\{\widehat{\mathrm{mCEP}}(s_1)\}$ and $\widehat{\mathrm{se}}$ denotes a standard error. For a fixed $\alpha \in (0,1)$, we define $c_{\alpha}$ as the solution to the equation $P(T_{\mathcal{S}} \le c_{\alpha}) = 1-\alpha$.

Further, let $\hat\theta^{(b)}(s_1)$ be the estimate of $\theta(s_1)$ based on the $b$-th bootstrap sample, and let $\hat\sigma(s_1)$ be the sample standard deviation of the bootstrap estimates $\hat\theta^{(b)}(s_1)$, $b=1,\ldots,B$. Let $T^{(b)} = \sup_{s_1\in\mathcal{S}} |\hat\theta^{(b)}(s_1) - \hat\theta(s_1)| / \hat\sigma(s_1)$. Because the distributions of $T_{\mathcal{S}}$ and $T^{(b)}$ are asymptotically equivalent, we estimate $c_{\alpha}$ by $\hat c_{\alpha}$, defined as the empirical quantile of $T^{(b)}$, $b=1,\ldots,B$, at the probability level $1-\alpha$. Finally, the simultaneous Wald-type bootstrap $(1-\alpha)\times 100\%$ CI for the mCEP curve is obtained by applying $g^{-1}$ to the bounds

$$\hat\theta(s_1) \pm \hat c_{\alpha}\, \hat\sigma(s_1), \qquad s_1 \in \mathcal{S}.$$

Note that the CI width depends on the size of $\mathcal{S}$, which may be chosen based on statistical and biological considerations.
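The simultaneous band can be computed directly from bootstrap replicates on a grid of $s_1$ values in $\mathcal{S}$. The sketch below assumes the treatment-efficacy contrast with $g(x)=\log(1-x)$, a simple order-statistic definition of the empirical quantile, and synthetic bootstrap replicates in place of a real estimator; the function and variable names are our own.

```python
import math
import random
import statistics


def empirical_quantile(values, prob):
    """Order-statistic quantile: smallest value with empirical CDF >= prob."""
    ordered = sorted(values)
    k = max(0, math.ceil(prob * len(ordered)) - 1)
    return ordered[k]


def simultaneous_band(theta_hat, theta_boot, alpha=0.05):
    """theta_hat: g-transformed estimates on a grid; theta_boot: B rows of
    bootstrap estimates on the same grid. Returns (c_alpha, lower, upper)
    on the g scale."""
    sigma = [statistics.stdev(col) for col in zip(*theta_boot)]
    t_boot = [max(abs(tb - th) / s for tb, th, s in zip(row, theta_hat, sigma))
              for row in theta_boot]
    c_alpha = empirical_quantile(t_boot, 1.0 - alpha)
    lower = [th - c_alpha * s for th, s in zip(theta_hat, sigma)]
    upper = [th + c_alpha * s for th, s in zip(theta_hat, sigma)]
    return c_alpha, lower, upper


def g_inv(theta):
    """Inverse of g(x) = log(1 - x)."""
    return 1.0 - math.exp(theta)


# Synthetic example: TE estimates on a 5-point grid, B = 2000 replicates.
rng = random.Random(11)
te_hat = [0.2, 0.4, 0.5, 0.6, 0.7]
theta_hat = [math.log(1.0 - te) for te in te_hat]
theta_boot = [[th + rng.gauss(0.0, 0.2) for th in theta_hat] for _ in range(2000)]
c_alpha, lo_g, hi_g = simultaneous_band(theta_hat, theta_boot)
# g is decreasing, so the upper bound on the g scale maps to the lower TE bound.
te_lower = [g_inv(v) for v in hi_g]
te_upper = [g_inv(v) for v in lo_g]
```

Because `c_alpha` exceeds the pointwise critical value of about 1.96 and grows with the number of grid points, the simultaneous band is wider than the pointwise bands, consistent with the dependence of CI width on $\mathcal{S}$.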

3. Hypothesis testing

It is of interest to evaluate, separately, the following four null hypotheses, each against a general alternative hypothesis:

  • $H_0^1$: $\mathrm{mCEP}(s_1) = \mathrm{CE}$ for all $s_1 \in \mathcal{S}$,

  • $H_0^2$: $\mathrm{mCEP}(s_1) = c$ for all $s_1 \in \mathcal{S}$ and a known constant $c$,

  • $H_0^3$: $\mathrm{mCEP}_1(s_1) = \mathrm{mCEP}_2(s_1)$ for all $s_1 \in \mathcal{S}$, where $\mathrm{mCEP}_1$ and $\mathrm{mCEP}_2$ are each associated with a different biomarker (measured in the same units), a different endpoint, or both, and

  • $H_0^4$: Inline graphic, where Inline graphic is a baseline dichotomous phase 1 covariate of interest included in $X$.

A test of $H_0^1$ assesses $S(1)$ as a modifier of the clinical treatment effect. It is commonly of interest to test $H_0^2$ with $c=0$, representing the absence of a treatment effect, to assess whether there exists a subgroup of treatment recipients, defined by biomarker levels in $\mathcal{S}$, with some treatment effect. A test of $H_0^3$ allows comparisons of two biomarkers and/or two endpoints, whereas a test of $H_0^4$ allows baseline subgroup or between-trial comparisons.

We follow Roy and Bose (1953) to construct tests of the hypotheses $H_0^k$, $k=1,2$. Let Inline graphic, where Inline graphic. For testing $H_0^1$ and $H_0^2$ at significance level $\alpha$, we use as the regions of rejection

[Display: rejection region for the test of $H_0^1$; equation image not recoverable]

and

[Display: rejection region for the test of $H_0^2$; equation image not recoverable]

respectively, where $\widehat{\mathrm{CE}}$ is an estimator for CE, and Inline graphic and Inline graphic are empirical quantiles in the bootstrap samples Inline graphic and Inline graphic, $b=1,\ldots,B$, respectively, at the probability level $1-\alpha$. We obtain the two-sided p-values as the empirical probabilities that Inline graphic and Inline graphic.

For testing $H_0^3$, let Inline graphic be a contrast pertaining to the transformation $g$. Let Inline graphic denote the sample standard deviation of the bootstrap estimates Inline graphic, $b=1,\ldots,B$, and let Inline graphic. We define Inline graphic as the empirical quantile in the bootstrap sample Inline graphic, $b=1,\ldots,B$, at the probability level $1-\alpha$. Subsequently, we test $H_0^3$ at significance level $\alpha$ by using as the region of rejection

[Display: rejection region for the test of $H_0^3$; equation image not recoverable]

and we obtain the two-sided p-value as the empirical probability that Inline graphic. For testing $H_0^4$, we proceed as for the test of $H_0^3$ except that, because the baseline subgroups defined by Inline graphic are independent, we obtain Inline graphic as Inline graphic.
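Because the displayed test statistics did not survive extraction, the following is only a plausible sketch of a sup-type bootstrap test that two $g$-transformed curves are equal on a grid, not a reconstruction of the authors' exact statistic. By default it uses the bootstrap standard deviation of the contrast (as for correlated estimators); for independent groups it offers the usual formula $\sqrt{\hat\sigma_1^2+\hat\sigma_2^2}$ for the standard deviation of a difference. All names are hypothetical.

```python
import math
import random
import statistics


def sup_contrast_test(delta_hat, delta_boot, sd=None):
    """Sup-type bootstrap test of H0: delta(s1) = 0 on a grid.

    delta_hat  -- observed contrast theta_1(s1) - theta_2(s1) on the grid
    delta_boot -- B rows of bootstrap contrasts on the same grid
    sd         -- optional per-point SDs; defaults to the bootstrap SD
    Returns the observed statistic and a bootstrap p-value.
    """
    if sd is None:
        sd = [statistics.stdev(col) for col in zip(*delta_boot)]
    t_obs = max(abs(d) / s for d, s in zip(delta_hat, sd))
    # Centre bootstrap contrasts at the estimate to mimic the null distribution.
    t_boot = [max(abs(db - d) / s for db, d, s in zip(row, delta_hat, sd))
              for row in delta_boot]
    p_value = sum(t >= t_obs for t in t_boot) / len(t_boot)
    return t_obs, p_value


def independent_sd(sd1, sd2):
    """Pointwise SD of a difference of two independent estimators."""
    return [math.sqrt(a * a + b * b) for a, b in zip(sd1, sd2)]


rng = random.Random(5)
grid_size = 4
null_boot = [[rng.gauss(0.0, 1.0) for _ in range(grid_size)] for _ in range(1000)]
_, p_null = sup_contrast_test([0.0] * grid_size, null_boot)
alt_boot = [[rng.gauss(5.0, 1.0) for _ in range(grid_size)] for _ in range(1000)]
_, p_alt = sup_contrast_test([5.0] * grid_size, alt_boot)
```

An estimated contrast of zero gives a p-value of 1, whereas a contrast five standard deviations from zero at every grid point gives a p-value near 0.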

4. Simulation study

Consider the treatment efficacy estimand $\mathrm{TE}(s_1)$ defined by the contrast function $h(x,y)=1-x/y$. The simulation study aims to compare the finite-sample performance of the proposed estimator for $\mathrm{TE}(s_1)$ with that of an alternative pseudo-score estimator (PSN) of Huang (2018), and to examine the size and power of the proposed tests of $H_0^1$, $H_0^2$, and $H_0^4$. Note that the test of $H_0^3$ is identical to that of $H_0^4$ except that it additionally accounts for the correlation of the contrasted estimators for $\mathrm{TE}(s_1)$. Two approaches to probability density estimation in (1.2), non-parametric kernel smoothing and Gaussian maximum likelihood estimation, are considered throughout the simulation, resulting in two variants of the proposed estimator for $\mathrm{TE}(s_1)$ (denoted NP-TE and MLE-TE, respectively). Specifically, the NP-TE estimator employs the generalized product kernel density estimation method of Hall and others (2004), with optimal bandwidths selected by likelihood cross-validation.

The simulation design mimics characteristics of the dengue vaccine trials introduced in Section 1. We generate Inline graphic as follows. First, we generate i.i.d. vectors from Inline graphic, where Inline graphic, Inline graphic with Inline graphic for all Inline graphic, and the correlation matrix Inline graphic is chosen to emulate relationships between biomarker measurements at baseline and month 13 observed in the dengue trials:

[Display: correlation matrix of the simulated biomarker triplet; equation image not recoverable]

Then each component of the generated triplets is left-censored at the value of 1.5, which represents, e.g., the biomarker assay’s lower limit of quantitation.
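The data-generating step can be sketched with a Cholesky factor of the covariance matrix. The means, standard deviations, and correlation matrix below are hypothetical placeholders (the values used in the study are not reproduced here), the ordering $(S_b, S(0), S(1))$ is our assumption, and censored values are set equal to the limit 1.5, which is one common convention rather than necessarily the authors' choice.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def draw_triplets(n, mean, sd, corr, lower_limit, rng):
    """Draw n trivariate normal vectors and left-censor each component."""
    cov = [[corr[i][j] * sd[i] * sd[j] for j in range(3)] for i in range(3)]
    L = cholesky(cov)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        out.append([max(v, lower_limit) for v in x])   # left-censoring
    return out


mean = [2.0, 2.0, 2.8]           # hypothetical (S_b, S(0), S(1)) means
sd = [0.8, 0.8, 0.8]             # hypothetical standard deviations
corr = [[1.0, 0.9, 0.6],         # hypothetical, positive definite
        [0.9, 1.0, 0.6],
        [0.6, 0.6, 1.0]]
triplets = draw_triplets(1000, mean, sd, corr, 1.5, random.Random(13))
```

A high correlation between the first two components mimics the close relationship between baseline and month 13 titers in placebo recipients that motivates (A5).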

Furthermore, using conditional independence in (A4), we posit a probit model for the association of the biomarkers Inline graphic with the endpoint indicator in each treatment group:

[Equation (4.1): probit model for the endpoint risk in each treatment group; image not recoverable]

where $\Phi$ is the cumulative distribution function of $N(0,1)$. Model (4.1) yields

[Equation (4.2): conditional risks implied by model (4.1); image not recoverable]

where Inline graphic is the conditional probability density of Inline graphic given Inline graphic. For Inline graphic, denote the marginal mean risk Inline graphic and the risk “gradient” Inline graphic, where Inline graphic is the quantile of the marginal distribution of Inline graphic at probability Inline graphic. The values of the probit model coefficients Inline graphic are chosen such that Inline graphic, Inline graphic, Inline graphic, and Inline graphic. We consider Inline graphic to reflect the assumed positive correlation between Inline graphic and Inline graphic. Using (4.2), the resultant Inline graphic for Inline graphic, representing the truth, is shown in Figure 1 (solid curve).
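With Gaussian biomarkers and a probit placebo model, a placebo-arm conditional risk in the spirit of (4.2) has a closed form, which makes a convenient check on numerical integration. The sketch assumes, hypothetically, that $(S(0), S(1))$ is bivariate normal without left-censoring, and that $P\{Y(0)=1\mid S(0)=s_0\}=\Phi(a_0+b_0 s_0)$ and $P\{Y(1)=1\mid S(1)=s_1\}=\Phi(a_1+b_1 s_1)$; these forms and all parameter values are our assumptions, not the exact specification of (4.1). It relies on the standard identity $E\{\Phi(a+bW)\}=\Phi\{(a+bm)/\sqrt{1+b^2v}\}$ for $W\sim N(m,v)$.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cond_moments(s1, mu0, mu1, sd0, sd1, rho):
    """Mean and variance of S(0) given S(1) = s1 for a bivariate normal."""
    m = mu0 + rho * sd0 / sd1 * (s1 - mu1)
    v = sd0 ** 2 * (1.0 - rho ** 2)
    return m, v


def risk0_closed_form(s1, a0, b0, **biv):
    """E{Phi(a0 + b0 S(0)) | S(1) = s1} via the Gaussian-probit identity."""
    m, v = cond_moments(s1, **biv)
    return norm_cdf((a0 + b0 * m) / math.sqrt(1.0 + b0 ** 2 * v))


def risk0_numeric(s1, a0, b0, n=4000, **biv):
    """Midpoint-rule integral of Phi(a0 + b0*s0) f(s0 | s1) over s0."""
    m, v = cond_moments(s1, **biv)
    half = 10.0 * math.sqrt(v)
    step = 2.0 * half / n
    total = 0.0
    for k in range(n):
        s0 = m - half + (k + 0.5) * step
        total += norm_cdf(a0 + b0 * s0) * norm_pdf(s0, m, v)
    return total * step


biv = dict(mu0=2.0, mu1=2.8, sd0=0.8, sd1=0.8, rho=0.6)  # hypothetical
a0, b0, a1, b1 = -1.5, -0.2, -1.6, -0.4                  # hypothetical


def te(s1):
    """TE(s1) = 1 - risk_1(s1) / risk_0(s1) under the assumed models."""
    return 1.0 - norm_cdf(a1 + b1 * s1) / risk0_closed_form(s1, a0, b0, **biv)
```

With the treatment-arm slope steeper than the induced placebo-arm slope, $\mathrm{TE}(s_1)$ increases in $s_1$, the qualitative shape of an effect-modifying biomarker.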

Fig. 1.


The true $\mathrm{TE}(s_1)$ curves in the simulation design, each satisfying Inline graphic and Inline graphic. In addition, the solid curve reflects Inline graphic and Inline graphic. The dot-dashed curve, used only to assess the power of the test of $H_0^4$, reflects the same Inline graphic but Inline graphic. The dashed line, used only to assess the size of the tests of $H_0^1$ and $H_0^2$, reflects Inline graphic.

To estimate $\mathrm{TE}(s_1)$, the proposed NP-TE and MLE-TE estimators fit model (4.1) with a logit rather than a probit link function. The PSN estimator of Huang (2018) for $\mathrm{TE}(s_1)$ models the endpoint risk as a function of Inline graphic and Inline graphic and uses a discrete baseline covariate to predict a missing $S(1)$ value. To implement the PSN estimator, we discretize $S_b$ by quartiles to obtain Inline graphic, which is used as the auxiliary variable for predicting $S(1)$. We model Inline graphic using the PSN approach and construct the corresponding pointwise Wald-type CI for $\mathrm{TE}(s_1)$ using the analytical variance estimator developed by Huang (2018). We examine and compare the finite-sample relative bias and mean squared error (MSE) of the proposed and PSN estimators for $\mathrm{TE}(s_1)$, as well as the coverage probabilities of pointwise and simultaneous 95% Wald-type bootstrap CIs for $\mathrm{TE}(s_1)$.
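The performance summaries can be computed at each grid point from the Monte Carlo replicates. The helper names are hypothetical, and the Monte Carlo error band uses the usual binomial standard error around the nominal coverage, which is one common choice rather than necessarily the one used in the study.

```python
import math


def relative_bias(estimates, truth):
    """(mean estimate - truth) / truth."""
    return (sum(estimates) / len(estimates) - truth) / truth


def mse(estimates, truth):
    """Mean squared error of the estimates around the truth."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def coverage(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the truth."""
    return sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)


def mc_error_band(nominal, n_reps, z=1.96):
    """Nominal coverage +/- z binomial Monte Carlo standard errors."""
    half = z * math.sqrt(nominal * (1.0 - nominal) / n_reps)
    return nominal - half, nominal + half


rb = relative_bias([0.4, 0.6], 0.5)
err = mse([0.4, 0.6], 0.5)
cov = coverage([(0.3, 0.7), (0.55, 0.9)], 0.5)
band = mc_error_band(0.95, 1000)   # about (0.9365, 0.9635) for 1000 replicates
```

An empirical coverage outside `band` would be flagged as over- or undercoverage beyond what Monte Carlo noise alone explains.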

Modifications of the described mechanism are needed to generate data under $H_0^1$, $H_0^2$, and $H_0^4$, and under the respective alternative hypotheses. To evaluate the size of the tests of $H_0^1$ and $H_0^2$, we set Inline graphic in (4.1) and recompute Inline graphic and Inline graphic in order to maintain the marginal probabilities Inline graphic and Inline graphic. The respective constant $\mathrm{TE}(s_1)$ curve is shown in Figure 1 (dashed line). We evaluate the power of the tests of $H_0^1$ and $H_0^2$ in the setting described above. This setting also serves to evaluate the size of the test of $H_0^4$ by drawing two independent samples, each used to estimate $\mathrm{TE}(s_1)$ separately and representing distinct populations. To evaluate the power of the test of $H_0^4$, the first sample follows the setting specified above, while the second sample follows the same setting except Inline graphic. The resulting $\mathrm{TE}(s_1)$ curve for this modified setting is also shown in Figure 1 (dot-dashed curve).

We consider a three-phase case-cohort sampling design. In phase 1, 5000 subjects are randomized at a 1:1 ratio to receive treatment ($Z=1$) or placebo ($Z=0$) and are followed for the binary outcome $Y$. Although this is not required, for simplicity we assume that all endpoints occur after the time $\tau$ at which $S$ is measured (i.e., $Y^{\tau}=0$ for all subjects in the absence of drop-out). In phase 2, we measure $S$ in a Bernoulli sample $\mathcal{V}$, drawn at baseline with sampling probability $p$, and in all cases ($Y=1$), whether or not they are in this subcohort. In phase 3, we measure $S_b$ in the subcohort $\mathcal{V}$ only; that is, $S_b$ is missing in cases not included in $\mathcal{V}$, and the proportion of such cases varies with $p$. We evaluate the performance of the estimation and inferential procedures for $\mathrm{TE}(s_1)$ as a function of $p$, setting $p=0.1$, 0.25, and 0.5 (note that $p$ was 0.1 and 0.2 in the two dengue trials). The results are based on Inline graphic replicated data sets, with 500 bootstrap samples drawn in each data set.

For all values of $p$, the NP-TE and MLE-TE estimators exhibit minimal bias for $\mathrm{TE}(s_1)$, with an increase in bias in the left-censored tail, whereas the PSN estimator is heavily biased in both tails for Inline graphic, and its bias becomes comparable to that of the NP-TE and MLE-TE estimators as $p$ increases to 0.5 (Figure 2, top row). Additionally, for Inline graphic, the NP-TE and MLE-TE estimators substantially reduce the MSE in both tails compared with the PSN estimator, with comparable MSE across all three estimators for Inline graphic (Figure 2, bottom row). Coverage probabilities of pointwise 95% Wald-type bootstrap CIs induced by the NP-TE and MLE-TE estimators are within the Monte Carlo (MC) error band, except for slight undercoverage in the central region by the MLE-TE estimator using Inline graphic (see Section 3 of the supplementary material available at Biostatistics online). In contrast, pointwise 95% Wald-type CIs induced by the PSN estimator uniformly overcover for $p=0.1$, with coverage probabilities within the MC error band for $p=0.25$ and 0.5. The simultaneous 95% Wald-type bootstrap CIs for $\mathrm{TE}(s_1)$ induced by both the NP-TE and MLE-TE estimators exhibit adequate coverage for all values of $p$ (Table 1). Overall, the NP-TE and MLE-TE estimators perform comparably, with a considerable precision gain and improved CI coverage over the PSN estimator for small values of $p$.

Fig. 2.

Top row: Estimated relative bias of the proposed NP-TE and MLE-TE estimators for Inline graphic, compared with that of the PSN estimator of Huang (2018), as a function of the probability Inline graphic of sampling into the phase 2/3 subcohort Inline graphic. Bottom row: Estimated MSE of the proposed NP-TE and MLE-TE estimators for Inline graphic, compared with that of the PSN estimator of Huang (2018), as a function of probability Inline graphic of sampling into the phase 2/3 subcohort Inline graphic.
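The relative bias and MSE in Figure 2 are averages over replicated data sets at a fixed biomarker value. The sketch below assumes the usual definitions, because the formulas are not reproduced in this section. The function name and the three replicate estimates are hypothetical.

```python
def relative_bias_and_mse(estimates, truth):
    """Relative bias and mean squared error of an estimator at one biomarker
    value, averaged over replicated simulated data sets."""
    if truth == 0:
        # e.g. a log relative risk of exactly 0 has no relative bias
        raise ValueError("relative bias is undefined when the true value is 0")
    n = len(estimates)
    mean_est = sum(estimates) / n
    rel_bias = (mean_est - truth) / truth
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return rel_bias, mse


# Three hypothetical replicate estimates of a true value of 1.0
print(relative_bias_and_mse([0.9, 1.1, 1.3], 1.0))
```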

Table 1.

Estimated coverage probabilities of the simultaneous 95% Wald-type bootstrap CI for Inline graphic based on the NP-TE and MLE-TE estimators, where Inline graphic spans the observed values of Inline graphic, as a function of the probability Inline graphic of sampling into the phase 2/3 subcohort Inline graphic

Inline graphic NP-TE MLE-TE
0.1 0.959 0.943
0.25 0.956 0.944
0.5 0.959 0.954
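A pointwise Wald-type bootstrap CI takes the form estimate ± 1.96 × bootstrap SE at each biomarker value. One common way to obtain a simultaneous band is sketched below as an assumption, not as the article's exact procedure. It replaces 1.96 with the 95th percentile of the bootstrap distribution of the maximum standardized deviation over the grid (a "sup-t" band). The function name and the simulated bootstrap curves in the usage lines are hypothetical. Any transformation of the estimand would be applied before calling the function and inverted afterwards.

```python
import math
import random
import statistics


def wald_bootstrap_bands(point, boot, z=1.959964, level=0.95):
    """Pointwise and simultaneous (sup-t) Wald-type bootstrap bands.

    point: estimates on a grid of K biomarker values.
    boot:  B bootstrap curves, each a list of K estimates on the same grid.
    Returns two lists of (lower, upper) pairs.
    """
    k = len(point)
    # Bootstrap standard error at each grid point (must be > 0).
    se = [statistics.stdev([b[j] for b in boot]) for j in range(k)]
    pointwise = [(point[j] - z * se[j], point[j] + z * se[j]) for j in range(k)]
    # For each bootstrap curve, the largest standardized deviation over the grid.
    sup_t = sorted(max(abs(b[j] - point[j]) / se[j] for j in range(k)) for b in boot)
    # Empirical `level` quantile of the sup statistic is the simultaneous critical value.
    c = sup_t[min(len(sup_t) - 1, math.ceil(level * len(sup_t)) - 1)]
    simultaneous = [(point[j] - c * se[j], point[j] + c * se[j]) for j in range(k)]
    return pointwise, simultaneous


# Hypothetical example: 500 bootstrap curves on a 20-point grid.
rng = random.Random(0)
estimate = [0.0] * 20
boot_curves = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(500)]
pw, sim = wald_bootstrap_bands(estimate, boot_curves)
print(round(pw[0][1] - pw[0][0], 2), round(sim[0][1] - sim[0][0], 2))
```

The simultaneous band is wider than the pointwise band because it must cover the whole curve at once, not each grid value separately.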

Table 2 summarizes the observed sizes and powers of the tests of Inline graphic, Inline graphic, and Inline graphic described in Section 3, based on both the NP-TE and MLE-TE estimators. For all values of Inline graphic, the sizes of the tests of Inline graphic and Inline graphic are in good agreement with the nominal significance level, whereas the test of Inline graphic is markedly conservative. Nevertheless, for each Inline graphic, the power of the test of Inline graphic is only slightly smaller than that of the test of Inline graphic, suggesting that the former test is also useful in practice. The power of the test of Inline graphic is low for the given comparison (see Figure 1), indicating that larger contrasts would be needed for detection with sufficient power. Finally, the tests based on the MLE-TE estimator are slightly more powerful than those based on the NP-TE estimator.

Table 2.

Size and power of the two-sided tests, based on the NP-TE and MLE-TE estimators, of Inline graphic, Inline graphic with Inline graphic, and Inline graphic against general alternative hypotheses, as a function of the probability Inline graphic of sampling into the phase 2/3 subcohort Inline graphic. The nominal significance level is taken to be 0.05

  Test of Inline graphic   Test of Inline graphic   Test of Inline graphic
Inline graphic Size Power   Size Power   Size Power
  NP-TE
0.1 0.01 0.73   0.04 0.83   0.04 0.12
0.25 0.01 0.84   0.05 0.89   0.04 0.15
0.5 0.01 0.89   0.05 0.93   0.04 0.18
  MLE-TE
0.1 0.01 0.87   0.06 0.92   0.05 0.17
0.25 0.01 0.91   0.05 0.95   0.05 0.20
0.5 0.01 0.92   0.06 0.96   0.05 0.20

5. Application

The two harmonized Phase 3 dengue trials, introduced in Section 1, randomized 31 144 healthy children (aged 2–14 and 9–16 years in Asia and Latin America, respectively) at a 2:1 ratio. Children received three doses of either Sanofi Pasteur’s recombinant, live, attenuated, tetravalent dengue vaccine (Dengvaxia/CYD-TDV) or placebo at months 0, 6, and 12 after randomization (Capeding and others, 2014; Villar and others, 2015). Participants were followed with active surveillance for 25 months for the primary clinical endpoint of VCD. Because the current indication of CYD-TDV requires a minimum age of 9 years, we present a trial-pooled analysis of the 25 824 children aged ≥9 years.

We aim to study modification of the effect of CYD-TDV vs. placebo on VCD risk by a biomarker Inline graphic under assignment to CYD-TDV. This biomarker is measured by the PRNT50 assay in a serum sample collected at the month 13 visit. Inline graphic is the average of log10 neutralizing antibody titers to the four parental dengue strains of the vaccine constructs, one for each dengue serotype (Moodie and others, 2018). The trials employed the following three-phase case-cohort sampling design. The phase 1 covariate vector Inline graphic, comprising age category (Inline graphic11 vs. Inline graphic11 years) and country, was observed in all participants. Baseline serum samples for measuring Inline graphic were collected from a Bernoulli sample Inline graphic of all participants, whereas month 13 serum samples were collected from all participants. Subsequently, Inline graphic was measured at month 13 in subcohort Inline graphic and in all VCD cases (phase 2), whereas the biomarker’s baseline value, Inline graphic, was measured only in subcohort Inline graphic (phase 3). Because Inline graphic is measured at month 13, we restrict the analysis to the 24 768 children aged ≥9 years who were at risk and free of VCD at month 13; this cohort includes 503 VCD cases. Of the 2766 controls with Inline graphic measured, all but 7 also had Inline graphic measured. In contrast, of the 502 cases with Inline graphic measured, only 55 (11.0%) also had Inline graphic measured.
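The counts above show how unevenly the phase 3 measurement is missing. The short calculation below uses only numbers reported in this paragraph, and it shows that the baseline surrogate measure (BSM) is almost always available for controls but rarely for cases:

```python
# Counts reported for the month 13 at-risk cohort (phase 2 / phase 3 data).
controls_with_biomarker = 2766
controls_with_both = controls_with_biomarker - 7   # all but 7 also had the BSM
cases_with_biomarker = 502
cases_with_both = 55

pct_controls = 100 * controls_with_both / controls_with_biomarker
pct_cases = 100 * cases_with_both / cases_with_biomarker
print(f"controls with BSM measured: {pct_controls:.1f}%")  # 99.7%
print(f"cases with BSM measured:    {pct_cases:.1f}%")     # 11.0%
```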

We consider two Inline graphic estimands of interest defined by contrasts Inline graphic and Inline graphic in (1.1), with identity and Inline graphic as their respective Inline graphic transformations. For the latter estimand, one could also use the identity transformation; simulations showed similar CI coverage under the two transformations. We employ the proposed estimator for Inline graphic, with probability densities estimated by the generalized kernel method of Hall and others (2004). We model the conditional VCD risk in (A6) with inverse probability-weighted logistic regression models. We use a hinge model (Fong and others, 2017) for the effect of Inline graphic for two reasons. First, a likelihood ratio test supports a better fit (Inline graphic), and the number of VCD cases is sufficient to support this more flexible model. Second, the hinge model specifies that variability in Inline graphic near the lower limit of quantitation (LLOQ) is not associated with VCD risk. This is a desirable (biologically credible) feature, given that such values largely reflect PRNT50 technical measurement error.
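A hinge (threshold) logistic model with inverse probability weights can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The hinge point `psi` is fixed here rather than estimated, and the data are simulated from a hypothetical model. Cases receive weight 1 and non-case subcohort members receive weight 1/p, mirroring the case-cohort sampling of the biomarker. `hinge` and `fit_weighted_logistic` are illustrative names.

```python
import math
import random


def hinge(s, psi):
    """Hinge basis: 0 below the hinge point psi, linear (s - psi) above it."""
    return max(s - psi, 0.0)


def fit_weighted_logistic(x, y, w, n_iter=25):
    """Weighted logistic regression of y on (1, x) by Newton-Raphson.

    The weights multiply each subject's score and information contribution,
    as in inverse probability-weighted estimating equations.
    Returns (intercept, slope).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - pi)           # weighted score contribution
            v = wi * pi * (1.0 - pi)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} U for the 2 x 2 information matrix I
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: logit P(Y = 1 | s) = -2 - 1 * hinge(s, 1.5).
rng = random.Random(2018)
psi, p = 1.5, 0.2
x, y, w = [], [], []
for _ in range(50000):
    s = rng.uniform(0.0, 4.0)
    case = rng.random() < 1.0 / (1.0 + math.exp(2.0 + hinge(s, psi)))
    if case or rng.random() < p:
        # cases always enter; non-cases enter only via the subcohort
        x.append(hinge(s, psi))
        y.append(1 if case else 0)
        w.append(1.0 if case else 1.0 / p)

print(fit_weighted_logistic(x, y, w))   # should be near (-2, -1)
```

Without the 1/p weights, the oversampling of cases would bias the intercept. The weights restore the full-cohort estimating equations.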

For the PSN estimator of Huang (2018), we model Inline graphic using a probit model adjusted for the main effects of Inline graphic and for the main effects and interaction of Inline graphic and Inline graphic with a hinge. The marginal risk conditional on Inline graphic is then estimated by integrating Inline graphic over the distribution of Inline graphic. This distribution is estimated through a multinomial model of Inline graphic on a natural cubic spline basis of Inline graphic with knots at its 0.25, 0.5, and 0.75 quantiles.
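A natural cubic spline basis of the kind used in the multinomial model above can be built with a truncated-power construction (Hastie, Tibshirani and Friedman, *The Elements of Statistical Learning*, Section 5.2.1). The sketch below places interior knots at the 0.25, 0.5, and 0.75 sample quantiles. As an assumption mirroring the default of R's `splines::ns`, it places boundary knots at the range of the data. The multinomial regression itself is not shown. This basis should span the same function space as `ns()` with these knots plus an intercept, although the individual columns differ.

```python
import statistics


def quartile_knots(data):
    """Boundary knots at the data range plus interior knots at the
    0.25, 0.5 and 0.75 sample quantiles ('inclusive' = R's default type 7)."""
    q = statistics.quantiles(data, n=4, method="inclusive")
    return [min(data)] + q + [max(data)]


def natural_cubic_spline_basis(x, knots):
    """Truncated-power natural cubic spline basis evaluated at x.

    knots must be sorted; the first and last are boundary knots. Returns
    len(knots) values: 1, x, and len(knots) - 2 nonlinear terms, all of
    which are linear beyond the boundary knots.
    """
    k = len(knots)

    def cube_pos(u):
        return max(u, 0.0) ** 3

    def d(j):
        return (cube_pos(x - knots[j]) - cube_pos(x - knots[-1])) / (knots[-1] - knots[j])

    return [1.0, x] + [d(j) - d(k - 2) for j in range(k - 2)]


knots = quartile_knots([float(v) for v in range(1, 101)])
print(knots)   # [1.0, 25.75, 50.5, 75.25, 100.0]
print([round(b, 3) for b in natural_cubic_spline_basis(60.0, knots)])
```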

In the cohort of participants at risk at month 13, logistic regression with treatment assignment as the sole covariate yields overall estimated VCD risks of 0.013 and 0.035 in vaccine and placebo recipients, respectively. This results in an overall log relative risk estimate of Inline graphic and an overall additive risk difference estimate of 0.022. We report and compare inference about the Inline graphic estimands defined by contrasts Inline graphic and Inline graphic using the proposed estimator (employing non-parametric kernel smoothing) and, where applicable, the PSN estimator. Figure 3 shows point estimates and pointwise and simultaneous 95% Wald-type bootstrap CIs for both Inline graphic estimands. The proposed estimator yields substantially narrower CIs than the PSN estimator in this setting. The test of Inline graphic for all Inline graphic proposed in Section 3 yields p-values Inline graphic and 0.16 for Inline graphic and Inline graphic, respectively. This indicates that Inline graphic is a significant modifier of relative VCD risk but not of the VCD risk difference. The test of Inline graphic for all Inline graphic, where 57 is the estimated hinge-point titer, yields p-values Inline graphic0.001 for both Inline graphic and Inline graphic. This indicates a significant vaccine effect on both the relative risk and the risk difference in a subgroup of vaccine recipients with Inline graphic near the assay’s LLOQ. The result suggests that the biomarker Inline graphic does not satisfy the average causal necessity property (Gilbert and Hudgens, 2008). If so, this biomarker does not fully mediate the CYD-TDV vaccine effect on VCD risk (VanderWeele, 2008).
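The overall contrasts follow directly from the two estimated risks. The calculation below uses the rounded risks reported above, so the log relative risk it gives may differ slightly from the value obtained from the unrounded fit:

```python
import math


def overall_contrasts(risk_vaccine, risk_placebo):
    """Overall log relative risk (vaccine/placebo) and additive risk
    difference (placebo - vaccine)."""
    log_rr = math.log(risk_vaccine / risk_placebo)
    risk_diff = risk_placebo - risk_vaccine
    return log_rr, risk_diff


log_rr, risk_diff = overall_contrasts(0.013, 0.035)
print(round(log_rr, 2), round(risk_diff, 3))   # -0.99 0.022
```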

Fig. 3.

Point and 95% Wald-type bootstrap CI estimates of the log relative risk (vaccine/placebo) of VCD and the additive VCD risk difference (placebo–vaccine) in the trial-pooled analysis of 9–16 year olds using the proposed estimator with non-parametric kernel smoothing vs. the PSN estimator of Huang (2018).

Because (A3) is evidently violated in the dengue trials, we developed a sensitivity analysis method that relaxes (A3) to the weaker monotonicity assumption of “no early harm by treatment” (A3’), i.e., Inline graphic, which is plausible for the dengue trials. The supplementary material available at Biostatistics online describes the method and applies it to the data. The results indicate that violation of (A3) leads to underestimation of the Inline graphic curve for both the log relative risk and risk difference contrasts. (A4) states that the neutralizing antibody response to vaccination does not affect placebo-group VCD risk after accounting for the existing natural neutralizing antibody response (due to prior dengue infections) and for measured baseline covariates. It could be violated if the ability to mount a strong neutralizing antibody response to vaccination is correlated with an intrinsic VCD susceptibility factor that the other variables do not fully capture. The natural response Inline graphic and the vaccine response Inline graphic measure a similar quantity with the same laboratory assay, which may limit the degree of any violation of (A4). Furthermore, in placebo recipients with measured Inline graphic, robust linear regression (Yohai, 1987) and locally weighted polynomial regression indicate an association between Inline graphic and Inline graphic close to identity, supporting the validity of (A5) (see the supplementary material available at Biostatistics online for details).
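A robust slope gives a rough check of whether the baseline measure and the month 13 response are related by the identity. The analysis above uses the high-breakdown MM-type regression of Yohai (1987) and locally weighted polynomial regression. The sketch below substitutes the simpler Theil–Sen estimator (the median of pairwise slopes), which is also resistant to outliers but is a different method. A slope near 1 and an intercept near 0 are consistent with an identity relationship. The data in the usage lines are synthetic.

```python
import statistics
from itertools import combinations


def theil_sen(x, y):
    """Theil-Sen line: slope = median of all pairwise slopes,
    intercept = median of the residuals y - slope * x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Synthetic data on the identity line with two gross outliers.
x = [float(v) for v in range(1, 12)]
y = list(x)
y[3], y[8] = 20.0, -5.0
print(theil_sen(x, y))   # (0.0, 1.0)
```

An ordinary least-squares fit to the same data would be pulled away from the identity line by the two outliers. The median-based fit is not.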

6. Discussion

Driven by data constraints of a BSM three-phase sampling design, this article develops semi-parametric inferential procedures for a principal stratification estimand, the mCEP curve, designed to assess modification of a clinical treatment effect by the potential outcome marker response under assignment to treatment. We demonstrate that current alternative approaches may be biased and highly inefficient in this setting. We show that, under a different set of assumptions, we can remove the untestable placebo structural risk modeling assumption common to all past EML and PS methods and employ non-parametric kernel smoothing to estimate the placebo group’s clinical endpoint risk conditional on the potential outcome Inline graphic under treatment assignment. We use a bootstrap procedure to construct pointwise and simultaneous CIs and tests of relevant hypotheses about Inline graphic estimands.

In particular, the proposed estimation method offers an alternative to the PS estimation method of Huang and others (2013) and Huang (2018), which estimates the same Inline graphic estimand. The following considerations may guide the choice between methods. First, the proposed method makes three assumptions not made by the PS method: (i) Inline graphic is conditionally independent of Inline graphic given Inline graphic and Inline graphic ((A4) above); (ii) the distribution of Inline graphic conditional on Inline graphic and Inline graphic or Inline graphic remains identical ((A5) above); and (iii) under treatment assignment Inline graphic, the clinical endpoint risk of Inline graphic conditional on the biomarker response Inline graphic and baseline covariates Inline graphic follows a GLM. Assumption (iii) replaces the placebo structural risk assumption of the PS method, namely that Inline graphic conditional on Inline graphic and Inline graphic follows a GLM (part of (A6) above). (A4) cannot be tested without close-out placebo vaccination. If the data from a standard trial with a BSM three-phase sampling design support the validity of (A5) and (A6), the present method may be preferable for its potentially substantial improvement in bias, efficiency, and CI coverage. Because (A4) is untestable in a standard trial and could easily be violated, future research on sensitivity analysis for, and relaxation of, this assumption is warranted. Second, the present method enables the use of non-parametric kernel smoothing, which estimates the Inline graphic curve more flexibly but requires bandwidth selection. For this reason, maximum likelihood estimation may be used with the present method in place of kernel smoothing, as demonstrated in Section 4. Third, to the best of our knowledge, the present method is the first to provide formal tests of the null hypotheses that the Inline graphic curve equals a constant and that the Inline graphic curve is identical in two baseline covariate subgroups.
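Kernel smoothing requires a choice of bandwidth. Hall and others (2004) select bandwidths for conditional densities by cross-validation. The sketch below shows only a simpler univariate analog: a Gaussian kernel density estimate whose bandwidth is picked from a candidate grid by leave-one-out likelihood cross-validation. It illustrates the idea under these assumptions and is not necessarily the procedure the authors used. The grid, the simulated data, and the function names are hypothetical.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / SQRT_2PI


def kde(x0, data, h):
    """Gaussian kernel density estimate at x0 with bandwidth h."""
    return sum(gauss_kernel((x0 - xi) / h) for xi in data) / (len(data) * h)


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of the KDE with bandwidth h."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        dens = sum(gauss_kernel((xi - xj) / h)
                   for j, xj in enumerate(data) if j != i) / ((n - 1) * h)
        total += math.log(dens)   # fails if h is so small that dens underflows to 0
    return total


def select_bandwidth(data, grid):
    """Bandwidth from `grid` maximizing the leave-one-out log-likelihood."""
    return max(grid, key=lambda h: loo_log_likelihood(data, h))


rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]   # hypothetical candidate bandwidths
h = select_bandwidth(data, grid)
print(h, round(kde(0.0, data, h), 3))
```

Leaving each point out prevents the criterion from rewarding ever-smaller bandwidths that simply place a spike on every observation.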

Recognizing the importance of assumption (A3) and the possibility that it is frequently violated, we developed a sensitivity analysis method that relaxes (A3) to the weaker monotonicity assumption of “no early harm by treatment” (A3’), i.e., Inline graphic. In practice, (A3’) is far more easily justified than (A3). The supplementary material available at Biostatistics online describes the method.

Because it uses non-parametric smoothing, the newly proposed method is not well suited to studying multivariate response biomarkers as effect modifiers. However, the approach may still be useful in settings where many or even high-dimensional response biomarkers are measured. One first defines a univariate score biomarker as the fitted values of a model, selected by statistical/machine learning, that best predicts the clinical outcome (Price and others, 2018). One then studies the score biomarker as an effect modifier.

Supplementary Material

kxy074_Supplementary_Materials

Acknowledgments

We thank the participants and investigators of the CYD14 and CYD15 dengue vaccine efficacy trials, our Sanofi Pasteur colleagues for collaboration and sharing of the data, Ted Holzman (Fred Hutchinson Cancer Research Center) for assistance with computational analysis, and Lindsay N. Carpp (Fred Hutchinson Cancer Research Center) for technical editing. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflict of Interest: None declared.

7. Software

All proposed methods are implemented in the R package pssmooth (Juraska, 2018), available on the Comprehensive R Archive Network. The R code used for computing and plotting the results reported in Sections 4 and 5 is available at https://github.com/mjuraska/mCEPcurve-three-phase.

Funding

Sanofi Pasteur and National Institute of Allergy and Infectious Diseases of the National Institutes of Health (Award Number R37AI054165).

References

  1. Capeding M. R., Tran N. H., Hadinegoro S. R. S., Ismail H. I. H. M., Chotpitayasunondh T., Chua M. N., Luong C. Q., Rusmil K., Wirawan D. N., Nallusamy R. and others (2014). Clinical efficacy and safety of a novel tetravalent dengue vaccine in healthy children in Asia: a phase 3, randomised, observer-masked, placebo-controlled trial. The Lancet 384, 1358–1365.
  2. Follmann D. (2006). Augmented designs to assess immune response in vaccine trials. Biometrics 62, 1161–1169.
  3. Fong Y., Huang Y., Gilbert P. B. and Permar S. R. (2017). chngpt: threshold regression model estimation and inference. BMC Bioinformatics 18(1), 454.
  4. Frangakis C. and Rubin D. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
  5. Gabriel E. and Gilbert P. (2014). Evaluating principal surrogate endpoints with time-to-event data accounting for time-varying treatment efficacy. Biostatistics 15, 251–265.
  6. Gabriel E. E., Sachs M. C. and Gilbert P. B. (2015). Comparing and combining biomarkers as principal surrogates for time-to-event clinical endpoints. Statistics in Medicine 34, 381–395.
  7. Gilbert P. B. and Hudgens M. G. (2008). Evaluating candidate principal surrogate endpoints. Biometrics 64, 1146–1154.
  8. Hall P., Racine J. and Li Q. (2004). Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association 99, 1015–1026.
  9. Huang Y. (2018). Evaluating principal surrogate markers in vaccine trials in the presence of multiphase sampling. Biometrics 74, 27–39.
  10. Huang Y. and Gilbert P. B. (2011). Comparing biomarkers as principal surrogate endpoints. Biometrics 67, 1442–1451.
  11. Huang Y., Gilbert P. B. and Wolfson J. (2013). Design and estimation for evaluating principal surrogate markers in vaccine trials. Biometrics 69, 301–309.
  12. Juraska M. (2018). pssmooth: Flexible and Efficient Evaluation of Principal Surrogates/Treatment Effect Modifiers. R package version 1.0.1, Comprehensive R Archive Network.
  13. Katzelnick L. C., Gresh L., Halloran M. E., Mercado J. C., Kuan G., Gordon A., Balmaseda A. and Harris E. (2017). Antibody-dependent enhancement of severe dengue disease in humans. Science 358, 929–932.
  14. Li Y., Taylor J. and Elliott M. (2010). A Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 66, 523–531.
  15. Moodie Z., Juraska M., Huang Y., Zhuang Y., Fong Y., Carpp L., Self S., Chambonneau L., Small R., Jackson N., Noriega F. and others (2018). Neutralizing antibody correlates analysis of tetravalent dengue vaccine efficacy trials in Asia and Latin America. Journal of Infectious Diseases 217, 742–753.
  16. Prentice R. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11.
  17. Price B., Gilbert P. and van der Laan M. (2018). Estimation of the optimal surrogate based on a randomized trial. Biometrics. doi: 10.1111/biom.12879 [Epub ahead of print].
  18. Roy S. N. and Bose R. C. (1953). Simultaneous confidence interval estimation. The Annals of Mathematical Statistics 24, 513–536.
  19. VanderWeele T. (2008). Simple relations between principal stratification and direct and indirect effects. Statistics and Probability Letters 78, 2957–2962.
  20. Villar L., Dayan G. H., Arredondo-García J. L., Rivera D. M., Cunha R., Deseda C., Reynales H., Costa M. S., Morales-Ramírez J. O., Carrasquilla G. and others (2015). Efficacy of a tetravalent dengue vaccine in children in Latin America. New England Journal of Medicine 372, 113–123.
  21. Yohai V. (1987). High breakdown-point and high-efficiency robust estimates for regression. Annals of Statistics 15, 642–656.
  22. Zigler C. M. and Belin T. R. (2011). The potential for bias in principal causal effect estimation when treatment received depends on a key covariate. Annals of Applied Statistics 5, 1876–1892.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
