Summary
An objective in randomized clinical trials is the evaluation of “principal surrogates,” which consists of analyzing how the treatment effect on a clinical endpoint varies over principal strata subgroups defined by an intermediate response outcome under both or one of the treatment assignments. The latter effect modification estimand has been termed the marginal causal effect predictiveness (mCEP) curve. This objective was addressed in two randomized placebo-controlled Phase 3 dengue vaccine trials for an antibody response biomarker whose three-phase sampling design rendered previously developed inferential methods highly inefficient. In this design, the biomarker was measured in a case-cohort sample and a key baseline auxiliary strongly associated with the biomarker (the “baseline surrogate measure”) was only measured in a further sub-sample. We propose a novel approach to estimation of the mCEP curve in such three-phase sampling designs that avoids the restrictive “placebo structural risk” modeling assumption common to past methods and that further improves robustness by the use of non-parametric kernel smoothing for biomarker density estimation. Additionally, we develop bootstrap-based procedures for pointwise and simultaneous confidence intervals and testing of four relevant hypotheses about the mCEP curve. We investigate the finite-sample properties of the proposed methods and compare them to those of an alternative method making the placebo structural risk assumption. Finally, we apply the novel and alternative procedures to the two dengue vaccine trial data sets.
Keywords: Biomarker, Dengue, Principal stratification, Principal surrogate endpoint, Three-phase sampling design, Treatment effect modification, Vaccine
1. Introduction
Over the past 20 years, a sizable body of work has accumulated on estimation of and inference on principal stratification estimands with application to “principal surrogate” evaluation. The goal of principal surrogate evaluation is to study how a clinical treatment effect in a randomized trial varies over principal strata subgroups defined by the potential outcome biomarker response under both treatment assignments (Frangakis and Rubin, 2002) or by the potential outcome biomarker response under one of the treatment assignments (Gilbert and Hudgens, 2008). These effect modification estimands have been called the causal effect predictiveness surface and the marginal causal effect predictiveness (mCEP) curve, respectively. Both estimands are useful, and, in this work, we focus on the mCEP curve estimand because it better facilitates decision-making and is more readily identifiable. Gilbert and Hudgens (2008) studied fully parametric and fully non-parametric estimated maximum likelihood estimation (EML) methods for this estimand, Huang and Gilbert (2011) relaxed the fully parametric EML methods to semi-parametric EML methods, and Huang and others (2013) and Huang (2018) replaced EML with pseudo-score (PS) estimation, demonstrating improved efficiency compared to EML and providing analytic variance estimation that was not possible via EML. These methods considered a binary clinical endpoint, and Gabriel and Gilbert (2014) and Gabriel and others (2015) extended the work to accommodate a time-to-event clinical endpoint subject to right-censoring. Moreover, Li and others (2010) and other papers from the same group studied full likelihood Bayesian methods, as did Zigler and Belin (2011).
The present work is motivated by two randomized placebo-controlled Phase 3 trials of a dengue vaccine, where the primary clinical endpoint of interest was symptomatic virologically confirmed dengue (VCD) occurrence between the month 13 visit and the month 25 visit. Overall vaccine efficacy [one minus relative risk (vaccine/placebo) of VCD times 100%] to prevent VCD over this follow-up period was estimated at 56.5% (95% CI 43.8–66.4) in the trial in Asia (Capeding and others, 2014) and at 60.8% (95% CI 52.0–68.0) in the trial in Latin America (Villar and others, 2015). In these trials with harmonized study designs and protocols, the biomarker
of interest to study as a modifier of vaccine efficacy (in participants free of VCD through month 13) was a participant’s average
neutralizing antibody titer against the four dengue virus strains represented in the vaccine construct (one of each serotype), measured from a serum sample taken at the month 13 study visit (Moodie and others, 2018). A particular three-phase case-cohort sampling design was used for measuring
from month 13 samples that rendered the previously developed EML and PS methods highly inefficient for solving the problem, and also opened an opportunity for a new approach studied here that better takes advantage of the data structure. (This work does not consider a Bayesian approach, which could also be fruitful for this setting.) Specifically, baseline serum samples from a random sample of approximately 10% (20%) of all participants in the trial in Asia (Latin America) were collected, whereas month 13 serum samples were collected from all participants in each trial. With covariate
defined the same as
except measured at baseline, both
and
were measured in all participants randomly sampled at entry into the subcohort, and
was measured in all participants experiencing the VCD primary endpoint. Because
and
are the same variable measured at different times, they are highly correlated, making
an ideal “baseline immunogenicity predictor” (Follmann, 2006; Gilbert and Hudgens, 2008) [referred to as a “baseline surrogate measure (BSM)” in Gabriel and Gilbert (2014)]. Such a predictor is a key ingredient of all of the EML and PS methods to yield reasonably precise estimation of the mCEP curve. However, all of the EML and PS methods require that
be measured from all vaccine recipients with
measured, which means that the methods applied to the data would discard data from approximately 90% or 80% of VCD endpoint cases in the vaccine group for the two trials, respectively.
However, with
the potential outcome of
for the vaccine (
) and placebo (
) group, the new opportunity generated by the dengue Phase 3 trials is that
, baseline covariates, and
may plausibly contain all information for VCD risk in the placebo group without needing to also condition on
, because neutralizing antibody titers are the key known predictor of VCD in both vaccinated and unvaccinated individuals [e.g., analysis and references in Katzelnick and others (2017)]. With this novel assumption ((A4) below), this method departs from previous methods. While this assumption can be challenged, it beneficially removes the need to make the “placebo structural risk” modeling assumption common to all past EML and PS methods that links the conditional risk of VCD if assigned placebo to the candidate surrogate if assigned vaccine, a quantity not identifiable from the observed data unless close-out placebo vaccination is used (Follmann, 2006), which was not done in the dengue vaccine trials. The other novel assumption underlying this new approach takes advantage of the fact that the baseline immunogenicity predictor is a BSM, which allows making a time-constancy assumption about the distribution of
conditional on baseline covariates and
or
((A5) below). By estimating the density of
conditional on baseline covariates and
with non-parametric kernel smoothing and not making the placebo structural risk modeling assumption, this new method gives more flexible estimation of the mCEP curve than the previous methods, with advantages in bias, efficiency, and confidence interval (CI) coverage as illustrated in Sections 4 and 5.
The remainder of this article is organized as follows. In Sections 1.1–1.4, we introduce notation, define the estimand of interest, state identifiability assumptions, establish identifiability based on these assumptions, and discuss plausibility of the assumptions and potential for their violations. We describe the estimation method, under a modeling assumption, including the construction of simultaneous CIs, in Section 2 and characterize procedures for testing of hypotheses of interest in Section 3. The design and findings from the simulation experiment are presented in Section 4. We apply the proposed estimation and inference procedures to data from the two dengue vaccine trials in Section 5.
1.1. Notation
We consider a randomized placebo-controlled trial with treatment assignment
(
, treatment;
, placebo), and a discrete or continuous univariate biomarker
measured at fixed time
after randomization. In many vaccine trials, placebo recipients have variable levels of
reflecting pre-existing immunity arising from past exposure (e.g., natural infection and/or prior vaccination) to the disease-causing pathogen (e.g., dengue virus, Plasmodium falciparum, or influenza virus). It is of interest to evaluate
as a modifier of the treatment effect on a binary clinical endpoint
(
, disease;
no disease) measured after
. To this end,
needs to be measured prior to
; therefore, we restrict the analysis to trial participants who are observed to be endpoint-free at
and denote this status as
. If
,
is undefined, and we set
.
We consider a three-phase outcome-dependent case-cohort sampling design as follows:
,
,
, and a baseline covariate vector
are measured in all randomized participants (phase 1). Next, at baseline, Bernoulli sampling of all randomized participants is used to determine a subcohort
, and the biomarker
is measured at time
in the subset of this subcohort with
as well as in all or almost all cases (those with
) with
, regardless of their membership in
(phase 2) [this is a classic case-cohort sampling design (Prentice, 1986)]. Finally, the biomarker at baseline
is measured only in the subcohort
with
(phase 3);
is the BSM in the Gabriel and Gilbert (2014) design. Let
and
indicate, respectively, that
and
are measured in phases 2 and 3. The sampling design is graphically illustrated in Supplementary Figure 1 available at Biostatistics online and may be expressed as
sampled from the same Bernoulli distribution for all participants and then, conditional on
,
sampled according to the sampling probability
![]() |
for a fixed constant
.
Our causal estimand of interest is the mCEP curve that has been previously studied in several papers. To define the mCEP curve, let
,
,
, and
be the potential outcomes of
,
,
, and
under treatment assignment
. If
,
and
is undefined; therefore, we set
.
1.2. Estimand of interest
Let
measure the overall causal effect of treatment on the clinical endpoint
, where
is a known contrast function such that
if and only if
. Denote
for
. We define the causal estimand of interest as
![]() |
(1.1) |
this estimand is termed the marginal CEP curve in the literature. If
is continuous (assumed henceforth), then (1.1) abuses notation for expositional simplicity, with the formal definition provided in the supplementary material available at Biostatistics online. The contrast
gives
an interpretation as the percent reduction in clinical endpoint risk among treatment recipients with biomarker level
compared to if they had been assigned placebo, whereas the additive-difference contrast
gives an attributable risk interpretation for treatment recipients with biomarker level
.
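The two contrasts discussed above can be illustrated with a short numerical sketch. The function names and the explicit formulas (one minus the treatment/placebo risk ratio, and the placebo-minus-treatment risk difference) are assumptions chosen to match the verbal interpretations given in this section; the risk values are hypothetical.

```python
def contrast_multiplicative(risk_trt: float, risk_plc: float) -> float:
    """One minus the treatment/placebo risk ratio (proportional risk reduction)."""
    return 1.0 - risk_trt / risk_plc


def contrast_additive(risk_trt: float, risk_plc: float) -> float:
    """Placebo-minus-treatment risk difference (attributable risk)."""
    return risk_plc - risk_trt


# Hypothetical conditional risks at a single biomarker level.
r1, r0 = 0.01, 0.04
reduction = contrast_multiplicative(r1, r0)   # proportional reduction in risk
attributable = contrast_additive(r1, r0)      # absolute reduction in risk
```

Both functions return zero exactly when the two risks coincide, which is the defining property required of a contrast function here.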
1.3. Identifiability assumptions
We suppose that
,
, are independent and identically distributed and assume no drop-out for simplicity. We consider commonly made identifiability assumptions (Gilbert and Hudgens, 2008; Huang and Gilbert, 2011; Huang and others, 2013; Huang, 2018):
(A1) Stable Unit Treatment Value Assumption (SUTVA):
of the
-th subject is independent of
,
.
(A2) Ignorable treatment assignment:
is conditionally independent of
given
.
(A3) Equal early clinical risk:
.
(A1) implies “consistency,” i.e.,
. SUTVA may be violated in vaccine trials due to herd immunity and other factors, but may be approximately valid if trial participants do not interact with each other and the study sites are in large geographically dispersed regions. (A2) holds by randomization that may be stratified by
. (A3) is more credible when
is near baseline relative to the length of follow-up and it takes time for the effect of the intervention to occur. For simplicity of exposition, henceforth all conditional and unconditional probabilities of
and densities of
implicitly condition on
.
We additionally consider the following identifiability assumptions:
(A4)
, i.e., the risk of
is conditionally independent of
given
.
(A5) Time constancy:
for all
, where
and
are conditional density functions of
.
In a standard trial design for biomarker effect modification evaluation,
is unobserved in all placebo recipients, and thus the validity of (A4) cannot be tested in general. (A4) may be violated if the occurrence of the event
is correlated with
within subgroups defined by
, stemming from unmeasured factors, and we discuss its plausibility in the analysis of the dengue vaccine trials in Section 5. (A5) may be plausible as both
and
measure pre-existing natural and/or vaccine-induced immunity, only
is measured
time units later than
. The estimation section includes a technique that relaxes this assumption.
1.4. Establishing identifiability
Each of the two conditional risks
and
in
, defined in (1.1), is identified by the observed data and assumptions (A1)–(A5), as sketched here. Bayes’ theorem and (A4) yield
![]() |
(1.2) |
where
,
is a conditional joint cumulative distribution function of
given
,
a conditional density of
given
,
a conditional density of
given
,
a marginal density/probability function of
, and
a marginal density of
. The decomposition in (1.2) is advantageous because it enables us to identify (and estimate)
by separately identifying each component in (1.2).
First,
is identified from placebo recipients with
and the sampling design, and
, which involves phase 1 covariates only, is identified from all randomized participants. Second, for identifying
, note that, under the sampling design,
![]() |
where the first equality holds because
implies
, the second equality holds because participants with
are a random sample of all randomized participants, and the third equality holds by (A5). Because
is identified from the observed data, the needed term
is identified. Lastly,
in (1.2) is identified from placebo recipients with
and the sampling design. Similar to this last term, the conditional risk
in (1.1) is identified from treatment recipients with
and the sampling design; this conditional risk is straightforward to identify because both
and
are observable from the same treatment recipient.
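To make the identification argument concrete, the following is a minimal numerical sketch of a Bayes-type decomposition of the kind described above: the placebo risk at a given treatment-arm biomarker level is obtained by averaging the placebo risk over the placebo-arm biomarker, weighted by the conditional density of the treatment-arm biomarker and the marginal density of the placebo-arm biomarker. Baseline covariates are omitted for brevity, and all densities, coefficients, and names are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def placebo_risk_given_s1(s1, risk_plc, dens_s1_given_s0, dens_s0, grid):
    """Average the placebo risk over the placebo-arm biomarker s0, weighting
    each s0 by (density of s1 given s0) x (density of s0), then normalize."""
    weights = [dens_s1_given_s0(s1, s0) * dens_s0(s0) for s0 in grid]
    numerator = sum(risk_plc(s0) * w for s0, w in zip(grid, weights))
    return numerator / sum(weights)  # the grid spacing cancels in the ratio


# Hypothetical ingredients (illustrative values, not estimates from any trial).
std_normal = NormalDist()
dens_s0 = NormalDist(mu=2.0, sigma=1.0).pdf


def dens_s1_given_s0(s1, s0):
    return NormalDist(mu=1.0 + 0.8 * s0, sigma=0.5).pdf(s1)


def risk_plc(s0):
    return std_normal.cdf(-1.5 - 0.3 * s0)  # probit-type placebo risk model


grid = [-4.0 + 0.01 * i for i in range(1201)]  # s0 from -4 to 8
risk_low = placebo_risk_given_s1(2.0, risk_plc, dens_s1_given_s0, dens_s0, grid)
risk_high = placebo_risk_given_s1(4.0, risk_plc, dens_s1_given_s0, dens_s0, grid)
```

In this toy configuration the placebo risk decreases in the placebo-arm biomarker and the two biomarkers are positively associated, so the identified placebo risk also decreases in the treatment-arm biomarker level.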
2. Estimation method
2.1. Modeling assumption
We develop an estimation and inference method under the following modeling assumption:
(A6) The risk of
conditional on
and
follows a generalized linear model (GLM) for
.
(A6) for
has been made in previous papers and constitutes a standard regression problem without identifiability problems. (A6) for
is novel, replacing the untestable “placebo structural risk” assumption from previous papers that
conditional on
and
follows a GLM. The GLMs specified in (A6) are estimated in participants with
using methods for case-cohort designs.
2.2. Estimation of the causal estimand
We propose a plug-in estimator for
by separately estimating
and
. We estimate
by fitting the GLM specified in (A6) accounting for the case-cohort sampling of
. To estimate
, leveraging the identifiability results, we estimate
by an estimate of
, which we obtain via non-parametric kernel smoothing. Because participants with
constitute a random sample from all randomized participants,
is estimated by an estimate of
generated using data from treatment recipients with
.
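A minimal sketch of a kernel-smoothed conditional density estimate of the kind used above is given next. It uses a Gaussian product kernel with fixed, hand-picked bandwidths, which is a simplifying assumption (data-driven bandwidth selection is not shown), and the function names and simulated pairs are hypothetical.

```python
import math
import random


def gauss(u: float) -> float:
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def cond_density(s1, sb, data, h_s1=0.3, h_sb=0.3):
    """Kernel estimate of the density of the time-tau biomarker at s1 given
    baseline biomarker sb, from (baseline, time-tau) pairs in `data`."""
    weights = [gauss((sb - b) / h_sb) for b, _ in data]
    numerator = sum(
        w * gauss((s1 - t) / h_s1) / h_s1 for w, (_, t) in zip(weights, data)
    )
    return numerator / sum(weights)


# Simulated (baseline, time-tau) pairs from treatment recipients (hypothetical).
rng = random.Random(0)
data = []
for _ in range(200):
    b = rng.gauss(2.0, 1.0)
    data.append((b, 1.0 + 0.8 * b + rng.gauss(0.0, 0.5)))

density_near_mode = cond_density(2.6, 2.0, data)
```

Because the weights are normalized and each kernel integrates to one, the estimate is a proper density in its first argument for any fixed baseline value.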
The above approach to estimation of
assumes (A5), whose veracity may be supported by regression modeling that indicates an association of
and
close to identity among placebo recipients with measured
. Otherwise, (A5) could be violated, and an estimated regression model, e.g.,
, could be employed for calibration to estimate
.
We estimate
by estimating
via non-parametric kernel smoothing after random proportional deletion of a subset of cases among participants with measured
to attain the same case:control ratio as in the target population that is represented by randomized participants observed to be endpoint-free at
(i.e.,
). Another, more efficient, approach would use inverse probability weighting; however, we conjecture that minimal efficiency gain would be achieved in a rare endpoint setting common in vaccine trials. The multivariate density/probability function
is estimated using phase 1 covariate data from all randomized participants. In Section 4, we also consider an alternative parametric estimator for density functions in (1.2) based on maximum likelihood estimation in the Gaussian family of distributions.
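The random proportional deletion of cases described above can be sketched as follows. The helper name, the rounding rule, and the counts are hypothetical; the sketch only illustrates thinning the over-sampled cases so that the case:control ratio in the retained sample matches a target ratio.

```python
import random


def thin_cases(cases, controls, target_ratio, seed=0):
    """Randomly retain just enough cases that len(kept) / len(controls)
    approximates the target case:control ratio; controls are untouched."""
    n_keep = min(len(cases), round(target_ratio * len(controls)))
    rng = random.Random(seed)
    return rng.sample(cases, n_keep), controls


# Hypothetical counts: 500 cases among 25 000 at-risk participants, with the
# biomarker measured in all 500 cases but only 2500 sampled controls.
target_ratio = 500 / (25000 - 500)
cases = list(range(500))
controls = list(range(2500))
kept_cases, kept_controls = thin_cases(cases, controls, target_ratio)
```

Deleting cases discards information, which is why a weighting-based alternative may be preferable when the endpoint is not rare.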
Pointwise and simultaneous Wald-type CIs for
are obtained by assessing
in
bootstrap samples, with cases and controls sampled separately yielding a fixed number of cases and controls in each bootstrap sample. We construct the CIs in two steps. First, CIs for the
-transformed estimand (defined below) are constructed, and then the inverse transformation is applied to the confidence bounds.
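A minimal sketch of the bootstrap resampling scheme described above, in which cases and controls are resampled separately so that every bootstrap sample has the same numbers of cases and controls as the original data. The record layout and function name are hypothetical.

```python
import random


def stratified_bootstrap(records, is_case, n_boot, seed=0):
    """Yield bootstrap samples that resample cases and controls separately,
    with replacement, keeping both counts fixed."""
    rng = random.Random(seed)
    cases = [r for r in records if is_case(r)]
    controls = [r for r in records if not is_case(r)]
    for _ in range(n_boot):
        yield rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))


# Hypothetical records of the form (participant id, endpoint indicator).
records = [(i, 1 if i < 30 else 0) for i in range(300)]
samples = list(stratified_bootstrap(records, lambda r: r[1] == 1, n_boot=20))
```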
2.3. Simultaneous confidence interval for the mCEP curve
Let
be a “symmetrizing” transformation of the mCEP curve estimand that helps make Wald-type CIs more accurate in finite samples; e.g., if
, we consider
, or, if
, we may consider the identity transformation or
. For an arbitrary subset
of the support of
, denote
![]() |
where
and
denotes a standard error. For a fixed
, we define
as the solution to the equation
.
Further, let
be the estimate of
based on the
-th bootstrap sample, and
be the sample standard deviation of the bootstrap estimates
,
. Let
. Because the distributions of
and
are asymptotically equivalent, we estimate
by
defined as the empirical quantile in the bootstrap sample
,
, at the probability level
. Finally, the simultaneous Wald-type bootstrap
CI for the
curve is obtained by the
transformation of the bounds
![]() |
It is of note that the CI width depends on the size of
, which may be chosen based on statistical and biological considerations.
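The construction above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the curve is evaluated on a finite grid, the symmetrizing transformation is taken to be the log of one minus the estimand (appropriate for a multiplicative contrast), and the bootstrap curves are simulated rather than obtained by refitting an estimator. All names are hypothetical.

```python
import math
import random
import statistics


def simultaneous_ci(est, boot, eta, eta_inv, alpha=0.05):
    """Sup-type bootstrap band: est holds point estimates on a grid and
    boot holds one estimated curve per bootstrap sample."""
    k = len(est)
    t_est = [eta(v) for v in est]
    t_boot = [[eta(v) for v in curve] for curve in boot]
    se = [statistics.stdev([c[j] for c in t_boot]) for j in range(k)]
    sup = sorted(max(abs(c[j] - t_est[j]) / se[j] for j in range(k)) for c in t_boot)
    q = sup[math.ceil((1 - alpha) * len(sup)) - 1]  # empirical (1 - alpha) quantile
    ends = [(eta_inv(t_est[j] - q * se[j]), eta_inv(t_est[j] + q * se[j])) for j in range(k)]
    # The transformation may be decreasing, so order the back-transformed bounds.
    return [min(e) for e in ends], [max(e) for e in ends], q


eta = lambda x: math.log(1.0 - x)       # assumed symmetrizing transformation
eta_inv = lambda y: 1.0 - math.exp(y)
rng = random.Random(1)
est = [0.5, 0.6, 0.7]                   # hypothetical point estimates
boot = [[eta_inv(eta(v) + rng.gauss(0.0, 0.1)) for v in est] for _ in range(500)]
lower, upper, q = simultaneous_ci(est, boot, eta, eta_inv)
```

The critical value exceeds the usual pointwise normal quantile because it controls the supremum over the grid, so the band widens as the grid covers a larger biomarker range.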
3. Hypothesis testing
It is of interest to evaluate, separately, the following four null hypotheses, each against a general alternative hypothesis:
,
and a known constant
,
, where
and
are each associated with either a different biomarker (measured in the same units) or a different endpoint or both, and
, where
is a baseline dichotomous phase 1 covariate of interest included in
.
A test of
assesses
as a modifier of the clinical treatment effect. It is commonly of interest to test
with
representing the absence of a treatment effect, to assess if there exists a subgroup of treatment recipients defined by biomarker levels in
with some treatment effect. A test of
allows comparisons of two biomarkers and/or two endpoints, while a test of
allows baseline subgroup or between-trial comparisons.
We follow Roy and Bose (1953) to construct tests of hypotheses
,
. Let
, where
. For testing
and
at significance level
, we use as the regions of rejection
![]() |
and
![]() |
respectively, where
is an estimator for CE, and
and
are empirical quantiles in bootstrap samples
and
,
, respectively, at the probability level
. We obtain the two-sided p-values as the empirical probabilities that
and
.
For testing
, let
be a contrast pertaining to the transformation
. Let
denote the sample standard deviation of the bootstrap estimates
,
, and let
. We define
as the empirical quantile in the bootstrap sample
,
, at the probability level
. Subsequently, we test
at significance level
by using as the region of rejection
![]() |
and we obtain the two-sided p-value as the empirical probability that
. For testing
, we proceed as for the test of
except, due to independence of the baseline subgroups defined by
, we obtain
as
.
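As a rough illustration of the supremum-type tests described in this section, the sketch below tests whether a curve equals a known constant over a grid. It assumes the identity transformation, uses the bootstrap distribution of the centered supremum statistic as the reference distribution, and reports the p-value as the proportion of bootstrap statistics at least as large as the observed one. The function name and simulated inputs are hypothetical, and the sketch is not claimed to reproduce the exact rejection regions defined above.

```python
import random
import statistics


def sup_test_pvalue(est, boot, null_value):
    """Bootstrap p-value for the hypothesis that the curve equals null_value
    at every grid point, based on a supremum of standardized deviations."""
    k = len(est)
    se = [statistics.stdev([c[j] for c in boot]) for j in range(k)]
    observed = max(abs(est[j] - null_value) / se[j] for j in range(k))
    reference = [max(abs(c[j] - est[j]) / se[j] for j in range(k)) for c in boot]
    return sum(u >= observed for u in reference) / len(reference)


rng = random.Random(2)
est = [0.0, 0.0, 0.0]                                   # hypothetical estimates
boot = [[v + rng.gauss(0.0, 0.1) for v in est] for _ in range(500)]
p_null_true = sup_test_pvalue(est, boot, null_value=0.0)
p_null_false = sup_test_pvalue(est, boot, null_value=-1.0)
```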
4. Simulation study
Consider the treatment efficacy estimand
defined by the contrast function
. The simulation study aims to evaluate and compare finite-sample performance of the proposed estimator for
with an alternative pseudo-score estimator (PSN) of Huang (2018) and examine size/power of the proposed tests of
,
, and
. Note that the test of
is identical to that of
except that it additionally accounts for correlation of the contrasted estimators for
. Two approaches to probability density estimation in (1.2)—non-parametric kernel smoothing and Gaussian maximum likelihood estimation—are considered throughout the simulation, resulting in two variants of the proposed estimator for
(denoted NP-TE and MLE-TE, respectively). More specifically, the NP-TE estimator employs the generalized product kernel density estimation method of Hall and others (2004) with optimal bandwidths selected by likelihood cross-validation.
The simulation design mimics characteristics of the dengue vaccine trials introduced in Section 1. We generate
as follows. First, we generate i.i.d. vectors from
, where
,
with
for all
, and the correlation matrix
is chosen to emulate relationships between biomarker measurements at baseline and month 13 observed in the dengue trials:
![]() |
Then each component of the generated triplets is left-censored at the value of 1.5, which represents, e.g., the biomarker assay’s lower limit of quantitation.
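The biomarker generation step above (correlated normal triplets followed by left-censoring at 1.5) can be sketched as follows. The means, standard deviations, and correlation matrix in the example call are hypothetical placeholders, not the values used in the simulation study.

```python
import math
import random


def cholesky3(c):
    """Lower-triangular Cholesky factor of a 3x3 positive-definite matrix."""
    l = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(c[i][i] - s) if i == j else (c[i][j] - s) / l[j][j]
    return l


def draw_biomarkers(n, mean, sd, corr, lloq=1.5, seed=0):
    """Draw n correlated normal triplets and left-censor each component at lloq."""
    rng = random.Random(seed)
    l = cholesky3(corr)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x = [mean[i] + sd[i] * sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        out.append(tuple(max(v, lloq) for v in x))  # values below lloq are set to lloq
    return out


# Hypothetical parameter values for illustration only.
corr = [[1.0, 0.9, 0.7], [0.9, 1.0, 0.7], [0.7, 0.7, 1.0]]
triplets = draw_biomarkers(1000, mean=(2.0, 2.0, 3.0), sd=(1.0, 1.0, 1.0), corr=corr)
```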
Furthermore, using conditional independence in (A4), we posit a probit model for the association of the biomarkers
with the endpoint indicator in each treatment group:
![]() |
(4.1) |
where
is the cumulative distribution function of
. Model (4.1) yields
![]() |
(4.2) |
where
is the conditional probability density of
given
. For
, denote the marginal mean risk
and the risk “gradient”
, where
is the quantile of the marginal distribution of
at probability
. The values of the probit model coefficients
are chosen such that
,
,
, and
. We consider
to reflect the assumed positive correlation between
and
. Using (4.2), the resultant
for
, representing the truth, is shown in Figure 1 (solid curve).
Fig. 1.
The true
curves in the simulation design, each satisfying
and
. In addition, the solid curve reflects
and
. The dot-dashed curve, used only to assess power of the test of
, reflects the same
but
. The dashed line, used only to assess size of the tests of
and
, reflects
.
To estimate
, the proposed NP-TE and MLE-TE estimators assume the probit link function in model (4.1). The PSN estimator of Huang (2018) for
models the endpoint risk as a function of
and
and utilizes a discrete baseline covariate for predicting a missing
value. To implement the PSN estimator, we discretize
by quartiles to arrive at
, which is used as the auxiliary variable for predicting
. We model
using the PSN approach and construct the corresponding pointwise Wald-type CI for
using the analytical variance estimator developed by Huang (2018). We examine and compare finite-sample relative bias and mean squared error (MSE) of the proposed and the PSN estimators for
and coverage probabilities of pointwise and simultaneous 95% Wald-type bootstrap CIs for
.
Modifications of the described mechanism are needed for generating data under
,
,
, and under respective alternative hypotheses. To evaluate size of the tests of
and
, we set
in (4.1) and recompute
and
in order to maintain the marginal probabilities
and
. The respective constant
curve is shown in Figure 1 (dashed line). We evaluate power of the tests of
and
for the setting described above. This setting also serves to evaluate size of the test of
by drawing two independent samples, each used for estimating
separately, representing distinct populations. To evaluate power of the test of
, the first sample considers the setting specified above, while the second sample considers the same setting except
. The resultant
curve for this modified setting is also shown in Figure 1 (dot-dashed curve).
We consider a three-phase case-cohort sampling design. In phase 1, 5000 subjects are randomized at a 1:1 ratio to receive treatment (
) or placebo (
) and followed for the binary outcome
. Although not required, for simplicity, we assume that all endpoints occur after time
at which
is measured (i.e.,
for all subjects in the absence of drop-out). In phase 2, we measure
in a Bernoulli sample
, drawn at baseline with sampling probability
, and in all cases (
) whether or not they were in this subcohort. In phase 3, we measure
in subcohort
only, i.e.,
is missing in cases not included in
and the proportion of such cases varies with
. We evaluate the performance of the estimation and inferential procedures for
as a function of
, setting
, 0.25, and 0.5 (note that
was set to 0.1 and 0.2 in the two dengue trials). The results are based on
replicated data sets with 500 bootstrap samples drawn in each data set.
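The measurement pattern of the three-phase design used in the simulation can be sketched as below: a Bernoulli subcohort is drawn at baseline, the time-point biomarker is measured in the subcohort and in all cases, and the baseline biomarker is measured in the subcohort only. The function name, the endpoint risk of 0.03, and the seeds are hypothetical.

```python
import random


def three_phase_sample(y, pi, seed=0):
    """y: list of 0/1 endpoint indicators; pi: subcohort sampling probability.
    Returns indicators of subcohort membership, of the time-point biomarker
    being measured, and of the baseline biomarker being measured."""
    rng = random.Random(seed)
    in_subcohort = [rng.random() < pi for _ in y]
    s_measured = [c or yi == 1 for c, yi in zip(in_subcohort, y)]
    sb_measured = list(in_subcohort)
    return in_subcohort, s_measured, sb_measured


outcome_rng = random.Random(1)
y = [1 if outcome_rng.random() < 0.03 else 0 for _ in range(5000)]
in_subcohort, s_measured, sb_measured = three_phase_sample(y, pi=0.1)
cases_missing_baseline = sum(
    1 for yi, sb in zip(y, sb_measured) if yi == 1 and not sb
)
```

With a small sampling probability, most cases have the time-point biomarker but not the baseline biomarker, which is the feature that makes methods requiring both measurements inefficient.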
For all values of
, the NP-TE and MLE-TE estimators exhibit minimal bias for
, with an increase in bias in the left-censored tail, whereas the PSN estimator is heavily biased in both tails for
, and its bias becomes comparable to that of the NP-TE and MLE-TE estimators as
increases to 0.5 (Figure 2, top row). Additionally, for
, the NP-TE and MLE-TE estimators substantially reduce the MSE in both tails compared with the PSN estimator, with comparable MSE across all three estimators achieved for
(Figure 2, bottom row). Coverage probabilities of pointwise 95% Wald-type bootstrap CIs induced by the NP-TE and MLE-TE estimators are within the Monte-Carlo (MC) error band except with slight undercoverage in the central region by the MLE-TE estimator using
(see Section 3 in the supplementary material available at Biostatistics online). In contrast, pointwise 95% Wald-type CIs induced by the PSN estimator uniformly overcover for
, with coverage probabilities within the MC error band attained for
and 0.5. The simultaneous 95% Wald-type bootstrap CIs for
induced by both the NP-TE and MLE-TE estimators exhibit adequate coverage for all values of
(Table 1). Overall, the NP-TE and MLE-TE estimators perform comparably, with considerable precision gain and improved CI coverage over the PSN estimator for small values of
.
Fig. 2.
Top row: Estimated relative bias of the proposed NP-TE and MLE-TE estimators for
, compared with that of the PSN estimator of Huang (2018), as a function of the probability
of sampling into the phase 2/3 subcohort
. Bottom row: Estimated MSE of the proposed NP-TE and MLE-TE estimators for
, compared with that of the PSN estimator of Huang (2018), as a function of probability
of sampling into the phase 2/3 subcohort
.
Table 1.
Estimated coverage probabilities of the simultaneous 95% Wald-type bootstrap CI for
based on the NP-TE and MLE-TE estimators, where
spans the observed values of
, as a function of the probability
of sampling into the phase 2/3 subcohort 
| Sampling probability | NP-TE | MLE-TE |
|---|---|---|
| 0.1 | 0.959 | 0.943 |
| 0.25 | 0.956 | 0.944 |
| 0.5 | 0.959 | 0.954 |
Table 2 summarizes observed sizes and powers of the tests of
,
, and
described in Section 3, based on both the NP-TE and MLE-TE estimators. For all values of
, sizes of the tests of
and
are in good agreement with the nominal significance level, whereas the test of
is markedly conservative. Nevertheless, for each
, the power of the test of
is only slightly smaller than that of the test of
suggesting that the former test is also useful in practice. Powers of the test of
are small for the given comparison (see Figure 1), indicating that larger contrasts are needed for detection with sufficient power. Finally, the tests based on the MLE-TE estimator are slightly more powerful than those based on the NP-TE estimator.
Table 2.
Size and power of the two-sided tests, based on the NP-TE and MLE-TE estimators, of
,
with
, and
against general alternative hypotheses, as a function of the probability
of sampling into the phase 2/3 subcohort
. The nominal significance level is taken to be 0.05
| Estimator | Sampling probability | First test: size | First test: power | Second test: size | Second test: power | Third test: size | Third test: power |
|---|---|---|---|---|---|---|---|
| NP-TE | 0.1 | 0.01 | 0.73 | 0.04 | 0.83 | 0.04 | 0.12 |
| NP-TE | 0.25 | 0.01 | 0.84 | 0.05 | 0.89 | 0.04 | 0.15 |
| NP-TE | 0.5 | 0.01 | 0.89 | 0.05 | 0.93 | 0.04 | 0.18 |
| MLE-TE | 0.1 | 0.01 | 0.87 | 0.06 | 0.92 | 0.05 | 0.17 |
| MLE-TE | 0.25 | 0.01 | 0.91 | 0.05 | 0.95 | 0.05 | 0.20 |
| MLE-TE | 0.5 | 0.01 | 0.92 | 0.06 | 0.96 | 0.05 | 0.20 |
5. Application
The two harmonized Phase 3 dengue trials, introduced in Section 1, randomized 31 144 healthy children (aged 2–14 and 9–16 years in Asia and Latin America, respectively) at a 2:1 ratio to receive either three doses of Sanofi Pasteur’s recombinant, live, attenuated, tetravalent dengue vaccine (Dengvaxia/CYD-TDV), or placebo at months 0, 6, and 12 after randomization (Capeding and others, 2014; Villar and others, 2015). Participants were followed with active surveillance for 25 months for the primary clinical endpoint of VCD. The current indication of CYD-TDV requires a minimum age of 9 years, and therefore we present herein a trial-pooled analysis in the 25 824 children aged ≥9 years.
We aim to study modification of the effect of CYD-TDV vs. placebo on VCD risk by a biomarker
under assignment to CYD-TDV that is measured by the PRNT50 assay from a serum sample collected at the month 13 visit;
is the average of log10 neutralizing antibody titers to the four parental dengue strains of the vaccine constructs, one for each dengue serotype (Moodie and others, 2018). The following three-phase case-cohort sampling design was employed: the phase 1 covariate vector
—age category (
11 vs.
11 years) and country—was observed in all participants. Baseline serum samples for measuring
were collected from a Bernoulli sample
of all participants, whereas month 13 serum samples were collected from all participants. Subsequently,
was measured at month 13 in subcohort
and in all VCD cases (phase 2), whereas the biomarker’s baseline value,
, was only measured in subcohort
(phase 3). Because
is measured at month 13, we restrict the analysis to the 24 768 children aged ≥9 years who were at risk and free of VCD at month 13; this cohort includes 503 VCD cases. Of the 2766 controls with
measured, all but 7 also had
measured. In contrast, of the 502 cases with
measured, only 55 (11.0%) also had
measured.
We consider two
estimands of interest defined by contrasts
and
in (1.1), with identity and
as their respective
transformations (for the latter estimand, one could alternatively also use the identity transformation; simulations showed similar coverage of CIs under the two transformations). We employ the proposed estimator for
, with probability densities estimated by the generalized kernel method of Hall and others (2004). We model the conditional VCD risk in (A6) by specifying inverse probability-weighted logistic regression models. In addition, a hinge model (Fong and others, 2017) is used for modeling the effect of
because a likelihood ratio test supports a better fit (
) and the number of VCD cases is sufficient to support this more flexible model. Moreover, a hinge model is used because it specifies that variability in
near the lower limit of quantitation (LLOQ) does not associate with VCD risk, which is a desirable (biologically credible) model feature given that such values largely reflect PRNT50 technical measurement error.
For the PSN estimator of Huang (2018), we model
using a probit model adjusted for the main effects of
, and the main effects and interaction of
and
with a hinge. The marginal risk conditional on
is then estimated by integrating
over the distribution of
, which is estimated through a multinomial model of
on natural cubic spline basis of
with nodes at its quantiles at probabilities 0.25, 0.5, and 0.75.
In the cohort of participants at risk at month 13, logistic regression adjusted for treatment assignment as the sole covariate yields an overall estimated VCD risk of 0.013 and 0.035 in vaccine and placebo recipients, respectively, resulting in an overall log relative risk estimate of
and an overall additive risk difference estimate of 0.022. We report and compare inference about the
estimands defined by contrasts
and
using the proposed estimator (employing non-parametric kernel smoothing) and, where it is applicable, the PSN estimator. Point estimates and pointwise and simultaneous 95% Wald-type bootstrap CIs for both
estimands are shown in Figure 3. The proposed estimator yields substantially narrower CIs than the PSN estimator in this setting. The test of
for all
proposed in Section 3 yields p-values
and 0.16 for
and
, respectively, indicating that
is a significant modifier of relative VCD risk but not of VCD risk difference. The test of
for all
, where 57 is the estimated hinge-point titer, yields p-values
0.001 for both
and
, indicating a significant vaccine effect on both the relative risk and the risk difference in a subgroup of vaccine recipients with
near the assay’s LLOQ. This result suggests that the biomarker
does not satisfy the average causal necessity property (Gilbert and Hudgens, 2008), which, if true, implies that this biomarker does not fully mediate the CYD-TDV vaccine effect on VCD risk (VanderWeele, 2008).
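As a quick arithmetic check, the two overall contrasts for the month-13 cohort can be recomputed from the rounded risks quoted above (0.013 in vaccine recipients, 0.035 in placebo recipients). The variable names below are illustrative, and because the inputs are rounded, the results may differ slightly from values computed on the unrounded estimates.

```python
import math

risk_vaccine = 0.013  # overall estimated VCD risk, vaccine recipients (from the text)
risk_placebo = 0.035  # overall estimated VCD risk, placebo recipients (from the text)

# Log relative risk (vaccine/placebo) and additive risk difference (placebo - vaccine)
log_rr = math.log(risk_vaccine / risk_placebo)
risk_diff = risk_placebo - risk_vaccine

print(round(log_rr, 2), round(risk_diff, 3))
```

The direction of each contrast follows the Figure 3 caption: vaccine over placebo for the relative risk, and placebo minus vaccine for the risk difference.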
Fig. 3.
Point and 95% Wald-type bootstrap CI estimates of the log relative risk (vaccine/placebo) of VCD and the additive VCD risk difference (placebo–vaccine) in the trial-pooled analysis of 9–16 year olds using the proposed estimator with non-parametric kernel smoothing vs. the PSN estimator of Huang (2018).
Because (A3) is evidently violated in the dengue trials, we developed a sensitivity analysis method that relaxes (A3) to the weaker monotonicity assumption of “no early harm by treatment” (A3’), i.e.,
, which is plausible for the dengue trials. The supplementary material available at Biostatistics online describes the method and applies it to the data; the results indicate that violations of (A3) lead to underestimation of the mCEP curve for both the log relative risk and risk difference contrasts. (A4) states that the neutralizing antibody response to vaccination does not affect placebo-group VCD risk after accounting for the existing natural neutralizing antibody response (due to prior dengue infections) and for measured baseline covariates. It could be violated if the ability to mount a strong neutralizing antibody response to vaccination is correlated with an intrinsic VCD susceptibility factor that is not fully captured by the other variables. The fact that the natural response
and the vaccine response
measure a similar quantity using the identical laboratory assay may limit the degree of violation of (A4). Furthermore, in placebo recipients with measured
, robust linear regression (Yohai, 1987) and locally weighted polynomial regression indicate an association of
and
close to identity, supporting validity of (A5) (see the supplementary material available at Biostatistics online for details).
6. Discussion
Driven by data constraints of a BSM three-phase sampling design, this article develops semi-parametric inferential procedures for a principal stratification estimand, the mCEP curve, designed to assess modification of a clinical treatment effect by the potential outcome marker response under assignment to treatment. We demonstrate that current alternative approaches may be biased and highly inefficient in this setting. We show that, under a different set of assumptions, we can remove the untestable placebo structural risk modeling assumption common to all past EML and PS methods and employ non-parametric kernel smoothing to estimate the placebo group’s clinical endpoint risk conditional on the potential outcome
under treatment assignment. We use a bootstrap procedure to construct pointwise and simultaneous CIs and tests of relevant hypotheses about mCEP estimands.
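To make the distinction between pointwise and simultaneous Wald-type bootstrap CIs concrete, the following is a generic sketch and not the procedure of Section 3. It assumes that point estimates on a grid of biomarker values and a set of bootstrap replicates on the same grid are already available, takes the simultaneous critical value as the empirical quantile of the maximum standardized deviation over the grid, and omits any transformation of the estimand. The function name and the toy data are hypothetical.

```python
import math
import random
import statistics

def wald_bootstrap_cis(est, boot, level=0.95):
    """Return (pointwise, simultaneous) Wald-type CIs on a grid.

    est  : point estimates at each grid value
    boot : bootstrap replicates, each a list aligned with est
    """
    k = len(est)
    # Bootstrap standard error at each grid point
    se = [statistics.stdev([b[j] for b in boot]) for j in range(k)]
    # Pointwise critical value: standard normal quantile
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    # Simultaneous critical value: empirical quantile of the maximum
    # standardized absolute deviation across the grid
    sup = sorted(max(abs(b[j] - est[j]) / se[j] for j in range(k)) for b in boot)
    c = sup[min(len(sup) - 1, math.ceil(level * len(sup)) - 1)]
    pointwise = [(e - z * s, e + z * s) for e, s in zip(est, se)]
    simultaneous = [(e - c * s, e + c * s) for e, s in zip(est, se)]
    return pointwise, simultaneous

# Toy inputs (entirely made up, for illustration only)
random.seed(1)
est = [-1.5, -1.2, -0.9, -0.6, -0.3]
boot = [[e + random.gauss(0, 0.2) for e in est] for _ in range(1000)]
pointwise, simultaneous = wald_bootstrap_cis(est, boot)
```

Because the simultaneous band must cover the whole curve at once, its critical value is typically larger than the pointwise normal quantile, so the band is wider at every grid point.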
The proposed estimation method offers, in particular, an alternative to the PS estimation method of Huang and others (2013) and Huang (2018), which estimates the same mCEP estimand. The following considerations offer guidance for method selection. First, the proposed method makes three assumptions not made by the PS method: (i)
is conditionally independent of
given
and
((A4) above); (ii) the distribution of
conditional on
and
or
remains identical ((A5) above); and (iii) under treatment assignment
, the clinical endpoint risk of
conditional on the biomarker response
and baseline covariates
follows a GLM, which replaces the placebo structural risk assumption of the PS method that
conditional on
and
follows a GLM (a part of (A6) above). Since (A4) is not testable without close-out placebo vaccination, if data support validity of (A5) and (A6) in a standard trial with a BSM three-phase sampling design, the present method may be preferable for its potentially substantial improvement in bias, efficiency, and CI coverage. Because (A4) is untestable in a standard trial and could easily be violated, future research on sensitivity analysis for, and relaxation of, this assumption is warranted. Second, the present method enables use of non-parametric kernel smoothing, which estimates the mCEP curve more flexibly but raises the issue of bandwidth selection. For this reason, maximum likelihood estimation, as an alternative to kernel smoothing, may be employed with the present method as demonstrated in Section 4. Third, to the best of our knowledge, the present method is the first to provide formal tests of the null hypotheses that the mCEP curve is constant and that the mCEP curve is identical in two baseline covariate subgroups.
Recognizing the importance of assumption (A3) and the possibly frequent occurrence of its violation, we developed a sensitivity analysis method that relaxes (A3) to the weaker monotonicity assumption of “no early harm by treatment” (A3’), i.e.,
, which, in practice, is far more easily justified than (A3) (the supplementary material available at Biostatistics online describes the method).
Because it uses non-parametric smoothing, the proposed method is not well suited to studying multivariate response biomarkers as effect modifiers. However, in settings where many or even high-dimensional response biomarkers are measured, the approach may still be useful: one can first define a univariate score biomarker as the fitted values of a model selected by statistical/machine learning to best predict the clinical outcome (Price and others, 2018) and then study the score biomarker as an effect modifier.
Supplementary Material
Acknowledgments
We thank the participants and investigators of the CYD14 and CYD15 dengue vaccine efficacy trials, our Sanofi Pasteur colleagues for collaboration and sharing of the data, Ted Holzman (Fred Hutchinson Cancer Research Center) for assistance with computational analysis, and Lindsay N. Carpp (Fred Hutchinson Cancer Research Center) for technical editing. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of Interest: None declared.
7. Software
All proposed methods are implemented in the R package pssmooth (Juraska, 2018) available on the Comprehensive R Archive Network. The R code used for computing and plotting results reported in Sections 4 and 5 is available at https://github.com/mjuraska/mCEPcurve-three-phase.
Funding
Sanofi Pasteur and National Institute of Allergy and Infectious Diseases of the National Institutes of Health (Award Number R37AI054165).
References
- Capeding M. R., Tran N. H., Hadinegoro S. R. S., Ismail H. I. H. M., Chotpitayasunondh T., Chua M. N., Luong C. Q., Rusmil K., Wirawan D. N., Nallusamy R. and others (2014). Clinical efficacy and safety of a novel tetravalent dengue vaccine in healthy children in Asia: a phase 3, randomised, observer-masked, placebo-controlled trial. The Lancet 384, 1358–1365.
- Follmann D. (2006). Augmented designs to assess immune response in vaccine trials. Biometrics 62, 1161–1169.
- Fong Y., Huang Y., Gilbert P. B. and Permar S. R. (2017). chngpt: threshold regression model estimation and inference. BMC Bioinformatics 18(1), 454.
- Frangakis C. and Rubin D. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
- Gabriel E. and Gilbert P. (2014). Evaluating principal surrogate endpoints with time-to-event data accounting for time-varying treatment efficacy. Biostatistics 15, 251–265.
- Gabriel E. E., Sachs M. C. and Gilbert P. B. (2015). Comparing and combining biomarkers as principal surrogates for time-to-event clinical endpoints. Statistics in Medicine 34, 381–395.
- Gilbert P. B. and Hudgens M. G. (2008). Evaluating candidate principal surrogate endpoints. Biometrics 64, 1146–1154.
- Hall P., Racine J. and Li Q. (2004). Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association 99, 1015–1026.
- Huang Y. (2018). Evaluating principal surrogate markers in vaccine trials in the presence of multiphase sampling. Biometrics 74, 27–39.
- Huang Y. and Gilbert P. B. (2011). Comparing biomarkers as principal surrogate endpoints. Biometrics 67, 1442–1451.
- Huang Y., Gilbert P. B. and Wolfson J. (2013). Design and estimation for evaluating principal surrogate markers in vaccine trials. Biometrics 69, 301–309.
- Juraska M. (2018). pssmooth: Flexible and Efficient Evaluation of Principal Surrogates/Treatment Effect Modifiers. R package version 1.0.1, Comprehensive R Archive Network.
- Katzelnick L. C., Gresh L., Halloran M. E., Mercado J. C., Kuan G., Gordon A., Balmaseda A. and Harris E. (2017). Antibody-dependent enhancement of severe dengue disease in humans. Science 358, 929–932.
- Li Y., Taylor J. and Elliott M. (2010). A Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 66, 523–531.
- Moodie Z., Juraska M., Huang Y., Zhuang Y., Fong Y., Carpp L., Self S., Chambonneau L., Small R., Jackson N., Noriega F. and others (2018). Neutralizing antibody correlates analysis of tetravalent dengue vaccine efficacy trials in Asia and Latin America. Journal of Infectious Diseases 217, 742–753.
- Prentice R. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11.
- Price B., Gilbert P. and van der Laan M. (2018). Estimation of the optimal surrogate based on a randomized trial. Biometrics. doi: 10.1111/biom.12879 [Epub ahead of print].
- Roy S. N. and Bose R. C. (1953). Simultaneous confidence interval estimation. The Annals of Mathematical Statistics 24, 513–536.
- VanderWeele T. (2008). Simple relations between principal stratification and direct and indirect effects. Statistics and Probability Letters 78, 2957–2962.
- Villar L., Dayan G. H., Arredondo-García J. L., Rivera D. M., Cunha R., Deseda C., Reynales H., Costa M. S., Morales-Ramírez J. O., Carrasquilla G. and others (2015). Efficacy of a tetravalent dengue vaccine in children in Latin America. New England Journal of Medicine 372, 113–123.
- Yohai V. (1987). High breakdown-point and high-efficiency robust estimates for regression. Annals of Statistics 15, 642–656.
- Zigler C. M. and Belin T. R. (2011). The potential for bias in principal causal effect estimation when treatment received depends on a key covariate. Annals of Applied Statistics 5, 1876–1892.