Adjusting for Misclassification in a Stratified Biomarker Clinical Trial

Chunling Liu; Aiyi Liu; Jiang Hu; Vivian Yuan; Susan Halabi

doi:10.1002/sim.6164

Author manuscript; available in PMC: 2014 Aug 15.

Published in final edited form as: Stat Med. 2014 Apr 14;33(18):3100–3113. doi: 10.1002/sim.6164

PMCID: PMC4107031 NIHMSID: NIHMS578974 PMID: 24733510

Abstract

Clinical trials utilizing predictive biomarkers have become a research focus in personalized medicine. We investigate the effects of biomarker misclassification on the design and analysis of stratified biomarker clinical trials. For a variety of inference problems including marker-treatment interaction in particular, we show that marker misclassification may have profound adverse effects on the coverage of confidence intervals, power of the tests, and required sample sizes. For each inferential problem we propose methods to adjust for the classification errors.

Keywords: Biomarkers, classification error, correction for error, personalized medicine, power and sample size, prevalence, randomized controlled clinical trials, sensitivity and specificity

1. Introduction

Advances in understanding the genetics and biology of certain cancers have led to the successful development of novel therapies that target specific pathways. A convincing example is given in [1] that reported a statistically significant overall hazard ratio estimate from a randomized clinical trial in which women with ovarian cancer were treated with either pegylated liposomal doxorubicin or topotecan. The authors further reported that among patients with platinum-sensitive disease, a more significant hazard ratio was found. However, among patients with platinum-refractory disease, the hazard ratio was not significant. The study results showed an evident interaction between treatment (pegylated liposomal doxorubicin or topotecan) and a biomarker (platinum).

When biomarker-treatment interaction is the primary research interest in a clinical trial, the stratified biomarker design is commonly used because it takes full advantage of randomization and can address various questions of interest; see, among others, [2–7]. In renal cell carcinoma, novel therapies that target the vascular endothelial growth factor (VEGF) and mammalian target of rapamycin (mTOR) pathways have been identified and are being used as treatment options for patients [8–9].

As another example, data from several centers have shown that retinoblastoma function may help determine whether the androgen signaling pathway is viable. The loss of retinoblastoma status plays a critical role in cell regulation and it suppresses androgen receptor expression and activity. It is estimated that 30% – 40% of prostate cancers will be androgen positive [10–12]. Investigators are interested in whether patients with advanced prostate cancer respond to treatment differently according to their retinoblastoma status.

Predictive markers for response have been shown to be important in patients with advanced renal cell carcinoma. Furthermore, it has been reported that inhibition of the VEGF pathway improves clinical outcomes, such as objective response, progression-free survival and overall survival. A statistically significant interleukin 6 (IL-6) by treatment interaction in predicting progression-free survival (PFS) was observed in patients with metastatic renal cell carcinoma (p-value=0.009) [13]. In patients with high IL-6, the median PFS was 33 weeks and 10 weeks in patients treated with pazopanib and placebo, respectively. On the other hand, the median PFS was 42 weeks and 24 weeks in low IL-6 patients treated with pazopanib and placebo, respectively [13].

We consider a two-arm trial (treatment versus standard) with T being the treatment indicator, where T = 1 if treatment and T = 0 if standard. We confine attention to a dichotomous predictive biomarker whose status is denoted by G (=1 if positive and =0 if negative). The prevalence of the biomarker is denoted by ξ_G = Pr (G = 1). Then in a stratified biomarker design, patients with the same biomarker status are randomized into treatment arm or standard arm, as shown in the following figure:

\text{True marker status} \Rightarrow \begin{cases} \text{positive } (G = 1) \Rightarrow \begin{cases} \text{treatment } (T = 1), \\ \text{standard } (T = 0); \end{cases} \\ \text{negative } (G = 0) \Rightarrow \begin{cases} \text{treatment } (T = 1), \\ \text{standard } (T = 0) . \end{cases} \end{cases}

The primary interest of a stratified biomarker design is to investigate the marker-treatment interaction on a clinical endpoint, denoted by Y. Other questions that can be answered from the trial employing such a design include whether the treatments are different within the same marker status, or whether the clinical outcomes within the same treatment are different between marker status. These questions all involve inference on some function of the marker-by-treatment means of the clinical outcomes:

E (Y ∣ G = g, T = t) = μ_{g t}, VAR (Y ∣ G = g, T = t) = σ_{g t}^{2} .

Define δ_g = μ_{g1} − μ_{g0} to be the mean outcome difference between treatments in the population with marker status G = g, and Δ_t = μ_{1t} − μ_{0t} to be the mean outcome difference between positive and negative marker status in the same treatment, as a measure of the marker effects in treatment arm T = t. We are interested in testing separately or simultaneously the null hypothesis H₀ : δ_g = 0 (g = 0, 1), or $H_{0}^{'} : Δ_{t} = 0$ (t = 0, 1). The null hypothesis of no marker by treatment interaction is then $H_{0}^{″} : γ = 0$, where

γ = δ_{1} - δ_{0} = Δ_{1} - Δ_{0} .

Because the test statistics are independent, the simultaneous null hypotheses can be tested by separately testing each individual null hypothesis with adequate allocation of the overall type I error rate, as demonstrated below for testing H₀.

Let W_g be a standardized test statistic for testing H_{0g} : δ_g = 0. The null hypothesis H₀ = H_{00} ∩ H_{01} is rejected if |W_g| > c_g for g = 0 or 1, where the c_g are properly chosen critical values.

Assuming that W₀ and W₁ are independent, which is the case in most trial settings, the power of the test is given by

ω (δ_{0}, δ_{1}) = Pr (∣ W_{0} ∣ > c_{0} or ∣ W_{1} ∣ > c_{1}) = ω_{0} (δ_{0}) + ω_{1} (δ_{1}) - ω_{0} (δ_{0}) ω_{1} (δ_{1})

where ω_g(δ_g) = Pr (|W_g| > c_g) is the power of the test that rejects H_{0g} if |W_g| > c_g. The type I error rate is thus given by ω(0, 0) = ω₀(0) + ω₁(0) − ω₀(0)ω₁(0), where ω_g(0) is the type I error rate for testing H_{0g}.

With significance level α to test the null hypothesis H₀ and power 1 − β to detect marker-specific treatment differences δ₀ and δ₁, one can allocate the type I error rate and the power between the separate tests of the two null hypotheses, H_{00} and H_{01}. Suppose the allocation for H_{0g} is α_g for type I error and 1 − β_g for power at δ_g; then these allocated errors must satisfy α = α₀ + α₁ − α₀α₁ and β = β₀β₁. In practice one can assign smaller error rates to the more important hypotheses, e.g. H_{01}, which concerns the treatment difference in the marker-positive group. With equal allocation of type I error rates and power, we have α₀ = α₁ = 1 − (1 − α)^{1/2} and β₀ = β₁ = β^{1/2}. The null hypotheses $H_{0}^{'} : Δ_{t} = 0$ (t = 0, 1) can be dealt with similarly.
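As a small numerical illustration of this allocation rule, the following Python sketch computes the equal allocation and checks that the per-hypothesis rates recombine to the overall levels. The function names and the values α = 0.05 and β = 0.10 are illustrative assumptions, not part of the original development.

```python
import math


def union_rate(r0: float, r1: float) -> float:
    """Probability that at least one of two independent tests rejects."""
    return r0 + r1 - r0 * r1


def equal_allocation(alpha: float, beta: float) -> tuple:
    """Split the overall type I error alpha and type II error beta
    equally between the tests of H_00 and H_01."""
    alpha_g = 1.0 - math.sqrt(1.0 - alpha)  # solves alpha = 2a - a^2
    beta_g = math.sqrt(beta)                # solves beta = b^2
    return alpha_g, beta_g


# Illustrative (assumed) overall levels.
alpha_g, beta_g = equal_allocation(alpha=0.05, beta=0.10)

# Recombine the per-hypothesis rates to recover the overall levels.
overall_alpha = union_rate(alpha_g, alpha_g)
overall_power = union_rate(1.0 - beta_g, 1.0 - beta_g)
print(alpha_g, beta_g, overall_alpha, overall_power)
```

Unequal allocations can be explored in the same way by passing different per-hypothesis rates to `union_rate`.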

In the present article, we investigate, both analytically and numerically, the adverse effects of biomarker classification errors on the design of a stratified biomarker clinical trial. For a variety of inference problems including marker-treatment interaction, we show that marker misclassification may have profound adverse effects on the coverage of confidence intervals, power of the tests, and required sample sizes. For each inference problem we propose methods to adjust for the classification errors. Sample size calculations adjusting for misclassification are presented in particular for testing marker-treatment interactions.

The paper is organized as follows. In Section 2, we present notations and preliminary results concerning the design of a stratified biomarker trial in the presence of marker misclassification. We then discuss the effects of misclassification on estimating treatment means in each marker stratum, and present a method to correct for misclassification in Section 3. We investigate the effects of misclassification on estimating treatment differences in each marker stratum in Section 4, followed by a method to correct for misclassification. We evaluate the effects of misclassification on marker differences in each treatment arm in Section 5, with a method to correct for marker misclassification. In Section 6, we address the marker-treatment interaction, starting with the investigation of the effects on power and sample size of misclassification, followed by a method to correct for misclassification and an approach to compute sample sizes to warrant adequate power to detect potential interaction. We then present an example and then discuss the findings in Section 7.

2. The Design in Presence of Misclassification

We assume that a gold standard exists to determine the true status G of the biomarker, with G = 1 being positive and 0 if otherwise. Due to reasons such as cost, ethics or administration, an imperfect assay is used, resulting in classification errors in determining the biomarker status. This is common in assaying a diagnostic biomarker; see, among others, [14–16]. Wang et al. [16] demonstrated that misclassification can inflate type I error rates in a noninferiority trial with binary outcomes.

Let M be the observed status of G, with sensitivity π₁ = Pr (M = 1 | G = 1) and specificity π₀ = Pr (M = 0 | G = 0). For the biomarker to be practically useful, we assume that 1/2 < π₀, π₁ ≤ 1. It thus follows that the probability that the observed status of the marker is positive for a patient is

ξ_{M} = π_{1} ξ_{G} + (1 - π_{0}) (1 - ξ_{G}) .

(1)

We refer to ξ_M as the observed prevalence, which is bounded by 1 − π₀ and π₁ because 0 ≤ ξ_G ≤ 1 and π₀ + π₁ > 1.

The actual stratified design is carried out according to the figure with the observed marker status M replacing the true status G.

Suppose that a total of N patients are enrolled into the trial. Let Y_i be the observed clinical outcome of the ith (i = 1, …, N) patient with observed marker status M_i(= 0, 1), in treatment arm T_i(= 0, 1).

Let N₁ be the number of patients with observed marker status being positive. Note that N₁ is a random variable following a binomial distribution with size N and success probability ξ_M; thus E(N₁) = Nξ_M. Write N₀ = N − N₁, the number of patients with observed marker status being negative. Let N_{mt} = λ_{mt}N_m be the number of patients in the subgroup with M = m and T = t, where the allocation proportions λ_{mt} ∈ [0, 1] are usually pre-specified, and λ_{m1} + λ_{m0} = 1. The allocation ratio of treatment to standard in the M = m group is then λ_{m1}/λ_{m0}. Equal allocation between treatments in the M = m group corresponds to λ_{mt} = 1/2. The targeted biomarker-strategy designs correspond to an extreme allocation with λ_{01} = 0; see, e.g., [4] and [16].

To simplify the notation, we assume that all the tests have significance level α and the confidence intervals have confidence level 1 − α. We refer to procedures with no adjustment for classification errors as “naive” and to those that adjust for misclassification as “error-adjusted”. Wherever there is no ambiguity, we will omit these distinctions.

The naive estimators of μ_gt and $σ_{g t}^{2}$ are given by

{\hat{μ}}_{g t} = \frac{1}{N_{g t}} \sum_{{i : M_{i} = g, T_{i} = t}} Y_{i}, {\hat{σ}}_{g t}^{2} = \frac{1}{N_{g t} - 1} \sum_{{i : M_{i} = g, T_{i} = t}} {(Y_{i} - {\hat{μ}}_{g t})}^{2} .

The naive confidence limits of μ_gt are calculated as

{\hat{μ}}_{g t} \pm {\hat{σ}}_{g t} Z_{α / 2} / \sqrt{N_{g t}},

(2)

where throughout Z_r is the rth upper quantile of the standard normal distribution, that is, Φ(Z_r) = 1 − r, where Φ denotes the standard normal distribution function.

The naive testing procedure rejects the null hypothesis H₀_g if

∣ {\hat{δ}}_{g} / s_{g} ∣ > Z_{α / 2},

(3)

where

{\hat{δ}}_{g} = {\hat{μ}}_{g 1} - {\hat{μ}}_{g 0}, s_{g}^{2} = \frac{1}{N_{g 1}} {\hat{σ}}_{g 1}^{2} + \frac{1}{N_{g 0}} {\hat{σ}}_{g 0}^{2} .

(4)

Similarly the null hypothesis H̃₀_t : Δ_t = 0 is rejected if

∣ {\hat{Δ}}_{t} / {\tilde{s}}_{t} ∣ > Z_{α / 2},

where

{\hat{Δ}}_{t} = {\hat{μ}}_{1 t} - {\hat{μ}}_{0 t}, {\tilde{s}}_{t}^{2} = \frac{1}{N_{1 t}} {\hat{σ}}_{1 t}^{2} + \frac{1}{N_{0 t}} {\hat{σ}}_{0 t}^{2} .

If there are no classification errors, then the aforementioned estimates are unbiased, and, if N is large enough, the tests have significance level α and the confidence intervals have coverage probability 1 − α. In the presence of misclassification, however, these claims need to be carefully examined and corrections need to be made to account for classification error whenever necessary.

Throughout, unless stated otherwise, distributions of estimators and their characteristics are unconditional, taking the randomness of the observed sample sizes N_m (m = 0, 1) into account. Such an unconditional approach will allow us to investigate the effects of the marker’s prevalence ξ_G as well. Conditional inference given N_m can be obtained in the derivation by replacing N with N₁/ξ_M, where ξ_M is given in (1). To adjust for classification errors, we assume that the marker’s prevalence ξ_G, sensitivity π₁, and specificity π₀ are known; this implies that the marker’s positive and negative predictive values are also known, because of the well-known relationships:

τ_{1} = π_{1} ξ_{G} / ξ_{M}, τ_{0} = π_{0} (1 - ξ_{G}) / (1 - ξ_{M}) .

(5)
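Equations (1) and (5) can be evaluated directly once ξ_G, π₁ and π₀ are specified. The sketch below does so in Python; the function names and the illustrative inputs (40% prevalence, 90% sensitivity and specificity) are assumptions chosen for demonstration only.

```python
def observed_prevalence(xi_g: float, pi1: float, pi0: float) -> float:
    """Equation (1): probability that the observed marker status is positive."""
    return pi1 * xi_g + (1.0 - pi0) * (1.0 - xi_g)


def predictive_values(xi_g: float, pi1: float, pi0: float) -> tuple:
    """Equation (5): positive (tau1) and negative (tau0) predictive values."""
    xi_m = observed_prevalence(xi_g, pi1, pi0)
    tau1 = pi1 * xi_g / xi_m
    tau0 = pi0 * (1.0 - xi_g) / (1.0 - xi_m)
    return tau1, tau0


# Illustrative (assumed) inputs.
xi_m = observed_prevalence(xi_g=0.4, pi1=0.9, pi0=0.9)
tau1, tau0 = predictive_values(xi_g=0.4, pi1=0.9, pi0=0.9)
print(xi_m, tau1, tau0)
```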

3. Estimating Stratum-Specific Treatment Means μ_gt

3.1. Effects of Misclassification

If the true marker status of the ith patient is G_i, then by the conditional expectations arguments we have

ζ_{m t} = E (Y_{i} ∣ M_{i} = m, T_{i} = t) = \sum_{g = 0}^{1} μ_{g t} Pr (G_{i} = g ∣ M_{i} = m),

noting that the treatments play no role in determining the marker’s status.

This leads to

ζ_{1 t} = τ_{1} μ_{1 t} + (1 - τ_{1}) μ_{0 t}, ζ_{0 t} = τ_{0} μ_{0 t} + (1 - τ_{0}) μ_{1 t},

where τ₁ = Pr (G_i = 1 | M_i = 1) and τ₀ = Pr (G_i = 0 | M_i = 0) are the marker’s positive predictive value and negative predictive value, respectively. Similarly we have

ν_{m t}^{2} = VAR (Y_{i} ∣ M_{i} = m, T_{i} = t) = \sum_{g = 0}^{1} (μ_{g t}^{2} + σ_{g t}^{2}) Pr (G_{i} = g ∣ M_{i} = m) - ζ_{m t}^{2},

and thus

ν_{1 t}^{2} = τ_{1} (μ_{1 t}^{2} + σ_{1 t}^{2}) + (1 - τ_{1}) (μ_{0 t}^{2} + σ_{0 t}^{2}) - ζ_{1 t}^{2},

(6)

ν_{0 t}^{2} = τ_{0} (μ_{0 t}^{2} + σ_{0 t}^{2}) + (1 - τ_{0}) (μ_{1 t}^{2} + σ_{1 t}^{2}) - ζ_{0 t}^{2} .

(7)
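The mixture structure behind ζ_{mt} and (6)–(7) can be checked numerically. The following Python sketch computes the mean and variance of the outcome in an observed stratum from the true stratum moments; the function name and the numerical inputs are hypothetical and serve only as an illustration.

```python
def observed_moments(mu_same, mu_other, var_same, var_other, tau):
    """Mean (zeta) and variance (nu^2) of Y in an observed marker stratum.

    tau is the predictive value, i.e. the probability that the observed
    status agrees with the true status. For M = 1 pass (mu_1t, mu_0t,
    sigma_1t^2, sigma_0t^2, tau1); for M = 0 pass (mu_0t, mu_1t,
    sigma_0t^2, sigma_1t^2, tau0).
    """
    zeta = tau * mu_same + (1.0 - tau) * mu_other
    second_moment = (tau * (mu_same ** 2 + var_same)
                     + (1.0 - tau) * (mu_other ** 2 + var_other))
    return zeta, second_moment - zeta ** 2


# Hypothetical true moments with equal variances in both marker groups.
zeta_1t, nu_sq_1t = observed_moments(1.0, 0.4, 1.0, 1.0, tau=0.857)
print(zeta_1t, nu_sq_1t)
```

With equal true variances the returned variance exceeds the common σ² by τ(1 − τ)Δ², which is the inflation that drives the loss of efficiency discussed later.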

Taking the marker classification errors into account, we have E(μ̂_gt) = ζ_gt and $E ({\hat{σ}}_{g t}^{2}) = ν_{g t}^{2}$ . The (unconditional) variances of the mean estimates are given by

VAR ({\hat{μ}}_{1 t}) = E {VAR ({\hat{μ}}_{1 t} ∣ N_{1 t})} + VAR {E ({\hat{μ}}_{1 t} ∣ N_{1 t})} = E (\frac{ν_{1 t}^{2}}{N_{1 t}}) \approx \frac{ν_{1 t}^{2}}{λ_{1 t} ξ_{M} N}

(8)

and

VAR ({\hat{μ}}_{0 t}) = E (\frac{ν_{0 t}^{2}}{N_{0 t}}) \approx \frac{ν_{0 t}^{2}}{λ_{0 t} (1 - ξ_{M}) N},

(9)

noting that N₁/N is a consistent estimator of ξ_M.

Therefore, in the presence of misclassification, the naive estimators, μ̂_{gt} and ${\hat{σ}}_{g t}^{2}$, are no longer unbiased for the corresponding parameters (i.e., μ_{gt} and $σ_{g t}^{2}$) they estimate. The biases of the mean estimates are given, respectively, by

E ({\hat{μ}}_{1 t}) - μ_{1 t} = - (1 - τ_{1}) Δ_{t}, E ({\hat{μ}}_{0 t}) - μ_{0 t} = (1 - τ_{0}) Δ_{t} .

(10)

If we assume that, in the same treatment group, larger clinical outcomes are more likely to occur in patients with positive marker status, then the treatment mean will be underestimated for marker positive patients, but overestimated for the marker negative patients.

In large samples, μ̂_{1t} and μ̂_{0t} are asymptotically normally distributed with ${N_{g t}}^{1 / 2} ({\hat{μ}}_{g t} - ζ_{g t}) \sim N (0, ν_{g t}^{2})$, where, throughout, “∼” reads as “is distributed as”. Then the coverage probability of the naive confidence interval of μ_{1t} in (2) is approximately

Pr ({\hat{μ}}_{1 t} - {\hat{σ}}_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2} \leq μ_{1 t} \leq {\hat{μ}}_{1 t} + {\hat{σ}}_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2}) \approx Φ (c_{1 t} + Z_{α / 2}) - Φ (c_{1 t} - Z_{α / 2})

(11)

where

c_{1 t} = \frac{(1 - τ_{1}) Δ_{t} {(λ_{1 t} N ξ_{M})}^{1 / 2}}{ν_{1 t}} .

The coverage probability, as a function of c_{1t}, strictly increases in (−∞, 0] and strictly decreases in [0, ∞). Therefore, when the true marker status is correctly classified (τ₁ = 1, so that c_{1t} = 0), (11) gives a coverage probability of approximately 100(1 − α)%. Otherwise, whenever Δ_t ≠ 0, the asymptotic coverage probability of the naive confidence interval in (2) is always smaller than the nominal level of 1 − α. Indeed, the coverage can be substantially reduced; a particularly interesting observation is that the coverage probability approaches zero as the sample size N gets larger.
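The behavior of the approximation in (11) can be explored numerically. The Python sketch below evaluates the approximate coverage of the naive interval for μ_{1t}; the function name and all numerical inputs are hypothetical, and the standard normal functions come from the standard library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def naive_mean_coverage(marker_effect, tau1, nu_1t, lam_1t, xi_m, n, alpha=0.05):
    """Approximate coverage (11) of the naive confidence interval for mu_1t.

    marker_effect is Delta_t, nu_1t the standard deviation of Y in the
    observed positive stratum, lam_1t the allocation proportion.
    """
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    c = (1.0 - tau1) * marker_effect * math.sqrt(lam_1t * n * xi_m) / nu_1t
    return STD_NORMAL.cdf(c + z) - STD_NORMAL.cdf(c - z)


# Hypothetical scenario: coverage erodes as the sample size grows.
for n in (200, 400, 1600, 6400):
    print(n, naive_mean_coverage(0.5, 0.857, 1.03, 0.5, 0.42, n))
```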

3.2. Correction for Classification Error

From (10), unbiased estimators $μ_{g t}^{*}$ of μ_gt can be derived by solving the equations:

\begin{cases} {\hat{μ}}_{1 t} = τ_{1} μ_{1 t}^{*} + (1 - τ_{1}) μ_{0 t}^{*}, \\ {\hat{μ}}_{0 t} = (1 - τ_{0}) μ_{1 t}^{*} + τ_{0} μ_{0 t}^{*} . \end{cases}

We have

μ_{1 t}^{*} = \frac{τ_{0} {\hat{μ}}_{1 t} - (1 - τ_{1}) {\hat{μ}}_{0 t}}{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})}, μ_{0 t}^{*} = \frac{τ_{1} {\hat{μ}}_{0 t} - (1 - τ_{0}) {\hat{μ}}_{1 t}}{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})} .

It follows from (8) and (9) that the variances of the unbiased estimators are

VAR (μ_{1 t}^{*}) \approx \frac{τ_{0}^{2} ν_{1 t}^{2} / (λ_{1 t} ξ_{M} N) + {(1 - τ_{1})}^{2} ν_{0 t}^{2} / \{λ_{0 t} (1 - ξ_{M}) N\}}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}}

and

VAR (μ_{0 t}^{*}) \approx \frac{τ_{1}^{2} ν_{0 t}^{2} / \{λ_{0 t} (1 - ξ_{M}) N\} + {(1 - τ_{0})}^{2} ν_{1 t}^{2} / (λ_{1 t} ξ_{M} N)}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}} .

Recall that $E ({\hat{σ}}_{g t}^{2}) = ν_{g t}^{2}$, where ${\hat{σ}}_{g t}^{2}$ are the naive variance estimators defined in Section 2. Consistent estimates $\hat{VAR} (μ_{1 t}^{*})$ of $VAR (μ_{1 t}^{*})$ and $\hat{VAR} (μ_{0 t}^{*})$ of $VAR (μ_{0 t}^{*})$ are given by

\frac{τ_{0}^{2} {\hat{σ}}_{1 t}^{2} / N_{1 t} + {(1 - τ_{1})}^{2} {\hat{σ}}_{0 t}^{2} / N_{0 t}}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}}, \frac{τ_{1}^{2} {\hat{σ}}_{0 t}^{2} / N_{0 t} + {(1 - τ_{0})}^{2} {\hat{σ}}_{1 t}^{2} / N_{1 t}}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}},

respectively.

Note that in large samples $(μ_{g t}^{*} - μ_{g t}) / \{\hat{VAR} (μ_{g t}^{*})\}^{1 / 2} \sim N (0, 1)$. Therefore, if λ_{gt} → constant when N → ∞, then the error-adjusted confidence interval of μ_{gt} with limits $μ_{g t}^{*} \pm Z_{α / 2} \{\hat{VAR} (μ_{g t}^{*})\}^{1 / 2}$ has asymptotic coverage probability of 1 − α.
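The correction in this section amounts to inverting a 2 × 2 linear system and propagating the variances. A minimal Python sketch is given below, assuming the predictive values τ₁ and τ₀ are known; the function names and the numerical inputs are hypothetical.

```python
def corrected_means(mu_hat_1, mu_hat_0, tau1, tau0):
    """Error-adjusted estimators (mu*_1t, mu*_0t) from the naive means."""
    det = tau1 * tau0 - (1.0 - tau1) * (1.0 - tau0)  # equals tau0 + tau1 - 1
    if det <= 0.0:
        raise ValueError("requires tau0 + tau1 > 1")
    mu_star_1 = (tau0 * mu_hat_1 - (1.0 - tau1) * mu_hat_0) / det
    mu_star_0 = (tau1 * mu_hat_0 - (1.0 - tau0) * mu_hat_1) / det
    return mu_star_1, mu_star_0


def corrected_mean_variances(s2_1, n_1, s2_0, n_0, tau1, tau0):
    """Plug-in variance estimates of (mu*_1t, mu*_0t) from the naive sample
    variances s2_m and sizes n_m of the observed strata in arm t."""
    det = tau1 * tau0 - (1.0 - tau1) * (1.0 - tau0)
    var_1 = (tau0 ** 2 * s2_1 / n_1 + (1.0 - tau1) ** 2 * s2_0 / n_0) / det ** 2
    var_0 = (tau1 ** 2 * s2_0 / n_0 + (1.0 - tau0) ** 2 * s2_1 / n_1) / det ** 2
    return var_1, var_0


# Round trip with hypothetical true means: contaminate, then correct.
tau1, tau0 = 0.857, 0.931
mu_1, mu_0 = 1.0, 0.4
zeta_1 = tau1 * mu_1 + (1.0 - tau1) * mu_0
zeta_0 = tau0 * mu_0 + (1.0 - tau0) * mu_1
recovered = corrected_means(zeta_1, zeta_0, tau1, tau0)
print(recovered)
```

Feeding the expected naive means ζ_{1t} and ζ_{0t} into `corrected_means` returns the true means, which is the unbiasedness property stated above.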

4. Inference on Marker-Specific Treatment Differences

4.1. Effects of Misclassification

We confine our attention to the marker positive group G = 1. The marker negative group can be dealt with similarly. Consider testing the null hypothesis H₀₁ based on the statistics in (3). Taking misclassification into consideration, we have

E ({\hat{δ}}_{1}) = ζ_{11} - ζ_{10} = τ_{1} δ_{1} + (1 - τ_{1}) δ_{0}, VAR ({\hat{δ}}_{1}) \approx \frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N} .

(12)

In large sample, δ̂₁ asymptotically follows a normal distribution. Note that, under the simultaneous null hypothesis H₀ : δ₁ = δ₀ = 0, E(δ̂₁) = 0. The actual type I error rate is then given by

\begin{array}{l} Pr (∣ {\hat{δ}}_{1} ∣ > s_{1} Z_{α / 2}) \approx Pr \left\{ \frac{∣ {\hat{δ}}_{1} ∣}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} > \frac{s_{1} Z_{α / 2}}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} \right\} \\ \approx 2 \{1 - Φ (Z_{α / 2})\} = α, \end{array}

(13)

utilizing the fact that $s_{1}^{2}$ defined in (4) is a consistent estimate of $ν_{11}^{2} / (λ_{11} ξ_{M} N) + ν_{10}^{2} / (λ_{10} ξ_{M} N)$ .

Therefore, under simultaneous null hypothesis H₀, the naive tests maintain the type I error at the nominal level, regardless of the marker misclassification. However, unlike the cases when there is no classification error, the type I error rate of the test for the individual hypothesis H₀₁ : δ₁ = 0 depends on δ₀, and thus is no longer controlled at the nominal level. Indeed, the power of the test at δ₁ > 0 is given by

Φ \left\{ \frac{τ_{1} δ_{1} + (1 - τ_{1}) δ_{0}}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} - Z_{α / 2} \right\} + Φ \left\{ - \frac{τ_{1} δ_{1} + (1 - τ_{1}) δ_{0}}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} - Z_{α / 2} \right\}

(14)

as compared to

Φ \left\{ \frac{δ_{1}}{{(\frac{σ_{11}^{2}}{λ_{11} ξ_{G} N} + \frac{σ_{10}^{2}}{λ_{10} ξ_{G} N})}^{1 / 2}} - Z_{α / 2} \right\},

when there is no classification error.

The type I error rate follows by setting δ₁ = 0 and is given by

Φ \left\{ \frac{(1 - τ_{1}) δ_{0}}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} - Z_{α / 2} \right\},

which can be substantially inflated, and indeed approaches 1 when N → ∞ and δ₀ > 0.
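To see how quickly the type I error of the naive test of H_{01} can inflate, the following Python sketch evaluates the two-sided expression (14) at δ₁ = 0. The function name and the numerical inputs (τ₁, δ₀, ν, allocation, observed prevalence) are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def naive_type1_error_h01(delta0, tau1, nu11, nu10, lam11, lam10, xi_m, n,
                          alpha=0.05):
    """Rejection rate of the naive test of H_01 when delta_1 = 0,
    obtained from the two-sided power expression (14)."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    sd = math.sqrt(nu11 ** 2 / (lam11 * xi_m * n)
                   + nu10 ** 2 / (lam10 * xi_m * n))
    shift = (1.0 - tau1) * delta0 / sd
    return STD_NORMAL.cdf(shift - z) + STD_NORMAL.cdf(-shift - z)


# Hypothetical scenario: treatment works only in marker-negative patients.
for n in (200, 2000, 20000):
    print(n, naive_type1_error_h01(0.5, 0.857, 1.0, 1.0, 0.5, 0.5, 0.42, n))
```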

Reduction in power due to misclassification may also be sizable. The loss of power is attributable to the following observations. First, if we assume that marker-positive patients benefit more from the treatment than marker-negative patients, that is, δ₁ > δ₀, then δ₁ > τ₁δ₁ + (1 − τ₁)δ₀. Second, assume that σ_{1t} = σ_{0t} = σ_t, that is, the variations of the outcomes in the same treatment arm are not affected by the marker status. Then from (6) and (7) we have

ν_{1 t}^{2} = σ_{t}^{2} + τ_{1} (1 - τ_{1}) Δ_{t}^{2} > σ_{t}^{2}, ν_{0 t}^{2} = σ_{t}^{2} + τ_{0} (1 - τ_{0}) Δ_{t}^{2} > σ_{t}^{2}

if Δ_t ≠ 0. Third, it is possible that τ₁δ₁ + (1 − τ₁)δ₀ ≈ 0, which may occur when only patients with marker-positive status benefit from the treatment, that is, δ₁ > 0 > δ₀.

The classification error can also substantially affect the coverage probability of the naive confidence interval δ̂₁ ± s₁Z_{α/2}. Similar to the derivation of (14), we obtain

Pr ({\hat{δ}}_{1} - s_{1} Z_{α / 2} \leq δ_{1} \leq {\hat{δ}}_{1} + s_{1} Z_{α / 2}) = Φ (ϒ_{1} + Z_{α / 2}) - Φ (ϒ_{1} - Z_{α / 2}),

where

ϒ_{1} = \frac{(1 - τ_{1}) γ}{{(\frac{ν_{11}^{2}}{λ_{11} ξ_{M} N} + \frac{ν_{10}^{2}}{λ_{10} ξ_{M} N})}^{1 / 2}} .

Again, in the presence of classification error, the coverage probability is always smaller, and often substantially so, than the nominal level of 1 − α; it approaches zero as N → ∞.

4.2. Correction for Classification Error

Similar to (12), we can show that

E ({\hat{δ}}_{0}) = (1 - τ_{0}) δ_{1} + τ_{0} δ_{0}, VAR ({\hat{δ}}_{0}) \approx \frac{ν_{01}^{2}}{λ_{01} (1 - ξ_{M}) N} + \frac{ν_{00}^{2}}{λ_{00} (1 - ξ_{M}) N} .

Therefore, unbiased estimates $δ_{g}^{*}$ of δ_g can be obtained by solving the following equations:

\begin{cases} {\hat{δ}}_{1} = τ_{1} δ_{1}^{*} + (1 - τ_{1}) δ_{0}^{*}, \\ {\hat{δ}}_{0} = (1 - τ_{0}) δ_{1}^{*} + τ_{0} δ_{0}^{*} . \end{cases}

It follows that

δ_{1}^{*} = \frac{τ_{0} {\hat{δ}}_{1} - (1 - τ_{1}) {\hat{δ}}_{0}}{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})}, δ_{0}^{*} = \frac{τ_{1} {\hat{δ}}_{0} - (1 - τ_{0}) {\hat{δ}}_{1}}{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})} .

The variance $VAR (δ_{1}^{*})$ of the unbiased estimator $δ_{1}^{*}$ is approximately

\frac{τ_{0}^{2} \{ν_{11}^{2} / (λ_{11} ξ_{M} N) + ν_{10}^{2} / (λ_{10} ξ_{M} N)\} + {(1 - τ_{1})}^{2} [ν_{01}^{2} / \{λ_{01} (1 - ξ_{M}) N\} + ν_{00}^{2} / \{λ_{00} (1 - ξ_{M}) N\}]}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}},

which can be estimated consistently by

\hat{VAR} (δ_{1}^{*}) = \frac{τ_{0}^{2} ({\hat{σ}}_{11}^{2} / N_{11} + {\hat{σ}}_{10}^{2} / N_{10}) + {(1 - τ_{1})}^{2} ({\hat{σ}}_{01}^{2} / N_{01} + {\hat{σ}}_{00}^{2} / N_{00})}{\{τ_{1} τ_{0} - (1 - τ_{1}) (1 - τ_{0})\}^{2}} .

Note that in large samples $(δ_{1}^{*} - δ_{1}) / \{\hat{VAR} (δ_{1}^{*})\}^{1 / 2} \sim N (0, 1)$. Therefore, the error-adjusted confidence interval of δ₁ with limits $δ_{1}^{*} \pm Z_{α / 2} \{\hat{VAR} (δ_{1}^{*})\}^{1 / 2}$ has asymptotic coverage probability of 1 − α. Furthermore, the error-adjusted test that rejects H_{01} : δ₁ = 0 if $∣ δ_{1}^{*} ∣ / \{\hat{VAR} (δ_{1}^{*})\}^{1 / 2} > Z_{α / 2}$ has type I error approximately α, regardless of the value of δ₀. The power is given by $Φ [δ_{1} / \{VAR (δ_{1}^{*})\}^{1 / 2} - Z_{α / 2}]$.
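The error-adjusted estimator of δ₁ and its plug-in variance can be sketched in a few lines of Python. The function name and the inputs are hypothetical; `s1_sq` and `s0_sq` stand for the naive variance estimates s₁² and s₀² defined in (4), and τ₁ and τ₀ are assumed known.

```python
import math


def adjusted_delta1(d1_hat, d0_hat, s1_sq, s0_sq, tau1, tau0):
    """Error-adjusted estimate delta*_1, its estimated variance and z-statistic.

    d1_hat, d0_hat: naive treatment differences in the observed strata.
    s1_sq, s0_sq:   naive variance estimates of d1_hat and d0_hat.
    """
    det = tau1 * tau0 - (1.0 - tau1) * (1.0 - tau0)
    if det <= 0.0:
        raise ValueError("requires tau0 + tau1 > 1")
    d1_star = (tau0 * d1_hat - (1.0 - tau1) * d0_hat) / det
    var_hat = (tau0 ** 2 * s1_sq + (1.0 - tau1) ** 2 * s0_sq) / det ** 2
    return d1_star, var_hat, d1_star / math.sqrt(var_hat)


# Hypothetical check: feed in the expected naive differences for
# delta_1 = 0 and delta_0 = 0.5; the adjusted estimate returns to zero.
tau1, tau0 = 0.857, 0.931
d1_expected = tau1 * 0.0 + (1.0 - tau1) * 0.5
d0_expected = (1.0 - tau0) * 0.0 + tau0 * 0.5
estimate, variance, z_stat = adjusted_delta1(d1_expected, d0_expected,
                                             0.04, 0.03, tau1, tau0)
print(estimate, variance, z_stat)
```

The example mirrors the setting in which the naive test of H_{01} has an inflated type I error: the naive difference is centered at (1 − τ₁)δ₀, whereas the adjusted estimate is centered at δ₁ = 0.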

5. Inference on Treatment-Specific Marker Effects

5.1. Effects of Misclassification

Consider the naive test procedure given in Section 2. Taking the classification errors into account we have

E ({\hat{Δ}}_{t}) = (τ_{0} + τ_{1} - 1) Δ_{t}, VAR ({\hat{Δ}}_{t}) \approx \frac{ν_{1 t}^{2}}{λ_{1 t} ξ_{M} N} + \frac{ν_{0 t}^{2}}{λ_{0 t} (1 - ξ_{M}) N} .

(15)

Therefore Δ̂_t is no longer unbiased for Δ_t. Indeed, it always underestimates Δ_t if Δ_t > 0 and overestimates Δ_t if Δ_t < 0. In large sample, Δ̂_t asymptotically follows a normal distribution. Similar to the derivations of (13) and (14) we conclude that the naive test asymptotically maintains the type I error at the nominal level, and the power of the test at some Δ_t > 0 is given by

Pr ({\hat{Δ}}_{t} > {\tilde{s}}_{t} Z_{α / 2}) = Φ \left[ \frac{(τ_{0} + τ_{1} - 1) Δ_{t}}{\left\{ \frac{ν_{1 t}^{2}}{λ_{1 t} ξ_{M} N} + \frac{ν_{0 t}^{2}}{λ_{0 t} (1 - ξ_{M}) N} \right\}^{1 / 2}} - Z_{α / 2} \right]

which can be substantially smaller than

Φ \left[ \frac{Δ_{t}}{\left\{ \frac{σ_{1 t}^{2}}{λ_{1 t} ξ_{G} N} + \frac{σ_{0 t}^{2}}{λ_{0 t} (1 - ξ_{G}) N} \right\}^{1 / 2}} - Z_{α / 2} \right],

the power when there is no classification error.

Furthermore, the coverage probability of the naive confidence interval Δ̂_t ± s̃_t Z_{α/2} of Δ_t is given by

\begin{array}{l} Pr ({\hat{Δ}}_{t} - {\tilde{s}}_{t} Z_{α / 2} \leq Δ_{t} \leq {\hat{Δ}}_{t} + {\tilde{s}}_{t} Z_{α / 2}) = Φ ({\tilde{ϒ}}_{t} + Z_{α / 2}) - Φ ({\tilde{ϒ}}_{t} - Z_{α / 2}) \\ \leq 1 - α \quad (\text{and} \to 0 \text{ if } N \to \infty), \end{array}

where

{\tilde{ϒ}}_{t} = \frac{(2 - τ_{0} - τ_{1}) Δ_{t}}{\left\{ \frac{ν_{1 t}^{2}}{λ_{1 t} ξ_{M} N} + \frac{ν_{0 t}^{2}}{λ_{0 t} (1 - ξ_{M}) N} \right\}^{1 / 2}} .

5.2. Correction for Classification Error

Correction for misclassification follows from the fact that

Δ_{t}^{*} = \frac{{\hat{Δ}}_{t}}{τ_{0} + τ_{1} - 1}

is an unbiased estimator of Δ_t. The variance and its consistent estimator are given respectively by

VAR (Δ_{t}^{*}) \approx \frac{ν_{1 t}^{2} / (λ_{1 t} ξ_{M} N) + ν_{0 t}^{2} / \{λ_{0 t} (1 - ξ_{M}) N\}}{{(τ_{0} + τ_{1} - 1)}^{2}}, \hat{VAR} (Δ_{t}^{*}) = \frac{{\hat{σ}}_{1 t}^{2} / N_{1 t} + {\hat{σ}}_{0 t}^{2} / N_{0 t}}{{(τ_{0} + τ_{1} - 1)}^{2}} .

In large samples $(Δ_{t}^{*} - Δ_{t}) / \{\hat{VAR} (Δ_{t}^{*})\}^{1 / 2} \sim N (0, 1)$. Assume that λ_{gt} → constant when N → ∞. Then the error-adjusted confidence interval of Δ_t with limits $Δ_{t}^{*} \pm Z_{α / 2} \{\hat{VAR} (Δ_{t}^{*})\}^{1 / 2}$ has asymptotic coverage probability of 1 − α. The error-adjusted test that rejects H̃_{0t} : Δ_t = 0 if $∣ Δ_{t}^{*} ∣ / \{\hat{VAR} (Δ_{t}^{*})\}^{1 / 2} > Z_{α / 2}$ is equivalent to the naive test.
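The equivalence of the error-adjusted and naive tests follows because the estimate and its standard error are rescaled by the same positive factor τ₀ + τ₁ − 1. The short Python sketch below illustrates this; the function name and the numerical inputs are hypothetical.

```python
import math


def adjusted_marker_effect(delta_hat, var_naive, tau1, tau0):
    """Error-adjusted estimate Delta*_t and its estimated variance.

    delta_hat: naive marker effect in arm t; var_naive: its naive variance
    estimate (sigma_hat_1t^2 / N_1t + sigma_hat_0t^2 / N_0t).
    """
    k = tau0 + tau1 - 1.0
    if k <= 0.0:
        raise ValueError("requires tau0 + tau1 > 1")
    return delta_hat / k, var_naive / k ** 2


# Hypothetical naive estimate and variance.
delta_hat, var_naive = 0.35, 0.02
delta_star, var_star = adjusted_marker_effect(delta_hat, var_naive, 0.857, 0.931)

z_naive = delta_hat / math.sqrt(var_naive)
z_adjusted = delta_star / math.sqrt(var_star)
print(z_naive, z_adjusted)  # the two standardized statistics coincide
```

Although the test decision is unchanged, the adjusted confidence interval is wider and re-centered, which is what restores the nominal coverage.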

6. Inference on Marker-Treatment Interaction

6.1. Effects of Misclassification

Recall that the marker-treatment interaction effect is measured by γ = Δ₁ − Δ₀. It follows from (15) that the naive estimate of the interaction γ̂ = Δ̂₁ − Δ̂₀ has mean and variance, given respectively by E(γ̂) = (τ₀ + τ₁ − 1)γ and $VAR (\hat{γ}) \approx θ_{1}^{2} / N$ where

θ_{1}^{2} = \frac{ν_{10}^{2}}{λ_{10} ξ_{M}} + \frac{ν_{00}^{2}}{λ_{00} (1 - ξ_{M})} + \frac{ν_{11}^{2}}{λ_{11} ξ_{M}} + \frac{ν_{01}^{2}}{λ_{01} (1 - ξ_{M})} .

Therefore the naive estimator of the marker-treatment interaction is biased and under-(over-)estimates the interaction if γ > (<)0. The naive test for interaction rejects the null hypothesis $H_{0}^{″} : γ = 0$ if

∣ \hat{γ} ∣ / {({\tilde{s}}_{0}^{2} + {\tilde{s}}_{1}^{2})}^{1 / 2} > Z_{α / 2} .

The power of the test at some γ > 0 is given by

Pr (\hat{γ} > Z_{α / 2} {({\tilde{s}}_{0}^{2} + {\tilde{s}}_{1}^{2})}^{1 / 2}) = Φ \left\{ \frac{(τ_{0} + τ_{1} - 1) γ N^{1 / 2}}{θ_{1}} - Z_{α / 2} \right\} .

(16)

It follows from (16) that the naive test maintains the type I error rate at the nominal level of α, regardless of the classification errors. However, the power of the test can be substantially adversely affected as compared to the power of the test with no misclassification, that is, Φ(γN^{1/2}/θ₀ − Z_{α/2}), where θ₀ is such that

θ_{0}^{2} = \frac{σ_{10}^{2}}{λ_{10} ξ_{G}} + \frac{σ_{00}^{2}}{λ_{00} (1 - ξ_{G})} + \frac{σ_{11}^{2}}{λ_{11} ξ_{G}} + \frac{σ_{01}^{2}}{λ_{01} (1 - ξ_{G})} .

The coverage probability of the naive confidence interval $\hat{γ} \pm Z_{α / 2} {({\tilde{s}}_{0}^{2} + {\tilde{s}}_{1}^{2})}^{1 / 2}$ of γ is given by

Pr \left\{ \hat{γ} - Z_{α / 2} {({\tilde{s}}_{0}^{2} + {\tilde{s}}_{1}^{2})}^{1 / 2} \leq γ \leq \hat{γ} + Z_{α / 2} {({\tilde{s}}_{0}^{2} + {\tilde{s}}_{1}^{2})}^{1 / 2} \right\} = Φ \left\{ \frac{(2 - τ_{0} - τ_{1}) γ N^{1 / 2}}{θ_{1}} + Z_{α / 2} \right\} - Φ \left\{ \frac{(2 - τ_{0} - τ_{1}) γ N^{1 / 2}}{θ_{1}} - Z_{α / 2} \right\},

which can be substantially lower (approaching 0 if N → ∞) than the nominal level of 1 − α.

For α = 0.05, σ_{gt} = 1, λ_{mt} = 1/2, γ = 0.936, and selected values of π₀, π₁, ξ_G, and N, Table 1 presents the coverage probability of the naive confidence interval and the power of the naive test. In all cases the actual coverage probability is smaller than the nominal level of 0.95, and in many cases the reduction exceeds 25%. The actual power is also substantially lower than that with no classification errors, some with more than 50% reduction in power. The coverage probability and the power increase as the classification accuracy improves. An increased sample size yields increased power but decreased coverage probability. For example, with 90% sensitivity and 90% specificity, and 40% marker prevalence, the naive coverage probability is 0.90 and the power is 0.71 if the sample size is N = 200. These two measures change to 0.84 and 0.95, respectively, when the sample size doubles.

Table 1.

Coverage probability of the naive confidence interval and power of the naive test for marker-treatment interaction:

(N = 200, ξ_G = 0.4)^†
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	0·74/0·46	0·78/0·52	0·82/0·58	0·85/0·64
0·85	0·80/0·53	0·83/0·59	0·86/0·65	0·89/0·70
0·90	0·85/0·61	0·88/0·66	0·90/0·71	0·91/0·76
0·95	0·90/0·69	0·91/0·73	0·93/0·78	0·94/0·82
(N = 200, ξ_G = 0.6)^†
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	0·74/0·46	0·80/0·53	0·85/0·61	0·90/0·69
0·85	0·78/0·52	0·83/0·59	0·88/0·66	0·91/0·73
0·90	0·82/0·58	0·86/0·65	0·90/0·71	0·93/0·78
0·95	0·85/0·64	0·89/0·70	0·91/0·76	0·94/0·82
(N = 400, ξ_G = 0.4)^††
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	0·54/0·75	0·61/0·81	0·68/0·86	0·75/0·91
0·85	0·65/0·82	0·71/0·87	0·76/0·91	0·82/0·94
0·90	0·75/0·88	0·80/0·92	0·84/0·95	0·88/0·97
0·95	0·85/0·93	0·88/0·96	0·90/0·97	0·92/0·98
(N = 400, ξ_G = 0.6)^††
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	0·54/0·75	0·65/0·82	0·75/0·88	0·85/0·93
0·85	0·61/0·81	0·71/0·87	0·80/0·92	0·88/0·96
0·90	0·68/0·86	0·76/0·91	0·84/0·95	0·90/0·97
0·95	0·75/0·91	0·82/0·94	0·88/0·97	0·92/0·98

^† power=0.90 if no misclassification; ^†† power=0.99 if no misclassification.
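The power in (16) and the coverage expression for the naive interaction interval can be evaluated with a short Python sketch. The function names are hypothetical, and the inputs ν²_{mt} must be supplied by the user; the illustration below only checks the no-misclassification benchmark (τ₀ = τ₁ = 1, ν_{mt} = σ_{gt} = 1, equal allocation, ξ_G = 0.4, N = 200, γ = 0.936), for which the table footnote reports a power of 0.90. It does not attempt to reproduce the remaining table entries, which depend on the marker effects Δ_t through ν_{mt}.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def theta_squared(nu_sq, lam, xi_m):
    """theta_1^2; nu_sq[m][t] and lam[m][t] are indexed by observed marker
    status m and treatment arm t."""
    total = 0.0
    for m in (0, 1):
        stratum_share = xi_m if m == 1 else 1.0 - xi_m
        for t in (0, 1):
            total += nu_sq[m][t] / (lam[m][t] * stratum_share)
    return total


def naive_interaction_power(gamma, n, tau1, tau0, theta_sq, alpha=0.05):
    """Power (16) of the naive test for interaction at gamma > 0."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = (tau0 + tau1 - 1.0) * gamma * math.sqrt(n / theta_sq)
    return STD_NORMAL.cdf(shift - z)


def naive_interaction_coverage(gamma, n, tau1, tau0, theta_sq, alpha=0.05):
    """Approximate coverage of the naive confidence interval for gamma."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = (2.0 - tau0 - tau1) * gamma * math.sqrt(n / theta_sq)
    return STD_NORMAL.cdf(shift + z) - STD_NORMAL.cdf(shift - z)


# Benchmark without misclassification (observed prevalence equals xi_G).
ones = [[1.0, 1.0], [1.0, 1.0]]
halves = [[0.5, 0.5], [0.5, 0.5]]
theta0_sq = theta_squared(ones, halves, xi_m=0.4)
power_no_error = naive_interaction_power(0.936, 200, 1.0, 1.0, theta0_sq)
coverage_no_error = naive_interaction_coverage(0.936, 200, 1.0, 1.0, theta0_sq)
print(power_no_error, coverage_no_error)
```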

6.2. Correction for Classification Error

An unbiased estimator of the interaction effect γ can be given by

γ^{*} = \frac{\hat{γ}}{τ_{0} + τ_{1} - 1} .

The variance and its consistent estimator are given respectively by

VAR (γ^{*}) \approx \frac{θ_{1}^{2}}{N {(τ_{0} + τ_{1} - 1)}^{2}}, \hat{VAR} (γ^{*}) = \frac{{\hat{θ}}_{1}^{2}}{N {(τ_{0} + τ_{1} - 1)}^{2}}

where

{\hat{θ}}_{1}^{2} = \frac{{\hat{σ}}_{10}^{2}}{λ_{10} ξ_{M}} + \frac{{\hat{σ}}_{00}^{2}}{λ_{00} (1 - ξ_{M})} + \frac{{\hat{σ}}_{11}^{2}}{λ_{11} ξ_{M}} + \frac{{\hat{σ}}_{01}^{2}}{λ_{01} (1 - ξ_{M})} .

In large samples $(γ^{*} - γ) / \{\hat{VAR} (γ^{*})\}^{1 / 2} \sim N (0, 1)$. Hence, the error-adjusted confidence interval of γ with limits $γ^{*} \pm Z_{α / 2} \{\hat{VAR} (γ^{*})\}^{1 / 2}$ has asymptotic coverage probability of 1 − α. The error-adjusted test that rejects $H_{0}^{″} : γ = 0$ if $∣ γ^{*} ∣ / \{\hat{VAR} (γ^{*})\}^{1 / 2} > Z_{α / 2}$ is equivalent to the naive test.

6.3. Sample Size Adjustment

For the stratified biomarker design, the sample size N needs to be sufficiently large to ensure adequate power of 1 − β to detect a meaningful marker-treatment interaction γ. From (16) the sample size is given by

N = \frac{{(Z_{α / 2} + Z_{β})}^{2} θ_{1}^{2}}{{(τ_{0} + τ_{1} - 1)}^{2} γ^{2}} .

(17)

On the other hand, in the absence of misclassification the required sample size is

N^{'} = \frac{{(Z_{α / 2} + Z_{β})}^{2} θ_{0}^{2}}{γ^{2}} .

It follows from (18) and (5) that

1 \geq τ_{0} + τ_{1} - 1 = \frac{(1 - ξ_{G}) (2 π_{0} - 1)}{ξ_{M}} \geq \frac{(1 - ξ_{G}) (2 π_{0} - 1)}{π_{1}} > 0.

Furthermore, as pointed out in Section 4.1, the variance ν is usually larger than its counterpart σ. Therefore, a much larger sample size may be required to achieve the desired power when classification errors exist.
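To get a feel for how much of the inflation comes from the factor (τ₀ + τ₁ − 1)⁻² in (17) alone, the following sketch tabulates it over the grid of sensitivities and specificities used in the tables. It assumes that τ₁ and τ₀ are the positive and negative predictive values of the assay, Pr(G = 1 | M = 1) and Pr(G = 0 | M = 0); that reading is an assumption here, since the definitions sit outside this section, and the function names are hypothetical.

```python
def observed_prevalence(pi1, pi0, xi_g):
    """Pr(M = 1) from sensitivity pi1, specificity pi0, true prevalence xi_g."""
    return pi1 * xi_g + (1.0 - pi0) * (1.0 - xi_g)


def predictive_values(pi1, pi0, xi_g):
    """ASSUMPTION: tau1 = Pr(G=1 | M=1) and tau0 = Pr(G=0 | M=0), via Bayes' rule."""
    xi_m = observed_prevalence(pi1, pi0, xi_g)
    tau1 = pi1 * xi_g / xi_m
    tau0 = pi0 * (1.0 - xi_g) / (1.0 - xi_m)
    return tau0, tau1


def attenuation_inflation(pi1, pi0, xi_g):
    """Factor 1 / (tau0 + tau1 - 1)^2 that multiplies the sample size in (17)."""
    tau0, tau1 = predictive_values(pi1, pi0, xi_g)
    return 1.0 / (tau0 + tau1 - 1.0) ** 2


grid = (0.80, 0.85, 0.90, 0.95)
for pi0 in grid:
    row = [round(attenuation_inflation(pi1, pi0, xi_g=0.4), 2) for pi1 in grid]
    print(f"pi0={pi0:.2f}", row)
```

This factor is only part of the ratio N/N′: the variance term θ₁² also differs from θ₀², so the printed values need not match the ratios reported for the full calculation.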

Under the same parameter values (except for N) used for Table 1, Table 2 presents the actual sample size needed and its ratio to the sample size when there is no classification error. The required sample size can be more than twice, and in the least accurate setting considered (π₀ = π₁ = 0·80) about three times, that required when there is no misclassification of the marker status.

Table 2.

Required sample size and its ratio to the sample size (= 200) when there is no misclassification

ξ_G = 0.4
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	612/3·06	522/2·61	449/2·24	388/1·94
0·85	508/2·54	442/2·21	386/1·93	339/1·69
0·90	423/2·11	373/1·87	331/1·65	295/1·47
0·95	350/1·75	314/1·57	283/1·41	255/1·27
ξ_G = 0.6
	π₁=0·80	0·85	0·90	0·95
π₀= 0·80	612/3·06	508/2·54	423/2·11	350/1·75
0·85	522/2·61	442/2·21	373/1·87	314/1·57
0·90	449/2·24	386/1·93	331/1·65	283/1·41
0·95	388/1·94	339/1·69	295/1·47	255/1·27


6.4. Example

We sought to design a phase III trial in which patients with metastatic renal cell carcinoma will be randomized to sunitinib (standard of care) or sunitinib plus an experimental drug, stratified by IL-6 status. The primary endpoint is the progression-free survival (PFS) rate at 6 months. IL-6 is a continuous variable, with high IL-6 status defined as a value greater than or equal to 13 pg/mL; this cut-point is based on the observed median reported in one study [13]. Based on observed data, the PFS rate at 6 months in low and high IL-6 patients treated with sunitinib is 48% and 18%, respectively. The hypothesized PFS rate in low and high IL-6 patients treated with the experimental drug is 66% and 59%, respectively. Suppose that the assay has 95% sensitivity and 90% specificity, that allocation is equal, and that the prevalence of high IL-6 is 40%. Suppose further that a power of 0.85 is desired to detect a marker-treatment interaction effect of γ = (0.59 − 0.18) − (0.66 − 0.48) = 0.23 in PFS rates. Using equation (17), the required sample size is about 1,020, or 255 patients in each IL-6 by treatment stratum. If, on the other hand, the prevalence of high IL-6 status is 30%, then the required sample size is considerably larger, about 1,244, or 311 patients in each stratum. In contrast, when there are no classification errors, the sample sizes for the two scenarios are about 177 and 202 per stratum, respectively. Note that, as in the comparison of two independent proportions, the stratum-specific variances $σ_{g t}^{2}$ in the calculation are set to μ̄(1 − μ̄), where μ̄ is the average of the stratum-specific rates, that is,

\bar{μ} = (μ_{11} + μ_{01} + μ_{10} + μ_{00}) / 4 (= (0.59 + 0.66 + 0.18 + 0.48) / 4 = 0.4775),

yielding $σ_{g t}^{2} \approx 0.25$.
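The arithmetic of this example can be retraced with the short sketch below. Several inputs are assumptions rather than statements from the text: a two-sided significance level of α = 0.05, allocation proportions λ_gt = 0.5, τ₁ and τ₀ taken as the predictive values Pr(G = 1 | M = 1) and Pr(G = 0 | M = 0), and the variance within an observed marker stratum taken as the common variance plus the spread of a two-component mixture. The function name and data layout are hypothetical.

```python
from math import ceil
from statistics import NormalDist

# Hypothesized 6-month PFS rates from the example, keyed by (g, t):
# g = 1 for high IL-6, t = 1 for sunitinib plus the experimental drug.
MU = {(1, 1): 0.59, (0, 1): 0.66, (1, 0): 0.18, (0, 0): 0.48}


def total_sample_size(xi_g, pi1=0.95, pi0=0.90, alpha=0.05, power=0.85,
                      lam=0.5, misclassified=True):
    """Total N from equation (17), or N' when misclassified is False."""
    norm = NormalDist()
    z_sq = (norm.inv_cdf(1.0 - alpha / 2.0) + norm.inv_cdf(power)) ** 2
    mu_bar = sum(MU.values()) / 4.0
    sigma_sq = mu_bar * (1.0 - mu_bar)  # common variance, about 0.25
    gamma = (MU[1, 1] - MU[1, 0]) - (MU[0, 1] - MU[0, 0])
    if not misclassified:
        theta0_sq = (2.0 * sigma_sq / lam) * (1.0 / xi_g + 1.0 / (1.0 - xi_g))
        return z_sq * theta0_sq / gamma ** 2
    xi_m = pi1 * xi_g + (1.0 - pi0) * (1.0 - xi_g)
    tau1 = pi1 * xi_g / xi_m                   # ASSUMPTION: Pr(G=1 | M=1)
    tau0 = pi0 * (1.0 - xi_g) / (1.0 - xi_m)   # ASSUMPTION: Pr(G=0 | M=0)
    theta1_sq = 0.0
    for t in (0, 1):
        delta = MU[1, t] - MU[0, t]
        # ASSUMPTION: variance in an observed stratum = common variance plus
        # the between-component spread of a two-point mixture.
        nu1_sq = sigma_sq + tau1 * (1.0 - tau1) * delta ** 2
        nu0_sq = sigma_sq + tau0 * (1.0 - tau0) * delta ** 2
        theta1_sq += nu1_sq / (lam * xi_m) + nu0_sq / (lam * (1.0 - xi_m))
    return z_sq * theta1_sq / ((tau0 + tau1 - 1.0) ** 2 * gamma ** 2)


for prevalence in (0.4, 0.3):
    n_adj = total_sample_size(prevalence)
    n_ref = total_sample_size(prevalence, misclassified=False)
    print(prevalence, ceil(n_adj / 4), ceil(n_ref / 4))
```

Without misclassification the sketch gives 177 and 202 patients per stratum for the two prevalences. With misclassification, and under the variance assumption stated above, the totals come out close to the figures of about 1,020 and 1,244 quoted in the example.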

7. Discussion

In the present paper we demonstrated both analytically and numerically that the misclassified biomarker status can have profound negative impact on various inference problems in a stratified biomarker trial. The methods developed are based on asymptotic theory and are suitable for most biomarker stratified trials that usually require relatively large sample sizes; however, caution needs to be taken for small-size trials.

It is worth noting that, as a result of the randomization, the naive test for marker-treatment interaction maintains the required type I error rates, but suffers considerably from loss of power due to misclassification, which in turn, results in larger sample sizes required for the trial.

Our investigation assumes that the marker’s prevalence ξ_G, sensitivity π₁, and specificity π₀ are all known. When the N patients are a representative sample of the targeted population, ${\hat{ξ}}_{M} = \sum_{i = 1}^{N} M_{i} / N$ is an unbiased estimate of ξ_M. Then from (18), it follows that

{\hat{ξ}}_{G} = \frac{{\hat{ξ}}_{M} - (1 - π_{0})}{π_{0} + π_{1} - 1}

is an unbiased estimate of the marker’s prevalence ξ_G. If the sensitivity π₁ and specificity π₀ are unknown, a preliminary study can be conducted to estimate them.
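As a small illustration of the prevalence estimate above, the following sketch backs out the true prevalence from observed marker calls. The function name and the data are hypothetical, and π₀ and π₁ are treated as known.

```python
def estimate_true_prevalence(marker_calls, pi1, pi0):
    """Back out xi_G from observed 0/1 marker calls and known pi1, pi0."""
    denom = pi0 + pi1 - 1.0
    if denom <= 0:
        raise ValueError("pi0 + pi1 must exceed 1 for the correction to apply")
    xi_m_hat = sum(marker_calls) / len(marker_calls)
    return (xi_m_hat - (1.0 - pi0)) / denom


# Hypothetical data: 44 of 100 patients test marker-positive.
calls = [1] * 44 + [0] * 56
xi_g_hat = estimate_true_prevalence(calls, pi1=0.95, pi0=0.90)
print(round(xi_g_hat, 3))
```

With these inputs the estimate is (0.44 − 0.10)/0.85 = 0.40. In small samples the estimate can fall outside [0, 1], in which case some truncation would be needed in practice.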

The technical developments employed in the present paper can be readily extended to other biomarker-driven designs, for example, the biomarker enrichment design in which only marker-positive patients are randomized to receive treatments. However, as shown in the developments, data from all marker-by-treatment strata are needed to adjust for classification errors. For a review of useful biomarker-based clinical designs, see, e.g., [3, 6, 18, 19]. Although the choice among these designs depends on the trial aims, the impact of biomarker misclassification can be substantial in each design and needs further evaluation. For example, some designs involve testing multiple hypotheses concerning various aspects of the marker-treatment effects. It is then important to investigate how the classification errors adversely affect the allocation of type I error rates and the power of the study. Such investigation is also warranted for adaptive and Bayesian biomarker designs.

Throughout, testing marker-treatment effects is formulated based on stratum-specific means, e.g. means of normal distributions or proportions of a dichotomous endpoint. The methods developed in the present paper could be generalized, with some tedious algebraic manipulations, to ordinal/categorical and longitudinal/repeated endpoints with stratum means as the primary interest. We are currently working on extending the method to time-to-event endpoints and longitudinally measured endpoints, with hazard ratios and rates of change as the primary comparisons, respectively. As can be expected, these types of endpoints require different and more complicated technical handling of the assumptions.

Advances in understanding the roles of molecular and genetic pathways in carcinogenesis are leading to the development of novel therapies that target the disease pathways. As a result, the landscape for performing clinical trials with biomarkers in cancer is evolving and becoming more complex. Despite the large sample size required for the stratified biomarker design, we believe that this approach is realistic and worthwhile because it accounts for misclassification errors.

Acknowledgments

Research of A. Liu was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health. Research of S. Halabi was supported by grants R01-CA155296 and U01-CA157703.

Appendix: Some Technical Details

Proof of Eq. (1)

\begin{array}{l} ξ_{M} = Pr (M = 1) = Pr (M = 1, G = 1) + Pr (M = 1, G = 0) \\ = Pr (M = 1 ∣ G = 1) Pr (G = 1) + Pr (M = 1 ∣ G = 0) Pr (G = 0) \\ = π_{1} ξ_{G} + (1 - π_{0}) (1 - ξ_{G}) . \end{array}

Proof of Eq. (11)

\begin{array}{l} Pr \left( {\hat{μ}}_{1 t} - {\hat{σ}}_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2} \leq μ_{1 t} \leq {\hat{μ}}_{1 t} + {\hat{σ}}_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2} \right) \\ \quad \approx Pr \left( {\hat{μ}}_{1 t} - ν_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2} \leq μ_{1 t} \right) - Pr \left( {\hat{μ}}_{1 t} + ν_{1 t} Z_{α / 2} / N_{1 t}^{1 / 2} \leq μ_{1 t} \right) \\ \quad = E \left\{ Φ \left( \frac{N_{1 t}^{1 / 2} (1 - τ_{1}) Δ_{t}}{ν_{1 t}} + Z_{α / 2} \right) \right\} - E \left\{ Φ \left( \frac{N_{1 t}^{1 / 2} (1 - τ_{1}) Δ_{t}}{ν_{1 t}} - Z_{α / 2} \right) \right\} \\ \quad \approx Φ (c_{1 t} + Z_{α / 2}) - Φ (c_{1 t} - Z_{α / 2}) \end{array}

Note that in the third expression the expectation is taken with respect to the random number $N_{1 t}$.

References

1. Gordon AN, Tonda M, Sun S, Rackoff W; Doxil Study 30-49 Investigators. Long-term survival advantage for women treated with pegylated liposomal doxorubicin compared with topotecan in a phase 3 randomized study of recurrent and refractory epithelial ovarian cancer. Gynecologic Oncology. 2004;95:1–8. doi: 10.1016/j.ygyno.2004.07.011.
2. Simon R, Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research. 2004;10:6759–6763. doi: 10.1158/1078-0432.CCR-04-0496.
3. Mandrekar SJ, Sargent DJ. Clinical Trial Designs for Predictive Biomarker Validation: Theoretical Considerations and Practical Challenges. Journal of Clinical Oncology. 2009;27:4027–4034. doi: 10.1200/JCO.2009.22.3701.
4. Freidlin B, McShane LM, Korn EL. Randomized clinical trials with biomarkers: Design issues. Journal of the National Cancer Institute. 2010;102:152–160. doi: 10.1093/jnci/djp477.
5. Joo J, Geller NL, French B, Kimmel SE, Rosenberg Y, Ellenberg JH. Prospective alpha allocation in the clarification of optimal anticoagulation through genetics (COAG) trial. Clinical Trials. 2010;7:597–604. doi: 10.1177/1740774510381285.
6. Simon R. Clinical trials for predictive medicine: new challenges and paradigms. Clinical Trials. 2010;7:516–524. doi: 10.1177/1740774510366454.
7. Lai TL, Lavori PW. Innovative clinical trial designs toward a 21st-century health care system. Statistics in Bioscience. 2011;3:145–168. doi: 10.1007/s12561-011-9042-5.
8. Motzer RJ, Hutson TE, Tomczak P, Michaelson MD, Bukowski RM, Rixe O, Oudard S, Negrier S, Szczylik C, Kim ST, Chen I, Bycott PW, Baum CM, Figlin RA. Sunitinib versus interferon alfa in metastatic renal-cell carcinoma. New England Journal of Medicine. 2007;356:115–124. doi: 10.1056/NEJMoa065044.
9. Rini BI, Halabi S, Rosenberg JE, Stadler WM, Vaena DA, Ou SS, Archer L, Atkins JN, Picus J, Czaykowski P, Dutcher J, Small EJ. Bevacizumab plus interferon-alpha versus interferon-alpha monotherapy in patients with metastatic renal cell carcinoma: Results of CALGB 90206. Journal of Clinical Oncology. 2008;26:5422–5428. doi: 10.1200/JCO.2008.16.9847.
10. Bosco EE, Wang Y, Xu H, Zilfou JT, Knudsen KE, Aronow BJ, Lowe SW, Knudsen ES. The retinoblastoma tumor suppressor modifies the therapeutic response of breast cancer. Journal of Clinical Investigation. 2007;117:218–228. doi: 10.1172/JCI28803.
11. Sharma A, Comstock CE, Knudsen ES, Cao KH, Hess-Wilson JK, Morey LM, Barrera J, Knudsen KE. Retinoblastoma tumor suppressor status is a critical determinant of therapeutic response in prostate cancer cells. Cancer Research. 2007;67:6192–6203. doi: 10.1158/0008-5472.CAN-06-4424.
12. Sharma A, Yeow WS, Ertel A, Coleman I, Clegg N, Thangavel C, Morrissey C, Zhang X, Comstock CE, Witkiewicz AK, Gomella L, Knudsen ES, Nelson PS, Knudsen KE. The retinoblastoma tumor suppressor controls androgen signaling and human prostate cancer progression. Journal of Clinical Investigation. 2010;120:4478–4492. doi: 10.1172/JCI44239.
13. Tran HT, Liu Y, Zurita AJ, Lin Y, Baker-Neblett KL, Martin AM, Figlin RA, Hutson TE, Sternberg CN, Amado RG, Pandite LN, Heymach JV. Prognostic or predictive plasma cytokines and angiogenic factors for patients treated with pazopanib for metastatic renal-cell cancer: a retrospective analysis of phase 2 and phase 3 trials. Lancet Oncology. 2012;13:827–837. doi: 10.1016/S1470-2045(12)70241-3.
14. Abecasis GR, Cherny SS, Cardon LR. The impact of genotyping error on family-based analysis of quantitative traits. European Journal of Human Genetics. 2001;9:130–134. doi: 10.1038/sj.ejhg.5200594.
15. Hao K, Li C, Rosenow C, Wong WH. Estimation of genotype error rate using samples with pedigree information: an application on the GeneChip Mapping 10K array. Genomics. 2004;84:623–630. doi: 10.1016/j.ygeno.2004.05.003.
16. Wang SJ, Hung HMJ, O’Neill RT. Genomic classifier for patient enrichment: Misclassification and type I error issues in pharmacogenomics noninferiority trial. Statistics in Biopharmaceutical Research. 2011;3:310–319.
17. Maitournam A, Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine. 2005;24:329–339. doi: 10.1002/sim.1975.
18. Freidlin B, Korn EL. Biomarker-adaptive clinical trial designs. Pharmacogenomics. 2010;11:1679–1682. doi: 10.2217/pgs.10.153.
19. Gosho M, Nagashima K, Sato Y. Study Designs and Statistical Analyses for Biomarker Research. Sensors. 2012;12:8966–8986. doi: 10.3390/s120708966.

