Author manuscript; available in PMC: 2013 Dec 2.
Published in final edited form as: Commun Stat Theory Methods. 2007 Jun 27;29(12). doi: 10.1080/03610920008832631

Some Distributions and Their Implications for an Internal Pilot Study With a Univariate Linear Model

Christopher S Coffey 1, Keith E Muller 2
PMCID: PMC3845535  NIHMSID: NIHMS446102  PMID: 24307749

Abstract

In planning a study, the choice of sample size may depend on a variance value based on speculation or obtained from an earlier study. Scientists may wish to use an internal pilot design to protect themselves against an incorrect choice of variance. Such a design involves collecting a portion of the originally planned sample and using it to produce a new variance estimate. This leads to a new power analysis and an increase or decrease in the final sample size. For any general linear univariate model with fixed predictors and Gaussian errors, we prove that the uncorrected fixed sample F-statistic is the likelihood ratio test statistic. However, the statistic does not follow an F distribution, and ignoring the discrepancy may inflate test size. We derive and evaluate properties of the components of the likelihood ratio test statistic in order to characterize and quantify the bias. Most notably, the fixed sample size variance estimate becomes biased downward. The bias may inflate test size for any hypothesis test, even if the parameter being tested was not involved in the sample size re-estimation. Furthermore, using fixed sample size methods may produce confidence intervals with incorrect coverage for secondary parameters and for the variance.

Key Words and Phrases: Interim power analysis, sample size re-estimation

1. Introduction

1.1 Motivation and Literature Review

In designing a study, researchers want to collect a sample large enough to detect a specified effect for a given test size (αt) and target power (Pt). Scientists often rely on an educated guess or variance estimate of uncertain validity to conduct a power analysis and choose a sample size. Wittes and Brittain (1990) introduced the concept of an internal pilot study for the two sample t-test, in which some fraction of the planned observations are used to re-estimate error variance but not the effect of interest. Using the new variance estimate in a fixed sample power calculation then modifies the final sample size. Wittes and Brittain suggested ignoring the randomness of the final sample size for testing.

Coffey and Muller (1999) extended the idea to any General Linear Univariate Model (GLUM) with fixed predictors and Gaussian errors. They derived an exact algorithm for computing test size and power of the primary hypothesis. They also illustrated the strong dependence of test size inflation on interactions among a number of study features.

Many important questions remain unanswered. Carefully evaluating the analytic properties of the approach will greatly help in determining the impact of using an internal pilot design. In particular, 1) detailed knowledge of analytic properties of the random variables in the test statistic would allow characterizing the inflation. 2) Additional results are needed for the general GLUM setting to allow testing secondary hypotheses other than the one upon which sample size re-estimation was based. 3) The ability to provide a defensible confidence interval for the variance observed in a study would aid researchers planning similar studies in the future.

1.2 Notation

Indicate the cumulative distribution function (CDF) of a random variable U with parameters α1 through αk as FU(u; α1, …, αk), with pth quantile FU⁻¹(p; α1, …, αk) and density fU(u; α1, …, αk). Let χ²(ν, ω) indicate a noncentral χ², χ²(ν) a central χ², and F(ν1, ν2, ω) a noncentral F variable (Johnson, Kotz, and Balakrishnan, 1995, Chapters 29 and 30). Also, let χT²(ν; l, u) indicate a doubly truncated central χ² with lower truncation point l and upper truncation point u (Coffey and Muller, 2000).

We consider the same model as in Coffey and Muller (1999), which includes the two sample t-test as a special case. For a specified design, write a GLUM with fixed predictors and Gaussian errors as

$$y_+ = X_+\beta + e_+ \qquad \begin{bmatrix} \underset{n_1\times 1}{y_1} \\ \underset{N_2\times 1}{y_2} \end{bmatrix} = \begin{bmatrix} \underset{n_1\times q}{X_1} \\ \underset{N_2\times q}{X_2} \end{bmatrix}\beta + \begin{bmatrix} \underset{n_1\times 1}{e_1} \\ \underset{N_2\times 1}{e_2} \end{bmatrix}, \qquad (1)$$

with partitioning corresponding to the internal pilot and second samples. Table 1 contains four categories of notation: 1) design parameters, which are properties required for any sample size calculation, 2) sample size allocation rules, which determine the size of the internal pilot sample and limit the final sample size, 3) unknown fixed parameters, and 4) random variables to be observed. Let Es(Xj) represent the matrix created by deleting any duplicate rows from Xj (Helms, 1988). We require that Es(X1) = Es(X2), with the possible exception that a “block” effect may be added, which indicates whether the observation was collected in the internal pilot or second sample. We also impose the restriction that all possible observed samples differ by a multiple of a fixed number of observations, m. It follows that n1, N2, and N+ will be multiples of m. For example, consider increasing sample size by always taking two control subjects for each experimental one (m = 3).

Table 1. Internal Pilot Study Notation.

Symbol : Definition

Design Parameters
αt : Target test size
Pt : Target power
θ* : "Scientifically important" value of θ
σ0² : Variance value used for planning
n0 : Pre-planned sample size based on αt, Pt, θ*, and σ0²

Sample Size Allocation
π : Proportion of n0 used in internal pilot
n1 : Internal pilot sample size (size of first sample), πn0
ν1 : Internal pilot error degrees of freedom, n1 − rank(X1)
n+,min : Minimum size of final sample
n+,max : Maximum size of final sample

Fixed, Unknown Parameters
σ² : True error variance
γ : Ratio of true variance to variance used for planning, σ²/σ0²
θ : True value of secondary parameter, Cβ, an a × 1 vector

Random Variables
σ̂1² : Internal pilot variance estimate, y1′[I − X1(X1′X1)⁻X1′]y1/ν1
N2 : Size of second sample, with particular value n2
N+ : Final sample size, n1 + N2, with particular value n+
ν+ : Final sample error degrees of freedom, N+ − rank(X+)
β̂(n+) : Final estimate of β, (X+′X+)⁻X+′y+
θ̂(n+) : Final estimate of secondary parameter, Cβ̂(n+)
SSH(n+) : Final hypothesis sum of squares, θ̂(n+)′[C(X+′X+)⁻C′]⁻¹θ̂(n+)
σ̂²(n+) : Final variance estimate, y+′[I − X+(X+′X+)⁻X+′]y+/ν+
F(n+) : Test statistic, [SSH(n+)/a]/σ̂²(n+)

In testing H0: θ = θ0, with θ = Cβ, we assume C to be an a × q matrix with a = rank(C). Without loss of generality assume θ0 = 0 (Coffey and Muller, 1999). The unadjusted testing method computes F(n+), the fixed sample size F statistic, and rejects H0 if F(n+) > fF = FF⁻¹(1 − αt; a, ν+).
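
As a simple illustration of the unadjusted rule, the following sketch (ours, not code from the paper; every number is an assumption for illustration) computes the fixed sample critical value fF and the resulting rejection decision:

```python
# A minimal sketch (not from the paper; all numbers assumed) of the unadjusted test:
# compute the fixed sample critical value f_F and compare an observed F(n_+) to it.
from scipy.stats import f as f_dist

alpha_t, a, nu_plus = 0.05, 1, 84            # target test size, rank(C), final error df
f_crit = f_dist.ppf(1 - alpha_t, a, nu_plus) # f_F = F_F^{-1}(1 - alpha_t; a, nu_+)
F_obs = 4.60                                 # observed [SSH(n_+)/a] / sigma_hat^2(n_+), assumed
print(f_crit, F_obs > f_crit)                # reject H0 when F(n_+) > f_F
```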

1.3 Known Results

Wittes and Brittain (1990) considered using an internal pilot design with no adjustment to testing. A fixed sample power calculation determines the random N+ as a function of σ̂1². They used simulations to evaluate test size, power, and expected sample size for a t-test involving roughly 100 total observations.

Wittes, Schabenberger, Zucker, Brittain, and Proschan (1999) derived exact test size in this setting. They also showed that σ̂2(N+) is biased downward, but did not provide an expression for the bias.

Coffey and Muller (1999) provided a number of exact results for the more general GLUM setting. For a specified value of n+, define ωt(n+) to be the solution to Pt = 1 − FF[fF; a, ν+, ωt(n+)]. Hence

$$\sigma^2(n_+) = \frac{\theta_*'\,[C(X_+'X_+)^{-}C']^{-1}\,\theta_*}{\omega_t(n_+)} \qquad (2)$$

equals the largest value of σ̂1² which leads to a final sample size of n+ or smaller. Also define

$$q(n_+,\gamma) = \frac{\nu_1\,\sigma^2(n_+)}{\sigma^2} = \frac{\nu_1\,\sigma^2(n_+)}{\gamma\,\sigma_0^2}, \qquad (3)$$

which equals the largest value of SSE1/σ² leading to a sample size of n+ or smaller. Hence the probability of a particular random final sample size is

$$\Pr\{N_+ = n_+\} = \Pr\{\sigma^2(n_+-m) < \hat{\sigma}_1^2 \le \sigma^2(n_+)\} = \Pr\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\} = F_{\chi^2}[q(n_+,\gamma);\nu_1] - F_{\chi^2}[q(n_+-m,\gamma);\nu_1]. \qquad (4)$$

Note that Fχ²[q(n+,min − m, γ); ν1] = 0 and Fχ²[q(n+,max, γ); ν1] = 1. In theory, n+,max may be infinite. However, budgetary and time constraints often restrict n+,max to some small multiple of n0. Coffey and Muller (1999) used a double conditioning argument to describe an algorithm for computing the power of the unadjusted test, for any θ:

$$P(\gamma,\theta) = 1 - \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} \Pr\!\left\{ \left(\frac{\nu_+}{f_F\,a}\right)\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) \le t \right\} f_{\chi^2}(t;\nu_1)\,dt, \qquad (5)$$

with

$$\omega(n_+,\gamma,\theta) = \frac{\theta'\,[C(X_+'X_+)^{-}C']^{-1}\,\theta}{\sigma^2}. \qquad (6)$$

In practice, the results of Coffey and Muller (1999) and the new results in this paper do not require determining N+ with a fixed sample calculation. The rule for choosing sample size need only determine {σ²(n+)} in a way that maps regions of σ̂1² into values of N+. Changing the rule merely changes the corresponding truncation regions.
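
For concreteness, the following sketch (ours, not the authors' SAS/IML code) carries out the calculation in equations (2) through (4) for a balanced two sample design; every numeric value and the candidate grid are assumptions made for illustration.

```python
# A sketch (ours, not the authors' SAS/IML code) of equations (2)-(4) for a balanced
# two sample design; every numeric value below is an assumption for illustration.
import numpy as np
from scipy.stats import chi2, ncf, f as f_dist
from scipy.optimize import brentq

alpha_t, P_t = 0.05, 0.90          # target test size and power
theta_star   = 1.0                 # "scientifically important" difference
sigma0_sq    = 2.0                 # planning variance
gamma        = 1.0                 # true variance / planning variance
n1, m, rank_X = 44, 2, 2           # internal pilot size, step size, rank(X)
a   = 1                            # rank(C): one contrast between two means
nu1 = n1 - rank_X

def omega_t(n_plus):
    """Noncentrality solving P_t = 1 - F_F[f_F; a, nu_+, omega_t(n_+)]."""
    nu_plus = n_plus - rank_X
    f_crit = f_dist.ppf(1 - alpha_t, a, nu_plus)
    return brentq(lambda w: ncf.sf(f_crit, a, nu_plus, w) - P_t, 1e-8, 1e4)

def sigma2_bound(n_plus):
    """Equation (2); for this design theta_*'[C(X'X)^-C']^{-1}theta_* = n_+ theta_*^2 / 4."""
    return (n_plus * theta_star ** 2 / 4.0) / omega_t(n_plus)

def q(n_plus):
    """Equation (3)."""
    return nu1 * sigma2_bound(n_plus) / (gamma * sigma0_sq)

# Equation (4): probability of each candidate final sample size.
n_min, n_max = 86, 200
for n_plus in np.arange(n_min, n_max + m, m):
    lower = 0.0 if n_plus == n_min else q(n_plus - m)
    prob = chi2.cdf(q(n_plus), nu1) - chi2.cdf(lower, nu1)
    print(int(n_plus), round(prob, 4))
```

Only q(·) changes if a different rule {σ²(n+)} is used; the remainder of the computation is unchanged.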

2. New Analytic Results

2.1 Error Bound for Power and Test Size Algorithm

Even without a finite upper limit on N+ (allowing n+,max = ∞), practical computations require truncating the distribution of N+ at a value beyond which the probability of a more extreme sample size is negligible. Indicate the error due to this truncation by PE(γ, θ). Let NL be the lower truncation point, i.e., the largest value of N+ such that Fχ²[q(n+, γ); ν1] < ∊. Let NU be the upper truncation point, i.e., the smallest value of N+ such that 1 − Fχ²[q(n+, γ); ν1] < ∊. Truncating at NL and NU leads to an error of

$$P_E(\gamma,\theta) = \sum_{n_+=n_{+,\min}}^{N_L} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_Q(t;n_+,\gamma,\theta)\, f_{\chi^2}(t;\nu_1)\,dt + \sum_{n_+=N_U+m}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_Q(t;n_+,\gamma,\theta)\, f_{\chi^2}(t;\nu_1)\,dt, \qquad (7)$$

with

$$F_Q(t;n_+,\gamma,\theta) = \Pr\!\left\{ \left(\frac{\nu_+}{f_F\,a}\right)\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) \le t \right\}. \qquad (8)$$

Replacing FQ(·) with 1 gives an upper bound on the error in the last equation:

$$P_E(\gamma,\theta) \le \sum_{n_+=n_{+,\min}}^{N_L} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} f_{\chi^2}(t;\nu_1)\,dt + \sum_{n_+=N_U+m}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} f_{\chi^2}(t;\nu_1)\,dt = F_{\chi^2}[q(N_L,\gamma);\nu_1] + \left\{1 - F_{\chi^2}[q(N_U,\gamma);\nu_1]\right\} \le 2\epsilon. \qquad (9)$$

If only one end requires truncation, the bound reduces to ∊.
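
The sketch below (ours; the rule q(·) is a made-up placeholder purely so the example runs) shows one way to locate NL and NU for a given tolerance ∊, exploiting the fact that q(n+, γ) increases with n+:

```python
# A sketch (ours, not from the paper) of choosing the truncation points N_L and N_U of
# Section 2.1 for a given tolerance eps.  `q_fn` is any rule mapping a candidate final
# size to q(n_+, gamma); the placeholder rule below exists only to make the example run.
from scipy.stats import chi2

def truncation_points(q_fn, nu1, n_min, m, eps, n_cap=10_000):
    """Largest n_+ with F[q(n_+); nu1] < eps and smallest n_+ with 1 - F[q(n_+); nu1] < eps."""
    n_L, n_U = None, None
    n = n_min
    while n <= n_cap:
        p = chi2.cdf(q_fn(n), nu1)
        if p < eps:
            n_L = n                      # still below the lower tolerance
        if n_U is None and 1.0 - p < eps:
            n_U = n                      # first size with a negligible upper tail
            break
        n += m
    return n_L, n_U

nu1 = 42
q_toy = lambda n_plus: 0.25 * (n_plus - 60)   # monotone placeholder, illustration only
print(truncation_points(q_toy, nu1, n_min=86, m=2, eps=1e-6))
```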

2.2 The Likelihood and Related Properties

Using an internal pilot study causes N+ to be random. Under the likelihood principle, as in a Bayesian framework, the stopping rule does not affect inference, so this randomness creates no difficulty (Jennison and Turnbull, 2000, p. 338). Thus it seems intuitive that the maximum likelihood estimates and likelihood ratio test statistic for internal pilot and fixed sample designs coincide. Nevertheless, our interest in a wide range of scenarios compelled us to provide a formal proof.

Use conditioning arguments to write the likelihood for the GLUM as

$$\mathcal{L}(\beta,\sigma^2;\,y_1,y_2,N_+) = \mathcal{L}(\beta,\sigma^2;\,y_1)\,\mathcal{L}(\beta,\sigma^2;\,y_2\,|\,N_+,y_1)\,\mathcal{L}(\beta,\sigma^2;\,N_+\,|\,y_1). \qquad (10)$$

The marginal likelihood of the first sample, ℒ(β, σ²; y1), equals that of a random sample of n1 observations from a Gaussian population. The likelihood of the second sample conditional upon N+, ℒ(β, σ²; y2 | N+, y1), equals that of a random sample of n2 observations from a Gaussian population. Hence the total likelihood differs from that for a fixed sample with n+ = n1 + n2 observations only through ℒ(β, σ²; N+ | y1). Observe that N+ is discrete and write

$$\mathcal{L}(\beta,\sigma^2;\,N_+\,|\,y_1) = \Pr\{N_+=n_+\,|\,y_1\} = \begin{cases} 1, & \text{if } q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma) \\ 0, & \text{otherwise} \end{cases} = I\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\}, \qquad (11)$$

in which I(·) represents an indicator function with a value of 1 if the expression is true. In turn, write the joint likelihood under an internal pilot design as

$$\mathcal{L}(\beta,\sigma^2;\,y_1,y_2,N_+) = I\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\} \times (2\pi\sigma^2)^{-n_+/2}\exp\!\left[-\frac{(y_+-X_+\beta)'(y_+-X_+\beta)}{2\sigma^2}\right]. \qquad (12)$$

Since the value of the indicator function does not depend on any unknown parameter (σ² cancels from both sides of the inequality, leaving a condition on SSE1 and the known planning constants alone), we may ignore it in estimation. Therefore the sufficient statistics and maximum likelihood estimates for σ², β, and θ coincide with those from a fixed sample size analysis. Let σ̃²(n+) represent the maximum likelihood estimate of σ². In a fixed sample design we often prefer the unbiased estimate,

$$\hat{\sigma}^2(n_+) = \left(\frac{n_+}{\nu_+}\right)\tilde{\sigma}^2(n_+). \qquad (13)$$

However, both σ̂2(n+) and σ̂2(N+) have bias, as detailed in §2.5.

Coincidence of the maximum likelihood estimates implies coincidence of the likelihood ratio test statistics. However, F(n+) will not follow an F distribution under an internal pilot design. In order to characterize the distribution of F(n+), we examine each component separately in the sections which follow.

2.3 The Distribution of SSH(n+)

Conditional on N+ = n+, the numerator of the likelihood ratio test statistic has the same distribution as for a fixed sample design. To see this, observe that β̂(n+) equals a linear combination of β̂1 and β̂2(n+), the independent estimates from the internal pilot and additional samples. The randomness of N+ depends only on σ̂1², which is independent of β̂1, β̂2(n+), and, in turn, β̂(n+). Hence the conditional distributions of β̂(n+), θ̂(n+), and SSH(n+) coincide with the corresponding distributions for a fixed sample size design with n+ observations. Incidentally, if Es(X+) has full rank then β̂(n+) will be unbiased.

Unconditionally, the numerator of the likelihood ratio test statistic for an internal pilot design has the same distribution as for a fixed sample design under the null, but not under the alternative. To see this, observe that if θ ≠ 0 then the CDF of SSH(N+)/σ² equals a weighted sum of the conditional CDF's, with the weights corresponding to the probability of observing a specific n+:

$$\Pr\{\mathrm{SSH}(N_+)/\sigma^2 < s\} = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} F_{\chi^2}[s;\,a,\,\omega(n_+,\gamma,\theta)]\,\Pr\{N_+=n_+\}. \qquad (14)$$

Under the null, ω(n+, γ, 0) = 0, the conditional distribution does not depend on n+, and SSH(N+)/σ² ~ χ²(a). Hence any effect on test size due to using an internal pilot does not depend on the numerator of the test statistic.
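
A short sketch (ours; the sample size probabilities and noncentralities below are made up for illustration) of the mixture CDF in equation (14):

```python
# A sketch (ours, with assumed inputs) of equation (14): the unconditional CDF of
# SSH(N_+)/sigma^2 is a mixture of noncentral chi-square CDFs, weighted by Pr{N_+ = n_+}.
import numpy as np
from scipy.stats import ncx2

a = 1
weights = np.array([0.70, 0.15, 0.10, 0.05])   # Pr{N_+ = n_+}, assumed
omegas  = np.array([10.8, 11.0, 11.3, 11.5])   # omega(n_+, gamma, theta), assumed

def cdf_ssh(s):
    """Pr{SSH(N_+)/sigma^2 < s} under the alternative."""
    return float(np.sum(weights * ncx2.cdf(s, a, omegas)))

print(cdf_ssh(15.0))
# Under the null every omega equals 0, the mixture components are identical, and the CDF
# collapses to the central chi-square(a) CDF, so the numerator cannot inflate test size.
```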

2.4 The Distribution of SSE(n+)

With a fixed n+, SSE(n+)/σ² ~ χ²(ν+). However, with an internal pilot, the dependence of n+ on σ̂1² complicates the distribution. Following Coffey and Muller (1999), write SSE(n+) as the sum of two independent quadratic forms:

$$\mathrm{SSE}(n_+) = y_+'A_e y_+ = y_+'(A_e - A_1)y_+ + y_+'A_1 y_+ = \mathrm{SSE}_*(n_+) + \mathrm{SSE}_1(n_+), \qquad (15)$$

with SSE1(n+) the error sum of squares from the internal pilot sample. Coffey and Muller (1999) proved that SSE*(n+)/σ² ~ χ²(n2). However, conditional upon observing a specific n+, SSE1 is restricted to the possible range of values which would have led to that final sample size. Therefore

$$\mathrm{SSE}_1(n_+)/\sigma^2 \sim \chi^2_T[\nu_1,\,q(n_+-m,\gamma),\,q(n_+,\gamma)]. \qquad (16)$$

The characteristic function of SSE(n+)/σ² has a simple form. Define

$$D(t) = F_{\chi^2}[q(n_+,\gamma)(1-2t);\,\nu_1] - F_{\chi^2}[q(n_+-m,\gamma)(1-2t);\,\nu_1], \qquad (17)$$

and assume t ∈ [0, 1/2). A result in Coffey and Muller (2000) allows writing the characteristic function of SSE1(n+)/σ² as

$$\phi_{\mathrm{SSE}_1(n_+)/\sigma^2}(t) = \left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_1). \qquad (18)$$

In turn, the independence of SSE1(n+) and SSE*(n+) implies

$$\phi_{\mathrm{SSE}(n_+)/\sigma^2}(t) = \left\{\left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_1)\right\}\phi_{\chi^2}(t;n_2) = \left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_+). \qquad (19)$$

Thus the characteristic function of SSE(n+)/σ² under an internal pilot design equals the product of the characteristic function under a fixed design and a factor which accounts for truncation. Ignoring the randomness of N+ and approximating the distribution with the fixed sample result ignores the factor.

The CDF of SSE(n+)/σ² may be computed by inverting the characteristic function. Alternatively, a method similar to the one used by Coffey and Muller (1999) for the CDF of the test statistic may be used. Condition on SSE1(n+)/σ² and integrate numerically over its range of values:

$$F_{\mathrm{SSE}(n_+)/\sigma^2}(s;\gamma) = \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} \Pr\{\mathrm{SSE}_*(n_+)/\sigma^2 + t \le s\}\,\frac{f_{\chi^2}(t;\nu_1)}{\Pr\{N_+=n_+\}}\,dt = \frac{1}{\Pr\{N_+=n_+\}}\int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_{\chi^2}(s-t;\,n_2)\,f_{\chi^2}(t;\nu_1)\,dt. \qquad (20)$$

Compute the unconditional CDF via the law of total probability:

$$F_{\mathrm{SSE}(N_+)/\sigma^2}(s;\gamma) = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \Pr\{N_+=n_+\}\,F_{\mathrm{SSE}(n_+)/\sigma^2}(s;\gamma) = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_{\chi^2}(s-t;\,n_2)\,f_{\chi^2}(t;\nu_1)\,dt. \qquad (21)$$

Truncating the sum will lead to the same size error as in §2.1 for computing the CDF of the test statistic (error < 2∊, with ∊ chosen as the truncation value).
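
A numerical sketch (ours, with assumed truncation boundaries and sample sizes) of the conditioning approach in equations (20) and (21):

```python
# A sketch (assumed illustrative inputs) of equations (20)-(21): the CDF of
# SSE(N_+)/sigma^2, obtained by conditioning on SSE_1/sigma^2 = t and integrating
# numerically over each truncation interval (q(n_+ - m), q(n_+)].
import numpy as np
from scipy.stats import chi2
from scipy.integrate import quad

nu1 = 42                                     # internal pilot error df (assumed)
# assumed q-boundaries and second sample sizes for a few candidate final sizes
q_lower = np.array([0.0, 35.0, 40.0])        # q(n_+ - m, gamma)
q_upper = np.array([35.0, 40.0, np.inf])     # q(n_+, gamma)
n2      = np.array([42, 44, 46])             # second sample sizes N_2 = n_+ - n_1

def cdf_sse_over_sigma2(s):
    """Equation (21): unconditional CDF at s (sum of the conditional pieces)."""
    total = 0.0
    for lo, hi, df2 in zip(q_lower, q_upper, n2):
        hi_eff = min(hi, s)                  # F_chi2(s - t; n2) is zero for t > s
        if hi_eff <= lo:
            continue
        piece, _ = quad(lambda t: chi2.cdf(s - t, df2) * chi2.pdf(t, nu1), lo, hi_eff)
        total += piece
    return total

print(cdf_sse_over_sigma2(80.0))
```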

2.5 The Bias of σ̂2(n+)

Wittes et al. (1999) showed that ε[σ̂²(N+)] ≤ σ² (for the t-test), but did not provide an expression for the bias. The importance of the bias arises from the fact that test size inflation varies directly with it. Using the standard result for the moment generating function of a linear transformation of a random variable,

$$M_{\hat{\sigma}^2(n_+)/\sigma^2}(t) = \left[\frac{D(t/\nu_+)}{D(0)}\right] M_{\chi^2(\nu_+)}(t/\nu_+). \qquad (22)$$

Taking the first derivative and setting t = 0 gives the conditional bias:

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(n_+)}{\sigma^2}\right] = 1 - \frac{2\left\{q(n_+,\gamma)\,f_{\chi^2}[q(n_+,\gamma);\nu_1] - q(n_+-m,\gamma)\,f_{\chi^2}[q(n_+-m,\gamma);\nu_1]\right\}}{\nu_+\Pr\{N_+=n_+\}}. \qquad (23)$$

Applying Lemma 1 from Coffey and Muller (2000),

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(n_+)}{\sigma^2}\right] = 1 - \frac{2\nu_1\left\{f_{\chi^2}[q(n_+,\gamma);\nu_1+2] - f_{\chi^2}[q(n_+-m,\gamma);\nu_1+2]\right\}}{\nu_+\Pr\{N_+=n_+\}}. \qquad (24)$$

Recall that a χ²(ν1 + 2) density has a single mode at ν1 (for ν1 ≥ 2, as always true here). Hence σ̂²(n+) is (conditionally) biased in a direction depending on whether σ̂1² is greater than or less than σ². The law of total probability allows obtaining the unconditional expectation:

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \varepsilon\!\left[\hat{\sigma}^2(n_+)/\sigma^2\right]\Pr\{N_+=n_+\} = 1 - 2\nu_1\sum_{n_+=n_{+,\min}}^{n_{+,\max}} \frac{f_{\chi^2}[q(n_+,\gamma);\nu_1+2] - f_{\chi^2}[q(n_+-m,\gamma);\nu_1+2]}{\nu_+}. \qquad (25)$$

Factoring like terms and finding common denominators leads to

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] = 1 - \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \left(\frac{\nu_1}{\nu_+}\right)\left(\frac{2m}{\nu_++m}\right) f_{\chi^2}[q(n_+,\gamma);\nu_1+2]. \qquad (26)$$

Hence ε[σ̂²(N+)] ≤ σ². Furthermore, from (26) it is clear that large values of either n1 or n+ ensure negligible bias. However, the dependence of N+ on many of the parameters in Table 1 complicates any discussion of large sample properties. In general, any combination of parameters which leads to large values of n1 or N+ will reduce bias in σ̂²(N+). Finally, observing that fχ²(t; ν1 + 2) has a single mode at t = ν1 allows bounding the unconditional bias:

$$0 \le 1 - \varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] \le f_{\chi^2}(\nu_1;\,\nu_1+2)\sum_{n_+=n_{+,\min}}^{n_{+,\max}} \left(\frac{\nu_1}{\nu_+}\right)\left(\frac{2m}{\nu_++m}\right). \qquad (27)$$
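
The unconditional expectation in (26) is easy to evaluate once the boundaries q(n+, γ) are known. The sketch below (ours; the boundary rule is a monotone placeholder, not one derived from equation (3)) illustrates the computation:

```python
# A sketch (ours) evaluating equation (26) for assumed design quantities; the
# boundaries q(n_+, gamma) here are a monotone placeholder chosen only so the
# example runs, not values derived from equation (3).
import numpy as np
from scipy.stats import chi2

nu1, m, rank_X = 8, 2, 2                     # small internal pilot, steps of m (assumed)
sizes = np.arange(10, 400 + m, m)            # candidate final sample sizes
q_vals = 0.08 * sizes                        # placeholder for q(n_+, gamma)

nu_plus = sizes - rank_X
bias_terms = (nu1 / nu_plus) * (2.0 * m / (nu_plus + m)) * chi2.pdf(q_vals, nu1 + 2)
print(1.0 - bias_terms.sum())                # E[sigma_hat^2(N_+)/sigma^2], at most 1
```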

The origin of the bias may be characterized further. Obviously ε(σ̂1²) = σ². Define σ̂*²(n+) = SSE*(n+)/n2 and note that

$$\varepsilon[\hat{\sigma}_*^2(N_+)] = \sum_{n_+} \Pr\{N_+=n_+\}\,\varepsilon[\mathrm{SSE}_*(n_+)/n_2] = \sigma^2. \qquad (28)$$

Therefore σ̂2(N+) equals a linear function of two unbiased estimators:

$$\hat{\sigma}^2(N_+) = \left(\frac{\nu_1}{\nu_+}\right)\hat{\sigma}_1^2 + \left(\frac{n_2}{\nu_+}\right)\hat{\sigma}_*^2(N_+). \qquad (29)$$

This form has three important implications. First, the randomness of the weights creates the bias. Second, as n2/n1 increases, the second term in (29) dominates. Third, although neither of the two unbiased estimates uses all of the data, combining them with fixed, positive weights that sum to one would create an unbiased estimate of σ² that does use all of the data.
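
A small Monte Carlo sketch (ours; the rule mapping σ̂1² into a final sample size is a placeholder and all constants are assumed) makes the first and third implications concrete: the random weights in (29) pull the estimate below σ², while a fixed-weight combination of the same two pieces stays unbiased.

```python
# A Monte Carlo sketch (ours, with assumed constants and a placeholder boundary rule)
# of the decomposition in (29): random weights bias sigma_hat^2(N_+) downward, while
# a fixed-weight combination of the same two unbiased pieces remains unbiased.
import numpy as np

rng = np.random.default_rng(1)
sigma2, n1, rank_X, m = 1.0, 10, 2, 2
nu1 = n1 - rank_X
sizes  = np.arange(20, 80 + m, m)            # candidate final sizes (n_{+,min} = 20 > n_1)
bounds = 0.05 * sizes                        # placeholder for the sigma^2(n_+) of equation (2)

reps = 200_000
sse1 = sigma2 * rng.chisquare(nu1, reps)     # internal pilot error SS
sig1 = sse1 / nu1                            # sigma_hat_1^2

# final size: smallest candidate whose boundary is >= sigma_hat_1^2, capped at the maximum
idx = np.minimum(np.searchsorted(bounds, sig1), len(sizes) - 1)
n_plus  = sizes[idx]
n2      = n_plus - n1
nu_plus = n_plus - rank_X
sse2 = sigma2 * rng.chisquare(n2)            # SSE_*(N_+) ~ sigma^2 chi-square(n_2) given N_+
sig_star = sse2 / n2

var_ip    = (sse1 + sse2) / nu_plus          # sigma_hat^2(N_+): random weights nu_1/nu_+, n_2/nu_+
var_fixed = 0.5 * sig1 + 0.5 * sig_star      # fixed weights summing to one
print(var_ip.mean(), var_fixed.mean())       # first falls below sigma^2 = 1, second stays near 1
```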

2.6 Characteristic Function of a 1-1 Function of the Test Statistic

Let FF(n+)(f; γ, θ) represent the conditional CDF of the internal pilot test statistic computed at an arbitrary point f. For example, letting f = fF allows computing the conditional power under an internal pilot design using the unadjusted approach for testing. The results in §2.3 and §2.4 imply that F(n+) differs from a fixed sample F statistic only through the denominator. With c(n+, f) = ν+/(fa), express FF(n+)(f; γ, θ) as

$$\begin{aligned} F_{F(n_+)}(f;\gamma,\theta) &= \Pr\!\left\{\frac{\mathrm{SSH}(n_+)/a}{\mathrm{SSE}(n_+)/\nu_+} \le f\right\} \\ &= \Pr\{c(n_+,f)\,\mathrm{SSH}(n_+)/\sigma^2 - \mathrm{SSE}_*(n_+)/\sigma^2 - \mathrm{SSE}_1(n_+)/\sigma^2 \le 0\} \\ &= \Pr\{c(n_+,f)\,\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) - \chi^2_T[\nu_1,\,q(n_+-m,\gamma),\,q(n_+,\gamma)] \le 0\} \\ &= \Pr\{S(n_+,f) \le 0\} = F_{S(n_+,f)}(0;\,f,\gamma,\theta). \end{aligned} \qquad (30)$$

Hence we may compute the CDF of F(n+) at the point f via the CDF of S(n+, f) evaluated at zero. Davies' (1980) algorithm inverts the characteristic function to compute the CDF of a weighted sum of χ²'s. Since S(n+, f) contains a doubly-truncated χ², Coffey and Muller (1999) computed the CDF by first conditioning on the doubly-truncated random variable, then applying Davies' algorithm to compute the CDF of the remaining sum of two χ²'s. The approach leads to a double numerical integral. Alternatively, we could directly invert the characteristic function of S(n+, f), which requires only one integration. Using standard results about characteristic functions gives:

$$\phi_{S(n_+,f)}(t) = \phi_{c(n_+,f)\mathrm{SSH}(n_+)/\sigma^2}(t)\;\phi_{\mathrm{SSE}_*(n_+)/\sigma^2}(-t)\;\phi_{\mathrm{SSE}_1(n_+)/\sigma^2}(-t) = \frac{D(-it)}{D(0)} \times \left\{(1+2it)^{-\nu_+/2}\,[1-2ic(n_+,f)t]^{-a/2}\exp\!\left[\frac{ic(n_+,f)\,t\,\omega(n_+,\gamma,\theta)}{1-2ic(n_+,f)t}\right]\right\}. \qquad (31)$$

The term in braces equals the characteristic function for the weighted sum which would arise in the fixed sample case, with n+ observations. In turn, the factor D(−it)/D(0) accounts for the truncation of σ̂1².
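
For comparison, the following sketch (ours, with assumed inputs) evaluates the conditional CDF in (30) by the double numerical integral route described above; inverting the characteristic function in (31), for example with Davies' algorithm, would replace the inner integral with a single integration.

```python
# A sketch (ours, with assumed inputs) of the conditional CDF in equation (30),
# computed by the double numerical integral: condition on the doubly truncated
# SSE_1/sigma^2 = t, then integrate Pr{c X - Y <= t} over t.
import numpy as np
from scipy.stats import ncx2, chi2
from scipy.integrate import quad

# assumed design quantities for one candidate final size n_+
a, nu1, n2 = 1, 42, 44
nu_plus = nu1 + n2
omega = 11.0                      # omega(n_+, gamma, theta), assumed
q_lo, q_hi = 35.0, 40.0           # q(n_+ - m, gamma) and q(n_+, gamma), assumed
f = 3.95                          # point at which to evaluate the CDF, e.g. f_F
c = nu_plus / (f * a)             # c(n_+, f)

def inner(t):
    """Pr{c * chi2(a, omega) - chi2(n2) <= t}, by conditioning on the central chi-square."""
    val, _ = quad(lambda y: ncx2.cdf((t + y) / c, a, omega) * chi2.pdf(y, n2), 0, np.inf)
    return val

prob_n = chi2.cdf(q_hi, nu1) - chi2.cdf(q_lo, nu1)      # Pr{N_+ = n_+}
outer, _ = quad(lambda t: inner(t) * chi2.pdf(t, nu1), q_lo, q_hi)
print(outer / prob_n)             # F_{F(n_+)}(f; gamma, theta), conditional on N_+ = n_+
```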

3. Numerical Examples

We illustrate the relationship between the bias in estimating the variance and inflation of test size with two examples. Although our results apply far more generally, for simplicity of exposition the examples correspond to the two sample setting. In each case, we assume αt = 0.05, Pt = 0.90, and no finite bound on the size of the final sample (n+,max = ∞). Example A, described by Wittes and Brittain (1990), centers on detecting a mean difference of θ* = 1 with σ0² = 2, n0 = 86, n1 = 44, and n+,min = n0 = 86. Example B, described by Coffey and Muller (1999), centers on detecting a mean difference of θ* = 1.6 with σ0² = 1, n0 = 20, n1 = 10, and n+,min = n1 = 10. This allows the sample size to be reduced if the variance was originally overstated (γ < 1).

For γ ∈ {0.5, 0.75, 1, 1.5, 2}, Table 2 displays ε[σ̂²(N+)/σ²] and the true test size for the examples. Bias was computed in SAS IML® with equation (24), while test size was calculated using the algorithm in Coffey and Muller (1999). Note how the amount of test size inflation closely tracks the amount of bias in σ̂²(N+)/σ². Wittes and Brittain (1990) showed that, in Example A, an internal pilot can provide much better power than a fixed design while providing a negligible increase in test size. Table 2 illustrates that Example A also produces little bias in σ̂²(N+)/σ². However, Example B can lead to test size as large as 0.065 and ε[σ̂²(N+)/σ²] as small as 0.89. At least for the highly constrained design in Example A, with moderate to large sample sizes, an internal pilot design causes little worry about test size inflation or bias in estimating σ². However, the test size and bias are of much greater concern in small samples.

Table 2. Bias in Estimating γ and the Relationship with Test Size Inflation.

                 Example A                     Example B
Gamma    ε[σ̂²(N+)/σ²]      α         ε[σ̂²(N+)/σ²]      α
0.5          1.000        0.050          0.909        0.055
0.75         0.998        0.050          0.891        0.062
1.0          0.990        0.051          0.896        0.065
1.5          0.985        0.052          0.916        0.065
2            0.988        0.052          0.931        0.062

Coffey and Muller (1999) showed that the degree of test size inflation was highly dependent upon the choice of design parameters and sample size allocation rules. The examples illustrate how closely the bias of σ̂²(N+)/σ² tracks test size inflation. This implies that there are combinations of design parameters and sample size allocation rules which cause nonignorable bias in σ̂²(N+)/σ². Hence the possibility of bias should at least be examined before using the variance estimate from an internal pilot study to make inferences or plan a future study.

4. Implications of the New Results

4.1 Testing

Any inflation of test size arises solely from a change in the distribution of the variance estimate, rather than a change in the distribution of the parameter estimate itself. In the GLUM setting, a test of any secondary parameter involves the variance estimate. For example, consider testing for a "block" effect in order to ensure that there were no differences between the internal pilot and second samples with regard to the outcome. The biased estimate of σ² may inflate test size. Hence researchers must be wary of inflation even for hypothesis tests about secondary parameters that were not involved in the sample size re-estimation.

4.2 Confidence Regions for θ

The same complication that biases test size also biases confidence interval coverage. Inverting a test statistic and naively computing a 100(1 − p)% confidence region for θ with standard fixed sample size linear models theory gives coverage less than or equal to the desired level. As with test size, the true coverage depends upon the unknown parameter γ. Furthermore such bias occurs with any secondary parameter.

4.3 Confidence Intervals for σ2

An appropriate confidence interval for the variance observed in a particular study can be invaluable in planning future studies. Furthermore, in the fixed sample size case, computing confidence intervals for σ² allows computing confidence intervals for power and noncentrality (Taylor and Muller, 1996; Muller and Pasour, 1997). However, since SSE(n+)/σ² does not follow a χ² distribution under an internal pilot design, forming a confidence interval for σ² using fixed sample size methods may not provide the desired coverage. As with confidence regions for θ, the true coverage depends on γ. In contrast to the intervals for θ, however, confidence intervals for σ² may have more or less coverage than desired.
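
As a reminder of what the naive approach does, here is a sketch (ours, with assumed numbers) of the usual fixed sample interval for σ²; under an internal pilot design SSE(N+)/σ² is no longer χ²(ν+), so this interval need not achieve its nominal coverage.

```python
# A sketch (ours, assumed inputs) of the naive fixed sample confidence interval for
# sigma^2; because SSE(N_+)/sigma^2 is not chi-square under an internal pilot design,
# the interval's true coverage can differ from the nominal level.
from scipy.stats import chi2

p, nu_plus, sigma2_hat = 0.05, 84, 1.9
lower = nu_plus * sigma2_hat / chi2.ppf(1 - p / 2, nu_plus)
upper = nu_plus * sigma2_hat / chi2.ppf(p / 2, nu_plus)
print(lower, upper)      # nominal 95% limits from fixed sample theory
```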

5. Conclusions

The following conclusions apply to any univariate linear model with fixed predictors and Gaussian errors.

  1. Coffey and Muller's (1999) algorithm for power involves the distribution of N+. With n+,max = ∞, the calculations require truncation of the distribution. The truncation region can be defined to ensure a specified upper bound on error.

  2. The likelihood ratio test statistic under an internal pilot design coincides with the statistic for a fixed sample design. However, the statistic does not follow an F distribution because the variance estimate is not a scaled χ².

  3. The distributions of θ̂(n+) and SSH(n+)/σ² coincide with those from a fixed sample analysis with n+ observations.

  4. SSE(n+)/σ² equals the sum of a χ²(n2) and a doubly-truncated χ²(ν1), in contrast to the fixed sample result of χ²(n2 + ν1). This leads to ε[σ̂²(N+)] ≤ σ². Bias in test size and coverage of confidence intervals varies directly with the bias of the final variance estimate.

  5. Random predictors (Sampson, 1974) greatly complicate the problem. Our results apply only conditional upon the observed value of (random) X+ at the conclusion of the study. Even with a fixed sample size, only limited results are available for power calculations with random predictors. The introduction of an internal pilot complicates the problem further because X2 is random at the re-estimation stage. Clearly, further research is needed for such studies.

  6. Fast, accurate approximations for computing power and test size would ease the burden of planning a study with an internal pilot design.

  7. Methods for controlling test size merit future research.

Acknowledgments

Coffey's work was supported in part by NIEHS grant 5-T32-ES07018. A portion of the work was submitted by Coffey in partial fulfillment of the requirements for the Ph.D. in Biostatistics. Muller's work was supported in part by NCI program project grant P01 CA47982-04. The authors wish to thank two anonymous reviewers and an associate editor for a number of helpful suggestions.

Contributor Information

Christopher S. Coffey, A-1124 Medical Center North, Dept. of Preventive Medicine, Vanderbilt Univ. School of Med., Nashville, Tennessee 37232-2637

Keith E. Muller, 3105C McGavran-Greenberg, Dept. of Biostatistics, CB#7400, University of North Carolina, Chapel Hill, North Carolina 27599

Bibliography

  1. Coffey CS, Muller KE. Exact Test Size and Power of a Gaussian Error Linear Model for an Internal Pilot Study. Statistics in Medicine. 1999;18:1199–1214. doi: 10.1002/(sici)1097-0258(19990530)18:10<1199::aid-sim124>3.0.co;2-0.
  2. Coffey CS, Muller KE. Properties of Doubly-Truncated Gamma Variables. Communications in Statistics - Theory and Methods. 2000;29, in press. doi: 10.1080/03610920008832519.
  3. Davies RB. The Distribution of a Linear Combination of χ² Random Variables. Applied Statistics. 1980;29:323–333.
  4. Helms RW. Comparisons of Parameter and Hypothesis Definitions in a General Linear Model. Communications in Statistics - Theory and Methods. 1988;17:2725–2753.
  5. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Boca Raton: Chapman & Hall/CRC; 2000.
  6. Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions-2. New York: Wiley; 1995.
  7. Muller KE, Pasour VB. Bias in Linear Model Power and Sample Size Due to Estimating Variance. Communications in Statistics - Theory and Methods. 1997;26:839–851. doi: 10.1080/03610929708831953.
  8. Sampson AR. A Tale of Two Regressions. Journal of the American Statistical Association. 1974;69:682–689.
  9. Taylor DJ, Muller KE. Bias in Linear Model Power and Sample Size Calculations Due to Estimating Noncentrality. Communications in Statistics - Theory and Methods. 1996;25:1595–1610. doi: 10.1080/03610929608831787.
  10. Wittes J, Brittain E. The Role of Internal Pilot Studies in Increasing the Efficiency of Clinical Trials. Statistics in Medicine. 1990;9:65–72. doi: 10.1002/sim.4780090113.
  11. Wittes J, Schabenberger O, Zucker D, Brittain E, Proschan M. Internal Pilot Studies I: Type I Error Rate of the Naive t-Test. Statistics in Medicine. 1999;18:3481–3491. doi: 10.1002/(sici)1097-0258(19991230)18:24<3481::aid-sim301>3.0.co;2-c.
