Author manuscript; available in PMC: 2013 Dec 2.
Published in final edited form as: Commun Stat Theory Methods. 2007 Jun 27;29(12). doi: 10.1080/03610920008832631

Some Distributions and Their Implications for an Internal Pilot Study With a Univariate Linear Model

Christopher S Coffey 1, Keith E Muller 2
PMCID: PMC3845535  NIHMSID: NIHMS446102  PMID: 24307749

Abstract

In planning a study, the choice of sample size may depend on a variance value based on speculation or obtained from an earlier study. Scientists may wish to use an internal pilot design to protect themselves against an incorrect choice of variance. Such a design involves collecting a portion of the originally planned sample and using it to produce a new variance estimate. This leads to a new power analysis and an increase or decrease in the final sample size. For any general linear univariate model with fixed predictors and Gaussian errors, we prove that the uncorrected fixed sample F-statistic is the likelihood ratio test statistic. However, the statistic does not follow an F distribution, and ignoring the discrepancy may inflate test size. We derive and evaluate properties of the components of the likelihood ratio test statistic in order to characterize and quantify the bias. Most notably, the fixed sample size variance estimate becomes biased downward. The bias may inflate test size for any hypothesis test, even if the parameter being tested was not involved in the sample size re-estimation. Furthermore, using fixed sample size methods may produce confidence intervals with incorrect coverage for secondary parameters and for the variance.

Key Words and Phrases: Interim power analysis, sample size re-estimation

1. Introduction

1.1 Motivation and Literature Review

In designing a study, researchers want to collect a sample large enough to detect a specified effect for a given test size (αt) and target power (Pt). Scientists often rely on an educated guess or variance estimate of uncertain validity to conduct a power analysis and choose a sample size. Wittes and Brittain (1990) introduced the concept of an internal pilot study for the two sample t-test, in which some fraction of the planned observations are used to re-estimate error variance but not the effect of interest. Using the new variance estimate in a fixed sample power calculation then modifies the final sample size. Wittes and Brittain suggested ignoring the randomness of the final sample size for testing.

Coffey and Muller (1999) extended the idea to any General Linear Univariate Model (GLUM) with fixed predictors and Gaussian errors. They derived an exact algorithm for computing test size and power of the primary hypothesis. They also illustrated the strong dependence of test size inflation on interactions among a number of study features.

Many important questions remain unanswered. Carefully evaluating the analytic properties of the approach will greatly help in determining the impact of using an internal pilot design. In particular, 1) detailed knowledge of analytic properties of the random variables in the test statistic would allow characterizing the inflation. 2) Additional results are needed for the general GLUM setting to allow testing secondary hypotheses other than the one upon which sample size re-estimation was based. 3) The ability to provide a defensible confidence interval for the variance observed in a study would aid researchers planning similar studies in the future.

1.2 Notation

Indicate the cumulative distribution function (CDF) of a random variable U with parameters α1 through αk as FU(u; α1, …, αk), with pth quantile FU⁻¹(p; α1, …, αk) and density fU(u; α1, …, αk). Let χ²(ν, ω) indicate a noncentral χ², χ²(ν) a central χ², and F(ν1, ν2, ω) a noncentral F variable (Johnson, Kotz, and Balakrishnan, 1995, Chapters 29 and 30). Also, let χT²(ν; l, u) indicate a doubly truncated central χ² with lower truncation point l and upper truncation point u (Coffey and Muller, 2000).

We consider the same model as in Coffey and Muller (1999), which includes the two sample t-test as a special case. For a specified design, write a GLUM with fixed predictors and Gaussian errors as

$$y_+ = X_+\beta + e_+ \qquad \begin{bmatrix} \underset{n_1\times 1}{y_1} \\ \underset{N_2\times 1}{y_2} \end{bmatrix} = \begin{bmatrix} \underset{n_1\times q}{X_1} \\ \underset{N_2\times q}{X_2} \end{bmatrix}\beta + \begin{bmatrix} \underset{n_1\times 1}{e_1} \\ \underset{N_2\times 1}{e_2} \end{bmatrix}, \qquad (1)$$

with partitioning corresponding to the internal pilot and second samples. Table 1 contains four categories of notation: 1) design parameters, which are properties required for any sample size calculation, 2) sample size allocation rules, which determine the size of the internal pilot sample and limit the final sample size, 3) unknown fixed parameters, and 4) random variables to be observed. Let Es(Xj) represent the matrix created by deleting any duplicate rows from Xj (Helms, 1988). We require that Es(X1) = Es(X2), with the possible exception that a “block” effect may be added, which indicates whether the observation was collected in the internal pilot or second sample. We also impose the restriction that all possible observed samples differ by a multiple of a fixed number of observations, m. It follows that n1, N2, and N+ will be multiples of m. For example, consider increasing sample size by always taking two control subjects for each experimental one (m = 3).

Table 1. Internal Pilot Study Notation.

Symbol : Definition

Design Parameters
αt : Target test size
Pt : Target power
θ* : "Scientifically important" value of θ
σ0² : Variance value used for planning
n0 : Pre-planned sample size based on αt, Pt, θ*, and σ0²

Sample Size Allocation
π : Proportion of n0 used in internal pilot
n1 : Internal pilot sample size (size of first sample), πn0
ν1 : Internal pilot error degrees of freedom, n1 − rank(X1)
n+,min : Minimum size of final sample
n+,max : Maximum size of final sample

Fixed, Unknown Parameters
σ² : True error variance
γ : Ratio of true variance to variance used for planning, σ²/σ0²
θ : True value of secondary parameter, Cβ, an a × 1 vector

Random Variables
σ̂1² : Internal pilot variance estimate, y1′[I − X1(X1′X1)⁻X1′]y1/ν1
N2 : Size of second sample, with particular value n2
N+ : Final sample size, n1 + N2, with particular value n+
ν+ : Final sample error degrees of freedom, N+ − rank(X+)
β̂(n+) : Final estimate of β, (X+′X+)⁻X+′y+
θ̂(n+) : Final estimate of secondary parameter, Cβ̂(n+)
SSH(n+) : Final hypothesis sum of squares, θ̂(n+)′[C(X+′X+)⁻C′]⁻¹θ̂(n+)
σ̂²(n+) : Final variance estimate, y+′[I − X+(X+′X+)⁻X+′]y+/ν+
F(n+) : Test statistic, [SSH(n+)/a]/σ̂²(n+)

In testing H0: θ = θ0, with θ = Cβ, we assume C to be an a × q matrix with a = rank(C). Without loss of generality assume θ0 = 0 (Coffey and Muller, 1999). The unadjusted testing method computes F(n+), the fixed sample size F statistic, and rejects H0 if F(n+) > fF = FF⁻¹(1 − αt; a, ν+).
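
As a simple illustration of the unadjusted rule, the following sketch (ours, not code from the paper; every number is an assumption for illustration) computes the fixed sample critical value fF and the resulting rejection decision:

```python
# A minimal sketch (not from the paper; all numbers assumed) of the unadjusted test:
# compute the fixed sample critical value f_F and compare an observed F(n_+) to it.
from scipy.stats import f as f_dist

alpha_t, a, nu_plus = 0.05, 1, 84            # target test size, rank(C), final error df
f_crit = f_dist.ppf(1 - alpha_t, a, nu_plus) # f_F = F_F^{-1}(1 - alpha_t; a, nu_+)
F_obs = 4.60                                 # observed [SSH(n_+)/a] / sigma_hat^2(n_+), assumed
print(f_crit, F_obs > f_crit)                # reject H0 when F(n_+) > f_F
```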

1.3 Known Results

Wittes and Brittain (1990) considered using an internal pilot design with no adjustment to testing. A fixed sample power calculation determines the random N+ as a function of σ̂1². They used simulations to evaluate test size, power, and expected sample size for a t-test involving roughly 100 total observations.

Wittes, Schabenberger, Zucker, Brittain, and Proschan (1999) derived exact test size in this setting. They also showed that σ̂2(N+) is biased downward, but did not provide an expression for the bias.

Coffey and Muller (1999) provided a number of exact results for the more general GLUM setting. For a specified value of n+, define ωt(n+) to be the solution to Pt = 1 − FF[fF; a, ν+, ωt(n+)]. Hence

$$\sigma^2(n_+) = \frac{\theta_*'\,[C(X_+'X_+)^{-}C']^{-1}\,\theta_*}{\omega_t(n_+)} \qquad (2)$$

equals the largest value of σ̂1² which leads to a final sample size of n+ or smaller. Also define

$$q(n_+,\gamma) = \frac{\nu_1\,\sigma^2(n_+)}{\sigma^2} = \frac{\nu_1\,\sigma^2(n_+)}{\gamma\,\sigma_0^2}, \qquad (3)$$

which equals the largest value of SSE1/σ² leading to a sample size of n+ or smaller. Hence the probability of a particular random final sample size is

$$\Pr\{N_+ = n_+\} = \Pr\{\sigma^2(n_+-m) < \hat{\sigma}_1^2 \le \sigma^2(n_+)\} = \Pr\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\} = F_{\chi^2}[q(n_+,\gamma);\nu_1] - F_{\chi^2}[q(n_+-m,\gamma);\nu_1]. \qquad (4)$$

Note that Fχ²[q(n+,min − m, γ); ν1] = 0 and Fχ²[q(n+,max, γ); ν1] = 1. In theory, n+,max may be infinite. However, budgetary and time constraints often restrict n+,max to some small multiple of n0. Coffey and Muller (1999) used a double conditioning argument to describe an algorithm for computing the power of the unadjusted test, for any θ:

$$P(\gamma,\theta) = 1 - \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} \Pr\!\left\{ \left(\frac{\nu_+}{f_F\,a}\right)\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) \le t \right\} f_{\chi^2}(t;\nu_1)\,dt, \qquad (5)$$

with

$$\omega(n_+,\gamma,\theta) = \frac{\theta'\,[C(X_+'X_+)^{-}C']^{-1}\,\theta}{\sigma^2}. \qquad (6)$$

In practice, the results of Coffey and Muller (1999) and the new results in this paper do not require determining N+ with a fixed sample calculation. The rule for choosing sample size need only determine {σ²(n+)} in a way that maps regions of σ̂1² into values of N+. Changing the rule merely changes the corresponding truncation regions.
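
For concreteness, the following sketch (ours, not the authors' SAS/IML code) carries out the calculation in equations (2) through (4) for a balanced two sample design; every numeric value and the candidate grid are assumptions made for illustration.

```python
# A sketch (ours, not the authors' SAS/IML code) of equations (2)-(4) for a balanced
# two sample design; every numeric value below is an assumption for illustration.
import numpy as np
from scipy.stats import chi2, ncf, f as f_dist
from scipy.optimize import brentq

alpha_t, P_t = 0.05, 0.90          # target test size and power
theta_star   = 1.0                 # "scientifically important" difference
sigma0_sq    = 2.0                 # planning variance
gamma        = 1.0                 # true variance / planning variance
n1, m, rank_X = 44, 2, 2           # internal pilot size, step size, rank(X)
a   = 1                            # rank(C): one contrast between two means
nu1 = n1 - rank_X

def omega_t(n_plus):
    """Noncentrality solving P_t = 1 - F_F[f_F; a, nu_+, omega_t(n_+)]."""
    nu_plus = n_plus - rank_X
    f_crit = f_dist.ppf(1 - alpha_t, a, nu_plus)
    return brentq(lambda w: ncf.sf(f_crit, a, nu_plus, w) - P_t, 1e-8, 1e4)

def sigma2_bound(n_plus):
    """Equation (2); for this design theta_*'[C(X'X)^-C']^{-1}theta_* = n_+ theta_*^2 / 4."""
    return (n_plus * theta_star ** 2 / 4.0) / omega_t(n_plus)

def q(n_plus):
    """Equation (3)."""
    return nu1 * sigma2_bound(n_plus) / (gamma * sigma0_sq)

# Equation (4): probability of each candidate final sample size.
n_min, n_max = 86, 200
for n_plus in np.arange(n_min, n_max + m, m):
    lower = 0.0 if n_plus == n_min else q(n_plus - m)
    prob = chi2.cdf(q(n_plus), nu1) - chi2.cdf(lower, nu1)
    print(int(n_plus), round(prob, 4))
```

Only q(·) changes if a different rule {σ²(n+)} is used; the remainder of the computation is unchanged.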

2. New Analytic Results

2.1 Error Bound for Power and Test Size Algorithm

Even without a finite upper limit on N+ (allowing n+,max = ∞), practical computations require truncating the distribution of N+ at a value beyond which the probability of a more extreme sample size is negligible. Indicate the error due to this truncation by PE(γ, θ). Let NL be the lower truncation point, i.e., the largest value of N+ such that Fχ²[q(n+, γ); ν1] < ∊. Let NU be the upper truncation point, i.e., the smallest value of N+ such that 1 − Fχ²[q(n+, γ); ν1] < ∊. Truncating at NL and NU leads to an error of

$$P_E(\gamma,\theta) = \sum_{n_+=n_{+,\min}}^{N_L} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_Q(t;n_+,\gamma,\theta)\, f_{\chi^2}(t;\nu_1)\,dt + \sum_{n_+=N_U+m}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_Q(t;n_+,\gamma,\theta)\, f_{\chi^2}(t;\nu_1)\,dt, \qquad (7)$$

with

$$F_Q(t;n_+,\gamma,\theta) = \Pr\!\left\{ \left(\frac{\nu_+}{f_F\,a}\right)\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) \le t \right\}. \qquad (8)$$

Replacing FQ(·) with 1 gives an upper bound on the error in the last equation:

$$P_E(\gamma,\theta) \le \sum_{n_+=n_{+,\min}}^{N_L} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} f_{\chi^2}(t;\nu_1)\,dt + \sum_{n_+=N_U+m}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} f_{\chi^2}(t;\nu_1)\,dt = F_{\chi^2}[q(N_L,\gamma);\nu_1] + \left\{1 - F_{\chi^2}[q(N_U,\gamma);\nu_1]\right\} \le 2\epsilon. \qquad (9)$$

If only one end requires truncation, the bound reduces to ∊.
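
The sketch below (ours; the rule q(·) is a made-up placeholder purely so the example runs) shows one way to locate NL and NU for a given tolerance ∊, exploiting the fact that q(n+, γ) increases with n+:

```python
# A sketch (ours, not from the paper) of choosing the truncation points N_L and N_U of
# Section 2.1 for a given tolerance eps.  `q_fn` is any rule mapping a candidate final
# size to q(n_+, gamma); the placeholder rule below exists only to make the example run.
from scipy.stats import chi2

def truncation_points(q_fn, nu1, n_min, m, eps, n_cap=10_000):
    """Largest n_+ with F[q(n_+); nu1] < eps and smallest n_+ with 1 - F[q(n_+); nu1] < eps."""
    n_L, n_U = None, None
    n = n_min
    while n <= n_cap:
        p = chi2.cdf(q_fn(n), nu1)
        if p < eps:
            n_L = n                      # still below the lower tolerance
        if n_U is None and 1.0 - p < eps:
            n_U = n                      # first size with a negligible upper tail
            break
        n += m
    return n_L, n_U

nu1 = 42
q_toy = lambda n_plus: 0.25 * (n_plus - 60)   # monotone placeholder, illustration only
print(truncation_points(q_toy, nu1, n_min=86, m=2, eps=1e-6))
```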

2.2 The Likelihood and Related Properties

Using an internal pilot study causes N+ to be random. Under the likelihood principle, as in a Bayesian framework, the stopping rule does not affect inference, so this randomness creates no difficulty (Jennison and Turnbull, 2000, p. 338). Thus it seems intuitive that the maximum likelihood estimates and likelihood ratio test statistic for internal pilot and fixed sample designs coincide. Nevertheless, our interest in a wide range of scenarios compelled us to provide a formal proof.

Use conditioning arguments to write the likelihood for the GLUM as

$$\mathcal{L}(\beta,\sigma^2;\,y_1,y_2,N_+) = \mathcal{L}(\beta,\sigma^2;\,y_1)\,\mathcal{L}(\beta,\sigma^2;\,y_2\,|\,N_+,y_1)\,\mathcal{L}(\beta,\sigma^2;\,N_+\,|\,y_1). \qquad (10)$$

The marginal likelihood of the first sample, ℒ(β, σ²; y1), equals that of a random sample of n1 observations from a Gaussian population. The likelihood of the second sample conditional upon N+, ℒ(β, σ²; y2 | N+, y1), equals that of a random sample of n2 observations from a Gaussian population. Hence the total likelihood differs from that for a fixed sample with n+ = n1 + n2 observations only through ℒ(β, σ²; N+ | y1). Observe that N+ is discrete and write

$$\mathcal{L}(\beta,\sigma^2;\,N_+\,|\,y_1) = \Pr\{N_+=n_+\,|\,y_1\} = \begin{cases} 1, & \text{if } q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma) \\ 0, & \text{otherwise} \end{cases} = I\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\}, \qquad (11)$$

in which I(·) represents an indicator function with a value of 1 if the expression is true. In turn, write the joint likelihood under an internal pilot design as

$$\mathcal{L}(\beta,\sigma^2;\,y_1,y_2,N_+) = I\{q(n_+-m,\gamma) < \mathrm{SSE}_1/\sigma^2 \le q(n_+,\gamma)\} \times (2\pi\sigma^2)^{-n_+/2}\exp\!\left[-\frac{(y_+-X_+\beta)'(y_+-X_+\beta)}{2\sigma^2}\right]. \qquad (12)$$

Since the value of the indicator function does not depend on any unknown parameter (σ² cancels from both sides of the inequality, leaving a condition on SSE1 and the known planning constants alone), we may ignore it in estimation. Therefore the sufficient statistics and maximum likelihood estimates for σ², β, and θ coincide with those from a fixed sample size analysis. Let σ̃²(n+) represent the maximum likelihood estimate of σ². In a fixed sample design we often prefer the unbiased estimate,

$$\hat{\sigma}^2(n_+) = \left(\frac{n_+}{\nu_+}\right)\tilde{\sigma}^2(n_+). \qquad (13)$$

However, both σ̂2(n+) and σ̂2(N+) have bias, as detailed in §2.5.

Coincidence of the maximum likelihood estimates implies coincidence of the likelihood ratio test statistics. However, F(n+) will not follow an F distribution under an internal pilot design. In order to characterize the distribution of F(n+), we examine each component separately in the sections which follow.

2.3 The Distribution of SSH(n+)

Conditional on N+ = n+, the numerator of the likelihood ratio test statistic has the same distribution as for a fixed sample design. To see this, observe that β̂(n+) equals a linear combination of β̂1 and β̂2(n+), the independent estimates from the internal pilot and additional samples. The randomness of N+ depends only on σ̂1², which is independent of β̂1, β̂2(n+), and, in turn, β̂(n+). Hence the conditional distributions of β̂(n+), θ̂(n+), and SSH(n+) coincide with the corresponding distributions for a fixed sample size design with n+ observations. Incidentally, if Es(X+) has full rank then β̂(n+) will be unbiased.

Unconditionally, the numerator of the likelihood ratio test statistic for an internal pilot design has the same distribution as for a fixed sample design under the null, but not under the alternative. To see this, observe that if θ ≠ 0 then the CDF of SSH(N+)/σ² equals a weighted sum of the conditional CDF's, with the weights corresponding to the probability of observing a specific n+:

$$\Pr\{\mathrm{SSH}(N_+)/\sigma^2 < s\} = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} F_{\chi^2}[s;\,a,\,\omega(n_+,\gamma,\theta)]\,\Pr\{N_+=n_+\}. \qquad (14)$$

Under the null, ω(n+, γ, 0) = 0, the conditional distribution does not depend on n+, and SSH(N+)/σ² ~ χ²(a). Hence any effect on test size due to using an internal pilot does not depend on the numerator of the test statistic.
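
A short sketch (ours; the sample size probabilities and noncentralities below are made up for illustration) of the mixture CDF in equation (14):

```python
# A sketch (ours, with assumed inputs) of equation (14): the unconditional CDF of
# SSH(N_+)/sigma^2 is a mixture of noncentral chi-square CDFs, weighted by Pr{N_+ = n_+}.
import numpy as np
from scipy.stats import ncx2

a = 1
weights = np.array([0.70, 0.15, 0.10, 0.05])   # Pr{N_+ = n_+}, assumed
omegas  = np.array([10.8, 11.0, 11.3, 11.5])   # omega(n_+, gamma, theta), assumed

def cdf_ssh(s):
    """Pr{SSH(N_+)/sigma^2 < s} under the alternative."""
    return float(np.sum(weights * ncx2.cdf(s, a, omegas)))

print(cdf_ssh(15.0))
# Under the null every omega equals 0, the mixture components are identical, and the CDF
# collapses to the central chi-square(a) CDF, so the numerator cannot inflate test size.
```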

2.4 The Distribution of SSE(n+)

With a fixed n+, SSE(n+)/σ² ~ χ²(ν+). However, with an internal pilot, the dependence of n+ on σ̂1² complicates the distribution. Following Coffey and Muller (1999), write SSE(n+) as the sum of two independent quadratic forms:

$$\mathrm{SSE}(n_+) = y_+'A_e y_+ = y_+'(A_e - A_1)y_+ + y_+'A_1 y_+ = \mathrm{SSE}_*(n_+) + \mathrm{SSE}_1(n_+), \qquad (15)$$

with SSE1(n+) the error sum of squares from the internal pilot sample. Coffey and Muller (1999) proved that SSE*(n+)/σ² ~ χ²(n2). However, conditional upon observing a specific n+, SSE1 is restricted to the possible range of values which would have led to that final sample size. Therefore

$$\mathrm{SSE}_1(n_+)/\sigma^2 \sim \chi^2_T[\nu_1,\,q(n_+-m,\gamma),\,q(n_+,\gamma)]. \qquad (16)$$

The characteristic function of SSE(n+)/σ² has a simple form. Define

$$D(t) = F_{\chi^2}[q(n_+,\gamma)(1-2t);\,\nu_1] - F_{\chi^2}[q(n_+-m,\gamma)(1-2t);\,\nu_1], \qquad (17)$$

and assume t ∈ [0, 1/2). A result in Coffey and Muller (2000) allows writing the characteristic function of SSE1(n+)/σ² as

$$\phi_{\mathrm{SSE}_1(n_+)/\sigma^2}(t) = \left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_1). \qquad (18)$$

In turn, the independence of SSE1(n+) and SSE*(n+) implies

$$\phi_{\mathrm{SSE}(n_+)/\sigma^2}(t) = \left\{\left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_1)\right\}\phi_{\chi^2}(t;n_2) = \left[\frac{D(it)}{D(0)}\right]\phi_{\chi^2}(t;\nu_+). \qquad (19)$$

Thus the characteristic function of SSE(n+)/σ² under an internal pilot design equals the product of the characteristic function under a fixed design and a factor which accounts for truncation. Ignoring the randomness of N+ and approximating the distribution with the fixed sample result ignores the factor.

The CDF of SSE(n+)/σ² may be computed by inverting the characteristic function. Alternatively, a method similar to the one used by Coffey and Muller (1999) for the CDF of the test statistic may be used. Condition on SSE1(n+)/σ² and integrate numerically over its range of values:

$$F_{\mathrm{SSE}(n_+)/\sigma^2}(s;\gamma) = \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} \Pr\{\mathrm{SSE}_*(n_+)/\sigma^2 + t \le s\}\,\frac{f_{\chi^2}(t;\nu_1)}{\Pr\{N_+=n_+\}}\,dt = \frac{1}{\Pr\{N_+=n_+\}}\int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_{\chi^2}(s-t;\,n_2)\,f_{\chi^2}(t;\nu_1)\,dt. \qquad (20)$$

Compute the unconditional CDF via the law of total probability:

$$F_{\mathrm{SSE}(N_+)/\sigma^2}(s;\gamma) = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \Pr\{N_+=n_+\}\,F_{\mathrm{SSE}(n_+)/\sigma^2}(s;\gamma) = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \int_{q(n_+-m,\gamma)}^{q(n_+,\gamma)} F_{\chi^2}(s-t;\,n_2)\,f_{\chi^2}(t;\nu_1)\,dt. \qquad (21)$$

Truncating the sum will lead to the same size error as in §2.1 for computing the CDF of the test statistic (error < 2∊, with ∊ chosen as the truncation value).
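
A numerical sketch (ours, with assumed truncation boundaries and sample sizes) of the conditioning approach in equations (20) and (21):

```python
# A sketch (assumed illustrative inputs) of equations (20)-(21): the CDF of
# SSE(N_+)/sigma^2, obtained by conditioning on SSE_1/sigma^2 = t and integrating
# numerically over each truncation interval (q(n_+ - m), q(n_+)].
import numpy as np
from scipy.stats import chi2
from scipy.integrate import quad

nu1 = 42                                     # internal pilot error df (assumed)
# assumed q-boundaries and second sample sizes for a few candidate final sizes
q_lower = np.array([0.0, 35.0, 40.0])        # q(n_+ - m, gamma)
q_upper = np.array([35.0, 40.0, np.inf])     # q(n_+, gamma)
n2      = np.array([42, 44, 46])             # second sample sizes N_2 = n_+ - n_1

def cdf_sse_over_sigma2(s):
    """Equation (21): unconditional CDF at s (sum of the conditional pieces)."""
    total = 0.0
    for lo, hi, df2 in zip(q_lower, q_upper, n2):
        hi_eff = min(hi, s)                  # F_chi2(s - t; n2) is zero for t > s
        if hi_eff <= lo:
            continue
        piece, _ = quad(lambda t: chi2.cdf(s - t, df2) * chi2.pdf(t, nu1), lo, hi_eff)
        total += piece
    return total

print(cdf_sse_over_sigma2(80.0))
```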

2.5 The Bias of σ̂2(n+)

Wittes et al. (1999) showed that ε[σ̂²(N+)] ≤ σ² (for the t-test), but did not provide an expression for the bias. The importance of the bias arises from the fact that test size inflation varies directly with it. Using the standard result for the moment generating function of a linear transformation of a random variable,

$$M_{\hat{\sigma}^2(n_+)/\sigma^2}(t) = \left[\frac{D(t/\nu_+)}{D(0)}\right] M_{\chi^2(\nu_+)}(t/\nu_+). \qquad (22)$$

Taking the first derivative and setting t = 0 gives the conditional bias:

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(n_+)}{\sigma^2}\right] = 1 - \frac{2\left\{q(n_+,\gamma)\,f_{\chi^2}[q(n_+,\gamma);\nu_1] - q(n_+-m,\gamma)\,f_{\chi^2}[q(n_+-m,\gamma);\nu_1]\right\}}{\nu_+\Pr\{N_+=n_+\}}. \qquad (23)$$

Applying Lemma 1 from Coffey and Muller (2000),

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(n_+)}{\sigma^2}\right] = 1 - \frac{2\nu_1\left\{f_{\chi^2}[q(n_+,\gamma);\nu_1+2] - f_{\chi^2}[q(n_+-m,\gamma);\nu_1+2]\right\}}{\nu_+\Pr\{N_+=n_+\}}. \qquad (24)$$

Recall that a χ²(ν1 + 2) density has a single mode at ν1 (for ν1 ≥ 2, as always true here). Hence σ̂²(n+) is (conditionally) biased in a direction depending on whether σ̂1² is greater than or less than σ². The law of total probability allows obtaining the unconditional expectation:

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] = \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \varepsilon\!\left[\hat{\sigma}^2(n_+)/\sigma^2\right]\Pr\{N_+=n_+\} = 1 - 2\nu_1\sum_{n_+=n_{+,\min}}^{n_{+,\max}} \frac{f_{\chi^2}[q(n_+,\gamma);\nu_1+2] - f_{\chi^2}[q(n_+-m,\gamma);\nu_1+2]}{\nu_+}. \qquad (25)$$

Factoring like terms and finding common denominators leads to

$$\varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] = 1 - \sum_{n_+=n_{+,\min}}^{n_{+,\max}} \left(\frac{\nu_1}{\nu_+}\right)\left(\frac{2m}{\nu_++m}\right) f_{\chi^2}[q(n_+,\gamma);\nu_1+2]. \qquad (26)$$

Hence ε[σ̂²(N+)] ≤ σ². Furthermore, from (26) it is clear that large values of either n1 or n+ ensure negligible bias. However, the dependence of N+ on many of the parameters in Table 1 complicates any discussion of large sample properties. In general, any combination of parameters which leads to large values of n1 or N+ will reduce bias in σ̂²(N+). Finally, observing that fχ²(t; ν1 + 2) has a single mode at t = ν1 allows bounding the unconditional bias:

$$0 \le 1 - \varepsilon\!\left[\frac{\hat{\sigma}^2(N_+)}{\sigma^2}\right] \le f_{\chi^2}(\nu_1;\,\nu_1+2)\sum_{n_+=n_{+,\min}}^{n_{+,\max}} \left(\frac{\nu_1}{\nu_+}\right)\left(\frac{2m}{\nu_++m}\right). \qquad (27)$$
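
The unconditional expectation in (26) is easy to evaluate once the boundaries q(n+, γ) are known. The sketch below (ours; the boundary rule is a monotone placeholder, not one derived from equation (3)) illustrates the computation:

```python
# A sketch (ours) evaluating equation (26) for assumed design quantities; the
# boundaries q(n_+, gamma) here are a monotone placeholder chosen only so the
# example runs, not values derived from equation (3).
import numpy as np
from scipy.stats import chi2

nu1, m, rank_X = 8, 2, 2                     # small internal pilot, steps of m (assumed)
sizes = np.arange(10, 400 + m, m)            # candidate final sample sizes
q_vals = 0.08 * sizes                        # placeholder for q(n_+, gamma)

nu_plus = sizes - rank_X
bias_terms = (nu1 / nu_plus) * (2.0 * m / (nu_plus + m)) * chi2.pdf(q_vals, nu1 + 2)
print(1.0 - bias_terms.sum())                # E[sigma_hat^2(N_+)/sigma^2], at most 1
```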

The origin of the bias may be characterized further. Obviously ε(σ̂1²) = σ². Define σ̂*²(n+) = SSE*(n+)/n2 and note that

$$\varepsilon[\hat{\sigma}_*^2(N_+)] = \sum_{n_+} \Pr\{N_+=n_+\}\,\varepsilon[\mathrm{SSE}_*(n_+)/n_2] = \sigma^2. \qquad (28)$$

Therefore σ̂2(N+) equals a linear function of two unbiased estimators:

$$\hat{\sigma}^2(N_+) = \left(\frac{\nu_1}{\nu_+}\right)\hat{\sigma}_1^2 + \left(\frac{n_2}{\nu_+}\right)\hat{\sigma}_*^2(N_+). \qquad (29)$$

This form has three important implications. First, the randomness of the weights creates the bias. Second, as n2/n1 increases, the second term in (29) dominates. Third, although neither of the two unbiased estimates uses all of the data, combining them with fixed, positive weights that sum to one would create an unbiased estimate of σ² that does use all of the data.
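
A small Monte Carlo sketch (ours; the rule mapping σ̂1² into a final sample size is a placeholder and all constants are assumed) makes the first and third implications concrete: the random weights in (29) pull the estimate below σ², while a fixed-weight combination of the same two pieces stays unbiased.

```python
# A Monte Carlo sketch (ours, with assumed constants and a placeholder boundary rule)
# of the decomposition in (29): random weights bias sigma_hat^2(N_+) downward, while
# a fixed-weight combination of the same two unbiased pieces remains unbiased.
import numpy as np

rng = np.random.default_rng(1)
sigma2, n1, rank_X, m = 1.0, 10, 2, 2
nu1 = n1 - rank_X
sizes  = np.arange(20, 80 + m, m)            # candidate final sizes (n_{+,min} = 20 > n_1)
bounds = 0.05 * sizes                        # placeholder for the sigma^2(n_+) of equation (2)

reps = 200_000
sse1 = sigma2 * rng.chisquare(nu1, reps)     # internal pilot error SS
sig1 = sse1 / nu1                            # sigma_hat_1^2

# final size: smallest candidate whose boundary is >= sigma_hat_1^2, capped at the maximum
idx = np.minimum(np.searchsorted(bounds, sig1), len(sizes) - 1)
n_plus  = sizes[idx]
n2      = n_plus - n1
nu_plus = n_plus - rank_X
sse2 = sigma2 * rng.chisquare(n2)            # SSE_*(N_+) ~ sigma^2 chi-square(n_2) given N_+
sig_star = sse2 / n2

var_ip    = (sse1 + sse2) / nu_plus          # sigma_hat^2(N_+): random weights nu_1/nu_+, n_2/nu_+
var_fixed = 0.5 * sig1 + 0.5 * sig_star      # fixed weights summing to one
print(var_ip.mean(), var_fixed.mean())       # first falls below sigma^2 = 1, second stays near 1
```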

2.6 Characteristic Function of a 1-1 Function of the Test Statistic

Let FF(n+)(f; γ, θ) represent the conditional CDF of the internal pilot test statistic computed at an arbitrary point f. For example, letting f = fF allows computing the conditional power under an internal pilot design using the unadjusted approach for testing. The results in §2.3 and §2.4 imply that F(n+) differs from a fixed sample F statistic only through the denominator. With c(n+, f) = ν+/(fa), express FF(n+)(f; γ, θ) as

$$\begin{aligned} F_{F(n_+)}(f;\gamma,\theta) &= \Pr\!\left\{\frac{\mathrm{SSH}(n_+)/a}{\mathrm{SSE}(n_+)/\nu_+} \le f\right\} \\ &= \Pr\{c(n_+,f)\,\mathrm{SSH}(n_+)/\sigma^2 - \mathrm{SSE}_*(n_+)/\sigma^2 - \mathrm{SSE}_1(n_+)/\sigma^2 \le 0\} \\ &= \Pr\{c(n_+,f)\,\chi^2[a,\omega(n_+,\gamma,\theta)] - \chi^2(n_2) - \chi^2_T[\nu_1,\,q(n_+-m,\gamma),\,q(n_+,\gamma)] \le 0\} \\ &= \Pr\{S(n_+,f) \le 0\} = F_{S(n_+,f)}(0;\,f,\gamma,\theta). \end{aligned} \qquad (30)$$

Hence we may compute the CDF of F(n+) at the point f via the CDF of S(n+, f) evaluated at zero. Davies' (1980) algorithm inverts the characteristic function to compute the CDF of a weighted sum of χ²'s. Since S(n+, f) contains a doubly-truncated χ², Coffey and Muller (1999) computed the CDF by first conditioning on the doubly-truncated random variable, then applying Davies' algorithm to compute the CDF of the remaining sum of two χ²'s. The approach leads to a double numerical integral. Alternatively, we could directly invert the characteristic function of S(n+, f), which requires only one integration. Using standard results about characteristic functions gives:

$$\phi_{S(n_+,f)}(t) = \phi_{c(n_+,f)\mathrm{SSH}(n_+)/\sigma^2}(t)\;\phi_{\mathrm{SSE}_*(n_+)/\sigma^2}(-t)\;\phi_{\mathrm{SSE}_1(n_+)/\sigma^2}(-t) = \frac{D(-it)}{D(0)} \times \left\{(1+2it)^{-\nu_+/2}\,[1-2ic(n_+,f)t]^{-a/2}\exp\!\left[\frac{ic(n_+,f)\,t\,\omega(n_+,\gamma,\theta)}{1-2ic(n_+,f)t}\right]\right\}. \qquad (31)$$

The term in braces equals the characteristic function for the weighted sum which would arise in the fixed sample case, with n+ observations. In turn, the factor D(−it)/D(0) accounts for the truncation of σ̂1².
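
For comparison, the following sketch (ours, with assumed inputs) evaluates the conditional CDF in (30) by the double numerical integral route described above; inverting the characteristic function in (31), for example with Davies' algorithm, would replace the inner integral with a single integration.

```python
# A sketch (ours, with assumed inputs) of the conditional CDF in equation (30),
# computed by the double numerical integral: condition on the doubly truncated
# SSE_1/sigma^2 = t, then integrate Pr{c X - Y <= t} over t.
import numpy as np
from scipy.stats import ncx2, chi2
from scipy.integrate import quad

# assumed design quantities for one candidate final size n_+
a, nu1, n2 = 1, 42, 44
nu_plus = nu1 + n2
omega = 11.0                      # omega(n_+, gamma, theta), assumed
q_lo, q_hi = 35.0, 40.0           # q(n_+ - m, gamma) and q(n_+, gamma), assumed
f = 3.95                          # point at which to evaluate the CDF, e.g. f_F
c = nu_plus / (f * a)             # c(n_+, f)

def inner(t):
    """Pr{c * chi2(a, omega) - chi2(n2) <= t}, by conditioning on the central chi-square."""
    val, _ = quad(lambda y: ncx2.cdf((t + y) / c, a, omega) * chi2.pdf(y, n2), 0, np.inf)
    return val

prob_n = chi2.cdf(q_hi, nu1) - chi2.cdf(q_lo, nu1)      # Pr{N_+ = n_+}
outer, _ = quad(lambda t: inner(t) * chi2.pdf(t, nu1), q_lo, q_hi)
print(outer / prob_n)             # F_{F(n_+)}(f; gamma, theta), conditional on N_+ = n_+
```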

3. Numerical Examples

We illustrate the relationship between the bias in estimating the variance and inflation of test size with two examples. Although our results apply far more generally, for simplicity of exposition the examples correspond to the two sample setting. In each case, we assume αt = 0.05, Pt = 0.90, and no finite bound on the size of the final sample (n+,max = ∞). Example A, described by Wittes and Brittain (1990), centers on detecting a mean difference of θ* = 1 with σ0² = 2, n0 = 86, n1 = 44, and n+,min = n0 = 86. Example B, described by Coffey and Muller (1999), centers on detecting a mean difference of θ* = 1.6 with σ0² = 1, n0 = 20, n1 = 10, and n+,min = n1 = 10. This allows the sample size to be reduced if the variance was originally overstated (γ < 1).

For γ ∈ {0.5, 0.75, 1, 1.5, 2}, Table 2 displays ε[σ̂²(N+)/σ²] and the true test size for the examples. Bias was computed in SAS IML® with equation (24), while test size was calculated using the algorithm in Coffey and Muller (1999). Note how the amount of test size inflation closely tracks the amount of bias in σ̂²(N+)/σ². Wittes and Brittain (1990) showed that, in Example A, an internal pilot can provide much better power than a fixed design while providing a negligible increase in test size. Table 2 illustrates that Example A also produces little bias in σ̂²(N+)/σ². However, Example B can lead to test size as large as 0.065 and ε[σ̂²(N+)/σ²] as small as 0.89. At least for the highly constrained design in Example A, with moderate to large sample sizes, an internal pilot design causes little worry about test size inflation or bias in estimating σ². However, the test size and bias are of much greater concern in small samples.

Table 2. Bias in Estimating γ and the Relationship with Test Size Inflation.

                 Example A                     Example B
Gamma    ε[σ̂²(N+)/σ²]      α         ε[σ̂²(N+)/σ²]      α
0.5          1.000        0.050          0.909        0.055
0.75         0.998        0.050          0.891        0.062
1.0          0.990        0.051          0.896        0.065
1.5          0.985        0.052          0.916        0.065
2            0.988        0.052          0.931        0.062

Coffey and Muller (1999) showed that the degree of test size inflation was highly dependent upon the choice of design parameters and sample size allocation rules. The examples illustrate how closely the bias of σ̂²(N+)/σ² tracks test size inflation. This implies that there are combinations of design parameters and sample size allocation rules which cause nonignorable bias in σ̂²(N+)/σ². Hence the possibility of bias should at least be examined before using the variance estimate from an internal pilot study to make inferences or plan a future study.

4. Implications of the New Results

4.1 Testing

Any inflation of test size arises solely from a change in the distribution of the variance estimate, rather than a change in the distribution of the parameter estimate itself. In the GLUM setting, a test of any secondary parameter involves the variance estimate. For example, consider testing for a "block" effect in order to ensure that there were no differences between the internal pilot and second samples with regard to the outcome. The biased estimate of σ² may inflate test size. Hence researchers must be wary of inflation even for hypothesis tests about secondary parameters that were not involved in the sample size re-estimation.

4.2 Confidence Regions for θ

The same complication that biases test size also biases confidence interval coverage. Inverting a test statistic and naively computing a 100(1 − p)% confidence region for θ with standard fixed sample size linear models theory gives coverage less than or equal to the desired level. As with test size, the true coverage depends upon the unknown parameter γ. Furthermore such bias occurs with any secondary parameter.

4.3 Confidence Intervals for σ2

An appropriate confidence interval for the variance observed in a particular study can be invaluable in planning future studies. Furthermore, in the fixed sample size case, computing confidence intervals for σ² allows computing confidence intervals for power and noncentrality (Taylor and Muller, 1996; Muller and Pasour, 1997). However, since SSE(n+)/σ² does not follow a χ² distribution under an internal pilot design, forming a confidence interval for σ² using fixed sample size methods may not provide the desired coverage. As with confidence regions for θ, the true coverage depends on γ. In contrast to the intervals for θ, however, confidence intervals for σ² may have more or less coverage than desired.
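
As a reminder of what the naive approach does, here is a sketch (ours, with assumed numbers) of the usual fixed sample interval for σ²; under an internal pilot design SSE(N+)/σ² is no longer χ²(ν+), so this interval need not achieve its nominal coverage.

```python
# A sketch (ours, assumed inputs) of the naive fixed sample confidence interval for
# sigma^2; because SSE(N_+)/sigma^2 is not chi-square under an internal pilot design,
# the interval's true coverage can differ from the nominal level.
from scipy.stats import chi2

p, nu_plus, sigma2_hat = 0.05, 84, 1.9
lower = nu_plus * sigma2_hat / chi2.ppf(1 - p / 2, nu_plus)
upper = nu_plus * sigma2_hat / chi2.ppf(p / 2, nu_plus)
print(lower, upper)      # nominal 95% limits from fixed sample theory
```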

5. Conclusions

The following conclusions apply to any univariate linear model with fixed predictors and Gaussian errors.

  1. Coffey and Muller's (1999) algorithm for power involves the distribution of N+. With n+,max = ∞, the calculations require truncation of the distribution. The truncation region can be defined to ensure a specified upper bound on error.

  2. The likelihood ratio test statistic under an internal pilot design coincides with the statistic for a fixed sample design. However, the statistic does not follow an F distribution because the variance estimate is not a scaled χ².

  3. The distributions of θ̂(n+) and SSH(n+)/σ² coincide with those from a fixed sample analysis with n+ observations.

  4. SSE(n+)/σ² equals the sum of a χ²(n2) and a doubly-truncated χ²(ν1), in contrast to the fixed sample result of χ²(n2 + ν1). This leads to ε[σ̂²(N+)] ≤ σ². Bias in test size and coverage of confidence intervals varies directly with the bias of the final variance estimate.

  5. Random predictors (Sampson, 1974) greatly complicate the problem. Our results apply only conditional upon the observed value of (random) X+ at the conclusion of the study. Even with a fixed sample size, only limited results are available for power calculations with random predictors. The introduction of an internal pilot complicates the problem further because X2 is random at the re-estimation stage. Clearly, further research is needed for such studies.

  6. Fast, accurate approximations for computing power and test size would ease the burden of planning a study with an internal pilot design.

  7. Methods for controlling test size merit future research.

Acknowledgments

Coffey's work was supported in part by NIEHS grant 5-T32-ES07018. A portion of the work was submitted by Coffey in partial fulfillment of the requirements for the Ph.D. in Biostatistics. Muller's work was supported in part by NCI program project grant P01 CA47982-04. The authors wish to thank two anonymous reviewers and an associate editor for a number of helpful suggestions.

Contributor Information

Christopher S. Coffey, A-1124 Medical Center North, Dept. of Preventive Medicine, Vanderbilt Univ. School of Med., Nashville, Tennessee 37232-2637

Keith E. Muller, 3105C McGavran-Greenberg, Dept. of Biostatistics, CB#7400, University of North Carolina, Chapel Hill, North Carolina 27599

Bibliography

  1. Coffey CS, Muller KE. Exact Test Size and Power of a Gaussian Error Linear Model for an Internal Pilot Study. Statistics in Medicine. 1999;18:1199–1214. doi: 10.1002/(sici)1097-0258(19990530)18:10<1199::aid-sim124>3.0.co;2-0.
  2. Coffey CS, Muller KE. Properties of Doubly-Truncated Gamma Variables. Communications in Statistics - Theory and Methods. 2000;29, in press. doi: 10.1080/03610920008832519.
  3. Davies RB. The Distribution of a Linear Combination of χ² Random Variables. Applied Statistics. 1980;29:323–333.
  4. Helms RW. Comparisons of Parameter and Hypothesis Definitions in a General Linear Model. Communications in Statistics - Theory and Methods. 1988;17:2725–2753.
  5. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Boca Raton: Chapman & Hall/CRC; 2000.
  6. Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions-2. New York: Wiley; 1995.
  7. Muller KE, Pasour VB. Bias in Linear Model Power and Sample Size Due to Estimating Variance. Communications in Statistics - Theory and Methods. 1997;26:839–851. doi: 10.1080/03610929708831953.
  8. Sampson AR. A Tale of Two Regressions. Journal of the American Statistical Association. 1974;69:682–689.
  9. Taylor DJ, Muller KE. Bias in Linear Model Power and Sample Size Calculations Due to Estimating Noncentrality. Communications in Statistics - Theory and Methods. 1996;25:1595–1610. doi: 10.1080/03610929608831787.
  10. Wittes J, Brittain E. The Role of Internal Pilot Studies in Increasing the Efficiency of Clinical Trials. Statistics in Medicine. 1990;9:65–72. doi: 10.1002/sim.4780090113.
  11. Wittes J, Schabenberger O, Zucker D, Brittain E, Proschan M. Internal Pilot Studies I: Type I Error Rate of the Naive t-Test. Statistics in Medicine. 1999;18:3481–3491. doi: 10.1002/(sici)1097-0258(19991230)18:24<3481::aid-sim301>3.0.co;2-c.
