Author manuscript; available in PMC: 2021 Jan 22.
Published in final edited form as: Int J Stat Med Res. 2012;1(1):45–50. doi: 10.6000/1929-6029.2012.01.01.03

Power Calculations for Two-Wave, Change from Baseline to Follow-up Study Designs

M Colin Ard 1, Steven D Edland 1,2, Alzheimer’s Disease Neuroimaging Initiative
PMCID: PMC7822571  NIHMSID: NIHMS1587367  PMID: 33488890

Abstract

Change in a quantitative trait is commonly employed as an endpoint in two-wave longitudinal studies. For example, early-phase clinical trials often use two-wave designs with biomarker endpoints to confirm that a treatment affects the putative target treatment pathway before proceeding to larger-scale clinical efficacy trials. Power calculations for such designs are straightforward if pilot data from longitudinal investigations of similar duration to the proposed study are available. Often, however, longitudinal pilot data of similar duration are not available, and simplifying assumptions are used to calculate sample size from cross-sectional data. One standard approach uses a formula based on variance estimated from cross-sectional data together with a baseline-to-follow-up correlation abstracted from the literature or inferred from experience with similar endpoints. An implicit assumption of this standard approach is that the variance of the quantitative trait is the same at baseline and follow-up. In practice, this assumption rarely holds, and sample size estimates from this standard formula can be dramatically anti-conservative. Even when longitudinal pilot data for estimating the parameters required in sample size calculations are available, sample size calculations will be biased if the interval from baseline to follow-up is not of similar duration to that proposed for the study being designed. In this paper we characterize the magnitude of bias in sample size estimates when formula assumptions do not hold and derive alternative conservative formulas for the sample size required to achieve nominal power.

Keywords: Sample Size, Phase 2, Phase II, Clinical Trial, Rate of Change, Compound Symmetry

1. Introduction

Quantitative traits are commonly employed as endpoints in phase II clinical trials. In Alzheimer’s disease, recent efforts have focused on changes in biomarkers as a means of studying disease progression and drug efficacy, and as an aid to diagnosis and early decision-making in the evaluation of clinical interventions for chronic, debilitating, and degenerative diseases (1–4). The ongoing process of identifying and validating biomarkers for use as surrogate endpoints, and more generally the quality and informativeness of research outcomes derived from the analysis of such endpoints, require that these trials be adequately powered.

Power and sample size calculations for two-wave (baseline and one follow-up) designs using quantitative endpoints are straightforward if pilot data from longitudinal investigations of similar duration to the proposed study are available. In this case the variance of change scores, $\sigma^2_{d(t)}$, can be estimated from the pilot data and supplied to the usual two-sample normal approximation formula for calculating sample size requirements for a t-test of no treatment effect against a non-directional alternative hypothesis:

$$n_t = \frac{2\,\sigma^2_{d(t)}\left[\Phi^{-1}(\alpha/2) + \Phi^{-1}(\beta)\right]^2}{\Delta(t)^2} \tag{1}$$

where $n_t$ is the number of subjects required per study arm for a trial of duration $t$; $\sigma^2_{d(t)}$ is the within-group variance of the change in measurements from baseline to follow-up for a study of duration $t$; $\Phi^{-1}(p)$ is the quantile function of the standard normal distribution; $\alpha$ and $\beta$ are the Type I and Type II error rates; and $\Delta(t)$ is the magnitude of the between-group difference in mean change from baseline to follow-up time $t$ that the investigator wishes to be able to detect with probability $1 - \beta$.
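Equation (1) is simple enough to sketch directly. The minimal Python version below is illustrative only: the function name `n_per_arm` and the example inputs are hypothetical, and it uses the standard library's `statistics.NormalDist().inv_cdf` as $\Phi^{-1}$. Rounding up to a whole subject is a common convention, not part of equation (1).

```python
import math
from statistics import NormalDist


def n_per_arm(var_change: float, delta: float,
              alpha: float = 0.05, beta: float = 0.20) -> float:
    """Equation (1): per-arm sample size (normal approximation).

    var_change -- within-group variance of change, sigma^2_{d(t)}
    delta      -- between-group difference in mean change, Delta(t)
    alpha      -- two-sided Type I error rate
    beta       -- Type II error rate (power = 1 - beta)
    """
    quantile = NormalDist().inv_cdf  # Phi^{-1}, the standard normal quantile function
    return 2.0 * var_change * (quantile(alpha / 2) + quantile(beta)) ** 2 / delta ** 2


# Hypothetical inputs: unit variance of change, difference in mean change of 0.5.
n = math.ceil(n_per_arm(var_change=1.0, delta=0.5))
print(n)  # 63
```

Both quantiles are negative for conventional $\alpha$ and $\beta$, but their sum is squared, so the sign does not affect the result.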

Longitudinal pilot data are required to estimate the variance term, $\sigma^2_{d(t)}$, in equation (1). Lacking longitudinal pilot data or published estimates of this variance, it may be possible to abstract related parameters from the literature and compute $\sigma^2_{d(t)}$ from them. For example, the variance at baseline, the variance at follow-up, and the correlation between baseline and follow-up measurements may be available in published reports. Calling the variance at baseline $\sigma^2_{(0)}$, the variance at follow-up $\sigma^2_{(t)}$, and the correlation between baseline and follow-up $\rho_{(0t)}$, $\sigma^2_{d(t)}$ is then calculated as:

$$\sigma^2_{d(t)} = \sigma^2_{(0)} + \sigma^2_{(t)} - 2\rho_{(0t)}\,\sigma_{(0)}\,\sigma_{(t)} \tag{2}$$

and the value obtained from equation (2) can then be used in place of $\sigma^2_{d(t)}$ in equation (1).

If $\sigma^2_{(t)} = \sigma^2_{(0)}$, that is, if there is compound symmetry of the covariance matrix of the baseline and follow-up (time $t$) measurements, then equation (2) reduces to:

$$\sigma^2_{d(t)} = 2\left(1 - \rho_{(0t)}\right)\sigma^2_{(0)} \tag{3}$$

As was the case with equation (2), the result of equation (3) can be substituted directly into equation (1), but this is valid only when $\sigma^2_{(t)} = \sigma^2_{(0)}$.
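A minimal Python sketch of equations (2) and (3) makes the difference concrete. The function names and the variances and correlation in the example are hypothetical; the point is only that the two formulas agree when $\sigma^2_{(t)} = \sigma^2_{(0)}$ and diverge when the follow-up variance is larger.

```python
import math


def var_change_eq2(var0: float, var_t: float, rho: float) -> float:
    """Equation (2): variance of change with separate baseline and follow-up variances."""
    return var0 + var_t - 2.0 * rho * math.sqrt(var0) * math.sqrt(var_t)


def var_change_eq3(var0: float, rho: float) -> float:
    """Equation (3): valid only under compound symmetry (var_t == var0)."""
    return 2.0 * (1.0 - rho) * var0


# Equal variances at baseline and follow-up: the two formulas agree.
equal = (var_change_eq2(4.0, 4.0, 0.5), var_change_eq3(4.0, 0.5))
# Follow-up variance larger than baseline: equation (3) understates the variance of change.
fanning = (var_change_eq2(4.0, 9.0, 0.5), var_change_eq3(4.0, 0.5))
print(equal, fanning)
```

With these made-up inputs the sketch prints `(4.0, 4.0) (7.0, 4.0)`: once the follow-up variance grows from 4 to 9, equation (3) reports a variance of change that is well below the value from equation (2).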

Equation (3) is presented, for example, in Meinert (5) (equation 9.14), Lachin (6), Overall and Doyle (7), and Overall and Starbuck (8). In all cases the formula is presented with no discussion of what the assumption $\sigma^2_{(t)} = \sigma^2_{(0)}$ implies about the data-generating process, or of the consequences of making this assumption in error. This is potentially problematic, because applying equation (3) when compound symmetry does not hold can lead to dramatic underestimation of the required sample size. In chronic progressive disease in particular, persons tend to progress at different rates, so that trajectories of progression fan apart and the variance of measurements of disease severity increases as the cohort is followed over time (9). If the interval from baseline to follow-up is long enough, there may be a substantial increase in variance from baseline to follow-up and a correspondingly substantial underestimation of the required sample size by equation (3).

A similar danger arises when the researcher has direct access to longitudinal pilot data, or is able to abstract relevant parameters from a published longitudinal study, but the duration $t$ of the planned trial exceeds the duration $s$ of the available pilot data. Here again, if longitudinal trajectories fan apart, then $\sigma^2_{d(s)} < \sigma^2_{d(t)}$ whenever $s < t$, and sample size estimates obtained by naïve use of $\sigma^2_{d(s)}$ in place of $\sigma^2_{d(t)}$ will be anti-conservative. In the absence of a model describing how the per-occasion variances and between-occasion correlations depend on measurement time, no sample size formula is available in this scenario.

In this paper we illustrate, analytically and by example, the potential magnitude of underestimation in sample size calculations from naïvely applying equation (3) when the assumption $\sigma^2_{(t)} = \sigma^2_{(0)}$ does not hold (Section 2). We also demonstrate that even when longitudinal pilot data are available for estimating the parameters used in sample size calculations, these estimates and the resulting sample size calculations can be biased if the interval from baseline to follow-up is not of similar duration to that proposed for the trial (Section 3). Section 3 also derives conservative upper bounds on the required sample size that are useful for sensitivity analyses when adequate pilot data are not available for planning a future trial.

2. Bias in Sample Size Calculations when the $\sigma^2_{(t)} = \sigma^2_{(0)}$ Assumption Does Not Hold

Letting $n_t$ represent the true required sample size per arm for a study of length $t$, and denoting the estimate obtained using equation (3) as $n_t^{(3)}$, then for a given effect size and given Type I and Type II error probabilities, the extent to which $n_t^{(3)}$ underestimates $n_t$, expressed as a percentage of the true required sample size, is given by:

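$$\%\,\text{Underestimation} = 100\% \times \frac{n_t - n_t^{(3)}}{n_t} = 100\% \times \frac{\left(\sigma_{(t)} - \sigma_{(0)}\right)\left[\sigma_{(t)} - \left(2\rho_{(0t)} - 1\right)\sigma_{(0)}\right]}{\sigma^2_{(0)} + \sigma^2_{(t)} - 2\rho_{(0t)}\,\sigma_{(0)}\,\sigma_{(t)}} \tag{4}$$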

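Note that $2\rho_{(0t)} - 1 \le 1$, so the second factor in the numerator of equation (4) is at least $\sigma_{(t)} - \sigma_{(0)}$. Whenever $\sigma^2_{(t)} > \sigma^2_{(0)}$, both numerator factors are therefore positive (and the denominator, $\sigma^2_{d(t)}$, is always positive), and hence $n_t^{(3)}$ underestimates the true required sample size.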
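Because $n_t$ in equation (1) is proportional to $\sigma^2_{d(t)}$, the effect size and error rates cancel from the percent underestimation, which is simply one minus the ratio of the equation (3) variance to the equation (2) variance. The hypothetical Python helpers below compute equation (4) in its factored form and by that direct ratio, as a sanity check that the two agree; they work with unrounded sample sizes, so integer rounding of $n_t$ is ignored, and the example inputs are made up.

```python
import math


def pct_under_eq4(var0: float, var_t: float, rho: float) -> float:
    """Equation (4) in factored form."""
    s0, st = math.sqrt(var0), math.sqrt(var_t)
    numerator = (st - s0) * (st - (2.0 * rho - 1.0) * s0)
    denominator = var0 + var_t - 2.0 * rho * s0 * st  # equation (2)
    return 100.0 * numerator / denominator


def pct_under_direct(var0: float, var_t: float, rho: float) -> float:
    """Same quantity from the ratio of the equation (3) and equation (2) variances."""
    var_eq2 = var0 + var_t - 2.0 * rho * math.sqrt(var0 * var_t)
    var_eq3 = 2.0 * (1.0 - rho) * var0
    return 100.0 * (var_eq2 - var_eq3) / var_eq2


# Hypothetical values: variance grows from 4 to 9, correlation 0.5.
print(pct_under_eq4(4.0, 9.0, 0.5), pct_under_direct(4.0, 9.0, 0.5))
```

Both calls give about 42.9% here. With `var_t == var0` the factor `st - s0` vanishes and equation (4) returns zero, as it should under compound symmetry.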

Example

To illustrate, we present examples of sample size calculations for a hypothetical phase II clinical trial in Alzheimer’s disease. Longitudinal data excerpted from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort study are used to inform the power calculations. ADNI is a joint government, private pharmaceutical, and non-profit initiative to identify promising markers of early AD progression and to aid researchers and clinicians in developing effective therapies. The ADNI cohort is a longitudinal observational study that is, by design, representative of subjects eligible for recruitment to Alzheimer’s disease treatment trials (10). Baseline and 12-month follow-up data were accessed from this publicly available dataset (10) on September 29, 2009. Estimates of $\sigma^2_{(0)}$, $\sigma^2_{(t)}$, and $\rho_{(0t)}$ for lateral inferior ventricular volume and for the Alzheimer’s Disease Assessment Scale cognitive subscale (ADAS-cog (11)), two potential endpoints for a phase II trial of an Alzheimer’s disease treatment, are summarized in Table 1. Subjects with an Alzheimer diagnosis at baseline and data at both the baseline and 12-month time points were included in the calculations. For both outcomes the variance is larger at follow-up than at baseline, as expected for longitudinal measurements of a chronic progressive disease. The parameter estimates in Table 1 were used to calculate the sample size needed for 80% power to detect $\Delta(t)$ equal to a 25% reduction in ventricular enlargement relative to that observed in the ADNI pilot data (representing the placebo condition), and likewise the sample size needed for 80% power to detect a 25% slowing of the rate of progression on the ADAS-cog relative to placebo. Each calculation was performed using both the naïve but biased estimate of $\sigma^2_{d(t)}$ (equation (3)) and the appropriate estimate of $\sigma^2_{d(t)}$ (equation (2)).

Sample size estimates obtained by naïvely using equation (3) to calculate $\sigma^2_{d(t)}$ are approximately 50% smaller than the sample sizes required to obtain nominal power as calculated using equation (2) (Table 1). For example, naïve use of equation (3) to estimate $\sigma^2_{d(t)}$ in equation (1) would lead us to conclude that 96 subjects per arm would be sufficient to ensure power of 0.8 to detect a 25% reduction in the rate of ventricular volume increase, whereas the actual required sample size, derived using equation (2), is 195 subjects per arm. For the ADAS-cog, the corresponding values are 356 and 719 subjects per arm. These sample size calculations are for illustrative purposes only. Power calculations for an Alzheimer treatment trial involve additional considerations, including but not limited to computing bootstrap confidence intervals for sample size estimates to acknowledge the uncertainty in the population parameters supplied to the power formula (12, 13).

Table 1.

Parameter estimates and sample size requirements for ADAS-cog and ventricular volume endpoints, as calculated from ADNI pilot data.

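| Statistic | ADAS-cog | Inf. Lat. Ventricles |
|---|---|---|
| Pilot sample size | 163 | 149 |
| Baseline mean | 18.3 | 4.2 |
| 12-month change | 4.2 | 0.6 |
| $\sigma^2_{(0)}$ | 38.6 | 4.3 |
| $\sigma^2_{(12)}$ | 92.6 | 6.0 |
| $\rho_{(0,12)}$ | 0.68 | 0.98 |
| $n_{12}$ | 719 | 195 |
| $n_{12}^{(3)}$ | 356 | 96 |
| %Underestimation | 50.5% | 51.1% |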

Notes: Inf. Lat. Ventricles is the sum of the volumes of the left and right inferior lateral ventricles, in cm³; $\sigma^2_{(0)}$ = baseline variance; $\sigma^2_{(12)}$ = month-12 variance; $\rho_{(0,12)}$ = baseline-to-month-12 correlation; $n_{12}$ = required sample size per arm to detect a 25% reduction in 12-month change with power = 0.8 and two-tailed significance level = 0.05; $n_{12}^{(3)}$ = estimated sample size per arm to detect a 25% reduction in 12-month change with power = 0.8 and two-tailed significance level = 0.05, as calculated by equation (3); %Underestimation = underestimation of the required sample size per arm by use of equation (3), expressed as a percentage.
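The ADAS-cog column of Table 1 can be re-run with a short Python sketch of equations (1) to (3). The inputs are the rounded values printed in the table, so the results are close to, but not identical with, the published sample sizes. The ventricular-volume column is not re-run: with $\rho_{(0,12)} = 0.98$, $\sigma^2_{d(12)}$ is a small difference of two nearly equal quantities and is very sensitive to rounding of the printed inputs. Variable and function names are illustrative.

```python
import math
from statistics import NormalDist

# ADAS-cog inputs as printed (rounded) in Table 1.
VAR0, VAR12, RHO = 38.6, 92.6, 0.68
MEAN_CHANGE_12 = 4.2
DELTA = 0.25 * MEAN_CHANGE_12  # 25% slowing of 12-month change


def n_per_arm(var_change, delta, alpha=0.05, beta=0.20):
    """Equation (1), normal approximation."""
    z = NormalDist().inv_cdf
    return 2.0 * var_change * (z(alpha / 2) + z(beta)) ** 2 / delta ** 2


var_eq2 = VAR0 + VAR12 - 2.0 * RHO * math.sqrt(VAR0 * VAR12)  # equation (2)
var_eq3 = 2.0 * (1.0 - RHO) * VAR0                           # equation (3)

n12 = math.ceil(n_per_arm(var_eq2, DELTA))
n12_eq3 = math.ceil(n_per_arm(var_eq3, DELTA))
pct_under = 100.0 * (1.0 - var_eq3 / var_eq2)  # equation (4), unrounded

print(n12, n12_eq3, round(pct_under, 1))
```

With these rounded inputs the sketch prints `711 352 50.5`, compared with 719, 356, and 50.5% in Table 1.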

3. Potential Bias when Pilot Data are Inconsistent with Planned Trial Design

If researchers have access to pilot data or summary statistics from a study of comparable duration to the planned trial, a valid estimate of the required sample size can be determined using equation (2) to calculate $\sigma^2_{d(t)}$. Often, however, pilot data are from a study of duration $s$, with $s < t$. This mismatch in duration matters because, when trajectories fan apart, typically both the follow-up variance and the variance of change increase with duration ($\sigma^2_{(s)} < \sigma^2_{(t)}$ and $\sigma^2_{d(s)} < \sigma^2_{d(t)}$ when $s < t$), and therefore sample size calculations tend to be anti-conservative. Letting $n_t^{[s]}$ denote the estimated required sample size per arm calculated from equation (1) with $\sigma^2_{d(s)}$ used in place of $\sigma^2_{d(t)}$, the magnitude of underestimation, expressed as a percentage of the actual required sample size $n_t$, is given by:

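$$\%\,\text{Underestimation} = 100\% \times \frac{n_t - n_t^{[s]}}{n_t} = 100\% \times \frac{\sigma^2_{(t)} - \sigma^2_{(s)} - 2\sigma_{(0)}\left(\rho_{(0t)}\,\sigma_{(t)} - \rho_{(0s)}\,\sigma_{(s)}\right)}{\sigma^2_{(0)} + \sigma^2_{(t)} - 2\rho_{(0t)}\,\sigma_{(0)}\,\sigma_{(t)}} \tag{5}$$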

Specifying conditions that are both necessary and sufficient for equation (5) to take values greater than zero is somewhat more involved than it was for equation (4). It can be shown, however, that $n_t^{[s]}$ will be anti-conservative for data generated by the familiar mixed effects model with linear trajectories that fan apart over time and i.i.d. residual error, or whenever both 1) $\sigma^2_{(0)} < \sigma^2_{(s)} < \sigma^2_{(t)}$ and 2) $0 \le \rho_{(0t)} \le \rho_{(0s)}$ (Appendix).
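Equation (5) can be checked against the mixed effects model described in the Appendix. In the hypothetical Python sketch below, the variance components are made-up values, the per-occasion variances come from (A3), and the baseline-to-follow-up covariance is $\sigma^2_a + \tau\sigma_{ab}$, which follows from (A1) and (A2). The result is compared with the underestimation implied directly by (A4).

```python
import math

# Hypothetical variance components for model (A1)-(A2).
VAR_A, VAR_B, COV_AB, VAR_E = 4.0, 1.0, 0.5, 1.0
S, T = 1.0, 2.0  # pilot duration s and planned duration t


def var_at(tau: float) -> float:
    """(A3): variance of the measurement at time tau."""
    return VAR_A + tau ** 2 * VAR_B + 2.0 * tau * COV_AB + VAR_E


def corr_with_baseline(tau: float) -> float:
    """Correlation of baseline with time tau; the covariance is VAR_A + tau * COV_AB."""
    return (VAR_A + tau * COV_AB) / math.sqrt(var_at(0.0) * var_at(tau))


def pct_under_eq5(s: float, t: float) -> float:
    """Equation (5) from per-occasion SDs and baseline correlations."""
    s0, ss, st = (math.sqrt(var_at(x)) for x in (0.0, s, t))
    r0s, r0t = corr_with_baseline(s), corr_with_baseline(t)
    numerator = st ** 2 - ss ** 2 - 2.0 * s0 * (r0t * st - r0s * ss)
    denominator = s0 ** 2 + st ** 2 - 2.0 * r0t * s0 * st
    return 100.0 * numerator / denominator


def pct_under_a4(s: float, t: float) -> float:
    """Same quantity from (A4): variance of change is tau^2 * VAR_B + 2 * VAR_E."""
    vds, vdt = s ** 2 * VAR_B + 2.0 * VAR_E, t ** 2 * VAR_B + 2.0 * VAR_E
    return 100.0 * (vdt - vds) / vdt


print(pct_under_eq5(S, T), pct_under_a4(S, T))
```

Both values are 50% (up to floating-point error): in this made-up setting, pilot data of half the planned duration would suggest only about half the required sample size.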

Lacking pilot data consistent with the planned trial, there are several alternative approaches. If longitudinal data with more than two waves of observation are available, power calculations can be performed under the assumption of a linear mixed effects model with correlated random intercepts and slopes (12), a flexible yet parsimonious analytic framework for longitudinal data that is well suited to situations where variances are expected to increase (and correlations to decrease) as the study duration is extended. However, if only two-wave pilot data are available and the duration of pilot data observation differs from (typically, is shorter than) the duration proposed for the planned trial, the variance components of the correlated random intercepts and slopes variant of the linear mixed effects model are not identifiable (two waves supply only three second moments, the two variances and their covariance, from which to estimate four variance components), and the required sample size per arm cannot be calculated exactly. Nonetheless, the linear mixed effects model can still be used to motivate solutions for the required sample size per arm that are guaranteed to be conservative under certain conditions. We offer here two power formulas that provide conservative overestimates of the required sample size when the duration of the pilot data is less than the duration of the planned trial.

Conservative formula 1.

Under the linear mixed effects model, the variance of change from baseline to follow-up, $\sigma^2_{d(t)}$, for a study of length $t$ can be expressed as a function of the variance of change from baseline to follow-up, $\sigma^2_{d(s)}$, for a study of length $s$ and the mixed model residual error variance $\sigma^2_e$ as follows (Appendix):

$$\sigma^2_{d(t)} = \frac{t^2}{s^2}\,\sigma^2_{d(s)} - 2\,\frac{t^2 - s^2}{s^2}\,\sigma^2_e \tag{6}$$

For $t > s$, the subtracted term $2\,\frac{t^2 - s^2}{s^2}\,\sigma^2_e$ is non-negative, and therefore:

$$\dot{\sigma}^2_{d(t)} = \frac{t^2}{s^2}\,\sigma^2_{d(s)} \tag{7}$$

is an overestimate of $\sigma^2_{d(t)}$ (or equal to it when $\sigma^2_e = 0$), and power calculations by equation (1) with $\dot{\sigma}^2_{d(t)}$ used in place of $\sigma^2_{d(t)}$ will be conservative overestimates of the sample size required to achieve power $1 - \beta$.
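A minimal Python sketch of equation (7) follows, with hypothetical variance components used as a check (none of these values come from the ADNI data, and the function name is illustrative). Under the model, the true $\sigma^2_{d(t)}$ is given by (A4), and the gap between equation (7) and the truth is exactly the $2\,\frac{t^2 - s^2}{s^2}\,\sigma^2_e$ term dropped from equation (6).

```python
def var_change_dot(var_change_s: float, s: float, t: float) -> float:
    """Equation (7): conservative variance of change for a trial of length t > s."""
    if not t > s > 0:
        raise ValueError("equation (7) is intended for 0 < s < t")
    return (t ** 2 / s ** 2) * var_change_s


# Hypothetical mixed-model components: slope variance and residual variance.
VAR_B, VAR_E = 1.0, 1.0
s, t = 1.0, 2.0
var_change_s = s ** 2 * VAR_B + 2.0 * VAR_E       # (A4) at the pilot duration: 3.0
true_var_change_t = t ** 2 * VAR_B + 2.0 * VAR_E  # (A4) at the planned duration: 6.0
bound = var_change_dot(var_change_s, s, t)        # 12.0
gap = bound - true_var_change_t                   # equals 2 * (t^2 - s^2) / s^2 * VAR_E
print(bound, true_var_change_t, gap)
```

In this example the bound is twice the true variance, so the resulting sample size would be twice what is needed. The bound tightens as the residual variance shrinks relative to the slope variance, since the gap is proportional to $\sigma^2_e$.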

Conservative formula 2.

Alternatively, under the linear mixed effects model, the variance of change from baseline to follow-up, $\sigma^2_{d(t)}$, for a study of length $t$ can be expressed as a function of $\sigma^2_{d(s)}$; $\sigma^2_{(s)}$, the variance of the quantitative measure at follow-up time $s$; $\sigma^2_{(0)}$, the variance at the baseline observation; and $\sigma_{ab}$, the covariance of the random intercept and slope coefficients (Appendix):

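$$\sigma^2_{d(t)} = \sigma^2_{d(s)} + \frac{t^2 - s^2}{s^2}\left(\sigma^2_{(s)} - \sigma^2_{(0)}\right) - 2\,\frac{t^2 - s^2}{s}\,\sigma_{ab} \tag{8}$$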

For $\sigma_{ab} \ge 0$ and $t > s$, the subtracted term $2\,\frac{t^2 - s^2}{s}\,\sigma_{ab}$ is non-negative, and therefore:

$$\ddot{\sigma}^2_{d(t)} = \sigma^2_{d(s)} + \frac{t^2 - s^2}{s^2}\left(\sigma^2_{(s)} - \sigma^2_{(0)}\right) \tag{9}$$

is an overestimate of $\sigma^2_{d(t)}$, and power calculations by equation (1) with $\ddot{\sigma}^2_{d(t)}$ used in place of $\sigma^2_{d(t)}$ will yield conservative overestimates of the sample size required to achieve power $1 - \beta$. Note that the solution is exact, i.e., $\ddot{\sigma}^2_{d(t)} = \sigma^2_{d(t)}$, when $\sigma_{ab} = 0$. Because both equation (7) and equation (9) yield exact or conservative solutions for the required sample size under the linear mixed effects model when the covariance of the random intercept and slope coefficients is non-negative, the smaller of the two resulting sample size estimates, i.e., the one using $\sigma^2_{d(t)} = \min\left[\dot{\sigma}^2_{d(t)}, \ddot{\sigma}^2_{d(t)}\right]$, is to be preferred under these conditions.
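The selection rule can be sketched in Python as follows. The function names are hypothetical, and the pilot summaries are generated from made-up variance components via (A3) and (A4) so that the true $\sigma^2_{d(t)}$ is known; with real data only $\sigma^2_{d(s)}$, $\sigma^2_{(0)}$, and $\sigma^2_{(s)}$ would be available. The sketch assumes $\sigma_{ab} \ge 0$, the condition under which equation (9) is conservative.

```python
def var_change_dot(var_change_s, s, t):
    """Equation (7)."""
    return (t ** 2 / s ** 2) * var_change_s


def var_change_ddot(var_change_s, var0, var_s, s, t):
    """Equation (9)."""
    return var_change_s + (t ** 2 - s ** 2) / s ** 2 * (var_s - var0)


def planning_var_change(var_change_s, var0, var_s, s, t):
    """Smaller of the two conservative variances (assumes sigma_ab >= 0)."""
    return min(var_change_dot(var_change_s, s, t),
               var_change_ddot(var_change_s, var0, var_s, s, t))


def pilot_summaries(var_a, var_b, cov_ab, var_e, s):
    """Hypothetical pilot summaries implied by (A3) and (A4)."""
    var0 = var_a + var_e
    var_s = var_a + s ** 2 * var_b + 2.0 * s * cov_ab + var_e
    var_change_s = s ** 2 * var_b + 2.0 * var_e
    return var_change_s, var0, var_s


s, t = 1.0, 2.0
for cov_ab in (0.5, 0.0):
    vds, v0, vs = pilot_summaries(4.0, 1.0, cov_ab, 1.0, s)
    truth = t ** 2 * 1.0 + 2.0 * 1.0  # (A4) at duration t
    print(cov_ab, planning_var_change(vds, v0, vs, s, t), truth)
```

With $\sigma_{ab} = 0.5$ the rule picks equation (9) (9.0 versus 12.0 from equation (7)), which still exceeds the true value 6.0; with $\sigma_{ab} = 0$ equation (9) recovers the true value exactly, as stated above.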

5. Discussion

Considerable costs can be incurred when time and resources are allocated to an otherwise well-conceived but underpowered study. Investigators may therefore prefer a conservative solution to the problem at hand, i.e., one that results in a trial with greater than nominal power. In the absence of longitudinal pilot data, there is little information and little recourse for powering longitudinal trials. We recommend against using equation (3), however, and suggest instead using equation (2) to estimate the variance for sample size calculations when balanced two-wave pilot data are available, because equation (2) requires the investigator to acknowledge the unknown parameters explicitly and to supply estimates of them to the sample size formula. For the case where longitudinal pilot data of duration $s < t$ are available, we have provided conservative formulas for the sample size of trials of duration $t$ that are valid under the modest assumption that the data derive from a mixed effects model with linear longitudinal trajectories that fan apart (and, for equation (9), a non-negative intercept-slope covariance). Trials powered by anti-conservative calculations that ignore these concerns risk unnecessary false negative findings and lost opportunities for drug development.

Acknowledgements

The authors were supported by NIA grants AG034439, AG005131, and AG10483. Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (NIH U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles.

Appendix

Conditions under which σ2d(s) leads to anti-conservative sample size estimation.

When data are generated by a linear mixed effects model

We begin by introducing notation for two-wave data generated by the mixed effects model commonly applied to longitudinal data from clinical trials and cohort studies. Let y be observations on some random variable of interest that follow the linear mixed effects model with bivariate normal random intercept and slope coefficients and i.i.d. residual error:

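$$y_{i\tau} = \alpha + \beta\tau + a_i + b_i\tau + e_{i\tau} \tag{A1}$$

$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim N\!\left(\mathbf{0}, \begin{pmatrix} \sigma^2_a & \sigma_{ab} \\ \sigma_{ab} & \sigma^2_b \end{pmatrix}\right), \qquad e_{i\tau} \sim N\!\left(0, \sigma^2_e\right) \tag{A2}$$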

Here $\alpha$ and $\beta$ represent the fixed intercept and slope terms, respectively (not the Type I and Type II error probabilities), $a_i$ and $b_i$ are their random counterparts for subject $i$, and the $e_{i\tau}$ are i.i.d. residual error terms for subject $i$ at time $\tau$. For two-wave data, $\tau$ will typically take on only two values by design, e.g., zero at baseline and $t$ at follow-up for a trial of length $t$, or zero at baseline and $s$ at follow-up for a trial of length $s$. It follows from equations (A1) and (A2) that for any $\tau$:

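$$\sigma^2_{(\tau)} = \operatorname{Var}\{y_{i\tau}\} = \operatorname{Var}\{a_i + b_i\tau + e_{i\tau}\} = \sigma^2_a + \tau^2\sigma^2_b + 2\tau\sigma_{ab} + \sigma^2_e \tag{A3}$$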

Also, for any τ ≠ 0:

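$$\sigma^2_{d(\tau)} = \operatorname{Var}\{y_{i\tau} - y_{i0}\} = \operatorname{Var}\{b_i\tau + e_{i\tau} - e_{i0}\} = \tau^2\sigma^2_b + 2\sigma^2_e \tag{A4}$$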
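(A3) and (A4) can be spot-checked by simulation. The Python sketch below is an illustration with made-up parameter values, not part of the original analysis: it draws correlated random intercepts and slopes through a Cholesky factor of the covariance matrix in (A2), adds i.i.d. residual error, and compares sample variances with the closed forms. The fixed effects are named `intercept` and `slope` (arbitrary values) to avoid confusion with the error rates $\alpha$ and $\beta$.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def simulate_two_wave(n, tau, var_a, var_b, cov_ab, var_e,
                      intercept=0.0, slope=0.0, seed=2012):
    """Draw baseline and time-tau measurements from model (A1)-(A2)."""
    rng = random.Random(seed)
    sd_a, sd_e = math.sqrt(var_a), math.sqrt(var_e)
    # Cholesky factor of [[var_a, cov_ab], [cov_ab, var_b]]
    l21 = cov_ab / sd_a
    l22 = math.sqrt(var_b - l21 ** 2)
    y0, y_tau = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a_i = sd_a * z1
        b_i = l21 * z1 + l22 * z2
        y0.append(intercept + a_i + rng.gauss(0.0, sd_e))
        y_tau.append(intercept + slope * tau + a_i + b_i * tau + rng.gauss(0.0, sd_e))
    return y0, y_tau


VAR_A, VAR_B, COV_AB, VAR_E, TAU = 4.0, 1.0, 0.5, 1.0, 2.0
y0, y_tau = simulate_two_wave(100_000, TAU, VAR_A, VAR_B, COV_AB, VAR_E,
                              intercept=10.0, slope=2.0)

var_tau_theory = VAR_A + TAU ** 2 * VAR_B + 2 * TAU * COV_AB + VAR_E  # (A3): 11.0
var_change_theory = TAU ** 2 * VAR_B + 2 * VAR_E                     # (A4): 6.0
var_tau_sim = sample_variance(y_tau)
var_change_sim = sample_variance([v - u for u, v in zip(y0, y_tau)])
print(var_tau_theory, var_tau_sim, var_change_theory, var_change_sim)
```

The simulated variances should land within a percent or so of 11 and 6. Note that the intercept variance and the intercept-slope covariance drop out of (A4) entirely, which is why the variance of change depends only on $\sigma^2_b$ and $\sigma^2_e$.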

To prove that $\sigma^2_{d(s)}$ leads to anti-conservative sample size estimates under this model, note from (A4) that $\sigma^2_{d(t)} - \sigma^2_{d(s)} = (t^2 - s^2)\,\sigma^2_b$, which is positive when $s < t$ and $\sigma^2_b > 0$. In that case $\sigma^2_{d(s)} < \sigma^2_{d(t)}$, and therefore $n_t^{[s]}$ calculated using $\sigma^2_{d(s)}$ in equation (1) will underestimate $n_t$. The $\sigma^2_b > 0$ condition simply means that trajectories fan apart, as is typically observed in longitudinal studies of chronic, progressive disease.

When $\sigma^2_{(0)} < \sigma^2_{(s)} < \sigma^2_{(t)}$ and $0 \le \rho_{(0t)} \le \rho_{(0s)}$:

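To show that equation (5) is greater than zero, and hence that $n_t^{[s]}$ is anti-conservative, under the general conditions that 1) $\sigma^2_{(0)} < \sigma^2_{(s)} < \sigma^2_{(t)}$ and 2) $0 \le \rho_{(0t)} \le \rho_{(0s)}$, we need to prove that the numerator of equation (5) is positive under these conditions (its denominator, $\sigma^2_{d(t)}$, is positive). Writing $\sigma^2_{(t)} - \sigma^2_{(s)} = (\sigma_{(t)} - \sigma_{(s)})(\sigma_{(t)} + \sigma_{(s)})$ and dividing by $2\sigma_{(0)}(\sigma_{(t)} - \sigma_{(s)}) > 0$, this is equivalent to showing that: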

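$$\frac{\sigma_{(s)} + \sigma_{(t)}}{2\sigma_{(0)}} > \frac{\rho_{(0t)}\,\sigma_{(t)} - \rho_{(0s)}\,\sigma_{(s)}}{\sigma_{(t)} - \sigma_{(s)}} \tag{A5}$$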

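The proof follows by noting that, given the assumptions, the left-hand side of inequality (A5) is greater than 1, because $\sigma_{(s)}$ and $\sigma_{(t)}$ both exceed $\sigma_{(0)}$, while the right-hand side of (A5) is at most 1, because $0 \le \rho_{(0t)} \le \rho_{(0s)} \le 1$ implies $\rho_{(0t)}\sigma_{(t)} - \rho_{(0s)}\sigma_{(s)} \le \rho_{(0s)}(\sigma_{(t)} - \sigma_{(s)}) \le \sigma_{(t)} - \sigma_{(s)}$.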
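The argument can also be spot-checked numerically. This hypothetical Python sketch draws random standard deviations with $\sigma_{(0)} < \sigma_{(s)} < \sigma_{(t)}$ and correlations with $0 \le \rho_{(0t)} \le \rho_{(0s)} \le 1$, and counts draws in which the numerator of equation (5) fails to be positive. A random search of this kind illustrates the result; it is not a substitute for the proof.

```python
import random


def eq5_numerator(sd0, sd_s, sd_t, rho0s, rho0t):
    """Numerator of equation (5)."""
    return sd_t ** 2 - sd_s ** 2 - 2.0 * sd0 * (rho0t * sd_t - rho0s * sd_s)


rng = random.Random(0)
violations = 0
for _ in range(100_000):
    sd0 = rng.uniform(0.1, 5.0)
    sd_s = sd0 + rng.uniform(0.01, 5.0)   # sigma_(0) < sigma_(s)
    sd_t = sd_s + rng.uniform(0.01, 5.0)  # sigma_(s) < sigma_(t)
    rho0s = rng.uniform(0.0, 1.0)
    rho0t = rng.uniform(0.0, rho0s)       # 0 <= rho_(0t) <= rho_(0s)
    if eq5_numerator(sd0, sd_s, sd_t, rho0s, rho0t) <= 0.0:
        violations += 1
print(violations)  # 0
```

Dropping the ordering $\rho_{(0t)} \le \rho_{(0s)}$ (for example, letting the correlation rise with duration) removes the guarantee, which is why condition 2) appears in the statement.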

Derivation of equation (6).

Using the model and notation above, equation (6) follows by writing:

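$$\sigma^2_{d(t)} - \frac{t^2}{s^2}\,\sigma^2_{d(s)} = t^2\sigma^2_b + 2\sigma^2_e - \frac{t^2}{s^2}\left(s^2\sigma^2_b + 2\sigma^2_e\right) = -2\,\frac{t^2 - s^2}{s^2}\,\sigma^2_e \tag{A6}$$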

Derivation of equation (8).

Similarly, equation (8) follows by noting that:

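$$\begin{aligned}
\sigma^2_{d(t)} - \sigma^2_{d(s)} - \frac{t^2 - s^2}{s^2}\left(\sigma^2_{(s)} - \sigma^2_{(0)}\right)
&= t^2\sigma^2_b + 2\sigma^2_e - \left(s^2\sigma^2_b + 2\sigma^2_e\right) - \frac{t^2 - s^2}{s^2}\left[\left(\sigma^2_a + s^2\sigma^2_b + 2s\sigma_{ab} + \sigma^2_e\right) - \left(\sigma^2_a + \sigma^2_e\right)\right] \\
&= \left(t^2 - s^2\right)\sigma^2_b - \frac{t^2 - s^2}{s^2}\left(s^2\sigma^2_b + 2s\sigma_{ab}\right) \\
&= -2\,\frac{t^2 - s^2}{s}\,\sigma_{ab}
\end{aligned} \tag{A7}$$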
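Because (A3) and (A4) are polynomial in the variance components, identities (A6) and (A7) can be confirmed in exact rational arithmetic. The Python sketch below is an illustrative check, not part of the original derivation: it uses the standard library's `fractions.Fraction` with random rational parameter values (function names are hypothetical) and tests equations (6) and (8) with no floating-point tolerance. The covariance is allowed to be negative, since the identities themselves do not depend on its sign; only the conservative bounds in equations (7) and (9) do.

```python
import random
from fractions import Fraction


def var_at(tau, var_a, var_b, cov_ab, var_e):
    """(A3): variance at time tau."""
    return var_a + tau ** 2 * var_b + 2 * tau * cov_ab + var_e


def var_change(tau, var_b, var_e):
    """(A4): variance of change from baseline to time tau."""
    return tau ** 2 * var_b + 2 * var_e


def identities_hold(var_a, var_b, cov_ab, var_e, s, t):
    """Check equations (6) and (8) exactly against (A4) at duration t."""
    vds, vdt = var_change(s, var_b, var_e), var_change(t, var_b, var_e)
    eq6 = (t ** 2 / s ** 2) * vds - 2 * (t ** 2 - s ** 2) / s ** 2 * var_e
    eq8 = (vds
           + (t ** 2 - s ** 2) / s ** 2
           * (var_at(s, var_a, var_b, cov_ab, var_e) - var_at(0, var_a, var_b, cov_ab, var_e))
           - 2 * (t ** 2 - s ** 2) / s * cov_ab)
    return vdt == eq6 and vdt == eq8


rng = random.Random(1)


def rand_frac(lo, hi):
    """Random rational with numerator in [lo, hi] and denominator in [1, 12]."""
    return Fraction(rng.randint(lo, hi), rng.randint(1, 12))


checks = [identities_hold(rand_frac(1, 50), rand_frac(1, 50), rand_frac(-20, 20),
                          rand_frac(1, 50), rand_frac(1, 24), rand_frac(1, 24))
          for _ in range(1_000)]
print(all(checks))  # True
```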

Footnotes

Full-text Open Access Publication available c/o the Journal of Statistics in Medical Research

Ard MC, Edland SD. Power Calculations for Two-Wave, Change from Baseline to Follow-Up Study Designs. International Journal of Statistics in Medical Research. 2012;1(1):45–50. doi: 10.6000/1929-6029.2012.01.01.03.

References

1. Hampel H, Mitchell A, Blennow K, Frank RA, Brettschneider S, Weller L, et al. Core biological marker candidates of Alzheimer’s disease - perspectives for diagnosis, prediction of outcome and reflection of biological activity. J Neural Trans. 2004;111:247–72.
2. Matthews B, Siemers ER, Mozley PD. Imaging-based measures of disease progression in clinical trials of disease modifying drugs for Alzheimer disease. Am J Geriat Psychiatr. 2003;11:146–59.
3. Mohs RC, Kawas C, Carrillo MC. Optimal design of clinical trials for drugs designed to slow the course of Alzheimer’s disease. Alzheimers Dement. 2006;2:131–9.
4. Thal LJ, Kantarci K, Reiman EM, Klunk WE, Weiner MW, Zetterberg H, et al. The role of biomarkers in clinical trials for Alzheimer disease. Alz Dis Assoc Dis. 2006;20:6–15.
5. Meinert CL. Clinical trials: design, conduct, and analysis. 2nd ed. New York: Oxford University Press; 2012.
6. Lachin JM. Introduction to sample-size determination and power analysis for clinical trials. Control Clin Trials. 1981;2:93–113.
7. Overall JE, Doyle SR. Estimating sample sizes for repeated measurement designs. Control Clin Trials. 1994;15:100–23.
8. Overall JE, Starbuck RR. Sample-size estimation for randomized pre-post designs. J Psychiatr Res. 1979;15:51–5.
9. Edland SD. Blomqvist revisited: how and when to test the relationship between level and longitudinal rate of change. Stat Med. 2000;19:1441–52.
10. Petersen RC, Aisen PS, Beckett LA, Donohue MC, Gamst AC, Harvey DJ, et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): Clinical characterization. Neurol. 2010;74:201–9.
11. Mohs RC, Knopman D, Petersen RC, Ferris SH, Ernesto C, Grundman M, et al. Development of cognitive instruments for use in clinical trials of antidementia drugs: Additions to the Alzheimer’s disease assessment scale that broaden its scope. Alz Dis Assoc Dis. 1997;11:S13–S21.
12. Ard MC, Edland SD. Power calculations for clinical trials in Alzheimer’s disease. J Alzheimers Dis. 2011;26:369–77.
13. McEvoy LK, Edland SD, Holland D, Hagler DJ, Roddey JC, Fennema-Notestine C, et al. Neuroimaging enrichment strategy for secondary prevention trials in Alzheimer disease. Alz Dis Assoc Dis. 2010;24:269–77.
