Author manuscript; available in PMC 2025 Jun 1.
Published in final edited form as: Fertil Steril. 2024 Feb 6;121(6):899–901. doi: 10.1016/j.fertnstert.2024.01.040

A Brief Overview of Pilot Studies and Their Sample Size Justification

Allen R Kunselman 1
PMCID: PMC11128343  NIHMSID: NIHMS1973154  PMID: 38331310

Précis

Pilot studies, if properly designed and implemented, are an important tool that provides critical information for the development and potential success of a subsequent, larger trial. In fact, these small-scale studies are commonly used to assess whether a larger trial is feasible and should be initiated. A common question from investigators is whether a pilot study requires a formal statistical power calculation. In general, the answer is “no”; however, the sample size still needs to be justified based on the goal of the pilot study.

Keywords: pilot study, feasibility, sample size

Introduction

Full-scale, hypothesis-driven, confirmatory trials can be very resource intensive and expensive to conduct and may not ultimately be successful (either positively or negatively) due to improper planning. To help ensure the successful conduct of the main, full-scale trial, a pilot study may first be undertaken. The aims and methods of a pilot study should align with the goals of the subsequent main trial. Moore et al. define pilot studies as “preparatory studies designed to test the performance characteristics and capabilities of study designs, measures, procedures, recruitment criteria, and operational strategies that are under consideration for use in a subsequent, often larger, study.”1 The terms pilot study and feasibility study often have been used interchangeably.2 However, some advocate that feasibility studies are used to estimate important parameters needed to design the future main trial (e.g., variability of the outcome measure, recruitment rates, willingness of patients to be randomized, subject retention rates), whereas pilot studies are miniature versions of the main trial focused on whether the processes of the main trial are working well (e.g., randomization procedures, training of staff).3,4 A pilot study can be either an internal study (i.e., the pilot data are included with the main trial study data) or an external study (i.e., the pilot data are independently assessed and not included in the main trial). Internal pilot studies usually incorporate an interim analysis at the end of the “pilot” stage, are often used to assess initial sample size assumptions in order to possibly re-estimate the sample size for the trial, and must properly account for the inflated type I error.5 Internal pilot studies must be thoroughly planned at the design stage of the main study because the pilot data are part of the main study itself. For the purpose of this brief overview, only external pilot and feasibility studies will be discussed.

Reasons for Conducting Pilot Studies

Before stating some valid reasons to perform a pilot study, let’s address the elephant in the room, a common improper justification many investigators use: lack of funding. The lack of funding and resources is not, in and of itself, justification to conduct a small-scale study and simply label it a “pilot” study. Matching the sample size of the study to the availability of funds without proper regard to the study’s goals is not sufficient justification.1 Although funding constraints are real, a proper pilot study still needs clear goals, details of how those goals will be assessed, a sample size justification tied to those goals, and methodological rigor in how the pilot study will be conducted.

Thabane et al. break down the reasons to perform a pilot study into four broad classifications.6 The first classification is “process,” which assesses the feasibility of the steps and processes needed for the future success of the main trial. Process examples include recruitment rates, participant retention rates, protocol compliance/adherence rates, assessment of eligibility criteria, assessing the order of procedures to be performed at a clinical visit, tracking shipments of biologic material from the clinic to an external laboratory, identifying issues resulting in missing data, and assessment of data collection forms and tools for understandability and completeness. The second classification is “resources,” which assesses time and budget concerns. Resource examples include the time required for participants to complete study forms, the cost and time to mail surveys, the time required to complete the patient visit, equipment availability and the cost to use it, plans for when equipment breaks or is lost/stolen, and the investigators’ availability and time to perform all required tasks. The third classification is “management,” which assesses staffing and data management concerns. Management examples include potential issues entering data, matching data from different sources properly and effectively, concerns from personnel running the study, and assessing staffing capacity to perform the required tasks. The fourth classification is “scientific,” which focuses on the impact of the intervention/treatment. A scientific example is determining an estimate of the intervention effect and its variability for the primary outcome in order to inform the main trial.

Word of Caution for Powering the Main Trial

If estimation of the intervention effect size and/or the variability of measures is the goal of the pilot study in order to inform the sample size of the subsequent main trial, a word of caution is warranted. It is advised not to use the observed effect size from the pilot study as the effect size for the main trial because, given the pilot’s small sample size, the estimated effect size is very likely overestimated and may mislead.7 In short, over-estimation of the intervention effect size for the main trial will result in an under-powered study because the true effect size is smaller than that observed in the pilot study. If possible, the main trial’s intervention effect size should be based on the minimal clinically important difference (MCID) for the outcome of interest. The MCID is the smallest intervention effect that would have a meaningful impact for the patient, and it is typically smaller than the pilot study’s observed intervention effect size, which is based on a small number of participants. Determination of the MCID, however, is beyond the scope of this brief overview. Similarly, it is advised not to use the observed variability (i.e., the standard deviation for a continuous outcome) from the pilot study as the variability measure for the main trial because it has been shown to often be under-estimated, resulting in an underpowered main trial.8 Browne has suggested using the upper limit of the 80% confidence interval (CI) for the standard deviation from the pilot study to perform the power calculation of the main trial.8 Additionally, sensitivity analyses of the assumptions for the main trial could be performed to account for the uncertainty in both the effect size and the variability; in other words, compute the main trial’s sample size over a range of plausible estimates informed by the pilot study.
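Browne’s suggestion can be implemented with the standard chi-square pivot for a sample variance. The following is a minimal sketch using SciPy; the function name is our own, and the example values (an observed SD of 18 from a pilot of 15 participants) are illustrative, not from the text.

```python
from math import sqrt
from scipy.stats import chi2

def browne_upper_sd(s, n, confidence=0.80):
    """One-sided upper confidence limit for the SD from a pilot of size n.

    Based on the pivot (n-1)*s^2 / sigma^2 ~ chi2(n-1); Browne suggests
    powering the main trial with this inflated SD rather than the pilot's
    observed s, which tends to underestimate the true variability.
    """
    df = n - 1
    # the lower-tail chi-square quantile gives the upper limit for sigma
    return s * sqrt(df / chi2.ppf(1 - confidence, df))

# illustrative: observed SD of 18 units from a pilot of 15 participants
inflated_sd = browne_upper_sd(18, 15)  # somewhat larger than 18
```

The inflated SD shrinks back toward the observed value as the pilot grows, reflecting the reduced uncertainty from a larger sample.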

Sample Size Justification

Pilot studies are not confirmatory trials and thus should not be hypothesis-driven (e.g., p-value centric). The analysis of a pilot study should focus on descriptive statistics (e.g., means, standard deviations, and quantiles for continuous variables; frequencies and percentages for categorical variables) and estimates of precision (i.e., the width of CIs).5 Because pilot studies are not hypothesis-driven, there is no need for a formal statistical power calculation. However, justification of the sample size for a pilot study is still required and should be based on the primary goal of the study. Although strongly discouraged, if any hypothesis tests are conducted in the pilot trial, the results should be interpreted with extreme caution, and a “significant” result should not be used to forgo the larger confirmatory trial.5

An estimate of precision using a confidence interval approach, particularly a one-sided CI, is an ideal way to justify the sample size.8,9,10 Depending on the pilot goal, such as feasibility, suitable estimates of precision include the 95% CI of the mean if the outcome of interest is continuous (e.g., time to complete a survey) or the 95% CI for a one-sample proportion if the outcome of interest is binary (e.g., protocol adherence rate). As an example, suppose the primary goal of a 6-month longitudinal pilot study is feasibility based on the retention of participants at the end of 6 months. We anticipate the retention rate will be 90% and want the margin of error to be 5%; i.e., we are willing to accept a retention rate as low as 85% to consider the feasibility trial a success and move on to the main trial. As this is a binary endpoint, justifying the sample size using the 95% CI for a one-sample proportion, p ± 1.96√(p(1 − p)/n), where p is the estimated proportion and n is the sample size, is appropriate. To successfully assess this feasibility retention goal, a pilot study sample size of 139 participants would be required to achieve a 95% CI of (0.85, 0.95) for a one-sample proportion.
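Solving the normal-approximation CI half-width for n gives the required sample size directly. The short Python sketch below (the function name is ours) reproduces the retention example:

```python
from math import ceil, sqrt

def n_for_proportion_ci(p, margin, z=1.96):
    """Smallest n such that the normal-approximation CI half-width
    z*sqrt(p*(1-p)/n) does not exceed the target margin of error."""
    return ceil(z**2 * p * (1 - p) / margin**2)

# retention example: anticipated rate 0.90, margin of error 0.05
n = n_for_proportion_ci(0.90, 0.05)                 # 139 participants
half = 1.96 * sqrt(0.90 * 0.10 / n)                 # achieved half-width
ci = (round(0.90 - half, 3), round(0.90 + half, 3)) # (0.85, 0.95)
```

With n = 139 the achieved half-width is just under the 5% target, yielding the (0.85, 0.95) interval cited in the text.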

Another approach to justify the sample size is to use and reference one of the several flat rules of thumb proposed for pilot study sample sizes.8,9,10,11,12 A flat rule of thumb is a single number suggested for the pilot study regardless of scenario. For a binary outcome, such as an event rate, 60 to 100 subjects per group should be sufficient for the pilot study.11 For a continuous outcome, sample sizes ranging from 12 per group to 35 per group have been proposed.8,9,10,11,12 The larger the sample size for the pilot, the more precise the estimates will be for planning the main trial. It should be noted, however, that none of these flat rules of thumb take into consideration the size of the future main trial.
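The precision gained from a larger pilot can be made concrete. The sketch below is illustrative only: the 0.20 event rate is our assumption, and the sample sizes echo the rules of thumb above, showing how the 95% CI half-width for a proportion narrows as n grows.

```python
from math import sqrt

def ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * sqrt(p * (1 - p) / n)

# assumed illustrative event rate of 0.20 per group
for n in (12, 35, 60, 100):
    print(n, round(ci_half_width(0.20, n), 3))
```

At n = 100 the half-width is under 8 percentage points; at n = 12 it is more than twice that, which is why larger pilots yield more reliable planning estimates.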

If the main trial’s target effect size is known a priori, stepped rules of thumb have been proposed that provide much more granularity for pilot study sample size justification.13,14 These stepped rules of thumb are based on the standardized effect size, also known as Cohen’s d, for the main trial. For a trial with two independent arms, the standardized effect size is the difference in means between the two arms divided by the pooled standard deviation. Table 1 reproduces, with some slight edits, the stepped rules of thumb summarized by Bell et al.14 As an example using Table 1, suppose a pilot study is being planned to support moving forward to a main trial. The main trial will be a two-arm, randomized, controlled trial with an active drug arm and a placebo arm. Based on expert opinion, the minimal clinically important difference for the primary outcome is 10 units, and from a literature review the standard deviation can reasonably be estimated as 18 units, yielding a standardized effect size for the main trial of 10/18 = 0.56. A value of 0.56 is considered a medium standardized effect size. Using Table 1, for a 90% powered main trial, the pilot study sample size should be 15 per arm, and the subsequent main trial will require somewhere between 44 and 235 participants per arm.

Table 1.

Pilot study sample size per arm for a continuous outcome using stepped rules of thumb as a function of the main trial’s target standardized effect size.13,14

                                 80% power for main trial          90% power for main trial
Standardized Effect Size (d)*    Pilot n/arm    Main trial n/arm   Pilot n/arm    Main trial n/arm
Extra small (d < 0.1)            50             >1571              75             >2103
Small (0.1 ≤ d < 0.3)            20             176–1571           25             235–2103
Medium (0.3 ≤ d < 0.7)           10             34–176             15             44–235
Large (d ≥ 0.7)                  10             ≤34                10             ≤44

* Standardized effect size: d = (x̄1 − x̄2)/SDpooled, where x̄1 = mean of arm 1, x̄2 = mean of arm 2, and the pooled standard deviation SDpooled = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)], where n1 = number in arm 1, n2 = number in arm 2, s1² = variance in arm 1, and s2² = variance in arm 2.
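Under the definitions in the Table 1 footnote, the worked example can be sketched in Python. The function names and the lookup encoding are ours, transcribing the stepped rules of thumb from Table 1.

```python
from math import sqrt

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation, as defined in the Table 1 footnote."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def standardized_effect_size(mean1, mean2, sd_pooled):
    """Cohen's d: difference in arm means divided by the pooled SD."""
    return (mean1 - mean2) / sd_pooled

def pilot_n_per_arm(d, power=0.90):
    """Stepped rule-of-thumb pilot sample size per arm (from Table 1)."""
    # each step is (upper cutoff for |d|, pilot n per arm)
    steps = {0.80: [(0.1, 50), (0.3, 20), (0.7, 10)],
             0.90: [(0.1, 75), (0.3, 25), (0.7, 15)]}
    for cutoff, n in steps[power]:
        if abs(d) < cutoff:
            return n
    return 10  # large effect size (d >= 0.7)

# worked example from the text: MCID = 10 units, SD = 18 units
d = 10 / 18                  # ~0.56, a medium standardized effect size
n_pilot = pilot_n_per_arm(d) # 15 per arm for a 90% powered main trial
```

Note that the same d of 0.56 maps to only 10 per arm when the main trial is powered at 80%, illustrating how the stepped rules scale with the planned trial.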

Summary

A pilot study is an important tool to assess the processes to be used in the main trial, to identify problematic issues and determine whether they can be corrected, and to determine the feasibility of conducting the main trial. Clear goals of the pilot study should be stated a priori, with details as to what will constitute a successful pilot (e.g., feasibility success criteria). Analysis of pilot studies should be based on descriptive statistics and the precision of estimates (CIs) and ideally should not involve any inferential statistics (i.e., p-values). As the pilot study is not conducted for the purpose of inference, a power calculation is not needed. However, the sample size needs to be justified based on the goal of the pilot study.

Funding:

The project described was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1 TR002014. The content is solely the responsibility of the author and does not necessarily represent the official views of the NIH.

Footnotes


Disclosure: A.R.K. declares stock ownership in Merck.

References

1. Moore CG, Carter RE, Nietert PJ, Stewart PW. Recommendations for planning pilot studies in clinical and translational research. Clin Transl Sci. 2011;4(5):332–7.
2. Lancaster GA, Thabane L. Guidelines for reporting non-randomised pilot and feasibility studies. Pilot Feasibility Stud. 2019;5:114.
3. Whitehead AL, Sully BG, Campbell MJ. Pilot and feasibility studies: is there a difference from each other and from a randomised controlled trial? Contemp Clin Trials. 2014;38(1):130–3.
4. Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol. 2010;10:67.
5. Lancaster GA, Dodd S, Williamson PR. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract. 2004;10(2):307–12.
6. Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, Robson R, Thabane M, Giangregorio L, Goldsmith CH. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol. 2010;10:1. Erratum in: BMC Med Res Methodol. 2023;23(1):59.
7. Kraemer HC, Mintz J, Noda A, Tinklenberg J, Yesavage JA. Caution regarding the use of pilot studies to guide power calculations for study proposals. Arch Gen Psychiatry. 2006;63(5):484–9.
8. Browne RH. On the use of a pilot sample for sample size determination. Stat Med. 1995;14(17):1933–40.
9. Cocks K, Torgerson DJ. Sample size calculations for pilot randomized trials: a confidence interval approach. J Clin Epidemiol. 2013;66(2):197–201.
10. Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol. 2012;65(3):301–8.
11. Teare MD, Dimairo M, Shephard N, Hayman A, Whitehead A, Walters SJ. Sample size requirements to estimate key design parameters from external pilot randomised controlled trials: a simulation study. Trials. 2014;15:264.
12. Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharmaceut Statist. 2005;4:287–291.
13. Whitehead AL, Julious SA, Cooper CL, Campbell MJ. Estimating the sample size for a pilot randomised trial to minimise the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Stat Methods Med Res. 2016;25(3):1057–73.
14. Bell ML, Whitehead AL, Julious SA. Guidance for using pilot studies to inform the design of intervention trials with continuous outcomes. Clin Epidemiol. 2018;10:153–157.
