Abstract
Optimum sample size is an essential component of any research. The main purpose of a sample size calculation is to determine the number of participants needed to detect clinically relevant changes in parameters, treatment effects or associations after data gathering. It is not uncommon for studies to be underpowered and thereby fail to detect existing treatment effects because the sample size is inadequate. In this paper, we briefly explain the basic principles of sample size calculation in medical studies.
Keywords: Sample size, Medical studies
Introduction
Sample size calculation, or sample size justification, is one of the first steps in designing a clinical study. The sample size is the number of patients or other investigated units to be included in a study and required to answer the research hypothesis. The main purpose of the sample size calculation is to determine the number of units needed to detect the unknown clinical parameters, treatment effects or associations of interest after data gathering.
If the sample size is too small, the investigator may not be able to answer the study question. On the other hand, the number of patients in many studies is limited by practicalities such as cost, patient inconvenience, decisions not to proceed with an investigation or a prolonged study time. Investigators should calculate the optimum sample size before data gathering, both to avoid the mistakes that follow from too small a sample and to avoid the waste of money and time caused by too large a sample. In addition, sample size calculations are an essential part of a study protocol submitted to ethics committees or to peer-reviewed journals (1). It is very important to determine the sample size according to the study design and the objectives of the study. Errors in the sample size calculation can lead to incorrect or insignificant results (2). In this paper, we briefly explain the basic principles of sample size calculation in medical studies.
Assumptions for sample size calculation
Calculating a sample size requires several assumptions, including the variability of the outcome, the type I and type II error rates, and the smallest effect of interest.
Outcome variability
The variability of the outcome variable is the population variance of that outcome, usually estimated by the standard deviation. Investigators can use an estimate obtained from a pilot study or the variation reported in previous studies.
Type I and type II errors
A type I error is the rejection of a true null hypothesis, and a type II error is the failure to reject a false null hypothesis. In other words, the type I error rate corresponds to the level of confidence used in the sample size calculation, which reflects the probability that a sample value lies outside stated limits (2), while the type II error rate corresponds to power, the ability of a statistical test to reject a false null hypothesis. Power analysis can be used to calculate the minimum sample size needed for the investigator to detect an effect of a given size.
Effect size
The effect size is the minimal difference between the studied groups that the investigator wishes to detect, or the maximum amount by which an estimate may differ from the unknown parameter the investigator wants to estimate. In other words, the investigator states that it does not matter if the sample estimate differs from the true population value by less than a certain amount; this amount is the minimum effect size.
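As an illustration of how the level of significance, power and effect size together determine the sample size, the following Python sketch uses the statsmodels package (assumed to be available) to find the minimum sample size per group for a two-sample comparison of means; the effect size, alpha and power values are illustrative assumptions only, not recommendations.

```python
# Minimal sketch: minimum sample size per group for a two-sample t-test.
# The numbers below are illustrative assumptions, not recommendations.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # standardized (Cohen's d) difference we wish to detect
    alpha=0.05,        # type I error rate (5% significance level)
    power=0.80,        # 1 - type II error rate
    ratio=1.0,         # equal allocation between the two groups
)
print(f"Approximately {n_per_group:.0f} subjects per group are required.")
```

Lowering alpha, raising the desired power or seeking a smaller effect size all increase the required sample size, which is why these assumptions must be stated before the calculation.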
Sample size calculation in cross-sectional studies
In cross-sectional studies the aim is to estimate the prevalence of an unknown parameter in the target population using a random sample, so an adequate sample size is needed to estimate the population prevalence with good precision.
A simple formula can be used to calculate this sample size, although selecting values for the assumptions it requires involves some practical issues, and in some situations the choice of appropriate values is not straightforward (3). The following formula is commonly used for calculating an adequate sample size in a prevalence study (4): n = Z²P(1 − P)/d², where n is the sample size, Z is the statistic corresponding to the level of confidence, P is the expected prevalence (which can be obtained from similar studies or from a pilot study conducted by the researchers), and d is the precision (corresponding to the effect size).
The level of confidence usually aimed for is 95%, and most researchers present their results with a 95% confidence interval (CI), corresponding to Z = 1.96. However, researchers who want to be more confident can choose a 99% confidence interval (Z = 2.58).
The researcher needs an assumed value of P to use in the formula. This can be estimated from previous studies published in the study domain, or from a pilot study with a small sample. The assumed P is very important because the precision (d) should be selected according to its magnitude, and there is little guidance on choosing an appropriate d. Some authors recommend a precision of 5% if the prevalence is expected to be between 10% and 90%; however, when the assumed prevalence is very small (below 10%), a precision of 5% is too crude and may produce an inappropriate sample size. For example, if the assumed prevalence is 1%, a precision of 5% is obviously too coarse (3). A conservative choice in the case of a small P is to set the precision to one-fourth or one-fifth of the prevalence. Table 1 presents sample sizes for three different values of P and three different precisions. For P = 0.05, an appropriate precision is 0.01, which results in 1825 samples. For P = 0.2, a suitable precision would be 0.04, and when P increases to 0.6, the precision can increase to 0.1 (or more), yielding 92 samples. Investigators should choose the precision appropriate to the assumed P; the wrong precision yields the wrong sample size (too small or too large).
Table 1. Sample sizes for different assumed prevalences (P) and precisions (d), with a 95% level of confidence

| Precision (d) | P = 0.05 | P = 0.2 | P = 0.6 |
|---|---|---|---|
| 0.01 | 1825 | 6147 | 9220 |
| 0.04 | 114 | 384 | 576 |
| 0.10 | 18 | 61 | 92 |
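The values in Table 1 can be reproduced directly from the formula above; a minimal Python sketch, using Z = 1.96 for a 95% level of confidence:

```python
# Minimal sketch reproducing Table 1: n = Z^2 * P * (1 - P) / d^2,
# with Z = 1.96 for a 95% level of confidence.
Z = 1.96
for d in (0.01, 0.04, 0.10):        # precision
    for P in (0.05, 0.2, 0.6):      # assumed prevalence
        n = round(Z**2 * P * (1 - P) / d**2)   # rounded as in Table 1
        print(f"P = {P:.2f}, d = {d:.2f}: n = {n}")
```

In practice, rounding the result up rather than to the nearest integer is the more conservative choice.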
Sample size calculation in case-control studies
The case-control study is a type of observational epidemiological study. It is often used to identify risk factors that may be associated with a disease by comparing exposure to those risk factors in subjects who have the disease (the cases) with subjects who do not (the controls).
The sample size calculation for an unmatched case-control study (the number of cases and controls) requires the following assumptions: the assumed proportions of cases and of controls exposed to the risk factor, taken from similar studies or from a pilot study (researchers can instead use an assumed odds ratio, OR), the level of confidence (usually 95%) and the proposed power of the study (usually at least 80%). There are software packages and guide books that provide investigators with the formula, or with sample sizes tabulated for different assumptions (5). Researchers should remember, however, that in the presence of a significant confounding factor (6) a larger sample size is required. Because confounding variables must be controlled for in the analysis, a more complex statistical model is needed, and a larger sample is therefore required to achieve significance.
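As an illustrative sketch (not the exact formula of the cited references), the required number per group for an unmatched case-control study can be approximated as a comparison of two exposure proportions; the exposure proportions, confidence level and power below are assumptions chosen only for the example, and statsmodels is assumed to be available.

```python
# Minimal sketch: sample size per group for an unmatched case-control study,
# approximated by a comparison of two exposure proportions (illustrative values).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_cases = 0.40      # assumed exposure proportion among cases (illustrative)
p_controls = 0.25   # assumed exposure proportion among controls (illustrative)

effect = proportion_effectsize(p_cases, p_controls)   # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,      # 95% level of confidence
    power=0.80,      # 80% power
    ratio=1.0,       # one control per case
)
print(f"Approximately {n_per_group:.0f} cases and {n_per_group:.0f} controls are needed.")
```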
Sample size in clinical trials
In a clinical trial, if the sample size is too small, even a well-conducted study may fail to answer its research hypothesis or to detect important effects and associations (6). The minimum information needed to calculate the sample size for a randomized controlled trial includes the power, the level of significance, the underlying event rate in the population and the size of the treatment effect sought. In addition, the calculated sample size should be adjusted for other factors, including the expected compliance rate and, less commonly, an unequal allocation ratio (7).
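As a small illustration of the adjustment step, a commonly used rule inflates the calculated sample size to allow for expected loss to follow-up or non-compliance; the calculated size and dropout rate below are assumptions chosen only for the example.

```python
# Minimal sketch: inflating a calculated trial sample size for expected dropout.
# The calculated size and dropout rate below are illustrative assumptions.
n_calculated = 200        # sample size obtained from the power calculation
dropout_rate = 0.10       # expected proportion lost to follow-up or non-compliant

n_adjusted = n_calculated / (1 - dropout_rate)   # common inflation rule
print(f"Recruit approximately {round(n_adjusted)} patients to retain {n_calculated}.")
```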
There are some general recommendations on sample size for the different phases of clinical trials. Phase I trials, which assess drug safety in human volunteers, might require a total of around 20-80 patients. Phase II trials, which investigate treatment effects, seldom require more than 100-200 patients (8).
Conclusion
Optimum sample size is an essential component of any research (9). It is not uncommon for studies to be underpowered and to fail to detect treatment effects because of an inadequate sample size (10). The calculation of an adequate sample size is an important part of any clinical study, and a professional statistician is the best person to ask for help when planning a research project (6). However, researchers must provide the necessary information so that the sample size can be determined from correct assumptions (1). Many statistical books describe methods for sample size calculation in medical studies (5), and several software programs (11), including online calculators, are available to help. While these programs are user-friendly, researchers should consult an experienced statistician at the design stage of their projects to avoid methodological errors.
(Please cite as: Pourhoseingholi MA, Vahedi M, Rahimzadeh M. Sample size calculation in medical studies. Gastroenterol Hepatol Bed Bench 2013;6(1):14-17).
References
- 1.Macfarlane TV. Sample size determination for research projects. J Orthod. 2003;30:99–100. doi: 10.1093/ortho/30.2.99.
- 2.Dahiru T, Kene TS, Aliyu AA. Statistics in medical research: misuse of sampling and sample size determination. Ann Afr Med. 2006;5(3):158–61.
- 3.Naing L, Winn T, Rusli BN. Practical issues in calculating the sample size for prevalence studies. Arch Orofacial Sci. 2006;1:9–14.
- 4.Daniel WW. Biostatistics: a foundation for analysis in the health sciences. 7th ed. New York: John Wiley & Sons; 1999.
- 5.Lemeshow S, Hosmer DW Jr, Klar J, Lwanga SK. Adequacy of sample size in health studies. Chichester: John Wiley & Sons; 1990.
- 6.Pourhoseingholi MA, Baghestani AR. When calculation of minimum sample size is not justified. Hepat Mon. 2011;11:208–209.
- 7.Gebski V, Marschner I, Keech AC. Specifying objectives and outcomes for clinical trials. Med J Aust. 2002;176:491–92. doi: 10.5694/j.1326-5377.2002.tb04522.x.
- 8.Pocock SJ. Clinical trials: a practical approach. Essex: John Wiley & Sons; 1990.
- 9.Zodpey SP. Sample size and power analysis in medical research. Indian J Dermatol Venereol Leprol. 2004;70:123–28.
- 10.Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomised control trial: survey of 71 "negative" trials. N Engl J Med. 1978;299:690–94. doi: 10.1056/NEJM197809282991304.
- 11.Epi Info 6: database and statistics software for public health professionals. Geneva: CDC & WHO; 1997.