INTRODUCTION
In the previous article, we explained how to calculate the sample size for a cross-sectional study based on a rate or a single proportion (1). In this article, we explain how to calculate the sample size for an independent cohort study based on a comparison of two proportions, representing the event rates in the exposed and non-exposed groups.
WHEN TO USE THE SAMPLE SIZE CALCULATION FOR TWO PROPORTIONS
The methods explained below should be used when the primary outcome of your study is expressed as a risk ratio (RR) or as two proportions. Risk ratios and pairs of proportions mostly come from cohort studies. However, other research designs can follow the same approach when the primary outcome is a comparison of two proportions.
For example, consider a prospective cohort study assessing the risk of dementia among patients with cerebral microbleeds (exposed group) compared with patients without cerebral microbleeds (non-exposed group). In this study, the incidence of dementia in the two groups is compared as a relative risk (RR).
Another example is a randomized controlled trial comparing the sustained virologic response (SVR) rates of daclatasvir and ledipasvir in patients with hepatitis C virus infection. In this study, the SVR rates are also compared as a relative risk (RR).
Requirements for sample size calculation based on two proportions
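(1) Expected RR between the exposed and non-exposed groups*
(2) Probability of the event in the exposed group*
(3) Probability of the event in the non-exposed group
(4) Statistical power: usually 0.8, 0.85, or 0.9
(5) Alpha (significance level): usually 0.05
(6) Ratio of unexposed to exposed participants (1 for equal groups)
* Only one of these two is needed: either the RR or the probability of the event in the exposed group. Because RR = P(event | exposed) / P(event | non-exposed), the non-exposed probability plus either value determines the other.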
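These requirements are linked in two ways. The exposed-group probability follows from the RR and the non-exposed probability. Alpha and power become standard normal quantiles. The minimal Python sketch below shows both links. The function names `exposed_probability` and `z_values` are hypothetical helpers written for illustration; they are not part of Epi Info or any other package.

```python
from statistics import NormalDist


def exposed_probability(rr: float, p0: float) -> float:
    """Event probability in the exposed group, p1 = RR * p0 (hypothetical helper)."""
    p1 = rr * p0
    if not 0 < p1 < 1:
        raise ValueError("RR * p0 must lie strictly between 0 and 1")
    return p1


def z_values(alpha: float = 0.05, power: float = 0.9) -> tuple[float, float]:
    """Two-sided critical value z_(1-alpha/2) and the quantile z_(power) (hypothetical helper)."""
    nd = NormalDist()  # standard normal distribution
    return nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)


if __name__ == "__main__":
    print(exposed_probability(2.72, 0.075))  # exposed-group probability for RR = 2.72, p0 = 7.5%
    print(z_values(0.05, 0.9))               # quantiles for alpha = 0.05 and 90% power
```

With alpha = 0.05 and 90% power, the two quantiles are about 1.96 and 1.28. These are the values that enter the sample size formulas.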
EXAMPLE: CASE STUDY OF EARLY MORTALITY IN CKD PATIENTS WITH HIGH GFR
Assume that we will conduct a cohort study to investigate the impact of a high glomerular filtration rate (GFR) on early mortality in patients with chronic kidney disease (CKD) who started hemodialysis. We will follow two groups of CKD patients:
- the exposed group: patients with GFR > 10 ml/min/1.73 m²;
- the non-exposed group: patients with GFR ≤ 10 ml/min/1.73 m².
Gómez de la Torre-del Carpio et al. reported an RR of early mortality of 2.72 for patients with high vs. low GFR (2). In that study, the mortality rate in the non-exposed (low GFR) group was 7.5%. The following steps show how to calculate the sample size needed to detect an RR of 2.72 with 90% statistical power and a 5% significance level (alpha = 0.05), assuming two equal groups.
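Before turning to the software, the target can be checked by hand. The derivation below is a sketch that assumes the Fleiss formula with continuity correction, one of the methods Epi Info StatCalc reports. The symbols are:
- p_1 and p_0: event probabilities in the exposed and non-exposed groups, with q = 1 - p;
- r: ratio of non-exposed to exposed participants (r = 1 here);
- z_{1-α/2} = 1.960 and z_{1-β} = 1.2816: standard normal quantiles for alpha = 0.05 and 90% power.

```latex
\begin{aligned}
p_1 &= \mathrm{RR}\times p_0 = 2.72 \times 0.075 = 0.204, \qquad
\bar p = \frac{p_1 + r\,p_0}{r+1} = 0.1395, \qquad \bar q = 1-\bar p = 0.8605 \\
n_1 &= \frac{\left[z_{1-\alpha/2}\sqrt{(r+1)\,\bar p\,\bar q} + z_{1-\beta}\sqrt{r\,p_1 q_1 + p_0 q_0}\right]^2}{r\,(p_1-p_0)^2}
     = \frac{\left[1.960\sqrt{0.24008} + 1.2816\sqrt{0.23176}\right]^2}{(0.129)^2} \approx 149.5 \\
n_1' &= \frac{n_1}{4}\left[1+\sqrt{1+\frac{2(r+1)}{n_1\,r\,\lvert p_1-p_0\rvert}}\right]^2 \approx 164.6
      \;\Rightarrow\; 165 \text{ per group}, \quad n_0' = r\,n_1' \;\Rightarrow\; 330 \text{ in total}
\end{aligned}
```

The continuity correction raises the uncorrected estimate of about 150 per group to 165 per group.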
Calculations will be done in Epi Info, a program developed by the Centers for Disease Control and Prevention and available at https://www.cdc.gov/epiinfo/index.html.
1. Open the software and select “StatCalc” for the sample size calculation options (Figure 1).
2. Select “Cohort or cross-sectional” (Figure 2).
3. Enter the data and check the results table.
As shown in Figure 3, the results table reports the required sample size at a 95% confidence level (two-sided alpha = 0.05), estimated by several methods. Using the Fleiss method with continuity correction, the study requires 165 patients per group, that is, 330 patients in total.
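The same calculation can be reproduced outside Epi Info. The Python sketch below assumes the Fleiss formula, with an option to apply the Fleiss continuity correction. `fleiss_sample_size` is a hypothetical helper written for illustration, not an Epi Info function. Its results may differ slightly from the software's output because of rounding conventions.

```python
import math
from statistics import NormalDist


def fleiss_sample_size(p1: float, p0: float, alpha: float = 0.05,
                       power: float = 0.9, r: float = 1.0,
                       continuity: bool = True) -> tuple[int, int]:
    """Per-group sample sizes (exposed, non-exposed) for comparing two proportions.

    p1: event probability in the exposed group
    p0: event probability in the non-exposed group
    r:  ratio of non-exposed to exposed participants
    Hypothetical helper: Fleiss formula, optionally with continuity correction.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)   # two-sided critical value for alpha
    z_b = nd.inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + r * p0) / (r + 1)   # pooled proportion under the null hypothesis
    q_bar = 1 - p_bar
    diff = abs(p1 - p0)
    # Uncorrected Fleiss estimate for the exposed group
    n1 = (z_a * math.sqrt((r + 1) * p_bar * q_bar)
          + z_b * math.sqrt(r * p1 * (1 - p1) + p0 * (1 - p0))) ** 2 / (r * diff ** 2)
    if continuity:
        # Fleiss continuity correction
        n1 = n1 / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (n1 * r * diff))) ** 2
    # Round up so that the target power is not undershot
    return math.ceil(n1), math.ceil(r * n1)


if __name__ == "__main__":
    p0 = 0.075        # mortality in the low-GFR (non-exposed) group
    p1 = 2.72 * p0    # mortality implied by RR = 2.72
    exposed, non_exposed = fleiss_sample_size(p1, p0, alpha=0.05, power=0.9, r=1)
    print(exposed, non_exposed, exposed + non_exposed)
```

Running the script prints `165 165 330`, matching the 330 patients above. Passing `continuity=False` gives the uncorrected estimate of 150 per group. Changing `r` shows how unequal group sizes change the total.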
References
1. Fahim N, Negida A. Sample Size Calculation Guide - Part 1: How to Calculate the Sample Size Based on the Prevalence Rate. Adv J Emerg Med. 2018;2(4):e50. doi: 10.22114/AJEM.v0i0.101.
2. Gómez de la Torre-del Carpio A, Bocanegra-Jesús A, Guinetti-Ortiz K, Mayta-Tristán P, Valdivia-Vega R. Mortalidad precoz en pacientes con enfermedad renal crónica que inician hemodiálisis por urgencia en una población peruana: Incidencia y factores de riesgo [Early mortality in patients with chronic kidney disease starting emergency hemodialysis in a Peruvian population: incidence and risk factors]. Nefrología. 2018;38(4):425–32. doi: 10.1016/j.nefro.2017.11.017.