Public Health. 2022 Aug 23;212:7–9. doi: 10.1016/j.puhe.2022.08.008

Simple approximation of sample size for precise estimates of SARS-CoV-2 infection from point-seroprevalence studies

AM Nikiforuk a,b, I Sekirov a,c, AN Jassem a,c
PMCID: PMC9395286  PMID: 36174438

Abstract

Objective

This study aimed to model the precision of SARS-CoV-2 seroprevalence estimates.

Methods

Sample size and precision estimates were calculated using the normal approximation to the binomial distribution. The relationship between sample size and precision was visualized across a range of assumed SARS-CoV-2 seroprevalence from 2% to 75%.

Results

The calculations showed that 2% precision is attainable with moderately sized samples when the expected seroprevalence of SARS-CoV-2 infection exceeds 2%. In populations with a low incidence of SARS-CoV-2 infection and an expected seroprevalence of less than 2%, larger samples are required for precise estimates.

Conclusions

Taking a sample of 177–1000 participants can provide precise prevalence estimates of SARS-CoV-2 infection in vaccinated and unvaccinated populations. Larger sample sizes are only necessary in low prevalence settings.

Keywords: Prevalence, SARS-CoV-2, Precision, Study design, Serological survey, Epidemiology


Surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections has evolved throughout the COVID-19 pandemic but remains a critical public health tool to estimate risk and inform containment, mitigation, and treatment strategies.1 In the first year of the pandemic, many countries used nucleic acid amplification tests for routine surveillance of SARS-CoV-2 infection counts in their populations.2 Over time, the use of molecular tests was scaled down in many jurisdictions due to a combination of factors, including cost, patient hesitancy, and availability of take-home rapid antigen tests. In the second year of the pandemic, surveillance programs transitioned to using serological testing and novel methods such as environmental sampling of wastewater to estimate SARS-CoV-2 infection rates.3 Going forward, the use of serological testing for surveillance is likely to increase, as it can be used to estimate SARS-CoV-2 point prevalence and to measure antibody responses after COVID-19 vaccination and fluctuations in anti-SARS-CoV-2 antibody levels over time.4

In response to the increased need to design and implement SARS-CoV-2 serosurveillance studies, we provide a simple approximation of the sample size required for precise estimates.

Accurately estimating the point prevalence of a disease in a population requires a precision-based sample size calculation. The precision of an estimate is half the width of the desired confidence interval (CI).5 For example, if the CI is five units wide, the precision is two and a half units. Calculating the desired precision at the design stage of the study has a practical benefit: it requires the investigators to think about the width of the CI and whether it is too wide relative to the expected prevalence. For example, if a disease has an expected prevalence of 3%, a study whose CI is 10 percentage points wide (a precision of 5%) lacks sufficient precision to observe the true prevalence in the population of interest: prevalence = 3%, 95% CI: -2% to 8%, so the CI includes zero. Therefore, precision should be chosen with reference to the expected prevalence, and studies should not be conducted before making a precision-based sample size calculation.
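The arithmetic in the 3% example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `interval_from_width` is hypothetical, and the interval is assumed to be symmetric around the point estimate.

```python
def interval_from_width(prevalence, ci_width):
    """Return (lower, upper, precision) for a symmetric CI of the given total width."""
    precision = ci_width / 2  # precision is half the width of the CI
    return prevalence - precision, prevalence + precision, precision


# Values taken from the example in the text: prevalence 3%, CI width 10%.
lower, upper, precision = interval_from_width(prevalence=0.03, ci_width=0.10)
print(f"precision = {precision:.1%}; CI = {lower:.1%} to {upper:.1%}")
# prints: precision = 5.0%; CI = -2.0% to 8.0%

# A lower bound at or below zero signals that the design is too imprecise
# for the expected prevalence.
print("CI reaches zero:", lower <= 0)
```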

The simplest (frequentist) methods for a precision-based sample size calculation assume that the test used to classify disease or evidence of infection has perfect sensitivity and specificity. This assumption is not impractical: many studies use a test with high diagnostic accuracy, reporting 95% CIs accommodates error, and corrections for instrument bias can be made after estimation by the Rogan-Gladen method.6 Biological confounders of test results should also be taken into consideration. For example, serological test results should always be interpreted in reference to the time of infection and disease severity, as antibodies wane over time and those with mild symptoms may have a less robust serological response. We used a sample size formula derived from the normal approximation to the binomial distribution to estimate the required sample size for precise SARS-CoV-2 serological surveys.7 The formula is as follows:

$$n = \frac{z^{2} N p (1 - p)}{(N - 1)\left(h^{2} p^{2}\right) + z^{2} p (1 - p)}$$

where n is the sample size, N is the population size, p is the prevalence, h is the precision, and z is the Z-score. A population size of one million and a Z-score of 1.96 were specified.
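The perfect-test assumption made above can be relaxed after estimation. As a minimal sketch of the Rogan-Gladen correction mentioned earlier, the function below adjusts an apparent (test-positive) prevalence for known sensitivity and specificity. The function name, the clipping of the result to the range 0 to 1, and the example values are illustrative assumptions, not figures from this study.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an apparent prevalence for imperfect test accuracy.

    Rogan-Gladen estimator: (apparent + specificity - 1) / (sensitivity + specificity - 1).
    """
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (apparent_prevalence + specificity - 1) / youden
    # Clipping to [0, 1] is a common convention (an assumption of this sketch).
    return min(1.0, max(0.0, adjusted))


# Hypothetical assay: 95% sensitivity, 98% specificity, 10% of samples positive.
print(f"adjusted prevalence = {rogan_gladen(0.10, 0.95, 0.98):.3f}")
```

With a perfect test (sensitivity and specificity both equal to 1) the function returns the apparent prevalence unchanged, which is the case the sample size formula assumes.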

We used the above formula to calculate the necessary sample size to achieve precise estimates across a range of expected SARS-CoV-2 prevalence from 2% to 75% (Fig. 1; Table S1).
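The sample size formula can be sketched in Python as follows. The sketch reads the precision term as h²p², so that h is a fraction of the prevalence; this is an interpretation of the formula rather than something the text states outright, and passing `relative=False` gives the common variant in which h is an absolute half-width. Rounding up to a whole participant, the function name `sample_size`, and the example precision values are likewise assumptions made for illustration, so the output is not meant to reproduce Table S1.

```python
import math


def sample_size(p, h, N=1_000_000, z=1.96, relative=True):
    """Sample size from the normal approximation with a finite population of N.

    p: expected prevalence (0 < p < 1)
    h: precision; a fraction of p if relative=True, else an absolute half-width
    """
    d = h * p if relative else h  # absolute half-width of the CI
    numerator = z**2 * N * p * (1 - p)
    denominator = (N - 1) * d**2 + z**2 * p * (1 - p)
    return math.ceil(numerator / denominator)  # round up to whole participants


# Prevalence range from the text (2% to 75%); precision values are hypothetical.
for p in (0.02, 0.10, 0.30, 0.50, 0.75):
    n_rel = sample_size(p, h=0.10)                  # 10% of the prevalence
    n_abs = sample_size(p, h=0.02, relative=False)  # +/- 2 percentage points
    print(f"prevalence {p:>4.0%}: n = {n_rel} (relative), {n_abs} (absolute)")
```

The two readings diverge most at low prevalence, which is why it matters to state which kind of precision a design refers to.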

Fig. 1. Calculated estimates of precision by sample size assuming varying prevalence. (A) X-axis is scaled from 0 to 5000; (B) X-axis is scaled from 0 to 500.
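A table of the kind plotted in such a precision curve can be generated with the normal approximation to the binomial distribution. The sketch below computes the absolute CI half-width for a grid of sample sizes and prevalence values; the grid, the function name `half_width`, and the omission of any finite population correction are assumptions made for illustration, so the numbers need not match Fig. 1 or Table S1.

```python
import math


def half_width(p, n, z=1.96):
    """Absolute CI half-width for prevalence p estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


prevalences = [0.02, 0.10, 0.30, 0.50, 0.75]  # hypothetical grid
sample_sizes = [100, 250, 500, 1000, 5000]    # hypothetical grid

# Header row: one column per assumed prevalence.
print("n".rjust(6) + "".join(f"{p:>8.0%}" for p in prevalences))
for n in sample_sizes:
    row = "".join(f"{half_width(p, n):>8.2%}" for p in prevalences)
    print(f"{n:>6}{row}")
```

Each row shows how the half-width shrinks with the square root of the sample size, so gains in precision flatten out as n grows.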

The precision curve shows that sample sizes ranging from 177 to 1000 possess adequate precision even when the expected prevalence equals 2% (Fig. 1, Table S1). In the fall of 2021, during the fourth epidemiological phase of the COVID-19 pandemic, high seroprevalence estimates were observed in unvaccinated persons living in South Africa: anti-spike 68.4% (95% CI, 67.2–69.6%) and antinucleocapsid 39.7% (95% CI, 38.4–41.0%).8 In the United States, similar trends were reported, with antinucleocapsid seroprevalence increasing from 33.5% (95% CI, 33.1–34.0%) in 2021 to 57.7% (95% CI, 57.1–58.3%) in 2022.9 Assuming that early 2022 seroprevalence exceeds approximately 30%, a sample of 817 provides an estimate with <2% precision (Table S1). In health-based research, a precision of 2–5% is recommended for most applications; we recommend that the investigator select the appropriate precision on a case-by-case basis.10 Of note, if a serosurvey plans to use multiple antigenic targets, the sample size for high-precision results should be calculated using the assay target that is expected to yield the lower seroprevalence estimate (e.g. antinucleocapsid antibodies in the case of SARS-CoV-2, which are known to wane over time).11
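One cautious way to apply the multi-target advice is to compute the requirement for each target and keep the largest. The sketch below uses the South African anti-spike and antinucleocapsid estimates quoted above as stand-ins for expected prevalence, the large-population form of the normal approximation (no finite population correction), and hypothetical precision targets. When precision is expressed as a fraction of prevalence, the lower-prevalence target always drives the design; with an absolute half-width, the target closest to 50% does.

```python
import math


def required_n(p, precision, relative, z=1.96):
    """Large-population sample size for prevalence p.

    precision is a fraction of p if relative is True, else an absolute half-width.
    """
    d = precision * p if relative else precision
    return math.ceil(z**2 * p * (1 - p) / d**2)


# Expected seroprevalence per assay target (illustrative values from the text).
targets = {"anti-spike": 0.684, "antinucleocapsid": 0.397}

for relative, precision in ((True, 0.10), (False, 0.02)):
    per_target = {name: required_n(p, precision, relative) for name, p in targets.items()}
    driver = max(per_target, key=per_target.get)  # target needing the most samples
    kind = "relative" if relative else "absolute"
    print(f"{kind} precision {precision:.0%}: {per_target}; design for {driver}")
```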

Investigators should consider the precision afforded by their sample size in the design phase of a seroprevalence study of SARS-CoV-2 (or any other pathogen of interest); sample sizes greater than 1000 are only necessary in low-prevalence settings. In a high-prevalence but low-incidence environment, a smaller sample size can be used to estimate the number of people infected but may not reliably measure changes over time. We recommend taking several randomly selected stratified samples over time to reduce bias and observe longitudinal trends.

Author statements

Ethical approval

No ethical approval was sought for the study, as it does not involve the recruitment of participants, data collection, or measurement of an intervention. The analysis describes calculations, which do not rely on observational or experimental data of any kind.

Funding

A.M.N. (under the supervision of A.N.J.) was awarded a Frederick Banting and Charles Best Canada Graduate Scholarship from the Canadian Institutes of Health Research to pursue doctoral studies (#434951). I.S. and A.N.J. obtained funding from the Public Health Agency of Canada through the COVID-19 Immunity Taskforce (2021-HQ-000141).

Competing interests

The authors have no conflicts of interest to declare.

Author contributions

A.M.N., I.S., and A.N.J. conceived and designed the study. Data analysis was performed by A.M.N. A.M.N., I.S., and A.N.J. responded to the peer review. All authors interpreted the data, contributed to writing and editing the manuscript, and provided their approval for publication.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.puhe.2022.08.008.

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1
mmc1.docx (12.8KB, docx)

References

Articles from Public Health are provided here courtesy of Elsevier
