BMJ. 2006 May 13;332(7550):1127–1129. doi: 10.1136/bmj.38793.637789.2F

Sample sizes of studies on diagnostic accuracy: literature survey

Lucas M Bachmann, Milo A Puhan, Gerben ter Riet, Patrick M Bossuyt
PMCID: PMC1459608  PMID: 16627488

Abstract

Objectives To determine sample sizes in studies on diagnostic accuracy and the proportion of studies that report calculations of sample size.

Design Literature survey.

Data sources All issues of eight leading journals published in 2002.

Methods Sample sizes, number of subgroup analyses, and how often studies reported calculations of sample size were extracted.

Results 43 of 8999 articles were non-screening studies on diagnostic accuracy. The median sample size was 118 (interquartile range 71-350) and the median prevalence of the target condition was 43% (27-61%). The median number of patients with the target condition—needed to calculate a test's sensitivity—was 49 (28-91). The median number of patients without the target condition—needed to determine a test's specificity—was 76 (27-209). Two of the 43 studies (5%) reported a priori calculations of sample size. Twenty articles (47%) reported results for patient subgroups. The number of subgroups ranged from two to 19 (median four). No studies reported that sample size was calculated on the basis of preplanned analyses of subgroups.

Conclusion Few studies on diagnostic accuracy report considerations of sample size. The number of participants in most studies on diagnostic accuracy is probably too small to analyse variability of measures of accuracy across patient subgroups.

Introduction

Estimates of sensitivity and specificity in small studies on diagnostic accuracy are usually imprecise, with wide confidence intervals. This makes it difficult to assess just how informative a test may be. Subgroup analysis is often needed because sensitivity and specificity may vary across patient subgroups, yet estimates are even less precise when subgroups are considered.1 Investigators should calculate the sample size needed for sufficiently narrow confidence intervals at the planning stages of a study, as is common practice for randomised trials.2,3 For example, if a diagnostic test requires a sensitivity of at least 90% for adequate decision making, the lower boundary of the 95% confidence interval should be at least 90%.
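
The planning calculation described above can be sketched in a few lines. This is a minimal illustration, not a method taken from this survey: it assumes a Wilson score interval, a hypothetical expected sensitivity of 95%, and the 90% threshold from the example in the text. The function names `wilson_lower` and `required_diseased` are illustrative. The code searches for the smallest number of patients with the target condition for which the lower 95% confidence limit reaches the threshold, then scales by an assumed prevalence to obtain a total study size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def wilson_lower(p_hat, n, conf=0.95):
    """Lower limit of the Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    centre = p_hat + z * z / (2 * n)
    spread = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - spread) / (1 + z * z / n)


def required_diseased(expected, threshold, conf=0.95, n_max=100_000):
    """Smallest n (patients with the condition) whose lower limit reaches threshold."""
    if expected <= threshold:
        raise ValueError("expected sensitivity must exceed the threshold")
    for n in range(1, n_max + 1):
        if wilson_lower(expected, n, conf) >= threshold:
            return n
    raise ValueError("no n up to n_max satisfies the requirement")


# Assumed planning values (hypothetical): expected sensitivity 95%,
# required lower limit 90%, prevalence 43% (the median in this survey).
n_diseased = required_diseased(expected=0.95, threshold=0.90)
n_total = ceil(n_diseased / 0.43)
print(n_diseased, n_total)
```

Under these assumptions the required number of patients with the target condition comes out above 100. The expected sensitivity has to exceed the threshold for any finite sample to suffice, and the closer the two values are, the larger the study must be.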

We hypothesised that studies of diagnostic accuracy rarely report considerations of sample size and tend to be small. We assumed that authors would state calculations of sample size if they had been performed. We investigated study sizes, the number of subgroup analyses, and how often studies on diagnostic accuracy reported calculations of sample sizes.

Methods

Two reviewers independently screened all issues of the BMJ, Lancet, New England Journal of Medicine, and JAMA as well as four specialist journals (Thorax, Gastroenterology, American Journal of Obstetrics and Gynecology, and European Journal of Pediatrics) published in 2002 for studies on the accuracy of tests. From each full report we extracted data on the type of test(s) studied (table), study sizes, the number of subgroup analyses, and how often the studies reported calculations of sample size. We calculated 95% confidence intervals, medians, and interquartile ranges.

Table 1.

Key features of 57 studies on accuracy of diagnostic tests published in eight major medical journals in 2002

| First author | Type of test | Prevalence (%) | Sample size | Screening | Subgroup analysis | Multivariable analysis | Stratified reporting | Number of subgroups |
|---|---|---|---|---|---|---|---|---|
| Schneider | Imaging | 2 | 8640 | Yes | No | No | No | - |
| Pilcher | Laboratory tests | 0.5 | 8194 | Yes | No | No | No | - |
| Bahado-singh (1) | Laboratory tests | 2 | 5641 | Yes | Yes | Yes | No | 2 |
| Kulasingam | Laboratory tests | 3 | 4075 | Yes | Yes | No | Yes | 2 |
| Lu | Physical examination | 1 | 3710 | Yes | No | No | No | - |
| Vintzileos | Imaging | 2 | 3291 | Yes | No | No | No | - |
| Vasan | Laboratory tests | 6 | 3177 | Yes | Yes | Yes | Yes | 4 |
| Bahado-singh (2) | Imaging | 3 | 3003 | No | Yes | Yes | No | 5 |
| Selvachandran | History | 4 | 2268 | No | Yes | Yes | Yes | 2 |
| Maisel | Laboratory tests | 47 | 1586 | No | No | No | No | - |
| Lenders | Laboratory tests | 25 | 858 | No | Yes | No | Yes | 2 |
| Tibble | Laboratory tests | 44 | 602 | No | Yes | No | Yes | 2 |
| Bahado-singh (3) | Laboratory tests | 3 | 568 | Yes | Yes | Yes | No | - |
| Azuma | Laboratory tests | 6 | 561 | Yes | Yes | No | Yes | 3 |
| Ikeda | Physical examination | 59 | 529 | No | No | No | No | - |
| Laing | History | 21 | 458 | Yes | No | No | No | - |
| Schutter | Laboratory tests | 55 | 412 | No | No | No | No | - |
| Wang | Laboratory tests | 50 | 394 | No | Yes | No | Yes | 2 |
| Muensterer | Imaging | 6 | 386 | No | No | No | No | - |
| Chavarria | Laboratory tests | 7 | 378 | No | No | No | No | - |
| Rettenbacher | Imaging | 17 | 350 | No | Yes | No | Yes | 3 |
| Rubin | Laboratory tests | 39 | 342 | No | No | No | No | - |
| Luck | History | 4 | 341 | Yes | No | No | No | - |
| Ghezzi | Laboratory tests | 3 | 306 | No | No | No | No | - |
| Riordan | History | 73 | 278 | No | Yes | Yes | No | 19 |
| Kim | Laboratory tests | 30 | 251 | No | Yes | No | Yes | 2 |
| Vayssiere | History | 5 | 242 | Yes | Yes | Yes | No | 2 |
| Virkki | Laboratory tests | 85 | 215 | No | Yes | Yes | No | 7 |
| Remes | History | 16 | 212 | No | No | No | No | - |
| Hughes | Laboratory tests | 4 | 208 | No | Yes | No | Yes | 2 |
| Bouin | Other | 43 | 199 | No | No | No | No | - |
| Ribeiro | Laboratory tests | 85 | 177 | No | Yes | No | Yes | 2 |
| Riskin-Mashiah | Imaging | 6 | 166 | Yes | No | No | No | - |
| Selan | Laboratory tests | 27 | 139 | No | No | No | No | - |
| Oudkerk | Imaging | 30 | 118 | No | No | No | No | - |
| Mihm | Laboratory tests | 58 | 113 | No | No | No | No | - |
| McManus | Other | 64 | 110 | Yes | No | No | No | - |
| McMahon | Physical examination | 12 | 109 | No | No | No | No | - |
| Stiller | Other | 6.5 | 107 | No | No | No | No | - |
| Dueholm | Imaging | 69 | 106 | No | No | No | No | - |
| Andrews | Imaging | 53 | 100 | No | No | No | No | - |
| Jossens | Laboratory tests | 32 | 97 | No | Yes | No | Yes | 4 |
| DeRoche | Laboratory tests | 84 | 90 | No | No | No | No | - |
| Narang | Laboratory tests | 39 | 80 | No | No | No | No | - |
| Harewood | Imaging | 61 | 80 | No | No | No | No | - |
| Larsen | Other | 75 | 79 | No | Yes | No | Yes | 2 |
| Warke | Laboratory tests | 41 | 71 | No | No | No | No | - |
| Hara | Imaging | 66 | 60 | No | No | No | No | - |
| Gerber | Laboratory tests | 34 | 53 | No | No | No | No | - |
| Chmait | Imaging | 85 | 53 | No | No | No | No | - |
| Georgakoudi | Laboratory tests | 64 | 44 | No | No | No | No | - |
| Ragette | Other | 79 | 42 | No | Yes | No | Yes | 3 |
| Parker | Imaging | * | 33 | No | No | No | No | - |
| Odunsi | Laboratory tests | 39 | 33 | No | No | No | No | - |
| Cosmi | Imaging | 53 | 32 | No | No | No | No | - |
| Broth | Other | 41 | 29 | No | No | No | No | - |
| Satoh | Laboratory tests | 61 | 23 | No | No | No | No | - |

This table provides information for both screening (excluded) and non-screening studies.

* Could not be determined.

Results

Fifty seven of 8999 articles reported test accuracy. Fourteen studies focused on a screening test and were excluded, which left 43 clinical studies for analysis. The median sample size was 118 (interquartile range 71-350) and the median prevalence was 43% (27-61%). The median number of patients with the target condition—needed to calculate a test's sensitivity—was 49 (28-91). The median number of patients without the target condition—needed to determine a test's specificity—was 76 (27-209).
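
The summary statistics for sample size can be recomputed from the table. The sketch below lists the sample sizes of the 43 non-screening studies as given in the table and uses Python's `statistics.quantiles` with its default "exclusive" method. The quantile convention is an assumption, since the survey does not state how the interquartile range was computed; other conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Sample sizes of the 43 non-screening studies, in table order
# (rows marked "Screening: Yes" are left out).
sample_sizes = [
    3003, 2268, 1586, 858, 602, 529, 412, 394, 386, 378,
    350, 342, 306, 278, 251, 215, 212, 208, 199, 177,
    139, 118, 113, 109, 107, 106, 100, 97, 90, 80,
    80, 79, 71, 60, 53, 53, 44, 42, 33, 33,
    32, 29, 23,
]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = quantiles(sample_sizes, n=4)
print(len(sample_sizes), median, (q1, q3))
```

With this convention the quartiles fall exactly on observed values, matching the reported median of 118 and interquartile range of 71-350.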

Two of 43 studies (5%; 95% confidence interval 1.3% to 15.5%) reported a priori calculations of sample size, but no study reported that the sample size had been calculated on the basis of preplanned analyses of subgroups. Twenty articles (47%) reported results for subgroups of patients. The number of subgroups ranged from two to 19 (median four). Four studies used multivariable regression, but none used interaction terms.
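
The confidence interval for the proportion of studies reporting a sample size calculation can be checked as follows. The survey does not name the interval method; the sketch assumes a Wilson score interval, which gives limits close to those reported. The function name `wilson_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Two of the 43 non-screening studies reported an a priori calculation.
lower, upper = wilson_interval(2, 43)
print(f"{100 * lower:.1f}% to {100 * upper:.1f}%")
```

A simple Wald interval would be a poor choice here, because with only two events its lower limit falls below zero.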

Discussion

In this survey of studies on diagnostic accuracy in eight major journals, only 4.7% of the studies reported that they considered sample size. Analysing small numbers of participants with and without the target condition usually yields imprecise estimates of overall diagnostic accuracy, and even less precise estimates of subgroups. For example, when the number of patients with the target condition is 49 the two sided 95% confidence interval of a sensitivity of 81% (40 true positives) is 68% to 91%.4,5
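
The example interval can be approximated with an exact (Clopper-Pearson) binomial interval. This is a sketch under the assumption that an exact method was intended; the text cites Pepe for the calculation but does not name the method. The limits are found by bisection on the binomial distribution using only the standard library, and `exact_interval` is an illustrative name. The subgroup of 12 patients in the last lines is hypothetical and is included only to show how precision deteriorates in small subgroups.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, tol=1e-9):
    """Find p in (0, 1) with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(successes, n, conf=0.95):
    """Clopper-Pearson interval for a binomial proportion."""
    alpha = 1 - conf
    # Lower limit: P(X >= successes | p) = alpha / 2
    lower = 0.0 if successes == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2)
    # Upper limit: P(X <= successes | p) = alpha / 2
    upper = 1.0 if successes == n else _solve_increasing(
        lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2)
    return lower, upper


# 40 true positives among 49 patients with the target condition.
lower, upper = exact_interval(40, 49)
print(f"{100 * lower:.0f}% to {100 * upper:.0f}%")

# Hypothetical subgroup: 10 true positives among 12 patients.
sub_lower, sub_upper = exact_interval(10, 12)
print(f"{100 * sub_lower:.0f}% to {100 * sub_upper:.0f}%")
```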

To ensure reasonably precise estimates of sensitivity and specificity investigators should consider sample sizes during the planning stages of the study. Investigators should calculate how precise the estimates of test accuracy should be for a particular diagnostic situation and report these calculations with confidence intervals. Arguably, sample size calculations are not important once data collection has been completed.2 All that matters is the width of the confidence intervals. However, besides determining the minimum study size needed, calculations of sample size have another useful feature that remains important after the study has finished. These calculations require authors to think about the minimum precision needed for a test to be clinically meaningful. It is easier for readers to interpret reported confidence intervals if they have access to these data.

In conclusion, few studies on diagnostic accuracy report calculations of sample size. The number of participants in most studies on diagnostic accuracy is probably too small to analyse the variability of measures of accuracy across subgroups of patients.

What is already known on this topic

To assess the minimum size needed for sufficiently narrow confidence intervals of sensitivity and specificity in study groups as a whole and in clinically relevant subgroups in particular, sample sizes should be considered at the planning stage of studies on test accuracy

What this study adds

Few studies on test accuracy report calculations of sample size

Overall size and subgroup size tend to be small in these studies, which leads to imprecise estimates of sensitivity and specificity

This article was posted on bmj.com on 20 April 2006: http://bmj.com/cgi/doi/10.1136/bmj.38793.637789.2F

Contributors: All members of the SUBIRAR (subjectivity rationality and reasoning) research collaboration (Klaus Eichler, Madlaina Scharplatz, and Johann Steurer, Horten Centre, University of Zurich, Switzerland; Ulrich Hoffrage, Max Planck Institute for Human Development and Cognition, Berlin, Germany; Alfons G Kessels and Hans Severens, Maastricht University, Netherlands; Khalid S Khan, University of Birmingham, UK; Jos Kleijnen, Centre for Reviews and Dissemination, University of York, UK) were involved in the design and critical review of the study. LMB, MAP, and GtR developed the protocol. LMB and MAP acquired the data. All authors interpreted the data and helped prepare the manuscript. LMB was guarantor.

Funding: LMB was supported by the Swiss National Science Foundation (grants 3233B0-103182 and 3200B0-103183).

Competing interests: None declared.

References

1. Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ 2002;324:669-71.
2. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005;365:1348-53.
3. Lijmer JG, Bossuyt PM, Heisterkamp SH. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002;21:1525-37.
4. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford Statistical Science Series, Oxford University Press, 2003. www.fhcrc.org/science/labs/pepe/book/ (accessed 6 Apr 2006).
5. Pepe MS. Study design and hypothesis testing. In: The statistical evaluation of medical tests for classification and prediction. New York: Oxford University Press, 2003:214-51.
