BMJ. 2001 Nov 17;323(7322):1188.

Systematic reviews of evaluations of diagnostic and screening tests

Odds ratio is not independent of prevalence

Nicole Jill-Marie Blackman 1
PMCID: PMC1121660  PMID: 11711421

Editor—Deeks, in the third of four articles on evaluations of diagnostic and screening tests, promoted the odds ratio on the grounds that it is often constant regardless of the diagnostic threshold.1 We agree with his statement that the choice of threshold varies according to the prevalence of the disease, but the claim that the odds ratio is generally constant regardless of the diagnostic threshold can be misleading.

The value of an odds ratio, like that of other measures of test performance—for example, sensitivity, specificity, and likelihood ratios—depends on prevalence.2 For example, a test with a diagnostic odds ratio of 10.00 is considered to be a very good test by current standards, but it is easy to verify that this is generally true only in populations at high risk. A diagnostic odds ratio of 10.00 in a low risk population may represent a very weak association between the experimental test and the gold standard test. This is because the observable range of values for an odds ratio increases as the prevalence of the disease moves away from 1/2, as it does when prevalence becomes very low.
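The following minimal sketch (in Python) makes this concrete. It fixes the diagnostic odds ratio at 10 and compares the phi coefficient (the Pearson correlation of the 2×2 table) in a high risk population (prevalence 0.5) and a low risk one (prevalence 0.05). The prevalences, and the assumption that the test positive rate equals the prevalence, are illustrative choices and are not figures from the letter.

import math

def cell_a(prevalence, positive_rate, odds_ratio):
    # Solve for the true positive cell "a" of a 2x2 probability table with
    # fixed margins (prevalence p, test positive rate q) and odds ratio r.
    # With b = q - a, c = p - a, d = 1 - p - q + a, the identity r = a*d/(b*c)
    # gives the quadratic (1 - r)*a^2 + (1 - p - q + r*(p + q))*a - r*p*q = 0.
    p, q, r = prevalence, positive_rate, odds_ratio
    A, B, C = 1 - r, 1 - p - q + r * (p + q), -r * p * q
    disc = math.sqrt(B * B - 4 * A * C)
    roots = [(-B + disc) / (2 * A), (-B - disc) / (2 * A)]
    valid = [a for a in roots if max(0.0, p + q - 1) <= a <= min(p, q)]
    return valid[0]

def phi(prevalence, positive_rate, odds_ratio):
    # Pearson correlation of the 2x2 table implied by the margins and odds ratio.
    p, q = prevalence, positive_rate
    a = cell_a(p, q, odds_ratio)
    b, c, d = q - a, p - a, 1 - p - q + a
    return (a * d - b * c) / math.sqrt(p * (1 - p) * q * (1 - q))

for prev in (0.5, 0.05):  # high risk versus low risk population
    print(f"prevalence {prev:.2f}: phi = {phi(prev, prev, 10.0):.2f}")

# Prints roughly phi = 0.52 at prevalence 0.50 but only about 0.24 at 0.05:
# the same odds ratio of 10 reflects a much weaker association at low risk.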

References

  • 1. Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157–162. (21 July.) doi:10.1136/bmj.323.7305.157
  • 2. Kraemer HC. The robustness of common measures of 2×2 association to bias due to misclassifications. American Statistician. 1985;39:286–290.
BMJ. 2001 Nov 17;323(7322):1188.

Two issues were simplified

Gerben ter Riet 1,2,3, Alphons G H Kessels 1,2,3, Lucas M Bachmann 1,2,3

Editor—We would like to draw attention to two points that Deeks simplified in his review article.1-1

Firstly, consider an example that illustrates the futility of what might be called the “reflex to fill the fourfold table” in research into diagnostic accuracy. Imagine a study of an experimental test that claims to give clinicians more certainty in situations where they have only a few indications that disease may be present, but where those indications are not strong enough to justify truly invasive tests. Without the new experimental test these patients would be sent home. The value of the new test lies in its ability to identify those patients who do have the disease and would benefit from treatment. In this scenario, analysing only the patients who test positive on the experimental test (filling just two cells of the fourfold table) suffices to establish its usefulness.
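As a toy sketch of this point (the counts and the treatment threshold below are hypothetical), the clinical question in this scenario, whether the test positive patients should be treated rather than sent home, can be answered from the positive predictive value alone, which needs only the two filled cells.

# Only patients who test positive on the experimental test are verified against
# the gold standard; the test negative column is never filled because those
# patients would have been sent home anyway. All numbers are hypothetical.
true_positives = 30    # test positive, disease confirmed by the gold standard
false_positives = 70   # test positive, disease ruled out by the gold standard

positive_predictive_value = true_positives / (true_positives + false_positives)
treatment_threshold = 0.20  # assumed probability above which treatment is worthwhile

print(f"PPV among test positives: {positive_predictive_value:.2f}")
print("treat the test positives" if positive_predictive_value >= treatment_threshold
      else "send them home")

Sensitivity and specificity would require the unfilled test negative column, but they are not needed to answer the question the study poses.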

Secondly, Deeks ends his explanation of the application of the likelihood ratio by saying that knowledge of other characteristics of a particular patient that increase or decrease the prior probability of endometrial cancer can be incorporated into the calculation by adjusting the pretest probability accordingly. This, however, assumes that likelihood ratios are constant, an assumption that is usually incorrect. In practice, knowledge of other patient characteristics influences the magnitude of the likelihood ratios of subsequent tests. When a chain of diagnostic tests (history taking, physical examination, laboratory tests, imaging) is performed on a patient, certain findings in the clinical history make certain laboratory results more (or less) likely, which in turn influences the chance of finding certain imaging results. In other words, the results of the component tests are not mutually independent. For example, on average, women with a positive result on ultrasound (a thickened endometrium) are more likely to test positive on hysteroscopy, in which the endometrial thickness is also assessed, albeit in a different manner.
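A minimal numerical sketch of why this matters when tests are chained follows; every likelihood ratio and probability in it is invented for illustration and is not taken from Deeks's article. Multiplying the unconditional likelihood ratio of the second test into the post-test odds of the first, as if the tests were independent, overstates the final probability compared with using a likelihood ratio conditional on the first result.

# Sequential Bayes updating with two correlated tests.
# All numbers are hypothetical and chosen only to illustrate the argument.

def update(pretest_prob, likelihood_ratio):
    # Convert probability to odds, multiply by the likelihood ratio,
    # and convert back to a probability.
    odds = pretest_prob / (1 - pretest_prob)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pretest = 0.10                 # assumed prior probability of endometrial cancer
lr_ultrasound_pos = 6.0        # assumed LR of a thickened endometrium on ultrasound
lr_hysteroscopy_pos = 8.0      # assumed unconditional LR of a positive hysteroscopy
lr_hysteroscopy_pos_given_us = 3.0  # assumed LR conditional on a positive ultrasound
                                    # (smaller, because the tests measure related things)

after_us = update(pretest, lr_ultrasound_pos)
naive = update(after_us, lr_hysteroscopy_pos)            # pretends the tests are independent
conditional = update(after_us, lr_hysteroscopy_pos_given_us)

print(f"after ultrasound:           {after_us:.2f}")     # about 0.40
print(f"naive (independent) update: {naive:.2f}")        # about 0.84
print(f"conditional update:         {conditional:.2f}")  # about 0.67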

The theoretical solution to this problem is to calculate likelihood ratios that are conditional on the results of the preceding tests in the diagnostic chain. In practice this is usually not feasible owing to a lack of data, and most investigators use logistic regression models to account for these dependencies. Such models, however, yield diagnostic odds ratios, not likelihood ratios (see the sketch below). It is partly this complexity that hampers the application of simple diagnostic accuracy studies to clinical practice. Finally, in figure 2 of Deeks's article the numerators in the second column of the right hand panel represent the numbers of false positives, not true negatives.
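As a sketch of the logistic regression point above (the data are simulated and statsmodels is used here as one common choice of library), the fitted model returns adjusted diagnostic odds ratios through its exponentiated coefficients; likelihood ratios do not appear, and a predicted probability for a given pattern of test results has to come from the model itself.

# A logistic model on two correlated test results returns adjusted diagnostic
# odds ratios, not likelihood ratios. Simulated data, illustrative parameters.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
disease = rng.binomial(1, 0.2, n)                # assumed prevalence of 20%

# Two tests whose results depend on disease status and on each other
# (the second test is more often positive when the first one is).
test1 = rng.binomial(1, np.where(disease == 1, 0.8, 0.3))
p_test2 = 0.2 + 0.4 * disease + 0.2 * test1      # built-in dependence on test1
test2 = rng.binomial(1, p_test2)

X = sm.add_constant(np.column_stack([test1, test2]))
fit = sm.Logit(disease, X).fit(disp=0)

print("adjusted diagnostic odds ratios:", np.exp(fit.params[1:]))
print("predicted probability when both tests are positive:",
      fit.predict([[1, 1, 1]])[0])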

References

  • 1-1. Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157–162. (21 July.) doi:10.1136/bmj.323.7305.157
