INTRODUCTION
Diagnostic error adversely affects patients and healthcare systems. Correct interpretation of diagnostic tests is a prerequisite for diagnostic excellence. Two related heuristics designed to help interpret diagnostic tests—SpPin and SnNout—have been taught for decades by leaders in the field of evidence-based medicine.1,2 SpPin indicates that when Specificity is high, a Positive result rules in the disease in question; SnNout indicates that when Sensitivity is high, a Negative result rules out the disease in question. In our years of teaching diagnostic reasoning to hundreds of medicine residents and faculty at eight academic medical centers, we have found these heuristics to be universally known and frequently relied upon to evaluate the utility of diagnostic tests.
Unfortunately, relying on SpPin and SnNout can be maladaptive, increasing diagnostic error. Previous publications warning about limitations of SpPin and SnNout have focused on data quality issues (risk of bias, imprecision, and generalizability) or have used complicated formulas many find difficult to understand.3,4 This paper improves upon the existing literature, using simple examples without formulas to illustrate the limitations of SpPin and SnNout that exist even when data for test characteristics are of high quality (large representative sample with low risk of bias). In addition, we demonstrate that to effectively evaluate the utility of diagnostic tests, one must rely on likelihood ratios interpreted in the context of pretest probability, rather than rely on these heuristics.
THE ORIGINS OF SPPIN AND SNNOUT
The SnNout heuristic was conceived over three decades ago in the context of a test that was reported to have 100% sensitivity.1 Sensitivity is the probability of a positive test result among patients with the disease in question. Specificity is the probability of a negative test result among patients without the disease in question, and the SpPin heuristic was subsequently conceived as a counterpart to SnNout.1 SpPin and SnNout are guaranteed to work when the corresponding test characteristic is 100%. However, the use of the heuristics has expanded over time to include tests with sensitivity or specificity less than 100%, but with values still considered “high.”1–3
WHAT CONSTITUTES “RULING IN” AND “RULING OUT”?
When tests are less than 100% accurate (which is almost always the case), residual diagnostic uncertainty will remain. As a practical matter, we consider a disease ruled out when its probability falls below some low threshold (justifying abandonment of further testing for that disease) and ruled in when its probability rises above some high threshold (justifying initiation of treatment for that disease without further testing).1,2 Therefore, a test’s utility for ruling in or ruling out disease depends on the posttest probability it generates for a given patient.
THE PROBLEMS WITH SPPIN AND SNNOUT
1) Neither Sensitivity Nor Specificity Should Be Considered in Isolation of the Other
Correctly assessing how a test result changes probability of disease requires information about test performance in patients both with and without the disease in question. Neither sensitivity nor specificity contains that information. Likelihood ratios do.
The likelihood ratio (LR) for a given test result is the probability of that result among patients with the disease in question divided by the probability of the same result among patients without that disease. When LR = 1, the result is equally likely in both groups and does not affect probability of disease (pretest probability = posttest probability). When LR > 1, probability of disease increases, and when LR < 1, probability of disease decreases. The further the LR is from 1 in either direction, the greater the change in probability, with possible LR values ranging from zero to infinity.
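For a dichotomous test, both LRs follow directly from sensitivity and specificity. The following minimal Python sketch (the function names are ours, purely for illustration) shows the arithmetic:

```python
def lr_positive(sensitivity: float, specificity: float) -> float:
    """LR+ = P(positive result | disease) / P(positive result | no disease)."""
    return sensitivity / (1 - specificity)  # assumes specificity < 100%


def lr_negative(sensitivity: float, specificity: float) -> float:
    """LR- = P(negative result | disease) / P(negative result | no disease)."""
    return (1 - sensitivity) / specificity  # assumes specificity > 0%


# A test with 90% sensitivity and 90% specificity:
print(lr_positive(0.90, 0.90))  # ~9.0: a positive result raises probability of disease
print(lr_negative(0.90, 0.90))  # ~0.11: a negative result lowers probability of disease
```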
The following exercise illustrates the importance of relying on LRs rather than sensitivity or specificity alone. Consider the test characteristics of three available tests—A, B, and C—for a certain disease (Table 1). According to SpPin and SnNout, test A is best for ruling in disease (highest specificity) and test B is best for ruling out disease (highest sensitivity). In truth, test C is best at both, because it has both the largest LR+ and the smallest LR− and thus generates the highest posttest probability when positive and the lowest posttest probability when negative (see the sketch following Table 1). SpPin and SnNout get it wrong.
Table 1 Likelihood Ratios* Are Superior to SpPin and SnNout—an Illustrative Example
| Test | Sensitivity | Specificity | LR+‡ | LR−§ | Best to rule in disease, per SpPin‖† | Best to rule in disease, per highest posttest probability generated† | Best to rule out disease, per SnNout¶† | Best to rule out disease, per lowest posttest probability generated† |
|---|---|---|---|---|---|---|---|---|
| A | 30% | 95% | 6.0 | 0.74 | ✓ | | | |
| B | 95% | 30% | 1.4 | 0.17 | | | ✓ | |
| C | 90% | 90% | 9.0 | 0.11 | | ✓ | | ✓ |
*Likelihood ratio = probability of a given test result among those with disease / probability of the same test result among those without disease
†The best test for ruling in disease is the one that can generate the highest posttest probability (due to having the result with the highest likelihood ratio), and the best test for ruling out disease is the one that can generate the lowest posttest probability (due to having the result with the lowest likelihood ratio). This is also true for tests with more than two possible results. Using SpPin and SnNout to judge which test is best results in the wrong answer
‡LR+ = likelihood ratio for a positive test result = probability of a positive test among those with disease / probability of a positive test among those without disease = sensitivity / (100 − specificity)
§LR− = likelihood ratio for a negative test result = probability of a negative test among those with disease / probability of a negative test among those without disease = (100 − sensitivity) / specificity
‖SpPin = heuristic that indicates that when Specificity is high, a Positive result rules in the disease in question, which incorrectly implies that the test with the highest specificity, when positive, is the best test for ruling in disease
¶SnNout = heuristic that indicates that when Sensitivity is high, a Negative result rules out the disease in question, which incorrectly implies that the test with the highest sensitivity, when negative, is the best test for ruling out disease
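To make the comparison in Table 1 concrete, here is a brief Python sketch applying Bayes’ rule (formally introduced later in this paper) to tests A, B, and C. The 50% pretest probability is an arbitrary illustrative choice; the ranking of the tests holds at any pretest probability:

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Apply Bayes' rule: convert probability to odds, multiply by LR, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


tests = {"A": (6.0, 0.74), "B": (1.4, 0.17), "C": (9.0, 0.11)}  # (LR+, LR-) from Table 1
pretest = 0.50  # illustrative only
for name, (lr_pos, lr_neg) in tests.items():
    print(f"Test {name}: positive -> {posttest_probability(pretest, lr_pos):.0%}, "
          f"negative -> {posttest_probability(pretest, lr_neg):.0%}")
# Test A: positive -> 86%, negative -> 43%
# Test B: positive -> 58%, negative -> 15%
# Test C: positive -> 90%, negative -> 10%   (both the highest and the lowest)
```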
A second exercise, using a real-world example, is further illuminating. Kernig’s sign has 5% sensitivity and 95% specificity for meningitis.5 According to SpPin and SnNout, this test can rule in meningitis when positive, but cannot rule out meningitis when negative. In truth, because the LR for a positive result (LR+) = 1 and the LR for a negative result (LR−) = 1, probability of meningitis does not change with either result. For any dichotomous test, when sensitivity + specificity = 100%, that test is unhelpful.
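Verifying the Kernig’s sign numbers takes two lines:

```python
sens, spec = 0.05, 0.95  # Kernig's sign for meningitis
print(sens / (1 - spec))  # LR+ = 0.05 / 0.05 ~ 1.0
print((1 - sens) / spec)  # LR- = 0.95 / 0.95 ~ 1.0: neither result changes probability
```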
2) Pretest Probability Matters
When sensitivity and specificity are both high, SpPin and SnNout are still unreliable, particularly when a patient’s pretest probability is far from the rule-in (for SpPin) or rule-out (for SnNout) threshold. A classic example is a positive HIV antibody test in a patient with a very low pretest probability of HIV (1 in 10,000). Even if specificity = 99.8%—a seemingly clear example of SpPin—and sensitivity = 100% (LR+ = 500), posttest probability after a positive test is just 5%, which is clearly inadequate to rule in HIV and initiate treatment.6
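The arithmetic behind this example is a single application of Bayes’ rule, sketched below with the numbers above:

```python
pretest_prob = 1 / 10_000                          # very low pretest probability of HIV
pretest_odds = pretest_prob / (1 - pretest_prob)   # ~0.0001
posttest_odds = pretest_odds * 500                 # LR+ = 1.0 / (1 - 0.998) = 500
posttest_prob = posttest_odds / (1 + posttest_odds)
print(f"{posttest_prob:.1%}")                      # ~4.8%: far too low to rule in HIV
```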
3) Most Tests Are Not Truly Dichotomous
Sensitivity and specificity are numbers that imply there are only two possible test results. Such tests are rare in the real world. Physical exam maneuvers and imaging studies tend to be ordinal, with results such as “negative,” “indeterminate,” and “positive” (e.g., chest X-ray for pneumonia), and blood tests tend to be continuous, with essentially infinite possible results (e.g., B-type natriuretic peptide for heart failure).
Dichotomizing non-dichotomous tests discards information and leads to mistakes. The solution is instead to use multilevel LRs to maximize a test’s utility.7 For example, in a recent study evaluating ultrasound measurement of jugular venous pressure for diagnosis of elevated central venous pressure, the authors dichotomized the test and reported a sensitivity of 73% and a specificity of 79% (LR+ = 3.4, LR− = 0.34). While this test would not be considered very helpful according to SpPin or SnNout, a more useful reanalysis demonstrated six distinct levels of test results with unique LRs ranging from zero to infinity.8
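A multilevel LR is computed the same way as any LR: for each result category, the proportion of diseased patients with that result is divided by the proportion of non-diseased patients with it. The sketch below uses hypothetical counts (not data from the ultrasound study) to illustrate:

```python
# Hypothetical counts of patients with each result, by disease status
levels = {                      # (with disease, without disease)
    "clearly negative":  (2, 40),
    "probably negative": (8, 30),
    "indeterminate":     (10, 15),
    "probably positive": (30, 10),
    "clearly positive":  (50, 5),
}
n_disease = sum(d for d, _ in levels.values())       # 100 patients with disease
n_no_disease = sum(nd for _, nd in levels.values())  # 100 patients without disease
for level, (d, nd) in levels.items():
    lr = (d / n_disease) / (nd / n_no_disease)
    print(f"{level}: LR = {lr:.2f}")
# LRs range from 0.05 to 10.00 across the five levels, far more informative than
# the single LR+/LR- pair produced by forcing the same data into two categories
```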
OUT WITH THE OLD RULE, IN WITH THE OLDER RULE
Bayes’ Rule: Pretest Odds × Likelihood Ratio = Posttest Odds1,7
Bayes’ rule considers test performance in patients both with and without the disease in question, incorporates pretest probability, does not require dichotomization, and allows for easy comparison between posttest probability and decision-making thresholds. While the advantages of this approach have long been recognized, including by evidence-based medicine experts who taught SpPin and SnNout alongside it,1,2 most learners over the past several decades seem to have retained only the heuristics, perhaps because of their simplicity. Fortunately, with the availability of a handy nomogram1,2 and, more recently, online calculators (e.g., https://sample-size.net/post-probability-calculator-test-new/), clinicians need not memorize formulas, convert between probability and odds, or perform any calculations on their own.
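For readers who prefer code to a nomogram or calculator, the entire workflow fits in a few lines of Python. The thresholds below are arbitrary placeholders; real rule-in and rule-out thresholds depend on the disease and on the consequences of treatment and further testing:

```python
def bayes_update(pretest_prob: float, lr: float) -> float:
    """Pretest odds x likelihood ratio = posttest odds; then convert back to probability."""
    posttest_odds = pretest_prob / (1 - pretest_prob) * lr
    return posttest_odds / (1 + posttest_odds)


# Placeholder thresholds, for illustration only
RULE_OUT_THRESHOLD = 0.02
RULE_IN_THRESHOLD = 0.90

posttest = bayes_update(pretest_prob=0.30, lr=0.11)  # e.g., a negative result of test C
if posttest < RULE_OUT_THRESHOLD:
    print(f"{posttest:.1%}: ruled out")
elif posttest > RULE_IN_THRESHOLD:
    print(f"{posttest:.1%}: ruled in")
else:
    print(f"{posttest:.1%}: residual uncertainty; consider further testing")
```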
LIMITATIONS TO OUR APPROACH
First, when data quality issues are present, LR estimates will be unreliable. However, the same limitations will apply to SpPin and SnNout,3 and Bayes’ rule will still improve upon the heuristics by incorporating pretest probability. Second, accurately estimating a patient’s pretest probability can be difficult. Likewise, finding the correct LR for a test result can be difficult because diagnostic accuracy studies often report dichotomized test characteristics for non-dichotomous tests. However, previously described strategies for estimating pretest probability and for using multiple levels to interpret data from diagnostic accuracy studies can be used to help overcome these challenges.1,2,7
AN ALTERNATIVE TO OUR APPROACH: THE LIKELIHOOD RATIO HEURISTIC
When teaching diagnostic test interpretation and Bayes’ rule, some evidence-based medicine experts have promoted an alternative heuristic that goes something like this: LRs greater than 10 or less than 0.1 are very powerful and often conclusive; LRs ranging from 5 to 10 or 0.1 to 0.2 have a moderate effect on probability; LRs ranging from 2 to 5 or 0.2 to 0.5 have a small effect on probability; and LRs ranging from 0.5 to 2 are rarely helpful.9 We agree that it can be useful for learners to get a feel for the impact of different LRs in order to develop an innate sense of how “good” a test result is based on its LR. However, it is important to contextualize the utility of LRs in terms of the varied magnitudes of effect they will have at different pretest probabilities, the potential availability of other independent tests, and decision-making thresholds.1 Even tests with very modest LRs can appropriately change patient management, depending on these other factors.
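That dependence on pretest probability is easy to demonstrate: the same LR produces very different posttest probabilities at different starting points, as this brief sketch shows:

```python
def posttest_probability(pretest: float, lr: float) -> float:
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


lr = 5.0  # a "moderate" LR by the heuristic above
for pretest in (0.01, 0.30, 0.70):
    print(f"pretest {pretest:.0%} -> posttest {posttest_probability(pretest, lr):.0%}")
# pretest 1%  -> posttest 5%    (disease remains very unlikely)
# pretest 30% -> posttest 68%   (disease is now more likely than not)
# pretest 70% -> posttest 92%   (may cross a rule-in threshold)
```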
CONCLUSION
Although conceived as well-intentioned teaching tools, SpPin and SnNout have multiple flaws, and it is time to retire them. A reinvigorated emphasis on considering pretest probability and using multilevel LRs with Bayes’ rule is needed in medical education at all levels, as part of the greater effort in healthcare to achieve diagnostic excellence.
Acknowledgements:
We thank the peer reviewers for their time, effort, and valuable comments and suggestions, which helped us improve the quality of this work.
Declarations:
Conflict of Interest:
The authors declare that they do not have a conflict of interest.
References
1. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Chapter 4: The interpretation of diagnostic data. In: Sackett DL, Haynes RB, Guyatt GH, Tugwell P, eds. Clinical Epidemiology: a Basic Science for Clinical Medicine. 2nd ed. Boston: Little, Brown and Company; 1991:69–152.
2. Strauss SE, Glasziou P, Richardson WS, Haynes RB. Chapter 5: Diagnosis and screening. In: Strauss SE, Glasziou P, Richardson WS, Haynes RB, eds. Evidence-Based Medicine: How to Practice and Teach EBM. 5th ed. Edinburgh: Elsevier; 2019:163–175.
3. Pewsner D, Battaglia M, Minder C, Marx A, Bucher HC, Egger M. Ruling a diagnosis in or out with “SpPIn” and “SnNOut”: a note of caution. BMJ. 2004;329(7459):209–213. doi:10.1136/bmj.329.7459.209.
4. Baeyens JP, Serrien B, Goossens M, Clijsen R. Questioning the “SPIN and SNOUT” rule in clinical testing. Arch Physiother. 2019;9:4. doi:10.1186/s40945-019-0056-5.
5. Thomas KE, Hasbun R, Jekel J, Quagliarello VJ. The diagnostic accuracy of Kernig’s sign, Brudzinski’s sign, and nuchal rigidity in adults with suspected meningitis. Clin Infect Dis. 2002;35(1):46–52. doi:10.1086/340979.
6. Kim S, Lee JH, Choi JY, Kim JM, Kim HS. False-positive rate of a “fourth-generation” HIV antigen/antibody combination assay in an area of low HIV prevalence. Clin Vaccine Immunol. 2010;17(10):1642–1644. doi:10.1128/CVI.00258-10.
7. Baduashvili A, Guyatt G, Evans AT. ROC anatomy—getting the most out of your diagnostic test. J Gen Intern Med. 2019;34(9):1892–1898. doi:10.1007/s11606-019-05125-0.
8. Fischer BG. Accuracy of ultrasound jugular venous pressure height in predicting central venous congestion. Ann Intern Med. 2022;175(5):W53. doi:10.7326/L22-0116.
9. Furukawa TA, Strauss SE, Bucher HC, Agoritsas T, Guyatt G. Chapter 18: Diagnostic tests. In: Guyatt G, Rennie D, Meade MO, Cook DJ, eds. Users’ Guides to the Medical Literature: a Manual for Evidence-Based Clinical Practice. 3rd ed. New York: McGraw-Hill Education; 2015:351.