Author manuscript; available in PMC: 2012 Aug 26.
Published in final edited form as: Acad Med. 2011 May;86(5):618–627. doi: 10.1097/ACM.0b013e318212eb00

Bayes’ Theorem and the Physical Examination: Probability Assessment and Diagnostic Decision-Making

Scott R Herrle 1, Eugene C Corbett Jr 2, Mark J Fagan 3, Charity G Moore 4, D Michael Elnicki 5
PMCID: PMC3427763  NIHMSID: NIHMS277146  PMID: 21436660

Abstract

Purpose

To determine how examination findings influence the probability assessment and diagnostic decision-making of third- and fourth-year medical students, internal medicine residents, and academic general internists.

Method

In a 2008 cross-sectional, Web-based survey, participants from three medical schools were asked questions about their training and eight examination scenarios representing four conditions. Participants were given literature-derived pre-examination probabilities (pre-EPs) for each condition and were asked to 1) estimate post-examination probabilities (post-EPs) and 2) select a diagnostic choice (either report that condition is present, order more tests to establish diagnosis, or report that condition is absent). Participants’ inverse transformed logit (ITL) mean post-EPs were compared with corresponding literature-derived post-EPs.

Results

Of 906 individuals invited to participate, 684 (75%) submitted a completed survey. In two of four scenarios with positive findings, the participants’ ITL mean post-EPs were significantly less than corresponding literature-derived post-EP point estimates (P <.001 for each). In three of the four scenarios consisting of negative findings, ITL mean post-EPs were significantly greater than corresponding literature-derived post-EP point estimates (P <.001 for each). In the four scenarios with positive findings, 17%–38% of participants ordered more diagnostic tests when the literature indicated a >85% probability that the condition was present. In the four scenarios with largely negative findings, 70%–85% chose to order diagnostic tests to further reduce diagnostic uncertainty.

Conclusions

All three groups tended to similarly underestimate the impact of examination findings on condition probability assessment, especially negative findings, and often ordered more tests when probabilities indicated that additional testing was unnecessary.


Despite the key role that the physical examination occupies in patient care,1–8 the decline in examination skills has been well documented.9–20 In a 1996 commentary, Mangione and Peitzman argued that one way to improve examination skills is for “teachers of physical diagnosis [to] separate wheat from chaff and discard signs or maneuvers of little value.”6 This approach requires examination findings to be viewed as diagnostic tests each with their own test characteristics.21–25 Fortunately, the recent focus on the principles of evidence-based medicine has led to the determination of these characteristics for a number of findings.21–25 However, the quality of the available data is variable24,25 and the extent to which educators have incorporated this literature into their teaching is not known.

Bayes’ theorem (see the Appendix) allows clinicians to apply the published test characteristics of examination findings to their probability assessment.26–30 When using a Bayesian approach, clinicians develop an initial probability that a patient has a disorder. This probability is then sequentially revised using information obtained from the history, physical examination, and diagnostic testing to arrive at a final probability estimate.

Although the frequency with which clinicians employ a Bayesian approach in their decision-making is not known,29 research dating back to the 1970s has highlighted several commonly made mistakes.26,29,31–35 First, clinicians often inaccurately form their initial probability estimates by overestimating the prevalence of rare conditions and underestimating the prevalence of common conditions.26,31,35 Second, clinicians often do not revise their initial probability estimate as much as would be suggested by Bayes’ theorem.26,31 This observation may be due to “anchoring,” in which an individual’s final probability estimate is highly sensitive to the probability at which he or she starts.26,31 Third, clinicians tend to give more weight to items encountered later on in a patient interaction than to those items encountered earlier on.32,33 Finally, physicians tend to overemphasize the importance of diagnostic testing, perhaps due to believing that diagnostic tests are more accurate than historical items and examination findings.34

Currently, little is known about how clinicians apply examination findings to their probability assessments. In this cross-sectional, multi-institutional, Web-based survey study, we sought to determine and compare how third- and fourth-year medical students, internal medicine residents, and academic general internists apply examination findings to their probability assessments and to determine the impact that these findings have on the ordering of diagnostic tests.

Method

Study sites and participants

In 2008, our study recruited participants from three U.S. medical schools—the University of Pittsburgh School of Medicine, Alpert Medical School of Brown University, and the University of Virginia School of Medicine—and their affiliated residency programs. The target population was third-year and fourth-year medical students, internal medicine residents, and academic general internists. No training on Bayesian principles or diagnostic test characteristics was provided to any participant prior to study participation. The study was approved by the institutional review boards at each institution.

Survey design and content

The survey was written, reviewed, and edited by the study authors and pilot-tested by seven individuals who were not study participants but who represented the three levels of training included in the study.

The survey began with a series of socio-demographic questions about age, sex, current training or employment status, medical school attended, and, if applicable, date of medical school graduation, name of residency program attended, and date of completion. Next, it asked participants whether they had received formal teaching about the physical examination and about the principles of evidence-based medicine during medical school, residency, or both. Finally, it presented four cases and asked participants questions about condition probability and diagnostic strategy.

We decided to limit the survey to four conditions to avoid overburdening the participants. To provide content validity, we selected the conditions from cases highlighted in the Rational Clinical Examination Series (RCES), a series that has been appearing in JAMA since 1992 and is now available in book form.21,25 To determine which conditions to include, we ranked the conditions according to their perceived relevance and importance and asked five academic general internists to do the same. We then included the four highest-ranked conditions—ascites, heart failure, group A beta-hemolytic streptococcal pharyngitis, and acute anterior cruciate ligament (ACL) tear—in the survey.

Each of the four cases consisted of the following: a brief history; a pre-examination probability (pre-EP) that the condition was present based on the prevalence of the condition and information contained in the history; two separate examination scenarios containing information about whether specific findings were present or absent; and questions about the participant’s estimated post-examination probability (post-EP) and choice of diagnostic strategy, given the findings. The key historical items and exam findings included in each case are detailed in Table 1. Participants were instructed that they should assume that the pre-EPs were accurate, that the exam findings were based on examinations conducted by competent clinicians, and that there were no barriers to performing any tests if so needed to arrive at a diagnosis. Pre-EPs were provided because we wanted to start each participant at the same probability, as clinicians vary widely in their initial assessment of condition probability.35

Table 1.

Description of Case Histories and Associated Examination Scenarios Presented in a Survey of 684 Medical Students, Internal Medicine Residents, and Academic General Internists at Three U.S. Medical Schools, 2008

| Condition | History components | Physical examination components |
|---|---|---|
| Ascites | 62-year-old man with three weeks of progressively increasing abdominal girth, 12-pound weight gain, and ankle edema; no known history of liver, kidney, or heart disease; has consumed six cans of beer daily for 35 years; no history of illicit drug use, tattoos, or blood transfusions | Presence of bulging flanks, shifting dullness, fluid wave, and leg edema |
| | | Absence of bulging flanks, shifting dullness, fluid wave, and leg edema |
| Congestive heart failure | 78-year-old non-smoking woman with hypertension and coronary artery disease status post myocardial infarction six years ago presents to the Emergency Department with three days of dyspnea and orthopnea; denies paroxysmal nocturnal dyspnea, chest pain, palpitations, and lightheadedness; no history of congestive heart failure exacerbations; reports compliance with low-salt diet and medications, which include aspirin, lisinopril, metoprolol, and simvastatin | Presence of jugular venous distention, rales, third heart sound, murmur, and leg edema |
| | | Absence of jugular venous distention, rales, third heart sound, murmur, and leg edema |
| Group A beta-hemolytic streptococcal pharyngitis | 24-year-old woman presents to primary care physician’s office with complaint of a sore throat for two days accompanied by subjective fevers, chills, and myalgias; denies cough and coryza; reports friend recently had “strep throat” | Presence of fever, tonsillar swelling and exudates, and tender cervical lymphadenopathy |
| | | Absence of fever, tonsillar swelling and exudates, and tender cervical lymphadenopathy |
| Acute anterior cruciate ligament tear | 24-year-old man presents to primary care physician’s office with complaint of two days of right knee pain, which began while he was playing football; heard a “pop” after being hit by another player; has been using ice and ibuprofen with improvement in pain; knee feels like it is going to buckle; no previous knee injuries or traumas | Positive response to the anterior drawer and Lachman tests |
| | | Positive response to the anterior drawer test and negative response to the Lachman test |

For each of the four conditions, the survey presented two examination scenarios, one consisting of an examination of positive findings, making the condition more likely, and a second examination consisting largely of negative findings, making the condition less likely. For each scenario, the survey asked participants to provide a post-EP, expressed in terms of a whole number percentage ranging from 0 to 100. It also asked participants to select one of three diagnostic options: tell the patient that he or she has the condition in question, tell the patient that further testing is required for diagnosis, or tell the patient that he or she does not have the condition in question. Participants were not asked to provide specific treatment regimens or to choose specific diagnostic tests.

We neither encouraged nor discouraged participants from utilizing outside resources. An unlimited amount of time was given to complete the survey, although, once started, the survey had to be completed in a single sitting.

Primary outcomes and sample size calculations

The primary outcome measures were (1) participants’ mean post-EPs using inverse transformed logit (ITL) values, (2) comparison of participants’ ITL mean post-EPs with corresponding literature-derived post-EPs, and (3) diagnostic option selected by participants for each of the eight scenarios.

We determined a priori to sample sufficient numbers of participants to allow for meaningful comparison across groups. For sample size calculations, we estimated response rates of 70% for the student and resident groups and 50% for the faculty group. For calculations involving post-EPs, we determined that we would have 90% power to detect an effect size of 0.15 (defined as the average group deviation divided by the within-group standard deviation) for comparing mean post-EPs across the three groups using an analysis of variance (ANOVA) with an α of .05 and an average sample size of 195 per group.36 For calculations involving diagnostic strategy, we determined that a total sample size of 590 (175 students, 290 residents, and 125 faculty) would achieve 90% power to detect an effect size of 0.16 using the chi-squared test of independence.36
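As a rough plausibility check on the chi-squared power figure quoted above, the calculation can be sketched with a noncentral chi-square distribution. This is only an illustration under stated assumptions that do not come from the text: the effect size is taken to be Cohen's w, the table is taken to be 3 groups by 3 diagnostic options (4 degrees of freedom), and α is .05 (critical value 9.4877). The function names are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a central chi-square with an even df (closed form)."""
    half_df = df // 2
    term, total = 1.0, 0.0
    for i in range(half_df):
        total += term
        term *= (x / 2.0) / (i + 1)
    return math.exp(-x / 2.0) * total

def chi2_test_power(effect_size_w, n, df, critical_value, terms=200):
    """Power of a chi-squared test: noncentral chi-square tail beyond the critical value.

    The noncentral distribution is written as a Poisson mixture of central
    chi-squares with df, df + 2, df + 4, ... degrees of freedom.
    """
    noncentrality = n * effect_size_w ** 2
    mu = noncentrality / 2.0
    weight = math.exp(-mu)  # Poisson weight for j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf_even_df(critical_value, df + 2 * j)
        weight *= mu / (j + 1)
    return power

# Assumed inputs: w = 0.16, N = 590, df = 4 (3 x 3 table), alpha = .05
CRITICAL_VALUE_DF4 = 9.4877
power = chi2_test_power(0.16, 590, 4, CRITICAL_VALUE_DF4)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the result lands close to the 90% reported; a different choice of degrees of freedom would change it.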

Study recruitment

After we identified 906 eligible individuals, we sent each an e-mail invitation to complete the survey during an eight-week period in the fall of 2008. To increase the response rate, we sent weekly e-mail reminders. We also indicated that we would give a $15 gift card to students and residents who completed the survey and would enter all participants into a drawing for three prizes.

Analysis of data

We used descriptive statistics to characterize participants in terms of socio-demographic data.

Pre-EPs and literature-derived post-EPs were calculated by using data obtained from the RCES.25,37–40 Calculation of pre-EPs began with the authors’ estimate of condition prevalence. This initial probability estimate was then sequentially revised for each historical item using published likelihood ratios in a step-by-step manner by making the posterior probability of the first item the initial probability of the second item and so on. A similar approach was followed to calculate literature-derived post-EP point estimates and their associated 95% confidence intervals by using the pre-EP as the initial probability and then adding in the examination findings. In order to account for the possibility that exam findings with similar pathophysiologic mechanisms may not be conditionally independent, we also calculated “adjusted” literature-derived post-EPs as follows: for groups of findings with a theoretically similar pathophysiologic basis (e.g., bulging flanks, shifting dullness, and fluid wave), we used only the finding with the most extreme likelihood ratio and ignored the other findings. Table 2 details the calculation of unadjusted and adjusted literature-derived post-EPs.
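The sequential revision described above can be sketched in a few lines of Python, using the ascites scenario with positive findings (pre-EP of 30% and the likelihood ratios listed in Table 2). The function names are hypothetical, and "most extreme" is interpreted here as farthest from 1 on the log scale, which is an assumption rather than a definition given in the text.

```python
import math

def update(prob, likelihood_ratio):
    """Revise a probability with one likelihood ratio by working in odds."""
    post_odds = prob / (1.0 - prob) * likelihood_ratio
    return post_odds / (1.0 + post_odds)

def sequential_post_prob(pre_prob, likelihood_ratios):
    """Apply each likelihood ratio in turn; each posterior becomes the next prior."""
    prob = pre_prob
    for lr in likelihood_ratios:
        prob = update(prob, lr)
    return prob

# Ascites, positive scenario: point-estimate likelihood ratios from Table 2
ascites_lrs = {
    "bulging flanks": 2.0,
    "shifting dullness": 2.7,
    "fluid wave": 6.0,
    "leg edema": 3.8,
}
unadjusted = sequential_post_prob(0.30, ascites_lrs.values())

# Adjusted: keep only the most extreme LR among findings with a shared basis
# (assumption: "most extreme" = largest |log LR|)
related = ["bulging flanks", "shifting dullness", "fluid wave"]
kept = max(related, key=lambda name: abs(math.log(ascites_lrs[name])))
adjusted = sequential_post_prob(0.30, [ascites_lrs[kept], ascites_lrs["leg edema"]])

print(f"unadjusted: {100 * unadjusted:.1f}%, adjusted: {100 * adjusted:.1f}%")
```

The two results agree with the 98.1% and 90.7% post-exam probabilities in Table 2. Some intermediate values in the table differ in the last digit, which is consistent with rounding at each step.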

Table 2.

Calculation of Literature-Derived Post-Exam Probabilities for Four Scenarios Presented as Part of a Survey of 684 Medical Students, Internal Medicine Residents, and Academic General Internists at Three U.S. Medical Schools, 2008

| Scenario and components | Published likelihood ratio, point estimate (95% CI) | Ref. | Unadjusted post-exam probability,* % (95% CI) | Adjusted post-exam probability,* % (95% CI) |
|---|---|---|---|---|
| Ascites (positive scenario) | | | | |
| Pre-exam probability | N/A | | 30 | 30 |
| Bulging flanks (+) | 2 (1.5, 2.6) | 37 | 46.2 (39.1, 52.7) | ---- |
| Shifting dullness (+) | 2.7 (1.9, 3.9) | 37 | 69.9 (55, 81.3) | ---- |
| Fluid wave (+) | 6 (3.3, 11.1) | 37 | 93.3 (80.1, 98) | 72 (58.6, 82.6) |
| Lower extremity edema (+) | 3.8 | 37 | 98.1 (93.9, 99.5) | 90.7 (84.3, 94.7) |
| Post-exam probability | N/A | | 98.1 (93.9, 99.5) | 90.7 (84.3, 94.7) |
| Ascites (negative scenario) | | | | |
| Pre-exam probability | N/A | | 30 | 30 |
| Bulging flanks (−) | 0.3 (0.2, 0.6) | 37 | 11.4 (7.9, 20.5) | 11.4 (7.9, 20.5) |
| Shifting dullness (−) | 0.3 (0.2, 0.6) | 37 | 3.7 (1.7, 13.4) | ---- |
| Fluid wave (−) | 0.4 (0.3, 0.6) | 37 | 1.5 (0.5, 8.5) | ---- |
| Lower extremity edema (−) | 0.2 | 37 | 0.3 (0.1, 0.8) | 2.5 (1.7, 4.9) |
| Post-exam probability | N/A | | 0.3 (0.1, 0.8) | 2.5 (1.7, 4.9) |
| Heart failure (positive scenario) | | | | |
| Pre-exam probability | N/A | | 20 | 20 |
| Jugular venous distention (+) | 5.1 (3.2, 7.9) | 38 | 56 (44.4, 66.4) | 56 (44.4, 66.4) |
| Rales (+) | 2.8 (1.9, 4.1) | 38 | 78.1 (60.3, 89) | ---- |
| S3 (+) | 11 (4.9, 25) | 38 | 97.5 (88.2, 99.5) | 93.3 (79.6, 98) |
| Any murmur (+) | 2.6 (1.7, 4.1) | 38 | 99 (92.7, 99.9) | 97.3 (86.9, 99.5) |
| Leg edema (+) | 2.3 (1.5, 3.7) | 38 | 99.6 (95, 100) | ---- |
| Post-exam probability | N/A | | 99.6 (95, 100) | 97.3 (86.9, 99.5) |
| Heart failure (negative scenario) | | | | |
| Pre-exam probability | N/A | | 20 | 20 |
| Jugular venous distention (−) | 0.66 (0.57, 0.77) | 38 | 14.2 (12.5, 16.1) | ---- |
| Rales (−) | 0.51 (0.37, 0.7) | 38 | 7.8 (5, 11.8) | 11.3 (8.5, 14.9) |
| S3 (−) | 0.88 (0.83, 0.94) | 38 | 6.9 (4.2, 11.2) | 10.1 (7.2, 14.1) |
| Any murmur (−) | 0.81 (0.73, 0.9) | 38 | 5.7 (3.1, 10.2) | 9.4 (5.4, 12.9) |
| Leg edema (−) | 0.64 (0.47, 0.87) | 38 | 3.7 (1.5, 9) | ---- |
| Post-exam probability | N/A | | 3.7 (1.5, 9) | 9.4 (5.4, 12.9) |
| Streptococcal pharyngitis (positive scenario) | | | | |
| Pre-exam probability | N/A | | 40 | 40 |
| Fever (+) | (1.1, 3.0) | 39 | 57.7 (42.3, 66.7) | 57.7 (42.3, 66.7) |
| Tonsillar swelling (+) | (1.4, 3.1) | 39 | 75.4 (50.7, 86.1) | |
| Tonsillar exudates (+) | 3.4 (1.8, 6) | 39 | 91.2 (64.9, 97.4) | 82.3 (56.9, 92.3) |
| Tender anterior lymph nodes (+) | (1.2, 1.9) | 39 | 94.1 (68.9, 98.6) | 87.8 (61.3, 95.8) |
| Anterior cervical lymphadenopathy (+) | (0.47, 2.9) | 39 | 96.4 (51, 99.5) | ---- |
| Post-exam probability | N/A | | 96.4 (51, 99.5) | 87.8 (61.3, 95.8) |
| Streptococcal pharyngitis (negative scenario) | | | | |
| Pre-exam probability | N/A | | 40 | 40 |
| Fever (−) | (0.27, 0.94) | 39 | 28.7 (15.3, 38.5) | 28.7 (15.3, 38.5) |
| Tonsillar swelling (−) | 0.63 (0.56, 0.72) | 39 | 25 (9.2, 31.1) | ---- |
| Tonsillar exudates (−) | 0.72 (0.6, 0.88) | 39 | 19.4 (5.7, 28.4) | 22.5 (9.8, 35.5) |
| Tender anterior lymph nodes (−) | 0.6 (0.49, 0.71) | 39 | 12.6 (2.9, 22) | 14.8 (5.1, 28.1) |
| Anterior cervical lymphadenopathy (−) | (0.58, 0.92) | 39 | 9.8 (1.7, 20.6) | ---- |
| Post-exam probability | N/A | | 9.8 (1.7, 20.6) | 14.8 (5.1, 28.1) |
| Acute anterior cruciate ligament tear (positive scenario) | | | | |
| Pre-exam probability | N/A | | 50 | 50 |
| Lachman test (+) | 42 (2.7, 651) | 40 | 97.7 (73, 99.8) | 97.7 (73, 99.8) |
| Anterior drawer test (+) | 3.8 (0.7, 22) | 40 | 99.4 (65.4, 100) | ---- |
| Post-exam probability | N/A | | 99.4 (65.4, 100) | 97.7 (73, 99.8) |
| Acute anterior cruciate ligament tear (negative scenario) | | | | |
| Pre-exam probability | N/A | | 50 | 50 |
| Lachman test (−) | 0.1 (0, 0.4) | 40 | 9.1 (0, 28.6) | 9.1 (0, 28.6) |
| Anterior drawer test (+) | 3.8 (0.7, 22) | 40 | 27.6 (6.5, 68.8) | ---- |
| Post-exam probability | N/A | | 27.6 (6.5, 68.8) | 9.1 (0, 28.6) |

* Probabilities were calculated as follows: The initial (starting) probability was the provided pre-exam probability. The pre-exam probability was then sequentially revised using the published likelihood ratios for the exam findings included in the scenario until the post-exam probability was derived. For exam findings with only point estimates available for likelihood ratios, the point estimate was used to modify the upper and lower limits of the 95% confidence interval. For exam findings with only 95% confidence intervals available, the midpoint of the 95% confidence interval was used to derive the post-exam probability point estimate.

For each of the eight exam scenarios, we calculated the participants’ ITL mean post-EP for the total sample and for each group. ITL values were utilized instead of raw responses in order to account for the fact that the probability scale behaves differently at the extremes than it does in the middle range. We converted the participants’ raw responses to logit values using the following formula: ln(p/(1−p)), where ln = natural logarithm and p = probability value. After calculating the mean and 95% confidence intervals for the logit values, we then converted these values back to traditional probabilities using the following formula: e^x/(1+e^x), where e = base of the natural logarithm and x = logit value, to arrive at the ITL values. We then used ANOVA and t-tests to determine the degree of concordance among the study groups and between the study-derived values and the literature-derived values. When overall differences were detected between groups, we used the Scheffé method to test for pairwise differences.
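The logit transformation and its back-transformation can be illustrated with a short sketch that uses the standard inverse logit e^x/(1+e^x). The responses below are made up for illustration. The text does not say how responses of exactly 0% or 100% were handled (their logits are undefined), so the clipping constant `eps` is a hypothetical choice, and the function names are likewise assumptions.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Convert a logit value back to a probability: e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))

def itl_mean(percent_responses, eps=0.005):
    """Inverse transformed logit (ITL) mean of whole-number percentage responses.

    eps is a hypothetical clipping bound so that 0% and 100% have finite logits.
    """
    logits = []
    for pct in percent_responses:
        p = min(max(pct / 100.0, eps), 1.0 - eps)
        logits.append(logit(p))
    return 100.0 * inv_logit(sum(logits) / len(logits))

# Made-up responses for one scenario (not study data)
responses = [90, 99]
print(f"arithmetic mean: {sum(responses) / len(responses):.1f}%")
print(f"ITL mean: {itl_mean(responses):.1f}%")
```

Because the logit stretches the scale near 0% and 100%, the ITL mean of responses clustered near an extreme sits closer to that extreme than the arithmetic mean does.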

For the diagnostic strategy outcome, we calculated the frequencies of options chosen for each scenario by the total sample and each group. We used chi-square tests, Fisher’s exact tests, and 3 × 3 contingency tables to compare distributions of choices and tests of marginal homogeneity to analyze changes in the choice of diagnosis option made by individuals responding to the two scenarios for each case (within-person changes).

All statistical analyses were performed using Stata 10.0 (StataCorp, College Station, Texas).

Results

Response rates and characteristics of participants

Of 906 individuals invited to participate, 684 (75%) completed the survey. The total sample of participants consisted of 255 students, 264 residents, and 165 academic general internists. The response rate was highest among students (80%) and lowest among faculty (72%). Each institution had at least a 65% response rate.

As shown in Table 3, participation in a formal physical examination curriculum was nearly universal during medical school (91% of total sample) but less common during residency (24%). Training in evidence-based medicine was also more common during medical school (70% of total sample) than during residency (59%), but the faculty group was less likely than the other groups to have had this training.

Table 3.

Response Rates and Socio-demographic Characteristics of 684 Medical Students, Internal Medicine Residents, and Academic General Internists at Three US Medical Schools Who Responded to the Survey, 2008

| Characteristic | Total sample | Students | Residents | Faculty |
|---|---|---|---|---|
| Participants: no. | 684 | 255 | 264 | 165 |
| Response rate: % | 75 | 80 | 74 | 72 |
| Age in years: mean (± SD) | 31 (8.7) | 26 (3.5) | 29 (2.5) | 44 (9.3) |
| Men: no. (%) | 366 (54) | 131 (51) | 139 (53) | 96 (58) |
| Years since medical school graduation: mean (± SD) | 7.5 (9.3)* | N/A | 1.8 (2.2) | 16.6 (9.1) |
| Received formal physical examination training during medical school: no. (%) | 624 (91) | 239 (94) | 246 (93) | 139 (84) |
| Received formal physical examination training during residency: no. (%) | 101 (24)* | N/A | 57 (22) | 44 (27) |
| Received formal evidence-based medicine training during medical school: no. (%) | 477 (70) | 212 (83) | 199 (75) | 66 (40) |
| Received formal evidence-based medicine training during residency: no. (%) | 255 (59)* | N/A | 169 (64) | 86 (52) |

* Total does not include medical students.

Conditional probabilities

Table 4 provides a summary of the conditional probabilities for each case. For all eight examination scenarios, the participants’ ITL mean post-EPs were significantly different than the unadjusted literature-derived post-EP point estimates (P<.001 for each). When comparing the participants’ ITL mean post-EPs to corresponding adjusted literature-derived post-EP values, comparisons in five of eight scenarios revealed significant differences (P <.001 for each).

Table 4.

Conditional Probabilities for Cases and Associated Examination Scenarios Presented as Part of a Survey of 684 Medical Students, Internal Medicine Residents, and Academic General Internists at Three US Medical Schools, 2008

| Case | Exam scenario | Pre-exam probability, % | Literature: unadjusted post-exam probability, % (95% CI)* | Literature: adjusted post-exam probability, % (95% CI)† | Participants: ITL post-exam probability, mean % (95% CI) |
|---|---|---|---|---|---|
| Ascites | Presence of bulging flanks, shifting dullness, fluid wave, and leg edema | 30 | 98.1 (93.9, 99.5) | 90.7 (84.3, 94.7) | 90.3 (89.5, 91.1) |
| | Absence of bulging flanks, shifting dullness, fluid wave, and leg edema | 30 | 0.3 (0.1, 0.8) | 2.5 (1.7, 4.9) | 24.0 (22.7, 25.4) |
| Heart failure | Presence of jugular venous distention, rales, third heart sound, murmur, and leg edema | 20 | 99.6 (95, 100) | 97.3 (86.9, 99.5) | 87.0 (86.0, 87.9) |
| | Absence of jugular venous distention, rales, third heart sound, murmur, and leg edema | 20 | 3.7 (1.5, 9.0) | 9.4 (5.4, 12.9) | 13.3 (12.5, 14.1) |
| Streptococcal pharyngitis | Presence of fever, tonsillar swelling and exudates, and tender and enlarged anterior cervical lymph nodes | 40 | 96.4 (51.0, 99.5) | 87.8 (61.3, 95.8) | 79.0 (77.9, 80.1) |
| | Absence of fever, tonsillar swelling and exudates, and tender and enlarged anterior cervical lymph nodes | 40 | 9.8 (1.7, 20.6) | 14.8 (5.1, 28.1) | 16.6 (15.6, 17.6) |
| Acute anterior cruciate ligament tear | Positive response to the anterior drawer and Lachman tests | 50 | 99.4 (65.4, 100) | 97.7 (73, 99.8) | 90.0 (89.2, 90.8) |
| | Positive response to the anterior drawer test and negative response to the Lachman test | 50 | 27.6 (6.5, 68.8) | 9.1 (0, 28.6) | 54.9 (53.0, 56.9) |

* P <.001 for each unadjusted literature vs. participants’ ITL mean post-exam probability comparison.

† P <.001 for each adjusted literature vs. participants’ ITL mean post-exam probability comparison except for the following exam scenarios: ascites case with exam scenario consisting of positive findings (P =.5), heart failure case with exam scenario consisting of negative findings (P =.53), and streptococcal pharyngitis case with exam scenario consisting of positive findings (P =.51).

ITL = inverse transformed logit.

In all four scenarios consisting of positive findings, the participants’ ITL mean post-EPs were significantly lower than the unadjusted literature-derived post-EP point estimates (P<.001 for each). When comparing the participants’ ITL mean post-EPs to corresponding adjusted literature-derived post-EP values, comparisons in two of four scenarios were significantly lower (P <.001 for each).

In all four scenarios consisting mainly of negative findings, the participants’ ITL mean post-EPs were significantly higher than the unadjusted literature-derived post-EP point estimates (P<.001 for each). When comparing the participants’ ITL mean post-EPs to corresponding adjusted literature-derived post-EP values, comparisons in three of four scenarios were significantly higher (P <.001 for each).

Findings for all eight scenarios were consistent across groups with only small differences seen (the mean absolute difference in post-EP estimates between groups was 2.6%).

Diagnostic strategy options

Table 5 shows the diagnostic options selected for each scenario by the three groups and the total sample. In four of the eight scenarios (positive and negative scenarios for ascites, and positive scenarios for streptococcal pharyngitis and acute ACL tear), there were significant differences between groups in the frequencies with which each diagnostic option was chosen. However, in all eight scenarios, the relative ordering of diagnostic options (i.e., from most frequently chosen to least frequently chosen) was consistent for all groups.

Table 5.

Diagnostic Option Selected by Participants for Each of Four Examination Scenarios Presented as Part of a Survey of 684 Medical Students, Internal Medicine Residents, and Academic General Internists at Three US Medical Schools, 2008

| Case | Scenario | Group | Tell patient s/he has condition, no. (%)* | Order tests to confirm diagnosis, no. (%)* | Tell patient s/he does not have condition, no. (%)* | P value |
|---|---|---|---|---|---|---|
| Ascites | Presence of bulging flanks, shifting dullness, fluid wave, and leg edema | Students | 180 (71) | 73 (29) | 2 (1) | .02 |
| | | Residents | 204 (77) | 59 (22) | 1 (0) | |
| | | Faculty | 138 (84) | 27 (16) | 0 (0) | |
| | | Total sample | 522 (76) | 159 (23) | 3 (1) | |
| | Absence of bulging flanks, shifting dullness, fluid wave, and leg edema | Students | 1 (0) | 207 (81) | 47 (18) | .02 |
| | | Residents | 6 (2) | 230 (87) | 28 (11) | |
| | | Faculty | 2 (1) | 145 (88) | 18 (11) | |
| | | Total sample | 9 (1) | 582 (85) | 93 (14) | |
| Heart failure | Presence of jugular venous distention, rales, third heart sound, murmur, and leg edema | Students | 211 (83) | 44 (17) | 0 (0) | .3 |
| | | Residents | 215 (81) | 49 (19) | 0 (0) | |
| | | Faculty | 140 (85) | 25 (15) | 0 (0) | |
| | | Total sample | 566 (83) | 118 (17) | 0 (0) | |
| | Absence of jugular venous distention, rales, third heart sound, murmur, and leg edema | Students | 2 (1) | 203 (80) | 50 (20) | .7 |
| | | Residents | 2 (1) | 228 (86) | 34 (13) | |
| | | Faculty | 2 (1) | 139 (84) | 24 (15) | |
| | | Total sample | 6 (1) | 570 (83) | 108 (16) | |
| Streptococcal pharyngitis | Presence of fever, tonsillar swelling and exudates, and tender anterior cervical lymphadenopathy | Students | 155 (61) | 100 (39) | 0 (0) | <.01 |
| | | Residents | 155 (59) | 108 (41) | 1 (0) | |
| | | Faculty | 113 (69) | 51 (31) | 1 (1) | |
| | | Total sample | 423 (62) | 259 (38) | 2 (0) | |
| | Absence of fever, tonsillar swelling and exudates, and tender anterior cervical lymph nodes | Students | 3 (1) | 182 (71) | 70 (28) | .9 |
| | | Residents | 2 (1) | 184 (70) | 78 (30) | |
| | | Faculty | 3 (2) | 115 (70) | 47 (29) | |
| | | Total sample | 8 (1) | 481 (70) | 195 (29) | |
| Acute anterior cruciate ligament tear | Positive response to the anterior drawer and Lachman tests | Students | 179 (70) | 76 (30) | 0 (0) | .01 |
| | | Residents | 150 (57) | 113 (43) | 1 (1) | |
| | | Faculty | 97 (59) | 68 (41) | 0 (0) | |
| | | Total sample | 426 (62) | 257 (38) | 1 (0) | |
| | Positive response to the anterior drawer test and negative response to the Lachman test | Students | 45 (18) | 206 (81) | 4 (2) | .07 |
| | | Residents | 26 (10) | 228 (86) | 10 (4) | |
| | | Faculty | 22 (13) | 139 (84) | 4 (2) | |
| | | Total sample | 93 (14) | 573 (84) | 18 (3) | |

* Because of rounding, not all numbers add to 100%.

P values reflect comparisons between groups for each examination scenario.

This was the most common group choice for each scenario

In the four scenarios with positive findings, most participants (range, 62%–83%) chose to tell the patient that he or she had the condition and treat accordingly. Interestingly, 17%–38% of participants ordered additional testing even when the literature indicated a >85% probability that the condition was present. In the four scenarios with largely negative findings, the majority of participants (range, 70%–85%) chose to order diagnostic tests to further reduce diagnostic uncertainty.

For all four conditions, a significant proportion of participants (P <.001) changed their diagnostic decision when the scenario changed from positive to negative findings: 82% regarding ascites, 85% regarding heart failure, 67% regarding streptococcal pharyngitis, and 50% regarding an acute ACL tear. For each condition, most participants changed from telling the patient that he or she has the condition in the scenario with positive findings to ordering tests to confirm the diagnosis in the scenario with largely negative findings.

Discussion

In this multi-institutional study examining how medical students, residents, and faculty estimate conditional probabilities and choose diagnostic options for four commonly encountered conditions, we found that these groups tended to similarly undervalue physical examination findings and that they tended to undervalue negative findings to an even greater extent than they undervalued positive findings.

There are several possible explanations for our findings. First, these results may in fact represent a true undervaluing of the physical examination. This could support the previous finding that physicians value diagnostic testing more than findings from the history and physical examination regardless of their diagnostic test characteristics.34 Second, these results may reflect the phenomenon of anchoring that was discussed previously.26,31 Third, although there is an increasing body of evidence available to help clinicians clarify the value of examination findings for a number of common conditions, it is not clear how widely this body of evidence is being used by clinicians. As a result, our findings could reflect that clinicians are unaware of this evidence and hence do not apply it to their decision-making. Finally, although the instructions for our study indicated that all examinations were performed by competent clinicians, the participants may have lacked confidence in their own ability to perform physical examinations and may have factored in their own skills when they selected a post-EP or diagnostic option.

Interestingly, we observed only small and clinically insignificant differences between estimates of post-EPs provided by the students, residents, and faculty in our study. This suggests that physicians with more experience in performing examinations do not assign a greater value to examination findings than do trainees. This may also reflect the effect of faculty modeling on residents and students, leading each group to think and perform similarly and to undervalue findings. In the process, it may hinder the teaching of examination skills beyond the basic level.

We also found that large numbers of participants chose to order additional diagnostic testing even when available data suggest that the estimated condition probability is low. Although we did not ask participants to provide the threshold probabilities above and below which they would accept that no further workup is required, our results help to illustrate that physicians differ significantly in their comfort levels in dealing with uncertainty. Unfortunately, there is no consensus among physicians about what threshold probabilities should be used for “ruling in” and “ruling out” individual disorders. This is not unexpected, as thresholds could be expected to vary among conditions of different severity and impact.

Responses to the questions about training in our study indicate that while the formal teaching of examination skills is nearly universal in medical schools, it is much less common in residency programs. With most trainees lacking exposure to a formal curriculum during residency, it is not surprising that a widespread decline in clinical skills has been reported.9–20

Although our study had excellent response rates, it had several limitations. First, although it focused on four common conditions with high face validity (i.e., ascites, heart failure, streptococcal pharyngitis, and acute ACL tear), its findings may not be generalizable to other conditions. Second, some might argue that the use of the condition probability outcome is artificial since most physicians probably do not explicitly calculate condition probabilities for their patients. We attempted to address this concern by also including a more clinically important outcome—i.e., choice of diagnostic strategy option. Third, in calculating probabilities, we used published likelihood ratios, but these ratios are of varying quality and precision and some are based on research conducted many years ago when the gold standards for diagnosing conditions may have differed from those used today. Also, these published likelihood ratios were developed in specific patient populations and may not be generalizable to other populations. Fourth, when calculating probabilities, we assumed that each item was conditionally independent from the others used. In order to address the concern that this may not be true, we calculated adjusted probabilities that were more conservative and made the magnitude of our results smaller. Fifth, because we wanted participants to focus on examination findings rather than history items, we provided pre-EPs for all scenarios. Even though we instructed the participants to view the pre-EPs as accurate, it is possible that some of them ignored this instruction.

Conclusions

In this study, trainees and experienced physicians similarly underestimated the impact of examination findings when estimating condition probabilities and, as a consequence, often chose to order additional diagnostic testing to reduce diagnostic uncertainty. A better understanding of when and how physicians apply examination findings in their assessment of condition probability may provide the foundation for improving the way physicians use these observations in everyday clinical practice. This, in turn, may reduce the unnecessary use of expensive and potentially risky testing in today’s increasingly cost-conscious and patient safety–oriented environment.

Acknowledgments

The authors thank Rosanne Granieri, MD, and Kevin Kraemer, MD, MSc, for their thoughtful input and support of the study.

Funding/Support: This study was made possible by Grant Number UL1 RR024153 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH. Additional funding for this study was provided by the Shadyside Hospital Foundation. The Shadyside Hospital Foundation had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.

Appendix

Bayes’ Theorem

Bayes’ theorem was first developed by Thomas Bayes, an 18th-century English minister and amateur mathematician.30 When employing a Bayesian approach to probability assessment, one starts with an initial probability estimate that is based on one’s knowledge of disease prevalence or on one’s previous experiences.35 This initial probability estimate, termed the prior probability, is then sequentially modified on the basis of each piece of additional evidence encountered to form new probabilities, termed posterior probabilities.22,30
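The sequential updating described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function name `sequential_posteriors`, the prior, and the likelihood ratios are all hypothetical. It uses the odds form of Bayes’ theorem (posterior odds = likelihood ratio × prior odds) and assumes that the findings are conditionally independent, so that each posterior can serve as the prior for the next finding.

```python
def sequential_posteriors(prior_probability, likelihood_ratios):
    """Return the probability after each finding, in the order given.

    Assumes the findings are conditionally independent, so each posterior
    can be reused as the prior for the next finding.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    odds = prior_probability / (1.0 - prior_probability)  # probability -> odds
    probabilities = []
    for lr in likelihood_ratios:
        if lr < 0.0:
            raise ValueError("likelihood ratios cannot be negative")
        odds *= lr  # posterior odds = LR * prior odds
        probabilities.append(odds / (1.0 + odds))  # odds -> probability
    return probabilities


if __name__ == "__main__":
    # Hypothetical prior and likelihood ratios, for illustration only.
    steps = sequential_posteriors(0.20, [4.0, 0.5, 3.0])
    for number, probability in enumerate(steps, start=1):
        print(f"after finding {number}: {probability:.3f}")
```

With these made-up inputs the estimate moves from 20% to 50%, then down to about 33%, then up to 60%, which shows how findings with likelihood ratios above and below 1 pull the probability in opposite directions.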

In mathematical terms, Bayes’ theorem can be stated as follows:

P(x|A) = [P(A|x) × P(x)] / P(A)

In this theorem,

  • P(x) = the probability of condition x being present.

  • P(A) = the probability of A being present.

  • P(x|A) = the probability of condition x being present given the presence of A.

  • P(A|x) = the probability of A being present given the presence of condition x.

Thus, in this formula,41 P(x|A) is the posterior probability and P(x) is the prior probability. Bayes’ theorem is most commonly expressed using likelihood ratios (LRs). Fortunately, the LRs for many physical examination findings are now available, although the data are of varying quality.24,25 An LR is the ratio of the likelihood that a given finding would be expected in a patient with a particular disorder, P(A|x), to the likelihood that the same finding would be expected in a patient without that condition, P(A|no x).30,34,42 An LR > 1 for a finding means that the condition is more likely given the finding and results in a posterior probability that is greater than the prior probability. An LR = 1 for a finding does not change the probability of the condition being present and results in a posterior probability that is the same as the prior probability. An LR < 1 for a finding means that the condition is less likely given the finding and results in a posterior probability that is less than the prior probability.24,25,34,40,42
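The relationship between the LR and the direction of the probability shift can be checked numerically. The following Python sketch is illustrative only: the function names and all input values are hypothetical and are not taken from the study. It expands P(A) with the law of total probability, P(A) = P(A|x)P(x) + P(A|no x)P(no x), which is a standard identity not written out in the text above.

```python
def likelihood_ratio(p_finding_given_x, p_finding_given_not_x):
    """LR = P(A|x) / P(A|no x)."""
    if p_finding_given_not_x == 0.0:
        raise ValueError("P(A|no x) must be greater than 0")
    return p_finding_given_x / p_finding_given_not_x


def bayes_posterior(prior, p_finding_given_x, p_finding_given_not_x):
    """P(x|A) = P(A|x) * P(x) / P(A), with P(A) from total probability."""
    p_finding = (p_finding_given_x * prior
                 + p_finding_given_not_x * (1.0 - prior))
    return p_finding_given_x * prior / p_finding


if __name__ == "__main__":
    prior = 0.30  # hypothetical prior probability
    # Hypothetical (P(A|x), P(A|no x)) pairs giving LR > 1, LR = 1, LR < 1.
    for p_with, p_without in [(0.80, 0.10), (0.40, 0.40), (0.10, 0.50)]:
        lr = likelihood_ratio(p_with, p_without)
        posterior = bayes_posterior(prior, p_with, p_without)
        print(f"LR = {lr:.2f}: prior {prior:.2f} -> posterior {posterior:.3f}")
```

In this sketch the first pair raises the probability above the prior, the second leaves it unchanged, and the third lowers it, matching the three cases described above.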

When using Bayes’ theorem with likelihood ratios, one must convert from probabilities to odds (probability/[1 − probability]) and then back to probabilities.34,42 One starts by calculating the prior odds from the prior probability by using the following formula: prior odds = prior probability/(1 − prior probability).42 Next, the posterior odds are calculated using the LR for the finding and the prior odds as follows: posterior odds = LR × prior odds.34 In this formula, the LR represents the weight of new evidence encountered. The posterior odds can then be converted back to a probability as follows: posterior probability = posterior odds/(posterior odds + 1).34,42 The task is made easier by use of one of many published nomograms.34,42
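The three conversion steps above translate directly into code. The following is a minimal Python sketch of those formulas; the function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def probability_to_odds(probability):
    """odds = probability / (1 - probability)."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in the range [0, 1)")
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """probability = odds / (odds + 1)."""
    if odds < 0.0:
        raise ValueError("odds cannot be negative")
    return odds / (odds + 1.0)


def posterior_probability(prior_probability, lr):
    """Apply one likelihood ratio: posterior odds = LR * prior odds."""
    posterior_odds = lr * probability_to_odds(prior_probability)
    return odds_to_probability(posterior_odds)


if __name__ == "__main__":
    # Hypothetical inputs: a 25% prior and a finding with an LR of 3.
    print(f"{posterior_probability(0.25, 3.0):.2f}")
```

Keeping the two conversions as separate functions mirrors the way the calculation is described in the text and makes each step easy to check by hand.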

As an example, suppose that you are seeing a 24-year-old man who complains of two days of right knee pain that began while he was playing football. He reports hearing a “pop” after being hit by another player. He feels like the knee is going to buckle. Based on the information that you obtain as part of the history and your clinical experience, your probability estimate for him having suffered an anterior cruciate ligament (ACL) tear is 50%. Next, you examine the knee. As part of your knee examination, you perform a Lachman test, which you find to be positive. You recall that a positive Lachman test has a likelihood ratio of 42 (95% CI, 2.7, 651).40

First, you determine the prior odds:

Prior odds of ACL tear = prior probability/(1 − prior probability) = 0.50/(1 − 0.50) = 1

Then, you calculate the posterior odds using the Bayes’ theorem formula that includes likelihood ratios:

Posterior odds of ACL tear = LR × prior odds = 42 × 1 = 42

Finally, you convert the posterior odds to a posterior probability:

Posterior probability of ACL tear = posterior odds/(1 + posterior odds) = 42/(1 + 42) = 0.977

Thus, you determine that the posterior probability that this patient has an ACL tear given a prior probability of 50% and the presence of a positive Lachman test is approximately 98%.
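The ACL example can be reproduced with a short Python sketch. The prior (50%) and the likelihood ratio with its 95% CI (42; 2.7, 651) come from the example above; the function name is hypothetical. Applying the CI limits of the LR to obtain a range of posterior probabilities is an added illustration here, not a calculation reported in this article.

```python
def posterior_from_lr(prior_probability, lr):
    """Convert prior probability to odds, apply the LR, and convert back."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


PRIOR = 0.50      # pre-examination probability of an ACL tear
LR_POINT = 42.0   # positive Lachman test, point estimate
LR_LOW = 2.7      # lower limit of the 95% CI for the LR
LR_HIGH = 651.0   # upper limit of the 95% CI for the LR

point = posterior_from_lr(PRIOR, LR_POINT)
low = posterior_from_lr(PRIOR, LR_LOW)
high = posterior_from_lr(PRIOR, LR_HIGH)

if __name__ == "__main__":
    print(f"point estimate: {point:.3f}")
    print(f"using LR CI limits: {low:.3f} to {high:.3f}")
```

The point estimate matches the 0.977 calculated above. Using the limits of the LR’s confidence interval gives posterior probabilities of roughly 0.73 and 0.998, which illustrates how much the imprecision of a published LR can matter.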

Footnotes

Other disclosures: None

Ethical approval: The institutional review boards (IRBs) at the University of Pittsburgh School of Medicine and the Alpert Medical School of Brown University gave expedited approval for the study while the IRB at the University of Virginia School of Medicine exempted the study from review.

Previous presentations: This study was presented in part in poster format in 2009 at the 32nd annual meeting of the Society of General Internal Medicine (SGIM) in Miami, Florida.

Contributor Information

Scott R. Herrle, Section of General Internal Medicine, Veterans Affairs Pittsburgh Healthcare System, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.

Eugene C. Corbett, Jr, Division of General Medicine, Geriatrics, and Palliative Care, Department of Medicine, University of Virginia, Charlottesville, Virginia.

Mark J. Fagan, Department of Medicine, Alpert Medical School of Brown University, Providence, Rhode Island.

Charity G. Moore, Center for Research on Health Care Data Center, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.

D. Michael Elnicki, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.

References

  • 1. Sandler G. The importance of the history in the medical clinic and the cost of unnecessary tests. Am Heart J. 1980;100(pt 1):928–31. doi: 10.1016/0002-8703(80)90076-9.
  • 2. Kern DC, Parrino TA, Korst DR. The lasting value of clinical skills. JAMA. 1985;254:70–76.
  • 3. Sackett DL, Rennie D. The science of the art of the clinical examination. JAMA. 1992;267:2650–2.
  • 4. Peterson MC, Holbrook JH, Von Hales D, Smith NL, Staker LV. Contributions of the history, physical examination, and laboratory investigation in making medical diagnoses. West J Med. 1992;156:163–5.
  • 5. Kravitz RL, Cope DW, Bhrany V, Leak B. Internal medicine patients’ expectations for care during office visits. J Gen Intern Med. 1994;9:75–81. doi: 10.1007/BF02600205.
  • 6. Mangione S, Peitzman SJ. Physical diagnosis in the 1990s: art or artifact? J Gen Intern Med. 1996;11:490–3. doi: 10.1007/BF02599046.
  • 7. Reilly BM. Physical examination in the care of medical inpatients: an observational study. Lancet. 2003;362:1100–5. doi: 10.1016/S0140-6736(03)14464-9.
  • 8. Smith MA, Burton WB, Mackay M. Development, impact, and measurement of enhanced physical diagnosis skills. Adv in Health Sci Educ. 2009;14:547–56. doi: 10.1007/s10459-008-9137-z.
  • 9. Wiener S, Nathanson M. Physical examination: frequently observed errors. JAMA. 1976;236:852–5.
  • 10. Wray N, Friedland J. Detection and correction of housestaff errors in physical diagnosis. JAMA. 1983;249:1035–7.
  • 11. Johnson J, Carpenter J. Medical house staff performance in physical examination. Arch Intern Med. 1986;146:937–41.
  • 12. Li J. Assessment of basic physical examination skills of internal medicine residents. Acad Med. 1994;69:296–9. doi: 10.1097/00001888-199404000-00013. http://journals.lww.com/academicmedicine/Abstract/1994/04000/Assessment_of_basic_physical_examination_skills_of.13.aspx.
  • 13. Paauw DS, Wenrich MD, Curtis JR, Carline JD, Ramsey PG. Ability of primary care physicians to recognize physical findings associated with HIV infection. JAMA. 1995;274:1380–2.
  • 14. Mangione S, Nieman LZ. Cardiac auscultation skills of internal medicine and family practice trainees. JAMA. 1997;278:717–22.
  • 15. Mangione S, Nieman LZ. Pulmonary auscultatory skills during training in internal medicine and family practice. Am J Respir Crit Care Med. 1999;159:1119–24. doi: 10.1164/ajrccm.159.4.9806083.
  • 16. Ozuah PO, Dinkevich E. Physical examination skills of US and international medical graduates. JAMA. 2001;286:1021. doi: 10.1001/jama.286.9.1021.
  • 17. Peixoto AJ. Birth, death, and resurrection of the physical examination: clinical and academic perspectives on bedside diagnosis. Yale J Biol Med. 2001;74:221–8.
  • 18. Jauhar S. The demise of the physical exam. N Engl J Med. 2006;354:548–51. doi: 10.1056/NEJMp068013.
  • 19. Verghese A. Culture shock—patient as icon, icon as patient. N Engl J Med. 2008;359:2748–51. doi: 10.1056/NEJMp0807461.
  • 20. Fred HL, Grais IM. Bedside skills: an exchange between dinosaurs. Tex Heart Inst J. 2010;37:205–7.
  • 21. Sackett DL. A primer on the precision and accuracy of the clinical examination. JAMA. 1992;267:2638–44.
  • 22. Holleman DR, Simel DL. Quantitative assessments from the clinical examination: how should clinicians integrate the numerous results? J Gen Intern Med. 1997;12:165–71. doi: 10.1007/s11606-006-5024-6.
  • 23. Hatala R, Smieja M, Kane SL, Cook DJ, Meade MO, Nishikawa J. An evidence-based approach to the clinical examination. J Gen Intern Med. 1997;12:182–7. doi: 10.1007/s11606-006-5027-3.
  • 24. McGee S. Evidence-Based Physical Diagnosis. 2. St. Louis, MO: Saunders Elsevier; 2007.
  • 25. Simel D, Rennie D. The Rational Clinical Examination: Evidence-Based Clinical Diagnosis. New York, NY: McGraw-Hill; 2008.
  • 26. Elstein AS. Heuristics and biases: selected errors in clinical reasoning. Acad Med. 1999;74:791–4. doi: 10.1097/00001888-199907000-00012. http://journals.lww.com/academicmedicine/Abstract/1999/07000/Heuristics_and_biases__selected_errors_in_clinical.12.aspx.
  • 27. Goodman SN. Toward evidence-based medicine statistics. 2: The Bayes factor. Ann Intern Med. 1999;130:1005–13. doi: 10.7326/0003-4819-130-12-199906150-00019.
  • 28. Lurie JD, Sox HC. Principles of medical decision making. Spine. 1999;24:493–8. doi: 10.1097/00007632-199903010-00021.
  • 29. Elstein AS, Schwartz A. Evidence base of clinical diagnosis: clinical problem solving and diagnostic decision making: selective review of the cognitive literature. BMJ. 2002;324:729–32. doi: 10.1136/bmj.324.7339.729.
  • 30. Gill CJ, Sabin L, Schmid CH. Why clinicians are natural bayesians. BMJ. 2005;330:1080–3. doi: 10.1136/bmj.330.7499.1080.
  • 31. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science. 1974;185:1124–31. doi: 10.1126/science.185.4157.1124.
  • 32. Bergus GR, Chapman GB, Gjerde C, Elstein AS. Clinical reasoning about new symptoms in the face of pre-existing disease: sources of error and order effects. Fam Med. 1995;27:314–20.
  • 33. Chapman GB, Bergus GR, Elstein AS. Order of information affects clinical judgment. J Behav Decis Making. 1996;9:201–11.
  • 34. Halkin A, Reichman J, Schwaber M, Paltiel O, Brezis M. Likelihood ratios: getting diagnostic testing into perspective. Q J Med. 1998;91:247–58. doi: 10.1093/qjmed/91.4.247.
  • 35. Phelps MA, Levitt A. Pretest probability estimates: a pitfall to the clinical utility of evidence-based medicine? Acad Emerg Med. 2004;11:692–4.
  • 36. Hintze J. NCSS, PASS, and GESS. NCSS; Kaysville, Utah: 2007. [Accessed January 20, 2011]. www.ncss.com.
  • 37. Williams JW, Jr, Simel DL. The rational clinical examination. Does this patient have ascites? How to divine fluid in the abdomen. JAMA. 1992;267:2645–8. doi: 10.1001/jama.267.19.2645.
  • 38. Wang CS, FitzGerald JM, Schulzer M, Mak E, Ayas NT. Does this dyspneic patient in the emergency department have congestive heart failure? JAMA. 2005;294:1944–56. doi: 10.1001/jama.294.15.1944.
  • 39. Ebell MH, Smith MA, Barry HC, Ives K, Carey M. Does this patient have strep throat? JAMA. 2000;284:2912–8. doi: 10.1001/jama.284.22.2912.
  • 40. Solomon DH, Simel DL, Bates DW, Katz JN, Schaffer JL. Does this patient have a torn meniscus or ligament of the knee? Value of the physical examination. JAMA. 2001;286:1610–20. doi: 10.1001/jama.286.13.1610.
  • 41. Jaynes ET. Probability theory: the logic of science. 3. New York, NY: Cambridge Press; 2003.
  • 42. Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-Based Medicine. 3. New York, NY: Churchill Livingstone; 2005.
