Author manuscript; available in PMC: 2021 Apr 8.
Published in final edited form as: AJR Am J Roentgenol. 2020 Feb 11;214(5):1122–1130. doi: 10.2214/AJR.19.22189

Differences in Outcomes Associated With Individual Radiologists for Emergency Department Patients With Headache Imaged With CT: A Retrospective Cohort Study of 25,596 Patients

Matthew S Davenport 1,2,3, Shokoufeh Khalatbari 4, Nahid Keshavarzi 4, Michael Connolly 1, Keith E Kocher 5, Suzanne T Chong 1, Ashok Srinivasan 1
PMCID: PMC8029644  NIHMSID: NIHMS1675791  PMID: 32045308

Abstract

OBJECTIVE.

The purpose of this study was to determine whether diagnostic radiologists impart variation into resource use and patient outcomes in emergency department (ED) patients undergoing CT for headache.

MATERIALS AND METHODS.

This was a single-institution retrospective quality assurance cohort study of 25,596 unique adult ED patients undergoing head CT for headache from January 2012 to October 2017. CT examinations were interpreted by 55 attending radiologists (25 neuroradiologists, 30 radiologists of other specialties) who each interpreted a mean of 1469.8 ± 787.9 CT examinations. Risk adjustment for variables thought to influence outcome included baseline risk (demographics, Elixhauser comorbidity score), clinical factors (vital signs, ED triage and pain scores, laboratory data, hydrocephalus, prior intracranial hemorrhage, neurosurgical consultation within last 12 months), and system factors (time of CT, physician experience, neuroradiology training). Multivariable models were built to analyze the effect of individual radiologists on subsequent outcomes. Any p value less than 0.007 was considered significant after Bonferroni correction.

RESULTS.

The study found 57.5% (14,718/25,596) of CT interpretations were performed by neuroradiologists, and most patients (98.1% [25,119/25,596]) had no neurosurgical history. After risk adjustment, individual radiologists were not an independent predictor of hospital admission (p = 0.49), 30-day readmission (p = 0.30), 30-day mortality (p = 0.14), or neurosurgical intervention (p = 0.04) but did predict MRI use (p < 0.001; odds ratio [OR] range among radiologists, 0.009–38.2), neurology consultation (p < 0.001; OR range, 0.4–3.2), and neurosurgical consultation (p < 0.001; OR range, 0.1–9.9).

CONCLUSION.

Radiologists with different skills, experience, and practice patterns appear interchangeable for major clinical outcomes when interpreting CT for headache in the ED, but their differences predict differential use of downstream health care resources. Resource use measures are potential quality indicators in this cohort.

Keywords: Headache, Outcome, Quality, Resource utilization


An ideal diagnostic radiology quality measure is data-driven, specific, actionable, linked to patient outcomes, and targetable for performance improvement. Breast imaging has multiple examples, such as provider-level positive predictive value of mammogram results stratified by BI-RADS scores. Outside of breast imaging, such measures do not exist for much of diagnostic radiology [1–4]. In 2018, the American College of Radiology developed and released 11 proposed diagnostic radiology quality metrics [5, 6]. The stated goal was to “develop meaningful measures for radiologists that promote population health through diagnostic accuracy, clinical effectiveness, and care coordination.” Unfortunately, none of the metrics are outcome measures. Each focuses on the process of generating an image or creating a report rather than the ultimate outcome for the patient.

Historically, efforts to create outcome measures in diagnostic radiology have been challenged because diagnostic radiologists are indirect care providers. The information communicated by radiologists can be obscured by the many complexities involved in the delivery of health care. If information provided by radiologists is ignored, misconstrued, or has an unclear link to treatment that can affect a patient’s outcome, a radiologist’s value will be diminished and hard to measure. In the past, the solution to these challenges has been to focus on process rather than outcome measures because these are generally easier to measure and report. However, process measures are only useful if they meaningfully predict health care outcomes. If a process has no effect on downstream care, or if process variation does not translate into outcome variation, then targeting that process for improvement is unlikely to be worthwhile. To determine what processes have the most impact on care, we need to study how our processes affect patient care [2–4].

Isolating the effect of the radiologist on patient outcomes requires a large sample size, so not all clinical algorithms are feasible to study. One common scenario that has come under increasing scrutiny is the imaging of headache [7–13]. Patients presenting to the emergency department (ED) with headache often undergo CT to rule out serious diagnoses (e.g., hemorrhage, mass), which may or may not be indicated depending on patient risk factors and presentation [7–13]. How radiologists interpret those studies (variable diagnostic accuracy, variable positioning on an ROC curve, variable methods of reporting findings) may affect the probability that a patient undergoes further imaging, receives subspecialty consultation, is admitted to the hospital, receives an intervention, or is readmitted within a short period of time.

This study attempts to determine whether diagnostic radiologists impart variation into downstream resource use and patient outcome in ED patients undergoing CT for headache. If individual radiologists were found to be associated with variation in patient care, the knowledge could help identify potential radiology-specific quality metrics worthy of further exploration.

Materials and Methods

Institutional review board approval was obtained and informed consent waived for this HIPAA-compliant retrospective quality assurance cohort study. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines were used in the preparation of this article.

Study Population

The study population was composed of all adult (≥ 18 years) patients presenting with headache to the ED of a single quaternary academic medical center and imaged with CT of the head from January 2012 to October 2017. Eligible patients were identified by Current Procedural Terminology codes (70450, 70460, and 70470), billing codes, and a digital search of the electronic medical record. Headache had to be explicitly stated as the indication for imaging in the order for the CT examination. Patients with a chief complaint or imaging indication mentioning trauma were excluded; traumatic events for which trauma was not mentioned (e.g., fall) were not excluded. The minimum number of CT examinations read per radiologist was prospectively set at 10. Patients with more than one eligible encounter were included only once; only the first encounter was analyzed. There were no other exclusion criteria. This resulted in 25,596 unique adult ED patients undergoing head CT for headache from January 2012 to October 2017 (Table 1). CT examinations were interpreted by 55 attending radiologists with a wide range of experience and training (25 neuroradiologists, six emergency radiologists, and 24 radiologists of other specialties) who each reported between 11 and 2542 CT examinations in the study group (mean, 1469.8 ± 787.9 [SD] examinations). None of the six emergency radiologists included in the study had completed a fellowship in neuroradiology.
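For illustration only, the cohort-construction logic described above (adults imaged for headache, trauma indications excluded, one encounter per patient, and at least 10 interpreted examinations per radiologist) could be sketched as follows. The file name, column names, and pandas-based approach are assumptions for this example and are not drawn from the study.

```python
import pandas as pd

# Hypothetical encounter-level extract; file and column names are illustrative only.
encounters = pd.read_csv("ed_head_ct_encounters.csv", parse_dates=["ct_time"])

# Adults imaged for headache, excluding encounters whose indication mentions trauma.
cohort = encounters[
    (encounters["age"] >= 18)
    & encounters["indication"].str.contains("headache", case=False, na=False)
    & ~encounters["indication"].str.contains("trauma", case=False, na=False)
]

# One encounter per patient: keep only the first (earliest) encounter.
cohort = cohort.sort_values("ct_time").drop_duplicates("patient_id", keep="first")

# Require at least 10 interpreted examinations per attending radiologist.
reads_per_reader = cohort["radiologist_id"].value_counts()
cohort = cohort[cohort["radiologist_id"].isin(reads_per_reader[reads_per_reader >= 10].index)]
```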

TABLE 1:

Overall Study Population Details

Characteristic Value
Total patients 25,596 (100.0)
 Men 11,839 (46.2)
 Women 13,757 (53.7)
Patient age (y) 56.5 ± 20.6
Elixhauser comorbidity sum score 4.7 ± 11.6
History of
 Hydrocephalus 341 (1.3)
 Subdural hematoma 176 (0.7)
 Subarachnoid hemorrhage 170 (0.7)
 Neurosurgical consult within last 12 mo 477 (1.9)
Timing of CT scan
 Weekend 6712 (26.2)
 Weekday 18,884 (73.8)
 Daytime (3 am to 6 pm) 13,333 (52.1)
 Overnight (6 pm to 3 am) 12,263 (47.9)
Measurements at presentation in ED
 Triage scorea 2.4 ± 0.6
 Pain scoreb 4.2 ± 3.6
 No. of abnormal vital signs
  None 9623 (37.6)
  One 11,417 (44.6)
  Two or more 4507 (17.6)
 Body temperature (°F)c 98.1 ± 1.0
 Pulse (beats/min) 84.1 ± 18.8
 Blood pressure (mm Hg)
  Systolic 141.4 ± 29.8
  Diastolic 69.7 ± 20.9
 Functional oxygen saturation (%) 97.0 ± 2.7
 Respiratory rate (breaths/min) 18.0 ± 3.8
Laboratory value
 Hemoglobin (g/dL) 12.9 ± 2.3
 Creatinine (mg/dL) 1.1 ± 1.0
 WBC count (×10³/µL) 9.4 ± 10.3
 C-reactive protein (mg/L) 3.5 ± 6.8
 Hemoglobin A1c (%) 6.7 ± 1.9
ED attending physiciand
 Patients treated per physician 593.1 ± 334.3
 Experience (y) 12.4 ± 9.3
Attending radiologiste
 Neuroradiology fellowship trained 25 (45.5)
 CT scans read by a neuroradiologist 14,718 (57.5)
 CT scans interpreted per reader 1469.8 ± 787.9
 Experience (y) 13.7 ± 9.2

Note—Unless otherwise indicated, values are expressed as numbers with percentages in parentheses or mean ± SD. Inclusive sums may not total 25,596 due to missing data; percentages may not total 100 due to rounding. ED = emergency department.

a ED triage was scored as 1 (most urgent) to 5 (least urgent).
b Pain was scored as 0 (none) to 10 (high).
c Equivalent to 36.7°C ± 0.6°C.
d Total number of attending ED physicians was 81.
e Total number of attending radiologists was 55.

Data Collection

The electronic medical record was queried by the institutional research data office to identify relevant demographic data, variables, and outcome measures. The data collected are described in this section.

Demographic data—

Demographic data included patient sex and age.

Emergency department visit data—

ED visit data included the times of ED admission, ED discharge, inpatient or observation admission, and inpatient or observation discharge. Also recorded were the time the index CT was performed; the times laboratory data were obtained (closest to the index CT within 3 days before and 6 hours after for all data except hemoglobin A1c, for which the value closest to the CT within 1 year before or after was used); the date of the last neurosurgical consultation before the index CT (if any); the time of any brain MRI performed within 72 hours after the index CT; the time of any neurology or neurosurgery consultation performed after the index CT during the same encounter (i.e., ED stay and concurrent hospitalization combined); the time of any neurosurgical (including endovascular) intervention after the index CT during the same encounter; the time of hospital readmission (if any); and the time of death (if any). Temporal data were used to determine the timing of the index CT (e.g., weekend or weekday, daytime or overnight); whether a patient was admitted or discharged; when and whether laboratory data were obtained; whether a neurosurgical appointment occurred within the 12 months before the index CT; whether a brain MRI was performed within 72 hours after the index CT; whether a neurology consultation, neurosurgical consultation, or neurosurgical intervention (including endovascular) was obtained after the index CT but during the same encounter; and whether readmission or patient death occurred within 30 days.
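As a sketch of how the temporal flags described above could be derived from these timestamps, the following pandas snippet uses hypothetical column names (ct_time, mri_time, readmit_time) and toy values; it is illustrative only and is not the institution's actual extraction code.

```python
import pandas as pd

# Toy data; timestamps and column names are illustrative assumptions.
df = pd.DataFrame({
    "ct_time": pd.to_datetime(["2016-03-05 02:15", "2016-03-07 14:40"]),
    "mri_time": pd.to_datetime(["2016-03-06 09:00", None]),
    "readmit_time": pd.to_datetime([None, "2016-03-20 10:00"]),
})

hour = df["ct_time"].dt.hour
df["weekend"] = df["ct_time"].dt.dayofweek >= 5      # Saturday or Sunday
df["daytime"] = (hour >= 3) & (hour < 18)            # 3 am to 6 pm
df["overnight"] = ~df["daytime"]                     # 6 pm to 3 am

# Event-window flags relative to the index CT; missing (NaT) times evaluate to False.
df["mri_within_72h"] = (df["mri_time"] - df["ct_time"]) <= pd.Timedelta(hours=72)
df["readmit_within_30d"] = (df["readmit_time"] - df["ct_time"]) <= pd.Timedelta(days=30)
```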

Provider data—

The provider data included the ED attending physician’s experience in years (when multiple ED attending physicians were involved, the first ED attending physician assigned to the visit was used), the attending radiologist’s experience in years (at the time of each examination), attending radiologist’s fellowship training (if any), number of patients in the cohort cared for by each attending radiologist, and radiology resident involvement in examination interpretation.

Clinical data—

The clinical data included 30 Elixhauser comorbidities and patient history of hydrocephalus, prior subdural hematoma, and prior subarachnoid hemorrhage [4, 14–17]. Also recorded were vital signs (systolic and diastolic blood pressure, pulse, temperature, pulse oximetry, and respiratory rate); pain score on a scale of 0 (none) to 10 (worst pain of life); ED triage score on a scale of 1 (most urgent) to 5 (least urgent); and laboratory data measuring hemoglobin, serum creatinine, WBC count, C-reactive protein, and hemoglobin A1c [18].

Outcome measures—

Seven clinical and resource use outcome measures were targeted: brain MRI performed within 72 hours after the CT, neurology consultation performed after the CT but during the same encounter, neurosurgery consultation performed after the CT but during the same encounter, inpatient admission during the same encounter, readmission within 30 days, neurosurgical intervention (including endovascular) during the same encounter, and 30-day mortality. These outcomes were chosen to reflect indicators of resource use (MRI, consultations), hospitalization (admission, readmission), and patient outcome (intervention, mortality). “Encounter” was defined as the cumulative ED visit and (if any) inpatient stay linked to the index CT.

Elixhauser Comorbidities and Elixhauser Composite Score

Thirty Elixhauser comorbidities were collected through previously validated methods of International Classification of Diseases (ICD)-9 code analysis [14, 15]. All relevant ICD-9 codes assigned from the index encounter to 6 months prior were included. These data were transformed into a single composite score using established methods [4, 15–17]. This composite summary score is designed to control for baseline patient illness, with higher composite scores indicating greater comorbid disease burden.
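Mechanically, the composite is a weighted sum of binary comorbidity flags. The sketch below illustrates the mechanics only; the weights shown are placeholders and are not the published Elixhauser-based coefficients cited above.

```python
# Placeholder weights for illustration only; the study used previously published
# Elixhauser-based weighting schemes, not these values.
EXAMPLE_WEIGHTS = {
    "congestive_heart_failure": 7,
    "renal_failure": 5,
    "uncomplicated_diabetes": 0,
    "depression": -3,   # negative weights are possible, hence composite scores below zero
}

def composite_score(flags: dict[str, bool], weights: dict[str, int] = EXAMPLE_WEIGHTS) -> int:
    """Weighted sum of the comorbidities present for one patient."""
    return sum(weight for name, weight in weights.items() if flags.get(name, False))

print(composite_score({"congestive_heart_failure": True, "depression": True}))  # prints 4
```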

Data Analysis

Categoric data were presented using counts and percentages, and continuous data were presented using means and SDs. Intraclass correlation coefficients (ICCs) for all outcomes were estimated using unconditional random intercept models to assess the need to account for clustering effects of attending radiologists and attending ED physicians. The observed ICCs were small in magnitude, ranging from less than 0.01 to 0.01 for attending radiologists and from less than 0.01 to 0.1 for attending ED physicians. Because of this negligible nesting effect, conventional univariate and multivariable logistic regression modeling was performed.
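The article does not state the exact ICC formulation used; for a random-intercept logistic model, a common latent-scale approximation is the between-cluster intercept variance divided by itself plus the standard-logistic residual variance (π²/3), sketched below with an assumed variance value.

```python
import math

def latent_scale_icc(random_intercept_variance: float) -> float:
    """ICC for a random-intercept logistic model on the latent scale."""
    residual_variance = math.pi ** 2 / 3  # variance of the standard logistic distribution (~3.29)
    return random_intercept_variance / (random_intercept_variance + residual_variance)

# Example: an assumed between-radiologist intercept variance of 0.03 yields an
# ICC of about 0.009, in line with the "less than 0.01" magnitudes reported above.
print(round(latent_scale_icc(0.03), 3))
```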

The risk adjustment for variables thought to influence outcome included baseline risk (demographics, Elixhauser comorbidities), clinical factors (vital signs, ED triage and pain scores, laboratory data, hydrocephalus, prior intracranial hemorrhage, neurosurgical consultation within the last 12 months), and system factors (time of CT, attending physician experience, neuroradiology training). Multivariable models were built to analyze the effect of individual radiologists on subsequent outcomes. Predictors with p < 0.15 in univariate analysis were included in the stepwise multivariable models. Final model selection was based on the minimum Akaike information criterion (AIC). Radiology resident involvement in examination interpretation did not reach the threshold for inclusion in the final multivariable models.
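Model selection was performed in SAS; as a conceptual illustration only, a greedy forward-selection loop that minimizes AIC can be sketched with statsmodels. The function, outcome, and covariate names are hypothetical, and the univariate p < 0.15 screening step is assumed to have been applied to the candidate list beforehand.

```python
import statsmodels.formula.api as smf

def forward_select_by_aic(data, outcome, candidates, forced=("C(radiologist_id)",)):
    """Greedy forward selection: repeatedly add the candidate term that lowers AIC
    the most; stop when no candidate improves AIC. The radiologist term is forced
    into every model so its adjusted effect can be examined."""
    selected = list(forced)
    best_aic = smf.logit(f"{outcome} ~ {' + '.join(selected)}", data).fit(disp=0).aic
    remaining = list(candidates)
    improved = True
    while improved and remaining:
        improved = False
        aics = {
            term: smf.logit(f"{outcome} ~ {' + '.join(selected + [term])}", data).fit(disp=0).aic
            for term in remaining
        }
        best_term = min(aics, key=aics.get)
        if aics[best_term] < best_aic:
            best_aic = aics[best_term]
            selected.append(best_term)
            remaining.remove(best_term)
            improved = True
    return selected, best_aic

# Hypothetical usage with illustrative column names:
# terms, aic = forward_select_by_aic(df, "mri_within_72h",
#                                    ["age", "elixhauser_score", "triage_score"])
```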

To assess the overall significance of the effect of individual radiologists, the difference between the deviance statistics (−2 × log likelihood) of the final models with and without radiologist was calculated and tested. This deviance difference follows a chi-square distribution with degrees of freedom equal to the difference between the numbers of parameters estimated. Additionally, to better quantify the radiologist effect, the difference between the areas under the ROC curve (AUC) of the final models with and without radiologist was considered. Adjusted odds ratios and 95% CIs were calculated and presented graphically. Any p value less than 0.007 was considered significant for primary outcomes after Bonferroni correction (0.05 divided by seven outcome measures ≈ 0.007). All inference testing was performed using SAS version 9.4 (SAS Institute). Power estimation was performed using PASS version 16 software (NCSS Statistical Software).
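A minimal sketch of this deviance comparison, assuming two fitted nested logistic regression results from statsmodels (with and without the radiologist indicators), is shown below; the Bonferroni threshold of 0.05/7 ≈ 0.007 is applied as described.

```python
from scipy.stats import chi2

def radiologist_deviance_test(fit_with, fit_without, n_outcomes=7, alpha=0.05):
    """Likelihood ratio (deviance) test for the overall radiologist effect.
    fit_with / fit_without are fitted nested logistic regression results."""
    deviance_diff = 2 * (fit_with.llf - fit_without.llf)      # difference in -2 x log likelihood
    extra_params = fit_with.df_model - fit_without.df_model   # e.g., 54 indicators for 55 readers
    p_value = chi2.sf(deviance_diff, extra_params)
    significant = p_value < alpha / n_outcomes                # Bonferroni: 0.05 / 7 ~ 0.007
    return deviance_diff, extra_params, p_value, significant
```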

Post Hoc Subgroup and Sensitivity Analyses

CT report impressions of a random sample of approximately one-third (8433) of all included CT scans were manually reviewed by one study team member with 8 years of experience who was blinded to all other data. Report impressions were coded as positive (i.e., mass, acute or subacute ischemia, hemorrhage, hematoma, hydrocephalus, midline shift, infection, cerebral edema, anoxic injury, herniation, aneurysm, or acute or subacute fracture), indeterminate (i.e., a possible important finding such as questionable stroke, hemorrhage, or mass, or a finding of indeterminate significance), or negative (i.e., clinically unimportant findings).

Sensitivity analyses were conducted to determine whether the observed effects of radiologist variation on inpatient admission, readmission within 30 days, and 30-day mortality were robust to severity of pain (pain score of 9 or 10 vs 8 or less), severity of illness (ED triage score of 1 vs 2–5), and CT result type (positive, indeterminate, or negative). Sensitivity analyses based on pain scores and ED triage scores were performed on the entire dataset because that information was available for every patient. Sensitivity analyses based on CT result type were performed on the manually reviewed subgroup. In each case, multivariable logistic regression models were built to analyze the effect of individual radiologists on subsequent outcomes while considering the interaction of radiologist with the sensitivity variable.
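One way to implement such an interaction check is a likelihood ratio test comparing models with and without the radiologist-by-subgroup interaction, sketched below with statsmodels; the exact test used in the study is not specified, and the column names are hypothetical.

```python
from scipy.stats import chi2
import statsmodels.formula.api as smf

def radiologist_interaction_test(df, outcome, modifier, covariates):
    """Likelihood ratio test of a radiologist-by-modifier interaction, e.g.,
    modifier = 'severe_pain' (pain score 9 or 10) for the admission outcome.
    Outcome and modifier are assumed to be 0/1 columns; names are illustrative."""
    base = " + ".join(["C(radiologist_id)", modifier, *covariates])
    fit_main = smf.logit(f"{outcome} ~ {base}", df).fit(disp=0)
    fit_int = smf.logit(f"{outcome} ~ {base} + C(radiologist_id):{modifier}", df).fit(disp=0)
    stat = 2 * (fit_int.llf - fit_main.llf)
    dof = fit_int.df_model - fit_main.df_model
    return chi2.sf(stat, dof)
```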

To determine the effect of report language on the likelihood of subsequent imaging, the use of the term “MRI” in report impressions was modeled with respect to the likelihood of MRI within 72 hours. This analysis was performed on the overall cohort and also stratified by CT results (positive, indeterminate, negative) in the manually reviewed cohort.
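A simplified, unadjusted version of this report-language analysis is sketched below: flag impressions containing the term "MRI" and estimate the odds ratio for MRI within 72 hours. Column names are hypothetical, the outcome is assumed to be a 0/1 indicator, and stratified estimates would be obtained by subsetting to each CT result type before fitting.

```python
import numpy as np
import statsmodels.formula.api as smf

def mri_mention_odds_ratio(df):
    """Unadjusted odds ratio for MRI within 72 hours when the report impression
    mentions 'MRI' (case-insensitive). mri_within_72h is assumed to be 0/1."""
    df = df.copy()
    df["mri_mentioned"] = (
        df["impression_text"].str.contains("MRI", case=False, na=False).astype(int)
    )
    fit = smf.logit("mri_within_72h ~ mri_mentioned", df).fit(disp=0)
    odds_ratio = float(np.exp(fit.params["mri_mentioned"]))
    ci_low, ci_high = np.exp(fit.conf_int().loc["mri_mentioned"])
    return odds_ratio, (float(ci_low), float(ci_high))
```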

Power Calculation

A priori sample size estimation was informed by results from a previous study analyzing risk-adjusted radiologist effect on hospital admission for ED patients undergoing CT for right lower quadrant pain [4]. The power of the likelihood ratio test to reject the null hypothesis that radiologist has no effect on hospital admission was estimated. The effect size was measured as the magnitude of the chi-square statistics derived from final models inclusive and exclusive of radiologist [19]. A sample size of 11,500 would achieve 80% power to detect an overall radiologist effect size (w) of 0.04 with a significance level (α) of 0.05.
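This chi-square power calculation can be approximated outside PASS with the noncentral chi-square distribution, as sketched below; the degrees of freedom underlying the published estimate are not reported, so the value in the example is an illustrative assumption.

```python
from scipy.stats import chi2, ncx2

def lr_test_power(n, w, df, alpha=0.05):
    """Approximate power of a chi-square (likelihood ratio) test with
    noncentrality parameter n * w**2 at significance level alpha."""
    critical_value = chi2.ppf(1 - alpha, df)
    return ncx2.sf(critical_value, df, n * w ** 2)

# The article reports that n = 11,500 gives 80% power at w = 0.04 and alpha = 0.05;
# the df below is an assumption for illustration, not a value from the article.
print(lr_test_power(n=11_500, w=0.04, df=14))
```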

Results

Study Population

There were 25,596 included patients (one index CT per patient). A small majority (53.7% [13,757/25,596]) were women, and a large majority (98.1% [25,119/25,596]) had no neurosurgical history. Mean Elixhauser comorbidity sum score was 4.7 ± 11.6 (range, −29 to 84). On presentation to the ED, most patients had either zero (37.6% [9,623/25,596]) or one (44.6% [11,417/25,596]) abnormal vital signs (Table 1). Additional data are shown in Table 1. Model denominators varied because of missing data, but the minimum model sample size was 16,642 (neurology consultation) (Tables 2–4). Therefore, each model exceeded the sample size needed for 80% power at a radiologist effect size (w) of 0.04.

TABLE 2:

Final Multivariable Models for Outcomes Related to Resource Utilization

Covariate MRI ≤ 72 ha Neurology Consultationb Neurosurgical Consultationc

p Odds Ratio (95% CI) p Odds Ratio (95% CI) p Odds Ratio (95% CI)

Male sex 0.03 1.21 (1.02–1.43)
Elixhauser sum score < 0.001 1.02 (1.01–1.02) 0.03 1.01 (1.00–1.02)
Pain score < 0.001 0.95 (0.94–0.97) < 0.001 0.95 (0.95–0.95)
ED triage score < 0.001 0.67 (0.61–0.73) < 0.001 0.84 (0.84–0.84) < 0.001 0.71 (0.62–0.83)
Laboratory value
 Hemoglobin < 0.001 1.07 (1.05–1.10) < 0.001 1.08 (1.08–1.08) 0.002 0.95 (0.91–0.98)
 Creatinine < 0.001 0.81 (0.74–0.87) < 0.001 0.94 (0.94–0.94) 0.004 0.83 (0.73–0.94)
 WBC count 0.04 0.99 (0.98–1.00)
Body temperature 0.004 1.13 (1.04–1.22)
Pulse 0.005 1.00 (0.99–1.00)
Systolic blood pressure < 0.001 1.01 (1.01–1.01)
History of
 Hydrocephalus < 0.001 2.90 (1.88–4.47)
 Subarachnoid hemorrhage < 0.001 4.80 (2.93–7.89)
 Neurosurgical consult within last 12 mo < 0.001 2.46 (1.90–3.20) < 0.001 1.69 (1.68–1.70)
Time of CT scan
 Weekday vs weekend 0.003 1.20 (1.07–1.35) < 0.001 1.13 (1.13–1.13)
 Daytime vs overnight 0.001 1.24 (1.11–1.38)
Provider experience (y)
 Attending radiologist < 0.001 0.87 (0.83–0.90)
 Attending ED physician < 0.001 1.62 (1.62–1.62)
ED attending physician < 0.001 Variable (< 0.001 to > 999) < 0.001 Variable (0.08–4.50)
Attending radiologist
 All < 0.001 Variable (0.009–38.2) < 0.001 Variable (0.4–3.2) < 0.001 Variable (0.1–9.9)
 Neuroradiology fellowship trained < 0.001 1.3 (1.307–1.312)

Note—Data with variable odds ratios include the range of observed odds ratios for members of that class (e.g., attending radiologist). Any p value less than 0.007 was considered significant for hypothesis testing after Bonferroni correction. Dash (—) indicates that covariate did not enter final model. ED = emergency department.

a Total number of evaluable patients for this outcome was 18,826.
b Total number of evaluable patients for this outcome was 16,642.
c Total number of evaluable patients for this outcome was 20,702.

TABLE 4:

Final Multivariable Models for Outcomes Related to Neurosurgical Intervention and Death Within 30 Days

Covariate Neurosurgical Interventiona Death Within 30 db

p Odds Ratio (95% CI) p Odds Ratio (95% CI)

Patient age < 0.001 1.03 (1.02–1.03)
Elixhauser sum score < 0.001 1.03 (1.02–1.04) < 0.001 1.04 (1.03–1.04)
Pain score 0.01 0.94 (0.89–0.99)
ED triage score < 0.001 0.36 (0.31–0.41)
Laboratory value
 Hemoglobin < 0.001 1.20 (1.10–1.30) < 0.001 0.86 (0.83–0.88)
 Creatinine 0.006 1.08 (1.02–1.13)
 WBC count < 0.001 1.01 (1.01–1.01)
 Functional oxygen saturation < 0.001 0.94 (0.93–0.96)
Body temperature < 0.001 0.79 (0.75–0.83)
Pulse < 0.001 1.02 (1.02–1.02)
Respiratory rate 0.002 0.89 (0.83–0.96) 0.008 1.02 (1.00–1.03)
Systolic blood pressure < 0.001 0.99 (0.991–0.996)
Prior subdural hematoma 0.002 4.48 (1.71–11.77)
Neurosurgical consult within last 12 mo < 0.001 6.35 (3.69–10.97) < 0.001 2.13 (1.51–3.01)
Attending radiologist 0.04 Variable (0.2–13.9) 0.14 Variable (0.4–4.1)

Note—Data with variable odds ratios include the range of observed odds ratios for members of that class (e.g., attending radiologist). Any p value less than 0.007 was considered significant for hypothesis testing after Bonferroni correction. Dash (—) indicates covariate did not enter final model. ED = emergency department.

a Total number of evaluable patients for this outcome was 19,439.
b Total number of evaluable patients for this outcome was 20,408.

In this study, 57.5% (14,718/25,596) of CT interpretations were made by neuroradiologists, and the mean experience of the attending radiologists was 13.7 ± 9.2 years. CT scans were most commonly obtained on a weekday (73.8% [18,884/25,596]) and during the daytime (52.1% [13,333/25,596]). The study cohort was managed by 81 ED attending physicians who treated a mean of 593.1 ± 334.3 included patients and had a mean experience of 12.4 ± 9.3 years.

Multivariable Analyses

Individual model denominators varied because of incomplete covariate data (Tables 2–4). Of the included patients in each model, 10.8% (2033/18,826) underwent brain MRI 72 hours or less after the CT, 13.0% (2164/16,642) underwent neurology consultation, 2.7% (567/20,702) underwent neurosurgical consultation, 41.3% (7437/18,005) were admitted from the ED, 9.0% (1906/21,229) were readmitted within 30 days, 0.5% (103/19,439) underwent neurosurgical intervention, and 4.7% (956/20,408) died within 30 days.

After risk adjustment, individual radiologists were not found to be independent predictors of hospitalization (admission, p = 0.49; 30-day readmission, p = 0.30) or overall patient outcome (30-day mortality, p = 0.14; neurosurgical intervention, p = 0.04), but they were predictors of resource use, including brain MRI within 72 hours (p < 0.001; odds ratio [OR] range among radiologists, 0.009–38.2), neurology consultation (p < 0.001; OR range, 0.4–3.2), and neurosurgical consultation (p < 0.001; OR range, 0.1–9.9) (Figs. 1–3).

Fig. 1—Graph of multivariable odds ratios (squares) and 95% CIs (dashes) illustrates effect of individual radiologists on downstream brain MRI use within 72 hours after head CT for headache. Higher odds ratios indicate higher odds of brain MRI being performed. Data are plotted on log scale.

Fig. 3—Graph of multivariable odds ratios (squares) and 95% CIs (dashes) illustrates effect of individual radiologist on neurosurgical consultation after head CT for headache. Higher odds ratios indicate higher odds of neurosurgical consultation being obtained. Data are plotted on log scale.

In general, the overall magnitude of the radiologist effect was small but varied substantially among individual radiologists. The variance explained for brain MRI performed within 72 hours was 3.6% (95% CI, 3.3–3.9%), for neurology consultation it was 0.8% (95% CI, 0.7–1.0%), and for neurosurgical consultation it was 2.8% (95% CI, 2.6–3.0%). Figures 1–3 show risk-adjusted odds ratios and 95% CIs for the 55 radiologists and their individual effects on outcome.

The greatest effect of individual radiologist on outcome was for the likelihood of brain MRI within 72 hours after the index CT (Fig. 1). This outcome showed wide radiologist-level variation (maximum OR, 38.2 [95% CI, 16.9–86.6]; minimum OR, 0.009 [95% CI, 0.001–0.14]) that was not affected by subspecialty training (Table 2). However, the effect was mitigated by radiologist experience (p < 0.001; OR, 0.87; 95% CI, 0.83–0.90) (Table 2): for every additional year of attending radiologist experience, the odds of a brain MRI being performed within 72 hours after the index CT were 13% lower.

Although individual radiologists were also associated with the likelihood of neurology consultation (p < 0.001), the radiologist effect was small both overall (model variance explained, 0.8%) and at the individual radiologist level (OR range, 0.4–3.2), with only a few individual outliers (Fig. 2). For this outcome, neuroradiology training (p < 0.001; OR, 1.3; 95% CI, 1.307–1.312) increased the probability that a neurology consultation would be obtained, but radiologist experience did not (Table 2).

Fig. 2—Graph of multivariable odds ratios (squares) and 95% CIs (dashes) illustrates effect of individual radiologist on neurology consultation after head CT for headache. Higher odds ratios indicate higher odds of neurology consultation being obtained.

Radiologist variation explained 2.8% of model variance for neurosurgical consultation (p < 0.001), but adjusted odds ratios for individual radiologists were unstable with wide CIs, likely because of the rarity of the outcome (567 consults after 20,702 CT examinations) (Fig. 3). Neither neuroradiology training nor radiologist experience had a significant effect in this model (Table 2).

Post Hoc Subgroup and Sensitivity Analyses

Of 8433 manually reviewed CT report impressions, 15.6% (1319/8433) were positive, 5.9% (494/8433) were indeterminate, and 78.5% (6620/8433) were negative. There was no significant interaction of individual radiologist with the presence of severe pain (pain score 9 or 10), highly urgent ED acuity (triage score of 1), or CT result type (positive, indeterminate, or negative) for the outcomes of inpatient admission (p = 0.99, 0.90, 0.81, respectively), readmission within 30 days (p = 0.99, 0.99, 0.99, respectively), or 30-day mortality (p = 0.99, 0.35, 0.99, respectively), indicating that prevalence of disease and disease severity did not modulate the null effect of radiologist variation on these outcome measures.

MRI was mentioned in 50.3% (12,510/24,783) of CT report impressions. Presence of the term “MRI” in the CT report impression increased the likelihood of an MRI being performed within 72 hours. This was true overall (p < 0.001; OR, 1.8; 95% CI, 1.6–2.1) and for all CT result types: positive (p = 0.02; OR, 1.5; 95% CI, 1.1–2.0), indeterminate (p = 0.02; OR, 2.0; 95% CI, 1.1–3.8), and negative (p < 0.001; OR, 1.9; 95% CI, 1.6–2.3).

Discussion

We found that individual radiologists did not predict important clinical outcomes for ED patients undergoing CT for headache, regardless of disease prevalence or illness severity. This lack of effect was seen in negative studies as well as positive studies, in severely ill patients as well as patients with milder symptoms, and across a wide range of radiologist experience and training. However, radiologists did have an effect on downstream health care resource use. In particular, there was wide radiologist-level variation in the likelihood that a brain MRI would be performed within 72 hours after the index CT (OR range, 0.009–38.2). This relationship was not modulated by neuroradiology fellowship training but was affected by radiologist experience: for every additional year of attending radiologist experience, the odds that a patient would undergo a subsequent MRI were 13% lower. Notably, neither the individual ED physician nor the ED physician’s level of experience affected this outcome. Individual radiologists also predicted the likelihood of neurology and neurosurgery consultation, but these relationships were smaller in magnitude (neurology) or less stable (neurosurgery) than the association with additional imaging. In general, these relationships imply not only that radiologists are affecting downstream care activities, but that they are doing so inconsistently (homogeneous performance would not have produced significant results in our models).

Detecting differences in patient care that can be linked to radiologist variation is a first step toward uncovering the impact of radiologist performance, and opens the potential for meaningful performance improvement. Outcomes that are insensitive to radiologist differences (e.g., mortality within 30 days in this study) are unlikely to be affected by further standardization of radiologist performance, whereas outcomes that are sensitive to radiologist differences (e.g., MRI within 72 hours after head CT in this study) should be explored for possible targeting. Although these results are associations and causation has not been confirmed, if radiologists do affect the likelihood of subsequent imaging as suggested here, we should aim for radiologists to have a more consistent effect on use of that resource. Now that evidence of provider-level variation exists and this variation predicts differences in subsequent resource use, the relationship can be further explored. For example, potential explanations for the study’s findings could include variation in use of diagnostic certainty words in radiology reports, variable provider sensitivity and specificity, and nonstandard methods of making recommendations. Post hoc secondary analyses showed that mentioning MRI in the report impression, even for negative studies, approximately doubled the odds that an MRI would be performed within 72 hours.

Diagnostic radiologist quality indicators that link radiologist activity to patient outcome are rare, largely because of the difficulty in making measurements that accurately reflect the results of radiologist activity [1–6]. For example, in 2018, a retrospective cohort study of 2169 ED patients undergoing CT for right lower quadrant pain investigated the effect of individual radiologist on clinical outcomes (admission, readmission, surgery, aspiration, or drainage) and found no difference between radiologists within or across specialty (abdominal or emergency radiology) [4]. However, despite including 2169 patients, the authors concluded their sample size was only adequate to assess for moderate or large pairwise effect sizes. Effect size and lack of control groups (because all imaging has an attending radiologist interpretation) are key challenges in linking radiologist activity to patient care. In the current work, radiologist-level variation explained only 3.1% of the variance in the model that had the largest detectable effect (MRI within 72 hours).

The lack of differentiation between neuroradiologists and emergency radiologists for most of the outcomes we studied implies that, for ED patients with headache, overall patient outcomes are similar when care is provided by either group of radiologists. Of the seven outcomes we studied, neuroradiology training was associated with only one: CT examinations interpreted by neuroradiologists were more likely to be followed by neurology consultation. Whether this is because of differences in radiologist reporting or communication style or because of selection bias is unclear. ED examinations at our institution are interpreted by neuroradiologists from 3 am to 6 pm and by emergency radiologists from 6 pm to 3 am. An unknown collinear temporal bias related to neurologist availability might be affecting these results.

Although this study has shown that radiologists are associated with variation in resource use, the study design does not explain in detail why that occurs or whether the relationship is causal. Further work would be required to uncover those mechanisms. One possible method is to target the highest- and lowest-decile radiologists in models in which radiologist had an effect and then analyze sample reports from those individuals in a blinded randomized fashion with structured coding. That might inform whether differences in style, structure, wording, or accuracy could affect the likelihood that resources such as MRI or subspecialty consultation are subsequently used. For example, post hoc secondary analyses showed that simply including the term “MRI” in the report impression increased the odds that MRI would be obtained, regardless of the report contents. The current study design treats the report as a black box and uses the premise that if provider-level variation has no measurable effect on patient outcome, there is likely little benefit to targeting that process for provider-level improvement. Null effects on patient outcome should not be misinterpreted to mean that radiologists are expendable. If radiologists are necessary but performing at a consistent level, they would have no detectable effect in our models, because all patients underwent head CT and all patients had an attending radiologist interpretation of their imaging examination. Detecting an effect of individual radiologist on a given outcome indicates that the outcome is sensitive to radiologist input, but the inverse (no effect) is not necessarily true. Finally, other outcomes that may be sensitive to radiologist variation (e.g., time to treatment of an acute abnormality) were not studied; further study would be needed to address those.

In conclusion, radiologists with different skills, experience, and practice patterns appear interchangeable for major clinical outcomes when interpreting CT for headache in the ED, but their differences predict differential use of downstream health care resources. Neuroradiologists and emergency radiologists performed similarly from the point of view of the average patient. Resource use measures, particularly MRI within 72 hours after CT, are potential quality indicators in this cohort, although the overall contribution of radiologist variation to resource consumption is low. Including exculpatory language that mentions MRI in negative head CT reports probably increases the odds of MRI being performed. Further study to determine whether, why, and how radiologist variation contributes to variation in downstream care is needed to better inform directed quality improvement activities.

TABLE 3:

Final Multivariable Models for Outcomes Related to Hospital Admission and Readmission Within 30 Days

Covariate Inpatient Admissiona Readmission Within 30 db

p Odds Ratio (95% CI) p Odds Ratio (95% CI)

Male sex < 0.001 1.41 (1.31–1.51)
Patient age < 0.001 1.02 (1.02–1.02) < 0.001 1.01 (1.01–1.01)
Elixhauser sum score < 0.001 1.04 (1.03–1.04) < 0.001 1.02 (1.02–1.02)
Pain score 0.001 0.98 (0.97–0.99)
ED triage score < 0.001 0.51 (0.48–0.55)
Laboratory value
 Hemoglobin < 0.001 0.86 (0.84–0.87) < 0.001 0.91 (0.89–0.92)
 Creatinine < 0.001 1.24 (1.18–1.29) 0.01 1.05 (1.01–1.09)
 WBC count < 0.001 1.08 (1.07–1.09)
 Functional oxygen saturation < 0.001 0.93 (0.92–0.95)
Body temperature < 0.001 1.08 (1.04–1.13) 0.05 1.05 (1.00–1.10)
Pulse < 0.001 1.01 (1.01–1.02) 0.001 1.00 (1.00–1.01)
Respiratory rate < 0.001 1.03 (1.02–1.04)
Blood pressure
 Systolic 0.046 0.999 (0.998–1.000)
 Diastolic < 0.001 0.996 (0.995–0.998)
History of hydrocephalus < 0.001 1.76 (1.27–2.44)
Neurosurgical consult within last 12 mo < 0.001 5.31 (4.12–6.85) < 0.001 1.91 (1.48–2.46)
Attending radiologist 0.49 Variable (0.2–3.9) 0.30 Variable (0.2–2.5)

Note—Data with variable odds ratios include the range of observed odds ratios for members of that class (e.g., attending radiologist). Any p value less than 0.007 was considered significant for hypothesis testing after Bonferroni correction. Dash (—) indicates covariate did not enter final model. ED = emergency department.

a Total number of evaluable patients for this outcome was 18,005.
b Total number of evaluable patients for this outcome was 21,229.

Acknowledgments

Supported in part by NIH grants UL1TR00433 and UL1TR002240.

References

1. Larson DB, Donnelly LF, Podberesky DJ, Merrow AC, Sharpe RE Jr, Kruskal JB. Peer feedback, learning, and improvement: answering the call of the Institute of Medicine report on diagnostic error. Radiology 2017; 283:231–241
2. Davenport MS, Larson DB. Measuring diagnostic radiologists: what measurements should we use? J Am Coll Radiol 2019; 16:333–335
3. Rauscher GH, Murphy AM, Orsi JM, Dupuy DM, Grabler PM, Weldon CB. Beyond the mammography quality standards act: measuring the quality of breast cancer screening programs. AJR 2014; 202:145–151
4. Davenport MS, Khalatbari S, Ellis JH, Cohan RH, Chong ST, Kocher KE. Novel quality indicators for radiologists interpreting abdominopelvic CT images: risk-adjusted outcomes among emergency department patients with right lower quadrant pain. AJR 2018; 210:1292–1300
5. Itri JN, Raghavan K, Patel SB, et al. Developing quality measures for diagnostic radiologists: part 1. J Am Coll Radiol 2018; 15:1362–1365
6. Itri JN, Raghavan K, Patel SB, et al. Developing quality measures for diagnostic radiologists: part 2. J Am Coll Radiol 2018; 15:1366–1384
7. Callaghan BC, Kerber KA, Pace RJ, Skolarus LE, Burke JF. Headaches and neuroimaging: high utilization and costs despite guidelines. JAMA Intern Med 2014; 174:819–821
8. American College of Radiology website. Ten things physicians and patients should question. www.choosingwisely.org/societies/american-college-of-radiology. Updated June 29, 2017. Accessed October 28, 2018
9. Clarke CE, Edwards J, Nicholl DJ, Sivaguru A. Imaging results in a consecutive series of 530 new patients in the Birmingham Headache Service. J Neurol 2010; 257:1274–1278
10. Sempere AP, Porta-Etessam J, Medrano V, et al. Neuroimaging in the evaluation of patients with non-acute headache. Cephalalgia 2005; 25:30–35
11. Silberstein SD. Practice parameter: evidence-based guidelines for migraine headache (an evidence-based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2000; 55:754–762
12. American College of Radiology website. ACR Appropriateness Criteria: headache. acsearch.acr.org/docs/69482/Narrative/. Accessed October 28, 2018
13. Holle D, Obermann M. The role of neuroimaging in the diagnosis of headache disorders. Ther Adv Neurol Disord 2013; 6:369–374
14. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care 1998; 36:8–27
15. Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005; 43:1130–1139
16. van Walraven C, Austin PC, Jennings A, Quan H, Forster AJ. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med Care 2009; 47:626–633
17. Thompson NR, Fan Y, Dalton JE, et al. A new Elixhauser-based comorbidity summary measure to predict in-hospital mortality. Med Care 2015; 53:374–379
18. Agency for Healthcare Research and Quality website. Emergency Severity Index (ESI): a triage tool for emergency department care. www.ahrq.gov/sites/default/files/wysiwyg/professionals/systems/hospital/esi/esihandbk.pdf. Accessed January 6, 2019
19. NCSS website. PASS sample size software. ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Chi-Square_Tests.pdf. Accessed January 23, 2019
