Abstract
Objective
To evaluate the proportion of emergency departments (EDs) with sufficient volumes to measure pediatric misdiagnosis reliably.
Methods
We conducted a cross-sectional study of a nationally representative 20% sample of US EDs within the 2022 Nationwide Emergency Department Sample. We counted the number of visits by children (<18 years old) at each ED for each of 24 serious pediatric emergency conditions, as well as each ED’s total across all conditions. We calculated the proportion of EDs that could reliably measure misdiagnosis rates at least 10 percentage points worse than condition-specific national reference standards. We also calculated the proportion of children visiting measurable EDs.
Results
Reliable misdiagnosis measurement across all serious conditions was possible in 614/4515 EDs (13.6%, 95% confidence interval [CI] 11.5–15.9). Appendicitis was the most reliably measurable condition (n=530 EDs, 11.7%, 95% CI 9.8–14.0), while complicated pneumonia (n=33, 0.7%, 95% CI 0.3–1.5), testicular torsion (n=29, 0.6%, 95% CI 0.2–1.4), and intussusception (n=25, 0.6%, 95% CI 0.2–1.2) were less frequently measurable. The 20 other included conditions were not reliably measurable in any ED (0.0%, 95% CI 0.0–0.4). EDs in the Midwest, non-metropolitan EDs, and EDs evaluating <1800 children/year were least likely to support reliable measurement. Among 185490 children with a serious condition, 130894 (70.6%) visited an ED in which misdiagnosis was measurable.
Conclusions
Few EDs have sufficient pediatric volumes to reliably measure diagnostic accuracy generally, and even fewer can do so for individual conditions. Aggregation of EDs could improve power to measure misdiagnosis.
INTRODUCTION
Background
Timely diagnosis of serious conditions improves outcomes.1 Delayed diagnosis is common in medicine, with as many as 73,000 attributable deaths in the United States each year.2,3 Approximately 3% of children in specialized pediatric emergency departments (EDs) experienced misdiagnoses.4 However, most children in the US are treated in general, non-pediatric EDs. More than half of US EDs evaluate fewer than 5 children per day and report low readiness to care for children.5 Misdiagnosis is substantially more common in EDs with less pediatric experience, with possible delays in diagnosis occurring 27% less frequently for every 2-fold increase in pediatric volume.6
Importance
There is growing interest in developing quality metrics of misdiagnosis as a means of improving diagnostic excellence.7,8 The criterion standard for assessing misdiagnoses relies on manual record review.9,10 Trigger tools, electronic screening approaches that identify high-risk cases (such as revisits), can reduce the review burden.4,11,12 Unfortunately, conducting such reviews remains infeasible in most EDs, as few have comprehensive programs to perform such evaluations.13 The use of administrative data for diagnostic quality metrics is attractive because it obviates the need for manual review.14
If administrative data quality metrics were implemented, each ED would need sufficient case volumes of patients at risk of misdiagnosis. However, given the low volumes of children treated by most EDs, whether there is sufficient volume to support measurement is unclear. This investigation would inform the direction of future efforts to measure misdiagnosis in pediatric emergency care.
Goals of This Investigation
Our objectives were (1) to evaluate the proportion of US EDs with sufficient volumes to support accurate diagnostic quality measurement across a range of serious pediatric emergency conditions, and (2) to determine how many US children visit EDs with sufficient volumes. We hypothesized that few US EDs would have sufficient volumes to support measurement for most serious conditions.
METHODS
Study Design and Setting
We conducted a cross-sectional study of US EDs using all data from the 2022 Nationwide Emergency Department Sample. This database is a nationally representative sample of 20% of US EDs, with complete data from each of the sampled EDs. The database can be used to make national estimates of the number of EDs.
Selection of Participants
We analyzed ED visits among children < 18 years old with a serious pediatric emergency condition. Conditions were chosen based on their propensity for missed diagnosis in children, the likelihood that missed diagnoses are associated with worse outcomes, and their acute presentation (as opposed to missed chronic conditions such as cancer). The 24 conditions we included were appendicitis, atraumatic cranial hemorrhage, bacterial meningitis, compartment syndrome, complicated pneumonia, craniospinal abscess, deep neck infection, ectopic pregnancy, encephalitis, intussusception, Kawasaki disease, mastoiditis, myocarditis, necrotizing fasciitis, orbital cellulitis, osteomyelitis, ovarian torsion, pulmonary embolism, pyloric stenosis, septic arthritis, sinus venous thrombosis, slipped capital femoral epiphysis, stroke, and testicular torsion. The condition set was used in a recent study of misdiagnosis.6 Each eligible visit was identified based on having a diagnosis code in any position for the condition (Supplemental Table 1). Finally, we also used a composite of all serious conditions.
Outcome
The primary outcome was measurability: whether misdiagnoses for a given condition could be measured reliably in a specific ED. A condition was considered measurable if an ED had enough cases to detect whether the misdiagnosis rate was significantly higher than a reference rate. We derived reference rates from a prior 8-state study that measured rates of possible delayed diagnoses.6,15 For example, 12.8% of children with pyloric stenosis had a possible missed diagnosis, which was the reference rate in this study for pyloric stenosis.
We assessed whether each ED had enough cases of each condition to detect a 10 percentage-point increase in misdiagnoses, which we defined as the minimum clinically meaningful difference. This threshold corresponds to a number needed to harm (NNH) of 10;16 that is, for every 10 patients with a condition, one more would experience a misdiagnosis. For example, for pyloric stenosis, an ED’s misdiagnosis rate would be considered measurably worse than the reference rate if the lower bound of the 95% binomial confidence interval (CI) around an observed misdiagnosis rate of 22.8% was above 12.8% (Figure 1). These CIs were not weighted, because the database includes all visits for each sampled ED. A complete description of this approach appears in the Supplemental Methods, Section A.
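The per-ED volume threshold implied by this criterion can be sketched in code. This is an illustrative reconstruction, not the study’s code: the study used exact binomial CIs in R, whereas this sketch substitutes the Wilson score interval as a close, dependency-free approximation; the 12.8% reference rate and 10 percentage-point difference come from the pyloric stenosis example above.

```python
import math

def wilson_lower(p_hat: float, n: int, z: float = 1.96) -> float:
    """Lower bound of the 95% Wilson score interval for a proportion.

    Stand-in for the exact binomial CI used in the study.
    """
    denom = 1 + z**2 / n
    center = p_hat + z**2 / (2 * n)
    spread = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (center - spread) / denom

def min_measurable_volume(reference_rate: float, delta: float = 0.10,
                          max_n: int = 100_000) -> int:
    """Smallest case count at which an observed rate of
    reference_rate + delta has a 95% CI lower bound above the
    reference rate (the measurability criterion described above)."""
    observed = min(reference_rate + delta, 1.0)
    for n in range(1, max_n + 1):
        if wilson_lower(observed, n) > reference_rate:
            return n
    raise ValueError("no attainable volume within max_n")

# Worked example from the text: pyloric stenosis, 12.8% reference rate.
# Measurability requires enough cases that a CI around 22.8% excludes 12.8%.
print(min_measurable_volume(0.128))
```

Under this approximation, the pyloric stenosis example implies a minimum annual volume on the order of a few dozen cases per ED, which illustrates why measurability is rare for low-incidence conditions.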
Figure 1: Simulated measurability determinations.
Approach for determining measurability of misdiagnosis at each hospital, shown for example simulated hospitals. Pyloric stenosis has a 12.8% reference rate of misdiagnosis. The primary study outcome was whether a hospital could reliably measure misdiagnoses occurring 10 percentage points more often than the reference rate (22.8% for pyloric stenosis). Measurability was present when a hospital’s volume of pyloric stenosis was high enough that the 95% confidence interval around 22.8% did not cross 12.8%.
The primary outcome was measurability of the composite of all conditions, and condition-specific measurability proportions were secondary outcomes.
Variables
The exposure variable was condition-specific volume, defined as the number of encounters per condition in 2022 by ED. We also extracted ED demographics: overall pediatric volume (in National Pediatric Readiness Program categories: <1800; 1800–4999; 5000–9999; or ≥10000 visits in 2022)5, urban-rural status (metropolitan teaching, metropolitan non-teaching, or non-metropolitan), trauma level, for-profit status, and US region.
Analysis
We first described ED demographics with proportions. For the composite of all conditions, we calculated the proportion of EDs that achieved measurability and the survey-weighted 95% CI (Supplemental Methods, Section B), then repeated this for each serious condition. To evaluate how measurable misdiagnosis would be in subtypes of EDs, we evaluated measurability by region, urban-rural status, and pediatric volume. We then explored the sensitivity of measurability rates to the NNH cutoff of 10. We varied NNH cutoffs from 1 (extremely insensitive for misdiagnosis) to 100 (extremely sensitive to misdiagnosis) and evaluated what proportion of EDs would achieve measurability. Finally, we determined what proportion of visits for each condition occurred in EDs with measurability. For this visit-level analysis, survey-weighted 95% CIs were calculated accounting for the stratified sample.
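The NNH sweep maps directly onto the detectable difference: an NNH cutoff of k corresponds to detecting a 1/k absolute (percentage-point) increase over the reference rate, so NNH 1 requires detecting only a 100-point increase while NNH 100 requires detecting a 1-point increase. A minimal sketch of this relationship, again substituting a Wilson score approximation for the study’s exact binomial CIs and using the 12.8% pyloric stenosis reference rate from the Methods as an illustrative input:

```python
import math

def wilson_lower(p_hat, n, z=1.96):
    # Lower bound of the 95% Wilson score interval (approximation to the
    # exact binomial CI used in the study).
    denom = 1 + z**2 / n
    center = p_hat + z**2 / (2 * n)
    spread = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (center - spread) / denom

def required_cases(reference_rate, nnh, max_n=1_000_000):
    # An NNH cutoff of k corresponds to detecting a 1/k absolute increase
    # in the misdiagnosis rate (NNH 10 -> 10 percentage points).
    observed = min(reference_rate + 1.0 / nnh, 1.0)
    for n in range(1, max_n + 1):
        if wilson_lower(observed, n) > reference_rate:
            return n
    return None

# Illustrative sweep using the 12.8% reference rate from the Methods;
# lower NNH cutoffs (grosser harm) require far fewer cases per ED.
for nnh in (1, 5, 10, 20, 100):
    print(nnh, required_cases(0.128, nnh))
```

The required case count grows rapidly as the NNH cutoff rises, consistent with fewer EDs achieving measurability at more sensitive thresholds.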
We conducted a sensitivity analysis in which we tested four different reference misdiagnosis rates using previously-applied methods6,15: (1) the possible misdiagnosis rate derived only from EDs with at least 10000 pediatric visits per year, (2) the estimated possible misdiagnosis rate of an ED with 50000 pediatric visits per year, (3) reference misdiagnosis rates 25% lower than the original, and (4) reference rates 25% higher than the original. The latter two rate sets were chosen to represent a spectrum of possible reference rates in case the 8-state study suffered from selection bias relative to a true national reference rate.
Analyses were performed in R 4.5.0 (R Foundation, Vienna, Austria). All counts and statistics were weighted except where noted. Visit-level statistics were performed using the survey package.17
RESULTS
Characteristics of Study Subjects
We analyzed 993 sampled EDs corresponding to 4515 weighted EDs nationally. Characteristics of EDs are displayed in Table 1. Among 185490 weighted visits for a serious condition, the 3 most common conditions were appendicitis (n=102954, 54.3%), complicated pneumonia (n=13343, 7.1%), and intussusception (n=12675, 6.7%). Ten conditions had total volumes under 2000 visits/year.
TABLE 1:
Characteristics of hospitals, including both weighted national estimates and unweighted observations.
Pediatric volume | National estimate N=4515, n (%), 95% CI | Unweighted count N=993, n (%) |
---|---|---|
<1,800 | 1648 (36.5) | 346 (34.8) |
1,800–4,999 | 1464 (32.4) | 325 (32.7) |
5,000–9,999 | 794 (17.6) | 183 (18.4) |
10,000 or more | 609 (13.5) | 139 (14.0) |
Region | ||
Northeast | 547 (12.1) | 117 (11.8) |
Midwest | 1348 (29.9) | 301 (30.3) |
South | 1715 (38.0) | 370 (37.3) |
West | 905 (20.0) | 205 (20.6) |
Trauma level | ||
L1 | 255 (5.6) | 65 (6.5) |
L2 | 327 (7.2) | 87 (8.8) |
L3 | 443 (9.8) | 113 (11.4) |
L1 or 2 (anonymized) | 22 (0.5) | 6 (0.6) |
Non-trauma | 3468 (76.8) | 722 (72.7) |
Urban-rural status | ||
Metro non-teaching | 1163 (25.8) | 260 (26.2) |
Metro teaching | 1562 (34.6) | 355 (35.8) |
Non-metro | 1790 (39.6) | 378 (38.1) |
Control* | ||
Government | 872 (19.3) | 200 (20.1) |
Non-profit | 2344 (51.9) | 495 (49.8) |
For profit | 654 (14.5) | 155 (15.6) |
Missing | 645 (14.3) | 143 (14.4) |
*Ownership was missing in 14.3% of cases
Main Results
Misdiagnosis of any serious condition was measurable in 614/4515 (13.6%, 95% CI 11.5–15.9) EDs (Table 2). The most commonly measurable conditions were appendicitis, complicated pneumonia, testicular torsion, and intussusception. None of the other 20 conditions was measurable in any ED.
TABLE 2:
Proportion (and 95% confidence interval) of emergency departments with sufficient volumes of children to be able to detect a number needed to harm of 10. A number needed to harm of 10 indicates that one additional child would experience misdiagnosis compared with a reference misdiagnosis rate derived from all emergency departments.
Serious condition | n (%, 95% CI) |
---|---|
All conditions | 614 (13.6%, 11.5–15.9) |
Appendicitis | 530 (11.7%, 9.8–14.0) |
Complicated pneumonia | 33 (0.7%, 0.3–1.5) |
Testicular torsion | 29 (0.6%, 0.2–1.4) |
Intussusception | 25 (0.6%, 0.2–1.2) |
Each other condition* | 0 (0.0%, 0.0–0.4) |
*Atraumatic cranial hemorrhage, bacterial meningitis, compartment syndrome, craniospinal abscess, deep neck infection, ectopic pregnancy, encephalitis, Kawasaki disease, mastoiditis, myocarditis, necrotizing fasciitis, orbital cellulitis, osteomyelitis, ovarian torsion, pulmonary embolism, pyloric stenosis, septic arthritis, sinus venous thrombosis, slipped capital femoral epiphysis, and stroke
More EDs could measure misdiagnosis with lower thresholds of NNH, meaning those EDs could detect only higher rates of misdiagnosis. Using an NNH threshold of 5, 38.9% of EDs were measurable; at a threshold of 20, 3.6% were measurable (Figure 2). Measurability significantly differed by region (lowest in the Midwest), urban-rural status (lowest in non-metropolitan EDs), and pediatric volume (lowest in EDs with <1800 pediatric visits/year) (Figure 3).
Figure 2: Measurability by harm threshold.
Proportion (and, in blue, 95% confidence interval) of emergency departments with sufficient volumes of children with any serious condition to detect misdiagnosis. Proportions were calculated for number needed to harm detection thresholds ranging from 1 (every child with a serious condition at the emergency department experiences misdiagnosis) to 100 (for every 100 children who visit the emergency department with a serious condition, 1 additional child experiences misdiagnosis compared to the reference rate).
Figure 3: Measurability by hospital type.
Proportions (and 95% confidence intervals) of emergency departments that could precisely measure misdiagnosis rates by region, urban-rural status, and volume.
Among 185490 children with a serious condition, 130894 (70.6%, 95% CI 64.9–75.8) visited an ED in which misdiagnosis was measurable. Only four specific conditions had any children who visited an ED able to measure that condition precisely: appendicitis (60.9%, 95% CI 54.7–66.7), complicated pneumonia (27.7%, 95% CI 13.4–46.3), intussusception (21.7%, 95% CI 8.8–40.5), and testicular torsion (17.4%, 95% CI 7.1–33.0). For the other 20 conditions, no children visited an ED that could measure misdiagnosis precisely enough to detect an NNH of 10.
Measurability rates were similar under the four alternative reference standards. Using reference standards derived from EDs with volumes >10000 children, 676 (15.0%) EDs were measurable; using reference standards derived from EDs with an estimated volume of 50000, 569 (12.6%) were measurable. Reference standards 25% lower than the original values resulted in 741 (16.4%) EDs being measurable, while values 25% higher resulted in 549 (12.2%) being measurable.
LIMITATIONS
The main limitation was possible selection bias. We could only evaluate children with diagnosed conditions, meaning the ED had to be aware that the patient had the condition. Because missed diagnoses are more common at low-volume EDs, our study was likely biased toward undercounting children with serious conditions. However, this bias would be unlikely to substantially alter the results, as missed diagnoses would have to be many-fold more common than timely diagnoses to affect our conclusions. Second, as an additional contributor to selection bias, our list of conditions, while broad, was not comprehensive; for example, it excluded chronic conditions. By definition, adding more conditions would increase all-condition measurability by increasing denominator size. Third, there was possible misclassification bias, as some children with a diagnosis code for a serious condition may not have had it. This would tend to bias the study toward overcounting, though we intentionally chose conditions where this was unlikely; correcting it would only decrease the number of measurable EDs, which would not change our conclusions. Finally, we did not measure true misdiagnosis rates but rather projected typical rates based on a prior multistate study.
DISCUSSION
Misdiagnoses among a set of serious pediatric emergency conditions are precisely measurable at few US EDs, although most children with these conditions visit those EDs. Among the 13.6% of EDs that have sufficient volumes of children to support measurement in aggregate, appendicitis is the main driver of serious condition volume. Fewer than 1% of US EDs have sufficient volume to measure misdiagnoses in any other serious condition we evaluated. This study illustrates the central contradiction in pediatric diagnostic quality improvement nationally. Each low-volume ED individually sees too few children to measure quality. Yet, most EDs are low-volume, and in aggregate have substantially higher rates of misdiagnosis than high-volume EDs.6
Our study shows that misdiagnoses in appendicitis are the best single-condition pediatric diagnostic quality metric, because they are by far the most common. It is probably the only condition in which some higher-volume general EDs could individually be evaluated. Appendicitis is also appealing as a potential target because there has been substantial work to understand diagnosis and misdiagnoses in appendicitis.18–24 Prior work has also used administrative data to detect these misdiagnoses with high accuracy.25
More generally, our findings indicate that widespread measurement of pediatric misdiagnosis is only possible in US EDs by increasing the denominator of children evaluated. There are several potential avenues to do this. First, a larger set of conditions could be evaluated. We created our set of conditions by focusing on acute conditions we believe can be accurately identified in administrative data, but there may be others. For example, we avoided conditions, such as sepsis, in which misdiagnoses are known not to be accurately evaluated using administrative data.26 We also avoided chronic conditions such as new cancer diagnoses; however, a broader composite measure of misdiagnosis could include some. Second, rather than evaluating misdiagnosis rates in single EDs, they could be evaluated at higher levels of aggregation, such as within hospital systems that have a single administrative structure, or within whole regions, wherein large children’s hospitals often play a regional role in coordinating care. Third, EDs could measure over a longer period of time to accrue more cases. In this study we used a year of data, which is already a relatively long period for quality efforts; extending it further would limit the usefulness of measures. Fourth, measuring misdiagnosis rates among clinicians or clinician groups rather than institutions could provide measurable and actionable data. Finally, lower NNH benchmarks could be used. We showed that decreasing the NNH threshold to 5 doubled the number of EDs that could measure all-condition misdiagnoses. However, that increased power to detect misdiagnoses carries a tradeoff: only very large harms would be detectable.
Serious safety events involving misdiagnoses tend to lead to more serious harm than those involving other contributors.27 Pediatric health system leaders recognize the risks that misdiagnoses pose to the children they serve and share frustration regarding the ability to measure them for improvement.28 National healthcare quality organizations have identified institutional diagnostic safety and quality measurement priorities to begin learning how to address these risks. Recently, the US Centers for Disease Control and Prevention published a list of core elements necessary for building diagnostic excellence programs, including actions to identify, monitor, and learn from diagnostic safety events.29 The National Quality Forum (NQF) published a list of 62 measure concepts in 2017, including three focused on organizational diagnostic performance: diagnostic accuracy in key clinical sites including the ED; establishing mechanisms for capturing and measuring changes in diagnosis; and access to appropriate testing for the most commonly encountered conditions.30 Yet operationalizing these measurement concepts remains challenging.
In 2023, NQF convened an expert working group to address measurement barriers, chief among them “identifying data standards need to speed interoperability,” indicating that implementation of measurement strategies has been slow to emerge.31 The Agency for Healthcare Research and Quality has developed a diagnostic safety Common Format for Event Reporting that could provide an interoperable standard for aggregating and analyzing diagnostic safety events.32 However, usability testing highlighted the need for refinement for applicability across care settings, and some safety experts question the ability to implement measurement strategies that require significant resource allocation.33 Methods using administrative data to capture diagnostic safety events show promise in achieving interoperability by leveraging existing data captured for other purposes while also reducing the resources necessary to identify such events.14 Our data underscore that such interoperability may be challenging to achieve if individual institutions cannot identify safety signals due to low signal-to-noise ratios or insufficient statistical power.
Based on these findings, future work should focus on (1) expanding the set of conditions in which misdiagnoses are measured and developing accurate methods of measurement; (2) testing higher levels of aggregation for diagnostic quality measurement, including hospital systems and regions; (3) educational innovations such as those addressing cognitive bias or decision-making strategies34–38; (4) testing whether broad multi-condition diagnostic excellence initiatives improve pediatric diagnosis; (5) evaluating whether readiness efforts39 and pediatric quality initiatives40 are associated with diagnostic quality; and (6) developing and testing approaches that could improve diagnostic quality, such as dissemination of clinical pathways, teleconsultation, and cautious integration of artificial intelligence.19,41–46
In summary, few US EDs have sufficient volumes of children with serious conditions to precisely measure diagnostic accuracy. Decreasing misdiagnoses cannot rely on single-ED measures of diagnostic quality without substantial tradeoffs. Future measurement efforts must innovate beyond single-ED, single-condition rates.
Supplementary Material
REFERENCES
1. Marshall TL, Rinke ML, Olson APJ, Brady PW. Diagnostic Error in Pediatrics: A Narrative Review. Pediatrics. 2022;149(Supplement 3). doi: 10.1542/peds.2020-045948d
2. Committee on Diagnostic Error in Health Care, Board on Health Care Services, Institute of Medicine, The National Academies of Sciences, Engineering, and Medicine. Improving Diagnosis in Health Care. (Balogh EP, Miller BT, Ball JR, eds.). National Academies Press; 2015. doi: 10.17226/21794
3. Singh H, Meyer AND, Thomas EJ. The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations. BMJ Qual Saf. 2014;23(9):727. doi: 10.1136/bmjqs-2013-002627
4. Mahajan P, White E, Shaw K, et al. Epidemiology of diagnostic errors in pediatric emergency departments using electronic triggers. Acad Emerg Med. 2025;32(3):226–245. doi: 10.1111/acem.15087
5. Remick KE, Hewes HA, Ely M, et al. National Assessment of Pediatric Readiness of US Emergency Departments During the COVID-19 Pandemic. JAMA Netw Open. 2023;6(7):e2321707. doi: 10.1001/jamanetworkopen.2023.21707
6. Michelson KA, Rees CA, Florin TA, Bachur RG. Emergency Department Volume and Delayed Diagnosis of Serious Pediatric Conditions. JAMA Pediatr. 2024;178(4). doi: 10.1001/jamapediatrics.2023.6672
7. Yang D, Fineberg HV, Cosby K. Diagnostic Excellence. JAMA. 2021;326(19):1905. doi: 10.1001/jama.2021.19493
8. Bradford A, Tran A, Ali KJ, et al. Evaluation of Measure Dx, a Resource to Accelerate Diagnostic Safety Learning and Improvement. J Gen Intern Med. 2025;40(4):782–789. doi: 10.1007/s11606-024-09132-8
9. Singh H, Sittig DF. Advancing the science of measurement of diagnostic errors in healthcare: the Safer Dx framework. BMJ Qual Saf. 2015;24(2):103. doi: 10.1136/bmjqs-2014-003675
10. Singh H, Khanna A, Spitzmueller C, Meyer AND. Recommendations for using the Revised Safer Dx Instrument to help measure and improve diagnostic safety. Diagnosis. 2019;6(4):315–323. doi: 10.1515/dx-2019-0012
11. Lam D, Dominguez F, Leonard J, Wiersma A, Grubenhoff JA. Use of e-triggers to identify diagnostic errors in the paediatric ED. BMJ Qual Saf. 2022;31(10):735–743. doi: 10.1136/bmjqs-2021-013683
12. Vaghani V, Gupta A, Mir U, et al. Implementation of Electronic Triggers to Identify Diagnostic Errors in Emergency Departments. JAMA Intern Med. 2025;185(2):143. doi: 10.1001/jamainternmed.2024.6214
13. Campione Russo A, Tilly J, Kaufman L, et al. Hospital commitments to address diagnostic errors: An assessment of 95 US hospitals. J Hosp Med. 2025;20(2):120–134. doi: 10.1002/jhm.13485
14. Michelson KA, Bachur RG. The High Value of Blurry Data in Improving Pediatric Emergency Care. Hosp Pediatr. 2019;9(12):1007–1009. doi: 10.1542/hpeds.2019-0200
15. Michelson KA, Bachur RG, Rangel SJ, Monuteaux MC, Mahajan P, Finkelstein JA. Emergency Department Volume and Delayed Diagnosis of Pediatric Appendicitis: A Retrospective Cohort Study. Ann Surg. 2023;278(6):833–838. doi: 10.1097/SLA.0000000000005972
16. Citrome L, Ketter TA. When does a difference make a difference? Interpretation of number needed to treat, number needed to harm, and likelihood to be helped or harmed. Int J Clin Pract. 2013;67(5):407–411. doi: 10.1111/ijcp.12142
17. Lumley T. Analysis of Complex Survey Samples. J Stat Softw. 2004;9(8):1–19. doi: 10.18637/jss.v009.i08
18. Cotton DM, Vinson DR, Vazquez-Benitez G, et al. Validation of the Pediatric Appendicitis Risk Calculator (pARC) in a Community Emergency Department Setting. Ann Emerg Med. 2019;74(4):471–480. doi: 10.1016/j.annemergmed.2019.04.023
19. Kharbanda AB, Vazquez-Benitez G, Ballard DW, et al. Effect of Clinical Decision Support on Diagnostic Imaging for Pediatric Appendicitis. JAMA Netw Open. 2021;4(2):e2036344. doi: 10.1001/jamanetworkopen.2020.36344
20. Bachur RG, Levy JA, Callahan MJ, Rangel SJ, Monuteaux MC. Effect of Reduction in the Use of Computed Tomography on Clinical Outcomes of Appendicitis. JAMA Pediatr. 2015;169(8):755–760. doi: 10.1001/jamapediatrics.2015.0479
21. Bachur RG, Callahan MJ, Monuteaux MC, Rangel SJ. Integration of Ultrasound Findings and a Clinical Score in the Diagnostic Evaluation of Pediatric Appendicitis. J Pediatr. 2015;166(5):1134–1139. doi: 10.1016/j.jpeds.2015.01.034
22. Mahajan P, Basu T, Pai CW, et al. Factors Associated With Potentially Missed Diagnosis of Appendicitis in the Emergency Department. JAMA Netw Open. 2020;3(3):e200612. doi: 10.1001/jamanetworkopen.2020.0612
23. Michelson KA, Reeves SD, Grubenhoff JA, et al. Clinical Features and Preventability of Delayed Diagnosis of Pediatric Appendicitis. JAMA Netw Open. 2021;4(8):e2122248. doi: 10.1001/jamanetworkopen.2021.22248
24. Staab S, Black T, Leonard J, Bruny J, Bajaj L, Grubenhoff JA. Diagnostic Accuracy of Suspected Appendicitis. Pediatr Emerg Care. 2022;38(2):e690–e696. doi: 10.1097/pec.0000000000002323
25. Michelson KA, Bachur RG, Dart AH, et al. Identification of delayed diagnosis of paediatric appendicitis in administrative data: a multicentre retrospective validation study. BMJ Open. 2023;13(2):e064852. doi: 10.1136/bmjopen-2022-064852
26. Michelson KA, Bachur RG, Grubenhoff JA, et al. Outcomes of Missed Diagnosis of Pediatric Appendicitis, New-Onset Diabetic Ketoacidosis, and Sepsis in Five Pediatric Hospitals. J Emerg Med. 2023;65(1):e9–e18. doi: 10.1016/j.jemermed.2023.04.006
27. Burrus S, Hall M, Tooley E, Conrad K, Bettenhausen JL, Kemper C. Factors Related to Serious Safety Events in a Children’s Hospital Patient Safety Collaborative. Pediatrics. 2021;148(3):e2020030346. doi: 10.1542/peds.2020-030346
28. Hoffman JM, Keeling NJ, Forrest CB, et al. Priorities for Pediatric Patient Safety Research. Pediatrics. 2019;143(2):e20180496. doi: 10.1542/peds.2018-0496
29. Morgan DJ, Singh H, Srinivasan A, Bradford A, McDonald LC, Kutty PK. CDC’s Core Elements to promote diagnostic excellence. Diagnosis. Published online November 28, 2024. doi: 10.1515/dx-2024-0163
30. National Quality Forum. Improving Diagnostic Quality and Safety Final Report. Published online September 2017. Accessed May 1, 2025. https://www.qualityforum.org/Publications/2017/09/Improving_Diagnostic_Quality_and_Safety_Final_Report.aspx
31. National Quality Forum. Advancing Measurement of Diagnostic Excellence for Better Healthcare. Published online September 2024. Accessed May 1, 2025. https://www.qualityforum.org/ProjectMaterials.aspx?projectID=98290
32. Bradford A, Shahid U, Schiff GD, et al. Development and Usability Testing of the Agency for Healthcare Research and Quality Common Formats to Capture Diagnostic Safety Events. J Patient Saf. 2022;18(6):521–525. doi: 10.1097/PTS.0000000000001006
33. Stockwell DC, Sharek P. Diagnosing diagnostic errors: it’s time to evolve the patient safety research paradigm. BMJ Qual Saf. 2022;31(10):701–703. doi: 10.1136/bmjqs-2021-014517
34. Olson APJ, Graber ML. Improving Diagnosis Through Education. Acad Med. 2020;95(8):1162–1165. doi: 10.1097/acm.0000000000003172
35. Prakash S, Sladek RM, Schuwirth L. Interventions to improve diagnostic decision making: A systematic review and meta-analysis on reflective strategies. Med Teach. 2019;41(5):517–524. doi: 10.1080/0142159x.2018.1497786
36. Lambe KA, O’Reilly G, Kelly BD, Curristan S. Dual-process cognitive interventions to enhance diagnostic reasoning: a systematic review. BMJ Qual Saf. 2016;25(10):808–820. doi: 10.1136/bmjqs-2015-004417
37. Reilly JB, Ogdie AR, Von Feldt JM, Myers JS. Teaching about how doctors think: a longitudinal curriculum in cognitive bias and diagnostic error for residents. BMJ Qual Saf. 2013;22(12):1044–1050. doi: 10.1136/bmjqs-2013-001987
38. Ramnarayan P, Cronje N, Brown R, et al. Validation of a diagnostic reminder system in emergency medicine: a multi-centre study. Emerg Med J. 2007;24(9):619–624. doi: 10.1136/emj.2006.044107
39. Newgard CD, Lin A, Malveau S, et al. Emergency Department Pediatric Readiness and Short-term and Long-term Mortality Among Children Receiving Emergency Care. JAMA Netw Open. 2023;6(1):e2250941. doi: 10.1001/jamanetworkopen.2022.50941
40. Remick KE, Bartley KA, Gonzales L, MacRae KS, Edgerton EA. Consensus-driven model to establish paediatric emergency care measures for low-volume emergency departments. BMJ Open Qual. 2022;11(3):e001803. doi: 10.1136/bmjoq-2021-001803
41. Topol EJ. Toward the eradication of medical diagnostic errors. Science. 2024;383(6681):eadn9602. doi: 10.1126/science.adn9602
42. Castaneda C, Nalley K, Mannion C, et al. Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J Clin Bioinforma. 2015;5(1):4. doi: 10.1186/s13336-015-0019-3
43. Croskerry P. The Rational Diagnostician and Achieving Diagnostic Excellence. JAMA. 2022;327(4):317–318. doi: 10.1001/jama.2021.24988
44. Sibbald M, Monteiro S, Sherbino J, LoGiudice A, Friedman C, Norman G. Should electronic differential diagnosis support be used early or late in the diagnostic process? A multicentre experimental study of Isabel. BMJ Qual Saf. 2021;31(6). doi: 10.1136/bmjqs-2021-013493
45. Dave N, Bui S, Morgan C, Hickey S, Paul CL. Interventions targeted at reducing diagnostic error: systematic review. BMJ Qual Saf. 2022;31(4):297–307. doi: 10.1136/bmjqs-2020-012704
46. Callahan CW, Malone F, Estroff D, Person DA. Effectiveness of an Internet-Based Store-and-Forward Telemedicine System for Pediatric Subspecialty Consultation. Arch Pediatr Adolesc Med. 2005;159(4):389–393. doi: 10.1001/archpedi.159.4.389