Disaggregating performance metrics by patients’ race, ethnicity, and language (REaL) can identify health care disparities. However, electronic health record (EHR) data are often inaccurate and not representative of patients’ self-identified REaL, the gold standard.1–3 We assessed REaL data validity in our pediatric emergency department (ED) by comparing EHR-documented to survey-reported REaL data.
Methods
Design and Data
As part of a quality improvement project, we surveyed a convenience sample of 92 patients in our pediatric ED between December 2022 and April 2023. We excluded patients who had critical or severe illness, were ≥18 years of age, or lacked any EHR-documented REaL data. We obtained EHR-documented REaL data through chart review. We collected self-reported REaL data via face-to-face surveys of patients’ parents (race and ethnicity reflected patient identities and language reflected parent identities). Survey questions mirrored standardized language intended for use by the registration staff when soliciting REaL data.4
The Supplemental Information details additional survey and institutional REaL data procedures.
Analysis
We reported the number and proportion of patients in each REaL group by EHR documentation and by survey report. Consistent with the practices of our health system and multiple government agencies, we reported a combined race and ethnicity category, grouping Hispanic and Latino patients, regardless of race.5,6 To assess sample representativeness, we compared our sample’s EHR-documented REaL data to 1 year of ED visits. We compared EHR-documented and survey-reported REaL data using χ-square tests and assessed concordance using Cohen’s κ. We cross-tabulated EHR-documented versus survey-reported REaL data to identify patterns of discordance. Columbia University’s Institutional Review Board approved this study.
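To make the concordance measures concrete, the following is a minimal sketch of how observed agreement, expected (chance) agreement, Cohen’s κ, and a cross-tabulation of discordant pairs can be computed from paired labels. It is an illustration only, not the authors’ analysis code; the function names and the toy language labels are hypothetical.

```python
from collections import Counter


def agreement_stats(ehr, survey):
    """Return (observed agreement, expected agreement, Cohen's kappa)."""
    if len(ehr) != len(survey) or not ehr:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(ehr)
    # Observed agreement: share of patients whose two labels match.
    p_o = sum(e == s for e, s in zip(ehr, survey)) / n
    # Expected agreement: chance overlap implied by each source's marginals.
    ehr_counts, survey_counts = Counter(ehr), Counter(survey)
    p_e = sum(ehr_counts[c] * survey_counts[c] for c in ehr_counts) / (n * n)
    if p_e == 1:
        # Both sources use a single identical category; kappa is undefined.
        return p_o, p_e, float("nan")
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


def discordance_table(ehr, survey):
    """Count discordant (EHR label, survey label) pairs."""
    return Counter((e, s) for e, s in zip(ehr, survey) if e != s)


# Hypothetical toy data, not the study's records.
ehr = ["Spanish", "Spanish", "English", "English", "Spanish", "English"]
survey = ["English", "Spanish", "English", "English", "Spanish", "English"]

p_o, p_e, kappa = agreement_stats(ehr, survey)
pairs = discordance_table(ehr, survey)
print(f"agreement={p_o:.2%} expected={p_e:.2%} kappa={kappa:.2f}")
print(dict(pairs))
```

Because κ subtracts the agreement expected by chance, a sample dominated by one category can show high raw agreement but only a moderate κ.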
Results
In comparing our sample’s EHR data to the overall ED population, racial and ethnic distributions were similar, but language distributions differed (P < .001; Table 1).
TABLE 1.
Aggregate Race, Ethnicity, Language, and Combined Race and Ethnicity for Study Sample and 1 y of ED Visits
| Category | Study Sample: EHR-Documented, N (%) | Study Sample: Survey Report, N (%) | Pᵃ (EHR-Documented vs Survey Report) | All Patients Between June 2021 and June 2022: EHR-Documented, N (%) | Pᵃ (Sample EHR-Documented vs Total ED Population EHR-Documented) |
| --- | --- | --- | --- | --- | --- |
| Race | | | <.0001 | | .280 |
| AAPI | 0 | 1 (1.09) | | 140 (0.29) | |
| AIAN | 0 | 1 (1.09) | | 315 (0.65) | |
| Black | 22 (23.91) | 30 (32.61) | | 10 060 (20.68) | |
| White | 22 (23.91) | 13 (14.13) | | 16 915 (34.78) | |
| Other | 39 (42.39) | 47 (51.09) | | 17 774 (36.54) | |
| Declined/unknown | 9 (9.78) | 0 | | 3435 (7.06) | |
| Ethnicity | | | <.0001 | | .096 |
| Hispanic/Latino in origin | 61 (66.30) | 68 (73.91) | | 34 028 (69.78) | |
| Non-Hispanic/Latino in origin | 20 (21.74) | 24 (26.09) | | 11 456 (23.49) | |
| Declined/unknown | 11 (11.96) | 0 | | 3122 (6.40) | |
| Combined race/ethnicityᵇ | | | <.0001 | | .672 |
| AAPI | 0 | 1 (1.09) | | 272 (0.56) | |
| AIAN | 0 | 0 | | 68 (0.14) | |
| Black | 14 (15.22) | 15 (16.30) | | 6056 (12.45) | |
| Hispanic/Latino in origin | 61 (66.30) | 68 (73.91) | | 34 022 (69.95) | |
| White | 6 (6.52) | 5 (5.43) | | 4353 (8.95) | |
| Other | 5 (5.43) | 3 (3.26) | | 1546 (3.18) | |
| Declined/unknown | 6 (6.52) | 0 | | 2322 (4.77) | |
| Language | | | <.0001 | | <.0001 |
| English | 66 (71.74) | 75 (82.42) | | 30 381 (62.30) | |
| French | 2 (2.17) | 1 (1.10) | | 53 (0.11) | |
| Spanish | 22 (23.91) | 15 (16.48) | | 15 584 (31.96) | |
| Other (grouped + other) | 2 (2.17) | 0 | | 2577 (5.28) | |
| Declined/unknown | 0 | 0 | | 171 (0.35) | |
AAPI, Asian American/Pacific Islander; AIAN, American Indian/Alaskan Native.
ᵃ χ-square test.
ᵇ Consistent with the Centers for Disease Control and New York State Department of Health’s reporting standards for race and ethnicity, we grouped individuals identifying ethnically as “Hispanic or Latino” in a combined race and ethnicity category (ie, Hispanic/Latino ethnicity overrode the race category).5,6
EHR-documented and survey-reported data were statistically different across each data type (P < .0001). There was 53% (κ 0.30), 85% (κ 0.66), 84% (κ 0.66), and 84% (κ 0.55) agreement between EHR-documented and survey-reported race, ethnicity, combined race and ethnicity, and language data, respectively (Table 2). Most cases of discordance between EHR-documented and survey-reported race, ethnicity, and race and ethnicity included ambiguous data documented in the EHR (“other,” “declined/unknown”), as follows: race (27/43), ethnicity (11/15) and combined race and ethnicity (11/14). Most language discordance occurred between EHR-documented Spanish and survey-reported English (9/15). A sensitivity analysis excluding patients with ambiguous EHR data improved concordance in all categories.
TABLE 2.
Analyses of Concordance and Patterns of Discordance Between EHR-Documented and Survey-Reported REaL Data
| Race | Agreement | Expected agreement | Kappaᵃ | |||||
| 53.26% | 32.83% | 0.30 | ||||||
| Discordant (N) | Survey-reported race | |||||||
| EHR-documented race | 43/92 (46.74%) | AAPI | AIAN | Black | White | Other | Declined/unknown | |
| AAPI | NR | — | NR | NR | NR | NR | NR | |
| AIAN | NR | NR | — | NR | NR | NR | NR | |
| Black | 5 | 0 | 0 | — | 1 | 4 | 0 | |
| White | 15 | 0 | 0 | 2 | — | 13 | 0 | |
| Other | 14 | 1 | 1 | 7 | 5 | — | 0 | |
| Declined/unknown | 9 | 0 | 0 | 4 | 0 | 5 | — | |
| Ethnicity | Agreement | Expected agreement | Kappaᵃ | |||||
| 84.78% | 54.68% | 0.66 | ||||||
| Discordant (N) | Survey-reported ethnicity | |||||||
| EHR-documented ethnicity | 14/92 (15.22%) | Hispanic/Latino | Non-Hispanic/Latino | Declined/unknown | ||||
| Hispanic/Latino | 1 | — | 1 | 0 | ||||
| Non-Hispanic/Latino | 2 | 2 | — | 0 | ||||
| Declined/unknown | 11 | 6 | 5 | — | ||||
| Language | Agreement | Expected agreement | Kappaᵃ | |||||
| 83.52% | 63.60% | 0.55 | ||||||
| Discordant (N) | Survey-reported language | |||||||
| EHR-documented language | 15/92 (16.30%) | English | French | Spanish | Other | |||
| English | 3 | — | 0 | 3 | 0 | |||
| French | 1 | 1 | — | 0 | 0 | |||
| Spanish | 9 | 9 | 0 | — | 0 | |||
| Other | 2 | 2 | 0 | 0 | — | |||
| Combined race/ethnicity | Agreement | Expected agreement | Kappaᵃ | |||||
| 83.70% | 52.02% | 0.66 | ||||||
| Discordant (N) | Survey-reported combined race/ethnicity | |||||||
| EHR-documented combined race/ethnicity | 15/92 (16.30%) | AAPI | Black | Hispanic/Latino in Origin | White | Other | Declined/unknown | |
| AAPI | NR | — | NR | NR | NR | NR | NR | |
| Black | 2 | 0 | — | 1 | 0 | 1 | 0 | |
| Hispanic/Latino | 1 | 0 | 1 | — | 0 | 0 | 0 | |
| White | 2 | 0 | 0 | 2 | — | 0 | 0 | |
| Other | 4 | 1 | 0 | 2 | 1 | — | 0 | |
| Declined/unknown | 6 | 0 | 2 | 3 | 0 | 1 | — | |
AAPI, Asian American/Pacific Islander; AIAN, American Indian/Alaskan Native; NR, not reportable (no patients documented as this identity in the EHR); —, cells represent cases in which EHR and survey-reported data were concordant.
Table 2 includes calculated agreement, expected agreement and κ value between EHR-documented and survey-reported data for each demographic category. Below these data for each category, the number discordant out of all cases (n = 92) is listed. This number of discordant cases is broken down vertically by each individual race, ethnicity, or language, as documented in the EHR. Each of these subsets is horizontally stratified by survey-reported discordant response, with listed percentages representing the proportion of discordance in each EHR-recorded individual race, ethnicity, or language that was due to the corresponding race, ethnicity, or language indicated via survey.
ᵃ Sensitivity analyses assessing concordance of survey-reported and EHR-reported data were performed with those patients entered as declined/unknown in the EHR removed. Removing these patients increased concordance across all categories, with κ values of race = 0.36, ethnicity = 0.90, and combined race and ethnicity = 0.76. There were no declined or unknown entries in the EHR for language.
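As a worked check, the κ values in Table 2 can be recovered from the reported agreement and expected agreement using the standard definition κ = (observed − expected) / (1 − expected). The sketch below is illustrative only; the function and variable names are hypothetical, and the inputs are the percentages from Table 2 expressed as proportions.

```python
def kappa_from_agreement(observed, expected):
    """Cohen's kappa from observed and chance-expected agreement (proportions)."""
    return (observed - expected) / (1 - expected)


# (agreement, expected agreement) as reported in Table 2.
table2 = {
    "race": (0.5326, 0.3283),
    "ethnicity": (0.8478, 0.5468),
    "language": (0.8352, 0.6360),
    "combined race/ethnicity": (0.8370, 0.5202),
}

kappas = {
    name: round(kappa_from_agreement(p_o, p_e), 2)
    for name, (p_o, p_e) in table2.items()
}

for name, value in kappas.items():
    print(f"{name}: kappa = {value:.2f}")
```

The rounded results (0.30, 0.66, 0.55, and 0.66) match the κ values listed in the table.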
Discussion
Identifying and reducing health care disparities requires valid REaL data. In our study, EHR-documented and survey-reported REaL data concordance varied; concordance was lowest for race (53%) and improved with a combined race and ethnicity category (84%). Most race and ethnicity discordance occurred when ambiguous (unknown, declined, or other) data in the EHR conflicted with a specific identity listed in the survey. Most language discordances included EHR-documented Spanish and survey-reported English.
We cannot infer the etiologies of REaL data discordance from our study. However, for our hospital’s majority Hispanic and Latino population, discordance may represent malalignment between REaL categories and patients’ identities; specifically, Latino patients may feel a stronger affiliation to the “Hispanic/Latino” ethnicity than racial categories, prompting a choice of “other/declined” race. Our data and the literature suggest that combined race and ethnicity categories may better represent the identities of Latino patients,7 despite the sacrifice of specificity and nuance.8
Beyond the Latino population, discordant data may reflect inadequate data collection practices amenable to future quality improvement. For example, reducing ambiguous data in the EHR represents a key target to improve discordance. However, discordance may also reflect natural variation in REaL identities as children age and variation in caregivers’ preferred language by clinical scenario or the caregiver present.
No standard or benchmark rate of EHR versus self-reported REaL data concordance exists.3 However, our results reveal similar or improved concordance compared with the existing literature.8 Best practices for REaL data collection are a top disparities research priority and a topic of national discussion with practices currently in flux.9 National surveys now include combined, granular race and ethnicity categories focused on ancestry (eg, Hispanic Mexican, Cuban, Filipino)10; the Oregon Health Authority permits multiracial constituents to indicate all applicable races and ethnicities.10
Study limitations include constraints related to data practices (eg, we did not report multiracial identities and could not account for discrepancies between parent-reported and patient-reported REaL) and nonresponse bias. We used a convenience sample with an unknown response rate (an estimated ≤5 patients declined). Institutional patient demographics and REaL data collection practices vary, limiting generalizability.
Ensuring REaL data validity is a foundational step toward reliably identifying and addressing institutional health care disparities. Our institution’s REaL data can be strengthened further through efforts to reduce the use of ambiguous data.
Acknowledgments
The authors would like to acknowledge Dr Sarah Schechter, who contributed to study design and power calculations, and the New York-Presbyterian Dalio Center for Health Justice for implementing the “We Ask Because We Care” campaign, an initiative to train front-line staff in a standardized approach to asking patients about their self-identified race, ethnicity, and language data.
Footnotes
Dr DeLong conceptualized and designed the study, wrote the data collection survey, performed in-person patient surveys and chart review, and drafted the initial manuscript; Dr Bregstein conceptualized and designed the study and wrote the data collection survey; Drs Steinberg and Apfel performed in-person patient surveys and provided feedback on the design of data collection methods; Drs Brachio, Schlosser Metitiri, Meyer, and Pincus contributed to the conceptualization of the study; Dr Nash conceptualized and designed the study and performed data analyses; and all authors contributed to the critical review and revision of the manuscript for intellectual content, granted final approval of the version to be published, and are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
COMPANION PAPER: A companion to this article can be found online at www.hosppeds.org/cgi/doi/10.1542/hpeds.2024-007843.
FUNDING: Funded by the National Institutes of Health (NIH). This work was supported by the National Center for Advancing Translational Sciences, NIH, through grant UL1TR001873. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
CONFLICT OF INTEREST DISCLOSURES: Dr Pincus is on the Clinical Advisory Board for AbleTo and is a consultant for the National Committee for Quality Assurance. The other authors have indicated they have no potential conflicts of interest relevant to this article to disclose.
References
- 1. Chin MH. Using patient race, ethnicity, and language data to achieve health equity. J Gen Intern Med. 2015;30(6):703–705
- 2. Cowden JD, Flores G, Chow T, et al. Variability in collection and use of race/ethnicity and language data in 93 pediatric hospitals. J Racial Ethn Health Disparities. 2020;7(5):928–936
- 3. Vedelli JKH, Azizi Z, Anand KJS. Missing race and ethnicity data in pediatric studies. JAMA Pediatr. 2023;178(1):6–8
- 4. Shapiro A, Meyer D, Riley L, Kurtz B, Barchi D; NEJM Catalyst. Building the foundations for equitable care. Available at: https://catalyst.nejm.org/doi/full/10.1056/CAT.21.0256. Accessed January 28, 2024
- 5. Yoon P, Hall J, Fuld J, et al. Alternative methods for grouping race and ethnicity to monitor COVID-19 outcomes and vaccination coverage. MMWR Morb Mortal Wkly Rep. 2021;70(32):1075–1080
- 6. Jones N, Marks R, Ramirez R, Ríos-Vargas M; Census.gov. 2020 Census illuminates racial and ethnic composition of the country. Available at: https://www.census.gov/library/stories/2021/08/improved-race-ethnicity-measures-reveal-united-states-population-much-more-multiracial.html. Accessed January 28, 2024
- 7. Noe-Bustamante L, Gonzalez-Barrera A, Edwards K, et al; PewResearch.org. Measuring the racial identity of Latinos. Available at: https://www.pewresearch.org/hispanic/2021/11/04/measuring-the-racial-identity-of-latinos/. Accessed July 5, 2023
- 8. Gutman CK, Lion KC, Waidner L, et al. Gaps in the identification of child race and ethnicity in a pediatric emergency department. West J Emerg Med. 2023;24(3):547–551
- 9. Portillo EN, Rees CA, Hartford EA, et al. Research priorities for pediatric emergency care to address disparities by race, ethnicity, and language. JAMA Netw Open. 2023;6(11):e2343791
- 10. Agency for Healthcare Research and Quality. Defining categorization needs for race and ethnicity data. Available at: https://www.ahrq.gov/research/findings/final-reports/iomracereport/reldata3.html. Accessed January 28, 2024
