Hospital Pediatrics. 2024 Aug 19;14(9):e399–e402. doi: 10.1542/hpeds.2024-007774

Validation of Patients’ Race, Ethnicity, and Language Data in a Pediatric Emergency Department

Adam DeLong a, Joan Bregstein b, Danielle Steinberg a, Gabriel Apfel a, Sandhya S Brachio c, Katherine R Schlosser Metitiri a, Dodi Meyer d, Harold Pincus e, Katherine A Nash a
PMCID: PMC11358592  PMID: 39155746

Disaggregating performance metrics by patients’ race, ethnicity, and language (REaL) can identify health care disparities. However, electronic health record (EHR) data are often inaccurate and not representative of patients’ self-identified REaL, the gold standard.1–3 We assessed REaL data validity in our pediatric emergency department (ED) by comparing EHR-documented to survey-reported REaL data.

Methods

Design and Data

As part of a quality improvement project, we surveyed a convenience sample of 92 patients in our pediatric ED between December 2022 and April 2023. We excluded patients who had critical and severe illness, were ≥18 years of age, or lacked any EHR-documented REaL data. We obtained EHR-documented REaL data through chart review. We collected self-reported REaL data via face-to-face surveys of patients’ parents (race and ethnicity reflected patient identities, and language reflected parent identities). Survey questions mirrored standardized language intended for use by the registration staff when soliciting REaL data.4

The Supplemental Information details additional survey and institutional REaL data procedures.

Analysis

We reported the number and proportion of patients in each REaL group according to EHR documentation and survey report. Consistent with the practices of our health system and multiple government agencies, we reported a combined race and ethnicity category, grouping Hispanic and Latino patients, regardless of race.5,6 To assess sample representativeness, we compared our sample’s EHR-documented REaL data to 1 year of ED visits. We compared EHR-documented and survey-reported REaL data using χ-square tests and assessed concordance using Cohen’s κ. We cross-tabulated EHR-documented versus survey-reported REaL data to identify patterns of discordance. Columbia University’s Institutional Review Board approved this study.
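As an illustration of the concordance statistic named above, the following is a minimal Python sketch of Cohen’s κ for paired categorical labels. It is not the authors’ analysis code: the function name `cohen_kappa` and the six toy records are hypothetical and chosen only to show the calculation.

```python
from collections import Counter


def cohen_kappa(ehr, survey):
    """Return (observed agreement, expected agreement, kappa) for paired labels."""
    if len(ehr) != len(survey) or not ehr:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ehr)
    # Observed agreement: share of pairs with identical labels.
    observed = sum(a == b for a, b in zip(ehr, survey)) / n
    # Expected agreement: chance overlap implied by the two marginal distributions.
    ehr_counts = Counter(ehr)
    survey_counts = Counter(survey)
    expected = sum(ehr_counts[c] * survey_counts[c] for c in ehr_counts) / (n * n)
    if expected == 1:
        # Both sources use a single identical category; kappa is undefined.
        return observed, expected, float("nan")
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Hypothetical toy records (not study data): one label per patient and source.
ehr = ["Spanish", "Spanish", "English", "English", "English", "Other"]
survey = ["English", "Spanish", "English", "English", "English", "English"]

po, pe, k = cohen_kappa(ehr, survey)
print(f"agreement = {po:.2f}, expected = {pe:.2f}, kappa = {k:.2f}")
```

In this toy example, 4 of 6 pairs agree, but because both sources label most patients as English, much of that agreement is expected by chance, which is why κ is well below the raw agreement.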

Results

Our sample’s EHR-documented race and ethnicity distributions were similar to those of the overall ED population, but the language distribution differed (P < .001; Table 1).

TABLE 1.

Aggregate Race, Ethnicity, Language, and Combined Race and Ethnicity for Study Sample and 1 y of ED Visits

| Category | Study sample: EHR-documented, N (%) | Study sample: survey report, N (%) | P (a): sample EHR-documented vs survey report | All patients between June 2021 and June 2022: EHR-documented, N (%) | P (a): sample EHR-documented vs total ED population EHR-documented |
|---|---|---|---|---|---|
| Race | | | <.0001 | | .280 |
| AAPI | 0 | 1 (1.09) | | 140 (0.29) | |
| AIAN | 0 | 1 (1.09) | | 315 (0.65) | |
| Black | 22 (23.91) | 30 (32.61) | | 10 060 (20.68) | |
| White | 22 (23.91) | 13 (14.13) | | 16 915 (34.78) | |
| Other | 39 (42.39) | 47 (51.09) | | 17 774 (36.54) | |
| Declined/unknown | 9 (9.78) | 0 | | 3435 (7.06) | |
| Ethnicity | | | <.0001 | | .096 |
| Hispanic/Latino in origin | 61 (66.30) | 68 (73.91) | | 34 028 (69.78) | |
| Non-Hispanic/Latino in origin | 20 (21.74) | 24 (26.09) | | 11 456 (23.49) | |
| Declined/unknown | 11 (11.96) | 0 | | 3122 (6.40) | |
| Combined race/ethnicity (b) | | | <.0001 | | .672 |
| AAPI | 0 | 1 (1.09) | | 272 (0.56) | |
| AIAN | 0 | 0 | | 68 (0.14) | |
| Black | 14 (15.22) | 15 (16.30) | | 6056 (12.45) | |
| Hispanic/Latino in origin | 61 (66.30) | 68 (73.91) | | 34 022 (69.95) | |
| White | 6 (6.52) | 5 (5.43) | | 4353 (8.95) | |
| Other | 5 (5.43) | 3 (3.26) | | 1546 (3.18) | |
| Declined/unknown | 6 (6.52) | 0 | | 2322 (4.77) | |
| Language | | | <.0001 | | <.0001 |
| English | 66 (71.74) | 75 (82.42) | | 30 381 (62.30) | |
| French | 2 (2.17) | 1 (1.10) | | 53 (0.11) | |
| Spanish | 22 (23.91) | 15 (16.48) | | 15 584 (31.96) | |
| Other (grouped + other) | 2 (2.17) | 0 | | 2577 (5.28) | |
| Declined/unknown | 0 | 0 | | 171 (0.35) | |

AAPI, Asian American/Pacific Islander; AIAN, American Indian/Alaskan Native.

a. χ-square test.

b. Consistent with the Centers for Disease Control and New York State Department of Health’s reporting standards for race and ethnicity, we grouped individuals identifying ethnically as “Hispanic or Latino” in a combined race and ethnicity category (ie, Hispanic/Latino ethnicity overrode the race category).5,6
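The sample-versus-population comparison for language in Table 1 can be sketched as a Pearson χ-square test of homogeneity. This is an assumption about the form of the test, since the article does not specify it; the helper `pearson_chi_square` is hypothetical, and only the test statistic is computed, which is then compared with the tabulated critical value for 4 degrees of freedom at α = .001.

```python
def pearson_chi_square(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count under homogeneity of the two distributions.
            exp = row_totals[r] * col_totals[c] / grand
            stat += (obs - exp) ** 2 / exp
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


# EHR-documented language counts from Table 1, in the order
# English, French, Spanish, Other, Declined/unknown.
sample_ehr = [66, 2, 22, 2, 0]
ed_visits = [30381, 53, 15584, 2577, 171]

stat, df = pearson_chi_square([sample_ehr, ed_visits])
CRITICAL_001_DF4 = 18.467  # chi-square critical value for df = 4 at alpha = .001
print(f"chi-square = {stat:.1f}, df = {df}")
print("exceeds .001 critical value:", stat > CRITICAL_001_DF4)
```

Several expected counts in the sample row are well below 5 (for example, French), so the χ-square approximation is rough here; an exact or simulation-based test would be a reasonable alternative in practice.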

EHR-documented and survey-reported data were statistically different across each data type (P < .0001). There was 53% (κ 0.30), 85% (κ 0.66), 84% (κ 0.66), and 84% (κ 0.55) agreement between EHR-documented and survey-reported race, ethnicity, combined race and ethnicity, and language data, respectively (Table 2). Most cases of discordance between EHR-documented and survey-reported race, ethnicity, and race and ethnicity included ambiguous data documented in the EHR (“other,” “declined/unknown”), as follows: race (27/43), ethnicity (11/15) and combined race and ethnicity (11/14). Most language discordance occurred between EHR-documented Spanish and survey-reported English (9/15). A sensitivity analysis excluding patients with ambiguous EHR data improved concordance in all categories.
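The language discordance pattern described above can be tallied directly from the discordant cells in the language block of Table 2. The sketch below is illustrative only; the dictionary layout and variable names are hypothetical, and the counts are read from Table 2 rather than from patient-level data.

```python
# Discordant language pairs (EHR-documented, survey-reported) -> count,
# read from the language block of Table 2.
discordant = {
    ("English", "Spanish"): 3,
    ("French", "English"): 1,
    ("Spanish", "English"): 9,
    ("Other", "English"): 2,
}

total = sum(discordant.values())
# Rank discordance patterns from most to least common.
ranked = sorted(discordant.items(), key=lambda kv: kv[1], reverse=True)
for (ehr_label, survey_label), n in ranked:
    print(f"EHR {ehr_label} -> survey {survey_label}: {n}/{total} ({n / total:.0%})")

top_pair, top_count = ranked[0]
```

The most common pattern is EHR-documented Spanish with survey-reported English, 9 of the 15 discordant cases.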

TABLE 2.

Analyses of Concordance and Patterns of Discordance Between EHR-Documented and Survey-Reported REaL Data

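| Category | Agreement | Expected agreement | Kappa (a) | Discordant, n/N (%) |
|---|---|---|---|---|
| Race | 53.26% | 32.83% | 0.30 | 43/92 (46.74%) |
| Ethnicity | 84.78% | 54.68% | 0.66 | 14/92 (15.22%) |
| Language | 83.52% | 63.60% | 0.55 | 15/92 (16.30%) |
| Combined race/ethnicity | 83.70% | 52.02% | 0.66 | 15/92 (16.30%) |

Race: discordant cases by EHR-documented race (rows) and survey-reported race (columns)

| EHR-documented race | Discordant (N) | AAPI | AIAN | Black | White | Other | Declined/unknown |
|---|---|---|---|---|---|---|---|
| AAPI | NR | — | NR | NR | NR | NR | NR |
| AIAN | NR | NR | — | NR | NR | NR | NR |
| Black | 5 | 0 | 0 | — | 1 | 4 | 0 |
| White | 15 | 0 | 0 | 2 | — | 13 | 0 |
| Other | 14 | 1 | 1 | 7 | 5 | — | 0 |
| Declined/unknown | 9 | 0 | 0 | 4 | 0 | 5 | — |

Ethnicity: discordant cases by EHR-documented ethnicity (rows) and survey-reported ethnicity (columns)

| EHR-documented ethnicity | Discordant (N) | Hispanic/Latino | Non-Hispanic/Latino | Declined/unknown |
|---|---|---|---|---|
| Hispanic/Latino | 1 | — | 1 | 0 |
| Non-Hispanic/Latino | 2 | 2 | — | 0 |
| Declined/unknown | 11 | 6 | 5 | — |

Language: discordant cases by EHR-documented language (rows) and survey-reported language (columns)

| EHR-documented language | Discordant (N) | English | French | Spanish | Other |
|---|---|---|---|---|---|
| English | 3 | — | 0 | 3 | 0 |
| French | 1 | 1 | — | 0 | 0 |
| Spanish | 9 | 9 | 0 | — | 0 |
| Other | 2 | 2 | 0 | 0 | — |

Combined race/ethnicity: discordant cases by EHR-documented category (rows) and survey-reported category (columns)

| EHR-documented combined race/ethnicity | Discordant (N) | AAPI | Black | Hispanic/Latino in origin | White | Other | Declined/unknown |
|---|---|---|---|---|---|---|---|
| AAPI | NR | — | NR | NR | NR | NR | NR |
| Black | 2 | 0 | — | 1 | 0 | 1 | 0 |
| Hispanic/Latino | 1 | 0 | 1 | — | 0 | 0 | 0 |
| White | 2 | 0 | 0 | 2 | — | 0 | 0 |
| Other | 4 | 1 | 0 | 2 | 1 | — | 0 |
| Declined/unknown | 6 | 0 | 2 | 3 | 0 | 1 | — |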

AAPI, Asian American/Pacific Islander; AIAN, American Indian/Alaskan Native; NR, not reportable (no patients documented as this identity in the EHR); —, cells represent cases in which EHR and survey-reported data were concordant.

Table 2 includes calculated agreement, expected agreement and κ value between EHR-documented and survey-reported data for each demographic category. Below these data for each category, the number discordant out of all cases (n = 92) is listed. This number of discordant cases is broken down vertically by each individual race, ethnicity, or language, as documented in the EHR. Each of these subsets is horizontally stratified by survey-reported discordant response, with listed percentages representing the proportion of discordance in each EHR-recorded individual race, ethnicity, or language that was due to the corresponding race, ethnicity, or language indicated via survey.

a. Sensitivity analyses assessing concordance of survey-reported and EHR-documented data were performed after removing patients entered as declined/unknown in the EHR. Removing these patients increased concordance across all categories, with κ values of 0.36 for race, 0.90 for ethnicity, and 0.76 for combined race and ethnicity. There were no declined or unknown entries in the EHR for language.
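The relationship among the agreement, expected agreement, and κ values in Table 2 follows the standard definition κ = (observed − expected) / (1 − expected). The short check below applies that formula to the rounded percentages reported in Table 2; the function and variable names are hypothetical, and small differences from rounding are expected.

```python
def kappa_from_agreement(observed, expected):
    """Cohen's kappa from observed and chance-expected agreement (as proportions)."""
    return (observed - expected) / (1 - expected)


# (agreement, expected agreement, reported kappa) as listed in Table 2.
table2 = {
    "race": (0.5326, 0.3283, 0.30),
    "ethnicity": (0.8478, 0.5468, 0.66),
    "language": (0.8352, 0.6360, 0.55),
    "combined race/ethnicity": (0.8370, 0.5202, 0.66),
}

recomputed = {}
for category, (po, pe, reported) in table2.items():
    recomputed[category] = kappa_from_agreement(po, pe)
    print(f"{category}: kappa = {recomputed[category]:.2f} (reported {reported:.2f})")
```

Each recomputed value rounds to the κ reported in the table, which shows how chance-expected agreement lowers κ relative to raw percent agreement, most visibly for language.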

Discussion

Identifying and reducing health care disparities requires valid REaL data. In our study, concordance between EHR-documented and survey-reported REaL data varied; concordance was lowest for race (53%) and improved with a combined race and ethnicity category (84%). Most race and ethnicity discordance occurred when ambiguous (unknown, declined, or other) data in the EHR conflicted with a specific identity listed in the survey. Most language discordance involved EHR-documented Spanish and survey-reported English.

We cannot infer the etiologies of REaL data discordance from our study. However, for our hospital’s majority Hispanic and Latino population, discordance may represent malalignment between REaL categories and patients’ identities; specifically, Latino patients may feel a stronger affiliation to the “Hispanic/Latino” ethnicity than racial categories, prompting a choice of “other/declined” race. Our data and the literature suggest that combined race and ethnicity categories may better represent the identities of Latino patients,7 despite the sacrifice of specificity and nuance.8

Beyond the Latino population, discordant data may reflect inadequate data collection practices amenable to future quality improvement. For example, ambiguous data in the EHR represent a key target for reducing discordance. However, discordance may also reflect natural variation in REaL identities as children age and variation in caregivers’ preferred language by clinical scenario or the caregiver present.

No standard or benchmark rate of EHR versus self-reported REaL data concordance exists.3 However, our results reveal similar or improved concordance compared with the existing literature.8 Best practices for REaL data collection are a top disparities research priority and a topic of national discussion with practices currently in flux.9 National surveys now include combined, granular race and ethnicity categories focused on ancestry (eg, Hispanic Mexican, Cuban, Filipino)10; the Oregon Health Authority permits multiracial constituents to indicate all applicable races and ethnicities.10

Study limitations include constraints related to data practices (eg, we did not report multiracial identities and could not account for discrepancies between parent-reported and patient-reported REaL) and nonresponse bias. We used a convenience sample with an unknown response rate (an estimated ≤5 patients declined). Institutional patient demographics and REaL data collection practices vary, limiting generalizability.

Ensuring REaL data validity is a foundational step toward reliably identifying and addressing institutional health care disparities. Our institution’s REaL data can be strengthened further through efforts to reduce the use of ambiguous data.

Supplementary Material

Supplemental Information

Acknowledgments

The authors would like to acknowledge Dr Sarah Schechter, who contributed to study design and power calculations, and the New York-Presbyterian Dalio Center for Health Justice for implementing the “We Ask Because We Care” campaign, an initiative to train front-line staff in a standardized approach to asking patients about their self-identified race, ethnicity, and language data.

Footnotes

Dr DeLong conceptualized and designed the study, wrote the data collection survey, performed in-person patient surveys and chart review, and drafted the initial manuscript; Dr Bregstein conceptualized and designed the study and wrote the data collection survey; Drs Steinberg and Apfel performed in-person patient surveys and provided feedback on the design of data collection methods; Drs Brachio, Schlosser Metitiri, Meyer, and Pincus contributed to the conceptualization of the study; Dr Nash conceptualized and designed the study and performed data analyses; and all authors contributed to the critical review and revision of the manuscript for intellectual content, granted final approval of the version to be published, and are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

COMPANION PAPER: A companion to this article can be found online at www.hosppeds.org/cgi/doi/10.1542/hpeds.2024-007843.

FUNDING: Funded by the National Institutes of Health (NIH). This work was supported by the National Center for Advancing Translational Sciences, NIH, through grant UL1TR001873. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

CONFLICT OF INTEREST DISCLOSURES: Dr Pincus is on the Clinical Advisory Board for AbleTo and is a consultant for the National Committee for Quality Assurance. The other authors have indicated they have no potential conflicts of interest relevant to this article to disclose.


Articles from Hospital Pediatrics are provided here courtesy of American Academy of Pediatrics
