Rand Health Quarterly. 2022 Nov 14;10(1):4.

Imputation of Race and Ethnicity in Health Insurance Marketplace Enrollment Data, 2015–2022 Open Enrollment Periods

Melony E Sorbero, Roald Euller, Aaron Kofner, Marc N Elliott
PMCID: PMC9718056  PMID: 36484074

Short abstract

Information on the race and ethnicity of individuals enrolled through the Health Insurance Marketplace is critical for assessing past enrollment efforts and determining whether outreach campaigns should be modified. This article presents the results of imputing race and ethnicity for HealthCare.gov Marketplace enrollees from 2015 through 2022 with an approach that uses surnames, first names, and addresses to estimate race and ethnicity.

Keywords: Health Care Access, Health Equity, Health Insurance Markets

Abstract

Information on the race and ethnicity of individuals enrolled through the HealthCare.gov Health Insurance Marketplace is critical for assessing past enrollment efforts and determining whether outreach campaigns should be modified or tailored moving forward. However, approximately one-third of insurance applicants do not complete the race and Hispanic ethnicity questions on the Marketplace application. When self-reported race and ethnicity information is missing, other information about an individual, such as surnames, first names, and addresses, can be used to infer race and ethnicity. Each characteristic contributes meaningfully to the identification of six mutually exclusive racial and ethnic groups: American Indian/Alaskan Native (AI/AN); Asian American, Native Hawaiian, and Pacific Islander (AANHPI); Black; Hispanic; Multiracial; and White. Surnames are particularly useful for distinguishing people who identify as Hispanic or AANHPI from other racial and ethnic groups. Geocoded address information is particularly useful in distinguishing Black and White individuals, who frequently reside in racially segregated neighborhoods.

This article presents the results of imputing race and ethnicity for Marketplace enrollees from 2015 through 2022 using the modified Bayesian Improved First Name Surname and Geocoding (BIFSG) method, developed by the RAND Corporation, which uses surnames, first names, and residential addresses to indirectly estimate race and ethnicity.


Information on the race and ethnicity of individuals enrolling through the Health Insurance Marketplaces is critical for assessing past enrollment efforts, determining whether outreach campaigns should be modified or tailored moving forward, and identifying where to target outreach activities. However, approximately one-third of insurance applicants do not complete the race and Hispanic ethnicity questions on the Marketplace application.

The RAND Corporation's modified Bayesian Improved First Name Surname and Geocoding (BIFSG) method uses surnames, first names, and residential addresses to indirectly estimate race and ethnicity. We used 2015–2022 data from the Centers for Medicare and Medicaid Services Multidimensional Insurance Data Analytics System (MIDAS), which contains person-year level data for Marketplace enrollees. The surname and first name for each individual were used to estimate initial probabilities for each of the six mutually exclusive racial and ethnic groups: American Indian/Alaskan Native (AI/AN); Asian American, Native Hawaiian, and Pacific Islander (AANHPI); Black; Hispanic; Multiracial; and White. Geocoded address information was then used to refine these estimates and generate final probabilities.
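The two-stage logic described above (name-based initial probabilities, then refinement with geocoded address information) can be illustrated with a generic Bayesian update. The sketch below is not the modified BIFSG implementation, whose details are not given here; the function name `bayes_update` and all numeric inputs are hypothetical values chosen only to show the mechanics.

```python
# Illustrative sketch only: a generic Bayesian update, not RAND's modified BIFSG.
# All probabilities below are made-up values for a single hypothetical enrollee.

GROUPS = ["AI/AN", "AANHPI", "Black", "Hispanic", "Multiracial", "White"]


def bayes_update(prior, likelihood):
    """Multiply the prior by the likelihood for each group, then renormalize."""
    unnormalized = {g: prior[g] * likelihood[g] for g in GROUPS}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood share no support")
    return {g: value / total for g, value in unnormalized.items()}


# Hypothetical initial probabilities derived from surname and first name.
name_prior = {
    "AI/AN": 0.01, "AANHPI": 0.02, "Black": 0.30,
    "Hispanic": 0.05, "Multiracial": 0.02, "White": 0.60,
}

# Hypothetical P(residing in this neighborhood | group), as might be derived
# from the racial and ethnic composition of small geographic areas.
geo_likelihood = {
    "AI/AN": 0.0001, "AANHPI": 0.0001, "Black": 0.0008,
    "Hispanic": 0.0002, "Multiracial": 0.0002, "White": 0.0001,
}

posterior = bayes_update(name_prior, geo_likelihood)
for group in GROUPS:
    print(f"{group:12s} prior={name_prior[group]:.3f} posterior={posterior[group]:.3f}")
```

In this made-up case the names alone favor White, but the address shifts most of the probability to Black, which mirrors the point that geocoded addresses help distinguish Black and White individuals.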

Self-reported race and ethnicity were missing for 32.5 percent of the 71,610,609 records across the eight years of MIDAS enrollment data (2015 through 2022). Using enrollees’ records from other years to replace missing race and ethnicity reduced the level of missingness to 23.5 percent. Compared with nonreporting enrollees, for whom race and ethnicity were imputed, enrollees who self-reported race and ethnicity were more likely to be AANHPI (9.4 percent versus 6.7 percent) or White (59.5 percent versus 49.3 percent) and less likely to be Black (10.9 percent versus 15.7 percent) or Hispanic (17.9 percent versus 26.1 percent). When combining self-reported race and ethnicity data with the imputed race and ethnicity probabilities for enrollees who did not report their race and ethnicity, we estimated that 8.7 percent of Marketplace enrollees were AANHPI; 0.6 percent were AI/AN; 12.0 percent were Black; 19.8 percent were Hispanic; 1.8 percent were Multiracial; and 57.1 percent were White.
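As a rough arithmetic check on the figures above, the combined estimates can be approximated as a weighted average of the self-reported and imputed percentages. This assumes the 23.5 percent residual missingness is the appropriate weight for the imputed group, which the text does not state explicitly; the helper `combine` is a hypothetical name, and only the four groups with both percentages reported are included.

```python
# Rough consistency check, assuming the combined estimate is a weighted average
# with the 23.5 percent residual missingness as the weight on imputed records.

MISSING_SHARE = 0.235  # share of records still missing self-reported data

# Percentages quoted in the text for the four groups with both figures reported.
self_reported = {"AANHPI": 9.4, "Black": 10.9, "Hispanic": 17.9, "White": 59.5}
imputed = {"AANHPI": 6.7, "Black": 15.7, "Hispanic": 26.1, "White": 49.3}
published = {"AANHPI": 8.7, "Black": 12.0, "Hispanic": 19.8, "White": 57.1}


def combine(self_pct, imputed_pct, missing_share):
    """Weight the two percentages by their assumed shares of all records."""
    return (1 - missing_share) * self_pct + missing_share * imputed_pct


combined = {
    group: combine(self_reported[group], imputed[group], MISSING_SHARE)
    for group in self_reported
}

for group, value in combined.items():
    print(f"{group:9s} weighted={value:.2f}  published={published[group]:.1f}")
```

Under that assumption, each weighted value lands within about 0.1 percentage point of the published estimate, a gap that is plausibly due to rounding of the inputs.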

Based on conventional standards for C-statistics, the ability of the modified BIFSG to differentiate AANHPI, Black, Hispanic, and White enrollees from other groups was “excellent.” It did not reach an “acceptable” level for AI/AN or Multiracial enrollees (see Note 1). Currently, we recommend that modified BIFSG-imputed race and ethnicity not be used to make inferences about AI/AN or Multiracial enrollees.
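For readers unfamiliar with the measure, the C-statistic for one group is the probability that a randomly chosen member of that group receives a higher imputed probability than a randomly chosen nonmember, with ties counted as one-half. The sketch below computes it by direct pairwise comparison on made-up data; the function names and the example values are hypothetical, and the thresholds are those given in Note 1.

```python
# Illustrative sketch: C-statistic (concordance) for a single group,
# computed by comparing every member/nonmember pair. Example data are made up.


def c_statistic(probabilities, is_member):
    """Share of member/nonmember pairs in which the member scores higher."""
    members = [p for p, m in zip(probabilities, is_member) if m]
    nonmembers = [p for p, m in zip(probabilities, is_member) if not m]
    if not members or not nonmembers:
        raise ValueError("need at least one member and one nonmember")
    concordant = 0.0
    for p_member in members:
        for p_other in nonmembers:
            if p_member > p_other:
                concordant += 1.0
            elif p_member == p_other:
                concordant += 0.5  # ties count as half
    return concordant / (len(members) * len(nonmembers))


def describe(c):
    """Label a C-statistic using the thresholds listed in Note 1."""
    if c >= 0.9:
        return "excellent"
    if c >= 0.8:
        return "strong"
    if c >= 0.7:
        return "acceptable"
    return "below acceptable"


# Hypothetical imputed probabilities and self-reported membership for six people.
example_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
example_member = [True, True, True, False, False, False]

c_value = c_statistic(example_probs, example_member)
print(f"C-statistic = {c_value:.3f} ({describe(c_value)})")
```

The pairwise loop is quadratic in the number of records, so it suits small illustrations only; at the scale of the enrollment data, a rank-based calculation would be the practical choice.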

Notes

1. A C-statistic of 0.7 is considered “acceptable”; 0.8 is considered “strong” (Hosmer and Lemeshow, 2000); and 0.9 or higher is considered “excellent” (per authors).

This research was funded by the Office of the Assistant Secretary for Planning and Evaluation and carried out within the Payment, Cost, and Coverage Program in RAND Health Care.

References

  1. Hosmer D. W., Lemeshow S. Applied Logistic Regression. 2nd ed. New York: Wiley-Interscience Publication; 2000.

Articles from Rand Health Quarterly are provided here courtesy of The RAND Corporation
