American Journal of Public Health
editorial
2023 Dec;113(12):1278–1282. doi: 10.2105/AJPH.2023.307465

The Federal Agencies’ Hidden Efforts to Produce Equitable Data

Amy O’Hara, Rosemary Rhodes
PMCID: PMC10632842  PMID: 37939342

The COVID-19 pandemic illustrated the US government’s inability to effectively respond to a public health emergency in a timely manner. Data quality and availability issues hindered efforts to understand disparities in health outcomes, such as the disproportionate impact of the pandemic on Black and Latino populations. Despite the Centers for Disease Control and Prevention’s temporary authorization to gather and share data, inadequate capacity and infrastructure resulted in delayed detection and response, which cost millions of American lives.

The public health data ecosystem lacked methods to normalize and synthesize information from thousands of outdated systems, and it lacked standards to aid in that normalization. Legal, political, and cultural barriers prevented information sharing. However, efforts to address these obstacles are underway, with multiple agencies working to remedy the lack of relevant and timely data for all socio-demographic groups. We highlight a selection of federal initiatives that are investing in equitable data.

IMPROVING DEMOGRAPHIC DATA

Many federal agencies are actively working on equity issues and complying with executive orders [1,2] to advance racial equity and support for underserved communities. This section focuses on federal initiatives involving data collection, particularly activities in the US Department of Health and Human Services (HHS), Census Bureau, and Office of Management and Budget (OMB) to improve demographic data collection and standardization.

Through the Equitable Data Working Group [3], chaired by the OMB, agencies are improving methods to collect data that can be disaggregated across population groups and relevant geographies. For the first time since 1997, OMB plans to revise Statistical Policy Directive No. 15: Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity to ensure that federal agencies are collecting comparable, representative data. As OMB notes, there have been “large societal, political, economic, and demographic shifts in the United States” during the last 25 years, necessitating a review of the standards [4] to reflect increased racial and ethnic diversity in the United States and to accurately depict growing numbers of people who identify with more than one racial or ethnic category.

The Census Bureau has collected data on race since the first census in 1790 and on Hispanic or Latino origin since 1970. The Census Bureau regularly assesses the quality and completeness of its data. Nine million people reported multiple race groups in 2010; that number rose to 33.8 million in 2020. Under the current standards, race is collected separately from ethnicity. Research indicates that a growing number of Hispanic respondents are either not selecting a race or are choosing “Some Other Race.” This was confirmed in the 2020 Census, when nearly 50 million people identified as “Some Other Race” (45.3 million people of Hispanic or Latino origin were classified as “Some Other Race” either alone or in combination, compared with only 4.6 million people who were not of Hispanic or Latino origin). Census Bureau research has demonstrated that more specific and accurate responses are obtained when using a “combined question” that asks what categories describe a person. Such question changes can ensure that our increasingly diverse population is accurately reflected in data.

The review of Statistical Policy Directive No. 15 is also considering adding a new minimum race category for Middle Eastern or North African (MENA). Adding a MENA race category [5] will give people descending from 22 Arab countries, three non-Arab MENA countries, and three transnational communities visibility in data. MENA data could then be used in health and social science research, including research to improve immigrant and language services, and hate crime reporting.

Administrative data and electronic health records also have data collection problems for race and ethnicity. In the public health sphere, 70% of fields for race and ethnicity are missing on electronic health records [6]. State data systems are missing race and ethnicity data for large fractions of their client records as well. For example, data on race and ethnicity are regularly missing in state eligibility and claims systems, limiting the Centers for Medicare and Medicaid Services’ (CMS’s) ability to assess program utilization and characteristics of Medicaid enrollees. A recent Census Bureau study found that 19% of Medicaid beneficiaries were missing race data overall, but the missingness varied widely by state. This report, “Enhancing Race and Ethnicity Information in Medicaid Data” [7], illustrated how decennial census and American Community Survey data can fill in the missing information to produce statistics, reducing the percentage with missing information from 19% to 7%. Such efforts will address a growing data gap: a 2022 CMS report noted that in 2016, seven states reported missing race/ethnicity information for 50% or more of their beneficiaries.
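The gap-filling idea in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's method: the records are made up, the field names (`person_id`, `race`) are hypothetical, and the sketch assumes the program records have already been linked to a reference source by a shared identifier.

```python
def missing_share(records):
    """Fraction of records whose 'race' field is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for rec in records if not rec.get("race"))
    return missing / len(records)


def fill_race(program_records, reference):
    """Return copies of the records, filling a missing 'race' value from a
    linked reference source. Values already reported are never overwritten."""
    filled = []
    for rec in program_records:
        out = dict(rec)  # copy so the input records stay untouched
        if not out.get("race"):
            out["race"] = reference.get(out["person_id"])
        filled.append(out)
    return filled


# Made-up records for illustration only (hypothetical identifiers and values).
medicaid = [
    {"person_id": "a1", "race": "Black"},
    {"person_id": "a2", "race": None},
    {"person_id": "a3", "race": ""},
    {"person_id": "a4", "race": "White"},
]
# Hypothetical lookup built from a linked census or survey response.
census_lookup = {"a2": "Asian", "a4": "Some Other Race"}

enhanced = fill_race(medicaid, census_lookup)
before = missing_share(medicaid)
after = missing_share(enhanced)
print(f"missing before: {before:.0%}, after: {after:.0%}")
```

Record `a3` has no match in the reference source, so it stays missing; linkage reduces missingness but rarely eliminates it, which is consistent with the residual 7% reported above.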

Neglecting to correctly measure and capture an increasingly diverse population undermines ongoing federal efforts to achieve health equity. For example, a recent Kaiser Family Foundation study [8] found that when Asian subcategories are aggregated, the population may appear to fare better than White populations across various indicators. The report highlights the variation in the uninsured rate between Asian and Native Hawaiian and other Pacific Islander (NHOPI) subgroups, finding that NHOPI subgroups are more likely to be uninsured compared with the Asian population, a finding that is lost when the Asian subcategories are combined. Broad racial aggregation can also distort other variations in economic and health outcomes, which can lead to federal programs and policies inadvertently misallocating investment toward less needy groups.
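A small worked example shows why aggregation hides subgroup differences: the pooled rate is a population-weighted average, so a large subgroup with a low rate dominates a small subgroup with a high rate. The counts below are invented purely for illustration and are not drawn from the Kaiser Family Foundation study or any other source; the subgroup labels are placeholders.

```python
def rate(population, uninsured):
    """Uninsured rate for a single group."""
    return uninsured / population


def pooled_rate(groups):
    """Rate for all groups combined: total uninsured / total population."""
    total_population = sum(pop for pop, _ in groups.values())
    total_uninsured = sum(unins for _, unins in groups.values())
    return total_uninsured / total_population


# Invented counts: (population, number uninsured) per placeholder subgroup.
subgroups = {
    "subgroup_A": (9000, 450),   # large subgroup, low rate
    "subgroup_B": (1000, 150),   # small subgroup, high rate
}

by_subgroup = {name: rate(pop, unins) for name, (pop, unins) in subgroups.items()}
combined = pooled_rate(subgroups)

for name, value in by_subgroup.items():
    print(f"{name}: {value:.1%}")
print(f"combined: {combined:.1%}")
```

In this sketch the combined figure sits close to the larger subgroup's rate, so a reader who sees only the aggregate would miss that the smaller subgroup's rate is three times higher.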

Improvements to demographic data extend beyond race and ethnicity, with important work underway regarding sexual orientation and gender identity (SOGI). Building on a decade of work across federal agencies [9], best practices [10] have emerged to capture data beyond male and female designations. SOGI research and testing are underway on various data collections across agencies. The Collaborating Center for Questionnaire Design and Evaluation Research at the National Center for Health Statistics (NCHS) is currently leading efforts to design, test, and implement various SOGI questions for federal surveys and other government agency data collections, such as a single, nonbinary gender question for federal health surveys [11]. Collecting SOGI data will enable research on health disparities suffered by lesbian, gay, bisexual, transgender, queer or questioning, intersex, and asexual individuals, such as differential cancer rates and risks of anxiety and depression.

Beyond these efforts to improve demographic data collections, efforts are also underway to improve other data elements sought by equity assessments. The Census Bureau is blending data from surveys, censuses, and administrative records to improve data on earnings, addressing missingness and underreporting by using multiple sources of data, illustrated in the National Experimental Wellbeing Statistics [12] project. More accurate income data will lead to better studies of inequality and economic mobility. The Census Bureau is exploring methods to measure citizenship using administrative data [13], a worthy (and politically charged) project since self-response rates on that variable have dropped steadily in the American Community Survey [14].

These efforts are likely to transform how we measure our population. The Equitable Data Working Group recommended capacity building to assess equity within and across data sources [15]. Researchers should encourage transparency in these assessments, ensure that agencies consider how data collections affect measures of disparities and outcomes, and support resources across agencies to maintain complete, comparable data.

IMPROVING EQUITABLE ACCESS TO DATA

Data access has not been equitable; in too many cases, access relied on personal connections and unwritten rules. Federal agencies have publicized their data inventories and learning agendas as directed by the Foundations for Evidence-Based Policymaking Act [16] (the “Evidence Act”). The Evidence Act requires agencies to name a chief data officer, evaluation officer, and statistical official to improve the collection, management, analysis, and use of data across agencies and departments. These requirements are already showing tangible impacts for data users and researchers. For example, the HHS Statistical Official has led efforts to improve metadata and apply consistent tagging and digital object identifiers to their data sets, increasing the discoverability and usability of their data assets.

The Evidence Act also required OMB to establish “one front door” leading users to government data assets. In launching Researchdatagov.org, with its growing index of government data sets available for research, agencies have taken a big step forward in democratizing data access. Before this site launched, those seeking federal government data sets had to contact individual agencies and navigate their separate application processes. The catalog in Researchdatagov.org further helps researchers and policymakers understand where current data gaps exist.

Agencies are also helping researchers discover data equity resources. For example, the HHS Assistant Secretary for Planning and Evaluation highlights demographic and economic characteristics to support analyses of health disparities, as seen in the inventory [17] of products resulting from the Patient-Centered Outcomes Research Trust Fund and the HHS-wide inventory [18] of federal data for conducting patient-centered outcomes research on economic outcomes.

Addressing the recommendation of the Equitable Data Working Group to provide tools that help users analyze and navigate data, the Census Bureau has released multiple data equity tools [19] that illustrate digital equity, economic mobility, community resilience, and more. The Census Bureau has also developed data products that inform studies about social determinants of health, including demographic portraits [20] of Supplemental Nutrition Assistance Program recipients, an interactive tool for analyzing small-area income and poverty estimates by age [21], and My Community Explorer [22], which provides economic, social, race and ethnicity, and business profiles through a user-friendly, interactive map. These resources, data inventories, and tools alike enable users to more easily find, explore, and compare data, making access to data more equitable.

The pandemic compelled federal agencies to reevaluate data infrastructures to address pressing policy challenges. New laws, including the Evidence Act described previously, and the Creating Helpful Incentives to Produce Semiconductors and Science Act of 2022 [23] (CHIPS Act) facilitate the collection, maintenance, and evaluation of federal, state, local, territorial, and tribal data to strengthen capacity-building efforts. Specifically, the CHIPS Act establishes a National Secure Data Service (NSDS) demonstration project to show how a government-wide data linkage and privacy protection strategy could evolve and support researchers and equity assessments.

The Advisory Committee on Data for Evidence Building report [24], released in October 2022, generated recommendations for how an NSDS could expand access to data, enable robust, accurate data linkages, and develop privacy-preserving techniques. The Committee recommended that the NSDS consider potential harms to vulnerable and marginalized populations when their data are used, specifically assessing the value of linkages and analyses relative to privacy concerns. An NCHS pilot (https://bit.ly/3Fltv9G) has been launched to determine promising uses for interoperable vital statistics, given the variation in unmet measurement needs across federal, state, and local levels. These needs are difficult to address because of chronic underinvestment in data infrastructure.

Federal agencies are investing in system modernizations to enable real-time data sharing with other federal agencies, state and local governments, and private and nonprofit data collectors. HHS has made strides toward interoperability at scale by establishing governance models within health information networks. HHS leadership on the Trusted Exchange Framework and Common Agreement (https://bit.ly/3tCgru5) focuses on transparency and privacy, which are critical for health data equity.

ADVANCING EQUITABLE METHODS

Federal agencies are actively exploring how artificial intelligence (AI) and machine learning can be used responsibly to ensure more ethical and equitable uses of federal health data. Current efforts include HHS’s development of a Trustworthy AI (https://bit.ly/46t7Ubc) playbook to reduce risks and build public trust when applying AI in federal activities and the National Cancer Institute’s encouragement of greater attention to data bias (https://bit.ly/46zTeHg) when employing AI. NCHS is also increasing its use of AI; it currently employs AI tools in survey operations for nonresponse detection, to code cause of death using text strings (https://www.cdc.gov/nchs/nvss/medcoder.htm), and to identify and replace personally identifiable information provided in interviews using speech-to-text software. These efforts result in greater efficiency and better data quality.

HHS also leads in research on using privacy-enhancing technologies that reduce risks of disclosure for groups and individuals in data sets. Privacy-preserving data linkages have been tested and implemented in the National COVID Cohort Collaborative (https://ncats.nih.gov/n3c), National Institutes of Health’s All of Us [25] program, and NCHS’s linkage (https://bit.ly/3QkEoig) of the National Hospital Care Survey with Medicaid data. Federal investment in high-risk, high-reward projects is developing through the Advanced Research Projects Agency for Health (ARPA-H; https://arpa-h.gov). Following other ARPAs, ARPA-H will invite both public and private organizations to design innovative solutions to complex problems. This represents a major shift in how federal agencies fund medical research, supporting the next generation of moonshots for health, advancing both science and the awareness of diversity and equity issues. We look forward to seeing how ARPA-H can accelerate secure, responsible health surveillance and AI-enabled solutions to reduce disease burdens and find cures.
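One common building block of privacy-preserving linkage is a keyed hash: each data holder converts identifying fields into an opaque token, and records are matched on tokens rather than on names or birth dates. The sketch below illustrates that general idea only. It is not a description of how the National COVID Cohort Collaborative, All of Us, or NCHS perform their linkages, and the key, names, and record fields are all hypothetical. Production systems typically add safeguards this sketch omits, such as key management by a trusted party and tolerance for typographical errors.

```python
import hashlib
import hmac


def link_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Derive an opaque linkage token from identifying fields.

    Fields are trimmed and lowercased so that formatting differences
    between sources do not change the token."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key that both data holders would obtain out of band.
key = b"demo-key-shared-by-both-data-holders"

# Each holder tokenizes its own records; only tokens and non-identifying
# attributes are shared for linkage. All people and values here are made up.
survey = {
    link_token(key, "Ana", "Lopez", "1980-02-01"): {"visits": 3},
}
claims = {
    link_token(key, " ana ", "LOPEZ", "1980-02-01"): {"enrolled": True},
    link_token(key, "Sam", "Lee", "1975-07-09"): {"enrolled": False},
}

# Link on tokens present in both sources.
linked = {
    token: {**survey[token], **claims[token]}
    for token in survey.keys() & claims.keys()
}
print(len(linked), "linked record(s)")
```

Using a keyed hash (HMAC) rather than a plain hash means that someone without the key cannot recompute tokens from a list of known names, which is the main weakness of unkeyed hashing for this purpose.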

It is also worth noting where federal policies spur changes among providers and health information systems. In 2024, CMS (https://bit.ly/48Yt4je) and many states will require hospitals to implement social determinants of health (SDOH) screenings for all patients aged 18 years and older, thereby creating a consistent collection of standardized data that various agencies can use. Many vendors offer the ability to collect data on these social factors (https://bit.ly/3Q20B38) that influence health status but are not directly medical, through standardized, structured data fields (i.e., Z-codes). SDOH indicators range from housing stability to social connectedness. Information can be collected during clinical encounters by providers, social workers, community health workers, case managers, patient navigators, and nurses through health risk assessments, screening tools, and self-reporting. The Joint Commission (https://bit.ly/3S28CIc) and the National Committee for Quality Assurance created requirements and reimbursement incentives to collect richer SDOH data. Federal agencies could incentivize SDOH data collection, especially across Medicare Advantage, Medicaid, and commercial payers. Some providers are concerned about asking patients about their unmet housing, food, and safety needs without having adequate resources to address them. To mitigate some of these concerns, the Centers for Disease Control and Prevention has provided grants (https://www.cdc.gov/populationhealth/sdoh/index.htm) to state, local, territorial, and tribal jurisdictions to implement cross-sector interventions that impact many of these social needs.
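For readers who work with coded records, the following sketch shows one way to flag SDOH-related Z-codes in a list of diagnosis codes. It assumes that SDOH Z-codes are the ICD-10-CM categories Z55 through Z65, which is the range CMS materials describe; the function name and the sample codes are illustrative, and the pattern checks only the shape of a code, not whether it is a valid billable entry.

```python
import re

# Shape of an ICD-10-CM Z code: "Z", two digits, optional decimal extension.
_Z_CODE = re.compile(r"^Z(\d{2})(?:\.[0-9A-Z]{1,4})?$")


def is_sdoh_z_code(code: str) -> bool:
    """True if the code falls in categories Z55-Z65 (assumed SDOH range)."""
    match = _Z_CODE.match(code.strip().upper())
    return bool(match) and 55 <= int(match.group(1)) <= 65


# Sample codes chosen only to exercise the check.
encounter_codes = ["E11.9", "Z59.0", "z60.2", "Z23"]
sdoh_codes = [code for code in encounter_codes if is_sdoh_z_code(code)]
print(sdoh_codes)
```

A filter like this only finds what was recorded; if screening results are never entered as structured codes, the underlying social needs remain invisible in the data.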

SUMMARY

We have highlighted numerous, ongoing federal initiatives and activities, including large-scale investments, major changes in data collection and access, and subtle but meaningful changes in processes.

Were you aware of these activities? If not, consider tracking research coming out of the Federal Committee on Statistical Methodology (https://www.fcsm.gov) and the NSDS demonstration project (https://ncses.nsf.gov/about/national-secure-data-service-demo#card1896) to stay informed on data standards and methods that improve equitable data access and uses.

Researchers and public health professionals need to engage with federal agencies to achieve a public health data system with complete and accurate data to identify and address inequities. We must acknowledge the need to collect and use demographic characteristics that reveal unique aspects of our population. It is critical that all groups are visible, as they grow or shrink in number, and as they disperse across regions or cluster together. Among federal statistical agencies, there is tension between visibility and privacy for these groups. We need a constructive, open dialogue on resolving this tension to find balance between data privacy and utility.

ACKNOWLEDGMENTS

This work was supported by the Robert Wood Johnson Foundation (grant #79331).

CONFLICTS OF INTEREST

The authors have no conflicts of interest to report.

See also Toward More Equitable Public Health Data, pp. 1276–1308.

REFERENCES


Articles from American Journal of Public Health are provided here courtesy of American Public Health Association
