Skip to main content
Health Services Research logoLink to Health Services Research
. 2022 May 20;57(Suppl 2):207–213. doi: 10.1111/1475-6773.14000

The effect of differential privacy on Medicaid participation among racial and ethnic minority groups

Christoph F Kurz 1,2,, Adriana N König 1,2, Karl M F Emmert‐Fees 2,3, Lindsay D Allen 4
PMCID: PMC9660420  PMID: 35524543

Abstract

Objective

To investigate how county and state‐level estimates of Medicaid enrollment among the total, non‐Hispanic White, non‐Hispanic Black or African American, and Hispanic or Latino/a population are affected by Differential Privacy (DP), where statistical noise is added to the public decennial US census data to protect individual privacy.

Data Sources

We obtained population counts from the final version of the US Census Bureau Differential Privacy Demonstration Products from 2010 and combined them with Medicaid enrollment data.

Study Design

We compared 2010 county and state‐level population counts released under the traditional disclosure avoidance techniques and the ones produced with the proposed DP procedures.

Data Collection/Extraction Methods

Not applicable.

Principal Findings

We find the DP method introduces errors up to 10% into counts and proportions of Medicaid participation rate accuracy at the county level, especially for small subpopulations and racial and ethnic minority groups. The effect of DP on Medicaid participation rate accuracy is only small and negligible at the state level.

Conclusions

The implementation of DP in the 2020 census can affect the analyses of health disparities and health care access and use among different subpopulations in the United States. The planned implementation of DP in other census‐related surveys such as the American Community Survey can misrepresent Medicaid participation rates for small racial and ethnic minority groups. This can affect Medicaid funding decisions.

Keywords: census, confidentiality/privacy issues, Medicaid, racial and ethnic differences in health and health care


What is known on this topic

  • The census is the primary source of statistical information about the US population, and the accuracy of its data is critical to health services research.

  • The Census Bureau will infuse additional noise into its 2020 census data products to protect individuals' privacy.

  • Such privacy measures are suspected to disproportionately affect underrepresented populations and ethnic and racial subgroups.

What this study adds

  • Increased privacy measures in the census can affect Medicaid participation rate accuracy for the non‐Hispanic Black or African American and Hispanic or Latino/a population at the county‐level with lower population counts by introducing errors up to 10%.

  • State‐level accuracy of Medicaid participation rates is rarely affected by DP.

  • The implementation of DP in census‐related surveys can misrepresent Medicaid participation rates for small racial and ethnic minority groups and affect funding decisions.

1. INTRODUCTION

The constitutionally mandated decennial census survey, which includes the annual American Community Survey (ACS), is the foremost source of demographic and statistical information about the United States' population. It determines each state's representation in congress and guides the allocation of $1.5 trillion in federal funds per year. 1 The survey has particular importance in the health policy sphere for two reasons. First, census data are used in thousands of publications annually in social science and policy research, evaluating and informing data‐driven health policy. 2 Health services researchers, in particular, rely on accurate census statistics for measuring health disparities and tracking health care access and use among different subpopulations. 3 , 4 , 5 Second, federal, state, tribal, and local governments use ACS statistics to help with decision making and to allocate over $675 billion each year back to communities. 6 Concerningly, recent changes to privacy security measures for the census may have jeopardized the quality and utility of these important US data resources. 2

Balancing the need for accurate census survey data with an equally essential obligation to protect respondents' privacy is a trade‐off that has come to a head in recent years. 7 Some argue that the confidentiality of census data is susceptible to “database reconstruction,” a process for inferring individual‐level responses from tabular data. 2 To protect the privacy of the 2020 census participants from these techniques, the US Census Bureau includes additional privacy security methods based on the concept of Differential Privacy (DP). Although controversial among researchers, 2 DP is a promising concept that can protect respondents' privacy against a wide range of attacks, such as reidentification and record linkage. 8 , 9

The DP approach is based on adding a controlled amount of statistical noise to the data in a way that protects the information of any single respondent against identification. 10 While prior work has found that census estimates generated via DP data may be accurate for aggregate population statistics, they are subject to considerable error for mortality rate estimates of subgroups and ethnic and racial minority groups. 11

If these errors for members of racial and ethnic minority groups persist in other areas of the survey, there may be important implications for disparities. For example, one of the key questions asked in the ACS is about respondents' health insurance status. 12 The Census Bureau uses these data to create statistics about the percentage of people covered by health insurance and the sources of health insurance, which are then used to plan government programs, determine eligibility criteria, and encourage people to participate in health insurance programs. 12 For members of racial and ethnic minority groups, who face longstanding disparities in health insurance coverage and health, these programs are essential for accessing health care and protection from high health care costs. 13

Differential privacy is very likely to be applied to the ACS as well. 14 The purpose of this paper is to estimate the extent to which DP may influence ACS statistical accuracy for topics that have implications for health disparities. We used the example of Medicaid participation rates.

Medicaid provides public health insurance coverage to millions of Americans and plays a substantial role in improving access to care, health outcomes, financial security, and other factors essential for reducing disparities. Without accurate data on who is participating in Medicaid, funds and programs may be misallocated away from areas most in need.

Using the Census Bureau's demonstration dataset containing 2010 decennial counts produced with proposed DP and traditional techniques, we evaluated how the implementation of DP affects estimates of Medicaid participation rates at the county and state levels for the non‐Hispanic White, non‐Hispanic Black or African American, and Hispanic or Latino/a population. We anticipate that results from this study will inspire more work on the relative drawbacks versus benefits of incorporating DP into the ACS.

2. METHODS

2.1. Data

We used two data sources to construct Medicaid participation rates, which we defined using the number of Medicaid enrollees as the numerator and population counts as the denominator. For the numerator, we acquired 2010 Medicaid enrollment data directly from the Centers for Medicare & Medicaid Services (CMS) via the Medicaid Statistical Information System and the Kaiser Family Foundation. 15 Enrollment is defined as the number of individuals who are enrolled in Medicaid at any time during the federal fiscal year. We included all 1735 counties for which complete information on race and ethnicity (i.e., non‐Hispanic White, non‐Hispanic Black or African American, Hispanic or Latino/a, and Other populations) was available (out of 3195). The group “Other” included the following races and ethnicities: American Indian and Alaska Native alone; Asian alone; Native Hawaiian and Other Pacific Islander alone; Some Other Race alone; and Two or More Races. If data for the year 2010 were not reported, we used data from subsequent years up to 2013. For the state of Idaho, data for the non‐Hispanic Black or African American category were missing for all years and could not be included.

We chose to construct these measures at the county and state levels because these are two geographical units of analysis that play a large role in both health services research and health policy. Medicaid policies are typically determined at the state level, while counties operate hospitals, nursing homes, behavioral and mental health care, testing services, and other public health services. 16

For the denominator, we obtained county and state‐level data from the 2010 US census that were released using traditional disclosure avoidance techniques, as well as those produced using the DP procedures. 17 Traditional disclosure techniques applied to the 2010 census include household‐swapping, rounding, and top‐ and bottom‐swapping. 18 The 2010 Demonstration Data Products introduces DP to the census data. Using the 2010 census data thus provides the opportunity to compare traditional disclosure techniques with DP. Since the 2020 census data only contains DP, this comparison is not possible for the more recent data set.

The DP concept was developed by 19 and leverages random noise—random variation around the true value—to obscure individual‐level data. This conserves the statistical properties of the data but makes individual information hard to identify. 20

An essential component of DP is the privacy loss parameter, usually denoted by ε. This parameter determines the amount of noise added to the computation, and therefore defines what can be learned about an individual as a result of their private information being included in a differentially private analysis. Lower ε means more noise is added, increasing privacy but decreasing statistical accuracy. The census DP counts were produced under a global ε of 19.61, which includes an ε of 17.14 for persons and an ε of 2.47 for housing units. 21 We used the final version of the US Census Bureau Differential Privacy Demonstration Products, released in August 2021. Data were accessed via IPUMS‐NHGIS. 22 Population counts were available for the same race and ethnicity categories used by CMS.

2.2. Analysis

We defined Medicaid participation counts for county c and race and ethnicity r as A(c,r). We defined population counts separately for the traditional census approach P(c,r,trad) and the novel DP approach, denoted as P(c,r,DP).

This allowed calculation of the Medicaid participation rate for county c and race and ethnicity r, m(c,r,j), as the numerical.

Medicaid participants count divided by the population,

mc,r,j=Ac,rPc,r,j,

where j refers to either trad or DP. The absolute difference in the participation rate is then defined as

da=mc,r,tradmc,r,DP.

The relative difference in the participation rate between traditional and DP Census data is

dr=mc,r,tradmc,r,DPmc,r,trad+mc,r,DP2

We multiplied participation rates by 100 to get percentages, then calculated the absolute and relative difference between the two. The absolute difference is a simple measure of the absolute deviation of two values, independent of size. The relative difference, which compares two quantities while considering the sizes of the two measures being compared, is the preferred quantitative indicator where the two outcomes are expected to be the same. For example, if Medicaid participation among Hispanic or Latino/a individuals is 5% using traditional population counts and 6% using DP population counts, the absolute difference is |5%–6%| = 1%. The relative difference is |(5%–6%)|/(|(5% + 6%)|/2) = 18%. We applied the same analytical procedure to the state‐level data.

3. RESULTS

Figure 1 illustrates the differences in traditional‐ versus DP‐defined race and ethnicity‐specific Medicaid participation rates at the county level, expressed as absolute and relative differences. We find that DP introduces higher absolute and relative errors into racial and ethnic minority populations compared to larger populations. Among these minority groups, the largest absolute differences appear among the non‐Hispanic Black or African American population. Counties in the eastern and southern United States—where the average county population size is much smaller than other areas—were most affected. For example, in the 2010 census, Gilmer County, Georgia, has a non‐Hispanic Black or African American population count of 98, but the DP count is 55. For the non‐Hispanic White population, the difference is usually small and negligible (see Table 1 for more examples). The large discrepancy in differences between counties is further illustrated by the coefficient of variation (defined as [standard deviation]/mean). It is 242% for non‐Hispanic White, 310% for non‐Hispanic Black or African American, 441% for Hispanic or Latino/a, and 267% for other populations.

FIGURE 1.

FIGURE 1

Effect of Differential Privacy on Medicaid participation rates at the county level. Darker red color indicates larger differences. White areas on the map indicate missing data. AA, African American; DP, differential privacy; Pop., population; Trad., traditional disclosure methods [Color figure can be viewed at wileyonlinelibrary.com]

TABLE 1.

2010 Census population counts using traditional and Differential Privacy disclosure methods for selected counties

County Total Pop. Total Pop. DP NH White NH White DP NH Black or AA NH Black or AA DP Hispanic or Latino/a Hispanic or Latino/a DP Other Other DP
Gilmer County, GA 28,292 28,292 25,078 25,109 98 55 2677 2680 439 448
Crawford County, MO 24,696 24,694 23,804 23,832 64 48 365 347 463 467
Marshall County, KY 31,448 31,450 30,669 30,680 48 41 350 354 381 375
Wilson County, KS 9409 9408 8866 8920 30 34 217 160 296 294
Mills County, IA 15,059 15,061 14,390 14,386 57 49 359 396 253 230
Ashe County, NC 27,281 27,276 25,420 25,411 148 121 1311 1349 402 395
Stonewall County, TX 1490 1491 1206 1216 38 34 209 194 37 47
Swain County, NC 13,981 13,982 9168 9155 75 95 540 510 4198 4222
Winston County, AL 24,484 24,483 23,237 23,245 115 97 639 655 493 486
Iron County, WI 5916 5919 5772 5770 3 6 35 46 106 97
Wilson County, KS 9409 9408 8866 8920 30 34 217 160 296 294
Macon County, MO 15,566 15,570 14,735 14,789 349 349 150 122 332 310
Taylor County, WV 16,895 16,896 16,377 16,382 125 110 143 122 250 282
Ste. Genevieve County, MO 18,145 18,146 17,607 17,590 117 106 149 181 272 269
Appomattox County, VA 14,973 14,975 11,483 11,498 2998 3014 167 142 325 321
Boyd County, NE 2099 2100 2024 2033 1 0 33 35 41 32
Cavalier County, ND 3993 3992 3890 3893 4 1 24 30 75 68
Perkins County, SD 2982 2984 2883 2886 2 2 20 28 77 68
Karnes County, TX 14,824 14,827 5956 5931 1351 1344 7376 7429 141 123

Abbreviations: AA, African American; DP, differential privacy; NH, non‐Hispanic; Pop., population.

In counties with very homogeneous racial and ethnic populations that are coupled with low total population size, DP errors are more evident. For example, Iron County, Wisconsin, has a total population of 5916, where almost 98% identify as non‐Hispanic White; the non‐Hispanic Black or African American population contains only three individuals, and the Hispanic or Latino/a population counts 35. However, the DP census lists 6 non‐Hispanic Black or African Americans and 46 Hispanic or Latino/a individuals, respectively. Combined with high Medicaid participation, this results in large differences. Such shifts make Medicaid participation rates for certain combinations of county and race and ethnicity hard to interpret.

As the population denominator increases, the errors introduced by the DP diminish. This is illustrated in Figure 2, which shows calculations at the state level (please note the different color ranges compared with Figure 1). At the state level, the maximum absolute difference is 0.1 percentage points, while the maximum relative difference is less than 0.5%. Even for states with little racial and ethnic diversity, such as Montana or Wyoming, the effect of DP has a minor influence on Medicaid participation rate accuracy. According to the 2010 census, Montana has a non‐Hispanic Black or African American population count of 3743; this changes to 3731 in the DP census demonstration file, resulting in only a small relative difference, even with high Medicaid participation among non‐Hispanic Black or African American individuals in this state.

FIGURE 2.

FIGURE 2

Effect of Differential Privacy on Medicaid participation rates at the state level. Darker red color indicates larger differences. White areas on the map indicate missing data. AA, African American; DP, differential privacy; Pop., population; Trad., traditional disclosure methods [Color figure can be viewed at wileyonlinelibrary.com]

4. DISCUSSION

In this paper, we have shown that applying the DP algorithm to the 2010 US Decennial Census results in a misrepresentation of Medicaid participation rates among already‐marginalized racial and ethnic groups. Distortions from the DP approach were most evident when examining data at the county level and in states with less racial and ethnic diversity. It has to be noted that county‐level data were only available for 1735 out of 3195 counties. Medicaid participation rates among certain combinations of county and race and ethnicity differed between DP and non‐DP counts on an absolute scale by more than three percentage points, while relative difference sometimes exceeded 10%. Importantly, non‐Hispanic White individuals are the only ethnic and racial subgroup for which the DP algorithm accurately captured Medicaid participation rates.

This finding may have important implications for health policy. Although our study used data from the 2010 Decennial Census (the only data set for which DP‐adjusted data are available), health insurance coverage information was not collected as part of this survey. Rather, this information is collected via the annual ACS, to which DP will likely be applied in the future. 14 The ACS insurance data play in planning government programs, resource allocation, and policy evaluation and tracking. Therefore, policy makers and researchers alike need to be aware of DP's potential pitfalls, especially when it comes to racial and ethnic groups most affected by insurance and related health disparities.

Our results are in line with previous findings showing the distorting effect of DP census data on COVID rates 23 and mortality. 11 In contrast, 24 find that census tract estimates of mortality rates and inequities are not affected by DP. Their analysis, however, is limited to the state of Massachusetts, a populous state with more diverse demographics. We also did not find an effect for Massachusetts state and counties.

Beyond insurance, the application of DP to the 2020 census data may compound other equity concerns about that year's survey, including a decision to cut the reporting period short at a time when 60 million households still needed to be recorded. Notably, those households were hard‐to‐count populations, disproportionately people of color. 25

Even setting aside these potential equity implications, the use of DP in the 2020 census remains highly controversial. On the one hand, protecting respondent privacy is a chief concern. NPR details a recent security exercise in which Census Bureau researchers were able to reconstruct a complete set of records for every person included in the 2010 census numbers. 26 After cross‐referencing the reconstructed data with purchased commercial records, they were able to reidentify 52 million people by name. The researchers estimated that more than half of the population included in the 2010 census could be identified if attackers were able to procure even more commercial data records. On the other hand, the reconstruction described represents an extraordinarily resource‐intensive process, unlikely to occur outside the Bureau's laboratory, leading some to proclaim that “differential privacy goes far beyond what is necessary to keep data safe under census law and precedent.” 2

As our study illustrates, the introduction of DP to future census products may introduce data inaccuracies, especially among already‐marginalized populations. Researchers and policy makers should be aware of this potential pitfall when designing studies and allocating resources in the future.

ACKNOWLEDGMENTS

None.

Kurz CF, König AN, Emmert‐Fees KMF, Allen LD. The effect of differential privacy on Medicaid participation among racial and ethnic minority groups. Health Serv Res. 2022;57(Suppl. 2):207‐213. doi: 10.1111/1475-6773.14000

REFERENCES


Articles from Health Services Research are provided here courtesy of Health Research & Educational Trust

RESOURCES