Abstract
Objectives
Evaluating population health initiatives at the community level necessitates valid counterfactual communities, that is, communities with similar population composition, health care access, and health determinants. Estimating appropriate county counterfactuals is challenging in states with large intercounty variation. We describe an application of K-means cluster analysis for determining county-level counterfactuals in an evaluation of an intervention, a county perinatal system of care for Medicaid-insured pregnant women.
Methods
We described counties by using indicators from the American Community Survey, Area Health Resources Files, University of Wisconsin Population Health Institute County Health Rankings, and vital records for Michigan Medicaid-insured births for 2009, the year the intervention began (or the closest available year). For each number of clusters from 3 to 10, we ran 1000 iterations with random starting cluster values and applied commonly used variability and reliability measures to identify the optimal number of clusters.
Results
The use of unstandardized features resulted in the grouping of 1 county with the intervention county in all solutions for all iterations and the frequent grouping of 2 additional counties with the intervention county. Standardized features led to no solution, and other distance measures gave mixed results. However, no county was ideal for all subpopulation analyses.
Practice Implications
Although the K-means method was successful at identifying comparison counties, differences between the intervention county and comparison counties remained. This limitation may be specific to the intervention county and the constraints of a within-state study. This method could be more useful when applied to other counties in and outside Michigan.
Keywords: evaluation, methods, population health
Complex, multifactorial public health problems necessitate population-wide solutions and, thus, community-level interventions are frequently used. The gold standard for evaluating community-level interventions, such as those at the county level, is the group-randomized trial. 1 However, group-randomized trials are often not feasible or appropriate when only a small number of communities is involved, particularly in community-based participatory research and other scenarios in which the intervention is initiated by community members, or in system-of-care interventions that depend on coordinating local resources. It is not clear how best to evaluate non–randomly assigned community interventions under these circumstances. 2-4 As in individual-level quasi-experimental analyses, the challenge is to identify an appropriate counterfactual group or communities that are as similar as possible to the intervention community so as to approximate group randomization, 3,5 but little guidance exists on selection criteria or methods.
We conducted a demonstration project during 2009-2015 to determine whether enhancements to a county perinatal system of care improved population outcomes for Medicaid-insured pregnant women and their infants in Kent County, Michigan, a mixed urban/rural county containing the second-largest city in the state, Grand Rapids. Because our intervention targeted the entire county, the ideal counterfactual community or communities would have the same sociodemographic characteristics and level of complexity in service delivery as the intervention county. Complexity is influenced by many factors, including the numbers of residents, health care agencies, health care providers, and enhanced prenatal care programs.
Because we wanted to account for the broader context and administrative features of an entire county, we explored a clustering method, K-means cluster analysis, to identify 1 or more comparison counties in Michigan. K-means cluster analysis has been used increasingly in the public health and health services research literature to create clusters of exposures or individuals, most frequently for sources of air pollution 6 and for segmentation of health care populations. 7 Where the method was used for county-level clustering, it grouped all counties nationwide into clusters for descriptive health rankings rather than for intervention evaluation. 8,9
The objective of this study was to describe and discuss an application of K-means cluster analysis for determining county-level counterfactuals for evaluation of a county perinatal system of care for Medicaid-insured pregnant women. To our knowledge, this is the first use of this method to find the most similar counterfactual county.
Materials and Methods
Michigan has 83 counties. We selected county-level features from 3 publicly available data sources: the American Community Survey, 10 the Area Health Resources Files, 11 and the University of Wisconsin Population Health Institute County Health Rankings. 12 The fourth data source was a limited dataset of Michigan resident live birth records for Medicaid-insured births only (hereinafter called Michigan Medicaid vital records [MMVR]), including county identifiers. We retrieved the dataset from the Michigan Department of Health and Human Services (MDHHS) Health Services Data Warehouse through an honest broker (a neutral third party, who is not part of the research team in any way).
We considered the following categories of features: population composition, social and economic factors, health care access, health outcomes, health behaviors, and physical environment. We selected 35 features across these broad categories, some chosen from Rettenmaier and Wang 13 and others based on the evaluation team’s experience with the Michigan Medicaid Maternal and Infant Health Program and Strong Beginnings, a federal Healthy Start program. 14-16 These features represented a mixture of compositional (ie, aggregated from individual-level characteristics) and contextual (ie, pertaining to the county environment) measures. 17 Whenever possible, we used data from 2009 because the enhanced county perinatal system of care was implemented in that year. For data elements without information from 2009, we used the following years as proxies, in priority order: 2010, 2008, or 2011. We did not have data on 4 features in some counties (the number of counties with missing data ranged from 14 to 35); for these counties, we imputed mean values. We described the 35 features with the data source, year, mean, and SD (Table 1).
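As an illustration of this data-preparation step only, the following Python sketch merges county-level extracts from the 4 sources and mean-imputes the features with incomplete county coverage. The file names and the county_fips column are hypothetical placeholders, not the actual field names in the ACS, AHRF, UWPHI, or MMVR files.

```python
import pandas as pd

# Hypothetical file and column names; the real ACS, AHRF, UWPHI, and MMVR
# extracts use their own identifiers, layouts, and (proxy) years.
acs = pd.read_csv("acs_2009.csv")
ahrf = pd.read_csv("ahrf_2008_2011.csv")
uwphi = pd.read_csv("uwphi_2010.csv")
mmvr = pd.read_csv("mmvr_2009.csv")  # Medicaid-insured birth measures by county

# Merge the 4 sources on a shared county identifier (83 Michigan counties).
features = (
    acs.merge(ahrf, on="county_fips")
       .merge(uwphi, on="county_fips")
       .merge(mmvr, on="county_fips")
)

# Replace missing values with the statewide mean, as was done for the
# 4 features with incomplete county coverage.
feature_cols = [c for c in features.columns if c != "county_fips"]
features[feature_cols] = features[feature_cols].fillna(features[feature_cols].mean())
```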
Table 1.
County-level features used to characterize Michigan counties from the American Community Survey (ACS), a Area Health Resources Files (AHRF), b Michigan Medicaid vital records (MMVR), c and University of Wisconsin Population Health Institute (UWPHI) County Health Rankings, d 2008-2011
| Domain | Characteristic | Feature | Source | Year | Mean (SD) |
|---|---|---|---|---|---|
| Population composition | Race/ethnicity | % of Black females aged 15-54 | AHRF | 2010 | 2.2 (4.5) |
| | | % of Latina females aged 15-54 | AHRF | 2010 | 1.9 (1.7) |
| | Urbanity | % of Population living in urban area | AHRF | 2010 | 38.4 (27.4) |
| Social and economic | Social support | % of Families with female head of household | AHRF | 2010 | 15.1 (3.8) |
| | | % of Adults without social/emotional support e | UWPHI | 2010 | 17.2 (3.7) |
| | Education | % of Adults aged ≥25 with <high school diploma (5-y average) | AHRF | 2010 | 12.5 (3.3) |
| | | % of Black adults aged ≥25 with <high school diploma (5-y average) | AHRF | 2009 | 22.3 (17.2) |
| | | % of Latinx adults aged ≥25 with <high school diploma (5-y average) | AHRF | 2009 | 26.8 (15.9) |
| | Income/poverty | Median annual household income (log transformed) | AHRF | 2009 | 10.6 (0.2) |
| | | % of Population in deep poverty (<50% federal poverty level, 5-y average) | AHRF | 2009 | 7.1 (2.4) |
| | | % of Children aged <18 y in deep poverty (<50% federal poverty level, 5-y average) | AHRF | 2009 | 10.5 (4.0) |
| | | No. of female heads of household in poverty (<100% federal poverty level, 5-y average) | ACS | 2009 | 1803 (5692) |
| | Employment | % of Population aged 16-64 unemployed (5-y average) | AHRF | 2010 | 25.9 (6.1) |
| Access to health care | Health insurance | % of Women aged 18-64 without health insurance | AHRF | 2009 | 15.9 (2.2) |
| | | No. of Medicaid-eligible females | AHRF | 2008 f | 13 968 (37 053) |
| | | No. of Medicaid-eligible people aged <21 | AHRF | 2008 f | 13 871 (36 826) |
| | Health care providers | No. of nonfederal medical doctors and doctors of osteopathy in patient care per 1000 population | AHRF | 2010 | 1.5 (1.4) |
| | | No. of hospitals | AHRF | 2010 | 2.1 (3.4) |
| | | No. of hospital beds for obstetric care | AHRF | 2011 f | 20.7 (53.5) |
| | | No. of Federally Qualified Health Centers per 1000 population | AHRF | 2009 | 1.6 (2.9) |
| | | No. of primary care providers per 100 000 population | UWPHI | 2010 | 82.0 (51.0) |
| Health outcome | Mortality | Infant mortality (total, per 1000 population, 5-y average) | AHRF | 2009 | 6.8 (2.4) |
| | | Infant mortality (non-White, per 1000 population, 5-y average) g | AHRF | 2009 | 11.2 (5.6) |
| | Births | No. of Medicaid births | MMVR | 2009 | 840 (2121) |
| | | No. of Black Medicaid births | MMVR | 2009 | 224 (1119) |
| | | % of Medicaid births to unmarried women | MMVR | 2009 | 56.4 (7.2) |
| | | % of Medicaid births to mothers aged <18 y | MMVR | 2009 | 3.9 (2.0) |
| | | % of Black Medicaid preterm births | MMVR | 2009 | 6.5 (14.6) |
| | | % of Black Medicaid low birth weight | MMVR | 2009 | 5.0 (9.5) |
| Health behavior | Tobacco use | % of Adult current smokers h | UWPHI | 2010 | 23.6 (4.3) |
| | Alcohol | % of Adult binge or heavy drinkers i | UWPHI | 2010 | 17.7 (3.7) |
| | Sexual behavior | Chlamydia rate per 100 000 population | UWPHI | 2010 | 187.6 (158.5) |
| Physical environment | Housing | Ratio of vacant vs occupied housing units | AHRF | 2009 | 0.4 (0.4) |
| | | % of Zip codes without healthy food outlets | UWPHI | 2010 | 51.7 (18.9) |
| | Air quality | No. of annual air pollution days | UWPHI | 2010 | 3.6 (4.2) |
aUS Census Bureau. 10
bHealth Resources and Services Administration, US Department of Health and Human Services. 11
cMMVR is a limited dataset of live birth records among Michigan residents for Medicaid-insured births only, retrieved from the Michigan Department of Health and Human Services Health Services Data Warehouse.
dUniversity of Wisconsin Population Health Institute. 12
eEighteen counties missing data for this variable; values replaced with mean values.
fData for 2009 and 2010 not available.
gThirty-five counties missing data for this variable; values replaced with mean values.
hEighteen counties missing data for this variable; values replaced with mean values.
iFourteen counties missing data for this variable; values replaced with mean values.
Analysis
We used K-means cluster analyses to identify the comparison county or counties that would be classified most frequently in the same cluster as the intervention county. Cluster analysis, also called data segmentation, groups or segments a collection of objects (counties in our study) into subgroups or clusters such that objects (counties) in the same cluster are more similar to each other than to objects (counties) assigned to other clusters. 18 In a cluster analysis of counties, counties are described by a set of characteristics (features, measurements) that are preselected according to the purpose of an evaluation; in our study, we selected characteristics relevant to the evaluation of a pregnancy-centered health program. Counties in the same cluster may have different degrees of similarity or dissimilarity with each other and with a specific county (Kent County in our study). Among the most widely used clustering methods is K-means (or K-medians) clustering, which assigns objects to clusters so that the within-cluster variation is as small as possible. 6,7
We tested solutions for 3-10 clusters, with 1000 iterations each and randomly assigned starting cluster values. To identify the best solution, we used scree plots to visualize a kink in the curve of the within sum of squares (WSS), or its logarithm (log[WSS]), across cluster solutions, that is, the point at which the reduction in WSS or log(WSS) is no longer large enough to warrant increasing the number of clusters. 19 The WSS can be thought of as the error sum of squares in regression analysis, where the total sum of squares corresponds to the WSS when all objects are in 1 cluster. We also used the η2 coefficient, which ranges from 0 to 1, with higher values indicating better clustering. In addition, we transformed the WSS to obtain the proportional reduction of error (PRE) and to identify the number of clusters with the largest reduction in error. Finally, we used the gap statistic to estimate the optimal number of clusters. 20
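These diagnostics can be computed directly from the WSS values. Below is a minimal sketch in Python (the published analyses were run in Stata 15), assuming a prepared 83 × 35 feature matrix X as a NumPy array and the definitions in Makles, 19 η2 = 1 − WSS(K)/WSS(1) and PRE(K) = [WSS(K − 1) − WSS(K)]/WSS(K − 1); plotting these quantities against K gives the scree plots described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def wss_for_k(X, k, n_starts=1000, seed=0):
    """Smallest within sum of squares (WSS) over n_starts random-start runs."""
    rng = np.random.RandomState(seed)
    best = np.inf
    for _ in range(n_starts):
        km = KMeans(n_clusters=k, init="random", n_init=1,
                    random_state=rng.randint(2**31 - 1)).fit(X)
        best = min(best, km.inertia_)  # inertia_ is the WSS of the fitted partition
    return best

def scree_table(X, k_range=range(3, 11), n_starts=1000):
    """WSS, log(WSS), eta-squared, and PRE for each candidate number of clusters K."""
    tss = float(((X - X.mean(axis=0)) ** 2).sum())  # WSS when all objects form 1 cluster
    wss = {k: wss_for_k(X, k, n_starts) for k in k_range}
    rows = []
    for k in k_range:
        eta2 = 1 - wss[k] / tss                          # ranges 0-1; higher = better
        pre = (np.nan if k - 1 not in wss
               else (wss[k - 1] - wss[k]) / wss[k - 1])  # proportional reduction of error
        rows.append((k, wss[k], np.log(wss[k]), eta2, pre))
    return rows
```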
Nineteen of the 35 features are proportions that have a natural range (0-100), and the other features are continuous measures that may have wide SDs. If we standardized these features, some natural variation (eg, the size of the population in each county) could be masked. In K-means analyses, if all variables are standardized, then clustering based on correlation (similarity) is equivalent to clustering based on squared distance (dissimilarity). 21 Therefore, we ran all analyses twice, with and without z score standardization of all variables. We then conducted several robustness checks on both the unstandardized and standardized analyses: using the 5 standardization methods in Schaffer and Green 22 and the mean absolute deviation standardization, we calculated the Euclidean distance, L1 distance, Canberra distance, and 1 – correlation distance of the 35 features between Kent County and the other counties. We used Stata version 15 (StataCorp LLC) for all analyses. The evaluation project was reviewed by the Michigan State University, MDHHS, and Spectrum Health institutional review boards and determined not to be human subjects research.
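The distance-based robustness checks can be sketched as follows, again in Python rather than Stata. The function and object names (rank_neighbors, X, county_names) are ours, the 2 standardizations shown are illustrative stand-ins for the full set of methods used in the study, and scipy’s “cityblock” and “correlation” metrics correspond to the L1 and 1 − correlation distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rank_neighbors(X, county_names, target="Kent", standardize=None,
                   metric="euclidean", top=3):
    """Rank counties by their distance to the target county across all features.

    standardize: None (unstandardized features) or a column-wise function,
    for example a z score or division by the column maximum.
    """
    Z = X if standardize is None else standardize(X)
    i = county_names.index(target)
    d = cdist(Z[[i], :], Z, metric=metric).ravel()
    order = np.argsort(d)
    return [(county_names[j], d[j]) for j in order if j != i][:top]

# Two illustrative standardizations (stand-ins for the 6 compared in Table 2).
zscore = lambda X: (X - X.mean(axis=0)) / X.std(axis=0)
by_max = lambda X: X / np.abs(X).max(axis=0)

# Distance measures examined: Euclidean, L1 ("cityblock"), Canberra,
# and 1 - correlation ("correlation").
# for metric in ["euclidean", "cityblock", "canberra", "correlation"]:
#     print(metric, rank_neighbors(X, county_names, standardize=zscore, metric=metric))
```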
Results
By visual inspection of the scree plot, when features were not standardized, the optimal number of clusters was 4: the WSS curve showed a kink at that point, and the proportional reduction of error was largest (Figure 1). On the other hand, when features were standardized, we found no clear optimal number of clusters (Figure 2). The gap statistic suggested a 5-cluster solution when features were not standardized and no solution when features were standardized (details available upon request).
Figure 1.
Scree plot of within sum of squares (WSS), log(WSS), η2 coefficient, and proportional reduction of error (PRE) to demonstrate the optimal number of clusters for 10 cluster solutions when county-level features for characterizing Michigan counties are not standardized, Michigan, 2008-2011. Visual inspection of the plot demonstrates that the optimal number of clusters is 4.
Figure 2.
Scree plot of within sum of squares (WSS), log(WSS), η2 coefficient, and proportional reduction of error (PRE) to demonstrate the optimal number of clusters for 10 cluster solutions when county-level features used for characterizing Michigan counties are standardized, Michigan, 2008-2011. Visual inspection of the plot demonstrates no clear optimal solution of the number of clusters.
Because our goal was to find another county or other counties most suitable as a comparison with Kent County, we viewed a higher frequency of a county being in the same cluster as Kent County as an indication of similarity to Kent County. With unstandardized features, 3 counties were grouped with Kent County across the 1000 iterations when the number of clusters varied from 3 to 6, the range around the optimal number of 4 or 5. Macomb County clustered with Kent County in all iterations of all solutions, Oakland County clustered with Kent County in 979-1000 iterations, and Genesee County clustered with Kent County in 127-1000 iterations, depending on the number of clusters. In contrast, when we standardized all features by z score, no county was grouped with Kent County in a majority of iterations (details on standardized features and robustness checks available upon request).
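The iteration counts reported above (eg, 979-1000 for Oakland County) come from tallying cluster co-membership across the random-start runs. A minimal sketch of that tally, assuming the same hypothetical X and county_names objects as in the earlier sketches and Python in place of Stata:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def cocluster_counts(X, county_names, target="Kent", ks=range(3, 7),
                     n_iter=1000, seed=0):
    """For each number of clusters K, count how many of n_iter random-start runs
    place each county in the same cluster as the target county."""
    rng = np.random.RandomState(seed)
    i = county_names.index(target)
    results = {}
    for k in ks:
        counts = Counter()
        for _ in range(n_iter):
            labels = KMeans(n_clusters=k, init="random", n_init=1,
                            random_state=rng.randint(2**31 - 1)).fit_predict(X)
            counts.update(county_names[j]
                          for j in np.flatnonzero(labels == labels[i]) if j != i)
        results[k] = counts
    return results

# Counties that appear in nearly all 1000 runs for every K (as Macomb County did
# here) are the strongest candidates for comparison counties.
```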
Given the lack of an optimal solution under z score standardization and that the counties grouped together were not similar, we turned to different distance and standardization methods (Table 2). In terms of Euclidean distance (the distance measure used in K-means clustering), unstandardized features led to the same 3 counties, whereas Ingham County had the smallest distance to Kent County when we used 4 of 6 standardized measures. For the most part, the use of L1 and Canberra distance resulted in the same counties that were obtained by using Euclidean distance, while 1 – correlation distance produced different results.
Table 2.
Top 3 counties as nearest neighbors of Kent County based on various distance and standardization methods, Michigan, 2008-2011
| Distance measure | Unstandardized | Standardized measure a | |||||
|---|---|---|---|---|---|---|---|
| County | z Score | x/Max(x) b | x/[Min(x)–max(x)] c | x/Sum(x) d | Rank(x) e | Absolute value [x − mean(x)] f | |
| Euclidean | Macomb | Ingham | Ingham | Ingham | Genesee | Ingham | Macomb |
| | Oakland | Kalamazoo | Muskegon | Kalamazoo | Ingham | Kalamazoo | Oakland |
| | Genesee | Muskegon | Kalamazoo | Muskegon | Oakland | Oakland | Genesee |
| L1 | Macomb | Ingham | Ingham | Ingham | Macomb | Ingham | Macomb |
| | Oakland | Kalamazoo | Kalamazoo | Kalamazoo | Genesee | Kalamazoo | Oakland |
| | Genesee | Macomb | Muskegon | Macomb | Oakland | Oakland | Genesee |
| Canberra | Ingham | Ingham | Ingham | Ingham | Ingham | Ingham | Ingham |
| | Genesee | Kalamazoo | Genesee | Kalamazoo | Genesee | Kalamazoo | Genesee |
| | Oakland | Genesee | Oakland | Macomb | Oakland | Jackson | Macomb |
| 1 – correlation | Jackson | Ingham | Ingham | Ingham | Oakland | Oakland | Alpena |
| | Monroe | Ottawa | Ottawa | Ottawa | Ingham | Ingham | Gogebic |
| | Montcalm | Oakland | Muskegon | Kalamazoo | Genesee | Kalamazoo | Roscommon |
a x represents the value for each observation for a given feature.
bStandardized by dividing each value x by the maximum value x for that feature.
cStandardized by dividing each value x by the difference between the minimum value x and the maximum value x for that feature.
dStandardized by dividing each value x by the sum of the values of x for that feature.
eStandardized using the rank of each value x for that feature.
fStandardized by taking the absolute value of the difference between each value x and the mean value x for that feature.
The 3 counties most similar to Kent County using unstandardized features were Macomb, Oakland, and Genesee counties, and the 3 counties most similar using standardized features, under most of the distance measures, were Ingham, Kalamazoo, and Muskegon counties (Table 3). In examining values individually, Macomb and Kalamazoo counties were within 2 percentage points of Kent County for 10 of 19 percentage-based features, with nearly identical median annual household income and numbers of Medicaid-eligible women and Medicaid-eligible people aged <21. Ingham and Muskegon counties were within 2 percentage points of Kent County for 7 of 19 percentage-based features, and Oakland and Genesee counties were within 2 percentage points of Kent County for 5 of 19 percentage-based features; all 7 counties had similar median annual household incomes. We found broad variation in access-to-care features across counties. However, for the 2 related Latinx population features, every comparison county had a smaller value than Kent County, in most cases considerably smaller: “percentage of Latina females aged 15-54” (Kent County, 6.7%; comparison counties, 1.5%-5.0%) and “percentage of Latinx people aged ≥25 with <high school diploma” (Kent County, 46.3%; comparison counties, 16.9%-35.7%).
Table 3.
Features of Kent County and comparison counties, from the American Community Survey (ACS), a Area Health Resources Files (AHRF), b Michigan Medicaid vital records (MMVR), c and University of Wisconsin Population Health Institute (UWPHI) County Health Rankings, d Michigan, 2008-2011
| Feature | Kent | Macomb | Oakland | Genesee | Ingham | Kalamazoo | Muskegon |
|---|---|---|---|---|---|---|---|
| % of Black females aged 15-54 | 7.3 | 7.0 | 10.5 | 14.9 | 8.6 | 8.0 | 10.0 |
| % of Latina females aged 15-54 | 6.7 | 1.5 | 2.4 | 2.0 | 5.0 | 2.8 | 3.1 |
| % of Population living in urban area | 84.3 | 97.2 | 95.2 | 83.2 | 86.8 | 82.5 | 76.7 |
| % of Families with female head of household | 18.8 | 18.9 | 16.9 | 26.0 | 22.4 | 19.7 | 22.6 |
| % of Adults without social/emotional support | 18.2 | 20.2 | 17.3 | 23.0 | 18.9 | 19.1 | 23.1 |
| % of Population aged ≥25 with <high school diploma (5-y average) | 11.7 | 12.4 | 7.8 | 11.9 | 9.3 | 8.3 | 12.3 |
| % of Black population aged ≥25 with <high school diploma (5-y average) | 19.7 | 13.5 | 10.3 | 16.3 | 16.8 | 16.7 | 18.8 |
| % of Latinx population aged ≥25 with <high school diploma (5-y average) | 46.3 | 22.4 | 27.0 | 16.9 | 24.6 | 23.6 | 35.7 |
| Median annual household income (log transformed) | 10.8 | 10.8 | 11.0 | 10.6 | 10.7 | 10.7 | 10.6 |
| % of Population in deep poverty (<50% federal poverty level, 5-y average) | 6.8 | 4.8 | 4.3 | 9.5 | 11.5 | 8.7 | 7.7 |
| % of Children aged <18 y in deep poverty (<50% federal poverty level, 5-y average) | 9.6 | 7.0 | 5.7 | 15.5 | 12.7 | 9.6 | 11.5 |
| No. of female heads of household in poverty (<100% federal poverty level, 5-y average) | 8864 | 7333 | 9024 | 9745 | 4457 | 3656 | 3398 |
| % of Population aged 16-64 unemployed (5-y average) | 19.5 | 22.7 | 20.8 | 30.7 | 20.7 | 19.2 | 28.5 |
| % of Women aged 18-64 without health insurance | 14.8 | 14.7 | 12.3 | 13.0 | 14.0 | 13.7 | 15.6 |
| No. of Medicaid-eligible females | 72 009 | 74 763 | 79 324 | 65 315 | 31 292 | 27 835 | 27 274 |
| No of Medicaid-eligible people aged <21 y | 74 313 | 73 401 | 78 119 | 65 053 | 30 506 | 26 218 | 26 446 |
| No. of nonfederal medical doctors or doctors of osteopathy in patient care per 1000 population | 3.3 | 1.6 | 5.7 | 2.5 | 3.8 | 3.5 | 1.7 |
| No. of hospitals | 7 | 8 | 17 | 4 | 3 | 4 | 4 |
| No. of hospital beds for obstetric care | 177 | 55 | 255 | 90 | 76 | 63 | 57 |
| No. of Federally Qualified Health Centers per 1000 population | 11 | 1 | 2 | 5 | 8 | 4 | 4 |
| No. of primary care providers per 100 000 population | 129.3 | 71.0 | 207.5 | 132.1 | 135.4 | 148.7 | 86.2 |
| Infant mortality (total, per 1000 population, 5-y average) | 7.4 | 6.5 | 6.4 | 9.2 | 7.1 | 7.4 | 6.4 |
| Infant mortality (non-White, per 1000 population, 5-y average) | 13.7 | 10.7 | 10.8 | 16.9 | 14.9 | 15.5 | 8.1 |
| No. of Medicaid births | 5045 | 4830 | 5320 | 3739 | 1943 | 1705 | 1700 |
| No. of Black Medicaid births | 1005 | 947 | 1506 | 1407 | 472 | 482 | 445 |
| % of Medicaid births to unmarried women | 60.4 | 52.8 | 57.3 | 71.8 | 62.5 | 65.2 | 64.5 |
| % of Medicaid births to mothers aged <18 y | 4.7 | 2.9 | 3.5 | 6.3 | 4.1 | 5.5 | 5.1 |
| % of Black Medicaid preterm births | 13.5 | 12.8 | 14.3 | 17.0 | 14.6 | 16.4 | 12.8 |
| % of Black Medicaid low birth weight | 12.1 | 13.2 | 12.7 | 15.2 | 14.0 | 17.2 | 11.5 |
| % of Adult current smokers | 21.3 | 22.5 | 18.4 | 25.9 | 20.5 | 20.5 | 30.7 |
| % of Adult binge or heavy drinkers | 17.1 | 18.6 | 16.0 | 16.0 | 19.3 | 22.1 | 20.5 |
| Chlamydia rate per 100 000 population | 559.9 | 195.8 | 274.7 | 709.3 | 660.9 | 540.9 | 632.3 |
| Ratio of vacant vs occupied housing units | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| % of Zip codes without healthy food outlets | 55.1 | 14.6 | 33.3 | 45.9 | 59.5 | 45.8 | 46.7 |
| No. of annual air pollution days | 15 | 9 | 10 | 8 | 8 | 5 | 9 |
aUS Census Bureau. 10
bHealth Resources and Services Administration, US Department of Health and Human Services. 11
cMMVR is a limited dataset of live birth records among Michigan residents for Medicaid-insured births only, retrieved from the Michigan Department of Health and Human Services Health Services Data Warehouse.
dUniversity of Wisconsin Population Health Institute. 12
Discussion
In our study, K-means clustering identified 1-3 comparison counties when the preselected features were not standardized, whereas the other distance measures identified comparison counties when features were standardized, across standardization methods. Overall, the evidence suggested that the selection of counties based on a single distance measure is sensitive to both the choice of standardization and the choice of distance measure.
However, the features that differed between Kent County and similar counties were concerning. Compared with the best-matched county by the K-means analysis, Kent County had a greater percentage of Latina women and a greater percentage of people with <high school education. We were also cautious about interpreting differences in county maternal and infant health programming and infrastructure that were not reflected in the data sources. For example, one of the selected counties includes the city of Flint, which received an influx of maternal and child health resources during the study period because of the city’s water crisis. Ultimately, we opted against using the method described here and instead selected individual women from across the state (outside Kent County), using propensity score-weighted analyses, matched exactly on race, and using census tract–level and block group–level data, which are more granular than county-level data. 23 This approach was an option for us given our intervention design and available data; however, it would not be appropriate for other evaluation designs involving primary data collection and/or more limited resources.
Our results are consistent with the descriptive rankings of Wallace et al, 8 who used K-means clustering and national data to create clusters of counties with similar sociodemographic features. They noted considerable within-state heterogeneity, such that most states (including Michigan) contained counties from many different clusters. Of note, the 6 comparison counties that we identified were in the same cluster as Kent County in Wallace’s schema as well. However, they also noted that sociodemographically heterogeneous counties would benefit from more granular subcounty data, a point that applied to our diverse intervention county.
Our findings are also congruent with recent discussion of how machine learning algorithms can reproduce biases against racial/ethnic minority groups. 24-26 Although previous literature has focused on racial/ethnic disparities in predictive algorithms, our analysis demonstrates how a racial/ethnic subgroup can be obscured by a clustering method. Thus, the same cautions that should be extended to ensure equity in predictive modeling should be applied to cluster analyses as well.
Limitations
General limitations of the K-means method to select counterfactual communities include the following. First, as in any selection method, the results are only as good as the chosen features. In our example, we selected features through discussion with the research team and community partners with substantive knowledge of factors influencing perinatal health in the community. We captured data on a wide range of complex indicators, including health care determinants and social determinants of maternal and infant health outcomes, triangulated across 4 data sources. That said, our selection was likely not the optimal set, and it is probable that no single optimal set exists. A more data-driven approach than the one we took could be applied to selecting features. 8 Relatedly, given our interest in the Latinx population, adding more measures specific to this population would weight their importance more heavily than they were weighted in our study. However, this approach would not address the problem that the counties whose other features are most similar to Kent County have much smaller Latinx populations and would have led to no better solution.
Second, no matter how practically important a certain measure might be (in our case, important to the evaluation team and community partners), a variable could be unrelated to the classification of counties, if a natural grouping exists. Inclusion of such variables, typically called “masking variables,” 27 can substantially affect the accuracy of the solution. Two broad approaches for mitigating the effect of masking variables are variable weighting and variable selection. 28 However, the literature is vast, and no consensus exists when the true cluster is unknown in observational data. 29,30 Collaborations between practitioners and methodologists become especially important in complex real-world intervention evaluation.
Third, our method involved the assumption that if a counterfactual community is comparable at baseline, it will remain so throughout the study. This assumption is shared by any method for identifying a comparison at baseline, 3 but it may not hold because of differential changes in the communities, unrelated to the intervention, that can confound the effect of the intervention. The assumption was particularly salient in our study given the Flint water crisis and the resulting resource allocation. Careful selection of an analytic method can help to mitigate biases introduced by extrinsic temporal trends. 31 Moreover, in our data sources, not all features were measured during the optimal baseline year, necessitating the use of proxy years. However, the intervention evaluated was complex and involved many changes during the study period, and even the postbaseline proxy years preceded the years in which changes were implemented.
Finally, the use of K-means clustering in general comes with its own set of assumptions and limitations. 21 One issue with the K-means clustering method is that the resulting assignments depend on the random starting point. The algorithm converges to a local minimum and is not guaranteed to find the global minimum, so the starting points should be varied and the resulting partitions examined. Another issue with the K-means algorithm is that a variable with high variability can dominate the cluster analysis. This problem was evident in our data, as the features “number of Medicaid-eligible females” and “number of Medicaid-eligible people aged <21 years” drove the solutions for our unstandardized analyses. A common solution is to standardize variables, but standardizing could also hide the true groupings in the data. Steinley 29 pointed out several diverging viewpoints on standardization, including critiques of z score standardization, that justify our use of different standardization methods. On the other hand, because the true cluster structure in our study was unknown, not standardizing features may have given undue weight to variables with high variability, with the potential consequence of selecting a comparison county similar in these statistically dominating features while missing other, possibly more important, confounding variables. For example, when ranking the features by the standardized difference between Kent and Macomb counties, we found that the feature “number of Medicaid-eligible people aged <21 years” had the third smallest standardized difference, whereas the same feature ranked fifth and ninth in the other counties. If this variable were in fact not an important factor (which researchers cannot ascertain unequivocally), then perhaps another county should have been selected. Such decisions must be made case by case and depend on the type of data and the nature of the groups. The number of clusters K is a tuning parameter that can be chosen by cross-validation if the number of independent and identically distributed observations is large, which was not possible in our example.
Given these limitations, one might question whether K-means clustering is an improvement over more subjective heuristic selection methods. We argue that it is an improvement. Some limitations of K-means analysis are shared with other selection methods (eg, selection is only as good as the attributes selected for; external temporal trends may confound the effect). In addition, any county-level selection method could mask subpopulations of interest when counties are heterogeneous. However, compared with a subjective heuristic method, the K-means clustering method allows selection to be based on a greater number of features, more efficiently and with greater transparency. That the choice of a comparison has ramifications for the apparent effectiveness of an intervention is certain; K-means clustering is a data-driven option for selecting an appropriate comparison.
Practice Implications
K-means clustering provides a rigorous, data-driven approach to selecting 1 or more counterfactual groups for a community-level intervention, representing an improvement over subjective selection of comparison communities. Although it provided a consistent comparison county solution, it did not result in an optimal comparison for the intervention county in our study. The method could be more useful at other geographic levels or for other counties in or outside Michigan. Nonrandomly assigned community interventions are common in population health; our study can stimulate conversation about how best to select appropriate comparisons. In particular, our study demonstrates that selection of counterfactual communities should be objective, transparent, and examined critically through a health equity lens.
Acknowledgments
The authors thank the Michigan Department of Health and Human Services for providing access to its Health Services Data Warehouse and consultation from the Maternal and Infant Health Division and the Division for Vital Records and Health Statistics.
Footnotes
Declaration of Conflicting Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was supported by the Agency for Healthcare Research and Quality (grant R18-HS020208) and the Spectrum Health–Michigan State University Alliance Corporation. The content is solely the responsibility of the authors and does not necessarily represent the official views of either funding organization.
ORCID iD
Kelly L. Strutz, PhD, MPH https://orcid.org/0000-0002-1502-6161
References
- 1. Oakes JM. The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Soc Sci Med. 2004;58(10):1929-1952. 10.1016/j.socscimed.2003.08.004
- 2. van der Laan MJ, Petersen M, Zheng W. Estimating the effect of a community-based intervention with two communities. J Causal Inference. 2013;1(1):83-106. 10.1515/jci-2012-0011
- 3. Farrell AD, Henry D, Bradshaw C, Reischl T. Designs for evaluating the community-level impact of comprehensive prevention programs: examples from the CDC Centers of Excellence in Youth Violence Prevention. J Prim Prev. 2016;37(2):165-188. 10.1007/s10935-016-0425-8
- 4. Sanson-Fisher RW, Bonevski B, Green LW, D’Este C. Limitations of the randomized controlled trial in evaluating population-based health interventions. Am J Prev Med. 2007;33(2):155-161. 10.1016/j.amepre.2007.04.007
- 5. Cook TD, Shadish WR, Wong VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Pol Anal Manage. 2008;27(4):724-750. 10.1002/pam.20375
- 6. Bellinger C, Mohomed Jabbar MS, Zaïane O, Osornio-Vargas A. A systematic review of data mining and machine learning for air pollution epidemiology. BMC Public Health. 2017;17(1). 10.1186/s12889-017-4914-3
- 7. Yan S, Kwan YH, Tan CS, Thumboo J, Low LL. A systematic review of the clinical application of data-driven population segmentation analysis. BMC Med Res Methodol. 2018;18(1). 10.1186/s12874-018-0584-9
- 8. Wallace M, Sharfstein JM, Kaminsky J, Lessler J. Comparison of US county-level public health performance rankings with county cluster and national rankings. JAMA Netw Open. 2019;2(1). 10.1001/jamanetworkopen.2018.6816
- 9. Centers for Disease Control and Prevention, US Department of Health and Human Services. Peer county methodology used by the Community Health Status Indicators 2015 web application. Accessed April 30, 2020. https://www.countyhealthrankings.org/sites/default/files/media/document/resources/CHSIpeerMethodology.pdf
- 10. US Census Bureau. American Community Survey (ACS). Accessed April 30, 2020. https://www.census.gov/programs-surveys/acs
- 11. Health Resources and Services Administration, US Department of Health and Human Services. Area Health Resources Files. Accessed April 30, 2020. https://data.hrsa.gov/topics/health-workforce/ahrf
- 12. University of Wisconsin Population Health Institute. County Health Rankings & Roadmaps. Accessed April 30, 2020. http://www.countyhealthrankings.org
- 13. Rettenmaier AJ, Wang Z. What determines health: a causal analysis using county level data. Eur J Health Econ. 2013;14(5):821-834. 10.1007/s10198-012-0429-0
- 14. Roman L, Raffo JE, Zhu Q, Meghea CI. A statewide Medicaid enhanced prenatal care program: impact on birth outcomes. JAMA Pediatr. 2014;168(3):220-227. 10.1001/jamapediatrics.2013.4347
- 15. Meghea CI, You Z, Raffo J, Leach RE, Roman LA. Statewide Medicaid enhanced prenatal care programs and infant mortality. Pediatrics. 2015;136(2):334-342. 10.1542/peds.2015-0479
- 16. Meghea CI, Raffo JE, VanderMeulen P, Roman LA. Moving toward evidence-based federal Healthy Start program evaluations: accounting for bias in birth outcomes studies. Am J Public Health. 2014;104(suppl 1):S25-S27. 10.2105/AJPH.2013.301276
- 17. Krieger N. A glossary for social epidemiology. J Epidemiol Community Health. 2001;55(10):693-700. 10.1136/jech.55.10.693
- 18. McLachlan GJ. Cluster analysis and related techniques in medical research. Stat Methods Med Res. 1992;1(1):27-48. 10.1177/096228029200100103
- 19. Makles A. Stata tip 110: how to get the optimal K-means cluster solution. Stata J. 2012;12(2):347-351. 10.1177/1536867X1201200213
- 20. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J Royal Stat Soc Stat Methodol. 2001;63(2):411-423. 10.1111/1467-9868.00293
- 21. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer; 2009.
- 22. Schaffer CM, Green PE. An empirical comparison of variable standardization methods in cluster analysis. Multivariate Behav Res. 1996;31(2):149-167. 10.1207/s15327906mbr3102_1
- 23. Roman LA, Raffo JE, Strutz KL, et al. Impact of a population-based systems approach on evidence-based care for Medicaid-insured pregnant and postpartum women. Preprint. Posted March 26, 2021. medRxiv. 10.1101/2021.03.23.21253829
- 24. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. 10.1126/science.aax2342
- 25. Chouldechova A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data. 2017;5(2):153-163. 10.1089/big.2016.0047
- 26. Chouldechova A, G’Sell M. Fairer and more accurate, but for whom? 2017. Accessed April 30, 2020. https://arxiv.org/abs/1707.00046
- 27. Fowlkes EB, Gnanadesikan R, Kettenring JR. Variable selection in clustering. J Classif. 1988;5(2):205-228. 10.1007/BF01897164
- 28. Brusco MJ, Cradit JD. A variable-selection heuristic for K-means clustering. Psychometrika. 2001;66(2):249-270. 10.1007/BF02294838
- 29. Steinley D. K-means clustering: a half-century synthesis. Br J Math Stat Psychol. 2006;59(pt 1):1-34. 10.1348/000711005X48266
- 30. Storlie CB, Myers SM, Katusic SK, et al. Clustering and variable selection in the presence of mixed variable types and missing data. Stat Med. 2018;37:2884-2899. 10.1002/sim.7697
- 31. Sanson-Fisher RW, D’Este CA, Carey ML, Noble N, Paul CL. Evaluation of systems-oriented public health interventions: alternative research designs. Annu Rev Public Health. 2014;35:9-27. 10.1146/annurev-publhealth-032013-182445


