Abstract
Importance:
Better patient management can reduce emergency department (ED) use. Performance measures should reward plans for reducing utilization by predictably high-use patients, rather than rewarding plans that shun them.
Objective:
To develop a quality measure for ED use for people diagnosed with serious mental illness (SMI) or substance use disorder (SUD), accounting for both medical and social determinants of health (SDH) risks.
Design:
Regression modeling to predict ED use rates using diagnosis-based and SDH-augmented models, to compare accuracy overall and for vulnerable populations.
Setting:
MassHealth, Massachusetts’ Medicaid and Children’s Health Insurance Program
Participants:
MassHealth members ages 18–64, continuously enrolled for calendar year 2016, with a diagnosis of SMI or SUD.
Exposures:
Diagnosis-based model predictors are diagnoses from medical encounters, age, and sex. Additional SDH predictors describe housing problems, behavioral health issues, disability, and neighborhood-level stress.
Main Outcome and Measures:
We predicted ED use rates: 1) using age/sex and distinguishing between single or dual diagnoses; 2) adding summarized medical risk (DxCG); and 3) further adding social risk (SDH).
Results:
Among 144,981 study subjects, 57% were women, 25% dually diagnosed, 67% White/Non-Hispanic, 18% unstably housed, and 37% disabled. Utilization was higher by 77% for those dually diagnosed, 50% for members with housing problems, and 18% for members living in the highest-stress neighborhoods. SDH modeling predicted best for these high-use populations and was most accurate for plans with complex patients.
Conclusions:
To set appropriate benchmarks for comparing health plans, quality measures for ED visits should be adjusted for both medical and social risks.
Introduction
Health care quality measures are used to assess (and reward/penalize) plan and provider performance. However, in addition to care processes, patient characteristics contribute to outcomes. To incentivize better care for all, quality measures should be benchmarked to the expected outcomes of populations served. 1 We illustrate these ideas for measuring emergency department (ED) visits in a behavioral health population: individuals with serious mental illness (SMI) or substance use disorders (SUD).
This work is part of a project by MassHealth, Massachusetts’ Medicaid program, to measure quality among recently launched accountable care organizations (ACOs) participating in its Delivery System Reform Incentive Payment (DSRIP) program, sponsored by the Centers for Medicare and Medicaid Services (CMS). Because ED visits are expensive, disrupt continuity of care, and may be avoidable with improved patient management, lower ED use is viewed as a quality signal. 2,3 This MassHealth ED measure is designed to incentivize ACOs to reduce ED use among members ages 18–64 with behavioral health disorders. Although much ED use is appropriate, better patient management should be able to avert ED visits that result from poor access to, or poor quality of, routine care; thus, ED use measures are commonly included in “pay-for-performance” programs to incentivize systems to reduce ED utilization.
In the US, ED use is associated with chronic medical conditions and multimorbidity, low income, work limitations, public insurance or lack of insurance, and poor mental health. 3,4 Without risk adjustment, providers caring for more complex patients are held to the same targets as those caring for less complex patients; risk adjustment holds providers accountable for the patients they actually serve. Patient heterogeneity is sometimes addressed through stratification, here by examining outcomes separately for those with dual diagnoses (SMI+SUD) and for those with either diagnosis alone. However, member characteristics (including demographics, medical complexity, and social risks such as lack of stable housing) still lead to differences in ED use within strata.
Methods
Population Studied
This measure targets adults ages 18–64 with a diagnosis of SMI and/or SUD. We used 2015 and 2016 data to identify these diagnoses, but used only 2016 data for model building (on MassHealth members eligible for managed care and enrolled in 2016 with at most one coverage gap of no more than 45 days).
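The enrollment rule (at most one coverage gap in 2016, of no more than 45 days) can be checked from a member's enrollment spans. The sketch below is a minimal illustration, not MassHealth's code: it assumes spans are stored as inclusive (start, end) date pairs, and it assumes that uncovered days at the start or end of the year count as a gap. The function name `meets_enrollment_criterion` is hypothetical.

```python
from datetime import date, timedelta

YEAR_START = date(2016, 1, 1)
YEAR_END = date(2016, 12, 31)


def meets_enrollment_criterion(spans, max_gaps=1, max_gap_days=45):
    """Return True if 2016 coverage has at most `max_gaps` gaps, each no
    longer than `max_gap_days` days.

    `spans` is a list of (start, end) dates, both inclusive (assumed layout).
    Assumption: uncovered days at the start or end of the year are gaps too.
    """
    # Clip spans to calendar year 2016 and sort them by start date.
    clipped = sorted(
        (max(s, YEAR_START), min(e, YEAR_END))
        for s, e in spans
        if e >= YEAR_START and s <= YEAR_END
    )
    if not clipped:
        return False
    gaps = []
    covered_through = YEAR_START - timedelta(days=1)  # last covered day so far
    for start, end in clipped:
        gap_days = (start - covered_through).days - 1  # uncovered days before span
        if gap_days > 0:
            gaps.append(gap_days)
        covered_through = max(covered_through, end)  # overlapping spans are fine
    trailing = (YEAR_END - covered_through).days
    if trailing > 0:
        gaps.append(trailing)
    return len(gaps) <= max_gaps and all(g <= max_gap_days for g in gaps)
```

For example, coverage from January 1 to March 31 and again from May 1 to December 31 leaves a single 30-day gap (April) and qualifies; a 61-day gap, or two short gaps, would not.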
In preparation for MassHealth’s 2018 ACO launch, we used the most recent available data to explore how risk adjustment would affect simulated (future) health plans. We attributed 2016 members to ACOs using the same algorithm MassHealth used in 2018, that is, to the ACO with which a member’s primary care doctor later affiliated.
Outcomes
We identified ED visits in claims or encounter data on distinct service start dates, using lists of procedure, revenue, and place-of-service codes that identify an emergency department visit. These lists were operationalized as the Emergency Department Value Set, or the Emergency Department Procedure Code Value Set combined with the Emergency Department Place of Service Value Set, 4 which are owned by the National Committee for Quality Assurance (NCQA) and included in the measure with the permission of NCQA. Following common practice and MassHealth’s protocol, we counted only ED visits that did not lead to a hospital admission, and we treated multiple same-day claims as a single visit.
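The two counting rules (exclude ED stays that led to an admission; collapse same-day claims into one visit) can be sketched as below. This is a minimal illustration under stated assumptions: the record layout and the flags `is_ed` and `led_to_admission` are hypothetical, since the NCQA value-set logic that would set `is_ed` is not reproduced here, and a day is treated as an admission if any of its ED claims led to one.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    member_id: str
    service_date: date       # service start date
    is_ed: bool              # hypothetical flag from the NCQA ED value-set logic
    led_to_admission: bool   # hypothetical flag: the ED stay ended in an inpatient admission


def count_ed_visits(claims):
    """Count ED visits per member from claim lines.

    A visit is a distinct (member, service start date) with at least one ED
    claim; days on which any ED claim led to an admission are excluded.
    """
    ed_days = {(c.member_id, c.service_date) for c in claims if c.is_ed}
    admitted_days = {(c.member_id, c.service_date)
                     for c in claims if c.is_ed and c.led_to_admission}
    # Set difference drops admission days; counting keys collapses same-day claims.
    return Counter(member for member, _ in ed_days - admitted_days)
```

Two ED claims for the same member on the same date therefore yield one visit, and an ED stay ending in admission yields none.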
Model Variables
This work extends our experience with the MassHealth social determinants of health (SDH) payment model 5 adopted by Massachusetts in October 2016 and currently used to allocate payments to MassHealth ACOs and managed care organizations (MCOs).
The DxCG v4.2 concurrent Medicaid relative risk score (RRS) 6 relies on age, sex, and diagnoses recorded from clinician encounters (e.g., ambulatory care visits and hospitalizations); it strongly predicts health care costs and use. 7,8 The software produces scores summarizing the expected burden of medical morbidities on health care spending; these scores are multiplied by a constant so that their mean is 1 among MassHealth’s managed-care-eligible members. MassHealth uses DxCG models in program management;9 these models employ a comprehensive 394-condition-category classification but otherwise closely resemble the Department of Health and Human Services’ Hierarchical Condition Category (HHS-HCC) models. 6,10
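The normalization is a single rescaling: every raw score is multiplied by one constant, the reciprocal of the mean raw score in the reference population (managed-care-eligible members). A subgroup can then have a mean normalized score well above 1, as this study population does (2.7, per the table notes). A minimal sketch, assuming an unweighted mean (the text does not say whether the mean is weighted by enrollment time); `normalize_rrs` is a hypothetical helper.

```python
from statistics import fmean


def normalize_rrs(raw_scores, reference_scores):
    """Rescale raw risk scores by one constant so that the reference
    population's (unweighted) mean normalized score equals 1."""
    scale = 1.0 / fmean(reference_scores)
    return [score * scale for score in raw_scores]
```

Because the constant is shared, normalization preserves ratios between members; it only fixes the scale.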
We identified additional variables from Medicaid claims and enrollment files: 20 age/sex categories; disability (disability-based entitlement, or qualifying as a client for specialized mental health or intellectual disability services); and housing problems (unstable housing, meaning ≥3 addresses within the year, or homelessness identified by ICD code).
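The housing-problems flag combines two conditions with a logical OR. The sketch below is illustrative only: the text does not list the homelessness codes or describe address de-duplication, so it assumes ICD-10-CM codes beginning with Z59.0 ("Homelessness") and a naive case- and whitespace-insensitive comparison of address strings. The names `has_housing_problems` and `HOMELESS_ICD10_PREFIX` are hypothetical.

```python
# Assumption: the paper does not list its homelessness codes; ICD-10-CM Z59.0
# ("Homelessness") and any later subcodes are used here for illustration.
HOMELESS_ICD10_PREFIX = "Z590"


def normalize_code(code):
    """Strip the dot and upper-case an ICD-10 code, e.g. 'z59.0' -> 'Z590'."""
    return code.replace(".", "").upper()


def has_housing_problems(addresses_2016, diagnosis_codes_2016):
    """Housing problems = unstable housing (>= 3 distinct addresses in the
    year) or any diagnosis code indicating homelessness."""
    # Naive de-duplication; real address matching would need standardization.
    unstable = len({a.strip().lower() for a in addresses_2016}) >= 3
    homeless = any(normalize_code(c).startswith(HOMELESS_ICD10_PREFIX)
                   for c in diagnosis_codes_2016)
    return unstable or homeless
```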
The final SDH model component is the neighborhood stress score (NSS), calculated from seven census-block-group-level variables indicating economic stress. 11 We identified census block groups by geocoding members’ most recent recorded addresses. 12 The model uses a truncated score, NSS+, which equals NSS when NSS is positive and 0 otherwise; NSS+ was also set to 0 for the 0.2% of members who could not be geocoded. Negative NSS values were recoded for policy reasons: to include only SDH variables that were positively associated with ED use.
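The truncation can be written as a small piecewise function. A minimal sketch; representing an address that could not be geocoded as `None` is an assumption made for illustration, and `nss_plus` is a hypothetical name.

```python
from typing import Optional


def nss_plus(nss: Optional[float]) -> float:
    """Truncated neighborhood stress score used as a model predictor.

    Returns NSS when it is positive and 0 otherwise; members whose address
    could not be geocoded (nss is None, an assumed encoding) also get 0.
    """
    if nss is None:
        return 0.0
    return max(nss, 0.0)
```

Given a positive model coefficient, a predictor that is never negative can only raise expected ED use, never lower it, which is the policy intent of the recoding.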
Modeling Approach
Similar to modeling for the federal Marketplace,10 we wanted predictors that were statistically significant, reliably measured for nearly all members, and consistent with feasibility, fairness, and transparency considerations. We predicted 2016 use from 2016 patient diagnoses and other characteristics; precise model specifications are in the supplement.
Analyses used Stata 15.1. We modeled ED use rates (number of visits per time enrolled) with Poisson regression (a generalized linear model with a Poisson distribution and log link, with the log of time enrolled as the offset) and reported outcomes as visits per 100 person-years. We compared three models:
BASIC: Uses age, sex, and markers for SMI alone, SUD alone, and dual diagnosis.
DxCG: Adds DxCG relative risk score (RRS) summarizing medical complexity.
SDH: Further adds neighborhood stress score and individual-level factors relating to disability status and housing problems.
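The model form described above (Poisson distribution, log link, log of enrollment time as offset) can be illustrated with a small pure-Python fit. This is a sketch by Newton-Raphson, not the authors' Stata code; the helper names `fit_poisson_offset`, `expected_visits`, and `solve_linear` are hypothetical, and the design matrix is assumed to start with an intercept column.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def expected_visits(beta, X, years):
    """Model-expected visits: years * exp(x . beta), i.e. log(years) is the offset."""
    return [t * math.exp(sum(b * x for b, x in zip(beta, row)))
            for row, t in zip(X, years)]


def fit_poisson_offset(X, visits, years, max_iter=50, tol=1e-10):
    """Fit a Poisson GLM (log link, offset log(years)) by Newton-Raphson.

    Each row of X must start with 1.0 for the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(visits) / sum(years))  # start at the overall log rate
    for _ in range(max_iter):
        mu = expected_visits(beta, X, years)
        # Score X'(y - mu) and Fisher information X' diag(mu) X.
        score = [sum(row[j] * (y - m) for row, y, m in zip(X, visits, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * m for row, m in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

At convergence the score equations force expected visits to sum to observed visits overall and within every group defined by an indicator in the model. That property is why all three models reproduce the observed rates exactly for the SMI-only, SUD-only, and dual-diagnosis groups, and why the SDH model does so for its disability and housing-problems groups. Multiplying `math.exp(beta[0])` by 100 gives the reference group's rate per 100 person-years.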
For each model, we used the percent of variation in the outcome explained (R²) to assess overall fit and the observed-to-expected ratio (O:E) to assess fit within subgroups. We calculated pseudo-R² values using 10-fold cross-validation. O:E ratios were calculated by dividing actual (observed) ED use by model-predicted (expected) use: when O:E exceeds 1.0, the model under-predicts actual use; when it is less than 1.0, the model over-predicts. Ratios near 1.0 represent good fit. To approximate 95% confidence intervals around O:E ratios, we divided the lower and upper 95% bounds of the observed ED visit counts by the expected values, which we treated as fixed (without sampling variation). Because the sample size is large and standard errors are small, these intervals were very narrow and are shown only in the supplemental tables.
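The O:E interval approach can be sketched as follows. The text does not say how the bounds of the observed count were obtained; this sketch assumes a normal approximation to a Poisson count (O ± z·√O), and `oe_ratio_with_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def oe_ratio_with_ci(observed, expected, level=0.95):
    """O:E ratio with an approximate CI: bounds of the observed count divided
    by the expected count, which is treated as fixed.

    Assumption: normal approximation to a Poisson count, O +/- z * sqrt(O).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sqrt(observed)
    lower = max(observed - half_width, 0.0)
    upper = observed + half_width
    return observed / expected, lower / expected, upper / expected
```

With 400 observed and 400 expected visits the interval is roughly 0.90 to 1.10; with 10,000 observed visits the half-width shrinks to about 2% of the count, which illustrates why intervals for large subgroups are very narrow.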
Results
We studied 144,981 CY2016 MassHealth members ages 18–64 meeting criteria for SMI and/or SUD (Table 1). About 37% were MassHealth-eligible due to disability; 18% had housing problems. Dual diagnoses (SMI+SUD) were present for 25%, 56% had SMI only, and 18% had SUD only. Overall use was 177 ED visits per 100 person-years; 44% had no ED visits in 2016, and 10% had 5 or more (see online Supplemental Table 1 for the full distribution). As illustrated in Figure 1, much of this variation is predictable: groups identified by disability, housing problems, and stressful neighborhoods had higher than average rates of ED use.
Table 1:
Emergency Department Visits in 2016 by Patient Characteristics: MassHealth Members with Serious Mental Illness (SMI) or Substance Use Disorder (SUD)
| Characteristic | N | % | ED Visit Rate* | SD |
|---|---|---|---|---|
| Total | 144,981 | 100% | 177.1 | 133.1 |
| Female | 82,760 | 57.1% | 171.1 | 130.8 |
| SMI only | 81,817 | 56.4% | 127.8 | 113.1 |
| SUD only | 26,792 | 18.5% | 141.7 | 119.0 |
| SMI+SUD | 36,372 | 25.1% | 314.2 | 177.2 |
| White/Non-Hispanic | 97,471 | 67.2% | 180.5 | 134.4 |
| Black/Non-Hispanic | 6,343 | 4.4% | 208.3 | 144.3 |
| Hispanic | 4,055 | 2.8% | 160.5 | 126.7 |
| Other Non-Hispanic | 1,422 | 1.0% | 91.6 | 95.7 |
| Missing/unknown | 35,690 | 24.6% | 167.6 | 129.5 |
| Housing problems | 26,438 | 18.2% | 266.1 | 163.1 |
| Unstably housed (3+ addresses) | 24,927 | 17.2% | 226.0 | 150.3 |
| Homelessness by ICD Code | 3,774 | 2.6% | 877.2 | 296.2 |
| Client of DMH | 4,572 | 3.2% | 301.9 | 173.8 |
| Client of DDS (not DMH) | 3,085 | 2.1% | 179.8 | 134.1 |
| All other disabled | 46,629 | 32.2% | 204.2 | 142.9 |
| Not disabled | 90,695 | 62.6% | 156.7 | 125.2 |
| Highest-stress (NSS) quintile | 31,711 | 21.9% | 202.8 | 142.4 |
Notes: Study population is individuals ages 18–64 (average age = 40.7±12.4 years) with serious mental illness or substance use disorder enrolled in MassHealth in 2016, with at most one gap of no more than 45 days (144,981 members were observed for 144,291 person-years). Disability status indicates MassHealth eligibility as a client of the Department of Mental Health (DMH), the Department of Developmental Services (DDS) if not DMH, or entitlement to Medicaid due to other disability (“All other disabled”) if neither DMH nor DDS. RRS is the DxCG v4.2 concurrent model 312 risk score, normalized to have mean = 1 in the MassHealth population (that is, anyone who is managed care eligible and enrolled for at least 1 day in 2016); the mean DxCG score in the analyzed subgroup here is 2.7±3.2. Housing problems = 3 or more addresses or coded as homeless during 2016. NSS = Neighborhood Stress Score, normalized to the MassHealth population.
*ED visit rate is per 100 person-years.
Figure 1:
Rate of Emergency Department (ED) Use in 2016 for Select Subgroups: MassHealth Members with Serious Mental Illness (SMI) or Substance Use Disorder (SUD)
Box plots for subsets of the total sample (N = 144,249 person-years). Boxes mark the median and interquartile range (IQR); whiskers extend to the most extreme values within 1.5 IQR of the box; values beyond the whiskers are jittered to show how many members had that many ED visits. *All values were used in calculations, but the y-axis is truncated at 15 for readability; from left to right, 6.5%, 8.2%, 3.0%, 1.8%, 0.5%, and 0% of subset values were greater than 15. The upper whisker cap for the Homeless subset is 24; the most ED visits for any individual is 178; distribution details are in Supplemental Table S1. The smallest subset (Homeless) has n = 3,746; the largest (SMI without SUD) has n = 81,434. Box width is proportional to the square root of n. Group means are marked with a circle and labeled with their value; the horizontal line marks the full study population mean (1.77 visits per person-year). Lowest and highest predicted deciles are determined using the SDH model.
R² values for the BASIC, DxCG, and SDH models were 6.4%, 23.2%, and 23.8%, respectively; the corresponding 10-fold cross-validated R² values were 5.1%, 20.5%, and 21.7%. We also compared O:E ratios among models and subgroups (Table 2). Unadjusted O:E ratios, in the “Observed” columns, were calculated by dividing actual ED visit rates by the population mean of 177 per 100 person-years.
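The gap between in-sample and cross-validated R² reflects how much each model overfits. The sketch below shows the k-fold mechanics under loudly stated assumptions, since the text does not define R² for rate outcomes precisely: it computes R² of annualized visit rates weighted by enrollment time, and it uses a group-rate lookup as the "model", because a Poisson GLM with log link and a single categorical predictor reproduces each group's observed rate. `cv_r2` is a hypothetical helper.

```python
import random
from collections import defaultdict


def cv_r2(groups, visits, years, k=10, seed=0):
    """Out-of-sample R^2 of annualized visit rates under k-fold cross-validation.

    Assumptions (not from the paper): each held-out member is predicted by the
    training folds' visit rate for the member's group, and R^2 is weighted by
    enrollment years.
    """
    n = len(visits)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_of = {i: pos % k for pos, i in enumerate(idx)}
    pred = [0.0] * n
    for f in range(k):
        v_sum, t_sum = defaultdict(float), defaultdict(float)
        for i in range(n):
            if fold_of[i] != f:  # fit on the other k - 1 folds
                v_sum[groups[i]] += visits[i]
                t_sum[groups[i]] += years[i]
        overall = sum(v_sum.values()) / sum(t_sum.values())
        for i in range(n):
            if fold_of[i] == f:  # predict the held-out fold
                g = groups[i]
                pred[i] = v_sum[g] / t_sum[g] if t_sum[g] > 0 else overall
    rate = [v / t for v, t in zip(visits, years)]
    mean_rate = sum(visits) / sum(years)  # enrollment-weighted mean rate
    sse = sum(t * (r - p) ** 2 for r, p, t in zip(rate, pred, years))
    sst = sum(t * (r - mean_rate) ** 2 for r, t in zip(rate, years))
    return 1.0 - sse / sst
```

A predictor that carries no information yields a cross-validated R² slightly below zero, because held-out predictions can never beat the full-sample mean.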
Table 2.
Model Performance in 2016 by Patient Characteristics: MassHealth Members with Serious Mental Illness (SMI) or Substance Use Disorder (SUD)
| Characteristic | N | % | Observed Raw Rate (No Adjustment) | Observed O:E Ratio (No Adjustment) | BASIC Expected Rate | BASIC O:E Ratio | DxCG Expected Rate | DxCG O:E Ratio | SDH Expected Rate | SDH O:E Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 144,981 | 100% | 177.1 | 1.00 | 177.1 | 1.00 | 177.1 | 1.00 | 177.1 | 1.00 |
| SMI only | 81,817 | 56% | 127.8 | 0.72 | 127.8 | 1.00 | 127.8 | 1.00 | 127.8 | 1.00 |
| SUD only | 26,792 | 18% | 141.7 | 0.80 | 141.7 | 1.00 | 141.7 | 1.00 | 141.7 | 1.00 |
| SMI+SUD | 36,372 | 25% | 314.2 | 1.77 | 314.2 | 1.00 | 314.2 | 1.00 | 314.2 | 1.00 |
| Client of DMH | 4,572 | 3% | 301.9 | 1.70 | 188.0 | 1.61 | 244.8 | 1.23 | 301.9 | 1.00 |
| Client of DDS (not DMH) | 3,085 | 2% | 179.8 | 1.02 | 146.5 | 1.23 | 196.2 | 0.92 | 179.8 | 1.00 |
| All other disabled | 46,629 | 32% | 204.2 | 1.15 | 176.1 | 1.16 | 204.8 | 1.00 | 204.2 | 1.00 |
| Not disabled | 90,695 | 63% | 156.7 | 0.88 | 178.1 | 0.88 | 158.8 | 0.99 | 156.7 | 1.00 |
| Highest-stress (NSS) quintile | 31,711 | 22% | 202.8 | 1.14 | 173.6 | 1.17 | 179.3 | 1.13 | 204.9 | 0.99 |
| Housing problems | 26,438 | 18% | 266.1 | 1.50 | 198.7 | 1.34 | 225.4 | 1.18 | 266.1 | 1.00 |
| Unstably housed (3+ addresses) | 24,927 | 17% | 226.0 | 1.28 | 194.7 | 1.16 | 212.0 | 1.07 | 250.1 | 0.90 |
| Homelessness by ICD Code | 3,774 | 3% | 877.2 | 4.95 | 258.7 | 3.39 | 422.8 | 2.07 | 441.2 | 1.99 |
| White/Non-Hispanic | 97,471 | 67% | 180.5 | 1.02 | 183.0 | 0.99 | 184.0 | 0.98 | 181.4 | 1.00 |
| Black/Non-Hispanic | 6,343 | 4% | 208.3 | 1.18 | 169.8 | 1.23 | 184.6 | 1.13 | 193.7 | 1.07 |
| Hispanic | 4,055 | 3% | 160.5 | 0.91 | 157.7 | 1.02 | 154.3 | 1.04 | 162.1 | 0.99 |
| Other Non-Hispanic | 1,422 | 1% | 91.6 | 0.52 | 148.7 | 0.62 | 121.4 | 0.75 | 121.0 | 0.76 |
| Missing/unknown | 35,690 | 25% | 167.6 | 0.95 | 165.7 | 1.01 | 161.8 | 1.04 | 166.5 | 1.01 |
| Top 10% of SDH Model predicted | 14,499 | 10% | 591.1 | 3.34 | 263.5 | 2.24 | 567.5 | 1.04 | 584.3 | 1.01 |
| Bottom 10% of SDH Model predicted | 14,498 | 10% | 28.5 | 0.16 | 135.0 | 0.21 | 30.3 | 0.94 | 29.6 | 0.96 |
Notes: Study population is individuals ages 18–64 (average age = 40.7±12.4 years) with serious mental illness or substance use disorder enrolled in MassHealth in 2016, with at most one gap of no more than 45 days (144,981 members were observed for 144,291 person-years). Disability status indicates MassHealth eligibility as a client of the Department of Mental Health (DMH), the Department of Developmental Services (DDS) if not DMH, or entitlement to Medicaid due to other disability (“All other disabled”) if neither DMH nor DDS. RRS is the DxCG v4.2 concurrent model 312 risk score, normalized to have mean = 1 in the MassHealth population (that is, anyone who is managed care eligible and enrolled for at least 1 day in 2016); the mean DxCG score in the analyzed subgroup here is 2.7±3.2. Housing problems = 3 or more addresses or coded as homeless during 2016. NSS = Neighborhood Stress Score, normalized to the MassHealth population. Rates are ED visits per 100 person-years. Observed O:E ratio = raw rate divided by the study population average (177.1±133.1). All models use indicators for age/sex categories and for SMI and SUD; the DxCG model adds the DxCG RRS to the BASIC model, and the SDH model then adds SDH variables to the DxCG model (see online Supplemental Tables S2–S5 for full models). O:E ratios that deviate from 1 by ±10% or more are in bold.
Risk-adjusted expectations differ markedly from what a non-risk-adjusted measure “expects.” For example, the observed O:E ratio for dual-diagnosis members is 1.77, meaning that this group has 77% more ED visits than average (314 versus 177 ED visits per 100 person-years). Because all three models include indicators for SMI alone, SUD alone, and dual diagnosis, each predicts perfectly for these three groups; however, the BASIC model underpredicts for more complex patients. The DxCG model fits almost all subgroups better (many O:E ratios are now close to 1.0) but fails to capture the higher risk for members with disabilities (notably DMH clients), members with housing problems, and members living in high-stress neighborhoods. The SDH model recognizes these additional factors and predicts their excess risk much better. We also modeled an interaction between housing problems and DxCG scores, because we did not expect healthy people to have more ED visits simply because they had housing problems (Supplemental Table 2). Furthermore, illness burden was greater in people with coded homelessness than in those with unstable housing only.
Some models are more useful than others for identifying high-risk members. We explored this by using model-predicted “expected” values to sort members into deciles of risk and then examining actual ED use for members in the lowest and highest deciles. More discriminating models produce greater differences in actual ED visits between these deciles. Table 2 shows that the DxCG model greatly reduces the errors in estimating average utilization for these highest- and lowest-risk groups, and that the SDH model predicts them almost perfectly, underestimating by just 1% for the top decile and overestimating by 4% for those at lowest risk.
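The decile comparison can be sketched as: rank members by predicted rate, cut the ranking into ten equal-size groups, and compare observed with expected use in the extreme groups. A minimal illustration; `extreme_deciles` is a hypothetical helper and the equal-count split by rank is an assumption about how deciles were formed.

```python
def extreme_deciles(expected, observed, years):
    """Rank members by predicted rate, cut the ranking into 10 equal-size
    groups, and summarize the lowest and highest deciles.

    Rates are per 100 person-years; O:E = observed / expected visits.
    """
    n = len(expected)
    order = sorted(range(n), key=lambda i: expected[i] / years[i])
    summary = {}
    for label, decile in (("bottom", 0), ("top", 9)):
        members = [i for rank, i in enumerate(order) if rank * 10 // n == decile]
        obs = sum(observed[i] for i in members)
        exp_sum = sum(expected[i] for i in members)
        person_years = sum(years[i] for i in members)
        summary[label] = {
            "observed_rate": 100 * obs / person_years,
            "expected_rate": 100 * exp_sum / person_years,
            "oe": obs / exp_sum,
        }
    return summary
```

Dividing a decile's observed rate by the population mean gives the unadjusted O:E shown for the deciles in Table 2, for example 591.1 / 177.1 ≈ 3.34 for the top decile.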
To explore the potential impact of risk adjustment on populations served by participating ACOs, we applied our models to members whose primary care relationships would later result in default attribution to ACOs. Table 3 shows summary patient characteristics and outcomes for six such “attributed” ACOs, whose observed ED visit rates range from 113 to 217 per 100 person-years (that is, from 0.64 to 1.22 times the population average rate of 177). For some plans, risk adjustment has little impact (here, ACO-B and ACO-E). However, ACO-A has above-average risk; the SDH model interprets ACO-A’s observed ED visit rate of 217 per 100 person-years (22% higher than average) as only 11% greater than its SDH-model expected rate of 195. In contrast, ACO-C, with lower average risk, has an observed ED visit rate just 3% higher than average, but 14% higher than its SDH-model expected rate. SDH risks are high for ACO-D, with 21% of members having housing problems and 63% residing in the most stressed neighborhood quintile. The DxCG model sees ACO-D’s performance as 4% worse than expected based on the characteristics of its members, while the SDH model sees it as 5% better.
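Plan-level results such as those in Table 3 come from summing member-level observed and model-expected visits within each attributed ACO. A minimal sketch, assuming member-level predictions are already available; `plan_oe` and its inputs are hypothetical names, not MassHealth code.

```python
from collections import defaultdict


def plan_oe(plan_ids, observed, expected, years):
    """Aggregate member-level observed and model-expected ED visits by plan.

    Returns {plan: (observed rate, expected rate, O:E ratio)}, with rates
    expressed per 100 person-years.
    """
    obs = defaultdict(float)
    exp_sum = defaultdict(float)
    person_years = defaultdict(float)
    for plan, o, e, t in zip(plan_ids, observed, expected, years):
        obs[plan] += o
        exp_sum[plan] += e
        person_years[plan] += t
    return {
        plan: (100 * obs[plan] / person_years[plan],
               100 * exp_sum[plan] / person_years[plan],
               obs[plan] / exp_sum[plan])
        for plan in obs
    }
```

Because observed and expected rates share the same person-year denominator, a plan's O:E ratio is simply its observed rate divided by its expected rate; for ACO-C under the SDH model, 182 / 160 ≈ 1.14.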
Table 3.
ED Visit Rate Measures by Attributed ACO and Model: MassHealth Members with Serious Mental Illness (SMI) or Substance Use Disorder (SUD)
| ACO | Size | Both SMI and SUD (%) | DxCG RRS Mean | DxCG RRS SD | Housing Problems (%) | Highest-Stress NSS Quintile (%) | Observed Raw Rate (No Adjustment) | Observed O:E Ratio (No Adjustment) | BASIC* Expected Rate | BASIC* O:E Ratio | DxCG* Expected Rate | DxCG* O:E Ratio | SDH* Expected Rate | SDH* O:E Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 144,981 | 25.1% | 2.67 | 3.21 | 18.2% | 21.9% | 177.1 | 1.00 | 177.1 | 1.00 | 177.1 | 1.00 | 177.1 | 1.00 |
| A | Large | 30.1% | 2.89 | 3.55 | 22.1% | 30.3% | 217 | 1.22 | 186 | 1.16 | 190 | 1.14 | 195 | 1.11 |
| B | Large | 27.1% | 2.58 | 3.04 | 20.5% | 31.6% | 190 | 1.07 | 181 | 1.05 | 176 | 1.08 | 181 | 1.05 |
| C | Small | 22.9% | 2.33 | 2.85 | 17.2% | 19.3% | 182 | 1.03 | 175 | 1.04 | 161 | 1.13 | 160 | 1.14 |
| D | Medium | 21.3% | 2.70 | 3.17 | 20.7% | 63.1% | 176 | 1.00 | 168 | 1.05 | 169 | 1.04 | 185 | 0.95 |
| E | Medium | 18.5% | 2.69 | 3.01 | 15.9% | 9.1% | 145 | 0.82 | 167 | 0.87 | 176 | 0.82 | 170 | 0.85 |
| F | Small | 21.3% | 2.59 | 2.89 | 13.0% | 6.5% | 113 | 0.64 | 171 | 0.66 | 173 | 0.66 | 164 | 0.69 |
Notes: Study population contains individuals ages 18–64 (average age = 40.7±12.4 years) with serious mental illness or substance use disorder enrolled in MassHealth in 2016, with at most one gap of no more than 45 days (144,981 members were observed for 144,291 person-years). RRS is the DxCG v4.2 concurrent model 312 risk score, normalized to have mean = 1 in the MassHealth population (that is, anyone who is managed care eligible and enrolled for at least 1 day in 2016); the mean DxCG score in the analyzed subgroup here is 2.7±3.2. Housing problems = 3 or more addresses or ICD-10 coded as homeless during 2016. NSS = Neighborhood Stress Score, a measure of socioeconomic distress, has mean = 0 and SD = 1 in the full MassHealth population. Rates are ED visits per 100 person-years. All models use indicators for age/sex categories and for SMI and SUD; the DxCG model adds the DxCG RRS to the BASIC model, and the SDH model then adds SDH variables to the DxCG model. Observed O:E ratio = raw rate divided by the study population average (177.1±133.1). O:E ratios that deviate from 1 by ±10% or more are in bold. ACOs are listed in descending order of observed (raw) ED visit rates. Small ACOs have 2,500 to 5,000 members, medium ACOs 5,000 to 10,000, and large ACOs more than 10,000.
Discussion
We studied a population of MassHealth adult members with SMI and/or SUD who visited the ED nearly twice per year on average (177 ED visits per 100 person-years) in 2016. However, ED visit rates within our study population varied widely: 44% of members had no ED visits, while 10% had 5 or more. Further, much of this variation is predictable: the rate for homeless members was roughly five times the average, the rate for dual-diagnosis members was almost twice the average, and the rate for members with SMI but not SUD was more than 25% below average.
An unadjusted quality measure ignores differences in the inherent risk of the populations that plans manage, effectively interpreting all deviations from average as quality signals. Although some worry that risk adjustment gives plans a “free pass” for poor performance with vulnerable patients, 13 we adjusted for patient-level factors that objectively drive ED use. Risk-adjusted benchmarks reward a plan for achieving better outcomes than other plans serving similar patients, which is conceptually a fairer standard. We posit that it is unadjusted measures that “set the bar” in the wrong place, by failing to hold plans with healthier members accountable for the better-than-average outcomes we can reasonably expect.
We assessed three risk models of increasing complexity: 1) a BASIC model that mainly accounts for differences due to SMI alone versus SUD alone versus dual diagnosis; 2) a model adding the DxCG medical risk score; and 3) a model further adding several non-medical variables, including available measures of (and proxies for) social risk. The BASIC model explains only 5% of individual-level variance in visit rates (cross-validated), and this increases about 4-fold with the DxCG and SDH models. While the SDH model improves the cross-validated R² only modestly over the DxCG model (from 20.5% to 21.7%), it makes meaningfully better predictions in subgroups of interest. This final model is powerful: members in the highest decile of SDH-predicted risk have 3.3 times as many ED visits as average, and those in the lowest decile only 1/6th of average.
The variable “housing problems” (18.2% of members) deserves discussion. It includes both the 17.2% of members who are unstably housed (three or more addresses in a year) and the 2.6% with “homelessness” coded at least once during the year; these two groups overlap. Their ED use is very different: 226 versus 877 visits per 100 person-years. In 2016 there was no incentive to code homelessness, making its appearance more likely for members whose lack of housing seemed important, for example, when looking for a safe discharge venue following a hospitalization or ED visit. Because model predictors should be measured reliably for nearly all members, we chose not to use “homelessness” as a separate predictor, instead merging it with unstable housing. We will examine ICD-10-CM coding of social risk factors such as homelessness in newer data, as well as other information sources, to better capture social risk distinctions in future models.
Even if we have set the right (risk-adjusted) quality benchmark, it is unclear whether taking money away from plans with “too many” ED visits will improve equity and drive value. Proper benchmarking enables us to interpret deviations from average as due to issues that could potentially be addressed. For example, in this study, one plan, ACO-C, moved from having an unremarkable O:E ratio of 1.03 to becoming a high outlier, at 1.14, after risk adjustment. What does that mean?
Most of the change is due to ACO-C’s low measured morbidity (its mean DxCG score is 13% below average). Perhaps their members are not very sick, and the plan should be managing them better. But maybe their providers are not coding medical problems as aggressively as others, so their members only look relatively healthy; this could be addressed by helping everyone to engage in legitimate upcoding. Another possibility is that the region in which this plan operates has few alternative venues to care for people with behavioral issues. If so, it makes more sense to invest in community capacity-building than to take money away from an under-resourced area.
Our “adjusters” do not capture all important determinants of ED use, and missing or mis-specified variables can skew findings. Race and ethnicity, in particular, were missing for 25% of the study population and therefore were not considered as predictors. However, knowing that models can reinforce existing inequities, we examined model performance within available race/ethnicity strata. 14 Predicted values were within 10% of observed values for all groups except the 1% classified as Other Non-Hispanic, whose ED use was about 25% lower than predicted. While it would be good to understand the reasons for this discrepancy, overprediction in a small group does little harm.
Models similar to ours should be easy to implement in other settings because we relied on broadly available information, e.g., member addresses used to identify unstable housing and neighborhood stress scores (after merging with U.S. Census variables), homelessness from ICD-10 diagnostic codes, and program-specific variables that should have analogues in other settings (e.g., “client of the Department of Mental Health”). Ideally, EHRs will eventually capture rich, standardized, person-level SDH data; the primary value of such data would be in addressing those challenges to improve patient health, with their availability for risk adjustment a secondary benefit.
We used U.S. Census data and a DxCG medical risk score because these were available to us. However, address-based measures of socioeconomic stress are widely available in many countries, as are various software algorithms for summarizing medical risk from ICD-10 diagnoses. 15 While the SDH variables in our model increased its overall explanatory power (R²) only modestly over a model that already included medical morbidity, they were crucial for appropriately accounting for excess risk in socially vulnerable subgroups.
Knowing that relationships in the data may change, MassHealth typically re-examines model performance in more recent data every few years, making modest adjustments as needed. How frequently models should be re-examined and/or changed is a potentially fruitful area for future research. We speculate that the predictive effects of time-dependent model variables, such as participant age and housing instability, are likely to be stable over time. However, the overall configuration of models should change as the availability and validity of variables, as well as important unmeasured factors, change over time. The time horizon to consider when building models is also an area for future investigation.
Conclusion
Understanding why plans differ on an outcome is central to our ability to manage care. A quality measure for health plans that care for Medicaid behavioral health populations should use models that account for differences in ED visit use associated with differences in patients’ medical and social risks.
Supplementary Material
Footnotes
Disclosures: The authors whose names are listed above certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
References
- 1. Joynt Maddox KE, Reidhead M, Hu J, et al. Adjusting for social risk factors impacts performance and penalties in the hospital readmissions reduction program. Health Serv Res. 2019;54:327–336. doi: 10.1111/1475-6773.13133
- 2. Rosenblatt RA, Wright GE, Baldwin LM, et al. The effect of the doctor-patient relationship on emergency department use among the elderly. Am J Public Health. 2000 Jan;90(1):97–102.
- 3. Dowd B, Karmarker M, Swenson T, et al. Emergency department utilization as a measure of physician performance. Am J Med Qual. 2014;29(2):135–143. doi: 10.1177/1062860613487196
- 4. http://store.ncqa.org/index.php/performance-measurement.html#qrs. Accessed 5/15/19. The ED, ED Procedure Code, and ED POS value sets contained in the SMI/SUD ED use measure are copyrighted and owned by the National Committee for Quality Assurance (NCQA), and are included in the measure with the permission of NCQA. NCQA disclaims all liability for the use and interpretation of the value sets or any statements made in this publication.
- 5. Ash AS, Mick EO, Ellis RP, Kiefe CI, Allison JJ, Clark MA. JAMA Intern Med. 2017 Oct 1;177(10):1424–1430. doi: 10.1001/jamainternmed.2017.3317
- 6. Chen J, Ellis RP, Toro KH, Ash AS. Mispricing in the Medicare Advantage Risk Adjustment Model. Inquiry. 2015 Jan–Dec;52:0046958015583089. First published May 1, 2015. doi: 10.1177/0046958015583089
- 7. Zhao Y, Ash AS, Ellis RP, et al. Predicting Pharmacy Costs and Other Medical Costs Using Diagnoses and Drug Claims. Medical Care. 2005;43:34–43.
- 8. Ash AS, Ellis RP. Risk Adjusted Payment and Performance Assessment for Primary Care. Medical Care. 2012 Aug;50(8):643–653.
- 9. Cotiviti. DxCG Intelligence. https://www.cotiviti.com/solutions/quality-and-performance/dxcg-intelligence. Accessed 5/6/2019.
- 10. Kautter J, Pope GC, Ingber M, et al. The HHS-HCC Risk Adjustment Model for Individual and Small Group Markets Under the Affordable Care Act. Medicare & Medicaid Research Review. 2014;4(3):E1–E11.
- 11. Gooding G. Accessing American Community Survey Block Group Data. American Community Survey Office, U.S. Census Bureau. January 27, 2016. https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/2016_BlockGroups_Slides_01.pdf. Accessed 5/6/19.
- 12. MassGIS (Bureau of Geographic Information). MassGIS Data: Master Address Data - Statewide Address Points for Geocoding. https://docs.digital.mass.gov/dataset/massgis-data-master-address-data-statewide-address-points-geocoding?_ga=2.77712084.1563304303.1557160764-1798573585.1556904106. Accessed 5/6/19.
- 13. Report to Congress: Social Risk Factors and Performance under Medicare’s Value Based Purchasing Programs: a Report Required by the Improving Medicare Post-Acute Care Transformation (IMPACT) Act of 2014. Washington, DC: Department of Health and Human Services, Office of the Assistant Secretary for Planning and Evaluation, 2016. https://aspe.hhs.gov/system/files/pdf/253971/ASPESESRTCfull.pdf. Accessed 5/6/19.
- 14. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447–453. doi: 10.1126/science.aax2342
- 15. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, Saunders LD, Beck CA, Feasby TE, Ghali WA. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Medical Care. 2005 Nov;43(11):1130–1139. doi: 10.1097/01.mlr.0000182534.19832.83