Health Aff (Millwood). Author manuscript; available in PMC: 2023 Sep 1.
Published in final edited form as: Health Aff (Millwood). 2022 Sep;41(9):1324–1332. doi: 10.1377/hlthaff.2022.00185

Evaluating The Accuracy Of Medicare Risk Adjustment For Alzheimer’s Disease And Related Dementias

Natalia Festa, Mary Price, Max Weiss, Lidia M V R Moura, Nicole M Benson, Sahar Zafar, Deborah Blacker, Sharon-Lise T Normand, Joseph P Newhouse, John Hsu
PMCID: PMC9973227  NIHMSID: NIHMS1867128  PMID: 36067434

Abstract

In 2020 Medicare reintroduced Alzheimer’s disease and related dementias (ADRD) Hierarchical Condition Categories (HCC) to risk-adjust Medicare Advantage and accountable care organization (ACO) payments. The potential for Medicare spending increases from this policy change is not well understood because the baseline accuracy of ADRD HCC is uncertain. Using linked 2016–18 claims and electronic health record data from a large ACO, we evaluated the accuracy of claims-based ADRD HCC against a reference standard of clinician-adjudicated disease. An estimated 7.5 percent of beneficiaries had clinician-adjudicated ADRD. Among those with ADRD HCC, 34 percent did not have clinician-adjudicated disease. The false-negative and false-positive rates were 22.7 percent and 3.2 percent, respectively. Medicare spending for those with false-negative ADRD HCC exceeded that of true-positives by $14,619 per beneficiary. If, after the reintroduction of risk adjustment for ADRD, all false-negatives were coded as having ADRD, expenditure benchmarks for beneficiaries with ADRD would increase by 9 percent. Monitoring ADRD coding could become challenging in the setting of concurrent incentives to decrease false-negative rates and increase false-positive rates.


Risk-adjustment and population health management are based on diagnosis codes recorded on health care claims.1,2 As providers bear increasing financial risk,3 certain payment reforms may encourage heightened coding intensity, affecting population health management efforts and increasing total Medicare expenditures.4 The Centers for Medicare and Medicaid Services (CMS) reintroduced risk adjustment for Alzheimer’s disease and related dementias (ADRD) in 2020.5 This followed its removal in 2014, in part because of concerns regarding heightened coding intensity and the challenges of auditing ADRD claim accuracy.6 The reintroduction of this risk adjustment will affect reimbursement for a sizable and increasing share of beneficiaries enrolled in Medicare Advantage (MA) or Medicare accountable care organizations (ACOs).3,7

The prior exclusion of ADRD from risk adjustment, however, also created challenges. The cognitive and functional limitations of beneficiaries with ADRD are associated with higher Medicare expenditures.8–10 This led to underpayment for beneficiaries with true disease or outright avoidance of beneficiaries with ADRD by plans or provider organizations receiving risk-adjusted payments, as evidenced by above-average MA disenrollment for persons with ADRD.11 Furthermore, removing ADRD risk adjustment decreased incentives for cognitive evaluation and coding, particularly in the absence of effective disease-modifying therapies.12 This may have exacerbated ADRD underdiagnosis, which remains prevalent.13

With the reintroduction of ADRD Hierarchical Condition Categories (HCC) into risk adjustment, CMS must again manage the coding and auditing incentives that led to their prior removal.3,6,7 Within MA, per-beneficiary-per-month payments will be prospectively risk-adjusted by ADRD HCC. Within Medicare ACOs, fee-for-service payments will now be reconciled against risk-adjusted expenditure benchmarks that include ADRD HCC, subject to a cap on population risk score increases.14 To attenuate incentives to inappropriately upcode, CMS reimburses the same amount for beneficiaries with complicated ADRD as for those with uncomplicated ADRD.15 Little information exists regarding the accuracy of the diagnostic codes that constitute the HCC used to identify Medicare beneficiaries with ADRD.

We assessed ADRD coding accuracy before the reintroduction of ADRD risk adjustment. We compared claims with diagnosis codes included in the ADRD HCC to a reference standard based on comprehensive medical record review by expert clinicians. Characterizing the relationship between ADRD HCC accuracy and risk adjustment in the years before the risk adjustment takes effect offers insight into the implications of evolving payment incentives.

Study Data And Methods

Data Sources And Sampling Approach

We used claims from Medicare Parts A and B together with electronic health record data from the Mass General Brigham ACO, linked at the individual level, to construct a cohort of older fee-for-service Medicare beneficiaries, whom we observed for calendar years 2016–18. The Mass General Brigham ACO serves a population of more than 100,000 beneficiaries across two academic medical centers, seven community hospitals, three specialty institutions, and twenty-one community health centers.

We required that as of January 1, 2016, beneficiaries were alive and met the following eligibility criteria: aged sixty-five years or older, community-dwelling at the time of ACO entry, enrolled in Medicare Parts A and B with a designated Medicare original reason for eligibility of age or disability, and aligned to the ACO during the entire observation period or until death. For subjects who died during the observation period, we required at least six months of continuous Medicare enrollment and alignment to the ACO. There were no dual-eligible beneficiaries in the sample.

Among the 40,690 beneficiaries meeting eligibility criteria, we drew a random sample stratified on the preabstraction likelihood of ADRD (online appendix 1).16 Random sampling within each stratum yielded 1,002 subjects for detailed chart review, among whom 952 (95.0 percent) had sufficient electronic health record (EHR) data for abstraction. For the main analysis, we excluded thirty-seven beneficiaries whose clinician-adjudicator classifications were of low quality (syndromic diagnosis “uncertain”), as specified below. This yielded 915 beneficiaries in the final sample. We weighted by the inverse of the sampling inclusion probabilities to reconstruct the characteristics of the full selected sample (n = 37,200), and we mask the sampling weights for privacy protection. Additional methodological details of our sample construction have been described in a prior publication.17
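In a stratified design, weighting by inclusion probabilities conventionally means giving each sampled beneficiary a weight equal to the inverse of its stratum's inclusion probability (stratum population divided by stratum sample size), so that the weighted sample reproduces the population it was drawn from. The sketch below illustrates that idea under the assumption of simple inverse-probability weights; the stratum names and counts are hypothetical placeholders, not the study's masked weights.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int   # eligible beneficiaries in the stratum (N_h)
    sampled: int      # beneficiaries abstracted from the stratum (n_h)
    adrd_cases: int   # sampled beneficiaries adjudicated as having ADRD

    @property
    def weight(self) -> float:
        # Inverse of the inclusion probability n_h / N_h
        return self.population / self.sampled


def weighted_prevalence(strata: list[Stratum]) -> float:
    """Inverse-probability-weighted prevalence from a stratified sample."""
    weighted_cases = sum(s.weight * s.adrd_cases for s in strata)
    weighted_total = sum(s.weight * s.sampled for s in strata)
    return weighted_cases / weighted_total


# Hypothetical strata defined by a preabstraction ADRD likelihood score
strata = [
    Stratum("low likelihood", population=30_000, sampled=300, adrd_cases=6),
    Stratum("high likelihood", population=1_000, sampled=200, adrd_cases=120),
]
print(round(weighted_prevalence(strata), 4))  # → 0.0387
```

Oversampling the high-likelihood stratum makes the unweighted sample prevalence (126 of 500, about 25 percent) far higher than the population value the weights recover.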

Reference Standard Definition Of ADRD

Four expert clinicians, two neurologists (Lidia Moura and Sahar Zafar) and two psychiatrists (Nicole Benson and Deborah Blacker), used a review protocol that we developed to evaluate the cognitive status of the 915 sampled beneficiaries from all available clinical documentation.17 The diagnostic criteria informing the protocol were derived from the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, and the standard clinical and research criteria developed by the National Institute on Aging and the Alzheimer’s Association.18–20 Each reviewer underwent a three-month period of EHR abstraction training. Interrater agreement for the final protocol was reasonable (κ ≥ 0.80) for adjudication of the key measures across several domains. The four clinician adjudicators classified cognitive status using six mutually exclusive categories: “normal cognition,” “normal cognition versus mild cognitive impairment” (MCI), “MCI,” “MCI versus ADRD,” “ADRD,” and “unknown.” The reviewers differentiated ADRD from MCI by ascertaining beneficiaries’ functional status and discerning the temporal relationship of cognitive and functional decline. Diagnostic criteria for ADRD require that cognitive deficits precede functional decline.20 When the degree of cognitive impairment or the temporal relationship between cognitive and functional impairment was uncertain, reviewers assigned beneficiaries to a borderline diagnostic classification (“A versus B”) naming the two diagnoses most closely aligned with their status. Among beneficiaries with ADRD, the clinician adjudicators also classified disease severity. Last, they rated their confidence in each classification. Further details on the development of our reference standard, clinician-adjudicated ADRD, are available in a previous publication.17
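Agreement statistics such as κ compare the observed agreement between raters with the agreement expected by chance from each rater's marginal category frequencies. The text does not state which κ variant was used with four reviewers; the sketch below assumes the simplest case, Cohen's κ for one pair of raters, applied to hypothetical classifications drawn from the protocol's categories.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters classifying the same beneficiaries."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Proportion of beneficiaries on whom the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal category frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten beneficiaries by two reviewers
reviewer_1 = ["ADRD", "ADRD", "MCI", "normal cognition", "normal cognition",
              "MCI versus ADRD", "normal cognition", "ADRD", "MCI", "normal cognition"]
reviewer_2 = ["ADRD", "ADRD", "MCI", "normal cognition", "normal cognition",
              "ADRD", "normal cognition", "ADRD", "MCI", "normal cognition"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # → 0.853
```

Here the reviewers agree on 90 percent of cases, chance alone would produce 32 percent agreement, and κ = (0.90 − 0.32) / (1 − 0.32). A κ of 1 indicates perfect agreement and 0 indicates agreement no better than chance.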

Claims-Based ADRD Definition And Risk-Adjustment Scores

We estimated expected annual per-beneficiary expenditures by multiplying the ADRD risk-adjustment weight in version 24 of the HCC model by the 2020 base rate, the dollar value of a risk score of 1.0 ($9,365.50).7,15,21 Risk-adjustment scores are functions of diagnoses and demographic factors, including age, that predict Medicare expenditures. HCC are sets of diagnostic codes that sort beneficiaries into hierarchical strata of disease severity across specified conditions. To date, CMS has defined two ADRD HCC categories: HCC-51 (“dementia with complications”) and HCC-52 (“dementia without complications”); each requires one qualifying claim per year.15,22 Spending recorded on claims with the relevant diagnostic codes from the prior year is averaged to calculate a prospective risk weight for each HCC.22,23 Because HCC-51 and HCC-52 have identical risk-adjustment weights, we combined their constituent diagnostic codes into a single indicator variable to identify, for each year of the observation period, beneficiaries with qualifying claims.
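The dollar conversion described above is a single multiplication of a risk score by the base rate, and the merged HCC-51/HCC-52 indicator is a set-membership test on a year's diagnosis codes. The sketch below shows both and backs out the ADRD weight implied by the $3,240 figure reported in the results. The function names are hypothetical, and the code set is an illustrative subset of ICD-10-CM codes (written with dots for readability), not the full CMS mapping.

```python
BASE_RATE_2020 = 9_365.50  # dollars per unit of risk score (from the text)

# Illustrative subset only; the full HCC-51/HCC-52 code lists are
# published by CMS and are much longer.
ADRD_HCC_CODES = {
    "F03.90",  # unspecified dementia without behavioral disturbance
    "F03.91",  # unspecified dementia with behavioral disturbance
    "G30.9",   # Alzheimer's disease, unspecified
    "G31.9",   # degenerative disease of nervous system, unspecified
}


def expected_expenditure(risk_score: float) -> float:
    """Expected annual Medicare spending implied by a risk score."""
    return risk_score * BASE_RATE_2020


def has_adrd_hcc(year_diagnosis_codes: set[str]) -> bool:
    """True if any claim code in the year falls in the merged ADRD HCC.

    One qualifying code per year suffices; HCC-51 and HCC-52 are merged
    because their weights are identical.
    """
    return not ADRD_HCC_CODES.isdisjoint(year_diagnosis_codes)


# ADRD weight implied by the $3,240 increment and the base rate
implied_adrd_weight = 3_240 / BASE_RATE_2020
print(round(implied_adrd_weight, 3))  # → 0.346
print(has_adrd_hcc({"I10", "F03.90"}), has_adrd_hcc({"I10"}))  # → True False
```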

We refer to beneficiaries whose ADRD-related claims are concordant with their clinician-adjudicated status during the observation period as having true-positive or true-negative diagnoses and to beneficiaries whose ADRD-related claims are discordant with their clinician-adjudicated status as having false-positive or false-negative diagnoses. The classifications are subject to qualifications (see “Limitations”).

Statistical Analysis

We first counted beneficiaries assigned a qualifying claim under the ADRD-HCC during the observation period. Among those with a qualifying claim, we estimated the proportion of beneficiaries whose claims-based diagnostic codes were concordant or discordant with their clinician-adjudicated cognitive status. We evaluated reference standard characteristics, such as the diagnostic certainty with which clinician-adjudicators classified cognition, across beneficiary subgroups, as summarized in appendix 2.16

We evaluated beneficiaries’ baseline demographic (age, sex) characteristics across true- and false-positives and true- and false-negatives. We determined beneficiaries’ health status (Medicare risk-adjustment score, comorbid depression, and sensory impairment), using information at baseline and throughout the observation period. We calculated beneficiaries’ health care use across distinct settings. We described the setting in which qualifying diagnostic claims were assigned and whether the qualifying claim was facility- or physician-generated. For technical information regarding the distinction between facility- and physician-generated claims, please see appendix 3.16 We assessed the frequency with which ADRD-related diagnostic claims were assigned to beneficiaries throughout the observation period.

We next compared differences between observed annual Medicare expenditures for 2017 and 2018 and those predicted by the risk-adjustment score. We calculated observed expenditures in 2020 dollars using inpatient, outpatient, skilled nursing, home health, hospice, and durable medical equipment claims, excluding prescription drug costs, for which we lacked data. We evaluated differences in observed versus expected expenditures across subgroups.

We evaluated the potential effect of discordant ADRD claims on risk-adjustment scores. We used available claims from each year of the observation period (2016–18) to calculate annual, prospective scores for 2017–19. The clinician-adjudicated reference standard was fixed across the observation period, but we allowed the agreement of ADRD HCC with the reference standard to vary by year, depending on the presence of qualifying claims. For example, a true-positive in 2017 could be reclassified as a false-negative in 2018 in the absence of a qualifying ADRD claim.
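The year-varying classification described above can be sketched as a lookup on two inputs: the fixed clinician-adjudicated status and whether a qualifying ADRD claim is present in a given claims year. The enum and function names below are hypothetical.

```python
from enum import Enum


class Concordance(Enum):
    TRUE_POSITIVE = "true-positive"
    FALSE_POSITIVE = "false-positive"
    FALSE_NEGATIVE = "false-negative"
    TRUE_NEGATIVE = "true-negative"


def classify_year(adjudicated_adrd: bool, qualifying_claim: bool) -> Concordance:
    """Compare one year's claims with the fixed reference standard."""
    if adjudicated_adrd:
        return Concordance.TRUE_POSITIVE if qualifying_claim else Concordance.FALSE_NEGATIVE
    return Concordance.FALSE_POSITIVE if qualifying_claim else Concordance.TRUE_NEGATIVE


def classify_person_years(adjudicated_adrd: bool,
                          claims_by_year: dict[int, bool]) -> dict[int, Concordance]:
    """The reference standard stays fixed; only claim presence varies by year."""
    return {year: classify_year(adjudicated_adrd, has_claim)
            for year, has_claim in sorted(claims_by_year.items())}


# A beneficiary with adjudicated ADRD and a qualifying claim only in 2017
history = classify_person_years(True, {2017: True, 2018: False})
for year, label in history.items():
    print(year, label.value)
# 2017 true-positive
# 2018 false-negative
```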

For beneficiaries with ADRD HCC, we separately calculated observed prospective risk-adjustment scores for the true- and false-positive subgroups. We recalculated a corrected risk-adjustment score for the false-positive subgroup without risk-adjustment for ADRD. We repeated this process for beneficiaries without a qualifying ADRD claim, generating a corrected risk-adjustment score for beneficiaries in the false-negative subgroup. We then determined the percentage by which addition of the ADRD HCC increased scores among false-negatives and the percentage by which removal of the ADRD HCC decreased scores among false-positives.
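Under the simplifying assumption that the ADRD HCC enters the score additively, a corrected score subtracts the ADRD weight for false-positives and adds it for false-negatives. The full version 24 model also contains interaction terms, so this additive function is only an approximation; the percent-change step therefore uses the subgroup averages reported in exhibit 4 rather than re-deriving them. The function names are hypothetical, and the weight used in the usage line is the value implied by the $3,240 and $9,365.50 figures in the text.

```python
def corrected_score(observed: float, adrd_weight: float, false_positive: bool) -> float:
    """Additive approximation (interaction terms ignored): drop the ADRD
    weight for false-positives, add it for false-negatives."""
    return observed - adrd_weight if false_positive else observed + adrd_weight


def percent_change(observed: float, corrected: float) -> float:
    return 100 * (corrected - observed) / observed


IMPLIED_ADRD_WEIGHT = 3_240 / 9_365.50  # about 0.346, implied by figures in the text
print(round(corrected_score(1.00, IMPLIED_ADRD_WEIGHT, false_positive=True), 3))  # → 0.654

# Subgroup average scores reported in exhibit 4 (observed, corrected)
fp_change = percent_change(2.59, 2.21)  # false-positives, ADRD weight removed
fn_change = percent_change(1.63, 1.99)  # false-negatives, ADRD weight added
print(f"{fp_change:.0f}% {fn_change:.0f}%")  # → -15% 22%
```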

We next calculated a blended average of the change to the scores in the respective true-negative and true-positive subgroups when beneficiaries with discordant claims were reclassified using their reference standard diagnosis. To do so, we weighted the observed score (for beneficiaries with concordant claims) and the corrected score (for beneficiaries with discordant claims) by the number of beneficiaries within each subgroup. Last, we evaluated the effect of reclassifying discordant beneficiaries with corrected risk-adjustment scores across the full sample. Because it is unlikely that the false-positive rate would decrease after reimbursement reform, we repeated this process, reclassifying only false-negative beneficiaries. We weighted all estimates by person-years of observation (n = 111,285). We used paired t-tests to evaluate the significance of the differences between observed and corrected risk-adjustment scores.
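The blended averages are person-year-weighted means: concordant subgroups contribute their observed scores and discordant subgroups their corrected scores. The sketch below recomputes the corrected ADRD average and the observed and corrected population averages from the rounded values in exhibit 4, so agreement is to two decimals at best. The paired t-tests would additionally require beneficiary-level paired scores (for example, with `scipy.stats.ttest_rel`), which are not reproduced here.

```python
def blended_average(groups: list[tuple[int, float]]) -> float:
    """Person-year-weighted mean of (person-years, average score) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * score for n, score in groups) / total


# Person-years and average risk scores from exhibit 4
tn = (101_601, 0.99)
fp_observed, fp_corrected = (1_361, 2.59), (1_361, 2.21)
tp = (4_421, 2.15)
fn_observed, fn_corrected = (3_902, 1.63), (3_902, 1.99)

adrd_corrected = blended_average([tp, fn_corrected])
population_observed = blended_average([tn, fp_observed, tp, fn_observed])
population_corrected = blended_average([tn, fp_corrected, tp, fn_corrected])
print(f"{adrd_corrected:.2f} {population_observed:.2f} {population_corrected:.2f}")
# → 2.07 1.08 1.09
```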

Sensitivity Analysis

We evaluated the ADRD HCC accuracy stratified by the health care setting in which a claim was assigned and whether it was facility- or physician-generated (appendix 3).16 To evaluate the effect of sample selection on the performance of each claims-based definition of ADRD, we varied the sample definition to include only beneficiaries whose ADRD-related claims were exclusively assigned within the Mass General Brigham ACO (appendix 4).16 To evaluate the effect of reviewer certainty on the identification of beneficiaries with false-positive administrative claims, we varied the reference standard definition to include only beneficiaries whose clinician-adjudicated ADRD status was assigned with a “moderate” or “high” degree of certainty (appendix 5).16

Limitations

Our study has several limitations. First, our reference standard of clinician-adjudicated ADRD is uncertain because ADRD ascertainment varies among clinicians, even in the context of in-person evaluation.18,19,24 Our reference standard relies on clinician review of the EHR rather than on in-person evaluation, which further increases uncertainty. Nonetheless, medical record review remains the primary method by which to audit the accuracy of medical claims for risk-adjustment purposes.25

Second, use of care for cognitive concerns varies across populations, health systems, and geographies. These factors may influence the generalizability of our findings. Because our analysis is restricted to traditional Medicare beneficiaries within a single ACO, our findings may not generalize to MA beneficiaries or patients in other provider networks. Nonetheless, Mass General Brigham ACO providers serve both MA and traditional Medicare beneficiaries, and we expect coding conventions to be similar within the system. In addition, our findings are tied to the underlying prevalence of adjudicated ADRD. For this reason, we report performance characteristics that are sensitive to underlying prevalence (positive and negative predictive values) as well as characteristics that are insensitive to it (sensitivity and specificity, along with their complements, the false-negative and false-positive rates). The estimated prevalence in our sample approximates the lower bound of prevalence estimates in the Medicare population.26

Third, after the addition of the ADRD HCC categories, the financial implications of discordant ADRD HCC will vary for several reasons. Our analyses use the average risk-adjustment scores and expenditures within relevant subgroups. After reimbursement reform, however, the expected decrease in false-negatives and increase in false-positives are likely to change the ADRD HCC weights.

Last, our sample does not include dual eligibles, and coding behavior and spending could well differ in that group. We consider it prudent to evaluate the performance of the ADRD HCC definition separately among dual eligibles because multiple factors should affect the performance of diagnostic claims within this population. For example, many dual eligibles are nursing home residents, approximately two-thirds of whom are diagnosed with ADRD and have more advanced disease.27 This far exceeds the estimated prevalence of ADRD within the overall MA and traditional Medicare populations.26 Moreover, cognitive surveillance within long-term care settings is standardized, in contrast to the nonrandom patterns of cognitive screening and ADRD ascertainment among community-dwelling older persons.13,28,29

Study Results

Performance Characteristics Of Claims-Based ADRD Definitions

Exhibit 1 displays the counts and percentages of beneficiaries with clinician-adjudicated disease (n = 2,806) or a qualifying ADRD diagnostic code under the HCC definition (n = 3,282). With a 7.5 percent prevalence (95% confidence interval: 6.2, 9.1 percent) of clinician-adjudicated dementia (exhibit 1), we observed a 22.7 percent false-negative rate (95% CI: 14.1, 34.6 percent) and a 3.2 percent false-positive rate (95% CI: 2.5, 4.2 percent) for the HCC claims-based indicator of ADRD (data not shown). The false-negative rate is the complement of sensitivity (77.3 percent), and the false-positive rate is the complement of specificity (96.8 percent) (appendix 8).16 Thirty-four percent of beneficiaries with an ADRD HCC did not have clinician-adjudicated ADRD and were classified as false-positives. This proportion increased to 39 percent when the reference standard was restricted to ADRD cases classified with higher reviewer confidence (appendix 5).16 The HCC claims-based indicator ascertained ADRD with a positive predictive value of 66.1 percent (95% CI: 64.4, 67.7 percent) and a negative predictive value of 98.1 percent (95% CI: 98.0, 98.3 percent) (appendix 8).16

Exhibit 1:

Cases of Alzheimer’s disease and related dementias (ADRDs) identified by Medicare claims and clinician adjudication, Mass General Brigham accountable care organization (ACO), 2016–18

Clinician-adjudicated ADRD
Yes No Total
Total patients 2,806 34,394 37,200
HCC indicator
 Yes
  Number 2,168 1,114 3,282
  Percent 66 34 100
 No
  Number 638 33,280 33,918
  Percent 2 98 100

SOURCE Authors’ analysis of claims from Medicare Parts A and B and electronic health record data from the Mass General Brigham ACO. NOTES The exhibit displays the prevalence of beneficiaries with true- and false-positive and true- and false-negative ADRDs, as determined by clinician adjudication of medical records. Beneficiaries with true-positive diagnoses are those whose claims are concordant with clinician-adjudicated ADRD status. Beneficiaries with false-positive diagnoses are those whose claims are discordant with clinician-adjudicated ADRD status. The converse applies to true- and false-negatives. HCC is hierarchical condition categories (Medicare’s risk-adjustment system).
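The point estimates reported above follow directly from the four cells of exhibit 1. The sketch below recomputes them from the weighted counts; it does not reproduce the confidence intervals, which depend on the sampling design. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: float  # HCC yes, clinician-adjudicated ADRD yes
    fp: float  # HCC yes, clinician-adjudicated ADRD no
    fn: float  # HCC no, clinician-adjudicated ADRD yes
    tn: float  # HCC no, clinician-adjudicated ADRD no

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


exhibit1 = ConfusionCounts(tp=2_168, fp=1_114, fn=638, tn=33_280)
for name in ("prevalence", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {100 * getattr(exhibit1, name)():.1f}%")
print(f"false-negative rate: {100 * (1 - exhibit1.sensitivity()):.1f}%")
print(f"false-positive rate: {100 * (1 - exhibit1.specificity()):.1f}%")
# prevalence 7.5%, sensitivity 77.3%, specificity 96.8%, PPV 66.1%,
# NPV 98.1%, false-negative rate 22.7%, false-positive rate 3.2%
```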

Sample Characteristics By Beneficiary Subgroup

Exhibit 2 displays the characteristics of beneficiary subgroups by the concordance of their ADRD HCC classification with their clinically adjudicated status. True-negative beneficiaries were the youngest subgroup, followed by false-positives. Baseline (2016) risk-adjustment scores suggested that false-positives and false-negatives were more clinically complex than their counterparts with concordant ADRD HCC status. Comorbid depression was most prevalent among true- and false-positives, whereas hearing impairment was most prevalent among false-positives and false-negatives. True-positives used the most hospital and postacute care, whereas false-positives and false-negatives respectively used the most outpatient specialty care and home health services.

Exhibit 2:

Characteristics and health care use of accountable care organization beneficiaries by concordance of Alzheimer’s disease and related dementias (ADRD)-Hierarchical Condition Categories (HCC) claims with clinician-adjudicated status, 2016–18

Characteristics and utilization HCC and claims pairing
False-positive True-positive False-negative True-negative
Beneficiaries (no.) 1,114 2,168 638 33,280
Demographic characteristics
 Age, years (%)
  65–74 30 16 5 59
  75–79 21 18 14 21
  80–84 18 23 16 13
  ≥85 31 43 66 7
 Female sex (%) 54 67 54 57
Health characteristics
 Baseline (2016) risk-adjustment score 1.52 1.31 1.67 0.84
 Depression (%) 47 46 12 20
 Hearing impairment (%) 42 22 45 28
Acute care utilization (average total)
 Hospital admissions (days per year) 3.4 4.1 3.2 0.9
 Hospital readmission (no. per year) 0.23 0.27 0.21 0.15
Supportive care utilization (average total)
 Institutional postacute care (days per year) 5.1 5.8 5.1 0.5
 Home health utilization (days per year) 13.6 24.8 28.8 2.9
Outpatient utilization (average total)
 Primary care visits (no. per year) 4.2 3.3 3.5 3.0
 Any specialist visits (no. per year) 7.1 4.3 6.2 5.1

SOURCE Authors’ analysis of claims from Medicare Parts A and B and electronic health record data from the Mass General Brigham accountable care organization. NOTES The exhibit displays the baseline characteristics of accountable care organization beneficiaries identified by the ADRD HCC. Beneficiaries with true-positive diagnoses are those whose claims are concordant with the reference standard, clinician-adjudicated ADRD status. Beneficiaries with false-positive diagnoses are those whose claims are discordant with the reference standard, clinician-adjudicated ADRD status. The converse applies to true- and false-negatives. HCC are Medicare’s risk-adjustment system.

Allowing a maximum of one ADRD HCC constituent diagnostic code per encounter and day, false-positives were assigned an average of 3.2 qualifying claims compared with 13.6 claims for true-positives (appendix 6).16 Qualifying false-positive claims were most often assigned within a single year (74 percent) and during outpatient encounters (50 percent). The most common diagnostic claims assigned to false-positives under the ADRD HCC were nonspecific (for example, unspecified dementia without behavioral disturbance) or noncognitive (for example, degenerative disease of the nervous system, unspecified).
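The counting rule above can be sketched as deduplication on encounter and date. The sketch assumes "one code per encounter and day" means at most one qualifying claim per distinct (encounter, service date) pair; the record layout and field names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ClaimLine(NamedTuple):
    beneficiary_id: str
    encounter_id: str
    service_date: str  # ISO date string, e.g. "2017-03-14"
    diagnosis_code: str


def count_qualifying_claims(lines: Iterable[ClaimLine],
                            qualifying_codes: set[str]) -> dict[str, int]:
    """Count qualifying ADRD claims per beneficiary, allowing at most one
    per distinct (encounter, service date) pair."""
    encounter_days: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for line in lines:
        if line.diagnosis_code in qualifying_codes:
            encounter_days[line.beneficiary_id].add((line.encounter_id, line.service_date))
    return {bene: len(days) for bene, days in encounter_days.items()}


claims = [
    ClaimLine("A", "enc1", "2017-03-14", "F03.90"),
    ClaimLine("A", "enc1", "2017-03-14", "G30.9"),  # same encounter-day: counted once
    ClaimLine("A", "enc2", "2017-06-02", "F03.90"),
    ClaimLine("B", "enc3", "2017-01-20", "I10"),    # not a qualifying code
]
print(count_qualifying_claims(claims, {"F03.90", "G30.9"}))  # → {'A': 2}
```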

Appendix 7 summarizes the characteristics of beneficiaries (n = 3,413) whose electronic health records had insufficient information with which to adjudicate their cognitive status.16 These beneficiaries were predominantly female and were younger and healthier than individuals included in the main analysis. Fewer than 3 percent of beneficiaries died or disenrolled by the end of the observation period.

Annual Medicare Spending By ADRD-HCC Status

Exhibit 3 displays the differences between observed and predicted expenditures for beneficiaries with true- and false-positive and true- and false-negative ADRD HCCs in 2017. Among beneficiaries with discordant ADRD HCCs, observed expenditures reversed the pattern of predicted expenditures: false-negatives were predicted to cost less than false-positives but actually cost more. Observed expenditures for false-negatives significantly exceeded predicted expenditures by $18,347 (predicted minus observed: −$18,347; 95% CI: −$31,114, −$5,579). Observed expenditures for false-positives were $7,998 (95% CI: $2,758, $13,239) lower than predicted. The observed expenditures for false-negatives exceeded those of true-positive beneficiaries by $14,619 (95% CI: $1,468, $27,770) per beneficiary. The magnitude of these subgroup differences exceeded the change in expenditures that would correspond to appropriate ADRD risk adjustment ($3,240). These patterns were similar in 2018 (not displayed).

Exhibit 3:

Observed versus expected per beneficiary annual expenditures, by claims identification status of Alzheimer’s disease and related dementias (ADRD), 2017

Claims identification status Annual expenditures ($) 95% confidence interval ($)
False-negatives
 Predicted 12,602 10,534, 14,670
 Observed 30,949 18,455, 43,443
 Difference −18,347 −31,114, −5,579
True-positives
 Predicted 18,146 16,295, 19,997
 Observed 16,330 12,476, 20,184
 Difference 1,816 −2,079, 5,712
False-negatives minus true-positives
 Observed difference 14,619 1,468, 27,770
No clinician-adjudicated ADRD
 False-positives
  Predicted 19,497 14,945, 24,050
  Observed 11,499 6,701, 16,297
  Difference 7,998 2,758, 13,239
 True-negatives
  Predicted 8,222 7,745, 8,700
  Observed 8,003 6,833, 9,173
  Difference 220 −931, 1,370
 False-positives minus true-negatives
  Observed difference 3,496 −1,446, 8,439

SOURCE Authors’ analysis of claims from Medicare Parts A and B and electronic health record data from the Mass General Brigham accountable care organization. NOTES The exhibit displays the average annual observed and predicted Medicare expenditures for beneficiaries, by ADRD Hierarchical Condition Categories (HCC) status for 2017. We compared observed and predicted expenditures across beneficiary subgroups, based on the concordance of their ADRD HCC status with their clinically adjudicated ADRD status. We calculated observed expenditures (in 2020 dollars), using inpatient, outpatient, skilled nursing, home health, hospice, and durable medical equipment claims, and excluded prescription drug costs. We calculated predicted expenditures by multiplying beneficiaries’ prospective risk-adjustment scores (using version 24 of the HCC model) by the 2020 Centers for Medicare and Medicaid Services base rate ($9,365.50). Consistent with prospective risk adjustment, we used claims from 2016 to calculate predicted expenditures for 2017. Differences are predicted minus observed expenditures. HCC are Medicare’s risk-adjustment system.
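The differences in exhibit 3 are subtractions of the point estimates, with "Difference" defined as predicted minus observed, so it is negative when a subgroup costs more than predicted. The sketch below recomputes them; because the published components are rounded, the true-negative difference computes to $219 rather than the reported $220.

```python
# Point estimates from exhibit 3 (2017 dollars): (predicted, observed)
exhibit3 = {
    "false-negatives": (12_602, 30_949),
    "true-positives": (18_146, 16_330),
    "false-positives": (19_497, 11_499),
    "true-negatives": (8_222, 8_003),
}

# Predicted minus observed, as in the exhibit's "Difference" rows
for group, (predicted, observed) in exhibit3.items():
    print(f"{group}: predicted minus observed = {predicted - observed:,}")

# Between-subgroup contrasts use observed spending only
fn_minus_tp = exhibit3["false-negatives"][1] - exhibit3["true-positives"][1]
fp_minus_tn = exhibit3["false-positives"][1] - exhibit3["true-negatives"][1]
print(f"observed FN - TP = {fn_minus_tp:,}; observed FP - TN = {fp_minus_tn:,}")
```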

Risk-Adjustment Implications Of Discordant ADRD-HCCs

Exhibit 4 summarizes the effect of discordant ADRD claims on risk-adjustment scores, averaged across person-years of observation. Using claims from 2016 to 2018 to calculate prospective scores, we observed an average score for the ACO population of 1.08. True-negatives had the lowest risk-adjustment score of any subgroup (0.99). After removing the weight for ADRD, corrected scores in the false-positive subgroup were 15 percent lower than observed scores (2.21 versus 2.59). After reclassifying all false-positive beneficiaries as true-negatives and using corrected weights, the average risk-adjustment score among persons with no clinician-adjudicated ADRD was unchanged (1.01). For beneficiaries with clinician-adjudicated ADRD, true-positives had greater average prospective risk-adjustment scores than false-negatives (2.15 versus 1.63). Within the false-negative subgroup, corrected scores were 22 percent greater than observed scores (1.99 versus 1.63). When the false-negative beneficiaries within our sample were reclassified as true-positives, the average risk-adjustment score among persons with clinician-adjudicated ADRD increased by 9 percent, to 2.07. Reclassifying discordant beneficiaries yielded a 1 percent increase in the average population score, from 1.08 to 1.09. Because correcting false-positive scores left the average score among persons without clinician-adjudicated ADRD essentially unchanged, results were similar when scores for false-positive beneficiaries were left uncorrected.

Exhibit 4:

Changes in Hierarchical Condition Categories (HCC) scores for beneficiaries with discordant identification of Alzheimer’s disease and related dementias (ADRD), 2016–18 Claims

Risk scores No clinician-adjudicated ADRD Clinician-adjudicated ADRD Overall population
True negative False positive Total True positive False negative Total
Person-years (no.) 101,601 1,361 102,962 4,421 3,902 8,323 111,285
Observed average risk score 0.99 2.59 1.01 2.15 1.63 1.90 1.08
Corrected average risk score a 2.21 1.01 a 1.99 2.07 1.09
Difference a −0.38**** −0.005**** a 0.36**** 0.17**** 0.008****
 Percent change a −15% 0% a 22% 9% 1%

SOURCE Authors’ analysis of claims from Medicare Parts A and B and electronic health record data from the Mass General Brigham accountable care organization. NOTES This exhibit displays the estimated effect on risk adjustment scores of claims-identified ADRD cases that were discordant with clinician adjudication, our reference standard. We used claims from each year of the observation period (2016–18) to calculate corrected annual prospective HCC scores (using version 24 of the HCC model) for ADRD cases identified by clinician adjudication, but not HCC. We allowed the concordance of ADRD HCC and the reference standard to vary by year, depending on the presence of qualifying ADRD HCC. We calculated averages within each subgroup across all person-years of observation (n = 111,285). We used paired t-tests to evaluate the significance of the differences between observed and corrected risk-adjustment scores. HCC are Medicare’s risk-adjustment system.

a Not applicable.

**** p < 0.001

Sensitivity Analysis

Our sensitivity analysis demonstrated that physician-generated ADRD claims from outpatient health care encounters had the greatest combined sensitivity and specificity for clinician-adjudicated ADRD compared with inpatient and facility-generated claims (appendix 3).16 These results were consistent across sensitivity analyses that varied the sample inclusion criteria (appendix 4) and that restricted the reference standard to beneficiaries whose ADRD status was classified with higher certainty (appendix 5).16

Discussion

This study characterizes the concordance between ADRD HCC and a clinician-adjudicated reference standard before a policy change that reintroduced the ADRD HCC into risk adjustment. We used data from a large ACO to evaluate the effects of ADRD HCC accuracy on annual Medicare spending and risk adjustment.

Incorporating ADRD HCC into risk adjustment creates incentives for providers to increase their use of ADRD International Classification of Diseases codes on claims, resulting in a concomitant increase in the false-positive rate and decrease in the false-negative rate of cases ascertained using the HCC definition. The magnitude of the changes in these two categories and the spending of beneficiaries within them will determine the change in Medicare spending, as well as future changes in risk-adjustment weights across HCC designations. We observed a 22.7 percent false-negative rate and a 3.2 percent false-positive rate in the context of a 7.5 percent prevalence of ADRD. We demonstrated that Medicare expenditures for false-negative beneficiaries exceeded those predicted by their observed risk-adjustment scores by 70 percent ($18,347). As a result, MA plans and ACOs have a large financial incentive to reclassify false-negative beneficiaries. If all false-negatives were coded as having ADRD, spending would increase by 9 percent among beneficiaries with ADRD and by 1 percent for the entire ACO population. To the extent that the false-positive rate increases, Medicare spending would increase further.

The greater spending of the false-negatives likely contributes to ACOs exceeding annual expenditure benchmarks and to underestimated prospective monthly payments for patients with ADRD within MA. Although our data allow us to assess only the implications for Medicare spending, most ADRD-related costs are attributable to long-term services and supports, which are disproportionately borne by families and Medicaid.10 This pattern has persisted despite enhanced long-term services and supports flexibilities under MA, which remain underused.30

The additional amount that CMS pays for patients with ADRD is based on the cost of traditional Medicare beneficiaries with ADRD, including those in ACOs.21 Therefore, changes in coding behavior among traditional Medicare beneficiaries in ACOs will alter the weight assigned to patients with ADRD and thus increase Medicare spending in MA. Moreover, the changed coding would likely spill over into traditional Medicare patients not in ACOs, further altering the mix of patients whose spending is used to compute the risk-adjustment weight for ADRD. Because CMS tries to maintain budget neutrality when weights change,31 any increase in ADRD weights will decrease reimbursement for other diagnoses.

The reintroduction of ADRD HCC into risk adjustment creates incentives that would be expected to influence diagnostic coding behavior. The accuracy of future CMS risk adjustment could be improved by removing noncognitive and nonspecific diagnoses from the set of diagnostic codes included in the ADRD HCC. Other approaches include requiring multiple qualifying diagnostic codes per year from multiple health care settings.32 Our findings suggest that there is potential for a considerable decrease in the rate of false-negative ADRD cases. Although risk adjustment for ADRD should reduce underdiagnosis, which may improve clinical management, its influence on false-negative and false-positive diagnoses has the potential to considerably increase total Medicare expenditures and to shift the distribution of risk-adjustment weights across HCC.

Supplementary Material

Appendix

Acknowledgments

Contributions to this article from Natalia Festa were supported by the National Institute on Aging (Award No. T32 AG019134) and the Clinical and Translational Science Awards Program (Award No. TL1 TR001864) from the National Center for Advancing Translational Science. Contributions to this article from John Hsu, Lidia Moura, and Joseph Newhouse were supported by the National Institutes of Health (Grant Nos. K08AG053380, R01AG062282, and P01AG032952). Hsu has consulted for Cambridge Health Alliance, Columbia University, Community Servings, Delta Health Alliance, the Robert Wood Johnson Foundation, and the University of Southern California.

Biographies

Natalia Festa, Veterans Affairs Office of Academic Affiliations and Yale University, New Haven, Connecticut.

Mary Price, Massachusetts General Hospital and Harvard University, Boston, Massachusetts.

Max Weiss, Massachusetts General Hospital and Harvard University.

Lidia M.V.R. Moura, Massachusetts General Hospital and Harvard University.

Nicole M. Benson, Massachusetts General Hospital and Harvard University; McLean Hospital, Belmont, Massachusetts.

Sahar Zafar, Harvard University.

Deborah Blacker, Massachusetts General Hospital and Harvard University.

Sharon-Lise Normand, Harvard University.

Joseph P. Newhouse, Harvard University.

John Hsu, Massachusetts General Hospital and Harvard University.
