Key Points
Question
Are higher scores for primary care physicians in the Medicare Merit-based Incentive Payment System (MIPS) associated with better performance on a broad range of clinical process and patient outcome measures?
Findings
In this cross-sectional observational study of 80 246 primary care physicians, MIPS scores were inconsistently related to performance on process and outcome measures, and physicians caring for more medically complex and socially vulnerable patients were more likely to receive low MIPS scores, even when they delivered relatively high-quality care.
Meaning
The MIPS program may not accurately capture the quality of care that primary care physicians provide.
Abstract
Importance
The Medicare Merit-based Incentive Payment System (MIPS) influences reimbursement for hundreds of thousands of US physicians, but little is known about whether program performance accurately captures the quality of care they provide.
Objective
To examine whether primary care physicians’ MIPS scores are associated with performance on process and outcome measures.
Design, Setting, and Participants
Cross-sectional study of 80 246 US primary care physicians participating in the MIPS program in 2019.
Exposures
MIPS score.
Main Outcomes and Measures
The association between physician MIPS scores and performance on 5 unadjusted process measures, 6 adjusted outcome measures, and a composite outcome measure.
Results
The study population included 3.4 million patients attributed to 80 246 primary care physicians, including 4773 physicians with low MIPS scores (≤30), 6151 physicians with medium MIPS scores (>30-75), and 69 322 physicians with high MIPS scores (>75). Compared with physicians with high MIPS scores, physicians with low MIPS scores had significantly worse mean performance on 3 of 5 process measures: diabetic eye examinations (56.1% vs 63.2%; difference, −7.1 percentage points [95% CI, −8.0 to −6.2]; P < .001), diabetic HbA1c screening (84.6% vs 89.4%; difference, −4.8 percentage points [95% CI, −5.4 to −4.2]; P < .001), and mammography screening (58.2% vs 70.4%; difference, −12.2 percentage points [95% CI, −13.1 to −11.4]; P < .001) but significantly better mean performance on rates of influenza vaccination (78.0% vs 76.8%; difference, 1.2 percentage points [95% CI, 0.0 to 2.5]; P = .045) and tobacco screening (95.0% vs 94.1%; difference, 0.9 percentage points [95% CI, 0.3 to 1.5]; P = .001). MIPS scores were inconsistently associated with risk-adjusted patient outcomes: compared with physicians with high MIPS scores, physicians with low MIPS scores had significantly better mean performance on 1 outcome (307.6 vs 316.4 emergency department visits per 1000 patients; difference, −8.9 [95% CI, −13.7 to −4.1]; P < .001), worse performance on 1 outcome (255.4 vs 225.2 all-cause hospitalizations per 1000 patients; difference, 30.2 [95% CI, 24.8 to 35.7]; P < .001), and did not have significantly different performance on 4 ambulatory care–sensitive admission outcomes. Nineteen percent of physicians with low MIPS scores had composite outcomes performance in the top quintile, while 21% of physicians with high MIPS scores had outcomes in the bottom quintile. Physicians with low MIPS scores but superior outcomes cared for more medically complex and socially vulnerable patients, compared with physicians with low MIPS scores and poor outcomes.
Conclusions and Relevance
Among US primary care physicians in 2019, MIPS scores were inconsistently associated with performance on process and outcome measures. These findings suggest that the MIPS program may be ineffective at measuring and incentivizing quality improvement among US physicians.
This cross-sectional study of US primary care physicians participating in the Medicare Merit-based Incentive Payment System (MIPS) program in 2019 examines whether primary care physicians’ MIPS scores are associated with performance on process and outcome measures.
Introduction
The Medicare Merit-based Incentive Payment System (MIPS) is the largest value-based payment program in the US, influencing reimbursement for nearly 1 million clinicians annually based on performance in 4 domains: cost, quality, improvement activities, and promoting interoperability.1 In 2019, the program’s third year, 99% of eligible clinicians participated; based on 2019 performance scores, Medicare Part B fee-for-service payments could have been adjusted by up to ±7% in 2021, although actual adjustments were smaller.1
The MIPS program has been criticized for a number of perceived shortcomings, including use of irrelevant quality measures; high levels of administrative burden, especially for small, independent, and rural practices; and complex program design that prevents meaningful comparison across clinicians and carries the potential to worsen health care disparities.2,3,4,5,6,7 Citing these and other concerns, the Medicare Payment Advisory Commission has called for the MIPS program to be eliminated.8,9 Other organizations and experts, however, have argued that the program should be reformed, not replaced, and Congress has not indicated that it intends to abandon the program.10,11
Little empirical evidence exists on whether the MIPS program accurately measures quality or can distinguish high vs low clinical performance—a critical aspect of value-based payment programs. One study of hospital-based surgical care found that surgeons’ MIPS quality scores in 2017 were not associated with postoperative complications,12 but there has been no comprehensive evaluation of whether the MIPS program captures the quality of care delivered in primary care settings.
This study examined 3 questions. First, are higher physician MIPS scores associated with better performance on clinical process measures? Second, are higher MIPS scores associated with better patient outcomes? Third, what are the characteristics of physicians for whom MIPS scores are discordant with patient outcomes (ie, low MIPS scores and good outcomes or high MIPS scores and poor outcomes)?
Methods
Data and Sample
Because only deidentified administrative data were used, this study was deemed exempt by the institutional review board at Weill Cornell Medical College. The main data sources included the 2019 Physician Compare files,13 2019 Medicare Data on Provider Practice and Specialty (MD-PPAS) file,14 and a 20% sample of 2018-2019 Medicare fee-for-service claims. Using the Physician Compare Doctors and Clinicians Quality Payment Program: Overall MIPS Performance file, we identified clinicians who participated in the MIPS program in 2019. Using clinicians’ National Provider Identifiers, we linked the Physician Compare file with physician specialty data in MD-PPAS to identify primary care physicians (general practice, family practice, internal medicine, and geriatrics). We attributed patients to the physician who delivered a plurality of their evaluation and management visits, following logic developed by the Centers for Medicare & Medicaid Services (CMS) for the Value-Based Modifier program.15
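The plurality attribution rule described above can be illustrated with a minimal Python sketch. It is not the CMS logic itself: the input format, the function name `attribute_patients`, and the tie-breaking rule (smallest identifier wins) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def attribute_patients(visits):
    """Map each patient to the physician who delivered the most E&M visits.

    `visits` is an iterable of (patient_id, physician_npi) pairs, one pair per
    evaluation and management visit. Ties are broken by taking the smallest
    NPI, which is an assumption of this sketch, not a rule stated in the study.
    """
    counts = defaultdict(Counter)
    for patient_id, npi in visits:
        counts[patient_id][npi] += 1

    attribution = {}
    for patient_id, by_npi in counts.items():
        top = max(by_npi.values())
        attribution[patient_id] = min(n for n, c in by_npi.items() if c == top)
    return attribution


# Hypothetical visits: p1 saw physician A twice and B once; p3 is a tie.
visits = [("p1", "A"), ("p1", "B"), ("p1", "A"),
          ("p2", "B"), ("p3", "C"), ("p3", "B")]
attribution = attribute_patients(visits)
```

Each patient ends up with exactly one physician, so physician-level denominators can be built by counting attributed patients per NPI.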
Exposure
The MIPS program is one of 2 tracks through which clinicians can participate in the Quality Payment Program under the Medicare Access and CHIP Reauthorization Act, passed by Congress in 2015 (see Box for further details). In 2019, the MIPS program measured performance across 4 categories to create a composite MIPS final score: quality (45%), cost (15%), improvement activities (15%), and promoting interoperability (25%). Clinicians could receive a score from 0 to 100, with higher scores indicating better performance. Clinicians with a score less than 30 were eligible for penalties, those with a score of 30 received no adjustment, those with a score greater than 30 received positive adjustments, and those with a score greater than 75 were also eligible for an “exceptional performance” bonus. Based on 2019 program performance, clinicians’ Medicare Part B payments could be adjusted by up to ±7% in 2021; however, because the program is budget-neutral, and more clinicians received high scores than low scores, actual positive payment adjustments were smaller.
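As a rough illustration of how the 2019 category weights combine into a final score and how that score maps to the payment thresholds used in this study, consider the following Python sketch. It is a simplification: actual CMS scoring includes additional rules that are not modelled here, and the names `WEIGHTS_2019`, `final_score`, and `payment_status` are hypothetical.

```python
# 2019 category weights as stated in the text (they sum to 1.0).
WEIGHTS_2019 = {
    "quality": 0.45,
    "cost": 0.15,
    "improvement_activities": 0.15,
    "promoting_interoperability": 0.25,
}


def final_score(category_scores, weights=WEIGHTS_2019):
    """Weighted sum of category scores, each assumed to be on a 0-100 scale."""
    return sum(weights[name] * category_scores[name] for name in weights)


def payment_status(score):
    """Map a final score to the 2019 payment thresholds described in the text."""
    if score < 30:
        return "eligible for penalty"
    if score == 30:
        return "no adjustment"
    if score <= 75:
        return "positive adjustment"
    return "positive adjustment, eligible for exceptional performance bonus"


# Hypothetical clinician.
example = {
    "quality": 80,
    "cost": 60,
    "improvement_activities": 100,
    "promoting_interoperability": 90,
}
score = final_score(example)  # 36 + 9 + 15 + 22.5 = 82.5, up to float rounding
status = payment_status(score)
```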
Box. Overview of the 2019 Medicare MIPS Program.
General Facts<sup>a</sup>
One of 2 tracks for clinicians to participate in the Quality Payment Program under the 2015 Medicare Access and CHIP Reauthorization Act (MACRA).
The MIPS program combined 3 legacy programs: Physician Quality Reporting System (PQRS), Value-Based Payment Modifier (VM), and Medicare EHR Incentive Program for Eligible Professionals.
Clinicians are exempt from the MIPS program if they participated in an advanced Alternative Payment Model (APM) or did not meet the program’s low-volume thresholds.
2019 MIPS Performance Categories<sup>b</sup>
Quality (45%): Clinicians select 6 of 257 possible quality measures to report, including at least 1 outcome or “high-priority” measure.
Cost (15%): Costs are calculated from administrative claims data and include a Medicare Spending Per Beneficiary measure, a Total Per Capita Cost measure, and if applicable, 8 episode-based measures.
Improvement activities (15%): Clinicians could attest to completing 1 or more of 118 improvement activities, including implementation of advance care planning processes and registration in a prescription drug monitoring program.
Promoting interoperability (25%): Clinicians must use 2015 Edition Certified Electronic Health Record Technology and report on measures related to e-prescribing, health information exchange, provider-to-patient exchange, and public health and clinical data exchange.
There have been changes to the MIPS program since 2019. For example, some “topped-out” measures—those for which performance was consistently high and meaningful comparisons could not be made—have been removed. In 2022, physicians’ final score assigns a lower weight to quality (30%) and a higher weight to cost (30%). Furthermore, the minimum score needed to avoid a potential penalty has increased to 75, and the score needed to qualify for an “exceptional performance” bonus has increased to 89. A maximum payment adjustment of ±9% will be applied to 2024 Medicare Part B claims based on 2022 performance.
Outcomes
To assess physician quality, we generated 5 clinical process measures and 6 patient outcome measures using 2019 Medicare fee-for-service claims data and validated measure specifications maintained by the National Committee for Quality Assurance, CMS, and the Agency for Healthcare Research and Quality (AHRQ) (eTable 1 in Supplement 1). Process measures included annual eye examinations for patients with diabetes, annual hemoglobin A1c (HbA1c) testing for patients with diabetes, breast cancer screening for women aged 50 to 74 years, annual influenza vaccination, and annual tobacco screening.18,19 These measures are among those clinicians could report to fulfill MIPS requirements in 2019. (Annual HbA1c testing was used instead of percentage of patients with diabetes and HbA1c levels under poor control because the latter is underrepresented in claims owing to infrequent use of necessary modifier codes.) Clinical outcome measures included 4 ambulatory care–sensitive admission indicators (for chronic obstructive pulmonary disease, hypertension, heart failure, and complications of diabetes),20 as well as indicators for an emergency department visit not resulting in an inpatient admission and an all-cause inpatient admission based on the Medicare Beneficiary Summary File. These measures are not included in the MIPS program but rather are commonly used indicators endorsed by AHRQ to examine the quality of outpatient care.21
For each physician, process measure performance was defined as the proportion of attributed and eligible patients who received the recommended care in the year, and outcome measure performance as the number of patients with an ambulatory care–sensitive admission, emergency department visit, or inpatient admission per 1000 attributed and eligible patients in the year.
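The two physician-level definitions above reduce to simple ratios. The Python sketch below shows the arithmetic; the function names are hypothetical, and returning `None` for a physician with no eligible patients is an assumption about how such physicians drop out of a given measure.

```python
def process_rate(n_received, n_eligible):
    """Percentage of attributed, eligible patients who received recommended care."""
    if n_eligible == 0:
        return None  # no eligible patients: measure undefined for this physician
    return 100.0 * n_received / n_eligible


def outcome_rate_per_1000(n_with_event, n_eligible):
    """Patients with the event per 1000 attributed, eligible patients."""
    if n_eligible == 0:
        return None
    return 1000.0 * n_with_event / n_eligible


# Hypothetical physician: 45 of 60 eligible women screened; 12 of 48 patients hospitalized.
mammography = process_rate(45, 60)              # 75.0
hospitalization = outcome_rate_per_1000(12, 48)  # 250.0
```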
Physician, Practice, and Patient Characteristics
We constructed physician-level covariate indicators reflecting physician, practice, and patient characteristics. For physician characteristics, we used MD-PPAS to identify the number of enrollees seen by the physician in 2019 and to determine their census region. We used the zip code listed on the plurality of a physician’s Medicare carrier file claims to identify whether they practiced in a rural area, as defined by the Federal Office of Rural Health Policy.22 Physicians’ practice was defined by their primary tax identification number as indicated in MD-PPAS. MD-PPAS was also used to construct a measure of physician practice size (number of unique physician National Provider Identifiers billing under the same primary tax identification number) and multispecialty status (defined as practices in which less than 80% of National Provider Identifiers were primary care physicians). We identified whether a physician participated in a practice associated with a health system using the 2018 AHRQ Compendium of US Health Systems Group Practice Linkage File, which links tax identification numbers to health systems.23
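The practice size and multispecialty definitions can be sketched in Python as follows. The record layout (NPI, primary tax identification number, primary care flag) and the function name `practice_features` are assumptions for illustration; the 80% threshold is taken from the text.

```python
from collections import defaultdict


def practice_features(physicians):
    """Compute practice size and multispecialty status per tax identification number.

    `physicians` is an iterable of (npi, tin, is_primary_care) records.
    Size is the number of unique NPIs billing under the TIN; a practice is
    multispecialty when fewer than 80% of those NPIs are primary care.
    """
    members = defaultdict(dict)
    for npi, tin, is_primary_care in physicians:
        members[tin][npi] = bool(is_primary_care)  # dict key deduplicates NPIs

    features = {}
    for tin, by_npi in members.items():
        size = len(by_npi)
        n_primary_care = sum(by_npi.values())
        # Integer comparison avoids floating-point issues exactly at 80%.
        features[tin] = {
            "size": size,
            "multispecialty": 100 * n_primary_care < 80 * size,
        }
    return features


# Hypothetical practices: T1 is exactly 80% primary care, T2 is about 67%.
records = [
    ("n1", "T1", True), ("n2", "T1", True), ("n3", "T1", True),
    ("n4", "T1", True), ("n5", "T1", False),
    ("n6", "T2", True), ("n7", "T2", True), ("n8", "T2", False),
    ("n6", "T2", True),  # duplicate record for the same NPI is counted once
]
features = practice_features(records)
```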
For patient characteristics, we constructed variables to measure the percentage of each physician’s patients who were clinically complex or socially vulnerable using the Medicare Carrier file. For clinical complexity, we identified the mean age and mean Hierarchical Condition Category risk score of a physician’s attributed patients. The Hierarchical Condition Category risk score was constructed using 2018 Medicare claims data. To measure physicians’ care for socially vulnerable enrollees, we constructed the proportion of a physician’s attributed patients who were dually eligible for Medicare and Medicaid (a proxy for low-income) for at least 6 months in the year. We identified the proportion of Black, Hispanic, and White enrollees attributed to each physician based on the Research Triangle Institute (RTI) race variable in Medicare claims. The RTI variable incorporates self-reported race and ethnicity from the Medicare enrollment database and an RTI-developed imputation algorithm to improve the classification of Hispanic patients. These racial and ethnic categories were selected because they have the highest validity in Medicare claims data.24 All continuous characteristics were winsorized at the top and bottom 1%. Primary care physicians and patients with missing characteristics were excluded from our analysis.
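Winsorizing at the top and bottom 1% replaces values beyond the 1st and 99th percentiles with those percentile values. The following Python sketch shows one way to do this; the linear-interpolation percentile definition is an assumption, since the text does not say which definition the analysis software used.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of an ascending list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values to the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical practice sizes with one extreme value.
sizes = list(range(100)) + [10_000]
winsorized = winsorize(sizes)
```

The extreme value is pulled in to the 99th percentile rather than dropped, so the physician stays in the sample.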
Analysis
We examined the relationship between physicians’ MIPS scores and their performance on process and outcome measures. MIPS scores were categorized into low (≤30), medium (>30-75), and high (>75) performance, which aligns with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus). We used categorical rather than continuous MIPS scores because of the highly skewed distribution. Using t tests to identify statistically significant differences, we compared the mean physician performance on process and outcome measures across MIPS score categories. The sample varied across analyses for process or outcome measures because some physicians had no relevant attributed patients for a particular measure.
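The categorization and group comparison described above can be sketched in Python. The category cutoffs come from the text; the comparison function is a large-sample approximation with unequal variances and a normal reference distribution, which is an assumption of this sketch, since the text specifies only that t tests were used.

```python
from statistics import NormalDist, mean, variance


def mips_category(score):
    """Low (<=30), medium (>30-75), or high (>75), as defined in the text."""
    if score <= 30:
        return "low"
    if score <= 75:
        return "medium"
    return "high"


def mean_difference(a, b, level=0.95):
    """Difference in means (a minus b), large-sample CI, and 2-sided P value."""
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, (diff - z * se, diff + z * se), p_value


# Hypothetical screening rates (%) for physicians in two score categories.
low_group = [50, 60, 70]
high_group = [60, 70, 80]
diff, (ci_low, ci_high), p_value = mean_difference(low_group, high_group)
```

With samples as large as those in this study, the normal approximation and the t distribution give nearly identical intervals; for small groups a t reference distribution would be the safer choice.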
MIPS scores, which are based primarily on process measures such as those included in this analysis, are not adjusted for patient clinical or social complexity; therefore, we used unadjusted process measures to examine physicians’ performance on process measures. However, to examine physician performance on outcome measures, we adjusted for the clinical complexity of attributed patients. To construct the risk-adjusted outcome, we first calculated a physician’s observed-to-expected event rate using a patient-level logistic regression adjusting for enrollee age, sex, and Hierarchical Condition Category risk score, as well as physicians’ hospital referral region. Enrollee age and Hierarchical Condition Category risk score were included as categorical variables to allow a nonlinear relationship with the outcome. Hospital referral region controls enabled outcomes to differ across regions. The event rate was scaled by the base rate in the sample and can be interpreted as the rate that would be expected had the physician treated the average case-mix of patients. To account for variation in a physician’s panel size, an empirical Bayes shrinkage estimator was applied to the adjusted outcomes, following an approach used in previous literature (additional details available in the eMethods in Supplement 1).25,26
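The adjustment steps can be illustrated with a short Python sketch. It assumes that per-patient predicted probabilities from the logistic regression are already available, that the expected count is the sum of those probabilities, and that shrinkage takes the generic reliability-weighted form shown below. The exact estimator used in the study is described in the eMethods and may differ; all names and numbers here are hypothetical.

```python
def adjusted_rate_per_1000(events, predicted_probs, base_rate_per_1000):
    """Observed-to-expected ratio scaled by the sample base rate.

    `events` holds 0/1 outcomes for a physician's patients and
    `predicted_probs` the matching model-predicted event probabilities.
    """
    observed = sum(events)
    expected = sum(predicted_probs)  # assumption: expected = sum of predictions
    return observed / expected * base_rate_per_1000


def shrink(estimate, sampling_var, grand_mean, between_var):
    """Generic empirical Bayes shrinkage toward the grand mean.

    Noisier estimates (large sampling variance, typically small panels)
    are pulled more strongly toward the overall mean.
    """
    reliability = between_var / (between_var + sampling_var)
    return grand_mean + reliability * (estimate - grand_mean)


# Hypothetical physician with 10 patients: 4 events observed, 2.0 expected.
events = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted = [0.4, 0.1, 0.2, 0.3, 0.1, 0.2, 0.1, 0.3, 0.1, 0.2]
raw = adjusted_rate_per_1000(events, predicted, base_rate_per_1000=250.0)
shrunk = shrink(raw, sampling_var=30000.0, grand_mean=250.0, between_var=10000.0)
```

In this example the raw adjusted rate is about twice the base rate, but because the sampling variance is large relative to the between-physician variance, the shrunken estimate moves most of the way back toward the grand mean.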
We calculated the frequency at which physicians’ MIPS scores were discordant with their performance on patient outcomes. We created a composite outcome score for each physician by combining the unscaled risk-adjusted rates (observed-to-expected rates), weighting each adjusted outcome by the inverse of its standard deviation so that outcome measures with large variability did not drive the composite score. As a sensitivity analysis, we created a composite score weighting each outcome equally. We classified physicians into quintiles based on the composite score. We then compared the proportion of physicians in each quintile by low, medium, and high MIPS score status.
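A minimal Python sketch of the composite construction follows. It assumes every physician has an observed-to-expected value for every measure, uses the population standard deviation, and assigns quintiles by rank; these choices, along with the function names and example values, are assumptions for illustration rather than the study’s exact procedure.

```python
from statistics import pstdev


def composite_scores(oe_by_measure):
    """Inverse-SD weighted average of observed-to-expected ratios per physician.

    `oe_by_measure` maps each measure name to a list of ratios, one per
    physician, with physicians in the same order in every list.
    """
    weights = {m: 1.0 / pstdev(values) for m, values in oe_by_measure.items()}
    total_weight = sum(weights.values())
    n_physicians = len(next(iter(oe_by_measure.values())))
    return [
        sum(weights[m] * oe_by_measure[m][i] for m in oe_by_measure) / total_weight
        for i in range(n_physicians)
    ]


def quintiles(scores):
    """Rank-based quintiles: 1 for the lowest scores through 5 for the highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(scores) + 1
    return result


# Hypothetical ratios for 5 physicians on 2 of the 6 measures.
oe = {
    "ed_visits": [0.8, 1.0, 1.2, 0.9, 1.1],
    "hospitalizations": [0.5, 1.0, 1.5, 1.2, 0.8],
}
composite = composite_scores(oe)
groups = quintiles(composite)
```

Because lower observed-to-expected ratios mean fewer events than expected, quintile 1 in this sketch holds the physicians with the best outcomes; which end is labeled the “top” quintile is a presentation choice.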
We identified the characteristics of physicians for whom MIPS scores were discordant (in either direction) with their performance on patient outcome measures. We compared mean physician, practice, and patient panel characteristics for physicians with discordance vs concordance using t tests to identify statistically significant differences. Comparisons were separated by physicians with low vs high MIPS scores because the differences between discordant and concordant physicians could vary for each group (ie, characteristics for physicians with low MIPS scores and superior outcomes were potentially different from the characteristics of physicians with high MIPS scores and poor outcomes). Because of the potential for type I error due to multiple comparisons, findings should be interpreted as exploratory.
We performed 2 additional sensitivity analyses. First, we repeated all analyses with a restricted sample of primary care physicians who had at least 1 attributed patient for every outcome measure and most process measures to compare a consistent set of physicians across all quality measures. We did not require physicians to have the tobacco screening measure for this analysis because a relatively small proportion of physicians have valid data for this measure owing to infrequent use of necessary modifier codes. Second, we used the 2022 performance year thresholds (physicians with scores <75 may receive penalties, while those with scores >89 are eligible for an “exceptional performance” bonus) instead of the 2019 performance year thresholds to test the sensitivity of our results to the various program threshold values.
Analyses were conducted using Stata version 16 (StataCorp). P < .05 (2-sided) was considered statistically significant.
Results
The study population included 80 246 primary care physicians with 3.4 million attributed patients (eTables 2 and 3 in Supplement 1). Of the 97 572 primary care physicians who received a MIPS score in 2019, 82% (80 366) had attributed patients. Of the physicians with attributed patients, only 0.1% of physicians (120) and 0.15% of patients (2816) had missing characteristics and were excluded from analyses.
As shown in Table 1, 4773 physicians (5.9%) received a low MIPS score (mean score, 29.9), 6151 (7.7%) received a medium MIPS score (mean score, 58.7), and 69 322 (86.4%) received a high MIPS score (mean score, 92.8) (additional details on the distribution of physicians’ MIPS scores available in eTable 4 in Supplement 1). Compared with physicians with high MIPS scores, physicians with low MIPS scores had visits with significantly more unique enrollees (443 vs 301 unique enrollees in 2019; difference, 142 enrollees [95% CI, 136 to 150]) and were significantly more likely to work in smaller practices (mean practice size, 11 vs 311 physicians; difference, −300 physicians [95% CI, −314 to −286]). They were significantly less likely to practice in multispecialty groups (15.1% vs 71.9%; difference, −56.8 percentage points [95% CI, −57.9 to −55.7]) or to be affiliated with a health system (2.5% vs 57.0%; difference, −54.5 percentage points [95% CI, −55.1 to −53.9]).
Table 1. Physician, Practice, and Patient Characteristics by Physician MIPS Score.
| Characteristic | Low MIPS score (≤30)<sup>a</sup> | Medium MIPS score (>30-75)<sup>a</sup> | High MIPS score (>75)<sup>a</sup> | Low vs medium, mean difference (95% CI)<sup>b</sup> | Medium vs high, mean difference (95% CI)<sup>b</sup> | Low vs high, mean difference (95% CI)<sup>b</sup> |
|---|---|---|---|---|---|---|
| Physician characteristics | | | | | | |
| No. of physicians | 4773 | 6151 | 69 322 | | | |
| MIPS score, mean (SD) | 29.9 (1.2) | 58.7 (13.6) | 92.8 (5.5) | −28.8 (−29.2 to −28.4) | −34.1 (−34.3 to −34.0) | −62.9 (−63.1 to −62.8) |
| MIPS score, median (IQR) | 30.0 (30.0-30.0) | 63.1 (48.6-71.0) | 94.3 (89.9-96.7) | | | |
| No. of enrollees<sup>c</sup>, mean (SD) | 443.4 (240.1) | 335.5 (262.1) | 300.7 (235.8) | 107.9 (98.4 to 117.5) | 34.8 (28.6 to 41.0) | 142.7 (135.8 to 149.6) |
| No. of enrollees<sup>c</sup>, median (IQR) | 383.0 (278.0-546.0) | 273.0 (142.0-461.0) | 243.0 (133.0-401.0) | | | |
| Rurality, %<sup>d</sup> | 15.9 | 15.8 | 14.7 | 0.0 (−1.3 to 1.4) | 1.1 (0.1 to 2.0) | 1.1 (0.1 to 2.2) |
| Census region, % | | | | | | |
| Northeast | 17.7 | 15.5 | 22.1 | 2.1 (0.7 to 3.5) | −6.6 (−7.5 to −5.6) | −4.4 (−5.6 to −3.3) |
| Midwest | 12.3 | 13.7 | 24.5 | −1.4 (−2.6 to −0.1) | −10.8 (−11.7 to −9.9) | −12.2 (−13.2 to −11.2) |
| South | 43.2 | 44.9 | 35.2 | −1.7 (−3.6 to 0.2) | 9.7 (8.4 to 11.0) | 8.0 (6.5 to 9.4) |
| West | 26.9 | 25.9 | 18.2 | 0.9 (−0.7 to 2.6) | 7.8 (6.6 to 8.9) | 8.7 (7.4 to 10.0) |
| Practice characteristics | | | | | | |
| Group size (No. of physicians)<sup>e</sup>, mean (SD) | 11.4 (108.9) | 84.5 (213.2) | 311.3 (498.4) | −73.1 (−79.7 to −66.5) | −226.8 (−239.4 to −214.3) | −299.9 (−314.1 to −285.8) |
| Group size (No. of physicians)<sup>e</sup>, median (IQR) | 1.0 (1.0-2.0) | 8.0 (2.0-74.0) | 99.0 (9.0-381.0) | | | |
| Multispecialty, %<sup>f</sup> | 15.1 | 49.0 | 71.9 | −34.0 (−35.6 to −32.4) | −22.9 (−24.1 to −21.6) | −56.8 (−57.9 to −55.7) |
| System affiliation, %<sup>g</sup> | 2.5 | 19.5 | 57.0 | −16.9 (−18.0 to −15.9) | −37.5 (−38.6 to −36.5) | −54.5 (−55.1 to −53.9) |
| Patient panel characteristics<sup>h</sup> | | | | | | |
| Age, mean (SD), y | 75.9 (5.2) | 74.6 (5.9) | 73.9 (5.9) | 1.4 (1.2 to 1.6) | 0.6 (0.5 to 0.8) | 2.0 (1.8 to 2.2) |
| Sex, mean (SD), % | | | | | | |
| Women | 56.8 (14.9) | 58.2 (19.7) | 58.7 (20.7) | −1.4 (−2.0 to −0.7) | −0.5 (−1.1 to 0.0) | −1.9 (−2.5 to −1.3) |
| Men | 43.2 (14.9) | 41.8 (19.7) | 41.3 (20.7) | 1.4 (0.7 to 2.0) | 0.5 (−0.0 to 1.1) | 1.9 (1.3 to 2.5) |
| Hierarchical Condition Category risk score, mean (SD)<sup>i</sup> | 1.75 (0.73) | 1.66 (0.76) | 1.48 (0.60) | 0.08 (0.06 to 0.11) | 0.18 (0.16 to 0.19) | 0.26 (0.24 to 0.28) |
| Dual beneficiaries, mean (SD), %<sup>j</sup> | 24.9 (26.5) | 23.9 (25.7) | 19.9 (22.9) | 1.0 (0.0 to 2.0) | 4.0 (3.4 to 4.6) | 5.1 (4.4 to 5.7) |
| Beneficiary race and ethnicity, mean (SD), %<sup>k</sup> | | | | | | |
| Black | 9.7 (17.2) | 10.2 (18.0) | 10.2 (19.2) | −0.4 (−1.1 to 0.3) | −0.1 (−0.6 to 0.4) | −0.5 (−1.0 to 0.1) |
| Hispanic | 7.2 (14.9) | 7.8 (16.0) | 5.4 (13.0) | −0.6 (−1.2 to −0.0) | 2.4 (2.0 to 2.7) | 1.8 (1.4 to 2.1) |
| White | 75.5 (27.1) | 74.9 (28.2) | 78.6 (25.9) | 0.7 (−0.4 to 1.7) | −3.7 (−4.4 to −3.0) | −3.0 (−3.8 to −2.3) |
Abbreviation: MIPS, Medicare Merit-based Incentive Payment System.
<sup>a</sup> MIPS score categories align with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus).
<sup>b</sup> Comparisons reflect unadjusted-mean comparison (t) test or 2-group test of proportions between indicated subgroups.
<sup>c</sup> Indicates the number of unique enrollees a physician treated in 2019 as indicated in the Medicare Data on Provider Practice and Specialty file. This is a measure distinct from the number of enrollees attributed to a physician.
<sup>d</sup> Indicates whether a physician’s practice zip code was in a rural area, as defined by the Federal Office of Rural Health Policy.
<sup>e</sup> Defined as number of unique physician National Provider Identifiers billing under the same primary tax identification number, including both primary care and non–primary care physicians.
<sup>f</sup> Defined as practices in which less than 80% of National Provider Identifiers were primary care physicians.
<sup>g</sup> Defined as a physician in a practice associated with a health system based on the 2018 Agency for Healthcare Research and Quality Compendium of US Health Systems Group Practice Linkage File.23
<sup>h</sup> Patient characteristics for patients attributed to physicians. Patients were attributed to physicians based on the plurality of their evaluation and management visits in 2019, following logic developed by the Centers for Medicare & Medicaid Services for the Value-Based Modifier program.15
<sup>i</sup> Scores were derived from the Centers for Medicare & Medicaid Services Hierarchical Condition Category risk-adjustment model. A score of 1 represents an individual with average clinical risk, with higher scores representing patients at higher risk.
<sup>j</sup> Defined as enrollee having at least 6 months of Medicare and Medicaid eligibility in the year. Dual eligibility is a proxy for low income.
<sup>k</sup> Derived from the Research Triangle Institute (RTI) race variable in Medicare claims. The RTI variable incorporates self-reported race and ethnicity from the Medicare enrollment database and an RTI-developed imputation algorithm to improve the classification of Hispanic enrollees.
Compared with physicians with high MIPS scores, physicians with low MIPS scores cared for patients who were significantly more likely to be dual-eligible (24.9% vs 19.9%; difference, 5.1 percentage points [95% CI, 4.4 to 5.7]), Hispanic (7.2% vs 5.4%; difference, 1.8 percentage points [95% CI, 1.4 to 2.1]), and medically complex (mean Hierarchical Condition Category risk score, 1.75 vs 1.48; difference, 0.26 [95% CI, 0.24 to 0.28]).
Association Between MIPS Scores and Physician Performance on Process and Outcome Measures
Physician performance on diabetic and breast cancer screening process measures varied widely, with interquartile ranges greater than 15 percentage points for physicians across all MIPS score categories (Figure 1). The distribution was compressed for influenza immunization and tobacco screening, with at least half of physicians in all MIPS score categories achieving rates of 100%. Compared with physicians with high MIPS scores, physicians with low MIPS scores had significantly lower rates of diabetic eye examinations (56.0% vs 63.2%; difference, −7.1 percentage points [95% CI, −8.0 to −6.2]; P < .001), diabetic HbA1c screening (84.6% vs 89.4%; difference, −4.8 percentage points [95% CI, −5.4 to −4.2]; P < .001), and mammography screening (58.2% vs 70.4%; difference, −12.2 percentage points [95% CI, −13.1 to −11.4]; P < .001) but significantly better rates of influenza vaccinations (78.0% vs 76.8%; difference, 1.2 percentage points [95% CI, 0.0 to 2.5]; P = .045) and tobacco screening (95.0% vs 94.1%; difference, 0.9 percentage points [95% CI, 0.3 to 1.5]; P = .001) (eTable 5 in Supplement 1).
Figure 1. Association Between Process Measures and MIPS Scores.

Plots present the unadjusted primary care physician process measure performance by Medicare Merit-based Incentive Payment System (MIPS) categories. Lower and upper boundaries of boxes indicate 25th and 75th percentiles; whiskers, 10th and 90th percentiles; horizontal lines within the boxes, 50th percentile (median); and diamonds, mean. MIPS categories were defined as low (≤30), medium (>30-75), and high (>75), which aligns with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus). Ns listed below each MIPS category indicate the number of physicians included in the plot. The 90th, 75th, and 50th percentiles were 100% for influenza immunization and the 90th through 25th percentiles were 100% for tobacco screening, suggesting that these measures were “topped out.”
Physician performance on the adjusted clinical outcome measures varied widely for emergency department visits and hospitalizations, but performance differences were smaller for the ambulatory care–sensitive admissions, particularly for hypertension (Figure 2). Unadjusted clinical outcomes were qualitatively similar (eFigure 1 in Supplement 1). Compared with patients of physicians with high MIPS scores, patients of physicians with low MIPS scores had significantly higher rates of hospitalization (255.4 vs 225.2 per 1000; difference, 30.2 [95% CI, 24.8 to 35.7]; P < .001) but had significantly lower rates of emergency department visits (307.6 vs 316.4 per 1000; difference, −8.9 [95% CI, −13.7 to −4.1]; P < .001) (eTable 6 in Supplement 1). There were no significant differences in ambulatory care–sensitive admissions for hypertension, diabetes, chronic obstructive pulmonary disease, or congestive heart failure. Results were similar in analyses limiting the sample to physicians with all process and outcome measures (eTables 7 and 8 in Supplement 1) and applying the 2022 rather than 2019 MIPS scoring thresholds (eTables 9 and 10 in Supplement 1).
Figure 2. Association Between Adjusted Clinical Outcome Measures and MIPS Scores.

Plots present the adjusted physician clinical outcome measure performance by Medicare Merit-based Incentive Payment System (MIPS) categories. Lower and upper boundaries of boxes indicate 25th and 75th percentiles; whiskers, 10th and 90th percentiles; horizontal lines within the boxes, 50th percentile (median); and diamonds, mean. For each physician, outcome measure performance was defined as the number of patients with an ambulatory care–sensitive admission, an emergency department visit not resulting in an inpatient admission, or an all-cause inpatient admission per 1000 attributed and eligible patients in the year. Adjusted clinical outcome represents the expected rate for a physician had the physician treated the average case-mix of patients. Adjustment was based on patient-level logistic regression adjusting for enrollee age, sex, Hierarchical Condition Category risk score, and hospital referral region. An empirical Bayes shrinkage estimator was applied to account for the variation in a physician’s panel size (see eMethods in Supplement 1). MIPS categories were defined as low (≤30), medium (>30-75), and high (>75), which aligns with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus). Details on outcome definitions are available in eTable 1 in Supplement 1. Ns listed below each MIPS category indicate the number of physicians included in the plot.
Discordance Between MIPS Scores and Physician Quality
Many primary care physicians experienced discordance between their MIPS scores and their risk-adjusted patient outcomes (Figure 3). Among the 4773 physicians with low MIPS scores in 2019, 19.0% had a composite clinical outcome score in the top quintile of all primary care physicians participating in the MIPS program (compared with 19.8% of physicians with high MIPS scores). Among the 69 322 physicians with high MIPS scores, 20.5% had a composite clinical outcome score in the bottom quintile (compared with 15.1% of physicians with low MIPS scores).
Figure 3. Distribution of Adjusted Clinical Outcome Quintiles by MIPS Score Categories.
Medicare Merit-based Incentive Payment System (MIPS) score categories align with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus). Composite outcome defined as the weighted average of physicians’ observed-to-expected outcomes across 6 measures (4 ambulatory care–sensitive measures, emergency department visits not resulting in an inpatient admission, and number of all-cause inpatient admissions) for which weights were the inverse standard deviation of the measure. Observed-to-expected outcomes constructed using patient-level logistic regression adjusted for patient age, sex, Hierarchical Condition Category quintile, and hospital referral region (details available in the eMethods in Supplement 1).
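The composite described in this legend can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation (which is detailed in the eMethods): it assumes each physician has an observed-to-expected (O/E) ratio on every measure, uses the sample standard deviation for the weights, and treats a lower composite O/E as better performance. The function names are hypothetical.

```python
from statistics import stdev


def composite_outcome(oe_by_measure):
    """Inverse-SD weighted average of O/E ratios across measures.

    oe_by_measure maps a measure name to a list of O/E ratios, one per
    physician, in the same physician order for every measure.
    Requires at least 2 physicians and nonzero variation in each measure.
    """
    # Weight each measure by 1 / SD so more variable measures do not dominate.
    weights = {m: 1.0 / stdev(values) for m, values in oe_by_measure.items()}
    total_weight = sum(weights.values())
    n_physicians = len(next(iter(oe_by_measure.values())))
    return [
        sum(weights[m] * oe_by_measure[m][i] for m in oe_by_measure) / total_weight
        for i in range(n_physicians)
    ]


def performance_quintiles(composite):
    """Assign quintiles 1-5 by rank; 1 holds the lowest (assumed best) O/E."""
    order = sorted(range(len(composite)), key=lambda i: composite[i])
    quintile = [0] * len(composite)
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // len(composite) + 1
    return quintile
```

Under these assumptions, a physician with a low MIPS score who lands in quintile 1 would be counted as discordant, as would a physician with a high MIPS score who lands in quintile 5. Ties are broken by input order here, which is a simplification.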
Compared with physicians with low MIPS scores who had concordance with their adjusted clinical outcomes, physicians with low MIPS scores who had discordance (top quintile of the clinical outcome composite measure) were significantly more likely to work in larger practices (mean practice size, 24 vs 8 physicians; difference, 16 physicians [95% CI, 8 to 24]) and significantly more likely to practice in multispecialty groups (26.1% vs 12.5%; difference, 13.6 percentage points [95% CI, 10.5 to 16.6]) (Table 2). They were also significantly more likely to care for dual-eligible patients (30.1% vs 23.7%; difference, 6.3 percentage points [95% CI, 4.4 to 8.2]), Black patients (11.2% vs 9.4%; difference, 1.8 percentage points [95% CI, 0.5 to 3.0]), Hispanic patients (8.5% vs 6.9%; difference, 1.6 percentage points [95% CI, 0.5 to 2.7]), and medically complex patients (Hierarchical Condition Category risk score, 2.14 vs 1.65; difference, 0.49 [95% CI, 0.44 to 0.54]). All other differences in characteristics were not statistically significant or were small in magnitude.
Table 2. Patient, Practice, and Physician Characteristics by MIPS Score–Clinical Outcome Discordance [a].
| Characteristic | Low-MIPS–scoring physicians, mean (SD): discordant (high clinical outcomes) | Low-MIPS–scoring physicians, mean (SD): concordant (low or medium clinical outcomes) | Low-MIPS–scoring physicians: difference (95% CI) [b] | High-MIPS–scoring physicians, mean (SD): discordant (low clinical outcomes) | High-MIPS–scoring physicians, mean (SD): concordant (medium or high clinical outcomes) | High-MIPS–scoring physicians: difference (95% CI) [b] |
|---|---|---|---|---|---|---|
| Physician characteristics | | | | | | |
| No. of physicians | 905 | 3868 | | 14 201 | 55 121 | |
| No. of enrollees [c] | 464.6 (265.1) | 438.4 (233.6) | 26.2 (8.8 to 43.5) | 234.5 (189.0) | 317.8 (243.5) | −83.2 (−87.5 to −78.9) |
| Rurality, % [d] | 16.6 | 15.7 | 0.9 (−1.8 to 3.6) | 10.5 | 15.8 | −5.3 (−5.9 to −4.8) |
| Practice characteristics | | | | | | |
| Group size (No. of physicians) [e] | 24.4 (171.2) | 8.4 (88.0) | 16.0 (8.1 to 23.9) | 356.0 (542.0) | 299.8 (485.9) | 56.1 (47.0 to 65.3) |
| Multispecialty, % [f] | 26.1 | 12.5 | 13.6 (10.5 to 16.6) | 72.1 | 71.9 | 0.3 (−0.6 to 1.1) |
| System affiliation, % [g] | 5.1 | 1.9 | 3.1 (1.6 to 4.6) | 57.0 | 57.0 | 0.0 (−0.9 to 0.9) |
| Patient panel characteristics [h] | | | | | | |
| Age, mean (SD), y | 76.3 (6.7) | 75.8 (4.7) | 0.5 (0.1 to 0.9) | 73.7 (5.9) | 74.0 (5.9) | −0.3 (−0.4 to −0.2) |
| Sex, mean (SD), % | | | | | | |
| Women | 56.5 (19.6) | 56.9 (13.6) | −0.4 (−1.5 to 0.6) | 59.6 (23.2) | 58.5 (20.0) | 1.1 (0.7 to 1.5) |
| Men | 43.5 (19.6) | 43.1 (13.6) | 0.4 (−0.6 to 1.5) | 40.4 (23.2) | 41.5 (20.0) | −1.1 (−1.5 to −0.7) |
| Hierarchical Condition Category risk score, mean (SD) [i] | 2.14 (0.87) | 1.65 (0.66) | 0.49 (0.44 to 0.54) | 1.38 (0.55) | 1.51 (0.61) | −0.13 (−0.15 to −0.12) |
| Dual beneficiaries, mean (SD), % [j] | 30.1 (26.4) | 23.7 (26.4) | 6.3 (4.4 to 8.2) | 18.5 (24.4) | 20.2 (22.5) | −1.7 (−2.1 to −1.3) |
| Beneficiary race and ethnicity, mean (SD), % [k] | | | | | | |
| Black | 11.2 (18.5) | 9.4 (16.9) | 1.8 (0.5 to 3.0) | 9.3 (19.0) | 10.5 (19.3) | −1.1 (−1.5 to −0.8) |
| Hispanic | 8.5 (16.2) | 6.9 (14.5) | 1.6 (0.5 to 2.7) | 5.7 (14.0) | 5.4 (12.7) | 0.4 (0.1 to 0.6) |
| White | 74.0 (27.3) | 75.9 (27.0) | −1.9 (−3.8 to 0.1) | 77.2 (28.0) | 79.0 (25.3) | −1.8 (−2.3 to −1.3) |
Abbreviation: MIPS, Medicare Merit-based Incentive Payment System.
[a] Discordance defined as low-MIPS–scoring physicians with a high composite clinical outcome (top quintile) and high-MIPS–scoring physicians with low composite clinical outcomes (bottom quintile). MIPS categories were defined as low (≤30), medium (>30-75), and high (>75), which aligns with MIPS program thresholds for payment adjustments (physicians with scores <30 received penalties and those with a score of 30 received no adjustment, while those with scores >30-75 received positive adjustments and those with scores >75 were eligible for an “exceptional performance” bonus). Composite outcome defined as the weighted average of physicians’ observed-to-expected outcomes across 6 measures (4 ambulatory care–sensitive measures, emergency department visits not resulting in an inpatient admission, and number of all-cause inpatient admissions), where weights are the inverse standard deviation. Observed-to-expected outcomes constructed using patient-level logistic regression adjusted for patient age, sex, Hierarchical Condition Category quintile, and hospital referral region (see Supplement 1 for details).
[b] Comparisons reflect unadjusted mean comparison (t) test or 2-group test of proportions between indicated subgroups.
[c] Indicates the number of unique enrollees a physician treated in 2019 as indicated in the Medicare Data on Provider Practice and Specialty file. This is a measure distinct from the number of enrollees attributed to a physician.
[d] Indicates whether a physician’s practice zip code was in a rural area, as defined by the Federal Office of Rural Health Policy.
[e] Defined as number of unique physician National Provider Identifiers billing under the same primary tax identification number, including both primary care and non–primary care physicians.
[f] Defined as practices in which less than 80% of National Provider Identifiers were primary care physicians.
[g] Defined as a physician in a practice associated with a health system based on the 2018 Agency for Healthcare Research and Quality Compendium of US Health Systems Group Practice Linkage File.23
[h] Patient characteristics for patients attributed to physicians. Patients were attributed to physicians based on the plurality of their evaluation and management visits in 2019, following logic developed by the Centers for Medicare & Medicaid Services for the Value-Based Modifier program.15
[i] Derived from the Centers for Medicare & Medicaid Services Hierarchical Condition Category risk-adjustment model. A score of 1 represents an individual with average clinical risk, with higher scores representing patients at higher risk.
[j] Defined as enrollee having at least 6 months of Medicare and Medicaid eligibility in the year. Dual eligibility is a proxy for low income.
[k] Derived from the Research Triangle Institute (RTI) race variable in Medicare claims. The RTI variable incorporates self-reported race and ethnicity from the Medicare enrollment database and an RTI-developed imputation algorithm to improve the classification of Hispanic enrollees.
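The percentage-point differences in Table 2 come from a 2-group test of proportions. As a rough illustration, the sketch below computes a Wald (normal approximation) confidence interval for a difference in proportions from the rounded values in the table. The choice of the Wald interval is an assumption, since the table notes do not state which interval the authors used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_ci(p1, n1, p2, n2, level=0.95):
    """Wald CI for p1 - p2, with proportions given as fractions in [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Multispecialty practice among low-MIPS physicians:
# 26.1% of 905 discordant vs 12.5% of 3868 concordant physicians (Table 2).
diff, lower, upper = proportion_difference_ci(0.261, 905, 0.125, 3868)
print(f"{100 * diff:.1f} ({100 * lower:.1f} to {100 * upper:.1f}) percentage points")
```

Because the inputs are rounded to 1 decimal place, the result can differ from the published interval (13.6 [10.5 to 16.6]) by about 0.1 percentage point.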
Compared with physicians with high MIPS scores who had concordance in their adjusted clinical outcomes, physicians with high MIPS scores who had discordance (bottom quintile of the clinical outcome composite measure) cared for significantly fewer unique enrollees (235 vs 318 unique enrollees in 2019; difference, −83 enrollees [95% CI, −88 to −79]), practiced in significantly larger groups (mean practice size, 356 vs 300 physicians; difference, 56 physicians [95% CI, 47 to 65]), and were significantly less likely to practice in a rural area (10.5% vs 15.8%; difference, −5.3 percentage points [95% CI, −5.9 to −4.8]) (Table 2). They were also less likely to care for dual-eligible patients (18.5% vs 20.2%; difference, −1.7 percentage points [95% CI, −2.1 to −1.3]), Black patients (9.3% vs 10.5%; difference, −1.1 percentage points [95% CI, −1.5 to −0.8]), and medically complex patients (mean Hierarchical Condition Category risk score, 1.38 vs 1.51; difference, −0.13 [95% CI, −0.15 to −0.12]). All other differences in characteristics were not statistically significant or were small in magnitude.
Results were similar in sensitivity analyses limiting the sample to physicians with all process and outcome measures (eFigure 2 and eTable 11 in Supplement 1), applying the 2022 rather than 2019 MIPS scoring thresholds (eFigure 3 and eTable 12 in Supplement 1), and using a composite measure weighting all outcomes equally (eFigure 4 and eTable 13 in Supplement 1).
Discussion
In this cross-sectional analysis of primary care physicians participating in the Medicare MIPS program in 2019, physicians’ MIPS scores were inconsistently associated with their performance on process and outcome measures. For some measures, physicians with the lowest MIPS scores—those subject to financial penalties under the program—had performance superior to that of physicians with the highest MIPS scores, who were eligible for “exceptional performance” bonuses.
The level of discordance between physician MIPS scores and performance on patient outcomes suggests that the MIPS program is approximately as effective as chance at identifying high vs low performance: the proportion of physicians with low MIPS scores who were in the top quintile of performance (19.0%) was similar to the proportion of physicians with high MIPS scores who were in the bottom quintile (20.5%). Among physicians receiving low MIPS scores, those who achieved superior patient outcomes cared for more medically complex and socially vulnerable patients; conversely, among physicians receiving high MIPS scores, those who achieved poor patient outcomes cared for fewer medically complex and socially vulnerable patients. These findings suggest that physicians caring for vulnerable populations are more likely to be penalized by MIPS scoring, even when delivering relatively high-quality care.
To our knowledge, this is the first study to examine the relationship between primary care physicians’ MIPS scores and the quality of care they provide across a range of process and outcome measures. The findings support concerns that program performance may not accurately capture the quality of care that physicians provide but may instead reflect the characteristics of the patients they care for and the ability of their organizations to report data. Primary care physicians with low MIPS scores tended to see more clinically and socially complex patients than physicians with high MIPS scores; they were also more likely to work in small and independent practices. Importantly, the study did not find evidence that physicians with low MIPS scores provided consistently worse care.
Given that the quality component of MIPS performance scores is composed largely of process measures, it might be expected that physicians with high scores perform substantially better on these measures; however, physicians with low MIPS scores had similar or better performance on 2 of 5 process measures. Although few health outcomes are included in the MIPS program, better outcomes are the ultimate goal of value-based payment programs and are important to patients and physicians.27,28,29 However, in this study there were no significant differences in performance between physicians with high vs low MIPS scores on 4 of 6 risk-adjusted outcome measures, and patients of physicians with low MIPS scores had lower rates of risk-adjusted emergency department visits. These findings are consistent with other work finding that higher physician MIPS scores in 2017 were not associated with better surgical outcomes.12
There are several possible reasons for the inconsistent relationship between MIPS scores and clinical performance. First, physicians can select the measures they report to the MIPS program from a broad set of quality measures (including measures outside of their specialty), making meaningful comparisons across physicians challenging. Beginning in 2023, clinicians will have the option to participate in the program through MIPS Value Pathways—specialty-focused subsets of measures and activities that can be used to meet program requirements—which may partially address this concern. Second, research has found that many program measures are invalid or of uncertain validity and therefore may not be linked to better outcomes.2 Third, quality is only 1 of 4 domains assessed by the MIPS program and comprises 45% of the overall score in 2019; the other domains (costs, improvement activities, and promoting interoperability) may or may not be associated with better outcomes. Fourth, high MIPS scores may reflect the ability of physicians to collect, analyze, and report data—not the delivery of better medical care. This assertion is supported by the finding that physicians with low MIPS scores were more likely to work in small and independent practices and often had clinical outcomes similar to those of physicians in large, system-affiliated practices with high MIPS scores.
Many observers have called for reforms to the MIPS program, especially given concerns that the program creates substantial administrative burden. The CMS estimates that it cost clinicians more than $1.3 billion to comply with the MIPS program in 2017 and another $700 million in 2018.8 A recent study found that in 2019, practices spent more than $12 000 per physician to participate in the program and that clinicians and administrators together spent more than 200 hours per physician on MIPS-related activities.3 The MIPS program, as currently structured, also has the potential to exacerbate health inequities by transferring resources from physicians caring for less affluent patients to those caring for more affluent patients.7 Furthermore, financial penalties for poor program performance are scheduled to increase in the coming years; this may impose an undue financial burden on safety-net organizations, given the finding that physicians with low MIPS scores, but strong clinical performance, were more likely to care for socially vulnerable patients.
Limitations
This study has several limitations. First, measures were constructed using Medicare claims data and therefore did not capture other clinical outcomes that may be important to patients and clinicians. Second, this study was cross-sectional and focused on a single year; the relationship between MIPS scores and measured quality may change over time. Third, although this study controlled for various patient and physician characteristics in measuring performance on clinical outcome measures, residual confounding may remain. Fourth, because there is no generally accepted method for assigning different weights to various outcome measures, the composite clinical outcome measures were derived from measures weighted by their inverse standard deviation to prevent more heterogeneous measures from driving results; however, these outcomes may be of differing importance to patients and clinicians. Fifth, although these measures have been used by the CMS and in other studies evaluating value-based payment programs, they may not capture other important forms of quality care.30,31
Conclusions
Among US primary care physicians in 2019, MIPS scores were inconsistently associated with performance on process and outcome measures. These findings suggest that the MIPS program may be ineffective in measuring and incentivizing quality improvement among US physicians.
eMethods. Calculation of Adjusted Clinical Outcome Score
eTable 1. Process and Outcome Measure Definitions
eTable 2. Construction of the Study Sample
eTable 3. Sample Characteristics
eTable 4. Distribution of MIPS Scores
eTable 5. Association Between Unadjusted Process Measures and MIPS Scores
eTable 6. Association between Adjusted Clinical Outcome Measures and MIPS Scores
eTable 7. Association between Unadjusted Process Measures and MIPS Scores, Consistent Set of Physicians
eTable 8. Association between Adjusted Outcome Measures and MIPS Scores, Consistent Set of Physicians
eTable 9. Association between Unadjusted Process Measures and MIPS Scores, Using 2022 MIPS Performance Thresholds
eTable 10. Association between Adjusted Outcome Measures and MIPS Scores, Using 2022 MIPS Performance Thresholds
eTable 11. Physician, Practice, and Patient Characteristics by MIPS Score-Clinical Outcome Discordance, Consistent Set of Physicians
eTable 12. Physician, Practice, and Patient Characteristics by MIPS Score-Clinical Outcome Discordance, Using 2022 MIPS Performance Thresholds
eTable 13. Physician, Practice, and Patient Characteristics by MIPS Score-Clinical Outcome Discordance, Equally Weighted Composite Score
eFigure 1. Association Between Unadjusted Clinical Outcome Measures and MIPS Scores
eFigure 2. Proportion of Physicians with Poor Relative to Superior Adjusted Clinical Composite Performance in Each MIPS Score Category, Consistent Set of Physicians
eFigure 3. Proportion of Physicians with Poor Relative to Superior Adjusted Clinical Composite Performance in Each MIPS Score Category, Using 2022 MIPS Performance Thresholds
eFigure 4. Proportion of Physicians with Poor Relative to Superior Adjusted Clinical Composite Performance in Each MIPS Score Category, Equally Weighted Composite Score
Data Sharing Statement
References
1. Quality Payment Program Participation in 2019: Results at a Glance. Centers for Medicare & Medicaid Services. Accessed September 20, 2022. https://qpp-cm-prod-content.s3.amazonaws.com/uploads/1190/QPP%202019%20Participation%20Results%20Infographic.pdf
2. MacLean CH, Kerr EA, Qaseem A. Time out—charting a path for improving performance measurement. N Engl J Med. 2018;378(19):1757-1761. doi: 10.1056/NEJMp1802595
3. Khullar D, Bond AM, O’Donnell EM, Qian Y, Gans DN, Casalino LP. Time and financial costs for physician practices to participate in the Medicare Merit-based Incentive Payment System: a qualitative study. JAMA Health Forum. 2021;2(5):e210527. doi: 10.1001/jamahealthforum.2021.0527
4. Khullar D, Bond AM, Qian Y, O’Donnell E, Gans DN, Casalino LP. Physician practice leaders’ perceptions of Medicare’s Merit-based Incentive Payment System (MIPS). J Gen Intern Med. 2021;36(12):3752-3758. doi: 10.1007/s11606-021-06758-w
5. Johnston KJ, Hockenberry JM, Wadhera RK, Joynt Maddox KE. Clinicians with high socially at-risk caseloads received reduced Merit-based Incentive Payment System scores. Health Aff (Millwood). 2020;39(9):1504-1512. doi: 10.1377/hlthaff.2020.00350
6. Johnston KJ, Wiemken TL, Hockenberry JM, Figueroa JF, Joynt Maddox KE. Association of clinician health system affiliation with outpatient performance ratings in the Medicare Merit-based Incentive Payment System. JAMA. 2020;324(10):984-992. doi: 10.1001/jama.2020.13136
7. Khullar D, Schpero WL, Bond AM, Qian Y, Casalino LP. Association between patient social risk and physician performance scores in the first year of the Merit-based Incentive Payment System. JAMA. 2020;324(10):975-983. doi: 10.1001/jama.2020.13129
8. Crosson FJ, Bloniarz K, Glass D, Mathews J. MedPAC’s urgent recommendation: eliminate MIPS, take a different direction. Health Affairs Forefront. Published March 16, 2018. Accessed July 19, 2020. http://www.healthaffairs.org/do/10.1377/forefront.20180309.302220/full/
9. MedPAC Report to Congress: Chapter 15: Moving Beyond the Merit-based Incentive Payment System. Medicare Payment Advisory Commission. Published March 2018. Accessed September 20, 2022. https://www.medpac.gov/wp-content/uploads/import_data/scrape_files/docs/default-source/reports/mar18_medpac_ch15_sec.pdf
10. Finnegan J. Why MedPAC wants to scrap MIPS. Fierce Healthcare. Published February 12, 2018. Accessed March 7, 2022. https://www.fiercehealthcare.com/practices/medpac-scrap-mips-mark-miller
11. Liao JM, Navathe AS. Medicare should transform MIPS, not scrap it. Health Affairs Forefront. Published March 2, 2021. Accessed March 7, 2022. https://www.healthaffairs.org/do/10.1377/forefront.20210226.949893/full/
12. Glance LG, Thirukumaran CP, Feng C, Lustik SJ, Dick AW. Association between the physician quality score in the Merit-based Incentive Payment System and hospital performance in Hospital Compare in the first year of the program. JAMA Netw Open. 2021;4(8):e2118449. doi: 10.1001/jamanetworkopen.2021.18449
13. Physician Compare Datasets. Centers for Medicare & Medicaid Services. Accessed December 11, 2020. https://data.medicare.gov/data/physician-compare
14. Medicare provider utilization and payment data: physician and other practitioners. Centers for Medicare & Medicaid Services. Accessed February 17, 2020. https://data.cms.gov/provider-summary-by-type-of-service/medicare-physician-other-practitioners
15. Two-step Attribution for Claims-based Quality Outcome Measures and Per Capita Cost Measures Included in the Value Modifier [fact sheet]. Centers for Medicare & Medicaid Services. Published August 2017. Accessed November 9, 2022. https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/PhysicianFeedbackProgram/Downloads/2016-03-25-Attribution-Fact-Sheet.pdf
16. MIPS 101 for the 2019 Performance Year. Centers for Medicare & Medicaid Services. Published January 2019. Accessed November 9, 2022. https://www.healthit.gov/sites/default/files/playbook/pdf/2019-MIPS-Overview-Webinar_%20Slides.pdf
17. Merit-Based Incentive Payment System (MIPS) Scoring 101 Guide for the 2019 Performance Year. Centers for Medicare & Medicaid Services. Published July 13, 2020. Accessed November 9, 2022. https://qpp-cm-prod-content.s3.amazonaws.com/uploads/599/2019%20MIPS%20Scoring%20Guide.pdf
18. MIPS explore measures & activities—Quality Payment Program. Centers for Medicare & Medicaid Services. Accessed April 18, 2022. https://qpp.cms.gov/mips/explore-measures?tab=qualityMeasures&py=2019
19. HEDIS measures and technical resources. National Committee for Quality Assurance. Accessed April 18, 2022. https://www.ncqa.org/hedis/measures/
20. Quality Indicator User Guide: Prevention Quality Indicators (PQI) Composite Measures, V2022. Agency for Healthcare Research and Quality. Published July 2022. Accessed September 20, 2022. https://qualityindicators.ahrq.gov/Downloads/Modules/PQI/V2022/PQI_Composite_Measures.pdf
21. Prevention Quality Indicators Overview. Agency for Healthcare Research and Quality. Accessed April 18, 2022. https://qualityindicators.ahrq.gov/measures/pqi_resources
22. Federal Office of Rural Health Policy (FORHP) Data Files. US Health Resources & Services Administration. Published April 28, 2017. Accessed February 24, 2020. http://www.hrsa.gov/rural-health/about-us/what-is-rural/data-files
23. Compendium of US Health Systems, 2018. Agency for Healthcare Research and Quality. Published 2018. Accessed June 14, 2020. https://www.ahrq.gov/chsp/data-resources/compendium-2018.html
24. Jarrín OF, Nyandege AN, Grafova IB, Dong X, Lin H. Validity of race and ethnicity codes in Medicare administrative data compared with gold-standard self-reported race collected during routine home health care visits. Med Care. 2020;58(1):e1-e8. doi: 10.1097/MLR.0000000000001216
25. Quality Indicator Empirical Methods, v2021. Agency for Healthcare Research and Quality. Published July 2021. Accessed September 1, 2022. https://qualityindicators.ahrq.gov/Downloads/Resources/Publications/2021/Empirical_Methods_2021.pdf
26. Morris CN. Parametric empirical Bayes inference: theory and applications. J Am Stat Assoc. 1983;78(381):47-55. doi: 10.1080/01621459.1983.10477920
27. Porter ME. What is value in health care? N Engl J Med. 2010;363(26):2477-2481. doi: 10.1056/NEJMp1011024
28. Khullar D, Wolfson D, Casalino LP. Professionalism, performance, and the future of physician incentives. JAMA. 2018;320(23):2419-2420. doi: 10.1001/jama.2018.17719
29. Rotenstein LS, Huckman RS, Wagle NW. Making patients and doctors happier—the potential of patient-reported outcomes. N Engl J Med. 2017;377(14):1309-1312. doi: 10.1056/NEJMp1707537
30. Roberts ET, Zaslavsky AM, McWilliams JM. The Value-Based Payment Modifier: program outcomes and implications for disparities. Ann Intern Med. 2018;168(4):255-265. doi: 10.7326/M17-1740
31. McWilliams JM, Hatfield LA, Landon BE, Hamed P, Chernew ME. Medicare spending after 3 years of the Medicare Shared Savings Program. N Engl J Med. 2018;379(12):1139-1149. doi: 10.1056/NEJMsa1803388