Rand Health Quarterly. 2012 Mar 1;2(1):3.

Physician Cost Profiling–Reliability and Risk of Misclassification

Detailed Methodology and Sensitivity Analyses

John L Adams, Ateev Mehrotra, J William Thomas, Elizabeth A McGlynn
PMCID: PMC4945285  PMID: 28083225

Short abstract

This article describes the methods and sensitivity analyses used in a New England Journal of Medicine article. A study of claims data from four Massachusetts health plans shows varying reliability in current methods of physician cost profiling. It also explains the relationship between reliability measurement and misclassification for physician quality and cost measures in health care and describes a method to calculate reliability and misclassification from data typically available to health plans.

Abstract

This article describes the methods and sensitivity analyses used by the authors in an article published in the New England Journal of Medicine. Purchasers are experimenting with a variety of approaches to control health care costs, including limiting network contracts to lower-cost physicians and offering patients differential copayments to encourage them to visit “high-performance” (i.e., higher-quality, lower-cost) physicians. These approaches require a method for analyzing physicians' costs and a classification system for determining which physicians have lower relative costs. There has been little analysis of the reliability of such methods. Reliability is determined by three factors: the number of observations, the variation between physicians in their use of resources, and random variation in the scores. A study of claims data from four Massachusetts health plans demonstrates that, under current physician cost-profiling methods, the majority of physicians did not have cost profiles that met common reliability thresholds and, importantly, that reliability varied significantly by specialty. Low reliability results in a substantial chance that a given physician will be misclassified as lower-cost when he or she is not, or vice versa. Such findings raise concerns about the use of cost-profiling tools and the utility of their results.

This article also explains the relationship between reliability measurement and misclassification for physician quality and cost measures in health care, and it provides a practical method for calculating reliability and misclassification from the data typically available to health plans. The article builds on other RAND work on reliability and misclassification and has two main goals. First, it can serve as a tutorial for measuring reliability and misclassification. Second, it describes the likelihood of misclassification in a situation not addressed in our prior work: one in which physicians are categorized using statistical testing. For any newly proposed system, the methods presented here should enable an evaluator to calculate the reliabilities and, consequently, the misclassification probabilities. It is our hope that knowing these misclassification probabilities will increase transparency about profiling methods and stimulate an informed debate about the costs and benefits of alternative profiling systems.


This content accompanies the article “Physician Cost Profiling—Reliability and Risk of Misclassification,” published in the New England Journal of Medicine (Adams et al., 2010b). Purchasers are experimenting with a variety of approaches to control health care costs, including limiting network contracts to lower-cost physicians and offering patients differential copayments to encourage them to visit “high-performance” (i.e., higher-quality, lower-cost) physicians. These approaches require a method for analyzing physicians' costs and a classification system for determining which physicians have lower relative costs. To date, many aspects of the scientific soundness of these methods have not been evaluated.

One important measure of scientific soundness is reliability. Reliability is a key metric of the suitability of a measure for profiling because it describes how well one can confidently distinguish the performance of one physician from that of another. Conceptually, it is a signal-to-noise ratio: the proportion of the variability in measured performance that is explained by real differences in performance rather than by random measurement error.
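
To make this concrete, the sketch below computes reliability under a simple random-effects model, in which an observed score equals a physician's true score plus independent random error. This is a minimal illustration, not the exact model used in the study; the function name and the variance and episode values are hypothetical.

```python
# Minimal sketch: signal-to-noise reliability of a physician's mean score
# under a simple random-effects model (illustrative, not the study's model).

def reliability(var_between: float, var_within: float, n_obs: int) -> float:
    """Reliability of a mean score built from n_obs observations.

    var_between: variance of true scores across physicians (signal).
    var_within:  variance of a single observation around a physician's
                 true score (noise).
    n_obs:       number of observations (e.g., episodes) for the physician.
    """
    error_var = var_within / n_obs       # noise remaining in the mean score
    return var_between / (var_between + error_var)

# Example: with fixed signal and noise variances, reliability rises
# as the number of observed episodes grows.
for n in (1, 10, 50):
    print(n, round(reliability(1.0, 10.0, n), 2))  # 0.09, 0.5, 0.83
```

The example shows the three factors at work: reliability grows with the number of observations and with the variation between physicians, and shrinks as random variation in the scores increases.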

The overall finding of the research is that the majority of physicians in our data sample did not have cost profiles that met common thresholds of reliability and that the reliability of cost profiles varied greatly by specialty. In an illustrative two-tiered insurance product, a large fraction of physicians were misclassified as low-cost when they were actually not, or vice versa. Our findings raise concerns about the use of cost profiling tools, because consumers, physicians, and purchasers are at risk of being misled by the results.

Because public and private purchasers and health plans are demanding more information about the quality and relative cost of U.S. physicians to increase physician accountability and to aid in value-based purchasing (McKethan et al., 2006; Milstein and Lee, 2007), we have developed a method to calculate reliability and misclassification from data typically available to health plans. Although performance measurement has been in place for some time in hospitals and managed care organizations (MCOs), the focus on physicians is a relatively new development. The inherent limitations of the data available at the physician level have brought to the fore technical issues that were less important at higher levels of aggregation in hospitals and MCOs (Associated Press, 2007; Kazel, 2005).

One of these technical issues is the statistical reliability of a physician's performance measure and how it may lead to misclassification of physicians as high or low performers. While the use of more reliable measures is obviously important to all stakeholders, the meanings of reliability and misclassification in this context are sometimes unclear, and the methods for measuring both reliability and misclassification in practice may seem daunting to those designing and implementing performance measurement systems. Addressing these needs is the focus of this work. This content builds on other RAND reports and publications on reliability and misclassification and has two major goals (Adams et al., 2010a; Adams, 2009; Adams et al., 2010b). First, it can serve as a tutorial for measuring reliability and misclassification. Second, it goes beyond our previous work to describe the potential for misclassification when physicians are categorized using statistical testing.

Fundamentally, reliability is a quantitative measure of the confidence one can have that a physician's performance differs from that of his or her peers. One concern is that, for most readers, reliability is not intuitive to interpret. Additionally, there is no agreement on what level of reliability is acceptable in the context of provider profiling. We therefore used reliability to estimate a more intuitive concept: the rate at which physicians are misclassified in a particular application of cost profiling.

The most commonly used applications of cost profiles (e.g., public reporting, pay for performance, tiering) typically require classifying physicians into categories. Reliability can be used to calculate the probability that a physician will be correctly or incorrectly classified in a particular application.* The reliability-misclassification relationship can be estimated for most common reporting systems that include cut points based on percentiles and statistical testing.
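
As an illustration of that relationship (not the authors' exact procedure), the Monte Carlo sketch below estimates the misclassification rate of a hypothetical two-tier system that labels the lowest 25 percent of cost profiles “low-cost.” It assumes normally distributed true scores and measurement error; the cut point and reliability values are illustrative.

```python
# Monte Carlo sketch: how reliability maps to misclassification for a
# percentile-based two-tier system (illustrative assumptions throughout).

import numpy as np

rng = np.random.default_rng(0)

def misclassification_rate(reliability: float, cut_pct: float = 25.0,
                           n_sim: int = 100_000) -> float:
    """Fraction of physicians whose observed tier differs from their true tier."""
    true = rng.standard_normal(n_sim)                 # true scores, variance 1
    noise_var = (1.0 - reliability) / reliability     # error variance implied
                                                      # by r = 1 / (1 + noise_var)
    observed = true + rng.normal(0.0, np.sqrt(noise_var), n_sim)
    cut_true = np.percentile(true, cut_pct)           # true-tier cut point
    cut_obs = np.percentile(observed, cut_pct)        # observed-tier cut point
    return float(np.mean((true <= cut_true) != (observed <= cut_obs)))

# Misclassification falls as reliability rises, but stays nonzero.
for r in (0.5, 0.7, 0.9):
    print(r, round(misclassification_rate(r), 3))
```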

Reliability is a topic of increasing importance in profiling applications. We hope this work will reduce some of the confusion about what reliability is and how it relates to misclassification. Ultimately, whether reliability is “good enough” for any given application will be judged by the probability that the system misclassifies physicians. For any newly proposed system, these methods should enable an evaluator to calculate the reliabilities and, consequently, the misclassification probabilities. It is our hope that knowing these misclassification probabilities will increase transparency about profiling methods and stimulate an informed debate about the costs and benefits of alternative profiling systems.

Notes

* If assignment to categories is based on a fixed external standard (e.g., cost profile less than 0.5), reliability can be used to estimate misclassification probabilities after the fixed external standard is transformed to a percentile of the scale.
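
For instance, a minimal sketch of that transformation, with hypothetical profile values:

```python
# Convert a fixed external standard (cost profile < 0.5) into a percentile
# of the observed score scale; the profile values are hypothetical.

profiles = [0.4, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 2.0]
fixed_standard = 0.5

# Percentile of the scale at which the fixed standard falls.
pct = 100.0 * sum(p < fixed_standard for p in profiles) / len(profiles)
print(pct)  # 10.0 -- the standard corresponds to the 10th percentile here
```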

References

1. Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of May 6, 2012: http://www.rand.org/pubs/technical_reports/TR653.html
2. Adams JL, Mehrotra A, Thomas JW, and McGlynn EA, Physician Cost Profiling—Reliability and Risk of Misclassification: Detailed Methodology and Sensitivity Analyses, Santa Monica, Calif.: RAND Corporation, TR-799-DOL, 2010a. As of May 6, 2012: http://www.rand.org/pubs/technical_reports/TR799.html
3. Adams JL, Mehrotra A, Thomas JW, and McGlynn EA, “Physician Cost Profiling—Reliability and Risk of Misclassification,” New England Journal of Medicine, Vol. 362, No. 11, March 18, 2010b, pp. 1014–1021.
4. Associated Press, “Regence BlueShield, WA Doctors Group Settle Lawsuit,” Seattle Post-Intelligencer, August 8, 2007.
5. Kazel R, “Tiered Physician Network Pits Organized Medicine vs. United,” American Medical News, March 7, 2005.
6. McKethan A, Gitterman D, Feezor A, and Enthoven A, “New Directions for Public Health Care Purchasers? Responses to Looming Challenges,” Health Affairs, Vol. 25, No. 6, 2006, pp. 1518–1528.
7. Milstein A and Lee TH, “Comparing Physicians on Efficiency,” New England Journal of Medicine, Vol. 357, No. 26, 2007, pp. 2649–2652.
