Abstract
Objective
To compare risk scores computed by DxCG (Verisk) and Centers for Medicare and Medicaid Services (CMS) V21.
Research Design
Analysis of administrative data from the Department of Veterans Affairs (VA) for fiscal years 2010 and 2011.
Study Design
We regressed total annual VA costs on predicted risk scores. Model fit was judged by R‐squared, root mean squared error, mean absolute error, and Hosmer–Lemeshow goodness‐of‐fit tests. Recalibrated models were tested using split samples with pharmacy data.
Data Collection
We created six analytical files: a random sample (n = 2 million), high cost users (n = 261,487), users over age 75 (n = 644,524), mental health and substance use users (n = 830,832), multimorbid users (n = 817,951), and low‐risk users (n = 78,032).
Principal Findings
The DxCG Medicaid with pharmacy risk score yielded substantial gains in fit over the V21 model. Recalibrating the V21 model using VA pharmacy data generated risk scores with fit statistics similar to those of the DxCG risk scores.
Conclusions
Although the CMS V21 and DxCG prospective risk scores were similar, the DxCG model with pharmacy data offered improved fit over V21. However, health care systems, such as the VA, can recalibrate the V21 model with additional variables to develop a tailored risk score that compares favorably to the DxCG models.
Keywords: Risk adjustment, cost, performance measurement, health economics
Large clinical databases provide a foundation for policy makers and health system managers to track performance metrics across health care systems and over time. Such tracking is necessary to build learning health care systems (Bohmer 2011; Smith et al. 2012). However, to conclude that one hospital has higher mortality rates than another, or that one physician group is more efficient than another, analysts must adjust for inherent differences in the patients' underlying clinical needs. Numerous risk adjustment techniques have been developed over the past two decades (Schone and Brown 2013), some proprietary and some in the public domain (Pope et al. 2004). Most use International Classification of Diseases, ninth revision (ICD‐9) codes to classify patients into homogenous clinical categories, such as diabetes without complications. Some systems only produce these clinical categories, while others use these categories to produce a risk score, which reduces the multidimensional problem with many condition categories and facilitates risk adjustment in health services research. Choosing among these risk adjustment techniques is challenging, yet head‐to‐head comparisons are quite rare, particularly for proprietary software.
The Department of Veterans Affairs (VA) uses risk adjustment while producing performance metrics for its hospital systems. For example, VA's hospital efficiency metric requires detailed information on patient risk and input prices (Gao et al. 2011). Historically, VA used DxCG, supported by Verisk, which is a commonly used proprietary risk adjustment system for cost data. We sought to compare DxCG to the CMS V21 system (hereafter referred to as V21), which was developed to adjust capitated payments for Medicare Advantage plans (Pope et al. 2011). For the comparison, we used data from the VA. We set forth two aims. First, we compared age‐ and sex‐adjusted risk scores computed by DxCG and V21. Second, because many health systems have rich clinical data on their members, we created a new risk score based on the V21 and additional information, including pharmacy data, and compared this recalibrated risk score to the DxCG scores. This second aim investigates the value gained by tailoring and recalibrating the V21 risk score relative to DxCG.
Methods
Data and Variables
We extracted utilization and diagnostic data from national databases (National Patient Care Database and Patient Treatment Files) maintained by VA for fiscal years (FY) 2010 and 2011. Cost data were obtained from the Decision Support System (DSS) and the HERC Average Cost Datasets (Phibbs et al. 2003; Wagner, Chen, and Barnett 2003; Yu et al. 2003). For care purchased by the VA from non‐VA providers, we extracted paid claims from the Fee Basis datasets.
One challenge in choosing among risk adjustment techniques is that systems developed from commercial providers or for Medicare may not translate to other populations. Therefore, we created six analytical files reflecting veterans who used the VA in FY10 and FY11: (1) a random sample of 2 million VA users, (2) veterans age 75 and over, (3) high‐cost veterans, defined as those in the top 5 percent of annual VA costs, (4) veterans with a mental health condition and/or substance use disorder (MH‐SUD sample), (5) veterans with multimorbidity, and (6) low‐risk veterans. Samples 2, 4, and 5 were 50 percent random samples from all VA users. Patients in the high‐cost sample were identified in the HERC Average Cost data, which adjust for geographic wage variation. The multimorbidity sample was defined as any veteran having conditions affecting two or more body systems as measured in the AHRQ Chronic Condition Indicators (Agency for Healthcare Quality and Research 2013). The low‐risk sample was intended to identify patients who were relatively healthy. Therefore, we selected veterans whose health utilization, as tracked in diagnostic codes, affected one or fewer body systems and who had a V‐code associated with having a physical in the VA, so as to exclude patients who relied primarily on non‐VA providers. All analytical files were created from the universe of all VA users, with the exception of the low‐risk sample, which was drawn from the 2 million random sample.
For each sample, we computed the DxCG and V21 risk scores using fiscal year 2010 clinical data. The DxCG Risk Solutions software generates 394 Hierarchical Condition Categories (HCCs) in the process of calculating three risk scores: (1) DxCG Medicare prospective (excluding Rx), (2) DxCG Medicaid prospective (including Rx), and (3) DxCG Medicare concurrent (excluding Rx). The V21 model creates 189 HCCs and uses a subset of these HCCs to calculate a risk score. The default risk score is based on weights for people living in the community; different weights are available for those living in an institution or new enrollees. We used the community risk score, unless a person spent more than 90 days in a nursing home, at which point we used the institutional weights.
A challenge in comparing risk models is the distinction between prospective and concurrent risk. Prospective risk models use current year diagnoses to estimate next year's costs, while concurrent risk models use current year diagnoses to estimate current costs. Prospective risk models are thought to place more weight on chronic conditions that influence expenditures over multiple time periods, while concurrent models place more weight on acute care needs. Because they use utilization and outcome data from the same time period, concurrent risk models usually yield higher fit statistics (e.g., R‐squared) than prospective models. DxCG software produces three risk scores (two prospective and one concurrent), while the V21 provides only a prospective risk score, although it can be modified to create a concurrent risk score. Therefore, we compared all risk models in their ability to measure concurrent and prospective costs.
To compare risk scores, we calculated each person's total costs, which included inpatient, outpatient, pharmacy, and purchased care (Fee Basis) in FY2010. We excluded care provided by and paid for by non‐VA insurers (e.g., Medicare). Total costs were based on DSS data, which is an activity‐based cost accounting system embedded at local VA medical centers. Although researchers have used these cost data and found them to be accurate and precise (Chapko et al. 2008; Wagner et al. 2011), there remains the potential for unusually high costs (due to mismatched units and unit costs) or negative costs (due to accounting reconciliations). Veterans whose DSS pharmacy costs exceeded $50,000 were excluded from these analyses. In addition, negative inpatient, outpatient, and pharmacy costs were replaced with zeros. We used HERC Average Cost data, which are encounter‐level cost estimates based on non‐VA relative value units, in a sensitivity analysis. The results were highly robust to the source of cost data (HERC vs. DSS), so we present results using the DSS data.
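The cost-cleaning rules described above can be illustrated with a minimal sketch. The record layout and field names below are hypothetical and are not taken from the DSS files; only the $50,000 pharmacy threshold and the zeroing of negative inpatient, outpatient, and pharmacy costs come from the text.

```python
from dataclasses import dataclass

PHARMACY_COST_CAP = 50_000  # exclusion threshold stated in the text


@dataclass
class AnnualCosts:
    # Hypothetical record layout; field names are illustrative only.
    inpatient: float
    outpatient: float
    pharmacy: float
    fee_basis: float


def total_cost(rec):
    """Return total annual cost, or None if the person is excluded."""
    if rec.pharmacy > PHARMACY_COST_CAP:
        return None  # excluded: pharmacy costs exceed the cap
    # Negative inpatient, outpatient, and pharmacy costs are set to zero.
    inpatient = max(rec.inpatient, 0.0)
    outpatient = max(rec.outpatient, 0.0)
    pharmacy = max(rec.pharmacy, 0.0)
    # Fee Basis costs are added unchanged (the text does not describe
    # any adjustment to them).
    return inpatient + outpatient + pharmacy + rec.fee_basis
```

In this sketch an excluded person is signaled with `None` so that the caller can drop the record rather than treat it as a zero-cost user.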
For aim 1, we established a base set of covariates including the risk score, age, age‐squared, and gender. For aim 2, we included age, age‐squared, gender, and a more extensive set of covariates, including race, being married, and 46 psychiatric condition categories, developed by Rosen and colleagues (Montez‐Rath et al. 2006; Sloan et al. 2006), to account for behavioral health needs with which existing risk systems struggle (Ettner et al. 2000). We included a variable identifying a lack of health insurance, veteran's priority‐level status (Department of Veterans Affairs 2013), and an indicator if the veteran was in a designated registry (e.g., Agent Orange). Finally, we included 25 drug class categories based on an alphanumeric list of 580 drug types maintained by the VA Pharmacy Benefits Management Service (see Table A1). Each drug class variable indicated whether the patient had a related prescription in the year (i.e., frequency, quantity, and dose were not recorded). We tried three alternative methods for including pharmacy information: CMS' Rx‐HCC model, Medicaid Rx, and the veteran's VA pharmacy costs in the prior year. None of these methods worked as well as the 25 VA drug classes, so we report the results with the VA drug class information here.
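As an illustration of the "any use" coding of the drug class variables, the sketch below turns a list of a patient's prescriptions, each already mapped to a drug class, into 0/1 indicators. The class codes shown are placeholders, not actual VA drug class codes, and the function name is hypothetical.

```python
def drug_class_indicators(prescribed_classes, all_classes):
    """Return a 0/1 indicator per drug class.

    A class is coded 1 if the patient had at least one prescription in
    that class during the year; frequency, quantity, and dose are ignored.
    """
    present = set(prescribed_classes)
    return {drug_class: int(drug_class in present) for drug_class in all_classes}


# Placeholder class codes for illustration only.
ALL_CLASSES = ["A", "B", "C"]
example = drug_class_indicators(["B", "B"], ALL_CLASSES)
```

Two fills in class "B" produce the same indicator as one fill, which is the extensive-margin coding described in the text.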
Analyses
The first objective was to compare DxCG and V21 risk scores using clinical and cost data from FY2010. We used regression models to understand how these concurrent risk scores fit the data while controlling for age and gender. We compared ordinary least squares (OLS), log‐OLS, square root OLS, a generalized linear model (GLM) with gamma distribution and log link, and a GLM with gamma distribution and square root link. Log‐OLS and square root OLS refer to models where the natural log or square root is taken of the dependent variable and OLS is performed on the transformed variable. To compare model fit, we examined R‐squared, root mean squared error (RMSE), and mean absolute error (MAE). In addition, Hosmer–Lemeshow goodness‐of‐fit tests were run by sorting the predicted values from low to high, and then comparing the mean observed and expected values within each decile. Poorly specified models have extreme differences across the deciles. We examined alternative R‐squared measures for the GLM models, but in the end chose not to include them in the table as they can create confusion when compared to the traditional R‐squared. Model fit statistics were calculated after the data were retransformed with the appropriate smearing estimator (Duan et al. 1983; Jones 2010). In separate analyses, we compared the prospective risk models, in which we used FY2010 clinical data to predict FY2011 costs.
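The fit statistics named above can be computed from observed and predicted costs on the original dollar scale. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the decile table shows only the observed versus predicted means by decile, not the F statistic reported in the tables.

```python
import math


def r_squared(observed, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def decile_table(observed, predicted):
    """Sort by predicted value and return (mean observed, mean predicted)
    for each of ten equal-sized groups."""
    if len(observed) < 10:
        raise ValueError("need at least 10 observations to form deciles")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for k in range(10):
        group = pairs[k * n // 10:(k + 1) * n // 10]
        mean_pred = sum(p for p, _ in group) / len(group)
        mean_obs = sum(o for _, o in group) / len(group)
        rows.append((mean_obs, mean_pred))
    return rows
```

Large gaps between the two means in any row of the decile table, particularly the top decile, would point to a poorly specified model.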
The second objective was to compare the DxCG and V21 risk models, after recalibrating the V21 to the VA with some additional control variables. For this aim, we reran the analytical models outlined in aim 1 with the extensive set of control variables, described above. Recalibrated risk scores were calculated by using the regression model to predict the person's costs, which we then divided by the average predicted costs. Recalibration was done by splitting the samples into two random groups, one for the estimation and the other for validation.
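The recalibration step can be sketched as follows: each person's predicted cost is divided by the average predicted cost, so the resulting scores average 1 by construction. The split into estimation and validation halves is shown with a seeded shuffle; the seed, function names, and 50/50 split are assumptions for illustration, not details reported in the text.

```python
import random


def recalibrated_scores(predicted_costs):
    """Divide each predicted cost by the mean predicted cost."""
    mean_pred = sum(predicted_costs) / len(predicted_costs)
    return [p / mean_pred for p in predicted_costs]


def split_sample(ids, seed=0):
    """Randomly split identifiers into estimation and validation halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


scores = recalibrated_scores([100.0, 200.0, 300.0])
estimation_ids, validation_ids = split_sample(range(10), seed=1)
```

With predicted costs of 100, 200, and 300, the scores are 0.5, 1.0, and 1.5, so a score of 1.5 reads as predicted costs 50 percent above the sample average.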
Risk scores can span the range of positive numbers. Unfortunately, different regression specifications can lead to prediction violations, such as negative estimates. The square root OLS model removes the possibility of negative final predictions because the raw predictions are squared and corrected with a smearing estimator (Jones 2010). However, squaring values can create rank order problems (e.g., when squared, −2 and 1 change rank order). We examined the raw square root OLS predictions and confirmed that we did not have any negative predicted values.
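A minimal sketch of the square root OLS retransformation may help make the rank order issue concrete. It assumes a single predictor plus an intercept and uses the smearing correction for the square root model, in which the mean squared residual is added to the squared raw prediction. This is an illustration under those assumptions and is not the code used in the analyses.

```python
import math


def fit_sqrt_ols(x, y):
    """OLS of sqrt(y) on x with an intercept (one predictor).

    Returns (intercept, slope, smear), where smear is the mean squared
    residual on the square root scale.
    """
    z = [math.sqrt(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    smear = sum(e * e for e in residuals) / n
    return intercept, slope, smear


def predict_cost(x_new, intercept, slope, smear):
    """Return (raw square-root-scale prediction, retransformed cost)."""
    raw = intercept + slope * x_new
    return raw, raw * raw + smear


def has_negative_raw(x_values, intercept, slope):
    """Flag negative raw predictions, which can reverse rank order once squared."""
    return any(intercept + slope * xi < 0 for xi in x_values)
```

Because the retransformed cost is a square plus a nonnegative constant, it can never be negative, but a raw prediction of −2 retransforms to a larger cost than a raw prediction of 1. Checking the raw predictions with something like `has_negative_raw` is the safeguard described in the text.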
Results
Sample Characteristics
Table 1 shows the descriptive statistics for the six analytical samples. In the general sample, the average age was 62 years and 94 percent were male. In terms of age and gender, the multimorbid and high cost samples were similar to the general sample, whereas the low‐risk sample had a younger mean age and had a higher proportion of women.
Table 1.
Characteristic | General Sample | Older Sample | High‐Cost Sample | MH‐SUD Sample | Multimorbid Sample | Low‐Risk Sample |
---|---|---|---|---|---|---|
N | 1,995,620 | 644,524 | 261,487 | 830,832 | 817,951 | 78,032 |
Mean age (SD) | 62.0 (15.9) | 81.4 (4.6) | 62.5 (13.4) | 56.9 (15.2) | 62.2 (13.8) | 48.2 (17.4) |
Male | 94% | 98% | 95% | 91% | 94% | 86% |
Total costs (a) | ||||||
Mean | 8,819 | 8,067 | 76,920 | 15,067 | 21,345 | 2,435 |
Median | 2,563 | 1,908 | 52,954 | 5,637 | 9,337 | 1,093 |
SD | 24,976 | 25,624 | 76,697 | 33,560 | 40,603 | 5,203 |
Maximum | 1,660,240 | 1,597,986 | 2,979,525 | 2,476,373 | 2,979,525 | 275,166 |
Percent of costs from pharmacy | 18.9 | 25.3 | 5.3 | 17.4 | 15.3 | 6.5 |
(a) Total Costs include inpatient, outpatient, pharmacy, and Fee Basis care. Veterans whose DSS Rx costs exceed $50,000 were excluded from these analyses. Negative DSS VA in/out costs and DSS Rx costs were also replaced with zeros.
In the random sample, the average annual cost was $8,819 (SD: $24,976), with a maximum of $1.66 million (see Table 1). The distribution of costs varied across the samples. Average costs in the MH‐SUD and multimorbid samples were roughly 1.7 and 2.4 times the general sample average, respectively. The high‐cost subsample, which represented the top 5 percent of patients by annual VA costs, had an average cost of $76,920. The low‐risk sample had an average cost of $2,435, although it was based on a smaller sample (n = 78,032 persons), in part because of the operational definition and limitations in defining low risk using administrative data without relying directly on utilization or costs.
Aim 1: Comparison of Risk Scores
For all of the samples, the average V21 score was higher than the average DxCG prospective risk score excluding pharmacy (Table 2, columns 1 and 2). However, the average V21 score was much lower than the average DxCG prospective risk score based on Medicaid data with pharmacy (Table 2). Note that all of these risk scores were being compared in concurrent models.
Table 2.
Sample | CMS V21 Concurrent, Mean (SD) | DxCG Medicare Prospective without Rx, Mean (SD) | DxCG Medicare Concurrent without Rx, Mean (SD) | DxCG Medicaid Prospective with Rx, Mean (SD) |
---|---|---|---|---|
Random sample of users | 0.756 (0.730) | 0.661 (0.698) | 0.497 (0.879) | 1.756 (2.126) |
Older users | 1.065 (0.750) | 0.921 (0.656) | 0.504 (0.874) | 2.020 (2.267) |
High cost users | 2.234 (1.580) | 2.077 (1.628) | 2.684 (2.243) | 7.228 (4.323) |
MH‐SUD users | 0.893 (0.850) | 0.802 (0.802) | 0.770 (1.092) | 2.487 (2.708) |
Multimorbid users | 1.160 (1.024) | 1.044 (1.002) | 1.004 (1.343) | 3.146 (3.027) |
Low‐risk sample | 0.295 (0.234) | 0.236 (0.219) | 0.152 (0.244) | 0.708 (0.708) |
Risk scores were computed using VA clinical data from 2010.
The V21 score was most highly correlated with the DxCG Medicare prospective score without pharmacy (results not shown). The correlations ranged from .83 in the general sample to .75 in the high‐cost and low‐risk samples. The similarity between the V21 and DxCG Medicare prospective scores was striking when computing the difference score; more than 99 percent of the cases in the general sample had a difference score between −.1 and .1. Correlations between the V21 and DxCG Medicaid prospective score with pharmacy were lowest for the low‐risk sample (r = .33). The correlations were higher for the general sample (r = .64), older sample (r = .65), high‐cost sample (r = .57), MH‐SUD sample (r = .65), and multimorbid sample (r = .63).
Four goodness‐of‐fit statistics showed that no single statistical model was universally superior, especially when the data were viewed across the six analytical samples. Table 3 shows the fit statistics for the square root OLS model across the six analytical samples. The fit statistics for the general sample and the complete Hosmer–Lemeshow deciles for the OLS and square root OLS models are available in additional tables online.
Table 3.
Statistic and sample | CMS V21 Concurrent | CMS V21 Concurrent with VA Drug Class Indicators | DxCG Medicare Prospective without Rx | DxCG Medicare Concurrent without Rx | DxCG Medicaid Prospective with Rx |
---|---|---|---|---|---|
R² | |||||
General | 0.5793 | 0.6924 | 0.5819 | 0.6274 | 0.6351 |
Older | 0.5728 | 0.6772 | 0.5677 | 0.6233 | 0.6397 |
MH‐SUD | 0.5820 | 0.6810 | 0.5896 | 0.6268 | 0.6509 |
High cost | 0.3559 | 0.4281 | 0.3544 | 0.4244 | 0.4241 |
Multimorbid | 0.5350 | 0.6331 | 0.5326 | 0.5957 | 0.5943 |
Low risk | 0.2922 | 0.4573 | 0.3113 | 0.3508 | 0.3778 |
HL F‐stat | |||||
General | 9,206 | 7,499 | 20,251 | 20,687 | 11,726 |
Older | 3,803 | 2,182 | 6,683 | 7,074 | 3,418 |
MH‐SUD | 2,883 | 2,941 | 6,699 | 6,068 | 3,288 |
High cost | 197 | 298 | 214 | 167 | 159 |
Multimorbid | 1,671 | 1,972 | 4,282 | 5,297 | 1,992 |
Low risk | 53 | 91 | 112 | 127 | 113 |
RMSE | |||||
General | 16,560 | 15,531 | 17,053 | 16,590 | 15,540 |
Older | 15,972 | 14,894 | 16,415 | 16,108 | 14,652 |
MH‐SUD | 22,277 | 20,922 | 22,713 | 21,920 | 20,667 |
High cost | 57,306 | 54,756 | 57,422 | 54,254 | 54,473 |
Multimorbid | 27,709 | 26,105 | 28,234 | 26,761 | 26,069 |
Low risk | 4,393 | 4,023 | 4,418 | 4,269 | 4,164 |
MAE | |||||
General | 5,977 | 5,084 | 5,965 | 5,590 | 5,541 |
Older | 5,535 | 4,765 | 5,582 | 5,138 | 5,040 |
MH‐SUD | 9,224 | 8,026 | 9,142 | 8,668 | 8,400 |
High cost | 33,186 | 31,216 | 33,167 | 31,127 | 31,463 |
Multimorbid | 12,414 | 10,920 | 12,466 | 11,492 | 11,533 |
Low risk | 1,892 | 1,643 | 1,862 | 1,815 | 1,772 |
R² represents R‐squared; HL F‐stat is the Hosmer–Lemeshow goodness‐of‐fit F statistic; RMSE is root mean squared error; MAE is mean absolute error.
All analyses are based on risk scores computed with 2010 clinical data in relation to 2010 costs.
The large volume of results makes it easy to lose sight of the primary comparison of interest: the performance of the alternative risk models. The CMS V21 and DxCG Medicare prospective models yielded very similar fit statistics. In estimating concurrent costs, neither fared as well as the concurrent DxCG model or the DxCG prospective model based on Medicaid with pharmacy data. As shown in Table 4, R‐squared estimates were between 8 and 20 percentage points higher for the DxCG prospective model with pharmacy than for the CMS V21 model. All of the models struggled to fit the data from the low‐risk sample, but again the DxCG prospective model with pharmacy accounted for more variance.
Table 4.
Statistic and sample | CMS V21 Concurrent | DxCG Medicare Prospective without Rx | DxCG Medicare Concurrent without Rx | DxCG Medicaid Prospective with Rx |
---|---|---|---|---|
R² | ||||
General | 0.4287 | 0.4308 | 0.5122 | 0.5682 |
Older | 0.4108 | 0.3876 | 0.4802 | 0.5907 |
MH‐SUD | 0.3985 | 0.4191 | 0.4876 | 0.5738 |
High cost | 0.1920 | 0.1999 | 0.2650 | 0.3779 |
Multimorbid | 0.3910 | 0.3906 | 0.4790 | 0.5377 |
Low risk | 0.1646 | 0.1966 | 0.2694 | 0.2701 |
HL F‐stat | ||||
General | 9,516 | 18,274 | 33,496 | 9,454 |
Older | 3,759 | 5,111 | 9,318 | 2,463 |
MH‐SUD | 3,812 | 5,017 | 8,848 | 3,906 |
High cost | 454 | 130 | 56 | 158 |
Multimorbid | 2,519 | 4,071 | 6,212 | 1,760 |
Low risk | 34 | 90 | 156 | 127 |
RMSE | ||||
General | 20,576 | 21,829 | 22,060 | 17,884 |
Older | 22,018 | 23,377 | 23,761 | 18,464 |
MH‐SUD | 27,942 | 29,215 | 28,865 | 23,895 |
High cost | 70,312 | 70,003 | 67,206 | 62,716 |
Multimorbid | 34,035 | 35,043 | 33,708 | 29,888 |
Low risk | 4,945 | 5,045 | 4,782 | 4,605 |
MAE | ||||
General | 7,415 | 7,423 | 6,783 | 6,398 |
Older | 7,320 | 7,552 | 6,812 | 6,077 |
MH‐SUD | 11,843 | 11,607 | 10,774 | 9,942 |
High cost | 41,640 | 41,266 | 39,120 | 36,720 |
Multimorbid | 15,225 | 15,236 | 13,868 | 13,234 |
Low risk | 2,087 | 2,035 | 1,937 | 1,941 |
R² represents R‐squared; HL F‐stat is the Hosmer–Lemeshow goodness‐of‐fit F statistic; RMSE is root mean squared error; MAE is mean absolute error.
All analyses are based on risk scores computed with 2010 clinical data in relation to 2010 costs.
The results described above use the risk scores computed with 2010 data to model 2010 costs. In these concurrent comparisons, R‐squared estimates in the general sample ranged from .4287 to .5682. As expected, model fit, as measured by R‐squared, decreased when the 2010 risk scores were used to predict 2011 costs. When used in a prospective fashion, the R‐squared for the risk scores in the general sample ranged from .2097 to .2487 (see supplemental tables).
Aim 2: Recalibrating the Model for VA
As was found in aim 1, different regression specifications yielded different fit statistics. The log‐transformed and GLM log‐link models provided good fits across the Hosmer–Lemeshow deciles, with the exception of the top decile, where the fit was very poor. This was evident in very large RMSE and MAE fit statistics. Convergence problems were also more common with the GLM models. In comparison, the square root OLS model yielded the highest R‐squared values, good fits across the Hosmer–Lemeshow deciles, and respectable RMSE and MAE statistics.
Recalibrating the risk models with additional control variables resulted in higher R‐squared values for all risk scores. In the general sample, the recalibrated V21 model yielded an R‐squared of .58. This fit was still lower than that of the DxCG Medicaid model with pharmacy (R‐squared of .64). Adding the 25 drug class variables to the V21 model provided R‐squared values that were similar to or higher than those of the DxCG Medicaid with pharmacy model. The effect of adding pharmacy information was most dramatic in the low‐risk sample. The Hosmer–Lemeshow deciles for the square root OLS model are available in an additional table online.
Recalibration was also influential in improving R‐squared when the 2010 risk scores were being used to predict 2011 cost data. As shown in Table 5, model fit improved for all models, but model fit for these prospective models was still substantially lower than the R‐squared for the concurrent models shown in Table 3.
Table 5.
Model and sample | CMS V21 | DxCG Medicare Prospective without Rx | DxCG Medicare Concurrent without Rx | DxCG Medicaid Prospective with Rx |
---|---|---|---|---|
2011 costs | ||||
Basic model | ||||
General | 0.2117 | 0.2228 | 0.2097 | 0.2487 |
Older | 0.1743 | 0.1771 | 0.1732 | 0.2538 |
MH‐SUD | 0.1819 | 0.1970 | 0.1873 | 0.2408 |
High cost | 0.0401 | 0.0676 | 0.0403 | 0.0949 |
Multimorbid | 0.1527 | 0.1690 | 0.1520 | 0.1903 |
Low risk | 0.0600 | 0.0684 | 0.0673 | 0.0470 |
Recalibrated with additional variables | ||||
General | 0.3485 | 0.3558 | 0.3485 | 0.3391 |
Older | 0.3428 | 0.3480 | 0.3440 | 0.3405 |
MH‐SUD | 0.3190 | 0.3252 | 0.3182 | 0.3148 |
High cost | 0.1423 | 0.1631 | 0.1434 | 0.1399 |
Multimorbid | 0.2640 | 0.2757 | 0.2631 | 0.2544 |
Low risk | 0.1321 | 0.1326 | 0.1319 | 0.1272 |
R² is based on square root OLS regression models.
All analyses are based on risk scores computed with 2010 clinical data in relation to 2011 costs.
In the recalibrated models, the average risk score in the general sample is 1 (by construction). Average risk increased considerably for the high‐cost group (average 5.58), and averaged .038 for the low‐risk group. Interestingly, the older population had a slightly lower average risk score (.91). We expect this is due to an increased reliance on non‐VA providers when veterans become Medicare eligible; this question is scheduled for future examination. The split‐sample analysis confirmed that model fit was very similar in the estimation dataset and the validation dataset (results not shown). The models also confirmed that the results were robust when we used HERC costs instead of DSS costs.
Discussion
Many organizations face a difficult choice in selecting risk adjustment techniques for cost data. Proprietary techniques like DxCG have established reputations and a long history of use, but they have not usually been the subject of direct comparisons with free alternatives. Our analyses show that the choice depends, in part, on the anticipated use and whether the health care system can recalibrate the model. If users have a strong preference for a concurrent risk model (e.g., risk adjustment in a sample with mostly acute care needs) or cannot recalibrate the model, then the V21 software is unlikely to be as satisfying as some of the DxCG risk scores. The DxCG concurrent and prospective with Rx risk scores provided stronger fit statistics than the V21 model when used as a concurrent risk score.
One advantage of the DxCG system over the V21 is its ability to compute a risk score using pharmacy data. In the analyses with our samples, the pharmacy information added 8 to 20 percentage points to the R‐squared estimates when compared to the base models without pharmacy, and the pharmacy data were particularly valuable in the low‐risk group. The advantage of the DxCG software diminished greatly when we recalibrated the V21 risk score to include pharmacy information; in a few cases, the recalibrated V21 score yielded fit statistics that surpassed the DxCG scores.
We examined alternative methods for including pharmacy information in the risk score: CMS Rx‐HCC model, Medicaid Rx (Gilmer et al. 2001), Rx risk (Fishman et al. 2003), and pharmacy costs in the prior year. Interestingly, we found that the extensive margin (any use of pharmacy in the drug classes) was more useful than the intensive margin (pharmacy spending in the prior year). We also found that while the Rx‐HCC and Medicaid Rx models yielded substantial improvements in fit over the V21 model alone, none of these methods worked as well as the VA drug class. Because the choice of risk adjustment techniques depends on the availability of pharmacy data, we separately report the data with the pharmacy variables so that readers can interpret the results accordingly. The VA drug class is maintained by the VA PBM, and the national drug file has both the National Drug Code (NDC) and the VA drug class (VA Pharmacy Benefits Management Service). Therefore, researchers outside of VA can also use this information to link NDC codes to the VA drug class. To expedite this process, we provide a dataset linking VA drug class and NDC for 2014 data (Wagner et al. 2015).
Health services researchers can easily download the V21 model and compute risk scores for their data. At a minimum, researchers would need age, gender, and ICD‐9 diagnostic information. Whether V21 risk scores are sufficient depends in part on the analyst's sample. Because the models were developed on a wide distribution of people, they tend to yield poor fits at either end of the distribution (low‐risk individuals or high‐cost users). Thus, researchers conducting work in the areas of health promotion and prevention may need to consider other risk adjustment models. It was in the low‐risk subpopulation that pharmacy data most improved the model fit, as many people may use little health care in part because they are adhering to medication regimens.
Robustness and Sensitivity
The DxCG and V21 risk scores were tested across six samples, using a range of regression models and a number of fit statistics. The results did not identify a preferred statistical approach, but they do provide some reassurances on the model specification. As in prior research (Montez‐Rath et al. 2006), the square root transformation generally performed quite well in its ability to fit the data, as evidenced by the Hosmer–Lemeshow deciles and R‐squared. Readers should be cautious about using these results to conclude that one risk model was always preferred to another risk model; the results do not support that conclusion. The OLS and square root models can also create prediction problems (negative predictions or changes in rank order). Although we did not encounter rank order problems with the square root models, analysts who use this approach would want to check for such possibilities.
Despite all of the analyses, there are some limitations with this research that are worthy of discussion. First, there were many alternative risk adjustment systems that we could have chosen for this comparison. For example, Charlson and Chronic Illness and Disability Payment System (CDPS) are in the public domain and CRGs and ACGs are common proprietary systems. We limited our comparison to DxCG and V21 because they are commonly used for utilization and cost outcomes, and because of their history and potential for use in the VA. However, we believe the methods we present can guide future comparisons of other risk adjustment systems. Second, these risk models were all compared using cost data as the dependent variable. Risk adjustment techniques designed for one outcome (e.g., costs) may not perform well on other outcomes (e.g., mortality) (Quan et al. 2005; Wang et al. 2012a,b). Finally, these models focused on care provided by or paid by VA. Most older veterans are eligible for Medicare and Medicare use is common (Petersen et al. 2010; Liu et al. 2011). It is possible to include Medicare data in risk scores, although those data are subject to lags in availability. Ongoing work is examining how these risk scores change when Medicare data are included.
Gaming and Risk Selection
One concern with risk adjustment models is gaming to make the patient's risk appear to be as high as possible (Schone and Brown 2013). In the current study, we included pharmacy data, which proved to increase the fit of the regression models substantially. Pharmacy data were based on any use of a medication in one of 25 VA drug classes. While VA drug class outperformed other models for pharmacy use, caution is warranted because measuring pharmacy use in this manner could lead to gaming. In VA, the issue of gaming is less acute because the VA uses a separate tiered capitation model, known as Veterans Equitable Resource Allocation (VERA), to distribute funding without using pharmacy data.
Another concern with risk adjustment is risk selection (cf. Brown et al. 2014). In VA, the issue of risk selection is less immediately apparent because the models developed in this paper were designed for performance monitoring and are distinct from VA's financial allocation model. One area of potential concern for risk selection is among veterans who are dually eligible for VA and other insurance, namely Medicare. Because the VERA is calculated only on VA data, there is an implicit incentive for VA medical centers to provide enough services that “qualify” patients for the highest possible reimbursement level, but there is no incentive to provide all of the veterans' care. An alternative model that estimates patient risk based on complete clinical information (VA and non‐VA claims) and then allocates VA funds based on a prorated algorithm would change the incentives associated with dual coverage. More research would be needed to identify a suitable algorithm for prorating the funds, but it could potentially reduce the notable distortions caused by risk selection and dual use (Trivedi et al. 2012).
Implementation
The VA has computed the V21 and the recalibrated models to estimate patient risk, and to date risk scores for 2006–2015 have been created on its data warehouses. The recalibrated risk scores have been named Nosos, which means “chronic condition” in Greek. The square root OLS model has consistently provided excellent fit across the 9 years of data, and we have not encountered any rank order problems when retransforming the data. Further data documentation for the implementation of V21 and Nosos in VA is available online (Wagner et al. 2015).
Conclusion
We find high concordance between the DxCG prospective risk score based on Medicare data and the V21 model provided by CMS. The DxCG system provides alternative risk scores that are not provided by V21 (a prospective Medicaid score with pharmacy and a concurrent Medicare score). Others may place more value on these other risk scores, and we found that the DxCG Medicaid score with pharmacy frequently produced the best fit statistics in concurrent models. The V21 model can be recalibrated, and with the inclusion of pharmacy data, the recalibrated V21 model yielded risk scores that provided similar fit statistics to the DxCG software.
Acknowledgments
Joint Acknowledgment/Disclosure Statement: Funding for this study was provided by the VA Office for Operational Analytics and Reporting. We thank Mel Ingber (RTI), Paul Fishman (Group Health Cooperative), the associate editor, and two anonymous reviewers for suggestions. We appreciate the comments and insights from James Campbell, Yu‐Fang Li, Mei‐Ling Shen, Bruce Kinosian, Amy Rosen, and Maria Rath Montez.
Disclosures: This work was funded through the Operational Value Initiative from the VA Office of Analytics and Business Intelligence/Office of Informatics and Analytics and VA Operational Analytics and Reporting.
Disclaimers: The views, opinions, and content of this publication are those of the authors and do not necessarily reflect the views, opinions, or policies of the Department of Veterans Affairs. The authors had full control of the study design, methods used, outcome parameters and results, analysis of data, and production of the written report.
References
- Agency for Healthcare Research and Quality. 2013. “Chronic Condition Indicator (CCI) for ICD‐9‐CM” [accessed on April 20, 2013]. Available at http://www.hcup-us.ahrq.gov/toolssoftware/chronic/chronic.jsp
- Bohmer, R. M. 2011. “The Four Habits of High‐Value Health Care Organizations.” New England Journal of Medicine 365 (22): 2045–7.
- Brown, J., Duggan M., Kuziemko I., and Woolston W. 2014. “How Does Risk Selection Respond to Risk Adjustment? New Evidence from the Medicare Advantage Program.” American Economic Review 104 (10): 3335–64.
- Chapko, M. K., Liu C. F., Perkins M., Li Y. F., Fortney J. C., and Maciejewski M. L. 2008. “Equivalence of Two Healthcare Costing Methods: Bottom‐Up and Top‐Down.” Health Economics 18 (10): 1188–201.
- Department of Veterans Affairs. 2013. “Health Benefits: Priority Group Tables” [accessed on April 16, 2013]. Available at http://www.va.gov/healthbenefits/resources/priority_groups.asp
- Duan, N., Manning W. G., Morris C. N., and Newhouse J. P. 1983. “A Comparison of Alternative Models for the Demand for Medical Care.” Journal of Business and Economic Statistics 1 (2): 115–26.
- Ettner, S. L., Frank R. G., Mark T., and Smith M. W. 2000. “Risk Adjustment of Capitation Payments to Behavioral Health Care Carve‐Outs: How Well Do Existing Methodologies Account for Psychiatric Disability?” Health Care Manag Sci 3 (2): 159–69.
- Fishman, P. A., Goodman M. J., Hornbrook M. C., Meenan R. T., Bachman D. J., and Rosetti M. C. O. K. 2003. “Risk Adjustment Using Automated Ambulatory Pharmacy Data: The RxRisk Model.” Medical Care 41 (1): 84–99.
- Gao, J., Moran E., Almenoff P. L., Render M. L., Campbell J., and Jha A. K. 2011. “Variations in Efficiency and the Relationship to Quality of Care in the Veterans Health System.” Health Affairs (Millwood) 30 (4): 655–63.
- Gilmer, T., Kronick R., Fishman P., and Ganiats T. G. 2001. “The Medicaid Rx Model: Pharmacy‐Based Risk Adjustment for Public Programs.” Medical Care 39 (11): 1188–202.
- Jones, A. 2010. “Models for Health Care.” In The Oxford Handbook of Economic Forecasting, edited by Hendry D., and Clements M., pp. 625–54. Oxford, England: Oxford University Press.
- Liu, C. F., Manning W. G., Burgess J. F. Jr, Hebert P. L., Bryson C. L., Fortney J., Perkins M., Sharp N. D., and Maciejewski M. L. 2011. “Reliance on Veterans Affairs Outpatient Care by Medicare‐Eligible Veterans.” Medical Care 49 (10): 911–7.
- Montez‐Rath, M., Christiansen C. L., Ettner S. L., Loveland S., and Rosen A. K. 2006. “Performance of Statistical Models to Predict Mental Health and Substance Abuse Cost.” BMC Medical Research Methodology 6: 53.
- Petersen, L. A., Byrne M. M., Daw C. N., Hasche J., Reis B., and Pietz K. 2010. “Relationship between Clinical Conditions and Use of Veterans Affairs Health Care among Medicare‐Enrolled Veterans.” Health Services Research 45 (3): 762–91.
- Phibbs, C. S., Bhandari A., Yu W., and Barnett P. G. 2003. “Estimating the Costs of VA Ambulatory Care.” Medical Care Research and Review 60 (3 Suppl): 54S–73S.
- Pope, G. C., Kautter J., Ellis R. P., Ash A. S., Ayanian J. Z., Iezzoni L. I., Ingber M. J., Levy J. M., and Robst J. 2004. “Risk Adjustment of Medicare Capitation Payments Using the CMS‐HCC Model.” Health Care Financ Rev 25 (4): 119–41.
- Pope, G. C., Kautter J., Ingber M. J., Freeman S., Sekar R., and Newhart C. 2011. Evaluation of the CMS‐HCC Risk Adjustment Model. Prepared by RTI for Centers for Medicare & Medicaid Services, Medicare Plan Payment Group, Division of Risk Adjustment and Payment Policy.
- Quan, H., Sundararajan V., Halfon P., Fong A., Burnand B., Luthi J. C., Saunders L. D., Beck C. A., Feasby T. E., and Ghali W. A. 2005. “Coding Algorithms for Defining Comorbidities in ICD‐9‐CM and ICD‐10 Administrative Data.” Medical Care 43 (11): 1130–9.
- Schone, E., and Brown R. 2013. “Risk Adjustment: What is the Current State of the Art, and How Can It Be Improved?” In Mathematica Policy Research: Research Synthesis Report No. 25. Minneapolis, MN: Robert Wood Johnson Foundation.
- Sloan, K. L., Montez‐Rath M. E., Spiro A. 3rd, Christiansen C. L., Loveland S., Shokeen P., Herz L., Eisen S., Breckenridge J. N., and Rosen A. K. 2006. “Development and Validation of a Psychiatric Case‐Mix System.” Medical Care 44 (6): 568–80.
- Smith, M., Saunders R., Stuckhardt L., and McGinnis J. M. 2012. Best Care at Lower Cost: The Path to Continuously Learning Health Care in America. Washington, DC: The National Academies Press, Institute of Medicine.
- Trivedi, A. N., Grebla R. C., Jiang L., Yoon J., Mor V., and Kizer K. W. 2012. “Duplicate Federal Payments for Dual Enrollees in Medicare Advantage Plans and the Veterans Affairs Health Care System.” Journal of the American Medical Association 308 (1): 67–72.
- VA Pharmacy Benefits Management Service. n.d. [accessed on March 27, 2015]. Available at http://www.pbm.va.gov/nationalformulary.asp
- Wagner, T. H., Chen S., and Barnett P. G. 2003. “Using Average Cost Methods to Estimate Encounter‐Level Costs for Medical‐Surgical Stays in the VA.” Medical Care Research and Review 60 (3 Suppl): 15S–36S.
- Wagner, T. H., Sethi G., Holman W., Lee K., Bakaeen F. G., Upadhyay A., McFalls E., Tobler H. G., Kelly R. F., Crittenden M. D., Thai H., and Goldman S. 2011. “Costs and Quality of Life Associated with Radial Artery and Saphenous Vein Cardiac Bypass Surgery: Results from a Veterans Affairs Multisite Trial.” American Journal of Surgery 202 (5): 532–5.
- Wagner, T. H., Cowgill E. H., Cashy J., and Shen M.‐L. 2015. “Risk Adjustment: Guide to the V21 and Nosos Risk Score Programs” [accessed on March 27, 2015]. Available at http://www.herc.research.va.gov/include/page.asp?id=technical-report-risk-adjustment
- Wang, L., Porter B., Maynard C., Bryson C., Sun H., Lowy E., McDonell M., Frisbee K., Nielson C., and Fihn S. D. 2012a. “Predicting Risk of Hospitalization or Death among Patients with Heart Failure in the Veterans Health Administration.” American Journal of Cardiology 110 (9): 1342–9.
- Wang, L., Porter B., Maynard C., Evans G., Bryson C., Sun H., Gupta I., Lowy E., McDonell M., Frisbee K., Nielson C., Kirkland F., and Fihn S. D. 2012b. “Predicting Risk of Hospitalization or Death among Patients Receiving Primary Care in the Veterans Health Administration.” Medical Care 51 (4): 368–73.
- Yu, W., Wagner T. H., Chen S., and Barnett P. G. 2003. “Average Cost of VA Rehabilitation, Mental Health, and Long‐Term Hospital Stays.” Medical Care Research and Review 60 (3 Suppl): 40S–53S.