Abstract
Two groups (Segal et al. Med Care. 2017;55(7):716–722; Segal et al. Am J Epidemiol. 2017;186(6):745–747; and Kim et al. J Gerontol A Biol Sci Med Sci. 2018;73(7):980–987) recently proposed methods for modeling frailty in studies where a reference standard frailty measure is not directly observed, but Medicare claims data are available. The groups use competing frailty measures, but the premise is similar: In a validation data set, model the frailty measure versus claims variables; in the primary data set, impute frailty status from claims variables, and conduct inference with those imputed values in place of the unobserved frailty measure. Potential use cases include risk prediction, confounding control, and prevalence estimation. In this commentary, we describe validity issues underlying these approaches, focusing mainly on risk prediction. Our main concern is that these approaches do not permit valid estimation of associations between the reference standard frailty measure (i.e., “frailty”) and health outcomes. We argue that Segal’s approach is akin to multiple imputation but with the outcome variable omitted from the imputation model, while Kim’s is akin to regression calibration but with many variables improperly treated as surrogates. We discuss alternatives for risk prediction, including a secondary approach previously considered by Kim et al., and briefly comment on other use cases.
Keywords: multiple imputation, regression calibration, surrogacy, validation data
Editor's Note: Responses to this commentary appear on page 372 and page 373.
There is considerable interest in modeling frailty (e.g., as a primary exposure of interest, a potential confounder, or an effect modifier) in studies where the information needed to classify frailty according to a reference standard measure is not available. Segal et al. (1, 2) and Kim et al. (3) recently proposed a similar idea: Leverage the fact that Medicare claims data could be available, and many of those variables could inform a reference standard frailty measure that is missing. This motivates a 2-step estimation procedure: 1) In a validation data set, fit a model for the reference standard frailty measure given variables from claims data; 2) in the main data set, impute the reference standard frailty measure from claims data, and use those imputed values (the “claims-based frailty index”) in place of the reference standard to estimate quantities of epidemiologic interest.
While imputing a missing variable based on observed data is intuitive, these procedures do not generally permit valid estimation of the parameters that would be targeted if the reference standard frailty measure were actually observed. Here we argue that the Segal and Kim procedures are essentially variants on well-established estimation procedures (multiple imputation and regression calibration, respectively), with key differences that undermine their validity when used to estimate associations between the reference standard frailty measure and health outcomes. We focus primarily on this issue, although we also briefly address confounding control, effect modification, and prevalence estimation.
The main distinction between the two proposed approaches is that they use different frailty measures as a reference standard. Segal et al. anchor their claims-based frailty index to the binary frailty phenotype, defined as having ≥3 of 5 specified physical criteria (4), while Kim et al. anchor theirs to a continuous deficit-accumulation frailty index (5), defined as the proportion of 56 deficits that are present. Segal et al. used logistic regression to model frailty phenotype versus claims data (314 candidate variables; 21 selected) and classified patients as “frail” if the predicted probability of phenotypic frailty was ≥0.2, while Kim et al. used linear regression to model frailty index versus claims data (580 candidate variables; 93 selected) and defined their claims-based frailty index as the predicted values from that model. While the imputation models differ, both groups suggest simply modeling the imputed frailty variable to estimate associations for frailty.
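The two constructions just described can be sketched in a few lines. This is only an illustration of their general form: the three claims indicators, the coefficients, and the function names below are hypothetical and are not taken from either group's fitted model.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def dichotomized_index(claims, coef, intercept, cutpoint=0.2):
    """Logistic-model form: predicted probability of phenotypic frailty,
    classified as frail (1) when the probability is >= cutpoint."""
    prob = expit(intercept + claims @ coef)
    return prob, (prob >= cutpoint).astype(int)

def continuous_index(claims, coef, intercept):
    """Linear-model form: the index is the predicted value itself."""
    return intercept + claims @ coef

# Three hypothetical patients, three hypothetical binary claims indicators.
claims = np.array([[0, 0, 0],
                   [1, 0, 1],
                   [1, 1, 1]], dtype=float)

# Hypothetical coefficients standing in for models fitted in validation data.
logit_coef, logit_intercept = np.array([0.8, 0.5, 1.2]), -3.0
lin_coef, lin_intercept = np.array([0.05, 0.03, 0.08]), 0.10

prob, frail = dichotomized_index(claims, logit_coef, logit_intercept)
cfi = continuous_index(claims, lin_coef, lin_intercept)
```

In both cases the resulting index is a deterministic function of the claims variables alone, which is the feature the remainder of this commentary examines.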
Segal et al.’s procedure is nonstandard, but it can be viewed as an offshoot of multiple imputation, with 21 auxiliary variables (i.e., variables used in the imputation model but not in the subsequent regression model), a single deterministic imputation rather than multiple random draws, and the outcome variable (e.g., hospitalization, falls, death) omitted from the imputation model. The third deviation is the key problem. In multiple imputation, the outcome variable must be included in the imputation model to avoid distorting relationships among variables and thus inducing bias (6). We see no reason for Segal et al.’s simpler single imputation approach to be insulated from these effects, and thus we strongly suspect that regressing health outcomes on the claims-based frailty index will yield biased point estimates for the corresponding associations for the reference standard frailty measure.
We created a web application (https://jhubiostatistics.shinyapps.io/cfi_sim/) to examine bias associated with Segal et al.’s approach in the context of a simple scenario where a logistic regression relates $Y$ (e.g., whether patient has a fall in the following year) to a binary $X$ (the reference standard frailty measure) and covariate $C$ (e.g., standardized age), $C$ informs $X$ through a logistic regression, and $C$ is marginally normal. The 7 parameters indexing the system (three in the $Y$ model, two in the $X$ model, and the mean and variance of $C$) can be adjusted, along with the main study and validation study sample size and the probability cutpoint $p$ used to classify $\hat{X}$ as 0 or 1. We consider Segal’s procedure as originally proposed as well as an alternate version where $Y$ is regressed on the predicted probability of $X$ rather than $\hat{X}$ (essentially, regression calibration for a binary exposure).
Across a wide range of parameter values, fitting $Y$ versus imputed $X$ does not validly estimate the crude or $C$-adjusted association relating $X$ and $Y$. Bias is often large in magnitude and can be toward the null (including flipped direction) or away from the null, relative to both the crude and adjusted association. One scenario where the predicted probability version works is when $C$ informs $X$ but not $Y$ independently of $X$, which is to be expected, as this corresponds to regression calibration where $C$ is a surrogate for $X$ (7).
A reviewer noted that we did not consider a version of Segal et al.’s procedure where $C$ is included as a covariate in the final step. The problem with adjusting for $C$ is that the imputed $X$ values are determined completely by the $C$ values. In the usual regression calibration setting where $X$ is continuous, it is not possible to fit a model with both variables because the imputed $X$ values are perfect linear combinations of the $C$ values (equivalently, the design matrix is less than full rank). Here, with binary $X$ and logistic regression relating $X$ to $C$, the model can technically be fitted thanks to the nonlinearity of the logistic function. We experimented with a $C$-adjusted version of the predicted probability version of Segal et al.’s estimator and generally observed unbiased but very imprecise estimation. This is not surprising considering that an indirect and presumably very weak signal is being leveraged for identifiability. A version of the web application that includes this estimator is available at https://jhubiostatistics.shinyapps.io/cfi_sim2/. We would not recommend using this estimator, which is essentially a new version of regression calibration (binary exposure and no surrogates) with uncharacterized statistical properties. To be clear, this approach corresponds to targeting a covariate-adjusted association between frailty and a health outcome and using the exact same set of covariates in the calibration model.
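For readers who prefer code to a web application, the simulation scenario above can be approximated as follows. This is a minimal sketch and not the code behind the application: the parameter values, sample sizes, seed, and all variable and function names are assumptions chosen for illustration, and the logistic fits use a plain Newton-Raphson routine written here.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(design, y, n_iter=25):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def simulate(n, rng, alpha, beta, mu_c=0.0, sd_c=1.0):
    """Covariate is normal; frailty and outcome follow logistic models."""
    c = rng.normal(mu_c, sd_c, n)
    x = rng.binomial(1, expit(alpha[0] + alpha[1] * c))
    y = rng.binomial(1, expit(beta[0] + beta[1] * x + beta[2] * c))
    return c, x, y

rng = np.random.default_rng(2019)
alpha = (-1.0, 1.0)        # assumed frailty model: intercept, covariate
beta = (-1.0, 1.0, 1.0)    # assumed outcome model: intercept, frailty, covariate
cutpoint = 0.2

# Step 1: validation data, where frailty is observed; model frailty on covariate.
c_v, x_v, _ = simulate(5_000, rng, alpha, beta)
alpha_hat = fit_logistic(np.column_stack([np.ones_like(c_v), c_v]), x_v)

# Step 2: main data; frailty is generated but treated as unobserved.
c_m, x_m, y_m = simulate(50_000, rng, alpha, beta)
p_hat = expit(alpha_hat[0] + alpha_hat[1] * c_m)   # predicted probability
x_hat = (p_hat >= cutpoint).astype(float)          # dichotomized imputation

ones = np.ones_like(c_m)
# Benchmarks that would be available only if frailty were observed.
b_adjusted = fit_logistic(np.column_stack([ones, x_m, c_m]), y_m)[1]
b_crude = fit_logistic(np.column_stack([ones, x_m]), y_m)[1]
# Plug-in analyses: outcome on dichotomized imputation, and on predicted probability.
b_dichotomized = fit_logistic(np.column_stack([ones, x_hat]), y_m)[1]
b_probability = fit_logistic(np.column_stack([ones, p_hat]), y_m)[1]
```

Comparing `b_dichotomized` and `b_probability` with `b_crude` and `b_adjusted` under different choices of `alpha`, `beta`, and `cutpoint` gives a rough sense of the behavior described above. Setting the third element of `beta` to 0 corresponds to the surrogate scenario in which the predicted probability version is expected to perform well.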
Shifting focus to Kim et al.’s approach, their procedure is essentially regression calibration, but with 93 “surrogates” rather than just 1. The problem is that some (perhaps most) of these 93 variables are likely not true surrogates in the sense required for regression calibration. In a standard application of regression calibration, the model for the missing variable (i.e., the reference standard frailty measure) has the same covariates as the disease model of interest (perhaps age and sex) plus 1 additional variable—the surrogate—which is presumed to inform the exposure but not the outcome conditional on exposure and covariates. A natural choice for the surrogate is a mismeasured version of the exposure, because such a variable is generally quite informative of the true exposure but can safely be assumed uninformative of outcome given the true exposure (7). While regression calibration can accommodate multiple surrogates (8, 9), they all must be conditionally independent of the outcome. Given the nature of Kim et al.’s 93 items, it is highly improbable that all 93 would satisfy this requirement for any health outcome.
As a result of nonsurrogacy, one cannot expect Kim et al.’s procedure to recover regression coefficients relating the reference standard frailty measure to health outcomes. In a simpler setting (e.g., with 1 or 2 rather than 93 variables), it might be possible to work out an expression for the quantity that is ultimately being estimated, and perhaps derive a bias-correction formula and/or conditions in which the direction of bias can be predicted. This is beyond our scope and would likely be extremely difficult with nearly 100 variables involved, but we think it would be a necessary step if Kim et al.’s procedure is to be used to approximate inferences with respect to the reference standard in practice.
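As a hedged illustration of what such an expression might look like, consider a deliberately simplified setting that is assumed here and is not the setting of Kim et al.: a linear outcome model, a single claims variable $Z$, and a linear calibration model. All symbols below ($Y$ for the outcome, $X$ for the reference standard frailty measure, and the $\alpha$ and $\beta$ coefficients) are introduced only for this sketch.

```latex
\begin{align*}
E[Y \mid X, Z] &= \beta_0 + \beta_X X + \beta_Z Z, \\
E[X \mid Z] &= \alpha_0 + \alpha_1 Z, \qquad \alpha_1 \neq 0, \\
E[Y \mid Z] &= \beta_0 + \beta_X E[X \mid Z] + \beta_Z Z \\
            &= \left(\beta_0 - \frac{\beta_Z \alpha_0}{\alpha_1}\right)
               + \left(\beta_X + \frac{\beta_Z}{\alpha_1}\right) E[X \mid Z].
\end{align*}
```

Under these assumptions, regressing $Y$ on the calibrated value $E[X \mid Z]$ targets the slope $\beta_X + \beta_Z / \alpha_1$ rather than $\beta_X$. The two coincide only when $\beta_Z = 0$, that is, when $Z$ is a true surrogate, and the bias can be in either direction depending on the signs of $\beta_Z$ and $\alpha_1$. Nothing this simple should be expected to hold with many variables or with nonlinear models.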
How might frailty researchers proceed? If anchoring to a reference standard frailty measure is considered very important, modified versions of the procedures proposed by Segal et al. and Kim et al. could be further pursued. For practical purposes, we believe that internal validation data—where the reference standard frailty measure, outcome, and covariates are all measured on the same subjects—would be a key component. Operationally, this would require measuring the reference standard frailty measure in a subsample of study participants. Such data would permit a proper application of multiple imputation, where the outcome is included in the imputation model. Including the claims-based classifier variables would not be necessary, although doing so would improve statistical efficiency (10). Absent internal validation data, we see little promise in pursuing regression calibration, because it would be difficult to find one or more variables that are 1) reasonably informative of the reference standard frailty measure (for precision), yet 2) conditionally independent of the outcome (for unbiasedness).
For researchers amenable to deficit accumulation indexes, we see merit in Kim et al.’s secondary approach of constructing a claims-based frailty index directly from a designated set of claims-based deficits (3). They favored the regression calibration claims-based frailty index because it resulted in a higher correlation with the reference standard frailty measure and a higher C statistic with mortality, but with the aforementioned surrogacy problem it is not clear what quantity the procedure ultimately estimates. Conversely, using a direct deficit-accumulation index strikes us as computationally simple and theoretically defensible, and it avoids creating a false expectation that inferences will be analogous to modeling the “anchored” reference standard frailty measure.
We conclude by discussing some use cases other than risk prediction. For confounding control, adjusting for an imputed or calibrated reference standard frailty measure could lead to a major residual confounding problem. Kim’s claims-based frailty index explained less than 40% of the variability in the reference standard frailty measure, so controlling for the claims-based frailty index would likely constitute an underadjustment for frailty status. This degree of misclassification could also undermine estimation of frailty-related interactions, inducing bias and leading to a considerable loss of power relative to observing the reference standard frailty measure (11). For estimating frailty prevalence, transporting a fitted logistic regression model for frailty status conditional on claims data (e.g., Segal’s imputation model, summarized in their Table 2 (1)) seems promising. Segal et al. reported both the mean predicted probability of frailty and the proportion of study participants with predicted probabilities greater than 0.2 as estimators for frailty prevalence (2). The statistical properties of these estimators should be examined in future work—specifically, whether they are unbiased; if both are unbiased, which is more efficient; and how confidence intervals can be constructed that properly reflect uncertainty in the predicted probabilities.
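The two prevalence estimators mentioned above are simple to compute, and a toy example shows why their properties deserve scrutiny. The sketch below is not an analysis of either group's data: it assumes a single hypothetical claims-based score, a correctly specified and perfectly transportable logistic model with made-up coefficients, and no estimation error in the predicted probabilities.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def prevalence_estimates(pred_prob, cutpoint=0.2):
    """Return the two candidate estimators of frailty prevalence."""
    pred_prob = np.asarray(pred_prob, dtype=float)
    return {
        "mean_probability": pred_prob.mean(),
        "proportion_above_cutpoint": (pred_prob > cutpoint).mean(),
    }

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical standardized claims-based score and assumed true model.
score = rng.normal(0.0, 1.0, n)
true_prob = expit(-2.0 + 1.0 * score)

# Frailty status, which would be unobserved in practice.
frail = rng.binomial(1, true_prob)
true_prevalence = frail.mean()

estimates = prevalence_estimates(true_prob)
```

In this idealized configuration the mean predicted probability tracks `true_prevalence` closely, whereas the proportion above the cutpoint need not, because it depends on where the cutpoint falls relative to the distribution of predicted probabilities. This does not answer the questions posed above about estimated models, relative efficiency, or confidence intervals.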
ACKNOWLEDGMENTS
Author affiliations: Department of Biostatistics and Center on Aging and Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland (Dane R. Van Domelen, Karen Bandeen-Roche).
This research was supported by the Epidemiology and Biostatistics of Aging Training Program, funded by the National Institute on Aging (grant T32AG000247).
Conflict of interest: none declared.
REFERENCES
1. Segal JB, Chang H-Y, Du Y, et al. Development of a claims-based frailty indicator anchored to a well-established frailty phenotype. Med Care. 2017;55(7):716–722.
2. Segal JB, Huang J, Roth DL, et al. External validation of the claims-based frailty index in the National Health and Aging Trends Study cohort. Am J Epidemiol. 2017;186(6):745–747.
3. Kim DH, Schneeweiss S, Glynn RJ, et al. Measuring frailty in Medicare data: development and validation of a claims-based frailty index. J Gerontol A Biol Sci Med Sci. 2018;73(7):980–987.
4. Fried LP, Tangen CM, Walston J, et al. Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci. 2001;56(3):M146–M156.
5. Mitnitski AB, Mogilner AJ, Rockwood K. Accumulation of deficits as a proxy measure of aging. Scientific World Journal. 2001;1:323–336.
6. Moons KGM, Donders RART, Stijnen T, et al. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol. 2006;59(10):1092–1101.
7. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med. 1989;8(9):1051–1069.
8. Weller EA, Milton DK, Eisen EA, et al. Regression calibration for logistic regression with multiple surrogates for one exposure. J Stat Plan Inference. 2007;137(2):449–461.
9. Kipnis V, Midthune D, Freedman LS, et al. Regression calibration with more surrogates than mismeasured variables. Stat Med. 2012;31(23):2713–2732.
10. Raghunathan TE. What do we do with missing data? Some options for analysis of incomplete data. Annu Rev Public Health. 2004;25(1):99–117.
11. Carroll RJ, Ruppert D, Stefanski LA, et al. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Boca Raton, FL: CRC Press; 2006.
