Abstract
Kaplan-Meier survival curves are the most common method for unadjusted comparison of outcomes between groups in orthopedic research. However, they may be misleading when confounders are imbalanced between patient groups. The Cox model is frequently used to adjust for confounders, but adjusted survival curves are rarely displayed. We describe circumstances in which adjusted survival curves are useful in orthopedic research, describe and apply two methods for obtaining them, and illustrate how they can improve understanding of multivariable Cox model results. We also provide practical strategies for identifying when adjusted survival curves are needed and for constructing them.
Keywords: survival analysis, total joint arthroplasty
Introduction
Kaplan-Meier survival curves are the most common method for unadjusted display of many different outcomes in total joint arthroplasty (TJA). However, these curves may be misleading because of confounding. Confounding factors are baseline variables that are associated both with the exposure (or group) of interest and with the TJA outcome of interest, such as revision or mortality. Common confounders in TJA include patient characteristics such as age, sex, and comorbidities. Although Cox models are the standard tool for comparing groups while adjusting for confounders, publications often pair adjusted hazard ratios from a Cox model with unadjusted survival curves, which can cause confusion because the curves may not agree with the model results. Without appropriate adjustment for confounding factors, outcome rates cannot be fairly compared across groups.
There are many statistical methods for confounder adjustment, including covariate matching, stratification, inverse probability weighting (IPW), and model-based adjustment1. Covariate matching selects patients into each group of interest so that the groups have similar confounder levels, which makes the distribution of the matching factors the same in each group. For example, matching on sex yields the same proportion of men and women in each group. Stratification involves graphing separate survival curves for each stratum of the confounders, such as separate curves for men and women. However, covariate matching and stratification can be difficult to apply when the sample size is small, when there are many confounders, or when some confounders are continuous. IPW is a direct adjustment method that generates a “pseudo-sample” in which the imbalanced confounders become balanced between groups2. Each patient receives a weight: patients whose characteristics are over-represented in their group are assigned lower weights, and those whose characteristics are under-represented are assigned higher weights. Model-based adjustment fits a Cox model that includes all of the confounders and the groups of interest, and then uses that model to generate adjusted predictions3. The Cox model is frequently used to adjust for confounders, but adjusted survival curves are not often displayed. If properly applied, these methods should give similar results.
In an illustrative example, we create adjusted survival curves using IPW and model-based adjustment (via a stratified Cox model) to show how to account for confounders when graphically displaying absolute mortality risk for revision total knee arthroplasty (TKA) patients with different surgical indications (loosening/wear, infection, instability, fracture, and other). These methods apply to other time-to-event outcomes, not just mortality; another example relevant to orthopedic research is comparing surgical complication rates between groups of patients with differing characteristics. We then describe the main considerations, along with practical guidelines that researchers and reviewers can use to identify the need for adjusted survival analyses and to perform them.
Main considerations for adjusted survival curves
In the illustrative example described below, patient age, sex, and comorbidity burden (Charlson comorbidity index) are confounders because they differ across the 5 TKA revision indication groups and are associated with survival. Unadjusted survival curves ignore these differences: each curve reflects only the covariate distribution observed within its own subgroup. This produces the disconnect between the unadjusted survival curves and the adjusted Cox model results across the 5 groups. Adjusted survival curves instead standardize every group to the confounder distribution of the entire study population, so they agree better with the adjusted Cox model results.
There are several considerations in adjusted survival analyses including the population of interest, confounders to adjust for, and the adjustment method.
Population of interest:
An important consideration when creating adjusted curves is the population of interest. In TJA research, the most common population of interest is the entire population of patients who have undergone TJA. This is commonly operationalized by using the entire study population, provided the study population can reasonably estimate the population of interest.
Confounders:
Confounders to adjust for should include baseline factors that are both imbalanced between the groups of interest and are associated with the outcome of interest. In our example, age, sex and comorbidities are clearly different across the TKA revision groups and are related to the outcome of survival.
Adjustment method:
There are multiple methods for confounder adjustment, including covariate matching, stratification, IPW, and model-based adjustment; we demonstrate two of them in the example. All adjustment methods should provide similar results if correctly applied. IPW is the most flexible method and requires the fewest assumptions, but it may perform poorly in small samples (n<100)4. In addition, because the IPW weights are created in a separate step before the adjustment, the balance of characteristics in the weighted pseudo-population can be assessed directly, which is a valuable diagnostic of the validity of the adjustment. Direct adjustment using a standard Cox model can be problematic because the model assumes that hazards are proportional between groups. In the example, we avoided this assumption by stratifying the Cox model by TKA revision category.
With either method, adjusted survival curves improve interpretation relative to an unadjusted Kaplan-Meier plot and allow a fairer comparison between groups by removing confounding due to the adjusted covariates. In our example, patient age, sex, and Charlson comorbidity index can affect revision and mortality risk and are therefore considered confounders; without proper adjustment for these variables, incorrect conclusions may be drawn. The unadjusted survival curves show greater variation between TKA indication groups than the adjusted survival curves. Standardizing each TKA revision category to the age, sex, and comorbidity distribution of the entire population, whether through IPW or through covariate adjustment with a Cox regression model, provides similar results.
A limitation of direct-adjusted survival curves is that not all statistical software can create them, and some packages erroneously plug in the mean of each confounding covariate to obtain a single curve for each group.
This is a problem because averaging the predicted survival curves of individual patients (given their group and covariates) does not give the same result as predicting survival for a single patient whose covariates are set to their average values5. Such a “mean covariate curve” does not correspond to any single patient (no patient is half female, for instance), nor does it correspond to any population.
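A minimal numerical sketch of why the mean-covariate curve is wrong, assuming a hypothetical proportional-hazards model with a single binary covariate (for example, sex coded 0/1) and made-up values for the baseline survival and hazard ratio; none of these numbers come from the example study.

```python
import math

# Hypothetical proportional-hazards model, assumed for illustration only:
# S(t | x) = S0(t) ** exp(beta * x), with one binary covariate x.
BETA = math.log(4.0)  # hypothetical log hazard ratio for x = 1 vs x = 0
S0_AT_T = 0.90        # hypothetical baseline survival at a fixed time t


def survival(x: float) -> float:
    """Model-predicted survival at time t for covariate value x."""
    return S0_AT_T ** math.exp(BETA * x)


# Hypothetical population: half the patients have x = 0, half have x = 1.
population = [0] * 50 + [1] * 50

# Correct population-average curve: average the individual predicted curves.
average_of_curves = sum(survival(x) for x in population) / len(population)

# Incorrect "mean covariate" curve: plug the mean of x (0.5) into the model.
mean_x = sum(population) / len(population)
curve_at_mean = survival(mean_x)

print(f"average of individual curves: {average_of_curves:.4f}")
print(f"curve at the mean covariate:  {curve_at_mean:.4f}")
```

Here the averaged survival is about 0.78 (the mean of 0.90 and 0.90^4), while the curve evaluated at x = 0.5, a "half-female" patient, gives 0.81. The gap widens as the covariate effect grows.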
What to look for as a researcher and as a reviewer
(1). What is the study design?
Adjusted survival curves apply to longitudinal studies in which a group of patients is followed over time for outcomes of interest, most commonly observational cohort studies. They do not apply to cross-sectional or case-control studies. They are typically unnecessary in randomized controlled trials, because randomization is expected to balance known and unknown confounders between groups. However, it is not incorrect to use adjustment methods to correct for chance imbalances in baseline characteristics that may occur in randomized controlled trials; issues to consider in this setting are discussed in the literature6,7.
(2). What is the outcome and how was it collected?
As with any time-to-event analysis, consider whether follow-up processes were the same for patients with and without the event, or if bias could be introduced by the follow-up process. For example, if a survey response was required to assess outcomes, those with the event may have been too sick or unable to respond.
(3). Does the study involve comparison of groups?
Adjusted survival curves are needed only when comparing the outcomes across two or more comparison groups. When there are no comparison groups, a single survival curve represents the study population without requiring adjustment.
(4). Are there differences in baseline characteristics between the groups?
If there are no differences in baseline characteristics between the groups, adjustment may not be necessary. Think about what characteristics may influence the rate of the outcome (e.g., age), and check to see if these were reported. If they are reported and differences are found, then adjustment methods should be considered.
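One common way to quantify baseline differences, and later to check balance after IPW weighting, is the standardized mean difference (SMD). The sketch below is a minimal, dependency-free version; the function names are our own, the variance uses a weighted population-style divisor, and the often-quoted rule of thumb that an absolute SMD below about 0.1 indicates adequate balance is a widely used convention, not a strict threshold. For a binary covariate coded 0/1 (such as sex), the same formula gives the usual proportion-based SMD, because the weighted variance reduces to p(1 - p).

```python
import math
from typing import Optional, Sequence, Tuple


def _weighted_mean_var(x: Sequence[float], w: Optional[Sequence[float]]) -> Tuple[float, float]:
    """Weighted mean and weighted (population-style) variance of x."""
    weights = [1.0] * len(x) if w is None else [float(v) for v in w]
    total = sum(weights)
    mean = sum(wi * xi for wi, xi in zip(weights, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(weights, x)) / total
    return mean, var


def standardized_mean_difference(
    x_a: Sequence[float],
    x_b: Sequence[float],
    w_a: Optional[Sequence[float]] = None,
    w_b: Optional[Sequence[float]] = None,
) -> float:
    """(mean_a - mean_b) / sqrt((var_a + var_b) / 2); weights are optional."""
    mean_a, var_a = _weighted_mean_var(x_a, w_a)
    mean_b, var_b = _weighted_mean_var(x_b, w_b)
    return (mean_a - mean_b) / math.sqrt((var_a + var_b) / 2)


# Hypothetical ages in two groups, before any weighting.
print(round(standardized_mean_difference([60, 70, 80], [50, 60, 70]), 3))  # 1.225
```

Passing IPW weights through `w_a` and `w_b` gives the weighted SMD in the pseudo-population, which should shrink toward zero if the weighting has balanced that covariate.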
(5). Is there a need to display differences using survival curves?
Hazard ratios from Cox models provide relative risk estimates, but often absolute risk estimates obtained from survival curves are more informative. For instance, a hazard ratio of 2.0 may seem large, but if it corresponds to a change in event rate from 1% to 2%, it may not be a clinically meaningful increase.
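The arithmetic behind this example can be made explicit. Under proportional hazards, survival in a group with hazard ratio HR relative to a reference group satisfies S_group(t) = S_ref(t)^HR, so the same hazard ratio translates into very different absolute risks depending on the reference event rate. The sketch below assumes proportional hazards and uses illustrative reference event rates of 1% and 30%; the function name is our own.

```python
def event_rate_with_hr(reference_event_rate: float, hazard_ratio: float) -> float:
    """Cumulative event probability by time t in a group whose hazard is
    hazard_ratio times the reference hazard, assuming proportional hazards:
    S_group(t) = S_ref(t) ** hazard_ratio."""
    reference_survival = 1.0 - reference_event_rate
    return 1.0 - reference_survival ** hazard_ratio


# Same hazard ratio (2.0), two illustrative reference event rates.
for reference in (0.01, 0.30):
    rate = event_rate_with_hr(reference, hazard_ratio=2.0)
    print(f"{reference:.0%} -> {rate:.2%} (absolute increase {rate - reference:.2%})")
```

With a 1% reference event rate, HR = 2.0 gives about 1.99%, an absolute increase of roughly 1 percentage point; with a 30% reference rate, the same hazard ratio gives 51%, an absolute increase of 21 percentage points.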
(6). Was the adjusted analysis properly conducted?
Many software packages produce direct adjusted curves only by plugging in the mean of each covariate, which is generally incorrect. Correct adjustment typically requires custom programming and careful choice of the population to which the curves are standardized, so it is a good idea to involve a biostatistician in the research team. In addition, if the adjusted curves look proportional to one another and every curve has steps (i.e., events) at exactly the same time points, the Cox model used for adjustment was likely not stratified by group and therefore assumed proportional hazards across groups. In that case, stratifying the Cox model by group or using IPW may provide better adjustment.
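The step-timing check described above can be automated. This is a hypothetical helper, not a function from any statistical package: each curve is given as paired lists of times and survival values on a shared time grid, and the check reports whether all curves drop at exactly the same set of times. A `True` result is a hint, not proof, that an unstratified Cox model with a shared baseline hazard was used.

```python
from typing import Dict, Sequence, Set, Tuple

Curve = Tuple[Sequence[float], Sequence[float]]  # (times, survival values)


def drop_times(curve: Curve) -> Set[float]:
    """Times at which a step-function survival curve decreases."""
    times, survival = curve
    drops = set()
    previous = 1.0  # survival curves start at 1
    for t, s in zip(times, survival):
        if s < previous:
            drops.add(t)
        previous = s
    return drops


def all_curves_step_together(curves: Dict[str, Curve]) -> bool:
    """True if every group's curve drops at exactly the same set of times."""
    step_sets = [drop_times(c) for c in curves.values()]
    return all(steps == step_sets[0] for steps in step_sets[1:])


# Usage with hypothetical curves evaluated on a shared time grid (years).
grid = [1, 2, 3, 4]
shared = {"A": (grid, [0.95, 0.90, 0.90, 0.80]), "B": (grid, [0.93, 0.86, 0.86, 0.73])}
distinct = {"A": (grid, [0.95, 0.90, 0.90, 0.80]), "B": (grid, [0.95, 0.88, 0.82, 0.82])}
print(all_curves_step_together(shared))    # True
print(all_curves_step_together(distinct))  # False
```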
In conclusion, unadjusted survival curves can be misleading when patient characteristics differ between groups, and they may be inconsistent with adjusted model results. It is often helpful to present adjusted survival curves alongside an adjusted analysis to visualize a fair comparison of event rates between groups.
Figure 1.
Survival curves by surgical indication for revision total knee arthroplasty: unadjusted (Panel A), adjusted using inverse probability weighting (Panel B), and adjusted using direct adjustment from a stratified Cox model (Panel C).
Table 1.
Patient characteristics by surgical indication in a cohort of revision TKA surgeries
Characteristic | Loosening/Wear (N=1740) | Infection (N=1370) | Instability (N=990) | Fracture (N=399) | Other (N=408) |
---|---|---|---|---|---|
Age at TKA, years, mean (SD) | 69.5 (10.6) | 67.8 (11.2) | 65.9 (10.8) | 71.7 (11.5) | 63.9 (12.6) |
Sex, female, n (%) | 886 (51%) | 643 (47%) | 600 (61%) | 232 (58%) | 245 (60%) |
Charlson comorbidity index, mean (SD) | 0.7 (1.2) | 1.4 (1.7) | 0.7 (1.1) | 1.1 (1.6) | 0.8 (1.3) |
Abbreviations: SD=standard deviation; TKA=total knee arthroplasty
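Both adjustment methods standardize every indication group to the covariate distribution of the whole cohort. As a quick arithmetic check, the pooled cohort size and the cohort-wide summaries can be recovered from Table 1 by weighting each group's summary by its size. This is a sketch only: the group means in Table 1 are rounded, so the pooled values are approximate, and adjustment balances the full covariate distributions, not just these means.

```python
# Group sizes and summaries transcribed from Table 1:
# name: (N, mean age in years, number female, mean Charlson index)
groups = {
    "Loosening/Wear": (1740, 69.5, 886, 0.7),
    "Infection": (1370, 67.8, 643, 1.4),
    "Instability": (990, 65.9, 600, 0.7),
    "Fracture": (399, 71.7, 232, 1.1),
    "Other": (408, 63.9, 245, 0.8),
}

total_n = sum(n for n, _, _, _ in groups.values())
# Size-weighted averages of the group means give the cohort-wide means.
pooled_age = sum(n * age for n, age, _, _ in groups.values()) / total_n
pooled_female = sum(female for _, _, female, _ in groups.values()) / total_n
pooled_charlson = sum(n * cci for n, _, _, cci in groups.values()) / total_n

print(f"N = {total_n}")
print(f"mean age = {pooled_age:.1f} years")
print(f"female = {pooled_female:.1%}")
print(f"mean Charlson index = {pooled_charlson:.2f}")
```

This gives N = 4907 (matching the study population), a mean age of about 68.0 years, 53.1% female, and a mean Charlson index of about 0.94.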
Illustrative Example: Mortality following Revision TKA.
Background:
While long-term mortality after primary total knee arthroplasty (TKA) is lower than that of the general population, it is unknown whether the same holds after revision TKA. A previous study examined long-term mortality trends after revision TKA8; that publication appropriately analyzed the data for each revision category separately but did not compare mortality between revision categories.
Study Population:
Data for this illustrative example come from a previously published retrospective cohort study8 of 4907 patients who underwent revision TKA between January 1985 and December 2015. Patients were classified by surgical indication into 5 groups (loosening/wear, infection, instability, fracture, and other) and followed at regular intervals until death or October 2017.
Confounders:
In addition to surgical indication, three covariates were considered confounders: age, sex, and comorbidity burden as measured by the Charlson index9. The Charlson index is a summary measure that assigns each of 17 comorbidities an integer weight from 1 to 6, with 1 indicating the least severe conditions; a patient's index is the sum of the weights of the comorbidities they have.
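The summation can be sketched as follows. The dictionary is an illustrative subset of the original Charlson weights, not the full list, and the condition names are our own identifiers; a real analysis would use the complete weight table and a validated algorithm for identifying comorbidities (for example, from diagnosis codes).

```python
# Illustrative SUBSET of the original Charlson weights (not the full list).
CHARLSON_WEIGHTS_SUBSET = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "diabetes_without_end_organ_damage": 1,
    "hemiplegia": 2,
    "moderate_or_severe_renal_disease": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
}


def charlson_index(conditions: set) -> int:
    """Sum of the weights of a patient's comorbidities (subset only)."""
    unknown = set(conditions) - set(CHARLSON_WEIGHTS_SUBSET)
    if unknown:
        raise ValueError(f"not in this illustrative subset: {sorted(unknown)}")
    return sum(CHARLSON_WEIGHTS_SUBSET[c] for c in conditions)


# A hypothetical patient with a prior myocardial infarction and renal disease.
print(charlson_index({"myocardial_infarction", "moderate_or_severe_renal_disease"}))  # 3
```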
Statistical methods:
Descriptive statistics (means, percentages, etc.) were used to summarize the patient characteristics by surgical indication. Unadjusted survival curves for each surgical indication were computed. Then a Cox model including age, sex, Charlson index and surgical indication was used to obtain comparisons between surgical indications adjusted for confounders (i.e., age, sex and Charlson index). Adjusted survival curves were produced using two direct-adjustment approaches: IPW and stratified Cox model-based adjustment.
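Unadjusted survival curves of this kind are conventionally Kaplan-Meier estimates, and the sketch below assumes that. It is a plain-Python Kaplan-Meier estimator with an optional weights argument: with no weights it gives the usual unadjusted curve, and with IPW weights it gives a weighted Kaplan-Meier curve of the type described by Cole and Hernán2. The function name and interface are our own, and the quadratic-time loop favors readability over speed.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """(Optionally weighted) Kaplan-Meier estimate as (time, survival) steps.

    events[i] is 1 if patient i died at times[i] and 0 if censored then.
    """
    w = [1.0] * len(times) if weights is None else [float(x) for x in weights]
    at_risk = sum(w)  # (weighted) number still under observation
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        deaths = sum(wi for ti, ei, wi in zip(times, events, w) if ti == t and ei == 1)
        leaving = sum(wi for ti, wi in zip(times, w) if ti == t)
        if deaths > 0:
            # Patients censored at t still count as at risk for deaths at t.
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= leaving
    return steps


# Hypothetical follow-up data (years); 0 marks a censored patient.
print([(t, round(s, 3)) for t, s in kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])])
# [(1, 0.75), (2, 0.5), (3, 0.0)]
```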
For the IPW method, each patient was weighted by the inverse of the probability that a patient with their set of covariates belonged to their observed surgical indication group. The weights were designed to make the distributions of age, sex, and Charlson index as similar as possible across the indication groups. Although multiple methods are available for calculating IPW weights, this analysis used a logistic regression model for each surgical indication group. After careful exploration of covariate relationships, the selected model included age (modeled with a spline to allow for non-linear effects), sex, and Charlson index. Predicted probabilities of membership in each surgical indication group were obtained from these models, and each patient's weight was the reciprocal (1/p) of the predicted probability for their own group. The weights were then scaled to sum to the number of patients in each group; defining the weights relative to each group's size in this way stabilizes them by the overall prevalence of each group.
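The weighting and scaling steps can be sketched as below. The logistic models (including the age spline) are assumed to have been fitted elsewhere; the hypothetical function `scaled_ipw_weights` only takes, for each patient, the predicted probability of belonging to the group they are actually in, and the example probabilities are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def scaled_ipw_weights(groups: Sequence[str], p_own_group: Sequence[float]) -> List[float]:
    """Inverse probability weights scaled to sum to each group's size.

    groups[i]      -- observed surgical-indication group of patient i
    p_own_group[i] -- predicted probability that patient i belongs to that group,
                      e.g. from that group's logistic model (fitted elsewhere)
    """
    raw = [1.0 / p for p in p_own_group]  # reciprocal of predicted probability

    raw_sum: Dict[str, float] = defaultdict(float)
    group_size: Dict[str, int] = defaultdict(int)
    for g, w in zip(groups, raw):
        raw_sum[g] += w
        group_size[g] += 1

    # Rescale within each group so its weights sum to the group's size.
    return [w * group_size[g] / raw_sum[g] for g, w in zip(groups, raw)]


# Hypothetical patients and predicted probabilities, for illustration only.
groups = ["Infection", "Infection", "Fracture"]
p_own = [0.50, 0.25, 0.80]
weights = scaled_ipw_weights(groups, p_own)
print([round(w, 3) for w in weights])  # [0.667, 1.333, 1.0]
```

The patient who looks least typical of their group (probability 0.25) receives the largest weight. The resulting weights can then be used in a weighted Kaplan-Meier estimate or a weighted Cox model, and their effect on covariate balance can be checked directly.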
For the model-based adjustment approach, predicted survival curves for each patient were calculated from a fitted Cox model that included age, sex, and Charlson index and was stratified by surgical indication. To obtain population-average curves for each surgical indication, adjusted to the age, sex, and Charlson index distribution of the entire study population, we took each of the five indication groups in turn, calculated a predicted survival curve for every one of the 4907 patients as if that patient belonged to that group, and averaged these individual curves at each time point.
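The averaging step can be sketched as follows. In a stratified Cox model the coefficients are shared across strata while each stratum has its own baseline survival, so a patient's predicted curve under group g is S0_g(t)^exp(x'β). The baseline survival values, coefficients, and patients below are hypothetical placeholders for quantities a fitted model would supply; `direct_adjusted_curve` is our own name, not a library function.

```python
import math
from typing import List, Sequence


def direct_adjusted_curve(
    baseline_survival: Sequence[float],
    beta: Sequence[float],
    population: Sequence[Sequence[float]],
) -> List[float]:
    """Population-average predicted survival for one stratum (group).

    baseline_survival -- the stratum's baseline survival S0_g(t) on a time grid
    beta              -- Cox coefficients, shared by all strata
    population        -- covariate vectors for EVERY patient in the study population
    Each patient's curve is S0_g(t) ** exp(x . beta); the adjusted curve is the
    arithmetic mean of these individual curves at each time point.
    """
    risk = [math.exp(sum(b * x for b, x in zip(beta, xi))) for xi in population]
    return [sum(s0 ** r for r in risk) / len(risk) for s0 in baseline_survival]


# Hypothetical fitted values (NOT from the study) on a grid of 2, 5 and 10 years.
baseline = {
    "Loosening/Wear": [0.95, 0.85, 0.70],
    "Infection": [0.92, 0.78, 0.60],
}
beta = [0.05, -0.20]  # hypothetical log-HRs for (age - 68) per year and female sex
population = [[-8.0, 1.0], [0.0, 0.0], [10.0, 1.0]]  # hypothetical patients

# Every group is averaged over the SAME population, so the adjusted curves
# differ only through the stratum-specific baseline survival.
adjusted = {g: direct_adjusted_curve(s0, beta, population) for g, s0 in baseline.items()}
for group, curve in adjusted.items():
    print(group, [round(s, 3) for s in curve])
```

Averaging the individual curves, rather than evaluating one curve at the mean covariate values, is what distinguishes this direct adjustment from the mean-covariate shortcut.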
Results:
There were notable differences in patient characteristics across the 5 surgical indication groups (Table 1). Mean age ranged from 63.9 to 71.7 years, the proportion of female patients from 47% to 61%, and the mean Charlson index from 0.7 to 1.4. The infection group had the highest mean Charlson index at the time of surgery, followed by the fracture group. The fracture group was the oldest at the time of revision TKA (mean age 71.7 years), while the “other” group was the youngest (mean age 63.9 years).
The unadjusted curves showed large differences in mortality between groups, with 10-year survival ranging from 53% for the infection group to 74% for the other group (Figure 1, panel A). However, these differences were partly due to differences in patient characteristics. The Cox model adjusted for age, sex, and Charlson index showed no significant differences for the instability (HR: 1.04; 95% CI: 0.91–1.19) or other (HR: 0.88; 95% CI: 0.72–1.06) groups compared with the loosening/wear group, borderline significantly poorer survival for the fracture group (HR: 1.18; 95% CI: 1.00–1.39), and the worst survival for the infection group (HR: 1.48; 95% CI: 1.33–1.66). Because these adjusted results differ from the pattern in the unadjusted survival curves, it is desirable to display adjusted curves.
The IPW-adjusted survival curves were closer together, with 10-year survival ranging from 56% for the infection group to 68% for the other group (Figure 1, panel B). The IPW-weighted Cox model showed similar adjusted survival for all groups except the infection group, which had significantly poorer survival than the other groups. Of note, the poor unadjusted survival of the fracture group, which resembled that of the infection group, was driven by the older age of the fracture group; after adjustment, survival in the fracture group improved.
Using direct adjustment methods with the stratified Cox model, results were similar but not identical to the IPW adjustment with 10-year survival rates ranging from 58% for the infection group to 71% for the other group (Figure 1, panel C).
Acknowledgments
Funding: This work was funded by National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) grant P30AR76312 and by the American Joint Replacement Research-Collaborative (AJRR-C). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
1. MacKenzie TA, Brown JR, Likosky DS, Wu Y, Grunkemeier GL. Review of case-mix corrected survival curves. Ann Thorac Surg. 2012;93(5):1416–1425.
2. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004;75(1):45–49.
3. Hu ZH, Gale RP, Zhang MJ. Direct adjusted survival and cumulative incidence curves for observational studies. Bone Marrow Transplant. 2020;55(3):538–543.
4. Raad H, Cornelius V, Chan S, Williamson E, Cro S. An evaluation of inverse probability weighting using the propensity score for baseline covariate adjustment in smaller population randomised controlled trials with a continuous outcome. BMC Med Res Methodol. 2020;20(1):70.
5. Ghali WA, Quan H, Brant R, et al. Comparison of 2 methods for calculating adjusted survival curves from proportional hazards models. JAMA. 2001;286(12):1494–1497.
6. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21(19):2917–2930.
7. Kahan BC, Jairath V, Dore CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials. 2014;15:139.
8. Yao JJ, Hevesi M, O’Byrne MM, Berry DJ, Lewallen DG, Maradit Kremers H. Long-Term Mortality Trends After Revision Total Knee Arthroplasty. J Arthroplasty. 2019;34(3):542–548.
9. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–383.