Abstract
Study Design
Retrospective analysis of patients undergoing elective lumbar fusion operations, comparing rates of repeat spine surgery based on method of ascertainment.
Objective
We report the accuracy of a claims-based approach for reporting repeat surgery compared to medical records abstraction as the “gold standard.”
Summary of Background Information
Previous studies have reported the validity of claims-based algorithms for grouping patients by surgical indication and classifying operative features, but the accuracy of these algorithms in measuring surgical quality indicators has not been widely examined.
Methods
We identified a subset of patients undergoing elective lumbar fusion operations at a single institution from 1996-2011, excluding those with spinal fracture, spinal cord injury, or cancer. From the medical record we abstracted the incidence of repeat spine operation or re-hospitalization within one year. We cross-classified each event record with its corresponding value derived from claims. The sensitivity and specificity of the claims-based approach were calculated for reoperation within 30, 90, and 365 days, and for all-cause hospital readmission within 30 days.
Results
Medical records linked to claims data were obtained for 520 patients undergoing elective lumbar fusion. Reoperation rates based on chart review were 1.0%, 1.3%, 3.6%, compared to 0.8%, 1.7%, and 3.8% based on the final claims methods at 30, 90, and 365 days, respectively. The claims-based algorithm had sensitivities of 80.0%, 100%, and 94.1% and specificities of 100%, 99.6%, 99.2% for repeat surgery within 30, 90, and 365 days, respectively. The sensitivity for all-cause readmission was 50%.
Conclusion
Health care quality improvement efforts often rely on administrative data to report surgical safety. We found that claims-based ascertainment of safety at a single institution was very accurate. However, accuracy depended on careful attention to the timing of outcomes, as well as the definitions and coding of repeat surgery, including how orthopaedic device removal codes are classified.
Keywords: lumbar spine, outcomes, reoperation, readmission, claims data, chart review, claims-based algorithm, sensitivity, spine surgery, retrospective study, lumbar fusion
Introduction
Claims data have been used to report trends in procedure rates, costs, and outcomes for spinal surgery more efficiently than manual chart review1,2,3,4. Several studies have shown that a claims-based algorithm can reliably classify spine surgery indication and type of procedure5,6. Martin et al. (2014) reported that a hierarchical claims-based algorithm for classifying surgical indication had a sensitivity greater than 75%5. One advantage of a claims-based approach is that it can be used to examine a large number of admissions in a relatively generalizable and systematic manner.
While claims-based algorithms appear to be useful for classifying surgical indications, their accuracy in measuring surgical quality indicators, such as reoperation and all-cause readmission, has not been widely examined. Reoperation rates following lumbar fusion surgery have been reported to be as high as 21.5% over an 11-year period7. Given the importance that surgeons and patients place on reoperation rates, it is important to have an efficient way to monitor and systematically report these outcomes in order to identify trends and isolate risk factors that can help preventative efforts8. While claims-based risk models have been used to measure surgical site infections in patients undergoing a total hip or knee replacement, development and validation of claims-based safety indicators (e.g. reoperations, re-admissions, etc.) for use in spinal surgery have lagged behind4.
We assessed the validity of a claims-based algorithm for identifying surgical quality indicators, focusing primarily on repeat spine surgery and all-cause readmission following lumbar fusion operations. We further sought to identify factors that explain why claims-based algorithms might misclassify these events.
Methods and Materials
Patient selection and data source
We retrospectively reviewed medical records of patients undergoing lumbar spinal fusion operations at a single institution, and linked them to administrative data containing diagnosis and procedure codes from the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) and the Current Procedural Terminology (CPT). We selected patients undergoing lumbar fusion, with or without decompression, for disc degeneration, disc herniation, spinal stenosis, spondylolisthesis, or scoliosis (Table 1) from an Orthopaedic Surgery department of a single hospital from 1996-2011. We excluded those with spinal fracture, spinal cord injury, spinal infection, or cancer. For all patients meeting the inclusion criteria, we examined hospital claims data and patient charts for analysis.
Table 1. Distribution of Surgical Indication for Lumbar Fusion Procedure.
Indication | Chart review: # of cases | Chart review: % of cases | Claims data: # of cases | Claims data: % of cases |
---|---|---|---|---|
Degenerative Disk Disease | 54 | 10.4% | 65 | 12.5% |
Herniated Disc | 54 | 10.4% | 72 | 13.8% |
Spinal Stenosis | 78 | 15.0% | 40 | 7.7% |
Spondylolisthesis | 279 | 53.7% | 289 | 55.6% |
Scoliosis | 40 | 7.7% | 54 | 10.4% |
Classification and chart review
Our primary focus was on evaluating repeat spine surgery, but we also assessed the validity of surgical indications. The quality indicators abstracted during the chart review focused on reoperations and readmissions for reasons such as device complication, wound infection, and life-threatening complication. Only reoperations that were not initially planned were counted as true reoperations in this study. For example, planned removals of electrical bone stimulators were not counted as reoperations.
We applied a previously published claims-based algorithm to characterize spinal operations by surgical indication, procedure, and operative features based on ICD-9-CM diagnosis and CPT procedure codes5. Successive claims were then linked for the same patient over time in order to calculate repeat surgery and readmission at 30, 90, and 365 days.
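The linkage step described above can be sketched as follows. This is a minimal illustration, not the study's actual code: the record layout (`patient_id`, `admit`, `discharge`, `spine_surgery`), the example rows, and the function name `flag_events` are all hypothetical, and the sketch assumes that each patient's earliest admission is the index fusion and that windows are counted from the index discharge date.

```python
from datetime import date

# Hypothetical claim records, one row per hospital admission.
claims = [
    {"patient_id": "A", "admit": date(2005, 1, 10), "discharge": date(2005, 1, 14), "spine_surgery": True},
    {"patient_id": "A", "admit": date(2005, 2, 20), "discharge": date(2005, 2, 23), "spine_surgery": True},
    {"patient_id": "B", "admit": date(2006, 3, 1), "discharge": date(2006, 3, 5), "spine_surgery": True},
    {"patient_id": "B", "admit": date(2006, 9, 15), "discharge": date(2006, 9, 16), "spine_surgery": False},
]

WINDOWS = (30, 90, 365)  # follow-up windows in days


def flag_events(claims, windows=WINDOWS):
    """Link each patient's claims over time and flag events per window.

    Assumption: the earliest admission per patient is the index fusion,
    and days are counted from the index discharge date.
    """
    by_patient = {}
    for row in sorted(claims, key=lambda r: (r["patient_id"], r["admit"])):
        by_patient.setdefault(row["patient_id"], []).append(row)

    flags = {}
    for pid, rows in by_patient.items():
        index, later = rows[0], rows[1:]
        result = {}
        for w in windows:
            # Admissions that start after index discharge and within w days of it.
            in_window = [
                r for r in later
                if 0 < (r["admit"] - index["discharge"]).days <= w
            ]
            result[f"reop_{w}"] = any(r["spine_surgery"] for r in in_window)
            result[f"readmit_{w}"] = bool(in_window)
        flags[pid] = result
    return flags


flags = flag_events(claims)
for pid, result in flags.items():
    print(pid, result)
```

Because the 365-day window contains the 30- and 90-day windows, any event flagged at 30 days is also flagged at the longer windows.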
Statistical Analysis
We cross-classified the claims-derived indicator for repeat spine surgery to its corresponding measure obtained from the chart review, considered our “gold standard”. Similar 2-by-2 cross-classification tables were reported for all-cause readmission and each of the surgical indications. We reported the sensitivity and specificity of the claims-based approach to chart review for: (1) surgical indications, (2) reoperation within 30, 90, and 365 days, and (3) all-cause re-hospitalization within 30 days.
Sensitivity is the proportion of patients with a complication on chart review whom the claims-based algorithm also classified as having that complication. Specificity is the proportion of patients without a complication on chart review whom the claims-based algorithm also classified as not having it.
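As an illustration of these definitions, the sketch below computes sensitivity and specificity from a 2-by-2 table, using the 30-day reoperation counts reported in Table 3 (4 of 5 chart-confirmed reoperations detected, and no false positives among 515 patients without a reoperation). The confidence-interval method is an assumption: the text does not state which method was used, and the exact (Clopper-Pearson) binomial interval shown here is one common choice. The function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials (assumed method)."""
    def solve(f):
        # Bisection for the root of a function that increases in p on (0, 1).
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity with chart review as the reference."""
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# 30-day reoperation counts from Table 3.
result = sens_spec(tp=4, fn=1, fp=0, tn=515)
sens_lo, sens_hi = exact_ci(4, 5)
print(f"Sensitivity: {result['sensitivity']:.1%} ({sens_lo:.1%} to {sens_hi:.1%})")
```

With only five chart-confirmed events at 30 days, the interval around the sensitivity estimate is necessarily wide, which is why the counts matter as much as the percentages.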
Descriptive information from discordant cases was explored to identify patterns that prompted us to make changes aimed at improving the validity of the claims approach. The final ICD-9-CM and CPT algorithm is presented in Table 2. Discordant cases were examined to identify possible factors leading to changes in sensitivity of the claims-based algorithm (Table 6).
Table 2. ICD-9-CM codes used to define the surgical quality measures.
Post discharge device complications | Measurement rules:
CPT codes 11012 20670 20680 22849 22850 22852 22855 22862 22865 22830 ICD-9 codes:
ICD-9 Procedure codes
|
Wound problems | Measurement rules:
CPT codes 10140 10180 11012 11042 11043 ICD-9 Diagnosis codes
ICD-9 Procedure codes
|
Life-threatening complications | Measurement rules:
ICD-9 Diagnosis codes:
ICD-9 Procedure codes:
|
Repeat lumbar spine surgery | Measurement rules:
Discectomy CPT codes 63010 63075 63076 63077 63078 63055 63056 63057 63064 63066 ICD-9 Procedure codes
Laminectomy CPT codes 63001 63003 63005 63011 63012 63030 63035 63042 63044 63017 63047 63048 63015 63016 63020 63040 63043 63045 63046 63050 63051 63172 63173 63182 63185 63191 63194 63195 63196 63197 63198 63199 63170 63180 63200 63250 63251 63252 63265 63266 63267 63268 63270 63271 63272 63273 63295 63170 63190 63275 63278 63280 63281 63282 63283 63285 63287 63290 63295 63012 63185 63200 63267 63272 ICD-9 Procedure codes
Fusion CPT codes 20930 20931 20936 20937 20938 22558 22585 22612 22614 22630 22830 22840 22841 22842 22843 22844 22845 22846 22847 22849 22851 22625 22650 20930 20931 20937 20938 22585 22614 22632 22532 22533 22534 22548 22554 22556 22590 22595 22600 22610 22800 22802 22804 22808 22810 22812 22848 22850 22852 22855 22633 ICD-9 Procedure codes
ADR CPT codes 22856 22857 0092T 0163T 22864 22865 0095T 0164T 22861 22862 0098T 0165T ICD-9 Procedure codes
Corpectomy CPT codes 63081 63082 63085 63086 63087 63088 63090 63091 63101 63102 63103 63300 63301 63302 63303 63304 63305 63306 63307 63308 Osteotomy/kyphectomy CPT codes 22206 22207 22208 22210 22212 22214 22216 22220 22222 22224 22226 22818 22819 Spacers CPT codes 0171T 0172T ICD-9 Procedure codes
Dynamic stabilizing devices ICD-9 Procedure codes
Other spinal procedure: ICD-9 Procedure codes
|
Results
After exclusions, claims were obtained for 520 patients undergoing elective lumbar fusion operations, including 54 (10%) with disc degeneration, 54 (10%) with disc herniation, 78 (15%) with spinal stenosis, and 319 (61%) with degenerative spondylolisthesis or scoliosis. The mean age of the cohort at the time of surgery was 55.3 years, with 221 (42.5%) males and 299 (57.5%) females.
Reoperation rates based on chart review were 1.0%, 1.3%, and 3.6%, compared to 0.8%, 1.7%, and 3.8% based on claims methods at 30, 90, and 365 days, respectively. The claims-based algorithm had sensitivities of 80.0%, 100%, and 94.1% and specificities of 100%, 99.6%, and 99.2% for classifying reoperation within 30, 90, and 365 days, respectively (Table 3). These values reflect the performance of the claims-based algorithm after it was modified through an iterative review of discordant cases. The initial analysis and the modifications to the algorithm are described below.
Table 3. Summary of the cross-classification of reoperations comparing “Gold Standard” chart review with the claims-based measure.
Reoperation within | Chart review (“Gold Standard”) | Claims: Yes | Claims: No | Total | Value (%) | 95% CI (%) |
---|---|---|---|---|---|---|
30 days | Yes | 4 | 1 | 5 | Sensitivity: 80.0 | 28.4 – 99.5 |
30 days | No | 0 | 515 | 515 | Specificity: 100 | 99.3 – 100 |
30 days | Total | 4 | 516 | 520 | | |
90 days | Yes | 7 | 0 | 7 | Sensitivity: 100 | 59.0 – 100 |
90 days | No | 2 | 511 | 513 | Specificity: 99.6 | 98.6 – 100 |
90 days | Total | 9 | 511 | 520 | | |
365 days | Yes | 16 | 1 | 17 | Sensitivity: 94.1 | 71.3 – 99.9 |
365 days | No | 4 | 499 | 503 | Specificity: 99.2 | 98 – 99.8 |
365 days | Total | 20 | 500 | 520 | | |
Initial discordance between the chart and claims data for reoperation arose from whether wound complications and removal of bone growth stimulators were counted as reoperations in the chart abstraction. We examined the 1-year surveillance data to look for patterns of discordance, since it includes the other 2 subgroups (30 and 90 days). We initially found a total of 15 discordant cases that were recorded as reoperations within 1 year in the chart review, but not in claims data. In seven of these 15 cases, an incision and drainage (I&D) for a wound infection was classified as a reoperation during the chart review but was not captured as a reoperation by the claims-based approach. A possible explanation for this discordance is that the chart review accounted for all I&Ds, including those done in an ambulatory setting, while the claims data only accounted for I&Ds that required a hospital admission. Additionally, there were a total of 11 discordant cases for which a reoperation was identified in the claims data, but not in the chart review. Five of these 11 cases had an electrical bone stimulating device removal procedure within a year of the original operation, which was not included in the chart review definition of a reoperation. The chart review excluded electrical bone stimulating device removal because it is a planned procedure that typically takes place 6-12 months after the initial operation.
Overall, the initial analysis showed that the handling of incision and drainage procedures for wound infections and of electrical bone stimulating device removal procedures accounted for a large share of the discordant cases. We improved the accuracy of the claims-based algorithm by making 5 modifications:
1. Refining the consistency and timing for reporting outcomes: reoperations during the initial admission (i.e. return to OR) were not counted, only those after the initial discharge date.
2. Requiring postoperative “events” to involve at least one overnight stay in the hospital (important for wound problems).
3. Not requiring repeat spine operations to be explicitly coded with a spine diagnosis (this improves the sensitivity but lowers the specificity; hence the remaining discordant cases had “device problem” codes that were related to hip and knee procedures).
4. Not counting electrical bone stimulating device removal procedures as an “event” and removing the corresponding procedure code from the claims-based algorithm.
5. Not counting “arthrodesis status” as a repeat spine operation in the absence of other procedure codes.
These modifications helped to eliminate the ambiguity associated with the I&D procedures and addressed both revision surgeries performed during the initial admission and the inclusion of electrical bone stimulating device removal procedures as reoperations. After the modifications were incorporated, the algorithm's sensitivity increased to 100% and 94.1% for reoperation within 90 and 365 days, respectively, while the specificity remained high. Not requiring the reoperations to be co-coded with a spine-specific diagnosis improved our ability to identify patients undergoing reoperation for orthopaedic device removals, but also introduced some misclassification from patients undergoing joint replacement revisions. Overall, these modifications, which made the algorithm less restrictive, improved its agreement with chart review.
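The eligibility rules behind these modifications can be expressed as a simple filter on candidate events. The sketch below is illustrative only and is not the study's implementation: the event layout and function name are hypothetical, the two excluded-code labels are placeholders rather than real CPT or ICD-9-CM codes, and `REPEAT_SURGERY_CODES` is a small subset of the CPT codes listed in Table 2.

```python
from datetime import date

# Placeholder labels, NOT real CPT or ICD-9-CM codes (hypothetical).
STIMULATOR_REMOVAL = "HYPOTHETICAL_STIM_REMOVAL"
ARTHRODESIS_STATUS = "HYPOTHETICAL_ARTHRODESIS_STATUS"

# Small illustrative subset of the repeat-surgery CPT codes in Table 2.
REPEAT_SURGERY_CODES = {"22612", "63047", "22850"}


def counts_as_reoperation(event, index_discharge):
    """Apply the event-eligibility rules to one candidate claim.

    event: dict with 'admit' and 'discharge' dates and a set of 'codes'.
    """
    # Count only events after the initial discharge, which excludes
    # returns to the OR during the index admission.
    if event["admit"] <= index_discharge:
        return False
    # Require at least one overnight stay in the hospital.
    if (event["discharge"] - event["admit"]).days < 1:
        return False
    # Ignore stimulator removal and arthrodesis status on their own;
    # another qualifying procedure code must remain.
    remaining = event["codes"] - {STIMULATOR_REMOVAL, ARTHRODESIS_STATUS}
    # No spine-specific diagnosis is required alongside the procedure code.
    return bool(remaining & REPEAT_SURGERY_CODES)


index_discharge = date(2008, 5, 10)
revision = {"admit": date(2008, 7, 1), "discharge": date(2008, 7, 4), "codes": {"22612"}}
stim_only = {"admit": date(2008, 12, 1), "discharge": date(2008, 12, 2), "codes": {STIMULATOR_REMOVAL}}
print(counts_as_reoperation(revision, index_discharge))
print(counts_as_reoperation(stim_only, index_discharge))
```

Leaving out a diagnosis check mirrors the trade-off noted above: the filter catches more true reoperations, but a device-removal code from a hip or knee procedure would also pass.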
We also examined the accuracy of the algorithm for determining all-cause readmissions. For this category, the chart review recorded only hospitalizations within 30 days of the initial operation. The sensitivity for all-cause readmission within 30 days was 50%, while the specificity was 99.8% (Table 4). The claims-based algorithm showed a readmission rate of 0.96% at 30 days, while the chart review showed a readmission rate of 1.5%. In 4 discordant cases, the chart review recorded a readmission within 30 days that the claims-based approach failed to detect; in 1 discordant case, the claims-based approach indicated a readmission that was not recorded in the chart review.
Table 4. Summary of the cross-classification of all-cause readmissions comparing “Gold Standard” chart review with the claims-based measure.
Readmission within | Chart review (“Gold Standard”) | Claims: Yes | Claims: No | Total | Value (%) | 95% CI (%) |
---|---|---|---|---|---|---|
30 days | Yes | 4 | 4 | 8 | Sensitivity: 50.0 | 15.7 – 84.3 |
30 days | No | 1 | 511 | 512 | Specificity: 99.8 | 98.9 – 100 |
30 days | Total | 5 | 515 | 520 | | |
While our main focus was to examine the validity of claims-based methods for assessing surgical safety, we also validated surgical indications. The claims-based algorithm had a sensitivity of 72.2%, 81.5%, 39.7%, 92.1%, and 95% for degenerative disk disease, herniated disk, spinal stenosis, spondylolisthesis, and scoliosis, respectively (Table 5).
Table 5. Summary of the cross-classification of surgical indications comparing “Gold Standard” chart review with the claims-based indications.
Surgical indication | Chart review (“Gold Standard”) | Claims: Yes | Claims: No | Total | Value (%) | 95% CI (%) |
---|---|---|---|---|---|---|
Degenerative Disk Disease | Yes | 39 | 15 | 54 | Sensitivity: 72.2 | 58.4 – 83.5 |
Degenerative Disk Disease | No | 26 | 440 | 466 | Specificity: 94.4 | 91.9 – 96.3 |
Degenerative Disk Disease | Total | 65 | 455 | 520 | | |
Herniated Disc | Yes | 44 | 10 | 54 | Sensitivity: 81.5 | 68.6 – 90.7 |
Herniated Disc | No | 28 | 438 | 466 | Specificity: 94.0 | 91.4 – 96.0 |
Herniated Disc | Total | 72 | 448 | 520 | | |
Spinal Stenosis | Yes | 31 | 47 | 78 | Sensitivity: 39.7 | 28.8 – 51.5 |
Spinal Stenosis | No | 9 | 433 | 442 | Specificity: 98.0 | 96.2 – 99.1 |
Spinal Stenosis | Total | 40 | 480 | 520 | | |
Degenerative spondylolisthesis | Yes | 257 | 22 | 279 | Sensitivity: 92.1 | 88.3 – 95.0 |
Degenerative spondylolisthesis | No | 32 | 209 | 241 | Specificity: 86.7 | 81.8 – 90.7 |
Degenerative spondylolisthesis | Total | 289 | 231 | 520 | | |
Scoliosis | Yes | 38 | 2 | 40 | Sensitivity: 95.0 | 83.1 – 99.4 |
Scoliosis | No | 16 | 464 | 480 | Specificity: 96.7 | 94.6 – 98.1 |
Scoliosis | Total | 54 | 466 | 520 | | |
Discussion
We examined the validity of claims-based approaches for identifying repeat spine surgery and all-cause hospitalizations following lumbar fusion operations, incorporating improvements based on detailed examination of discordant cases. The major modification made to the claims-based algorithm after the initial analysis refined the definition of reoperation (timing, admission, etc.), especially whether wound infections requiring an incision and drainage count as reoperations. The final algorithm shown in Table 2 demonstrated excellent validity for reporting repeat spine surgery.
We also analyzed the validity of claims data for classifying surgical indications in relationship to other reports. The findings in this study were consistent with sensitivities reported by Kazberouk et al. (2015), which also examined the accuracy of the algorithm for surgical indication. Both studies found that spinal stenosis was a surgical indication that is difficult to identify in claims, reflected by its lower sensitivity (32.7% in the Kazberouk et al. study and 39.7% in our study). In contrast, the SPORT validation study found a much higher sensitivity of 88.1% for spinal stenosis6. The major differences between the two previous studies were that the SPORT comparison was among older patients who were rigorously evaluated to establish stenosis, while the Kazberouk et al. study generally involved a younger population without rigorous establishment of the clinical indication. Based on this, it would seem that a claims-based identification of spinal stenosis is sensitive to patient age and extent of work-up.
Our study has several limitations. First, the study was confined to fusion operations performed at a single institution (DHMC), which restricts the generalizability of the findings. The study of relatively rare safety events further limited our ability to estimate the accuracy of the claims-based methods with a high degree of precision. Second, the all-cause re-hospitalizations were not well characterized during the chart review, making it difficult to determine the reasons for discordance. Third, all-cause re-hospitalizations were only abstracted from the chart if they occurred within 30 days of the operation, since MedPAC considers this to be an important quality measure. While a high sensitivity of claims-based approaches for all-cause re-hospitalizations within 30 days is the ultimate goal, exploring the discordant cases within 1 year provided a larger dataset and allowed for better characterization of the variables resulting in the discordance.
Our study provides important insights into the parameters that influence the validity of the claims-based algorithm and provides evidence that it is possible to use such an algorithm to accurately measure surgical safety indicators. Future efforts should further characterize the reasons for re-hospitalization within 30 days of surgery and search for ways to improve the sensitivity of the algorithm. Expanding the validation studies to multiple institutions could improve the accuracy and generalizability of claims algorithms. Finally, it will also be important to evaluate the effect of the transition to ICD-10 on the performance of the coding algorithm.
Conclusion
In summary, we evaluated the validity of a claims-based algorithm for identifying reoperations following lumbar fusion surgery using a systematic chart review as the gold standard. We found that the performance of the algorithm depended on how wound problems and removal of electrical bone stimulators were defined. Our final algorithm demonstrated high sensitivity and specificity, supporting the usefulness of claims data for population-based surveillance of safety indicators. However, even with the improvement in sensitivity, our study should be combined with other efforts in order to have sufficient statistical power to be confident in the algorithm's reliability. Future efforts can investigate other factors, not explored in this study, that may affect the validity of the claims-based algorithm.
Acknowledgments
National Institute of Arthritis and Musculoskeletal and Skin Diseases funds were received in support of this work.
Relevant financial activities outside the submitted work: grants.
Footnotes
The manuscript submitted does not contain information about medical device(s)/drug(s).
Contributor Information
Neel K. Patel, Geisel School of Medicine at Dartmouth.
Rachel A. Moses, Dartmouth-Hitchcock Medical Center.
Brook I. Martin, Dartmouth-Hitchcock Medical Center.
Jon D. Lurie, Dartmouth-Hitchcock Medical Center.
Sohail K. Mirza, Dartmouth-Hitchcock Medical Center.
References
1. Nota S, Braun Y, Ring D, Schwab JH. Incidence of Surgical Site Infection After Spine Surgery: What Is the Impact of the Definition of Infection? Clin Orthop Relat Res. 2015;473:1612–19. doi: 10.1007/s11999-014-3933-y.
2. Deyo RA, Gray DT, Kreuter W, Mirza S, Martin BI. United States trends in lumbar fusion surgery for degenerative conditions. Spine. 2005;30(12):1441–5. doi: 10.1097/01.brs.0000166503.37969.8a.
3. Maradit Kremers H, Lewallen LW, Lahr BD, Mabry TM, Steckelberg JM, Berry DJ, Hanssen AD, Berbari EF, Osmon DR. Do Claims-based Comorbidities Adequately Capture Case Mix for Surgical Site Infections? Clin Orthop Relat Res. 2015;473:1777–86. doi: 10.1007/s11999-014-4083-y.
4. Martin BI, Deyo RA, Mirza SK, Turner JA, Comstock BA, Hollingworth W, Sullivan SD. Expenditures and health status among adults with back and neck problems. JAMA. 2008;299:656–64. doi: 10.1001/jama.299.6.656.
5. Kazberouk A, Martin BI, Stevens JP, McGuire KJ. Validation of an administrative coding algorithm for classifying surgical indication and operative features of spine surgery. Spine. 2015;40:114–20. doi: 10.1097/BRS.0000000000000682.
6. Martin BI, Lurie JD, Tosteson AN, Deyo RA, Tosteson TD, Weinstein JN, Mirza SK. Indications for spine surgery: validation of an administrative coding algorithm to classify degenerative diagnoses. Spine. 2014;39:769–79. doi: 10.1097/BRS.0000000000000275.
7. Martin BI, Mirza SK, Comstock BA, Gray DT, Kreuter W, Deyo RA. Reoperation rates following lumbar spine surgery and the influence of spinal fusion procedures. Spine. 2007;32:382–7. doi: 10.1097/01.brs.0000254104.55716.46.
8. Mirza SK, Konodi M, Martin BI, Spratt KF. Orthopaedic Knowledge Update: Spine 4. Rosemont, IL: American Academy of Orthopaedic Surgeons; Safety and functional outcome assessment in spine surgery; pp. 589–606.