Abstract
Introduction
On October 1, 2015, the Center for Medicare and Medicaid Services transitioned from the International Classification of Diseases, Ninth Revision (ICD-9) to the Tenth Revision (ICD-10) compendium of codes for diagnosis and billing in healthcare, but translation between the two is often inexact. Here we describe a validated crosswalk to translate ICD-9 codes into ICD-10 codes, with a focus on complications after carotid revascularization and endovascular aortic aneurysm repair.
Methods and Results
We devised an eight-step process to derive and validate ICD-10 codes from existing ICD-9 codes. We used publicly available sources, including the General Equivalence Mapping (GEM) database, to translate ICD-9 codes used in prior work to ICD-10 codes. We defined ICD-10 codes as “validated” if they were concordant with the initial ICD-9 codes after manual comparison by two physicians. Our primary validation measure was the percent of valid ICD-10 codes out of the total ICD-10 codes obtained during translation. We began with 126 ICD-9 diagnosis codes used for complication identification following carotid revascularization procedures, and 97 ICD-9 codes for complications following endovascular aortic aneurysm procedures. Translation generated 143 ICD-10 codes for carotid revascularization, a 14% increase from the initial 126 codes. Manual comparison demonstrated 98% concordance, with 99% agreement between the reviewers. Similarly, we identified 108 ICD-10 codes for endovascular aortic aneurysm repair, an 11% increase from the initial 97 ICD-9 codes. We again noted excellent concordance and agreement (98% and 100%, respectively). Manual review identified 4 ICD-10 codes incorrectly translated from ICD-9 codes for carotid revascularization, and 3 codes incorrectly translated for endovascular aortic aneurysm repair.
Conclusions
Algorithms to crosswalk lists of ICD-9 codes to ICD-10 can leverage electronic resources to minimize the burden of code translation. However, manual revision for code validation may be necessary, with collaboration across institutions for researchers to share their efforts.
Introduction
Insurance claims-based research is an integral component of cardiovascular knowledge generation.1–4 The most commonly used insurance claims coding system, the International Classification of Diseases, Ninth Revision (ICD-9), has been in use for over 3 decades, and during this time period the use of claims-based research has increased dramatically.5, 6 Investigators have used claims data to contribute to many important healthcare and policy advances in cardiovascular disease and many other medical fields.1–4, 7–17
The application of claims data to research is predicated on the use of billing codes as a proxy for the study of clinical events. The generation and validation of algorithms to define the codes which represent actual clinical events is an essential step in high-quality outcomes research.1–4, 11 Many stakeholders participate in this important step, including the Agency for Healthcare Research and Quality, the Center for Medicare and Medicaid Services, and independent investigators who publish their research findings as well as the coding algorithms which form the foundation of their work.1–4, 7, 9, 11, 14–16, 18, 19 By leveraging the efforts from these sources, robust repositories of ICD-9 codes that can be reliably used in clinical research have been created.
In October 2015, the Center for Medicare and Medicaid Services transitioned from ICD-9 to the Tenth Revision (ICD-10) list of billing codes.18 This transition expanded the number of available diagnosis codes from approximately 14,000 in ICD-9 to over 69,000 with ICD-10.18 This change created several challenges. First, validated lists of ICD-9 codes previously used to identify specific outcomes can no longer be used with ICD-10 data.18 Second, an exact match between an individual ICD-9 code and an individual ICD-10 code is uncommon.20, 21 Computer-based algorithms have been designed to translate ICD-9 into ICD-10 codes, such as the General Equivalence Mapping database, but these resources have not been tested for the purpose of clinical research.20
Therefore, it was our objective to use publicly available sources to validate a crosswalk by which a list of known ICD-9 codes could be translated into a list of ICD-10 codes. We studied codes used extensively in our research program as part of a longitudinal effort in cardiovascular outcomes research, with the belief that any “lessons learned” would apply more broadly to ICD-9 to ICD-10 transition efforts.
Methods
Analytic overview of our eight-step validation process
This report describes the analytic methods used by our research group for ICD-9 to ICD-10 code translation. The data and materials used for this report are cited and publicly available in the hope that they may serve as a resource, and that others may reproduce and replicate the procedure.
We devised an eight-step process to derive and validate ICD-10 codes from an existing set of ICD-9 codes representing outcomes across several body systems (Figure 1). This process was developed in an iterative fashion with input from all co-authors and shared with collaborators as part of an ongoing National Institute on Aging Program Project (P01-AG019783).
The first step began with a group of ICD-9 codes and their corresponding labels that had been validated for outcome analysis. In the second step, we translated the ICD-9 codes to ICD-10 codes using the General Equivalence Mapping database, and then linked the ICD-10 codes with their associated descriptive labels using the Center for Medicare and Medicaid Services label database.20 In the third step, we stratified by match type (1:1, i.e. one ICD-9 code to one ICD-10 code; 1:multiple, i.e. one ICD-9 code to multiple ICD-10 codes; or multiple:1, i.e. multiple ICD-9 codes to one ICD-10 code); and in the fourth step by match precision as determined by the General Equivalence Mapping database (i.e. “exact” match, or “approximate” match). During the fifth and sixth steps, we manually obtained and compared two sets of codes with descriptive labels to determine the concordance between the two groups. In the seventh step, we removed the duplicate codes, and in the eighth step, performed a backwards validation exercise to determine if the ICD-10 codes derived from the algorithm identified any clinically relevant ICD-9 codes not present in the list used during step 1.
Definition of successfully matched codes
We defined ICD-10 codes as successfully “validated” if the labels were in concordance with the initial ICD-9 code used for translation upon manual review by two physicians. Our primary deliverable was the percent of valid ICD-10 codes as a proportion of the total ICD-10 codes obtained during the translation process. As our primary deliverable was based on a manual comparison of two lists of codes, we calculated no p-values.
Step 1: Initial list of ICD-9 codes with ICD-9 labels
We used a series of prior publications to identify an initial list of ICD-9 codes to use for translation.2–4, 7, 9, 14, 17 Specifically, this encompassed ICD-9 codes used in outcome analysis after carotid revascularization (carotid endarterectomy and carotid artery stenting), and endovascular aortic aneurysm repair (Supplementary Tables 1 and 2).9, 11, 14 The ICD-9 codes evaluated represented several important outcomes to cardiovascular research, including stroke, heart attack, dysrhythmia, heart failure, and procedural wound infections, among others.
We defined a “code” as the alphanumeric ICD-9 or ICD-10 designation as determined by the Center for Medicare and Medicaid Services. We defined a “label” as the text description associated with each respective code as designated by the Center for Medicare and Medicaid Services. All translation between ICD-9 and ICD-10 was performed in Microsoft Excel version 15 (Microsoft, Redmond WA) using the VLOOKUP and IFERROR commands.
Step 2: Translation of ICD-9 to ICD-10 codes
Using our initial list of ICD-9 codes and labels, we performed forward (ICD-9 to ICD-10) General Equivalence Mapping using the publicly available database created by the Center for Medicare and Medicaid Services and the Centers for Disease Control and Prevention.18, 20, 22 The General Equivalence Mapping database provides mappings to translate between ICD-9 and ICD-10 billing codes. Complete information is available at the National Bureau of Economic Research website (www.nber.org).
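The forward lookup described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' Excel workbook: the function names and the in-memory table TOY_GEM_ROWS are hypothetical, the code pairs follow the examples used in this article, and the precision values are assumed for demonstration rather than read from the actual General Equivalence Mapping files.

```python
from collections import defaultdict

# Toy stand-in for forward mapping rows: (icd9_code, icd10_code, precision).
# Precision values here are illustrative assumptions, not database content.
TOY_GEM_ROWS = [
    ("42741", "I4901", "exact"),        # ventricular fibrillation
    ("34201", "G8101", "approximate"),  # flaccid hemiplegia, right dominant side
    ("34201", "G8102", "approximate"),  # flaccid hemiplegia, left dominant side
    ("4280", "I509", "approximate"),    # heart failure, unspecified
    ("4289", "I509", "approximate"),    # heart failure, unspecified
]


def build_forward_index(rows):
    """Group mapping rows by ICD-9 code so one lookup returns every target."""
    index = defaultdict(list)
    for icd9, icd10, precision in rows:
        index[icd9].append((icd10, precision))
    return dict(index)


def translate_forward(icd9_codes, index):
    """Return {icd9: [(icd10, precision), ...]}; an empty list means no match."""
    return {code: index.get(code, []) for code in icd9_codes}


forward = translate_forward(
    ["42741", "34201", "9985"], build_forward_index(TOY_GEM_ROWS)
)
for icd9, targets in forward.items():
    print(icd9, targets)
```

Returning an empty list for an unmatched code plays the role that IFERROR plays around VLOOKUP in a spreadsheet: the unmatched code stays visible in the output instead of raising an error.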
Steps 3 and 4: Match type and match precision
We then segregated the ICD-9 codes based on the types of matches returned from the General Equivalence Mapping database (Step 3). Forward mapping resulted in four possible types of matches:
1:1, where a single ICD-9 code mapped to a single ICD-10 code (e.g. ICD-9 42741: ventricular fibrillation, ICD-10 I4901: ventricular fibrillation)
1:multiple, where a single ICD-9 code mapped to multiple ICD-10 codes (e.g. ICD-9 34201: flaccid hemiplegia affecting dominant side, to ICD-10 G8101: flaccid hemiplegia affecting right dominant side, and ICD-10 G8102: flaccid hemiplegia affecting left dominant side)
multiple:1, where multiple ICD-9 codes mapped to a single ICD-10 code (e.g. ICD-9 4280: congestive heart failure unspecified, and ICD-9 4289: unspecified heart failure, to ICD-10 I509: heart failure, unspecified)
1:none, where a single ICD-9 code had no corresponding ICD-10 code (e.g. ICD-9 9985: postoperative infection not elsewhere classified, no corresponding ICD-10 code identified).
We then segregated codes by match precision as designated by the General Equivalence Mapping database (Step 4). The database designates an “exact” or “approximate” match label based on the match of the respective ICD-9 and ICD-10 codes.
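A minimal sketch of the Step 3 stratification follows, assuming the forward mappings are available as (ICD-9, ICD-10) pairs. The function name and the tie-breaking rule are assumptions for illustration: a code with several targets is labeled 1:multiple even if one of those targets is shared with another ICD-9 code, and multiple:1 is assessed only among codes in the study list.

```python
from collections import Counter


def classify_match_types(icd9_codes, pairs):
    """Label each ICD-9 code as 1:1, 1:multiple, multiple:1, or 1:none."""
    study_codes = set(icd9_codes)
    targets_by_source = {}
    sources_by_target = {}
    for icd9, icd10 in pairs:
        if icd9 not in study_codes:
            continue  # ignore mappings for codes outside the study list
        targets_by_source.setdefault(icd9, set()).add(icd10)
        sources_by_target.setdefault(icd10, set()).add(icd9)

    labels = {}
    for icd9 in icd9_codes:
        targets = targets_by_source.get(icd9, set())
        if not targets:
            labels[icd9] = "1:none"
        elif len(targets) > 1:
            labels[icd9] = "1:multiple"
        else:
            (only_target,) = targets
            shared = len(sources_by_target[only_target]) > 1
            labels[icd9] = "multiple:1" if shared else "1:1"
    return labels


# Toy pairs that follow the examples given in the list above.
toy_pairs = [
    ("42741", "I4901"),
    ("34201", "G8101"),
    ("34201", "G8102"),
    ("4280", "I509"),
    ("4289", "I509"),
]
match_types = classify_match_types(
    ["42741", "34201", "4280", "4289", "9985"], toy_pairs
)
print(match_types)
print(Counter(match_types.values()))
```

The Step 4 precision designation could be carried as a second key alongside the match type so that counts are reported for each combination.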
Step 5: Obtain associated labels for translated ICD-10 codes
As the General Equivalence Mapping database does not contain labels for the translated ICD-10 codes, we linked the returned ICD-10 codes with their respective labels. To do this we used the publicly available list of ICD-10 labels located on the Center for Medicare and Medicaid Services website (www.cms.gov).18 This created a matched set of ICD-10 codes with labels from our initial set of ICD-9 codes with labels.
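The label linkage amounts to a keyed lookup with a visible fallback for codes missing from the label file. The sketch below assumes a small in-memory table (TOY_ICD10_LABELS, a hypothetical name) in place of the published label list; the fallback string is likewise an assumption.

```python
# Toy label table; the labels are those quoted in this article.
TOY_ICD10_LABELS = {
    "I4901": "ventricular fibrillation",
    "G8101": "flaccid hemiplegia affecting right dominant side",
    "G8102": "flaccid hemiplegia affecting left dominant side",
}


def attach_labels(icd10_codes, label_table, missing="LABEL NOT FOUND"):
    """Pair each ICD-10 code with its label, flagging codes with no label."""
    return [(code, label_table.get(code, missing)) for code in icd10_codes]


labeled = attach_labels(["I4901", "G8102", "Z9999"], TOY_ICD10_LABELS)
for code, label in labeled:
    print(code, label)
```

Here "Z9999" is a placeholder used only to show the fallback; flagged rows would be resolved by hand before the manual comparison in Step 6.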
Step 6: Manual comparison of ICD-9 to ICD-10 labels
We next performed a manual comparison between the ICD-9 and matched ICD-10 code labels. We classified codes into concordant (e.g. ICD-9 4270: paroxysmal supraventricular tachycardia to ICD-10 I471: supraventricular tachycardia) or discordant (e.g. ICD-9 43885: vertigo as late effect of cerebrovascular disease to ICD-10 I69998: other sequelae following unspecified cerebrovascular disease). Label concordance was adjudicated by two physician reviewers (JC, RK). We then calculated the percent of concordant versus discordant codes found for each type of match. The list of concordant ICD-10 codes represented the list of validated ICD-10 codes for use.
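The two summary measures from this step, reviewer agreement and label concordance, can be tallied as in the sketch below. The record layout is hypothetical: each row holds both reviewers' votes and a final adjudicated decision. How disagreements were resolved is not specified in the text, so the "final" field is an assumption, and the votes shown are invented for illustration.

```python
def percent(numerator, denominator):
    """Return a percentage, or 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def review_summary(reviews):
    """Summarize reviewer agreement and concordance for translated codes."""
    total = len(reviews)
    agreed = sum(1 for r in reviews if r["reviewer_a"] == r["reviewer_b"])
    validated = [r["icd10"] for r in reviews if r["final"]]
    return {
        "percent_agreement": percent(agreed, total),
        "percent_concordant": percent(len(validated), total),
        "validated_codes": validated,
    }


# Invented votes for four translated codes (True means labels are concordant).
toy_reviews = [
    {"icd10": "I471", "reviewer_a": True, "reviewer_b": True, "final": True},
    {"icd10": "I69998", "reviewer_a": False, "reviewer_b": False, "final": False},
    {"icd10": "I4901", "reviewer_a": True, "reviewer_b": True, "final": True},
    {"icd10": "G8101", "reviewer_a": True, "reviewer_b": False, "final": True},
]
summary = review_summary(toy_reviews)
print(summary)
```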
Steps 7 and 8: Removal of duplicate ICD-10 codes, and backward translation
Finally, we performed code translation in reverse (ICD-10 to ICD-9). We did this for two distinct reasons: first, to determine the percent of the original list of codes which would be returned, and second, to ensure that no clinically relevant codes were obtained that should have been included in our initial list of ICD-9 codes. We first removed any duplicates found within the list of ICD-10 codes (Step 7). Then, using the final list of ICD-10 codes, we used the General Equivalence Mapping database to backward translate the ICD-10 codes into ICD-9 codes (Step 8). We then calculated the percent of the initial list of ICD-9 codes that were obtained, and the number of clinically relevant codes not present in the initial list in step 1.
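Steps 7 and 8 can be sketched as a de-duplication followed by a reverse lookup and two set comparisons. The function names and the backward mapping rows are hypothetical; "ZZZZ9" is a placeholder rather than a real ICD-9 code, included only to show how a code absent from the initial list would surface for clinical review.

```python
def dedupe_preserving_order(codes):
    """Remove repeated codes while keeping first-seen order (Step 7)."""
    return list(dict.fromkeys(codes))


def backward_check(initial_icd9, validated_icd10, backward_rows):
    """Step 8: report percent of initial codes recovered and any new codes.

    backward_rows is an iterable of (icd10, icd9) mapping pairs.
    """
    targets = set(validated_icd10)
    returned = {icd9 for icd10, icd9 in backward_rows if icd10 in targets}
    initial = set(initial_icd9)
    recovered = initial & returned
    new_candidates = sorted(returned - initial)
    return 100.0 * len(recovered) / len(initial), new_candidates


initial_list = ["42741", "34201", "4280", "4289"]
validated = dedupe_preserving_order(["I4901", "G8101", "G8102", "I509", "I509"])

# Toy backward rows; not taken from the actual mapping files.
toy_backward_rows = [
    ("I4901", "42741"),
    ("G8101", "34201"),
    ("G8102", "34201"),
    ("I509", "4289"),
    ("I509", "ZZZZ9"),  # placeholder for a code missing from the initial list
]
pct_recovered, new_codes = backward_check(
    initial_list, validated, toy_backward_rows
)
print(validated, pct_recovered, new_codes)
```

Codes listed in new_codes are only candidates; whether they are clinically relevant remains a manual judgment.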
Results
Mapping codes for complications after carotid revascularizations
We began with 126 ICD-9 diagnosis codes used in our prior work in describing the complications following carotid revascularization (Figure 2).9, 11 Translation to ICD-10 codes using the General Equivalence Mapping database yielded 167 ICD-10 codes (Supplementary Table 3). We found 1:1 matches, 1:multiple matches, and multiple:1 matches between the codes. We then used the General Equivalence Mapping designation of match precision, categorized as “exact” or “approximate”, to further segregate the codes. 1:1 matches had both “exact” and “approximate” types of precision, while 1:multiple matches returned only “approximate” ICD-10 codes. Additional details are available in the Supplementary Appendix.
ICD-9 codes for complications after carotid revascularization which did not match to an ICD-10 code
There were 13 ICD-9 codes which did not match to any ICD-10 codes (Supplementary Table 4). These frequently represented codes which overlapped with other ICD-9 codes in our list and are further delineated in the Supplementary Appendix.
Manual comparison of ICD-9 and ICD-10 codes for carotid complications
Manual comparison of the ICD-9 and ICD-10 code labels demonstrated 97.6% concordance, with 98.8% agreement between the clinical reviewer evaluations (Supplementary Table 6). New codes found on backward translation are noted in Supplementary Table 7.
After removal of duplicate ICD-10 codes and discordant codes, translation yielded 143 validated ICD-10 codes associated with complications following carotid revascularization, a 14% increase from the initial 126 codes.
Mapping codes for complications following endovascular aortic aneurysm repair
In a fashion similar to the process used to review the codes examining complications with carotid revascularization, we next studied the ICD-9 codes we used to identify complications after endovascular aortic aneurysm repair.14 We began with 97 ICD-9 diagnosis codes (Figure 3). Translation to ICD-10 codes using the General Equivalence Mapping database yielded 120 ICD-10 codes (Supplementary Table 5). As with the carotid revascularization codes, there were 1:1, 1:multiple, and multiple:1 match types. As previously found for carotid revascularization, 1:1 matches had both “exact” and “approximate” types of precision, while 1:multiple matches returned only “approximate” ICD-10 codes. Additional details are available in the Supplementary Appendix.
ICD-9 codes for complications after endovascular aortic aneurysm repair that did not match to an ICD-10 code
There were 18 ICD-9 codes which did not match to any ICD-10 codes. Again, these frequently represented codes which overlapped with other ICD-9 codes in our list and are further delineated in the Supplementary Appendix.
Manual comparison of ICD-9 and ICD-10 codes for endovascular aortic aneurysm repair
Manual comparison of the ICD-9 and ICD-10 code labels again demonstrated excellent concordance and agreement (98% and 100% respectively; Supplementary Table 6). New codes found on backward translation are noted in Supplementary Table 7.
After removal of duplicate ICD-10 codes and discordant codes, translation yielded 108 validated ICD-10 codes associated with complications following endovascular aortic aneurysm repair, an 11% increase from the initial 97 codes.
Discussion
Publicly available sources allowed us to derive a clinically relevant list of ICD-10 codes from a known list of ICD-9 codes used by our group and others to identify complications following carotid revascularization and endovascular aortic aneurysm repair.3, 4, 9, 11, 14, 17 While the translated ICD-10 codes were generally accurate, we found that a subset of codes were discordant upon manual review. Between 10% and 19% of ICD-9 codes had no corresponding ICD-10 code, and the list of resultant ICD-10 codes derived from the mapping exercise was 11% to 14% larger than the original list of ICD-9 codes. Researchers seeking to use electronic sources for translation of ICD-9 codes to ICD-10 should also expect that errors will occur in 2–3% of cases and will require manual attention for identification and resolution.
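The ranges quoted above can be checked against the counts reported in the Results and the Abstract. The short calculation below is offered as an arithmetic check only; the variable names are hypothetical, and the use of the 167 and 120 translated codes as the denominators for the discordance rate is an assumption, since the text does not state the denominator explicitly.

```python
# Counts as reported in this article.
carotid = {"icd9": 126, "unmatched": 13, "translated": 167, "validated": 143, "discordant": 4}
evar = {"icd9": 97, "unmatched": 18, "translated": 120, "validated": 108, "discordant": 3}


def summarize(counts):
    """Derive the percentages discussed in the text from raw counts."""
    return {
        "pct_unmatched": round(100 * counts["unmatched"] / counts["icd9"], 1),
        "pct_list_growth": round(
            100 * (counts["validated"] - counts["icd9"]) / counts["icd9"], 1
        ),
        # Assumed denominator: all ICD-10 codes returned by translation.
        "pct_discordant": round(100 * counts["discordant"] / counts["translated"], 1),
    }


carotid_summary = summarize(carotid)
evar_summary = summarize(evar)
print("carotid:", carotid_summary)
print("EVAR:", evar_summary)
```

Under these assumptions the unmatched share is 10.3% and 18.6%, the list growth is 13.5% and 11.3%, and the discordance rate is 2.4% and 2.5% for carotid revascularization and endovascular aortic aneurysm repair, respectively.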
The transition to ICD-10 in October of 2015 has made the ICD-9 code lists previously derived obsolete. For researchers to continue to use contemporary claims data for knowledge generation, new lists of ICD-10 codes that permit outcome identification must be created. There are several possible ways to accomplish this goal. First, researchers may create new sets of ICD-10 codes without assistance from prior validated ICD-9 lists. This method is time and resource intensive if not impractical given the nearly 5-fold increase in ICD-10 codes. Ultimately, such methodology would require manually reviewing 69,000 ICD-10 diagnosis codes to identify those relevant for use in clinical research. In addition to being labor intensive, this method provides no repeatable framework that other investigators may follow to create lists of ICD-10 codes for event detection.
Researchers may wish to utilize the 8-step framework we have provided to generate ICD-10 codes derived from existing lists of ICD-9 codes. Using duplicative manual comparison, we found that this method yielded clinically valid codes more than 97% of the time for the two ICD-9 lists we analyzed. Furthermore, while these two lists of codes were obtained from prior longitudinal work surrounding outcome identification after two cardiovascular procedures, these events comprised more than just procedure-specific events. The codes reviewed herein encompass an array of multi-system complications which may manifest following diverse medical and surgical interventions. For example, we performed code crosswalks for events including cardiac dysrhythmias, heart failure, stroke, and ventilator-associated pneumonia. Therefore, the scope of this work likely extends beyond carotid revascularization and endovascular aortic aneurysm repair and has implications for researchers seeking to create and analyze valid sets of ICD-10 codes for research in many clinical fields.
Interestingly, several ICD-9 codes did not match to any ICD-10 diagnosis codes. This most often occurred when ICD-10 codes were more specific than the starting ICD-9 code. In addition, our initial list of ICD-9 codes used for translation was designed to be sensitive in event detection. For this reason, some ICD-9 codes that were less likely to represent complications but that were close in numerical value to other more important diagnosis codes were included in our event detection algorithm. How these factors might affect event detection in claims data, and whether these events represent true clinical outcomes is an area of active investigation for our group.
ICD-10 is likely to be in widespread use for many years. The enhanced granularity of ICD-10 offers many new opportunities for research advances, but its utilization will require the creation of new sets of billing codes to accurately represent clinical events. The derivation and validation of these billing codes is not the responsibility of any one investigator, nor should investigators be working in isolation to accomplish this task. With these things in mind, The Dartmouth Institute has created a public website where advances in ICD-10 billing code derivation may be stored and easily accessed by any investigator from any institution (http://www.dartmouthatlas.org/pages/ICD10). We implore investigators also navigating the transition from ICD-9 to ICD-10 to collaborate with the many investigators across the country seeking similar goals.
Our method has several limitations. First, the General Equivalence Mapping database was designed for billing rather than clinical research, and as such has inherent limitations in ICD-10 code generation.21, 23 Second, while we took care to exhaustively review our coding algorithms, it remains possible that other relevant codes exist which were not identified with this method.23 Finally, while our sets of ICD-9 codes represented complications involving multiple organ systems, we cannot comment on the validity of ICD-10 code generation beyond the events that we studied. Nevertheless, this methodological framework provides a starting point for investigators seeking to derive sets of ICD-10 codes from their existing lists of ICD-9 codes.
Conclusions
We found that by using publicly available files, we were able to translate previously validated sets of ICD-9 codes used in clinical research for two different cardiovascular procedures into valid ICD-10 codes suitable for event identification in claims-based data. We found that ICD-10 coding lists were likely to be 11–14% larger than their ICD-9 counterparts, and that approximately 2–3% of codes generated by automated matching systems would not be deemed useful during a clinically rigorous evaluation. Cardiovascular researchers need to emphasize a systematic approach in developing and validating ICD-10 coding algorithms to ensure accurate assessment of outcomes.
Supplementary Material
Acknowledgements
Stephanie Tomlin, MS, MPA
Funding and Disclosures:
The authors have no conflicts of interest to report. Funding support was provided by the FDA (U01-FD005478), the National Institute on Aging (P01-AG019783), and the National Institutes of Health Common Fund (U01-AG046830). The funders had no role in the design or execution of the study.
Footnotes
Disclaimer:
The views expressed do not necessarily represent the views of the Department of Veterans Affairs or the United States Government.
References
- 1. Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ and Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297:278–85.
- 2. Brooke BS, Goodney PP, Kraiss LW, Gottlieb DJ, Samore MH and Finlayson SRG. Readmission destination and risk of mortality after major surgery: an observational cohort study. The Lancet. 2015;386:884–895.
- 3. Garg T, Baker LC and Mell MW. Postoperative Surveillance and Long-term Outcomes After Endovascular Aneurysm Repair Among Medicare Beneficiaries. JAMA Surg. 2015;150:957–63.
- 4. Schermerhorn ML, Buck DB, O’Malley AJ, Curran T, McCallum JC, Darling J and Landon BE. Long-Term Outcomes of Abdominal Aortic Aneurysm in the Medicare Population. N Engl J Med. 2015;373:328–38.
- 5. Gavrielov-Yusim N and Friger M. Use of administrative medical databases in population-based research. J Epidemiol Community Health. 2014;68:283–7.
- 6. Centers for Disease Control and Prevention. International Classification of Diseases, Ninth Revision (ICD-9). Accessed May 1st, 2017. Available at https://www.cdc.gov/nchs/icd/icd9.htm.
- 7. HCUP Clinical Classifications Software (CCS) for ICD-9-CM. Healthcare Cost and Utilization Project (HCUP). 2015. Agency for Healthcare Research and Quality, Rockville, MD: 2017.
- 8. Warren JL, Klabunde CN, Schrag D, Bach PB and Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care. 2002;40:IV-3–18.
- 9. Goodney PP, Lucas FL, Travis LL, Likosky DS, Malenka DJ and Fisher ES. Changes in the use of carotid revascularization among the medicare population. Archives of surgery. 2008;143:170–3.
- 10. Jacobs JP, Edwards FH, Shahian DM, Haan CK, Puskas JD, Morales DL, Gammie JS, Sanchez JA, Brennan JM, O’Brien SM, Dokholyan RS, Hammill BG, Curtis LH, Peterson ED, Badhwar V, George KM, Mayer JE Jr., Chitwood WR Jr., Murray GF and Grover. Successful linking of the Society of Thoracic Surgeons adult cardiac surgery database to Centers for Medicare and Medicaid Services Medicare data. Ann Thorac Surg. 2010;90:1150–6; discussion 1156–7.
- 11. Nallamothu BK, Gurm HS, Ting HH, Goodney PP, Rogers MA, Curtis JP, Dimick JB, Bates ER, Krumholz HM and Birkmeyer JD. Operator experience and carotid stenting outcomes in Medicare beneficiaries. JAMA. 2011;306:1338–43.
- 12. Andrus BW and Welch HG. Medicare services provided by cardiologists in the United States: 1999–2008. Circ Cardiovasc Qual Outcomes. 2012;5:31–6.
- 13. Brennan JM, Peterson ED, Messenger JC, Rumsfeld JS, Weintraub WS, Anstrom KJ, Eisenstein EL, Milford-Beland S, Grau-Sepulveda MV, Booth ME, Dokholyan RS, Douglas PS and Duke Clinical Research Institute DT. Linking the National Cardiovascular Data Registry CathPCI Registry with Medicare claims data: validation of a longitudinal cohort of elderly patients undergoing cardiac catheterization. Circ Cardiovasc Qual Outcomes. 2012;5:134–40.
- 14. Hoel AW, Faerber AE, Moore KO, Ramkumar N, Brooke BS, Scali ST, Sedrakyan A and Goodney PP. A pilot study for long-term outcome assessment after aortic aneurysm repair using Vascular Quality Initiative data matched to Medicare claims. Journal of vascular surgery. 2017;66:751–759 e1.
- 15. Tseng VL, Chlebowski RT, Yu F, Cauley JA, Li W, Thomas F, Virnig BA and Coleman AL. Association of Cataract Surgery With Mortality in Older Women: Findings from the Women’s Health Initiative. JAMA ophthalmology. 2018 January 1;136(1):3–10.
- 16. Zuckerman RB, Joynt Maddox KE, Sheingold SH, Chen LM and Epstein AM. Effect of a Hospital-wide Measure on the Readmissions Reduction Program. N Engl J Med. 2017;377:1551–1558.
- 17. Suckow BD, Goodney PP, Columbo JA, Kang R, Stone DH, Sedrakyan A, et al. National trends in open surgical, endovascular, and branched-fenestrated endovascular aortic aneurysm repair in Medicare patients. Journal of vascular surgery. 2018;67(6):1690–7 e1.
- 18. Center for Medicare and Medicaid Services. Accessed March 1st, 2017. Available at www.cms.gov.
- 19. Bensley RP, Yoshida S, Lo RC, Fokkema M, Hamdan AD, Wyers MC, Chaikof EL and Schermerhorn ML. Accuracy of administrative data versus clinical data to evaluate carotid endarterectomy and carotid stenting. Journal of vascular surgery. 2013;58:412–9.
- 20. National Bureau of Economic Research. Accessed March 1st, 2017. Available at www.nber.org.
- 21. Fung KW, Richesson R, Smerek M, Pereira KC, Green BB, Patkar A, Clowse M, Bauck A and Bodenreider O. Preparing for the ICD-10-CM Transition: Automated Methods for Translating ICD Codes in Clinical Phenotype Definitions. EGEMS (Wash DC). 2016;4:1211.
- 22. Center for Disease Control and Prevention. Accessed March 1st, 2017. Available at www.cdc.gov.
- 23. Jones LM and Nachimson S. Use Caution When Entering the Crosswalk: A Warning About Relying on GEMs as Your ICD-10 Solution. 2014. Accessed August 31st, 2017. Available from: http://www.cms.org/uploads/ICDLogicGEMSWhitePaper.pdf.