Skip to main content
Journal of the American Medical Informatics Association: JAMIA logoLink to Journal of the American Medical Informatics Association: JAMIA
. 2017 Feb 16;24(4):845–850. doi: 10.1093/jamia/ocw181

Development and validation of a structured query language implementation of the Elixhauser comorbidity index

Richard H Epstein 1,, Franklin Dexter 2
PMCID: PMC7651966  PMID: 28339644

Abstract

Objective: Comorbidity adjustment is often performed during outcomes and health care resource utilization research. Our goal was to develop an efficient algorithm in structured query language (SQL) to determine the Elixhauser comorbidity index.

Materials and Methods: We wrote an SQL algorithm to calculate the Elixhauser comorbidities from Diagnosis Related Group and International Classification of Diseases (ICD) codes. Validation was by comparison to expected comorbidities from combinations of these codes and to the 2013 Nationwide Readmissions Database (NRD).

Results: The SQL algorithm matched perfectly with expected comorbidities for all combinations of ICD-9 or ICD-10, and Diagnosis Related Groups. Of 13 585 859 evaluable NRD records, the algorithm matched 100% of the listed comorbidities. Processing time was ∼0.05 ms/record.

Discussion: The SQL Elixhauser code was efficient and computationally identical to the SAS algorithm used for the NRD.

Conclusions: This algorithm may be useful where preprocessing of large datasets in a relational database environment and comorbidity determination is desired before statistical analysis. A validated SQL procedure to calculate Elixhauser comorbidities and the van Walraven index from ICD-9 or ICD-10 discharge diagnosis codes has been published.

Keywords: comorbidity, computing methodologies, algorithms, software validation

BACKGROUND AND SIGNFICANCE

Determination of a patient’s comorbid conditions is often performed to adjust for disease severity during outcomes research or health care resource utilization analysis. Comorbidities are medical conditions unrelated to the primary reason for a patient’s hospitalization, ideally excluding hospital-acquired complications.1 The Elixhauser comorbidity index is a common classifier originally developed to predict in-hospital mortality, hospital charges, and length of stay.1 This binary array of comorbidities (Table 1) is based on International Classification of Diseases, Ninth Edition (ICD-9) codes and Medicare Severity Diagnosis Related Groups (DRGs) from hospital discharge abstracts and has been used extensively with large administrative datasets.2–6 The van Walraven index is an empirical weighted combination of the comorbidities, producing a single numeric value predicting 6-month mortality.7 Quan et al.8 modified the Elixhauser comorbidity index for use with ICD-10 codes. The Elixhauser comorbidity index outperforms the Charlson index, another index commonly used for comorbidity risk adjustment.9–12

Table 1.

Elixhauser Comorbidities and van Walraven Weights

Comorbidity Description Example van Walraven Weighta
AIDS Acquired immune deficiency syndrome AIDS wasting disease 0
ALCOHOL Alcohol abuse Alcohol withdrawal syndrome 0
ANEMDEF Deficiency anemia Iron-deficiency anemia −2
ARTH Rheumatoid arthritis/collagen vascular diseases Systemic lupus erythematosus 0
BLDLOSS Blood-loss anemia Menorrhagia −2
CHF Congestive heart failure Left ventricular failure 7
CHRNLUNG Chronic pulmonary disease Emphysema 3
COAG Coagulopathy Hemophilia A 3
DEPRESS Depression Bipolar disorder −3
DM Diabetes, uncomplicated Type 2 diabetes without complications 0
DMCX Diabetes, complicated Diabetes with peripheral neuropathy 0
DRUG Drug abuse Opioid addiction −7
HTN_C Hypertension with/without complications Hypertension 0
HYPOTHY Hypothyroidism Thyroid goiter 0
LIVER Liver disease Cirrhosis 11
LYMPH Lymphoma Hodgkin’s lymphoma 9
LYTES Fluid and electrolyte disorders Hyponatremia 5
METS Metastatic cancer Lung cancer with brain metastases 12
NEURO Other neurological disorders Seizure disorder 6
OBESE Obesity Morbid obesity −4
PARA Paralysis Quadriplegia 7
PERIVASC Peripheral vascular disorders Claudication 2
PSYCH Psychoses Schizophrenia 0
PUMCIRC Pulmonary circulation disorders Pulmonary hypertension 4
RENLFAIL Renal failure Renal failure on dialysis 5
TUMOR Solid tumor without metastasis Malignant colon polyp 4
ULCER Peptic ulcer disease excluding bleeding Duodenal ulcer 0
VALVE Valvular disease Aortic stenosis −1
WGHTLOSS Weight loss Protein calorie malnutrition 6
HTNb Hypertension without complications Benign essential hypertension 0
HTNCXb Hypertension with complications Hypertension with renal failure 0
ARRHYTHc Cardiac arrhythmias Atrial fibrillation 5

aThe van Walraven index is calculated by summing the listed weights for each Elixhauser comorbidity that is present (ref. 7).

bHCUP combines the HTN and HTNCX comorbidities in its Severity files, as was done originally by Elixhauser et al. (ref. 1).

cHCUP excludes this comorbidity in its Severity files, although it was originally included by Elixhauser et al. (ref. 1)

Elixhauser comorbidity measures, DRGs, and ICD-9 or ICD-10 codes are provided in databases describing hospitalization in the United States (eg, Nationwide Inpatient Sample, Nationwide Readmission Database). These are available to researchers from the Agency for Healthcare Research and Quality’s Healthcare Cost and Utilization Project (HCUP).13 HCUP provides ASCII text versions of the SAS software it uses to calculate Elixhauser comorbidities (SAS Institute, Cary, NC, USA).14 Other versions of Elixhauser software are available online,15,16 but their accuracy has not been validated and performance metrics are not publicly available.

The primary objective of our study was to develop, validate, and make available a high-performance algorithm, written in standard structured query language (SQL), to calculate the Elixhauser comorbidities and van Walraven index. This algorithm would likely be of use for analysis of comorbidities from electronic medical records at individual institutions. Such an algorithm would also facilitate use of large administrative datasets: (1) where the Elixhauser comorbidities have not been precomputed (eg, state Medicare files), (2) where modification of the methodology is desired (eg, changing mappings of ICD or DRG exclusions), (3) when DRGs are not relevant (eg, nonhospitalized patients), and (4) to perform sensitivity analyses (eg, based on inclusion or exclusion of various components).

MATERIALS AND METHODS

Data sources

The 2013 Nationwide Readmissions Database (NRD) was purchased and utilized subject to the HCUP data use agreement. The 4 NRD files (Core, Hospital, Severity Measure, and Diagnosis and Procedure Groups) were imported into SQL Server (Microsoft, Redmond, WA, USA). These comprise 14 325 172 discharges or in-hospital deaths from 21 states that provided complete data. The Elixhauser comorbidity system software (v3.7) was downloaded from the HCUP website.17,18 Cardiac arrhythmia ICD-9 and ICD-10 codes were obtained from the University of Manitoba,19 as HCUP does not calculate this comorbidity.

Algorithm development

SQL tables were created, mapping the ICD-9 and ICD-10 codes to the corresponding Elixhauser comorbidities and DRGs to comorbidities for exclusion. There were 355 combinations of DRG and Elixhauser comorbidities (some DRGs mapped to multiple comorbidities) and 1495 ICD-9 codes. The ICD-10 table contained 3211 ICD-10 codes and 353 DRG codes. A table was created to apply the van Walraven weights7 to generate the numeric Elixhauser index.

Steps to identify Elixhauser comorbidities were determined from the procedural code in the HCUP SAS algorithm. Equivalent code was then written in Transact-SQL (Microsoft). The SQL procedure is described schematically in Figure 1, with the full system available as supplemental web content (https://drive.google.com/file/d/0B7SSpT_sKeFib3h2X2VDYzhyRFU/view?usp=sharing and https://drive.google.com/file/d/0B7SSpT_sKeFiV1BqNnRhZmFjUTg/view?usp=sharing). The SQL procedure handles either ICD-9 or ICD-10 input and allows for options in processing ICD and DRG codes.

Figure 1.

Figure 1.

SQL Elixhauser Algorithm Flowchart. This flow diagram outlines the steps in the processing of entries by the Transact-SQL code to determine the Elixhauser comorbidities for a given set of Medicare Severity Diagnosis Related Groups (DRGs) and International Classification of Diseases (ICD) diagnosis codes.

For evaluation of the SQL algorithm, the ICD-9 codes were copied from the NRD Core file (organized by rows containing up to 25 diagnoses) into a normalized table with the KEY_NRD, diagnosis order (1–25), and ICD-9 code as columns.

Validation of algorithm performance

A 2-step approach to validate the SQL algorithm was performed. First, a test file was created containing all ICD-9 diagnoses, permutations of diagnoses resulting in the exclusion of an included comorbidity (eg, diabetes with complications excludes diabetes without complications), and all combinations of DRGs and ICD-9 codes. Expected comorbidities based on the SAS algorithm were then computed manually and compared to the output of the SQL algorithm. The SQL code and lookup tables were iteratively modified until there was 100% concordance between the expected and observed comorbidities. This process was repeated using ICD-10 codes.

Second, the SQL algorithm was applied to the 14 325 172 rows in the 2013 NRD database and the output compared to the NRD Severity file. However, 742 232 cases (5.2%) were excluded where HCUP data processing might preclude valid comparison (Table 2). Cases where 25 ICD-9 diagnoses were specified were removed, because Elixhauser comorbidities were computed using all diagnosis codes in the submitted state data; however, only the first 25 diagnoses were included in the Core file (HCUP user support, personal communication, September 2016). Thus, some comorbidities might not be calculated by the SQL algorithm. Cases involving transfer upon discharge from a hospital to another facility (eg, rehabilitation center, another hospital) were excluded, as the ICD-9 codes from both locations were included (up to 25), but the comorbidities were computed separately for each facility and then combined (HCUP user support, personal communication, September 2016). This could result in combinations of included comorbidities that otherwise would have been excluded (eg, complicated and uncomplicated hypertension). In addition, only a single discharge DRG was supplied (from the first or second facility, depending on the transfer type). The SQL code and lookup tables were iteratively modified until all mismatches were resolved.

Table 2.

Comparison of the SQL Elixhauser Algorithm to the Provided Comorbidities from the Nationwide Readmissions Database (NRD)

No. of records in the NRD database No. of records Percentage of all records
All 14 325 172 100
25 ICD-9 diagnosis codes 465 540 3.25
Same day of discharge transfer 399 814 2.79
Same day of discharge transfer and 25 ICD-9 codes 126 041 0.88
Total checked against SQL code 13 585 859 94.84

SQL, structured query language; ICD-9, International Classification of Diseases, Ninth Revision.

Because reference files from HCUP with ICD-10 codes are not yet available from HCUP, this second step was not performed for the ICD-10 algorithm version.

Evaluation of algorithm performance

Execution times were calculated from timestamps during 5 repetitions of the algorithm against NRD datasets containing between 1000 and 200 000 000 ICD-9 codes (1032 to 20 551 330 records). Performance was assessed as the means and standard deviations of the per-record and total processing times. The SQL was executed on a Dell server with an Intel Xenon 2.9 GHz CPU (2 processors, 16.0 GB of RAM, 64-bit Windows Server 2008 R2, and SQL Server 2008 R2).

RESULTS

The full enumeration of ICD-9 and ICD-10 codes and DRGs resulted in perfect matching with the expected comorbidities (Tables 3 and 4). When tested against the 13 585 859 evaluable cases in the NRD (Table 2), 49 discrepancies were found. For each case, the same DRG (039) was present that should have excluded the paralysis comorbidity in the NRD. In response to our inquiry, HCUP user support indicated that there was a coding error in the SAS program that was not corrected until after the NRD files had been created (personal communication, September 2016). Thus, there was 100% concordance between the 2 algorithms.

Table 3.

Comparison of the SQL Elixhauser Algorithm Comorbidities to Expected Comorbidities Using ICD-9 Codes

DRG ICD-9 codes
No. of combinations tested No. of matching algorithms Error rate (%)
Primary diagnosis #1 Secondary diagnosis #2 Secondary diagnosis #3
All ICD-9a 1495 1495 0
All codesa 1495 1495 0
DMb DMCXb 2146 2146 0
HTNc HTNCXc 728 728 0
TUMORd METSd 49 248 49 248 0
All e All codes 530 725 530 725 0

aAll 1495 ICD-9 codes that map to an Elixhauser comorbidity were tested. The Elixhauser methodology excludes the primary diagnosis from consideration as a comorbidity. The testing involved confirmation that each ICD-9 code listed as a secondary diagnosis mapped to the expected comorbidity, and that no ICD-9 code listed as the primary diagnosis mapped to a comorbidity.

bAll 29 diabetes mellitus without complications (DM) codes and all 37 diabetes mellitus with complications (DMCX) codes were included. Testing confirmed that when both comorbidities are present, the DM comorbidity is set to 0.

cAll 7 hypertension without complications (HTN) codes and all 52 hypertension with complications (HTN_COMP) codes were included. Testing confirmed that when both comorbidities are present, in either order, the HTN comorbidity is set to 0. In the HCUP implementation, the HTN and HTN_COMP comorbidities are combined into a single hypertension comorbidity (HTN_C).

dAll 324 malignancy (TUMOR) codes and all 38 metastatic disease (METS) codes were included. Testing confirmed that when both comorbidities are present, in either order, the TUMOR comorbidity is set to 0.

eAll 355 DRG codes mapping to an Elixhauser comorbidity were combined with each of the 1495 ICD-9 codes to test the Elixhauser comorbidity exclusion process. Morbidities related to the DRG are excluded as a comorbidity, as they are considered to be further specifications of the diagnoses included within the DRG. For example, if the cardiac DRG 1 (Heart transplant or implant of heart assist system with major complications or comorbidities) is assigned, hypertension, heart valve, and pulmonary circulation comorbidities are set to 0.

SQL, structured query language; DRG, Medicare Severity Diagnosis Related Group; ICD-9, International Classification of Diseases, Ninth Revision; HCUP, Healthcare Cost and Utilization Project.

See Table 1 for the key to comorbidity abbreviations.

Table 4.

Comparison of the SQL Elixhauser Algorithm Comorbidities to Expected Comorbidities Using ICD-10 Codes

DRG ICD-10-CM Codes
No. of permutations tested No. matching HCUP algorithm Error rate (%)
Primary diagnosis #1 Secondary diagnosis #2 Secondary diagnosis #3
All ICD-10a 3211 3211 0
All codesa 3211 3211 0
DMb DMCXb 18 972 18 972 0
HTNc HTNCXc 611 1222 0
TUMORd METSd 58 520 58 520 0
All e All codes 1 133 483 1 133 483 0

aAll 3211 ICD-10 codes that map to an Elixhauser comorbidity were tested. The Elixhauser methodology excludes the primary diagnosis from consideration as a comorbidity. The testing involved confirmation that each ICD-10 code listed as a secondary diagnosis mapped to the expected comorbidity, and that no ICD-10 code listed as the primary diagnosis mapped to a comorbidity.

bAll 51 diabetes mellitus without complications (DM) codes and all 186 diabetes mellitus with complications (DMCX) codes were included. Testing confirmed that when both comorbidities are present, in either order, the DM comorbidity is set to 0.

cAll 13 hypertension without complications (HTN) codes and all 47 hypertension with complications (HTNCX) codes were included. Testing confirmed that when both comorbidities are present, in either order, the HTN comorbidity is set to 0. In the HCUP implementation, the HTN and HTNCX comorbidities are combined into a single hypertension comorbidity (HTN_C).

dAll 532 malignancy (TUMOR) codes and all 55 metastatic disease (METS) codes were included. Testing confirmed that when both comorbidities are present, in either order, the TUMOR comorbidity is set to 0.

eAll 353 DRG codes mapping to an Elixhauser comorbidity were combined with each of the 3211 ICD-10 codes to test the Elixhauser comorbidity exclusion process. Morbidities related to the DRG are excluded as a comorbidity, as they are considered to be further specifications of the diagnoses included within the DRG. For example, if the cardiac DRG 1 (Heart transplant or implant of heart assist system with major complications or comorbidities) is assigned, hypertension, heart valve, and pulmonary circulation comorbidities are set to 0.

SQL, structured query language; DRG, Medicare Severity Diagnosis Related Group; ICD-10, International Classification of Diseases, Tenth Revision; HCUP, Healthcare Cost and Utilization Project.

See Table 1 for the key to comorbidity abbreviations.

The SQL Elixhauser algorithm executed quickly, with a mean (standard error) time to process 10 275 650 records (containing a total of 100 000 000 ICD-9 codes), averaged over 5 replications, of 8.26 (0.08) min (Figure 2). The average time to process each record ranged from 0.048 to 0.053 ms between 1 027 812 and 20 551 300 records (10 000 000–200 000 000 ICD-9 codes) (Figure 2). Processing times per record increased below 100 000 records, with an average time of 0.35 ms per record for the smallest dataset of 1032 records. This decreased efficiency resulted from fixed overhead associated with creating temporary tables, importing lookup tables, and other concurrent database processes, representing a substantive portion of the very small total processing time.

Figure 2.

Figure 2.

SQL Elixhauser Algorithm Performance. Datasets ranging from 10000 to 200 000 000 ICD-9 codes were created from the Nationwide Readmission Database, representing approximately 1032 to 20 551 300 patient records. The algorithm was executed 5 times with each dataset, and the results averaged. Plotted are the mean and standard error of the mean of the processing times in msec/record (blue circles) and the total time in seconds (red circles). There is overhead associated with creating temporary tables, loading in codes from the database, etc., that represents a larger percentage of the total processing time for smaller datasets than for the larger datasets, resulting in longer average processing times when calculated on a per record basis. Above 1 000 000 records, the time to process each record was nearly constant at ∼0.05 ms/record.

DISCUSSION

Our SQL implementation of the Elixhauser algorithm is highly efficient and computationally identical to the HCUP SAS comorbidity code. While such equivalence might be regarded as a predictable result of the conversion of 1 computational algorithm to another, this is not necessarily true: coding errors can occur and scalability cannot be assumed. Use of unproven source code for research or treatment purposes is thus problematic. Our development and validation of a standard SQL implementation for the calculation of Elixhauser comorbidities and the van Walraven index provides assurance that those working in relational database environments can use our algorithm to calculate these elements correctly.

There is no practical limit to the number of ICD codes per record that our SQL algorithm can process; however, the maximum number of diagnoses provided by any state to the HCUP 2013 State Inpatient Database was 61 (Indiana).20 Processing speed will vary by installation, depending, in part, on the CPU, amount of random access memory, hard drive type, and relational database system used. The code should be portable to other SQL dialects (eg, Oracle, MySQL), as American National Standards Institute-92 standard syntax was used.

Because the values of the ICD, DRG, and weighting factors are stored in tables rather than hard-coded, updates only require adding additional codes, not modifying the algorithm. While ICD-9 mappings are “frozen,” there will be periodic modifications in comorbidity calculations using ICD-10, as occurred with ICD-9.21

The SQL algorithm provides information that is not included in the HCUP files. The output lists complicated and uncomplicated hypertension and the arrhythmia comorbidity.22 Parameters are present that allow including the first diagnosis (excluded in the HCUP code) and applying or ignoring DRG exclusions.

A limitation of our study is that use of the algorithm requires SQL expertise. However, analysts pulling data from the relational databases of enterprise electronic health systems (eg, Cerner, Epic) typically do not use statistical software such as SAS or STATA, but rather SQL. Preprocessing steps, including calculation of comorbidity indices, might be more convenient to execute within the database prior to statistical analysis. Implementing our code from the supplemental web content provided is straightforward, as there are only 2 stored procedures. The first builds and populates the ICD-9 and ICD-10 lookup tables. The second calculates the Elixhauser comorbidities and the van Walraven index from an input table containing ICD-9 or ICD-10 codes for each case. A known limitation of programs such as SAS and STATA is the extremely large storage and memory requirements when big datasets are processed, sometimes requiring file segmentation and active memory management.23

CONCLUSIONS

We developed a highly efficient, validated SQL implementation of the Elixhauser comorbidity algorithm that is computationally equivalent to the SAS comorbidity code used by HCUP. Our method could be of particular use to researchers dealing with large datasets in a relational database environment where there is a desire to calculate Elixhauser comorbidities prior to exporting the data for statistical analysis.

ACKNOWLEDGMENTS

None.

Competing Interests

None of the authors reports any competing interests.

Funding

The project was funded from intradepartmental resources. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sector. The study was funded solely from departmental resources.

Contributorship

RHE: substantial contributions to conception and design, acquisition of data, analysis and interpretation of data; drafting the article and revising it critically for important intellectual content; and final approval of the version to be published. This author is the guarantor. FD: substantial contributions to conception and design, acquisition of data, and analysis and interpretation of data; drafting the article and revising it critically for important intellectual content; and final approval of the version to be published.

References


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press

RESOURCES