Data on coding association rules from an inpatient administrative health data coded by International classification of disease - 10th revision (ICD-10) codes

Mingkai Peng; Vijaya Sundararajan; Tyler Williamson; Evan P Minty; Tony C Smith; Chelsea TA Doktorchik; Hude Quan

2018 Feb 16;18:710–712. doi: 10.1016/j.dib.2018.02.043

PMCID: PMC5995749 PMID: 29896537

Abstract

The data presented in this article relate to the research article entitled “Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data” (Peng et al. [1]).

We provide a set of ICD-10 coding association rules for the age group of 55 to 65. The rules were extracted with association rule mining from inpatient administrative health data at five acute care hospitals in Alberta, Canada. The support and confidence thresholds for the association rule mining process were set at 0.19% and 50%, respectively. The data set contains 426 rules, of which 86 are not nested. Data are provided in the supplementary material. The presented coding association rules provide a reference for future research on the use of association rule mining for data quality assessment.

Specifications Table

Subject area	Medicine
More specific subject area	International Classification of Disease – 10th revision (ICD-10) diagnosis codes in the hospital setting
Type of data	Table
How data was acquired	Administrative health data coded by professional coders at acute care hospitals
Data format	Analyzed
Experimental factors	Association rule mining was conducted on inpatient administrative health data for the age group of 55 to 65 to extract the coding association rules
Experimental features	Thresholds of support and confidence for association rule mining were set at 0.19% and 50% respectively.
Data source location	Alberta, Canada
Data accessibility	Data submitted with this article

Value of the data

• The data could be used to assess data quality in ICD-10 coded health data.
• These data provide a reference for future studies on the development of data quality rules in observational health data using association rule mining.
• These data make it possible to improve the quality of studies that use ICD-10 coded data for evidence generation, by exposing the associations hidden in the data.
• Data on association rules can be used as a cost-effective way to improve the quality of data collection in hospital settings.

1. Data

The ICD-10 classification system is used by many countries to code causes of death and hospital morbidity, as mandated by the World Health Organization (WHO). We provide a set of ICD-10 coding association rules for the age group of 55 to 65, learned from inpatient administrative health data [1]. In total, there are 426 rules, of which 86 are not nested in other rules. The rules capture meaningful clinical associations hidden in the database.

2. Experimental design, materials and methods

We used Alberta hospital discharge abstract data (DAD) for association rule mining. Following the guidelines developed by the Canadian Institute for Health Information (CIHI), hospital coders abstract clinical documents (e.g. discharge summaries) into diagnosis codes using the ICD-10, Canada (ICD-10-CA) classification system. CIHI developed ICD-10-CA from the WHO ICD-10 classification by adding one or more digits to some diagnosis codes. For each hospital admission, a coder can assign up to 25 ICD-10-CA codes. We extracted 26,378 DAD records in the age group of 55 to 65 from five acute care hospitals in 2013. The ICD-10-CA diagnosis codes were mapped back to ICD-10 before analysis for international generalizability.
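The mapping step is not described in detail here. As a rough sketch only, and assuming that an ICD-10-CA code can be mapped back by finding its longest prefix that is a valid ICD-10 code, the preparation of one record for rule mining might look like the following. The function names, the `VALID_ICD10` lookup set and the example codes are hypothetical placeholders, not the mapping actually used for these data.

```python
# Hypothetical lookup of valid ICD-10 codes (dots removed). In practice this
# would come from an official code list; these few entries are placeholders.
VALID_ICD10 = {"I10", "I251", "E119"}


def map_to_icd10(code, valid_icd10):
    """Return the longest prefix of `code` found in `valid_icd10`, else None.

    Assumption: ICD-10-CA extends some ICD-10 codes with extra trailing
    digits, so stripping trailing characters recovers the ICD-10 code.
    """
    normalized = code.replace(".", "").upper()
    # ICD-10 codes have at least three characters, so stop at length 3.
    for end in range(len(normalized), 2, -1):
        prefix = normalized[:end]
        if prefix in valid_icd10:
            return prefix
    return None


def record_to_transaction(codes, valid_icd10):
    """Map all codes of one discharge record and drop duplicates."""
    mapped = (map_to_icd10(c, valid_icd10) for c in codes)
    return frozenset(m for m in mapped if m is not None)


# One made-up record with three diagnosis codes.
transaction = record_to_transaction(["I25.10", "I25.11", "I10"], VALID_ICD10)
```

Each record becomes a set of distinct ICD-10 codes, which is the transaction form that association rule mining operates on. Codes without a valid prefix are silently dropped in this sketch; a real pipeline would more likely log them for review.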

Association rule mining is the process of finding clinically interesting associations or patterns hidden in data. An association rule is an expression of the form X → Y, where X and Y are disjoint, nonempty code sets. The code sets X and Y are the left-hand side (LHS) and right-hand side (RHS) of the rule, respectively. The strength of an association rule can be measured in terms of support and confidence: support is the proportion of records that contain all codes in X and Y, and confidence is the proportion of records containing X that also contain Y. The Apriori algorithm implemented in the R package arules was used for association rule mining on ICD-10 codes [2]. The support and confidence thresholds were set at 0.19% and 50%, respectively. Bootstrapping was used in the rule mining process to ensure generalizability of the developed rules.

Nested rules were identified. Two rules are considered nested if they have the same RHS and the LHS of one rule is a subset of the LHS of the other. For example, the rules X → Y and {X, Z} → Y are nested. In total, there are 426 rules, of which 86 are not nested. The support and confidence of each rule are reported in the data, together with the values of two other commonly used measures, lift and conviction [3]. A description of each variable in the data is provided. The data are submitted with the article as supplementary material in Excel format.
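To make the rule measures concrete, the following minimal sketch computes support, confidence, lift and conviction for a single rule on a toy set of records, and checks whether two rules are nested. It uses the standard textbook definitions (lift = confidence / support(Y); conviction = (1 − support(Y)) / (1 − confidence)) and is not the arules implementation used for the published rules. All function names and the placeholder codes "A", "B" and "C" are hypothetical, and the nesting check assumes a proper subset so that a rule is not nested with itself.

```python
from math import ceil, inf


def count(itemset, transactions):
    """Number of transactions that contain every code in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_measures(lhs, rhs, transactions):
    """Support, confidence, lift and conviction of the rule lhs -> rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not lhs or not rhs or lhs & rhs:
        raise ValueError("LHS and RHS must be nonempty and disjoint")
    n = len(transactions)
    n_lhs = count(lhs, transactions)
    n_rhs = count(rhs, transactions)
    n_rule = count(lhs | rhs, transactions)
    if n_lhs == 0 or n_rhs == 0:
        raise ValueError("LHS or RHS never occurs; measures are undefined")
    confidence = n_rule / n_lhs
    supp_rhs = n_rhs / n
    return {
        "support": n_rule / n,
        "confidence": confidence,
        "lift": confidence / supp_rhs,
        # Conviction is infinite when the rule never fails.
        "conviction": inf if confidence == 1 else (1 - supp_rhs) / (1 - confidence),
    }


def is_nested(rule_a, rule_b):
    """True if both rules share the RHS and one LHS is a proper subset of the other."""
    (lhs_a, rhs_a), (lhs_b, rhs_b) = rule_a, rule_b
    return rhs_a == rhs_b and (lhs_a < lhs_b or lhs_b < lhs_a)


def min_support_count(threshold, n_records):
    """Smallest record count whose proportion reaches `threshold`."""
    return ceil(threshold * n_records)


# Toy records; "A", "B" and "C" stand in for ICD-10 codes.
TRANSACTIONS = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B"}, {"A", "B", "C"}]

measures = rule_measures({"A"}, {"B"}, TRANSACTIONS)
nested = is_nested(
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"A", "C"}), frozenset({"B"})),
)
```

On the toy data, the rule {A} → {B} holds in 3 of the 5 records (support 0.6) and in 3 of the 4 records that contain A (confidence 0.75). If the 0.19% support threshold were applied directly to all 26,378 records, `min_support_count(0.0019, 26378)` indicates that a rule would need to appear in at least 51 records; because bootstrapping was used, the effective counts in the actual analysis may differ.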

Acknowledgements

This work was supported by the Canadian Institutes of Health Research grant no. 365973.

Footnotes

Transparency document. Transparency data associated with this article can be found in the online version at doi:10.1016/j.dib.2018.02.043.

Appendix A. Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2018.02.043.

Transparency document. Supplementary material

mmc1.docx (12.5 KB, docx)

Appendix A. Supplementary material

mmc2.xlsx (48.8 KB, xlsx)

References

1. Peng M., Sundararajan V., Williamson T., Minty E.P., Smith T.C., Doktorchik C.T.A., Quan H. Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data. J. Biomed. Inform. 2018;79:41–47.
2. Hahsler M., Grün B., Hornik K. arules - A computational environment for mining association rules and frequent item sets. J. Stat. Softw. 2005;1(15).
3. Tan P.-N., Steinbach M., Kumar V. Introduction to Data Mining. 1st ed. Pearson Addison Wesley; Boston: 2005.
