Abstract
Clinical decision support is essential for achieving the maximum value from electronic medical records. Content for decision support systems is usually developed manually, and developing good content is difficult. In this paper, we propose an alternative method for developing decision support content automatically through data mining of past ordering behaviors. We present two data mining methods from computer science: frequent itemset mining, which we use to learn order sets, and association rule mining, which we use to learn corollary orders. We successfully applied these techniques to a database of 156,756 orders from an ambulatory computerized physician order entry system. This analysis yielded a large pool of clinically relevant decision support content. Compared with manual development, these methods are more efficient, account more fully for local preferences and practice variations, and yield content that is more readily integrated into clinical systems.
Keywords: Data Mining, Electronic Medical Records, Computerized Physician Order Entry, Clinical Decision Support, Order Sets, Corollary Orders
Introduction
Electronic medical records are steadily gaining in popularity [1] and hold tremendous promise for improving the quality and efficiency of healthcare delivered in the United States [2–4]. However, clinical decision support is critical to maximize their value [5]. There are several types of decision support, including alerts, reminders and documentation templates, as well as tools that facilitate order creation, such as order sets and corollary orders [6].
This paper focuses on order sets and corollary orders, which have been shown to significantly improve clinical practice, more than doubling compliance with best practices in one randomized trial [7]. Order sets are collections of related items which are commonly ordered together. They are frequently created to help clinicians treat specific clinical conditions. For example, a physician might develop an order set for diabetic care that contains a prescription for insulin, an oral hypoglycemic agent, home blood glucose testing equipment and diabetes education. When seeing a patient with newly diagnosed diabetes, the physician would use the order set to order all of this care. Corollary orders are related to order sets: they are orders triggered as a consequence of another order. For example, frequent coagulation testing is required for patients receiving warfarin (Coumadin®) [8]. An electronic medical record could be programmed so that appropriate coagulation studies are automatically ordered whenever warfarin is prescribed.
Although order sets and corollary orders can be very effective, they are difficult and time consuming to develop and maintain [9]. The content is normally developed locally by hospitals and individual physicians. The developers may review the literature and guidelines, or may simply develop the interventions based on their own clinical experience. Many hospitals have dedicated pharmacy and therapeutics committees charged with developing and reviewing such content [10]. This is expensive, and not all hospitals and physicians can do it successfully.
This development expense could be reduced if hospitals could share such content, allowing them to split development costs. However, there is currently no standard way to share order sets and corollary orders between clinical systems. The HL7 standards development organization has begun development of a standard for sharing order sets, and a draft standard is available [11]. The effort is very promising, and the American College of Physicians has announced its support and its intention to develop evidence-based order sets encoded in the standard once it is released [12]. However, the standard has not yet been approved and is not implemented in any commercial system.
In this paper, we propose an alternative approach to developing clinical decision support content: data mining. We present two techniques that can be used to automatically generate order sets and corollary orders by reviewing past ordering behavior. These techniques have several critical advantages over current methods, particularly in terms of cost and efficiency.
Methods
We used two complementary data mining techniques from the computer science literature in this study. The first is frequent itemset mining (sometimes called market basket analysis). This technique identifies sets of items which frequently occur together in a dataset. In this case, we used it to identify items which are frequently ordered as part of the same patient encounter. The strength of any itemset is defined by its “support”: the number of times the items it contains occur together in the dataset.
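To make the notion of support concrete, the following minimal Python sketch groups order rows into one "basket" per patient encounter and counts how many baskets contain a given itemset. The `(encounter_id, item)` row layout, the function names and the toy orders are hypothetical, and the sketch assumes that an item ordered twice in the same encounter counts only once, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Basket = FrozenSet[str]

def build_baskets(orders: Iterable[Tuple[str, str]]) -> Dict[str, Basket]:
    """Group (encounter_id, item) order rows into one item set per encounter.

    Assumption: an item ordered more than once in an encounter counts once.
    """
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for encounter_id, item in orders:
        grouped[encounter_id].add(item)
    return {enc: frozenset(items) for enc, items in grouped.items()}

def support(itemset: Basket, baskets: Dict[str, Basket]) -> int:
    """Number of encounters whose basket contains every item in `itemset`."""
    return sum(1 for items in baskets.values() if itemset <= items)

# Hypothetical toy orders covering three encounters.
orders = [
    ("e1", "lipid panel"), ("e1", "fasting glucose"), ("e1", "PSA"),
    ("e2", "lipid panel"), ("e2", "fasting glucose"), ("e2", "lipid panel"),
    ("e3", "lipid panel"), ("e3", "mammography"),
]
baskets = build_baskets(orders)
print(support(frozenset({"lipid panel", "fasting glucose"}), baskets))  # 2
print(support(frozenset({"lipid panel"}), baskets))                     # 3
```

Using sets per encounter keeps the support count tied to encounters rather than to raw order rows, matching the description of items "ordered as part of the same patient encounter".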
The second technique we employed is association rule mining, an extension of frequent itemset mining. Association rule mining finds rules that link items in the dataset probabilistically. Each rule is composed of one or more antecedent items and one or more consequent items, and is often diagrammed with an arrow. For example, the rule for warfarin ordering described above might be written as:
warfarin → coagulation study
In this example, warfarin is the antecedent item, and the coagulation study is the consequent item. The strength of these rules is characterized statistically by two properties: support, or the number of encounters in which the antecedent and consequent items occur together, and confidence, or the proportion of encounters containing the antecedent that also contain the consequent.
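Both statistics for a single rule can be computed directly from encounter baskets. The sketch below is illustrative only: the helper names and the five toy encounters are assumptions, not study data.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]

def support(itemset: Basket, baskets: List[Basket]) -> int:
    """Number of baskets (encounters) containing every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def confidence(antecedent: Basket, consequent: Basket, baskets: List[Basket]) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = support(antecedent, baskets)
    if with_antecedent == 0:
        return 0.0
    return support(antecedent | consequent, baskets) / with_antecedent

# Hypothetical toy encounters, one basket each.
baskets = [
    frozenset({"warfarin", "coagulation study"}),
    frozenset({"warfarin", "coagulation study"}),
    frozenset({"warfarin"}),
    frozenset({"warfarin", "coagulation study"}),
    frozenset({"coagulation study"}),
]
warfarin = frozenset({"warfarin"})
coag = frozenset({"coagulation study"})
print(support(warfarin | coag, baskets))    # 3 (support of the rule)
print(confidence(warfarin, coag, baskets))  # 0.75
```

Here the rule warfarin → coagulation study has a support of 3 and a confidence of 3/4. The last basket raises the coagulation study's own frequency but does not affect the rule, because warfarin is absent from it.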
We applied these methods to data from the ambulatory medical record system in use at Kaiser Permanente Northwest. This system has been described elsewhere [13]. We extracted all orders entered on four randomly selected, non-consecutive weekdays in 2005. The data was de-identified, and the study protocol was approved by the Kaiser Permanente Institutional Review Board.
The dataset comprised a total of 156,756 orders across 70,778 patient encounters, and 4,300 unique items were ordered over the four days.
We used the Apriori algorithm [14, 15], as implemented in free software from the Universiteit Antwerpen [16], to perform both frequent itemset mining and association rule mining on the dataset.
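The study used an existing implementation [16]. The sketch below is not that software. It is a small pure-Python illustration of the level-wise Apriori idea, using the same definition of support. It relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so size-k candidates are built only from frequent itemsets of size k - 1. Function names and toy data are hypothetical, and `min_support` plays the role of the occurrence thresholds reported in the results (50 and 20).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

def apriori(baskets: List[Itemset], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_support` baskets, with its support."""
    # Level 1: count individual items.
    counts: Dict[Itemset, int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into size-k candidates, pruning any
        # candidate that has an infrequent (k-1)-subset (the Apriori property).
        previous = list(frequent)
        candidates = set()
        for i in range(len(previous)):
            for j in range(i + 1, len(previous)):
                union = previous[i] | previous[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy encounters.
baskets = [
    frozenset({"lipid panel", "fasting glucose", "PSA"}),
    frozenset({"lipid panel", "fasting glucose", "urinalysis"}),
    frozenset({"lipid panel", "fasting glucose", "PSA"}),
    frozenset({"mammography", "lipid panel"}),
]
itemsets = apriori(baskets, min_support=2)
for itemset, n in sorted(itemsets.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), n)
```

With `min_support=2`, this finds seven itemsets. The largest is {lipid panel, fasting glucose, PSA}, with support 2. Lowering the threshold admits more itemsets, which mirrors the growth from 604 to 2,588 itemsets reported below. A production implementation would use more efficient candidate generation and counting structures, so this version is only practical for small data.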
Results
We began the analysis by searching for frequent itemsets. There were 604 unique itemsets which occurred at least 50 times over the four days analyzed. The largest itemset contained 12 items. Of course, as the threshold for inclusion is decreased, the number of itemsets found increases. Reducing the threshold to 20 occurrences results in 2,588 unique itemsets.
In order to focus the analysis, we concentrated on the items that are ordered most often. Figure 1 shows the distribution of items ordered and reveals that they follow a power-law distribution, a variation on Pareto's classic 80/20 rule. The 100 most commonly ordered items account for 58% of all orders, and the top 300 account for 74% of all orders. As such, a decision support intervention based on order sets and corollary orders that concentrates on these most frequently ordered items can be quite small but still have a significant impact.
Figure 1. Cumulative frequency of items ordered
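A cumulative curve like the one in Figure 1 can be computed by ranking items by order count and accumulating each item's share of all orders. This sketch assumes that each order is represented simply by the name of the item ordered. The toy counts are invented.

```python
from collections import Counter
from typing import Iterable, List

def cumulative_shares(order_items: Iterable[str]) -> List[float]:
    """Cumulative fraction of all orders covered by the top 1, 2, 3, ... items."""
    counts = Counter(order_items)
    total = sum(counts.values())
    running = 0
    shares: List[float] = []
    for _, n in counts.most_common():  # most frequently ordered items first
        running += n
        shares.append(running / total)
    return shares

# Hypothetical toy data: one entry per order, named by the item ordered.
orders = ["lipid panel"] * 5 + ["fasting glucose"] * 3 + ["PSA", "mammography"]
shares = cumulative_shares(orders)
print(shares)  # [0.5, 0.8, 0.9, 1.0]
```

`shares[k - 1]` gives the fraction of orders covered by the top k items. In the study's data, this value was 0.58 for k = 100 and 0.74 for k = 300.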
Some may argue that decision support focused on rare events may be more effective, since clinicians are perhaps less certain of how to proceed in these cases. However, McGlynn et al. have shown that patients receive screening and treatment according to guidelines only 55% of the time [17], and these guidelines consist principally of the sorts of commonly ordered items found among the top 100 or top 300 items in our dataset. Focusing on these most common items can therefore lead to substantial improvement.
Our review of frequent itemsets containing commonly ordered items revealed the itemsets to be quite diverse. Many represented common panels of lab tests, such as the Basic and Comprehensive Metabolic Panels, as well as a variety of itemsets which represent condition-specific orders or best practice guidelines. For example, the technique found 96 instances where the following items were co-ordered:
Pneumococcal vaccination
Hepatitis B vaccination
Haemophilus influenzae type b (Hib) vaccination
Diphtheria, tetanus and pertussis (DTaP) vaccination
Inactivated poliovirus vaccination
Vaccine administration (first)
Vaccine administration (additional)
These orders correspond to the vaccines recommended for children at 2, 4 and 6 months of age in the CDC’s 2005 Recommended Childhood and Adolescent Immunization Schedule [18].
Amongst the largest itemsets, we also found one which contained frequently ordered screening tests for men:
Prostate Specific Antigen (PSA)
Urinalysis
Fasting glucose
Serum creatinine
Lipid panel
and for women:
Bilateral x-ray mammography with two views of each breast
Papanicolaou screen
Lipid panel
Fasting glucose
As described in the Methods section, we followed the frequent itemset analysis with a review of association rules. We generated all rules with a confidence of at least 90% and a support of at least 20 occurrences. This yielded 1,590,869 rules, far too many to review. However, by limiting our analysis, as above, to rules containing commonly ordered items, and by excluding frequent laboratory test panels, we obtained a more manageable set of rules.
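Rules can be derived from frequent itemsets. For a frequent itemset I and a non-empty proper subset A of it, the rule A → (items of I not in A) has the same support as I and a confidence of support(I) / support(A). The sketch below applies a confidence threshold and then the two filters described above. The format of the `frequent` dictionary (as an Apriori-style miner might produce it), the helper names and all counts are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, int, float]  # antecedent, consequent, support, confidence

def generate_rules(frequent: Dict[Itemset, int], min_confidence: float) -> List[Rule]:
    """Derive rules A -> (I - A) from frequent itemsets I.

    `frequent` must contain every subset of each itemset (true of Apriori output),
    so the antecedent's support can be looked up. Minimum support is inherited:
    a rule's support is the support of the itemset it came from.
    """
    rules: List[Rule] = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for ante in combinations(itemset, size):
                antecedent = frozenset(ante)
                conf = itemset_support / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, itemset_support, conf))
    return rules

def filter_rules(rules: List[Rule], common_items: Set[str], excluded_items: Set[str]) -> List[Rule]:
    """Keep rules whose items are all commonly ordered and none are excluded (e.g. lab panels)."""
    return [
        rule for rule in rules
        if (rule[0] | rule[1]) <= common_items and not ((rule[0] | rule[1]) & excluded_items)
    ]

# Hypothetical supports for a few itemsets.
frequent = {
    frozenset({"refraction"}): 10,
    frozenset({"spectacles"}): 12,
    frozenset({"refraction", "spectacles"}): 9,
    frozenset({"basic metabolic panel"}): 30,
    frozenset({"lipid panel"}): 25,
    frozenset({"basic metabolic panel", "lipid panel"}): 24,
}
rules = generate_rules(frequent, min_confidence=0.9)
kept = filter_rules(
    rules,
    common_items={"refraction", "spectacles", "basic metabolic panel", "lipid panel"},
    excluded_items={"basic metabolic panel"},
)
for antecedent, consequent, sup, conf in kept:
    print(sorted(antecedent), "->", sorted(consequent), sup, conf)
# ['refraction'] -> ['spectacles'] 9 0.9
```

Two rules pass the confidence threshold here. The second, lipid panel → basic metabolic panel (confidence 0.96), is removed because it involves an excluded laboratory panel.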
The most frequently occurring association rule involved optometric refraction assessment: 90% of the time that refraction was ordered, an order for spectacles was also entered. Contact lenses were also frequently ordered following refraction.
The next most frequently occurring association rule came as a surprise. It centered on cryosurgery for benign lesions. This surgery is billed under 3 CPT codes:
“17000: Destruction (e.g., laser surgery, electrosurgery, cryosurgery, chemosurgery, surgical curettement), all benign or premalignant lesions (e.g., actinic keratoses) other than skin tags or cutaneous vascular proliferative lesions; first lesion”
“17003: Second through fourteenth lesions, each”
“17004: Destruction (e.g., laser surgery, electrosurgery, cryosurgery, chemosurgery, surgical curettement), all benign or premalignant lesions (e.g., actinic keratoses) other than skin tags or cutaneous vascular proliferative lesions; 15 or more lesions.” [19]
The billing rules for this procedure require that, if a single lesion is removed, code 17000 is billed. If 2–14 lesions are removed, the first lesion is billed with code 17000, and each additional lesion is billed with code 17003. If 15 or more lesions are removed, code 17004 is billed, but not code 17000 or 17003. We found that in 94.2% of the cases in which code 17003 was billed (meaning that between 2 and 14 lesions were removed), code 17000 was also billed. However, because the billing rules require that code 17000 always be billed when code 17003 is billed, this rule should hold 100% of the time. In other words, this procedure was undercoded in the remaining 5.8% of these cases.
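The same kind of check can be run directly against billing data to flag encounters that violate a known coding rule. This is a hypothetical sketch. The encounter structure, IDs and toy codes are invented for illustration, and it tests only the requirement that 17003 be accompanied by 17000.

```python
from typing import Dict, List, Optional, Set, Tuple

def check_required_code(
    encounters: Dict[str, Set[str]], trigger: str = "17003", required: str = "17000"
) -> Tuple[Optional[float], List[str]]:
    """Among encounters billing `trigger`, return the share that also bill `required`
    and the IDs of those that do not (candidate undercoding)."""
    billed = {enc: codes for enc, codes in encounters.items() if trigger in codes}
    if not billed:
        return None, []
    missing = sorted(enc for enc, codes in billed.items() if required not in codes)
    return 1 - len(missing) / len(billed), missing

# Hypothetical toy encounters with their billed CPT codes.
encounters = {
    "e1": {"17000", "17003"},
    "e2": {"17000", "17003"},
    "e3": {"17003"},           # 17003 without 17000: violates the billing rule
    "e4": {"17000", "17003"},
    "e5": {"17004"},           # 15 or more lesions: billed alone, so not checked here
}
rate, missing = check_required_code(encounters)
print(rate, missing)  # 0.75 ['e3']
```

The returned rate is the confidence of the rule 17003 → 17000. Any value below 1.0 indicates undercoding, and the listed encounter IDs show where to look.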
We also located a handful of other procedure-related association rules. Several procedures, including endometrial biopsy, skin biopsy, snare polypectomy and vasectomy, were strongly predictive of a subsequent order for histopathology. Likewise, procedures frequently performed under sedation, such as colonoscopy, were strongly associated with a variety of orders, including a reservation for a procedure room, drug orders for fentanyl and/or midazolam, and nursing orders for an intravenous line or oxygen.
The best-supported association rule between medications was between paclitaxel (Taxol®) and famotidine (Pepcid®). Paclitaxel is a chemotherapeutic agent used for metastatic breast and ovarian cancer and for Kaposi’s sarcoma. Famotidine is a histamine H2 receptor antagonist, most commonly used to treat ulcers, but also used as part of a prophylactic regimen to prevent hypersensitivity reactions in patients receiving paclitaxel [20]. As expected, the other elements of this prophylactic regimen, dexamethasone and diphenhydramine, were also strongly associated with paclitaxel.
Discussion
We have shown that frequent itemset mining and association rule mining can be used to find clinically relevant rules from past ordering behavior. There are direct parallels between these techniques and common decision support paradigms. Itemsets found by frequent itemset mining are baskets of commonly co-occurring orders, and can be used to develop order sets for use in a clinical system. Likewise, patterns found by association rule mining can be implemented in a clinical system as corollary orders: when the antecedent item is ordered, the consequent item can be suggested. The strength of the association rule can be used to adjust the strength of the corollary order. Association rules which are 100% predictive could be implemented as panel orders, so that the consequent is ordered automatically with the antecedent. For weaker rules, the consequent could be suggested, or perhaps moved higher in the list of orderable items.
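One way to express this graded mapping from rule strength to corollary-order behavior is sketched below. Only the tier that turns 100% predictive rules into panel orders comes from the text. The 0.9 cut-off that separates suggestions from list promotion, and the names used, are hypothetical.

```python
from enum import Enum

class CorollaryAction(Enum):
    AUTO_ORDER = "order the consequent automatically (panel order)"
    SUGGEST = "suggest the consequent to the clinician"
    PROMOTE = "move the consequent higher in the list of orderable items"

def corollary_action(confidence: float, suggest_threshold: float = 0.9) -> CorollaryAction:
    """Map a rule's confidence to a corollary-order behaviour.

    Only the 100% -> panel-order tier comes from the text; `suggest_threshold`
    is a hypothetical cut-off chosen for illustration.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    if confidence == 1.0:
        return CorollaryAction.AUTO_ORDER
    if confidence >= suggest_threshold:
        return CorollaryAction.SUGGEST
    return CorollaryAction.PROMOTE

for conf in (1.0, 0.95, 0.6):
    print(conf, corollary_action(conf).name)
```

In practice, any such threshold would be set after the clinical review discussed below, not derived from the data alone.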
We have also shown that there are certain high-impact orderable items which are good targets for decision support. Specifically, we identified a very small set of orderable items (300) that accounted for a large percentage of all orders (74%).
This technique for developing order sets and corollary orders has four principal advantages over the more traditional approach of developing such content from scratch, either by reviewing the literature and guidelines or simply by writing it based on the developer’s clinical experience:
The technique is extremely economical, as candidate order sets and corollary orders are automatically learned from past data.
One of the major difficulties of developing order sets and corollary orders from scratch is encoding them. This encoding must take into account the data available in the clinical system, and must be mapped to the terminology used by the system. Since the techniques described here use data which is already structured and encoded by the target clinical system, the patterns found are already structured and encoded to match the system.
The patterns found by these techniques already take into account and reflect local practice variations and preferences (though enabling local variation may not always be beneficial).
Because the techniques are based on frequencies, the patterns they find are likely to bear on frequent clinical tasks. There is evidence to suggest that many order sets and rules developed from scratch are never used because the clinical situations they address are infrequent [21].
The main disadvantage of the technique is related to its major advantage: it learns patterns from past behavior, and the fact that an ordering pattern is frequent does not mean that it is evidence-based, cost-effective, reasonable or even safe. It is important that all patterns found by these techniques be carefully reviewed for clinical relevance before being implemented as order sets or corollary orders in a clinical system.
It may be possible to use these techniques to build a closed-loop decision support system that automatically develops and institutes order sets and corollary orders by observing the ordering patterns of users. Such a system could continually measure adherence to these rules, deleting or adjusting rules with poor adherence. However, an entirely unsupervised system may not be desirable because past ordering behaviors may not be optimal, as described above. Critical review of new rules is important to avoid institutionalizing common but sub-optimal practices.
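A closed loop of this kind needs some measure of adherence. The sketch below makes a hypothetical assumption: adherence is the proportion of new encounters containing a rule's antecedent that also contain its consequent. It then drops rules that fall below an arbitrary 0.5 threshold. In keeping with the caution above, it only prunes existing rules. It does not approve new ones, which would still need clinical review.

```python
from typing import FrozenSet, List, Optional, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)

def adherence(rule: Rule, baskets: List[Itemset]) -> Optional[float]:
    """Share of encounters where the rule fired (antecedent ordered) in which the
    consequent was also ordered. A hypothetical proxy for adherence."""
    antecedent, consequent = rule
    fired = [b for b in baskets if antecedent <= b]
    if not fired:
        return None
    return sum(1 for b in fired if consequent <= b) / len(fired)

def prune_rules(rules: List[Rule], baskets: List[Itemset], min_adherence: float = 0.5) -> List[Rule]:
    """Drop rules whose observed adherence falls below `min_adherence`.

    Rules that never fired are kept, since there is no evidence either way.
    New rules are not added here; they would still need clinical review.
    """
    kept = []
    for rule in rules:
        score = adherence(rule, baskets)
        if score is None or score >= min_adherence:
            kept.append(rule)
    return kept

rules = [
    (frozenset({"warfarin"}), frozenset({"coagulation study"})),
    (frozenset({"refraction"}), frozenset({"contact lenses"})),
]
# Hypothetical encounters observed after the rules were deployed.
baskets = [
    frozenset({"warfarin", "coagulation study"}),
    frozenset({"warfarin", "coagulation study"}),
    frozenset({"refraction", "spectacles"}),
    frozenset({"refraction", "spectacles"}),
    frozenset({"refraction", "contact lenses"}),
]
for antecedent, consequent in prune_rules(rules, baskets):
    print(sorted(antecedent), "->", sorted(consequent))
# ['warfarin'] -> ['coagulation study']
```

The refraction → contact lenses rule is dropped because the consequent followed in only one of the three encounters where the rule fired.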
Study Limitations and Future Work
There are several limitations to this study. First, although it analyzes the ordering patterns for a variety of providers and specialties, all the data come from a single integrated health system. In future studies, we hope to replicate this analysis with other health systems to see whether these techniques work equally well in those settings, and also hope to repeat the study using inpatient data. Further, we would like to test some of the corollary orders and order sets experimentally to see if they can improve ordering efficiency or the quality and completeness of care. Finally, we would like to extend this work to other types of decision support using other data mining techniques.
Conclusions
This paper has shown the viability of using data mining techniques to develop decision support content, particularly order sets and corollary orders. Using these techniques to “learn” decision support content a posteriori has several advantages over developing the content a priori based on guidelines. First, because the content was learned from data in a clinical system, the content is already encoded in form and terminology appropriate for that clinical system. Second, the content is tailored and contextualized to match local practice patterns. However, since the content is developed without reference to guidelines or evidence, it is very important to ensure that the content matches these guidelines before implementing it. Otherwise, there is a risk of institutionalizing common but sub-optimal practice patterns.
References
1. Gans D, Kralewski J, Hammons T, Dowd B. Medical groups' adoption of electronic health records and information systems. Practices are encountering greater-than-expected barriers to adopting an EHR system, but the adoption rate continues to rise. Health Aff (Millwood). 2005;24(5):1323–33. doi: 10.1377/hlthaff.24.5.1323.
2. Bates DW, Cohen M, Leape LL, Overhage JM, Shabot MM, Sheridan T. Reducing the frequency of errors in medicine using information technology. J Am Med Inform Assoc. 2001;8(4):299–308. doi: 10.1136/jamia.2001.0080299.
3. Bates DW, Pappius E, Kuperman GJ, Sittig D, Burstin H, Fairchild D, et al. Using information systems to measure and improve quality. Int J Med Inform. 1999;53(2–3):115–24. doi: 10.1016/s1386-5056(98)00152-x.
4. Hillestad R, Bigelow J, Bower A, Girosi F, Meili R, Scoville R, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. The adoption of interoperable EMR systems could produce efficiency and safety savings of $142–$371 billion. Health Aff (Millwood). 2005;24(5):1103–17. doi: 10.1377/hlthaff.24.5.1103.
5. Kaushal R, Shojania KG, Bates DW. Effects of computerized physician order entry and clinical decision support systems on medication safety: a systematic review. Arch Intern Med. 2003;163(12):1409–16. doi: 10.1001/archinte.163.12.1409.
6. Osheroff JA, Pifer EA, Teich JM, Sittig DF, Jenders RA. Improving outcomes with clinical decision support: an implementers' guide. Chicago: HIMSS; 2005.
7. Overhage JM, Tierney WM, Zhou XH, McDonald CJ. A randomized trial of "corollary orders" to prevent errors of omission. J Am Med Inform Assoc. 1997;4(5):364–75. doi: 10.1136/jamia.1997.0040364.
8. Hirsh J, Dalen J, Guyatt G. The sixth (2000) ACCP guidelines for antithrombotic therapy for prevention and treatment of thrombosis. American College of Chest Physicians. Chest. 2001;119(1 Suppl):1S–2S. doi: 10.1378/chest.119.1_suppl.1s.
9. Bates DW, Kuperman GJ, Wang S, Gandhi T, Kittler A, Volk L, et al. Ten commandments for effective clinical decision support: making the practice of evidence-based medicine a reality. J Am Med Inform Assoc. 2003;10(6):523–30. doi: 10.1197/jamia.M1370.
10. Shah NR, Seger AC, Seger DL, Fiskio JM, Kuperman GJ, Blumenfeld B, et al. Improving acceptance of computerized prescribing alerts in ambulatory care. J Am Med Inform Assoc. 2006;13(1):5–11. doi: 10.1197/jamia.M1868.
11. Health Level 7. Order Sets Specification Draft Standard. Ann Arbor, MI: 2005.
12. Jenders RA. HL7 Clinical Decision Support Technical Committee Meeting Minutes; 29 September 2004; Fall Working Group Meeting, Atlanta. 2004 [cited 2006 Mar 10]. Available from: http://cslxinfmtcs.csmc.edu/hl7/arden/2004-09-ATL/cds-tc-minutes-2004-09.html
13. Chin HL, Krall M. Implementation of a comprehensive computer-based patient record system in Kaiser Permanente's Northwest Region. MD Comput. 1997;14(1):41–5.
14. Agrawal R, Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases. In: Buneman P, Jajodia S, editors. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993; Washington, DC. 1993. pp. 26–28.
15. Hipp J, Guntzer U, Nakhaeizadeh G. Algorithms for Association Rule Mining - A General Survey and Comparison. SIGKDD Explorations. 2000;2(1):58–64.
16. Goethals B. Frequent Pattern Mining Implementations. 2005 [cited 2006 Mar 10]. Available from: http://www.adrem.ua.ac.be/~goethals/software/
17. McGlynn EA, Asch SM, Adams J, Keesey J, Hicks J, DeCristofaro A, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348(26):2635–45. doi: 10.1056/NEJMsa022615.
18. Recommended childhood and adolescent immunization schedule: United States, 2005. Pediatrics. 2005;115(1):182. doi: 10.1542/peds.2004-2409.
19. Andrews MD. Cryosurgery for common skin conditions. Am Fam Physician. 2004;69(10):2365–72.
20. Markman M, Kennedy A, Webster K, Peterson G, Kulp B, Belinson J. An effective and more convenient drug regimen for prophylaxis against paclitaxel-associated hypersensitivity reactions. J Cancer Res Clin Oncol. 1999;125(7):427–9. doi: 10.1007/s004320050297.
21. Payne TH, Hoey PJ, Nichol P, Lovis C. Preparation and use of preconstructed orders, order sets, and order menus in a computerized provider order entry system. J Am Med Inform Assoc. 2003;10(4):322–9. doi: 10.1197/jamia.M1090.

