Journal of the American Medical Informatics Association (JAMIA). 2018 May 11;25(7):862–871. doi: 10.1093/jamia/ocy041

Using statistical anomaly detection models to find clinical decision support malfunctions

Soumi Ray 1,2, Dustin S McEvoy 3, Skye Aaron 1, Thu-Trang Hickman 1, Adam Wright 1,2,3
PMCID: PMC6016695  PMID: 29762678

Abstract

Objective

Malfunctions in Clinical Decision Support (CDS) systems occur for a multitude of reasons and often go unnoticed, leading to potentially poor outcomes. Our goal was to evaluate statistical anomaly detection models for identifying malfunctions within CDS systems.

Methods

We evaluated 6 anomaly detection models: (1) Poisson Changepoint Model, (2) Autoregressive Integrated Moving Average (ARIMA) Model, (3) Hierarchical Divisive Changepoint (HDC) Model, (4) Bayesian Changepoint Model, (5) Seasonal Hybrid Extreme Studentized Deviate (SHESD) Model, and (6) E-Divisive with Median (EDM) Model and characterized their ability to find known anomalies. We analyzed 4 CDS alerts with known malfunctions from the Longitudinal Medical Record (LMR) and Epic® (Epic Systems Corporation, Madison, WI, USA) at Brigham and Women’s Hospital, Boston, MA. The 4 rules recommend lead testing in children, aspirin therapy in patients with coronary artery disease, pneumococcal vaccination in immunocompromised adults and thyroid testing in patients taking amiodarone.

Results

The Poisson changepoint, HDC, Bayesian changepoint, and EDM models detected the anomaly in an alert for lead screening in children; all 6 models detected the point anomaly in an alert for aspirin therapy in coronary artery disease; all models except SHESD detected the mean-shift in an alert for pneumococcal conjugate vaccination in immunocompromised adults; and EDM was the only model to promptly detect the start of the mean-drift anomaly in an alert for monitoring thyroid function in patients on amiodarone.

Conclusions

Malfunctions occur frequently in CDS alert systems and often manifest as anomalies in alert firing patterns. It is important to detect such anomalies promptly, and statistical anomaly detection models are useful tools for doing so.

Keywords: clinical decision support, anomaly detection, alerts, time series

Introduction

Clinical Decision Support

Over the years, healthcare delivery has become increasingly complex and personalized. Rapid, concurrent technological advances have enabled the collection, assimilation, and analysis of huge amounts of patient-related clinical data. The culmination of these factors, along with legislation and incentives from federal government agencies, has led to the evolution and widespread use of Electronic Health Records (EHRs).1,2 Clinical Decision Support (CDS) systems have become an integral part of EHRs. CDS provides targeted suggestions and information to healthcare providers and patients with the goal of improving the quality, safety, and efficiency of care.3 Common examples of CDS include alerts, reminders, and reference content to assist members of care teams. CDS systems have been shown to be beneficial;4–8 nonetheless, malfunctions within CDS remain a challenge.9,10 We define a CDS malfunction as an event in which a CDS system does not function as designed or expected; for example, if an alert does not fire the way it was designed, we call that a CDS malfunction. CDS malfunctions can cause users to mistrust CDS and can also pose risks to patient safety and quality of care.6,11,12

We recently published a case series depicting malfunctions within CDS systems.9 A representative example was a malfunction in an alert suggesting that patients receiving amiodarone, an anti-arrhythmic agent, obtain a TSH (Thyroid Stimulating Hormone) test to monitor their thyroid function. In this example, the code for amiodarone was changed, but the related rule was not, causing the alert to stop firing. The malfunction went unnoticed for more than two years before it was detected and eventually fixed. Errors such as these can persist for extended periods without proper monitoring. Given their potential for harm, better strategies for preventing and detecting CDS anomalies are essential. We recently published an analysis of 68 cases of CDS malfunctions and identified common causes and patterns.13 Common causes of CDS malfunctions include design and build errors, EHR software upgrades, and changes to codes (as in the amiodarone example). Although preventing CDS malfunctions is the ultimate goal, we have found that malfunctions still frequently “slip through the cracks,” making a monitoring strategy essential even with preventive steps in place. This prior work has laid the foundation for the current project, which aims to enable timely detection of CDS anomalies using anomaly detection.

Anomaly Detection

Anomaly detection is a branch of computer science and statistics, and refers to the problem of finding patterns in data that do not conform to an expected behavior.14 These nonconforming patterns are referred to differently in various application domains, with “anomalies” and “outliers” being the two terms most commonly used in the context of anomaly detection. Anomaly detection is also used in many applications for detecting unusual behavior, including fraud detection in the finance and insurance sector, unusual pattern detection in medical diagnosis, and fault detection in safety critical systems.14–16

Anomalies are not synonymous with malfunctions. A malfunction occurs when a CDS rule does not work as intended, while an anomaly occurs when the rule's firing pattern deviates from its past behavior. Not all malfunctions result in anomalies, and not all anomalies represent malfunctions: some changes are intentional, introduced, for example, when the logic of a rule is deliberately changed or when clinical practice patterns evolve. Our goal in this work is to find anomalies, that is, any noticeable changes in an alert firing time series, and to use them to identify and characterize malfunctions.

In this work, we focus on two classes of anomalies arising in time, as shown in Figure 1: point anomalies and changepoint (or breakout) anomalies. A changepoint is any unexpected variation in a time series.17,18 Changepoint anomalies can be of 2 types: mean-shift and mean-drift. In this paper, we address all 3 types of anomalies: point, mean-shift, and mean-drift.

Figure 1. Types of anomalies in time series.

Figure 2 shows a hypothetical time series that depicts these 3 types of anomalies. A point anomaly is a sudden spike or drop at a single point in a time series, as shown in Figure 2 (1). A mean-shift occurs when a time series switches from one steady state to another, as shown in Figure 2 (2). A mean-drift is a gradual increase or decrease in the values, as shown in Figure 2 (3). Changepoint detection methods can identify both the occurrence and the extent of such changes.19,20

Figure 2. Example of 3 types of anomalies in time series.

Anomaly detection models have applications in a variety of domains. Statistical models can detect fluctuations or unforeseen changes in exchange rate regimes, that is, in how a country manages its exchange rate with respect to foreign currencies.21 Multivariate anomaly detection methods can detect changes that involve a subset of observables.22 Bayesian changepoint models provide an estimated probability that a changepoint occurs at each instant in a time series, and they have been applied to detect anomalies in environmental studies, biological data, econometrics, and natural gas time series data.23–26 Hauskrecht et al developed a data-driven monitoring and alerting approach to detect anomalies corresponding to unusual patient management actions in ICU patients, which could help identify medical errors.27

In a conference paper, we introduced an approach for using anomaly detection methods to identify potential CDS malfunctions.28 Here we expand significantly on that work by considering additional anomaly detection algorithms and an additional alert time series, further exploring optimization of the detectors' hyperparameters and, most significantly, performing a robust evaluation of the models.

Methods

CDS can malfunction in a variety of ways. One of the most straightforward cases is when an alert stops firing entirely or exhibits a very dramatic spike; these malfunctions can be detected using simple threshold-based methods. Other malfunction patterns are subtler, and simple approaches fail to find them; more complex statistical models, however, may be able to detect these subtler patterns. We applied 6 different models for point and changepoint anomaly detection to CDS alert firing data: Poisson changepoint detection, ARIMA (Autoregressive Integrated Moving Average), the Hierarchical Divisive Changepoint model, the Bayesian changepoint model, the Seasonal Hybrid Extreme Studentized Deviate model (S-H-ESD),29 and E-Divisive with Median (EDM).30

Poisson Changepoint Model

The Poisson distribution gives the probability of a number of events occurring within a specific interval of time or space. The Poisson changepoint model looks for changes in the mean and variance of a Poisson-distributed process. A test statistic is constructed to decide whether a changepoint has occurred: the detection of a single changepoint is posed as a hypothesis test, and a likelihood ratio-based approach is used to test the hypothesis.31–34
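
This single-changepoint test is implemented in the changepoint R package (reference 33). The following is a minimal sketch on simulated daily alert counts; the series, variable names, and the AMOC ("at most one change") setting are illustrative assumptions, not the study's actual configuration.

```r
# Minimal sketch: Poisson changepoint detection on simulated daily alert
# counts (hypothetical data, not the study's alert series).
library(changepoint)

set.seed(1)
daily_counts <- c(rpois(100, lambda = 40), rpois(60, lambda = 6))  # firing rate drops after day 100

# "AMOC" = at most one change; the Poisson test statistic applies the
# likelihood-ratio test described above to the daily counts
fit <- cpt.meanvar(daily_counts, test.stat = "Poisson", method = "AMOC")
cpts(fit)       # estimated changepoint location (expected near day 100)
param.est(fit)  # estimated Poisson rates before and after the change
```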

Autoregressive Integrated Moving Average (ARIMA) Model

Autoregressive moving average (ARMA) models make predictions on stationary time series, and ARIMA is a generalization of that concept. The properties of a stationary time series are independent of the time at which they are observed. ARIMA models take into account seasonality, trends, and non-stationary aspects of a time series when making future predictions.35 If a data point does not abide by the expected pattern, the ARIMA model flags that point as anomalous. Successful anomaly detection in time series using ARIMA models has been described previously.36,37
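
One concrete implementation of this idea is the tsoutliers R package (reference 35), which flags outliers relative to a fitted ARIMA model. The sketch below runs it on a simulated series with one injected spike; the weekly frequency and outlier types are illustrative choices.

```r
# Minimal sketch: ARIMA-based outlier detection on a simulated series
# (hypothetical data with one injected spike).
library(tsoutliers)

set.seed(1)
y <- ts(c(rpois(120, 40), 400, rpois(40, 40)), frequency = 7)  # point anomaly at t = 121

# Fit an ARIMA model, then flag additive outliers (AO), level shifts (LS),
# and temporary changes (TC) relative to the model's expected pattern
fit <- tso(y, types = c("AO", "LS", "TC"))
fit$outliers  # type, index, and estimated effect of each detected anomaly
```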

Hierarchical Divisive Changepoint (HDC) Model

A hierarchical divisive clustering algorithm is applied to find the number of changepoints and their positions in a time series.38 HDC is a non-parametric approach; hence, there are no assumptions on the underlying distribution of the time series. HDC combines a binary segmentation algorithm with a divergence measure, based on Euclidean distance, that can determine whether two independent random vectors are identically distributed.39 Multiple changepoints are estimated by iteratively applying the binary segmentation algorithm; this procedure partitions observations into clusters by locating a single changepoint within each cluster.40 The statistical significance of an estimated changepoint is determined using a permutation test,41 which serves as the stopping criterion for the iterative hierarchical estimation procedure.
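
This procedure is implemented as e.divisive in the ecp R package (reference 38). The minimal sketch below runs it on a simulated mean-shift series; sig.lvl, R (the number of permutation replicates), and min.size are illustrative hyperparameters.

```r
# Minimal sketch: hierarchical divisive changepoint estimation on a
# simulated mean-shift series (hypothetical data).
library(ecp)

set.seed(1)
x <- matrix(c(rpois(100, 40), rpois(80, 70)), ncol = 1)  # mean-shift after day 100

# The R permutation replicates supply the significance test that serves as
# the stopping criterion for the divisive procedure
fit <- e.divisive(x, sig.lvl = 0.05, R = 199, min.size = 30)
fit$estimates  # segment boundaries, including the detected changepoint(s)
```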

Bayesian Changepoint Model

The probability of occurrence of an event after taking into account all the evidence and background information is termed the posterior probability. In the Bayesian changepoint model, the posterior probability that each point is a changepoint is calculated. A Markov Chain Monte Carlo implementation42 of the Barry and Hartigan analysis43 estimates the posterior distributions of the changepoints. A threshold for the posterior probability is determined according to the application domain, and any data point whose posterior probability exceeds the threshold is flagged as anomalous.42,43
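
The bcp R package (reference 42) provides this MCMC estimation. In the minimal sketch below, the simulated series and the 0.8 posterior-probability threshold are assumptions for demonstration only; in practice the threshold would be tuned per application.

```r
# Minimal sketch: Bayesian changepoint detection on a simulated series
# (hypothetical data and an illustrative probability threshold).
library(bcp)

set.seed(1)
y <- c(rnorm(100, mean = 40, sd = 5), rnorm(80, mean = 70, sd = 5))  # mean-shift after day 100

fit <- bcp(y)                          # MCMC approximation of the posterior
threshold <- 0.8                       # domain-specific cutoff, chosen per application
which(fit$posterior.prob > threshold)  # days flagged as changepoints
```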

Seasonal Hybrid Extreme Studentized Deviate (SHESD) Model

The SHESD model29 is a statistical technique for automatically detecting anomalies in time series. This model employs STL (Seasonal and Trend decomposition using Loess)44 time series decomposition to detect anomalies, and it extends the generalized Extreme Studentized Deviate (ESD) test.45,46 In the ESD test, the sample mean and standard deviation are used to detect anomalies in time series; the SHESD model instead uses the median, a more robust statistic, which minimizes the number of false positives detected by the model.47,48
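
S-H-ESD is available in Twitter's open-source AnomalyDetection R package. The minimal sketch below assumes a weekly season (period = 7) in daily counts; max_anoms, which caps the fraction of points that can be reported, is also an illustrative choice.

```r
# Minimal sketch: S-H-ESD point anomaly detection on simulated daily
# counts with an assumed weekly season (hypothetical data).
library(AnomalyDetection)  # devtools::install_github("twitter/AnomalyDetection")

set.seed(1)
x <- c(rpois(200, 40), 150, rpois(50, 40))  # point anomaly at day 201

# STL-style decomposition removes the seasonal component, then a
# median-based generalized ESD test flags the residual outliers
res <- AnomalyDetectionVec(x, max_anoms = 0.02, period = 7, direction = "both")
res$anoms  # index and value of each detected point anomaly
```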

E-divisive with Median (EDM) Model

The EDM model can be used to detect changepoint anomalies in time series.30 The method automatically detects changepoints by testing whether the observations before and after a candidate point come from different probability distributions. It uses robust statistics such as the median to prevent undue influence of outliers, and it uses permutation tests49,50 to determine whether a candidate point is a changepoint. Like HDC, EDM is a non-parametric method.
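
EDM is implemented in Twitter's open-source BreakoutDetection R package. In this minimal sketch, min.size and beta are illustrative hyperparameters and the simulated series contains a single mean-shift.

```r
# Minimal sketch: E-Divisive with Median on a simulated mean-shift series
# (hypothetical data and illustrative hyperparameters).
library(BreakoutDetection)  # devtools::install_github("twitter/BreakoutDetection")

set.seed(1)
z <- c(rpois(100, 40), rpois(100, 55))  # mean-shift after day 100

# method = "multi" searches for multiple breakouts; medians and permutation
# tests keep isolated outliers from being mistaken for changepoints
res <- breakout(z, min.size = 30, method = "multi", beta = 0.001)
res$loc  # estimated breakout locations
```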

Anomaly detection models do not always detect anomalies perfectly. A false alarm occurs when a model flags a date as anomalous when it is not; the false alarm rate is the rate at which such false positives occur. A model may also detect an anomaly only after a lag, known as the detection delay.
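
To make these two measures concrete, the following sketch computes them for a hypothetical set of detections against a known anomaly start date; the day indices and the 30-day matching window are illustrative, not the study's evaluation protocol.

```r
# Minimal sketch: false alarms and detection delay for one model's output
# (all values hypothetical).
detected   <- c(45, 102, 260)  # day indices flagged by a model
true_start <- 100              # known start of the malfunction
window     <- 30               # days within which a flag counts as a true detection

hits        <- detected[detected >= true_start & detected <= true_start + window]
false_alarm <- length(detected) - length(hits)                       # here: 2 false alarms
delay       <- if (length(hits) > 0) min(hits) - true_start else NA  # here: 2 days
```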

Alerts/rules with known malfunctions from the reminders in the Longitudinal Medical Record (LMR) and Epic at Brigham and Women’s Hospital were used in this study. For each rule, our goal was to use the anomaly detection models to detect the start date of an anomaly. We chose 4 time series containing known malfunctions that manifested anomalies of varying subtlety and pattern. Because the resolution of a malfunction often causes a large change in alert firing (as the alert returns to its intended behavior), the anomaly detection models tend to detect the end of malfunctions in addition to the start. Hence, for the 4 rules we present both the start and the end dates of anomaly detection. In some cases there is no end date, for example, when there is an intended change in the rule logic.

One key question for time series analysis is the time unit to be analyzed. For example, a CDS malfunction detector might look at the number of firings of a rule per hour, per day, per week, or per month. Each choice has pros and cons: shorter time windows have more variation, making detection more difficult, while longer time windows provide a form of averaging but can introduce delays, since an anomaly cannot be detected until at least one time window has passed. In this paper, we chose one day as the unit of analysis, primarily because our data warehouse updates on a daily basis, so we receive one day’s worth of data at a time. This also allows us to flag anomalies and intervene on a near real-time basis.
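
Since the detectors consume daily counts, the raw firing events must first be aggregated. The following is a minimal sketch of that step, where "firings" is a hypothetical data frame with one row per alert firing, not our actual warehouse schema.

```r
# Minimal sketch: aggregate raw alert firing events into the daily counts
# used as the unit of analysis (hypothetical input data).
firings <- data.frame(
  fired_at = as.POSIXct(c("2012-05-18 09:15", "2012-05-18 14:02", "2012-05-19 10:30"))
)

daily <- as.data.frame(table(as.Date(firings$fired_at)))
names(daily) <- c("date", "n_firings")  # one count per calendar day
# Note: days with zero firings do not appear and would need explicit zero-filling
daily
```

Following are descriptions of the rules we used to demonstrate the application of the models.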

Rule 1: Lead Screenings for 2-year-olds

Regular lead testing is mandatory for children in Massachusetts under the Massachusetts Lead Law. This rule identifies children aged 23 to 29 months for whom no blood lead test result was available within the prior 6 months and suggests lead screening. This alert abruptly stopped firing in June 2009 and resumed firing in October 2011. Two additional clauses had been accidentally added to the rule, which caused the alert to stop firing; these clauses were removed a couple of years later, which caused the alert to resume firing.7

Rule 2: Aspirin for Patients with CAD

This rule suggests prescribing aspirin to patients with coronary artery disease (CAD) who are not currently on aspirin. There was a sudden spike in the alert count between May and June 2012 in the alert firing time series, due to a malfunction in the drug classification service that caused the alert to fire for all patients with CAD irrespective of their aspirin use. This anomaly was later identified, and the malfunction was rectified.

Rule 3: PCV Vaccine for Immune-compromised Adults

This rule identifies immunocompromised adults <65 years old who have not received Pneumococcal Conjugate Vaccine (PCV) according to existing guidelines and suggests that PCV be ordered for these adults. There was a sudden mean-shift in the alert firing rate around July 2015: the rule had been changed such that the PCV alert fired even for patients who were already vaccinated, which led to a sudden increase in the number of alerts fired.

Rule 4: Monitoring Thyroid Function in Patients Receiving Amiodarone

This rule identifies patients on amiodarone who have not had their Thyroid Stimulating Hormone (TSH) level checked in the past year and suggests a TSH test. This alert abruptly stopped firing for patients who met the criteria. We found that an internal ID number for the drug amiodarone had been changed, but the rule had not been updated, causing the alert to stop firing for patients newly started on the drug. The anomaly was detected in February 2013; the rule logic was updated to include the new code, and the alert resumed firing as expected.

Results

In all the following figures, the x-axis plots the date of alert firing and the y-axis plots the alert firing count for each date. Blue triangles mark the actual change/anomaly dates, and red dotted lines mark the change/anomaly dates detected by the anomaly detection models. A detailed description of the implementation is presented in the Supplementary Appendix.

Rule 1: Lead Screenings for 2-year-olds

Figure 3 shows the anomalies detected by the 6 anomaly detection models described in the previous section when applied to the alert suggesting lead screening for 2-year-old children. The Poisson changepoint, HDC, Bayesian changepoint, and EDM models successfully detected the approximate start and end dates of this anomaly, although EDM produced 3 false alarms. The ARIMA model was unable to detect any anomaly. The SHESD model failed to detect the start and end dates of the anomaly and produced 3 false alarms.

Figure 3. Anomalies detected in Rule 1 (lead screenings for 2-year-olds) using 6 anomaly detection models.

Rule 2: Aspirin for Patients with CAD

Figure 4 shows the anomalies detected by the 6 anomaly detection methods when applied to Rule 2. The Poisson changepoint and Bayesian changepoint models detected the start date of the anomaly with a detection delay of 1 day. The ARIMA, HDC, and EDM models detected the start date with a detection delay of 2 days, and the SHESD model with a detection delay of 10 days. The Poisson changepoint model detected the exact end date of the anomaly. The HDC model detected the end date with a detection delay of 12 days, while ARIMA detected the end date 4 days early and the Bayesian changepoint model 1 day early. The SHESD and EDM models detected no anomalies associated with the end date. The EDM model also produced 3 false positives; we labeled these as false positives because no anomalies occurred during those time frames. The ARIMA and Bayesian models flagged all the days between the start and the end date of the point anomaly.

Figure 4. Anomalies detected in Rule 2 (aspirin for patients with CAD) using 6 anomaly detection models.

Rule 3: PCV Vaccine for Immune-compromised Adults

Figure 5 shows the mean-shift anomaly detected for Rule 3. The Poisson changepoint, ARIMA, HDC, Bayesian changepoint, and EDM models successfully detected the anomaly date. The EDM model also flagged two other dates, which were false positives. The SHESD model did not detect the mean-shift in this case.

Figure 5. Anomalies detected in Rule 3 (PCV vaccine for immune-compromised adults) using 6 anomaly detection models.

Rule 4: Monitoring Thyroid Function in Patients Receiving Amiodarone

Figure 6 shows the results of mean-drift anomaly detection for Rule 4. The SHESD model detected neither the start nor the end date. The Poisson changepoint and HDC models detected this anomaly with a delay of almost one year, and the ARIMA and Bayesian changepoint models could not detect the start date at all. The EDM model detected the start date of the anomaly with a detection delay of just 5 days. The HDC and EDM models also detected the end date of the anomaly with a detection delay of 8 days, and the Poisson changepoint model with a delay of 7 days. The EDM model additionally flagged 19 anomalous dates that were false alarms, but it was the only model that successfully detected the approximate start date of the mean-drift anomaly. Given EDM’s high number of false positives, it is not entirely clear that it was actually detecting the start and end of the malfunction. However, as a sensitivity analysis, we explored varying the EDM window size and found that those anomalies persisted even as many of the false alerts were removed, suggesting that it may have been detecting some structural change. For this rule, the date the anomaly was fixed was easier to detect than its start date.

Figure 6. Anomalies detected in Rule 4 (monitoring thyroid function in patients receiving amiodarone) using 6 anomaly detection models.

Table 1 summarizes the results of applying the 6 models to the 4 rules from BWH.

Table 1. Timeline of Anomaly Detection Using Six Anomaly Detection Models in Four Rules from BWH

Known anomaly dates
  Rule 1 (Lead screening): start (S) 06/13/09, end (E) 10/11/11
  Rule 2 (Aspirin): S 05/19/12, E 06/08/12
  Rule 3 (PCV): S 07/22/15, no end date
  Rule 4 (TSH): S 11/19/10, E 02/24/13

Average (SD) number of rule firings per weekday during baseline period: Rule 1: 245 (52.4); Rule 2: 1290 (140); Rule 3: 2900 (255); Rule 4: 40 (6)

Average (SD) number of rule firings per weekend day during baseline period: Rule 1: 47 (11.6); Rule 2: 162 (26); Rule 3: 1334 (90); Rule 4: 6 (2.5)

Poisson Changepoint
  Rule 1: S 1 day early, E exact day; Rule 2: S 1 day delay, E exact day; Rule 3: S exact day; Rule 4: S 1 year delay, E 7 day delay

ARIMA (Autoregressive Integrated Moving Average)
  Rule 1: did not detect; Rule 2: S 2 day delay, E 4 days early; Rule 3: S 1 day delay; Rule 4: did not detect

HDC (Hierarchical Divisive Changepoint)
  Rule 1: S exact day, E 1 day delay; Rule 2: S 2 day delay, E 12 day delay; Rule 3: S exact day; Rule 4: S 1 year delay, E 8 day delay

Bayesian Changepoint
  Rule 1: S 1 day early, E exact day; Rule 2: S 1 day delay, E 1 day early; Rule 3: S exact day; Rule 4: did not detect

SHESD (Seasonal Hybrid Extreme Studentized Deviate)
  Rule 1: did not detect; Rule 2: S 10 day delay, E did not detect; Rule 3: did not detect; Rule 4: did not detect

EDM (E-Divisive with Median)
  Rule 1: S 2 days early, E did not detect; Rule 2: S 1 day delay, E 7 day delay; Rule 3: S 1 day delay; Rule 4: S 5 day delay, E 8 day delay

One notable finding is that some of the models appeared to detect events before they happened. This occurs because the models use a sliding window on the alert firing time series for analysis, so data from the future can influence the detection of anomalous dates. This kind of leakage can produce detections that appear one or two days early in retrospective observation, as seen in the results. This mode of anomaly detection is referred to as offline anomaly detection because it is applied retrospectively to a complete time series. By contrast, online anomaly detection is real-time monitoring that determines whether data from the current date is anomalous. Since future data is not available in online detection, this leakage is not possible, but there is a greater risk of false positives (if the model flags a modest temporary increase that returns to baseline) or detection delay (if the model does not alert for several days after an anomaly starts).
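
To make the contrast concrete, the following is a minimal sketch of an online monitor: each day, only data up to "today" is scored, so no future leakage is possible. The choice of the Poisson changepoint detector, the window length, and the recency margin are illustrative assumptions, not our production design.

```r
# Minimal sketch: online anomaly monitoring on a trailing window
# (hypothetical data and illustrative parameters).
library(changepoint)

flag_today <- function(counts, today, window = 90, margin = 7) {
  history <- counts[max(1, today - window + 1):today]  # past data only: no leakage
  fit <- cpt.meanvar(history, test.stat = "Poisson", method = "AMOC")
  length(cpts(fit)) > 0 && max(cpts(fit)) > length(history) - margin
}

# Example: scan a simulated series day by day, as a daily monitor would
set.seed(1)
counts <- c(rpois(120, 40), rpois(30, 8))  # firing rate drops at day 121
flagged <- which(sapply(121:150, function(d) flag_today(counts, d))) + 120
head(flagged, 1)  # first day the online monitor raises a flag
```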

Discussion

Key Findings

Our work shows that anomalies in CDS alert firing time series can be detected using the 6 anomaly detection models with different degrees of accuracy and complementary strengths. Overall, the Poisson changepoint, HDC, and Bayesian changepoint models performed well in detecting both point anomalies and mean-shift anomalies. They were less successful in detecting mean-drift anomalies, taking nearly a year after the true anomaly start date to detect the anomaly. The ARIMA model detected point anomalies and mean-shift anomalies with some delay; it could not detect the mean-drift anomaly or detect when an alert was turned off. The SHESD model performs best in detecting point anomalies but does not perform well in detecting mean-shift or mean-drift anomalies. The EDM model performs very well in detecting mean-shift and mean-drift anomalies but has many false positives. Mean-drift anomalies are subtle, and detecting this pattern is a challenge; Rule 4 (monitoring thyroid function in patients receiving amiodarone) is an example of a mean-drift anomaly.

Strengths and Weaknesses of the Six Anomaly Detection Models

Poisson changepoint models are tailored towards detecting times at which the mechanism generating a process changes. The Poisson changepoint model is a suitable candidate model for us since our response is the count of alerts fired per day. One disadvantage of this model is that it is not suitable for detecting mean-drift changepoint anomalies.

ARIMA is a simple method by design but is quite powerful for forecasting signals and finding anomalies in them. It is based on the idea that values from the past can be used to forecast the next point in the time series, with the addition of white noise. Though ARIMA can handle some seasonality, some alert firing time series have strong seasonal trends (due to weekends, holidays, and, in certain cases, seasonal activation, such as alerts for influenza vaccination); hence, ARIMA could not find anomalies in some alert rules.

HDC is a non-parametric approach, which does not make any distributional assumptions. Estimation is performed in a manner that simultaneously identifies both the number and locations of changepoints. This model can be sensitive to noise. A limitation of this model is late detection of the anomalous point in some cases.

The Bayesian changepoint model provides, for each point in the time series, the posterior probability that the point is a changepoint. Because it quantifies this likelihood, it can give a coherent decision on the number of changepoints, their locations, and the degree of confidence in the estimates. The changepoint call depends on the probability threshold chosen, and a non-optimal choice can inflate false positives or false negatives. The threshold can be optimized using either ROC methods or cross-validation.

The SHESD model employs time series decomposition and robust statistics for detecting anomalies, taking the seasonality and trend of the time series into account in its changepoint estimation. The SHESD model is suitable for detecting point anomalies but is not effective in detecting changepoint anomalies.

The EDM model is non-parametric; hence, it can be used to detect anomalies in time series that do not follow the commonly assumed normal distribution. It also uses robust statistics for anomaly detection. This model can successfully detect both mean-shift and mean-drift anomalies soon after they start, but it was less effective in detecting point anomalies. One drawback of this model is a false alarm rate higher than that of the other models.

Observations

Table 2 summarizes our observations.

Table 2. Observations of Models Suitable for Finding Different Types of Anomalies

Poisson Changepoint: assumes a Poisson distribution; identifies point and mean-shift anomalies
ARIMA (Autoregressive Integrated Moving Average): assumes a normal distribution; identifies point and mean-shift anomalies
HDC (Hierarchical Divisive Changepoint): non-parametric; identifies mean-shift anomalies
Bayesian Changepoint: assumes a normal distribution; identifies point and mean-shift anomalies
SHESD (Seasonal Hybrid Extreme Studentized Deviate): assumes a normal distribution; identifies point anomalies
EDM (E-Divisive with Median): non-parametric; identifies mean-shift and mean-drift anomalies

These anomaly detection models, used appropriately, can identify potential problems with CDS systems, whether current malfunctions or malfunctions that happened in the past. Retrospective detection of malfunctions can assist in root cause analysis, the results of which can then be used to make CDS systems more robust. Once an anomaly is detected by one of the models at a level of certainty exceeding a pre-defined threshold, an alert can be sent to CDS staff, who then investigate the anomaly to determine its root cause and, in particular, whether the system is malfunctioning; if so, the malfunction can be remediated. Each institution might need to define the threshold according to its own specifications.

“Detection delay” is one of the key performance measures typically used in the anomaly detection literature. Delays can occur for several reasons, including insensitivity of the detector or structural issues with the data. The amiodarone use case (Rule 4) is particularly challenging because there is a structural one-year delay from when the amiodarone code was changed to when the first alert was missed (since the alert fires only after the patient has been on the drug for a year). Further, since patients already on amiodarone retained the old code, there was a mixture of properly functioning and malfunctioning applications of the rule; the proportions of the mixture shifted over time, making the appearance of the malfunction subtle and gradual, which is why so many of the detectors had a very long detection delay. This shows that for these more subtle mean-drift anomalies, more sophisticated statistical approaches are needed.

We suggest applying a subset of these 6 models to all the rules in a system so that the different types of anomalies present in each rule can be detected; a minimal sketch of this ensemble approach follows. We would not recommend using only one model, since no single model can detect all the types of anomalies that can be expected in the system.
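
The sketch below unions the days flagged by a Poisson changepoint detector and by EDM, pairing a parametric detector (point and mean-shift anomalies) with a robust non-parametric one (mean-shift and mean-drift). The model pair, hyperparameters, and simulated series are illustrative assumptions, not a recommended production configuration.

```r
# Minimal sketch: complementary ensemble of two detectors whose flagged
# days are unioned for manual review (hypothetical data).
library(changepoint)
library(BreakoutDetection)

flag_days <- function(x) {
  # Poisson changepoint via binary segmentation (up to 5 changes)
  cp <- cpts(cpt.meanvar(x, test.stat = "Poisson", method = "BinSeg", Q = 5))
  # EDM: robust to isolated outliers, sensitive to shifts and drifts
  edm <- breakout(x, min.size = 30, method = "multi", beta = 0.001)$loc
  sort(unique(c(cp, edm)))  # union of candidate anomaly days
}

set.seed(1)
x <- c(rpois(100, 40), rpois(100, 55))
flag_days(x)  # expected to include a day near the shift at day 100
```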

Limitations

A limitation of all these statistical models is the need to optimize hyperparameters for different rules. The Poisson changepoint, ARIMA, HDC, and Bayesian changepoint models are quite generalizable across settings; the SHESD and EDM models, on the other hand, require hyperparameter optimization, as discussed in the Supplementary Appendix. Optimization might become feasible once the models have been supervised in operation for some time. In the future, we plan to devise a feedback loop from the backend engineers for objectively optimizing the hyperparameters and thresholds.

Also, the Poisson changepoint, HDC, Bayesian changepoint, and EDM models do not take the seasonality of a time series into account, which might introduce some false positives into the detection.

In this study, we have presented an offline approach to anomaly detection. The main limitation of an offline study is that detection may appear better than in an online setting, where future data is not available. However, an offline study affords the flexibility to analyze the complete time series rather than just a small window of data, as in an online study. This is important because it allows us to understand and describe the types of anomalies that occur with greater fidelity.

Future Work

Our work on detection of anomalies is exploratory, and there are many opportunities for future work in this area. We plan to apply these anomaly detection models to data from different sites and EHR systems to validate their effectiveness. Incorporating human feedback on the anomalies found could improve the accuracy of detection, and model performance could be explored while controlling for attributes like patient volume and holidays. The current work focuses on finding anomalies retrospectively and on analyzing and fixing their causes. With the understanding of the type and nature of the anomalies gained from these retrospective offline studies, we are now working toward implementing an online alerting system that notifies knowledge management staff when a new anomaly is detected in CDS firing data. We also propose further validation of these models on additional time series so that recommendations can rest on a strong foundation of evidence. Finally, we would like to explore the application of anomaly detection models to other clinical problems, such as physiologic monitoring and EHR interface issues.

Conclusion

Anomalies within CDS systems are a common problem and often go undetected for years, leading to poor performance. Anomaly detection models can successfully find the occurrence of different types of anomalies in CDS alert data. Once anomalies are detected, their causes can be investigated and fixed, which can improve the overall performance of the CDS system.

Funding

This work was supported by National Library of Medicine of the National Institutes of Health grant number R01LM011966. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests

None.

Contributors

The contributions of the authors are: Ray and Wright: conception and design; acquisition, analysis, and interpretation of data; drafting of the manuscript; statistical analysis; supervision.

McEvoy, Aaron, Hickman: acquisition, analysis, and interpretation of data; critical revision of the manuscript for important intellectual content.

Supplementary Material

Supplementary material is available at Journal of the American Medical Informatics Association online.


References

1. Blumenthal D, Tavenner M. The “meaningful use” regulation for electronic health records. N Engl J Med 2010; 363(6): 501–4.
2. Coorevits P, Sundgren M, Klein GO, et al. Electronic health records: new opportunities for clinical research. J Intern Med 2013; 274(6): 547–60.
3. Osheroff JA, Teich JM, Middleton B, et al. A roadmap for national action on clinical decision support. J Am Med Inform Assoc 2007; 14(2): 141–5.
4. Zuccotti G, Maloney FL, Feblowitz J, et al. Reducing risk with clinical decision support. Appl Clin Inform 2014; 5(3): 746–56.
5. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ 2005; 330(7494): 765–73.
6. Shackelton RJ, Marceau LD, Link CL, McKinlay JB. The intended and unintended consequences of clinical guidelines. J Eval Clin Pract 2009; 15(6): 1035–42.
7. Bright TJ, Wong A, Dhurjati R, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012; 157(1): 29–43.
8. Garg AX, Adhikari NK, McDonald H, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 2005; 293(10): 1223–38.
9. Wright A, Hickman TT, McEvoy D, et al. Analysis of clinical decision support system malfunctions: a case series and survey. J Am Med Inform Assoc 2016; 23(6): 1068–76.
10. Kassakian SZ, Yackel TR, Gorman PN, et al. Clinical decisions support malfunctions in a commercial electronic health record. Appl Clin Inform 2017; 8(3): 910–23.
11. Landman AB, Takhar SS, Wang SL, et al. The hazard of software updates to clinical workstations: a natural experiment. J Am Med Inform Assoc 2013; 20(e1): e187–90.
12. McCoy AB, Waitman LR, Lewis JB, et al. A framework for evaluating the appropriateness of clinical decision support alerts and responses. J Am Med Inform Assoc 2012; 19(3): 346–52.
13. Wright A, Ai A, Ash J, et al. Clinical decision support alert malfunctions: analysis and empirically derived taxonomy. J Am Med Inform Assoc 2017; doi: 10.1093/jamia/ocx106.
14. Chandola V, Banerjee A, Kumar V. Anomaly detection. ACM Comput Surv 2009; 41(3): 1–58.
15. Aggarwal CC. Outlier Analysis. New York, NY: Springer; 2013.
16. Hodge V, Austin J. A survey of outlier detection methodologies. Artif Intell Rev 2004; 22(2): 85–126.
17. Sen A, Srivastava MS. On tests for detecting change in mean. Ann Stat 1975; 3(1): 98–108.
18. Hinkley DV, Hinkley EA. Inference about the change-point in a sequence of random variables. Biometrika 1970; 57(3): 477–88.
19. Gupta M, Gao J, Aggarwal CC, Han J. Outlier detection for temporal data: a survey. IEEE Trans Knowl Data Eng 2014; 26(9): 2250–67.
20. Pettitt AN. A non-parametric approach to the change-point problem. Appl Stat 1979; 28(2): 126–35.
21. Zeileis A, Shah A, Patnaik I. Testing, monitoring, and dating structural changes in exchange rate regimes. Comput Stat Data Anal 2010; 54(6): 1696–706.
22. Fan Z, Dror RO, Mildorf TJ, Piana S, Shaw DE. Identifying localized changes in large systems: change-point detection for biomolecular simulations. Proc Natl Acad Sci USA 2015; 112(24): 7454–9.
23. Adams RP, MacKay DJC. Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK; 2007. arXiv:0710.3742v1.
24. Western B, Kleykamp M. A Bayesian change point model for historical time series analysis. Polit Anal 2004; 12(4): 354–75.
25. Hill DJ, Minsker BS, Amir E. Real-time Bayesian anomaly detection in streaming environmental data. Water Resour Res 2016; 46(4): W00D28.
26. Akouemo HN, Povinelli RJ. Probabilistic anomaly detection in natural gas time series data. Int J Forecast 2016; 32(3): 948–56.
27. Hauskrecht M, Batal I, Hong C, et al. Outlier-based detection of unusual patient-management actions: an ICU study. J Biomed Inform 2016; 64: 211–21.
28. Ray S, Wright A. Detecting anomalies in alert firing within clinical decision support systems using anomaly/outlier detection techniques. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; 2016: 185–90.
29. Vallis O, Hochenbaum J, Kejariwal A. A novel technique for long-term anomaly detection in the cloud. In: 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud); June 17–18, 2014. Berkeley, CA: USENIX; 2014.
30. James NA, Kejariwal A, Matteson DS. Leveraging cloud data to mitigate user experience from “breaking bad.” In: IEEE International Conference on Big Data (Big Data); 2016: 3499–508.
31. Jen T, Gupta AK. On testing homogeneity of variances for Gaussian models. J Stat Comput Simul 1987; 27(2): 155–73.
32. Scott AJ, Knott M. A cluster analysis method for grouping means in the analysis of variance. Biometrics 1974; 30(3): 507–12.
33. Killick R, Eckley IA. changepoint: an R package for changepoint analysis. J Stat Softw 2014; 58(3): 1–19.
34. Chen J, Gupta AK. Parametric Statistical Change Point Analysis. Basel, Switzerland: Birkhäuser; 2000.
35. López-de-Lacalle J. tsoutliers R package for detection of outliers in time series. May 27, 2017. http://www.jalobe.com/doc/tsoutliers.pdf
36. Pena EM, de Assis M, Proença ML. Anomaly detection using forecasting methods ARIMA and HWDS. In: 32nd International Conference of the Chilean Computer Science Society. IEEE; November 11–15, 2013: 63–6.
37. Chen C, Liu L-M. Joint estimation of model parameters and outlier effects in time series. J Am Stat Assoc 1993; 88(421): 284–97.
38. Matteson DS, James NA. ecp: an R package for nonparametric multiple change point analysis of multivariate data. J Stat Softw 2015; 62(7): 1–25.
39. Szekely GJ, Rizzo ML. Hierarchical clustering via joint between-within distances: extending Ward’s minimum variance method. J Classif 2005; 22(2): 151–83.
40. Vostrikova LJ. Detecting disorder in multidimensional random processes. Soviet Math Dokl 1981; 24: 55–9.
41. Gandy A. Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. J Am Stat Assoc 2009; 104(488): 1504–11.
42. Erdman C, Emerson JW. bcp: an R package for performing a Bayesian analysis of change point problems. J Stat Softw 2007; 23(3): 1–13.
43. Barry D, Hartigan JA. A Bayesian analysis for change point problems. J Am Stat Assoc 1993; 88(421): 309–19.
44. Cleveland RB, Cleveland WS, McRae JE. STL: a seasonal-trend decomposition procedure based on loess. J Off Stat 1990; 6: 3–73.
45. Rosner B. Percentage points for a generalized ESD many-outlier procedure. Technometrics 1983; 25(2): 165–72.
46. Rosner B. On the detection of many outliers. Technometrics 1975; 17(2): 221–7.
47. Donoho DL, Huber PJ. The notion of breakdown point. In: A Festschrift for Erich L. Lehmann. Florida: CRC Press; 1983: 157–84.
48. Ruppert D. Robust statistics: the approach based on influence functions. Technometrics 1987; 29(2): 240–1.
49. Hampel FR. The influence curve and its role in robust estimation. J Am Stat Assoc 1974; 69(346): 383–93.
50. Good P. Permutation, Parametric and Bootstrap Tests of Hypotheses. New York: Springer-Verlag; 2005.
