Abstract
This article presents an algorithm for the early detection of disease outbreaks that searches a database of emergency department cases for anomalous patterns. Traditional anomaly detection techniques are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features. These traditional algorithms therefore discover isolated outliers, such as someone accidentally shooting their ear, that are strange but not indicative of a new outbreak. Instead, we want to detect groups with specific characteristics whose recent pattern of illness is anomalous relative to historical patterns. We propose an anomaly detection algorithm that characterizes each anomalous pattern with a rule; the significance of each rule is evaluated using the Fisher exact test and a randomization test. In this study, we compared our algorithm with a standard detection algorithm by measuring the number of false positives and the timeliness of detection. For evaluation, we used simulated data produced by a simulator that models the effects of an epidemic on a city. The results indicate that our algorithm detects outbreaks significantly earlier at common significance thresholds, at the cost of a slightly higher false-positive rate.
Keywords: Anomaly detection, Data mining, Detection algorithm, Multiple hypothesis testing, Syndromic surveillance
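The combination of rule search, Fisher exact test, and randomization test described in the abstract can be illustrated with a minimal sketch. It assumes each case is a dictionary of categorical attributes and restricts rules to a single `attribute=value` condition, which is a simplification; the article's rules may combine several conditions. For each candidate rule it builds a 2×2 table of recent versus baseline cases that do or do not match, scores it with a one-sided Fisher exact test, and keeps the best-scoring rule. It then shuffles the recent/baseline labels, repeats the search, and counts how often chance alone produces an equally good best score. The function names, the toy data, and the p-value estimator are hypothetical illustrations, not taken from the article.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]   # one ED case: attribute name -> categorical value
Rule = Tuple[str, str]    # hypothetical single-condition rule: (attribute, value)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are recent / baseline cases; columns are "matches rule" / "does not".
    Returns P(X >= a) under the hypergeometric null, i.e. the chance that the
    recent cases match the rule at least this often if the rule were unrelated
    to recency.
    """
    n_recent = a + b
    n_match = a + c
    n_total = a + b + c + d
    tail = sum(
        math.comb(n_match, x) * math.comb(n_total - n_match, n_recent - x)
        for x in range(a, min(n_match, n_recent) + 1)
    )
    return tail / math.comb(n_total, n_recent)


def best_rule(recent: List[Record], baseline: List[Record]) -> Tuple[Rule, float]:
    """Score every attribute=value rule seen in the recent cases; keep the lowest p-value."""
    candidates = sorted({(attr, val) for r in recent for attr, val in r.items()})
    if not candidates:
        raise ValueError("no recent cases to build rules from")
    best: Optional[Tuple[Rule, float]] = None
    for attr, val in candidates:
        a = sum(1 for r in recent if r.get(attr) == val)
        c = sum(1 for r in baseline if r.get(attr) == val)
        p = fisher_one_sided(a, len(recent) - a, c, len(baseline) - c)
        if best is None or p < best[1]:
            best = ((attr, val), p)
    assert best is not None
    return best


def randomization_test(recent: List[Record], baseline: List[Record],
                       n_iter: int = 100, seed: int = 0) -> Tuple[Rule, float, float]:
    """Compensate the best rule's p-value for the number of rules searched.

    Shuffles the recent/baseline labels, re-runs the rule search, and counts how
    often the best shuffled p-value is at least as small as the observed one.
    """
    rule, observed_p = best_rule(recent, baseline)
    pooled = recent + baseline          # new list, so the inputs are not mutated
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        _, p = best_rule(pooled[:len(recent)], pooled[len(recent):])
        if p <= observed_p:
            hits += 1
    # Assumption: (hits + 1) / (n_iter + 1) is a common conservative estimator.
    return rule, observed_p, (hits + 1) / (n_iter + 1)


def toy_data() -> Tuple[List[Record], List[Record]]:
    """Hypothetical data: historical baseline vs. a recent day with extra child respiratory cases."""
    baseline = ([{"age": "adult", "symptom": "cough"}] * 50
                + [{"age": "child", "symptom": "rash"}] * 48
                + [{"age": "child", "symptom": "respiratory"}] * 2)
    recent = ([{"age": "adult", "symptom": "cough"}] * 4
              + [{"age": "child", "symptom": "rash"}] * 4
              + [{"age": "child", "symptom": "respiratory"}] * 12)
    return recent, baseline


if __name__ == "__main__":
    recent, baseline = toy_data()
    rule, p, p_comp = randomization_test(recent, baseline)
    print(f"best rule {rule[0]}={rule[1]}: Fisher p={p:.2e}, compensated p={p_comp:.3f}")
```

The randomization step matters because the search tries many rules and reports only the best one, so the raw Fisher p-value of the winner is optimistic. Comparing it against the best scores found on label-shuffled data is one way to account for that multiple testing; the article's exact procedure and estimator may differ from this sketch.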
