2009 Jul 14;21:49–72. doi: 10.1007/978-1-4419-1278-7_4

Table 4-3. Outbreak detection algorithms.

| Algorithm | Short description | Availability and applications | Features and problems |
|---|---|---|---|
| Temporal analysis | | | |
| Serfling method | A static cyclic regression model with predefined parameters optimized on training data. | Available from RODS (Tsui et al., 2001); used by CDC for flu detection; Costagliola et al. applied Serfling's method to French influenza-like illness surveillance (Costagliola et al., 1981). | The model fits data poorly during epidemic periods. To use this method, the epidemic period has to be predefined. |
| Autoregressive Integrated Moving Average (ARIMA) | A linear model whose parameters are learned from historical data. Seasonal effects can be adjusted for. | Available from RODS. | Suitable for stationary environments. |
| Recursive Least Squares (RLS) | A dynamic autoregressive linear model that predicts the current count of each syndrome within a region from historical data; it continuously adjusts model coefficients based on prediction errors. | Available from RODS. | Suitable for dynamic environments. |
| Exponentially Weighted Moving Average (EWMA) | Predictions based on exponential smoothing of the previous several weeks of data, with the most recent days weighted highest (Neubauer, 1997). | Available from ESSENCE. | Shift sensitivity can be adjusted by applying different weighting factors. |
| Cumulative Sums (CUSUM) | A control chart-based method that monitors for departure of the mean of the observations from the estimated mean (Das et al., 2003; Grigoryan et al., 2005). It can operate with limited baseline data. | Widely used in current surveillance systems, including BioSense, EARS (Hutwagner et al., 2003), and ESSENCE, among others. | Performs well for quick detection of subtle changes in the mean (Rogerson, 2005); criticized for its lack of adjustment for seasonal or day-of-week effects. |
| Hidden Markov Models (HMM) | HMM-based methods use a hidden state to capture the presence or absence of an epidemic of a particular disease and learn probabilistic models of observations conditioned on the epidemic status. | Discussed in (Rath et al., 2003). | A flexible model that can adapt automatically to trends, seasonality, covariates (e.g., gender and age), and different distributions (normal, Poisson, etc.). |
| Wavelet algorithms | Local frequency-based data analysis methods; they can automatically adjust to weekly, monthly, and seasonal data fluctuations. | Used in NRDM to indicate zip-code areas in which OTC medication sales are substantially increased (Espino and Wagner, 2001; Zhang et al., 2003). | Account for both long-term trends (e.g., seasonal effects) and short-term trends (e.g., day-of-week effects) (Wagner et al., 2004b). |
| Spatial analysis | | | |
| Generalized Linear Mixed Modeling (GLMM) | Evaluates whether observed counts in relatively small areas are larger than expected on the basis of the history of naturally occurring diseases (Kleinman et al., 2004, 2005a). | Used in Minnesota (Yih et al., 2005). | Sensitive to a small number of spatially focused cases; poor at detecting elevated counts over contiguous areas when compared with scan statistic and spatial CUSUM approaches (Kleinman et al., 2004). |
| SMall Area Regression and Testing (SMART) | An adaptation of GLMM that takes into account multiple comparisons and includes parameters for ZIP code, day of the week, holiday, and seasonal cyclic variation. | Available from BioSense and the National Bioterrorism Syndromic Surveillance Demonstration Program (Yih et al., 2005). | Seasonal effects, weekly effects, and other parameters under consideration can be adjusted during the regression process. |
| Spatial scan statistics and variations | The basic model scans the entire region of interest with simply shaped areas, using well-defined likelihood ratios. Variations take into account factors such as population mobility. | Widely adopted by many syndromic surveillance systems; a variation is proposed in (Duczmal and Buckeridge, 2005); visualization available from BioPortal (Zeng et al., 2004a). | Well tested for various outbreak scenarios with positive results; the geometric shape of the hotspots identified is limited. |
| Bayesian spatial scan statistics | Combines Bayesian modeling techniques with the spatial scan statistics method; outputs the posterior probability that an outbreak has occurred and the distribution of this probability over possible outbreak regions. | Available from RODS (Neill et al., 2005). | Computationally efficient; can easily incorporate prior knowledge such as the size and shape of the outbreak or the impact on the disease infection rate. |
| Spatial-temporal analysis | | | |
| Space-time scan statistic | An extension of the spatial scan statistic that searches all subregions for likely clusters in space and time with multiple likelihood ratio testing (Kulldorff, 2001). | Widely used in many community surveillance systems, including the National Bioterrorism Syndromic Surveillance Demonstration Program (Yih et al., 2004). | Identified regions may cover too large an area. |
| What's Strange About Recent Events (WSARE) | Searches for groups with specific characteristics (e.g., a recent pattern of place, age, and diagnosis associated with illness that is anomalous when compared with historic patterns) (Kaufman et al., 2005). | Available from RODS; implemented in ESSENCE. | In contrast to traditional approaches, this method allows for use of representative features for monitoring (Wong et al., 2003; Wong et al., 2002). To use it, however, the baseline distribution has to be known. |
| Population-wide ANomaly Detection and Assessment (PANDA) | A causal Bayesian network approach that models a population and infers the spatial-temporal probability distribution of disease for the entire population or for individual patients. | Available from RODS (Cooper et al., 2004; Moore et al., 2002). | Requires extensive computational effort. |
| Prospective Support Vector Clustering (PSVC) | Uses the Support Vector Clustering method with risk adjustment as a hotspot clustering engine and a CUSUM-type design to keep track of incremental changes in spatial distribution patterns over time. | Developed in BioPortal (Chang et al., 2005; Zeng et al., 2004a). | Can identify hotspots with irregular shapes in an online context. |
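
As an illustration of the two simplest temporal methods in the table, the following is a minimal Python sketch of an EWMA detector and a one-sided CUSUM detector applied to a daily count series. It is not the implementation used by ESSENCE, EARS, BioSense, or any other system named above. The function names and the parameters `baseline_n` (number of baseline days), `lam` (EWMA weighting factor), `k_sigma` (control limit width), `k` (CUSUM allowance), and `h` (CUSUM decision threshold) are assumptions chosen for illustration, as is the choice to estimate the mean and standard deviation from a short fixed baseline.

```python
from statistics import mean, stdev


def _baseline_stats(counts, baseline_n):
    """Estimate mean and standard deviation from the first baseline_n counts."""
    baseline = counts[:baseline_n]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot standardize")
    return mu, sigma


def ewma_alarms(counts, baseline_n=7, lam=0.4, k_sigma=3.0):
    """Return indices of days whose EWMA statistic exceeds the upper limit.

    lam is the weighting factor: larger values give recent days more
    weight. All default values here are illustrative assumptions.
    """
    mu, sigma = _baseline_stats(counts, baseline_n)
    # Upper control limit using the asymptotic variance of the EWMA statistic.
    limit = mu + k_sigma * sigma * (lam / (2 - lam)) ** 0.5
    z = mu  # start the smoothed value at the baseline mean
    alarms = []
    for day in range(baseline_n, len(counts)):
        z = lam * counts[day] + (1 - lam) * z
        if z > limit:
            alarms.append(day)
    return alarms


def cusum_alarms(counts, baseline_n=7, k=0.5, h=4.0):
    """Return indices of days where a one-sided upper CUSUM exceeds h.

    Counts are standardized against the baseline; k is the allowance
    subtracted each day and h is the decision threshold, both in units
    of baseline standard deviations (illustrative assumptions).
    """
    mu, sigma = _baseline_stats(counts, baseline_n)
    s = 0.0
    alarms = []
    for day in range(baseline_n, len(counts)):
        zscore = (counts[day] - mu) / sigma
        # Accumulate only upward departures from the baseline mean.
        s = max(0.0, s + zscore - k)
        if s > h:
            alarms.append(day)
            s = 0.0  # restart accumulation after signalling
    return alarms
```

Calling either function on a list of daily counts returns the zero-based indices of the days that signal. Neither sketch adjusts for seasonal or day-of-week effects, which is the limitation the table notes for CUSUM; lowering `lam` or raising `h` makes the respective detector less sensitive to short-lived shifts.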