Table 4-3. Outbreak detection algorithms.
| Algorithm | Short description | Availability and applications | Features and problems |
|---|---|---|---|
| **Temporal analysis** | | | |
| Serfling method | A static cyclic regression model with predefined parameters optimized on training data | Available from RODS (Tsui et al., 2001); used by CDC for flu detection; applied to French influenza-like illness surveillance (Costagliola et al., 1981) | The model fits data poorly during epidemic periods. To use this method, the epidemic period has to be predefined. |
| Autoregressive Integrated Moving Average (ARIMA) | A linear model whose parameters are learned from historical data; seasonal effects can be adjusted for. | Available from RODS | Suitable for stationary environments. |
| Recursive Least Squares (RLS) | A dynamic autoregressive linear model that predicts the current count of each syndrome within a region from historical data; it continuously adjusts the model coefficients based on prediction errors. | Available from RODS | Suitable for dynamic environments. |
| Exponentially Weighted Moving Average (EWMA) | Predictions based on exponential smoothing of the previous several weeks of data, with the most recent days weighted highest (Neubauer, 1997) | Available from ESSENCE | Allows shift sensitivity to be adjusted by applying different weighting factors. |
| Cumulative Sums (CUSUM) | A control chart-based method that monitors for departure of the mean of the observations from the estimated mean (Das et al., 2003; Grigoryan et al., 2005). It can operate with limited baseline data. | Widely used in current surveillance systems including BioSense, EARS (Hutwagner et al., 2003), and ESSENCE, among others | Performs well for quick detection of subtle changes in the mean (Rogerson, 2005); criticized for its inability to adjust for seasonal or day-of-week effects. |
| Hidden Markov Models (HMM) | HMM-based methods use a hidden state to capture the presence or absence of an epidemic of a particular disease and learn probabilistic models of observations conditioned on the epidemic status. | Discussed in Rath et al. (2003) | A flexible model that can adapt automatically to trends, seasonality, covariates (e.g., gender and age), and different distributions (normal, Poisson, etc.). |
| Wavelet algorithms | Local frequency-based data analysis methods; they can automatically adjust to weekly, monthly, and seasonal data fluctuations. | Used in NRDM to indicate zip-code areas in which OTC medication sales are substantially increased (Espino and Wagner, 2001; Zhang et al., 2003) | Account for both long-term (e.g., seasonal effects) and short-term trends (e.g., day-of-week effects) (Wagner et al., 2004b). |
| **Spatial analysis** | | | |
| Generalized Linear Mixed Modeling (GLMM) | Evaluates whether observed counts in relatively small areas are larger than expected on the basis of the history of naturally occurring diseases (Kleinman et al., 2004, 2005a) | Used in Minnesota (Yih et al., 2005) | Sensitive to a small number of spatially focused cases; poor at detecting elevated counts over contiguous areas when compared with scan statistic and spatial CUSUM approaches (Kleinman et al., 2004). |
| SMall Area Regression and Testing (SMART) | An adaptation of GLMM that takes into account multiple comparisons and includes parameters for ZIP code, day of the week, holiday, and seasonal cyclic variation. | Available from BioSense and the National Bioterrorism Syndromic Surveillance Demonstration Program (Yih et al., 2005) | Seasonal effects, weekly effects, and other parameters under consideration can be adjusted during the regression process. |
| Spatial scan statistics and variations | The basic model scans the entire region of interest with simply shaped areas, using well-defined likelihood ratios. Variations take into account factors such as population mobility. | Widely adopted by many syndromic surveillance systems; a variation is proposed in Duczmal and Buckeridge (2005); visualization available from BioPortal (Zeng et al., 2004a). | Well tested on various outbreak scenarios with positive results; the geometric shape of the hotspots identified is limited. |
| Bayesian spatial scan statistics | Combines Bayesian modeling techniques with the spatial scan statistics method; outputs the posterior probability that an outbreak has occurred and the distribution of this probability over possible outbreak regions. | Available from RODS (Neill et al., 2005) | Computationally efficient; can easily incorporate prior knowledge such as the size and shape of the outbreak or its impact on the disease infection rate. |
| **Spatial-temporal analysis** | | | |
| Space-time scan statistic | An extension of the spatial scan statistic that searches all subregions for likely clusters in space and time with multiple likelihood ratio testing (Kulldorff, 2001). | Widely used in many community surveillance systems including the National Bioterrorism Syndromic Surveillance Demonstration Program (Yih et al., 2004) | Regions identified may be too large in coverage. |
| What's Strange About Recent Events (WSARE) | Searches for groups with specific characteristics (e.g., a recent pattern of place, age, and diagnosis associated with illness that is anomalous when compared with historic patterns) (Kaufman et al., 2005) | Available from RODS; implemented in ESSENCE | In contrast to traditional approaches, this method allows representative features to be used for monitoring (Wong et al., 2002; Wong et al., 2003). To use it, however, the baseline distribution has to be known. |
| Population-wide ANomaly Detection and Assessment (PANDA) | A causal Bayesian network approach that models a population and infers the spatial-temporal probability distribution of disease for the entire population or for individual patients | Available from RODS (Cooper et al., 2004; Moore et al., 2002) | Requires extensive computational effort. |
| Prospective Support Vector Clustering (PSVC) | Uses the Support Vector Clustering method with risk adjustment as a hotspot clustering engine and a CUSUM-type design to keep track of incremental changes in spatial distribution patterns over time | Developed in BioPortal (Chang et al., 2005; Zeng et al., 2004a) | Can identify hotspots with irregular shapes in an online context. |
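
To make the two control-chart rows of the table (EWMA and CUSUM) more concrete, the following is a minimal sketch of how each could flag unusual daily counts. It is not the implementation used in ESSENCE, EARS, or BioSense. The function names, the 7-day baseline, the smoothing weight `lam`, and the thresholds `k_sigma`, `k`, and `h` are all illustrative assumptions, and the reset-after-alarm behavior in the CUSUM is one common convention among several.

```python
from statistics import mean, stdev


def _baseline_stats(counts, baseline_days):
    """Mean and standard deviation of the first `baseline_days` counts."""
    baseline = counts[:baseline_days]
    mu = mean(baseline)
    # Assumption: fall back to 1.0 when the baseline has zero variance.
    sigma = stdev(baseline) or 1.0
    return mu, sigma


def ewma_alarms(counts, baseline_days=7, lam=0.4, k_sigma=3.0):
    """Return the day indices where the EWMA exceeds its upper control limit.

    `lam` is the weighting factor: larger values weight recent days more
    heavily. All defaults are hypothetical, chosen only for illustration.
    """
    mu, sigma = _baseline_stats(counts, baseline_days)
    # Upper limit based on the asymptotic standard deviation of the EWMA.
    limit = mu + k_sigma * sigma * (lam / (2.0 - lam)) ** 0.5
    z = mu
    alarms = []
    for day in range(baseline_days, len(counts)):
        z = lam * counts[day] + (1.0 - lam) * z
        if z > limit:
            alarms.append(day)
    return alarms


def cusum_alarms(counts, baseline_days=7, k=0.5, h=4.0):
    """Return the day indices where a one-sided upper CUSUM exceeds `h`.

    Counts are standardized against the baseline; `k` is the allowance
    (slack) and `h` the decision threshold, both in standard deviations.
    """
    mu, sigma = _baseline_stats(counts, baseline_days)
    s = 0.0
    alarms = []
    for day in range(baseline_days, len(counts)):
        z_score = (counts[day] - mu) / sigma
        s = max(0.0, s + z_score - k)
        if s > h:
            alarms.append(day)
            s = 0.0  # assumption: restart accumulation after an alarm
    return alarms


if __name__ == "__main__":
    # Hypothetical daily syndrome counts with a jump starting on day 10.
    daily_counts = [10, 12, 11, 9, 10, 11, 10, 10, 11, 12, 18, 20, 22]
    print("EWMA alarm days: ", ewma_alarms(daily_counts))
    print("CUSUM alarm days:", cusum_alarms(daily_counts))
```

Neither function adjusts for seasonal or day-of-week effects, which is the limitation the table notes for CUSUM. Changing `lam` in `ewma_alarms` corresponds to the adjustable shift sensitivity listed for EWMA.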