PLoS One. 2024 Jul 5;19(7):e0306532. doi: 10.1371/journal.pone.0306532

Table 1. Anomaly detection algorithms tested in the negative-based monitoring system.

For each surveillance algorithm, the entries below give a brief description, the main tuning parameters, and the implementing R package.
Cumulative sum control charts (CUSUM)
Brief description: CUSUM is well suited to situations where the process mean is expected to shift or drift away from a specified baseline. It can use a binomial distribution to detect changes in proportions and a Poisson distribution for count events.
Parameters:
- Decision interval (h): determines the sensitivity, i.e., the amount of deviation from the target that triggers an out-of-control signal.
- Target value: the expected value around which the process should operate.
- Starting value: the initial value of the cumulative sum, set to zero because the process was initially in control.
- Directionality (positive or negative): whether to detect positive shifts (an increase in the process mean) or negative shifts (a decrease in the process mean).
R package: qcc [23] (an illustrative qcc::cusum() call is sketched after the table).
Exponentially Weighted Moving Average (EWMA)
Brief description: EWMA applies exponential smoothing to the series, weighting recent observations more heavily, and signals when the smoothed statistic crosses control limits set a given number of sigmas from the centre line.
Parameters:
- Smoothing factor (λ): the weight assigned to the most recent observation, between 0 and 1. A higher λ gives more weight to recent observations, making the EWMA more responsive to changes, while a lower λ weights observations more equally, giving a smoother but less responsive statistic.
- Sigma: the number of sigmas used to set the upper and lower control limits.
R package: qcc [23] (see the qcc::ewma() sketch after the table).
Early Aberration Reporting System (EARS)
Brief description: EARS standardizes each observation using the mean and standard deviation computed over a short moving baseline window and flags observations that deviate too far from that baseline. Three methods are available: "C1" compares the current week against the 7 preceding weeks; "C2" uses the same 7-week baseline but skips the 2 most recent weeks; "C3" combines the C2 statistics of the current and the 2 previous weeks.
Parameters:
- Baseline: number of timepoints used to form the baseline, starting at the observation with index baseline + 1 (C1), + 3 (C2), or + 5 (C3).
- Method: one of C1, C2, or C3.
- Alpha: significance level of the approximate prediction interval.
R package: surveillance [24] (see the earsC() sketch after the table).
Farrington and Farrington Flexible
Brief description: The Farrington algorithm is designed specifically for surveillance data and is commonly used for outbreak detection. It compares the observed count at a given time point with an expected count estimated from the same time point in previous years using a Poisson regression model; Farrington algorithms therefore need at least one complete year of baseline data to work properly. Noufaily et al. [25] proposed an improved version, Farrington Flexible, that reduces false alarms, in part by raising the threshold above which past observations are down-weighted as outbreaks.
Parameters:
- Baseline (b): number of years back in time to include when forming the base counts.
- Window half-size (w): number of weeks to include before and after the current week in each baseline year.
- Weights threshold: threshold for reweighting past outbreaks using Anscombe residuals (1 in the original method; 2.58 advised in the improved method).
- Past weeks not included: number of recent weeks to ignore in the calculation.
- Alpha: significance level of the approximate prediction interval.
R package: surveillance [24,26] (see the farringtonFlexible() sketch after the table).
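To make the table concrete, the sketches below show minimal calls to each algorithm. All counts, parameter values, and baseline/monitoring splits are hypothetical placeholders for illustration, not the settings used in the study. First, a CUSUM chart with qcc::cusum(), where decision.interval corresponds to the decision interval h:

```r
library(qcc)

# Hypothetical weekly submission counts: the first 26 weeks act as the
# in-control baseline (setting the target/center and std. dev.), and the
# remaining 26 weeks are monitored as new data.
set.seed(1)
counts <- rpois(52, lambda = 20)

# decision.interval is the decision interval h (in standard-error units);
# se.shift is the size of the mean shift the chart is tuned to detect.
q <- cusum(counts[1:26], newdata = counts[27:52],
           decision.interval = 4, se.shift = 1, plot = FALSE)
summary(q)
```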
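Likewise, a minimal qcc::ewma() sketch; the smoothing factor of 0.3 and the 3-sigma limits are assumed values, not the study's settings:

```r
library(qcc)

# Same hypothetical baseline/monitoring split as in the CUSUM sketch.
set.seed(2)
counts <- rpois(52, lambda = 20)

# lambda is the smoothing factor; nsigmas sets the control-limit width.
e <- ewma(counts[1:26], newdata = counts[27:52],
          lambda = 0.3, nsigmas = 3, plot = FALSE)
summary(e)
```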
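For EARS, surveillance::earsC() operates on an sts time-series object. The series below is simulated, and the choice of method "C1", the 7-week baseline, and the monitored range are assumptions made for illustration:

```r
library(surveillance)

# Hypothetical weekly counts wrapped in an sts object (52 weeks per year).
set.seed(3)
obs <- matrix(rpois(120, lambda = 15), ncol = 1)
series <- sts(observed = obs, frequency = 52, start = c(2020, 1))

# C1 compares each monitored week with the mean/sd of the 7 preceding
# weeks; alpha sets the level of the prediction interval.
res <- earsC(series, control = list(method = "C1", baseline = 7,
                                    alpha = 0.001, range = 20:120))
alarms(res)  # TRUE where the algorithm raises an alarm
```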
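Finally, a surveillance::farringtonFlexible() sketch. Five years of simulated counts are generated so that b = 3 baseline years are available when monitoring the final year; weightsThreshold = 2.58 follows the improved method of Noufaily et al. [25], while the remaining values are assumed for illustration:

```r
library(surveillance)

# Five hypothetical years of weekly counts; Farrington-type algorithms
# need complete past years as baseline.
set.seed(4)
obs <- matrix(rpois(5 * 52, lambda = 15), ncol = 1)
series <- sts(observed = obs, frequency = 52, start = c(2019, 1))

# Monitor the fifth year, using b = 3 past years and a window of
# +/- w = 3 weeks around the current week in each baseline year.
ctrl <- list(range = (4 * 52 + 1):(5 * 52),
             b = 3, w = 3,
             reweight = TRUE, weightsThreshold = 2.58,
             pastWeeksNotIncluded = 26,
             alpha = 0.01)
res <- farringtonFlexible(series, control = ctrl)
alarms(res)  # TRUE where the algorithm raises an alarm
```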