Objective
To develop a computationally simple and fast algorithm that rapidly detects disease outbreaks and produces easily interpretable results.
Introduction
Since the release of anthrax in October of 2001, there has been increased interest in developing efficient prospective disease surveillance schemes. Poisson CUSUM is a control chart-based method that has been widely used to detect aberrations in disease counts in a single region collected over fixed time intervals. Over the past few years, different methods have been proposed to extend Poisson CUSUM charts to capture the spatial association among several regions simultaneously.
In the proposed method, we extend to the spatial setting an algorithm from industrial process control that uses multiple Poisson CUSUM charts [2]. The spatial association among regions is captured using the method proposed by Raubertas [3], which has been successfully applied in several prospective surveillance schemes. To improve the power of the traditional multiple Poisson CUSUM charts, the charts are combined with false discovery rate (FDR) control techniques [2].
Methods
We start by specifying the overall false discovery rate, which lets the user set an acceptable false alarm rate. Then, for each of the m regions, we sum up the counts of all the immediate neighbors, creating m spatially correlated neighborhoods. A Poisson CUSUM statistic is then calculated for each neighborhood. To calculate the Poisson CUSUM statistic, the in-control Poisson parameter must be estimated from past data and a tolerable out-of-control parameter must be specified by the user. A computationally simple random walk method [2] is used to find p-values for each regional Poisson CUSUM statistic; it replaces the computationally intensive Markov chain-based method while producing comparable results.
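The neighborhood pooling and the CUSUM recursion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the rectangular grid, the rook (north, south, east, west) definition of "immediate neighbor", and the choice to include a region's own count in its neighborhood are all assumptions made here. The reference value k is the standard choice for a Poisson CUSUM tuned to a shift from an in-control mean lam0 to an out-of-control mean lam1.

```python
import math
import numpy as np

def neighborhood_sums(counts):
    """Pool each cell of a grid with its rook (N, S, E, W) neighbors.

    counts has shape (T, rows, cols); the result has the same shape.
    Assumption: a cell's own count is included in its neighborhood.
    """
    padded = np.pad(counts, ((0, 0), (1, 1), (1, 1)))  # zero border around the grid
    return (
        padded[:, 1:-1, 1:-1]    # the cell itself
        + padded[:, :-2, 1:-1]   # neighbor above
        + padded[:, 2:, 1:-1]    # neighbor below
        + padded[:, 1:-1, :-2]   # neighbor to the left
        + padded[:, 1:-1, 2:]    # neighbor to the right
    )

def poisson_cusum(x, lam0, lam1):
    """Upper one-sided Poisson CUSUM path for a 1-D series of counts."""
    # Reference value for detecting a shift from lam0 to lam1 (lam1 > lam0 > 0).
    k = (lam1 - lam0) / (math.log(lam1) - math.log(lam0))
    s, path = 0.0, []
    for count in x:
        s = max(0.0, s + count - k)  # accumulate excess over k, reset at zero
        path.append(s)
    return np.array(path)

# Hypothetical usage: 3 time points on a 3x3 grid with one count per cell.
counts = np.ones((3, 3, 3), dtype=int)
pooled = neighborhood_sums(counts)
path = poisson_cusum(pooled[:, 1, 1], lam0=5.0, lam1=7.5)
```

In this sketch the in-control mean of a neighborhood would be estimated from the pooled historical counts, so lam0 and lam1 refer to the neighborhood rather than to a single region.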
The use of p-values to identify alarms makes the procedure easier to interpret for a wider audience than the traditional average run length (ARL) and cut-off thresholds, which are used only in industrial process control settings. Using the FDR to control for multiple testing allows the use of popular multiple testing techniques such as the Benjamini-Yekutieli procedure [1].
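To make the idea of a p-value for a CUSUM statistic concrete, the sketch below estimates one by brute-force Monte Carlo simulation of in-control data. This is explicitly a stand-in and not the random walk method of [2], whose details are not given here; the function names, the number of simulations, and the add-one correction are assumptions chosen for illustration.

```python
import math
import numpy as np

def cusum_stat(x, k):
    """Final value of the recursion S_t = max(0, S_{t-1} + x_t - k)."""
    s = 0.0
    for count in x:
        s = max(0.0, s + count - k)
    return s

def mc_pvalue(observed, t, lam0, lam1, n_sim=2000, seed=0):
    """Empirical P(S_t >= observed) when counts are Poisson(lam0).

    Stand-in for the random walk approximation cited in the text.
    """
    rng = np.random.default_rng(seed)
    k = (lam1 - lam0) / (math.log(lam1) - math.log(lam0))
    sims = rng.poisson(lam0, size=(n_sim, t))          # in-control histories
    null_stats = np.array([cusum_stat(row, k) for row in sims])
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_sim)

# Hypothetical usage: p-value of a CUSUM value of 6.0 observed at time 20.
p = mc_pvalue(observed=6.0, t=20, lam0=3.0, lam1=4.5)
```

A small p-value says that a CUSUM value this large would rarely arise at that time point if the disease counts were still at their baseline level, which is the interpretation the text argues is more accessible than an ARL-based threshold.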
Once the p-values are calculated, we apply the general version of the Benjamini-Yekutieli procedure to them to identify neighborhoods with unusually high disease counts. We are currently considering alternative resampling-based procedures to gain even more power.
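For readers unfamiliar with the procedure, a minimal sketch of the Benjamini-Yekutieli step-up rule under arbitrary dependence is given below. The function name and the example p-values are hypothetical; the rule itself compares the i-th smallest of m p-values with i·q / (m·c(m)), where c(m) = 1 + 1/2 + ... + 1/m, and rejects every hypothesis up to the largest rank that meets its threshold.

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))          # harmonic correction for dependence
    order = np.argsort(p)                            # indices of p-values, smallest first
    thresholds = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        last = np.max(np.nonzero(below)[0])          # largest rank meeting its threshold
        reject[order[: last + 1]] = True             # step-up: reject all smaller ranks too
    return reject

# Hypothetical usage: one p-value per neighborhood at a single time point.
alarms = benjamini_yekutieli([0.001, 0.5, 0.004, 0.9], q=0.05)
```

Setting c(m) to 1 in this sketch would give the Benjamini-Hochberg procedure, which is less conservative but assumes independent or positively dependent test statistics.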
Results
We developed a grid of neighboring regions and simulated Poisson counts with a constant mean during the first half of the time period. During the second half, we introduced a step increase in the mean at varying degrees of intensity to simulate an outbreak in neighboring regions. We then calculated an independent Poisson CUSUM statistic for each region. Fixing the false discovery rate at 0.05, we used the Benjamini-Hochberg procedure [1] to determine the speed at which the procedure identifies the simulated outbreak.
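A simulation of this kind could be set up along the following lines. Every specific value in the sketch is hypothetical, since the grid size, baseline mean, outbreak intensity, and outbreak location are not stated above: it assumes a 5x5 grid, 100 time points, a baseline mean of 3, and a multiplicative step increase in two adjacent cells.

```python
import numpy as np

def simulate_grid(T=100, rows=5, cols=5, lam0=3.0, increase=1.5,
                  outbreak_cells=((2, 2), (2, 3)), seed=0):
    """Simulate Poisson counts on a grid with a step increase at T // 2.

    Returns the counts, shape (T, rows, cols), and the outbreak onset index.
    """
    rng = np.random.default_rng(seed)
    mean = np.full((T, rows, cols), lam0)      # constant in-control mean
    onset = T // 2
    for r, c in outbreak_cells:
        mean[onset:, r, c] = lam0 * increase   # step increase in the outbreak cells
    return rng.poisson(mean), onset

# Hypothetical usage: a stronger outbreak in the default two cells.
counts, onset = simulate_grid(increase=2.0)
```

Varying the increase argument corresponds to the varying degrees of outbreak intensity mentioned in the text.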
Next, for the same simulated data, we cumulated the Poisson counts of the neighboring regions and used the Benjamini-Yekutieli procedure to detect alarms from the spatially correlated neighborhood statistics. The results provide convincing evidence that using the neighborhoods, instead of independent regions, significantly reduces the time to detection while increasing the correct identification of time periods of the simulated outbreak.
The method proposed by Raubertas acknowledges spatial association but does not account for it explicitly: it simply cumulates neighboring counts without using a measure of spatial correlation to calculate neighborhood statistics. To improve the procedure further, we are currently investigating spatial statistical methods that account for spatial correlation.
Finally, we plan to apply this procedure to lower respiratory infection data during 1996-1999 from multiple Boston clinics.
Conclusions
We have developed a simple, efficient algorithm for prospective disease surveillance that is relatively easy to interpret without a high level of statistical expertise. Our preliminary results from simulation studies provide evidence of the strengths of the method proposed.
References
- 1. Benjamini Y, Yekutieli D. 2001. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 29(4), 1165-88.
- 2. Li Y, Tsung F. 2012. Multiple attribute control charts with false discovery rate control. Qual Reliab Eng Int. 28(8), 857-71. doi:10.1002/qre.1276
- 3. Raubertas RF. 1989. An analysis of disease surveillance data that uses the geographical locations of the reporting units. Stat Med. 8, 267-71. doi:10.1002/sim.4780080306
