Abstract
Spatial-temporal analysis of COVID-19 cases is critical for understanding how the disease is transmitted and for detecting emerging clusters. Poisson prospective space-time analysis has been successfully applied to cluster detection in geospatial time series data. However, its accuracy, the number of clusters it detects, and its processing time remain major problems when detecting small-sized clusters. The aim of this research is to improve the accuracy of COVID-19 cluster detection at the county level in the U.S.A. by detecting small-sized clusters and reducing noisy data. The proposed system combines Poisson prospective space-time analysis with the Enhanced cluster detection and noise reduction (ECDeNR) algorithm to increase the number of detected clusters and decrease the processing time. Results for accuracy, processing time, number of clusters, and relative risk are obtained by using different COVID-19 datasets in SaTScan. The proposed system increases the average number of clusters by 7 and the average relative risk by 9.19. It also provides a cluster detection accuracy of 91.35% against the current accuracy of 83.32%, and an average processing time of 5.69 minutes against the current processing time of 7.36 minutes. This study addresses the problem of detecting small-sized clusters at an early stage and enhances the overall cluster detection accuracy while decreasing the processing time.
Keywords: COVID-19, Geospatial time series data, Space-time analysis, Spatial-temporal analysis, Poisson prospective distribution
Introduction
Coronavirus disease (COVID-19) is an infectious disease that was first identified in Wuhan city, Hubei province, China, in December 2019. COVID-19 is caused by a newly discovered coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [5, 12]. As of September 12, 2020, there were more than 28 million confirmed cases and more than 920 thousand deaths worldwide. In the United States, there were over 6,638,044 confirmed cases and 197,461 deaths [6]. Within a short time, the virus spread all over the world, and many countries implemented social distancing and even lockdowns. Around 80% of the confirmed cases were mild and the death rate was around 3.19% [28]. General symptoms include fever, shortness of breath, and cough, whereas severe cases may involve multi-organ failure, pneumonia, and death [22].
Because of the high transmission rate and the challenges in developing a vaccine, controlling the disease will likely take more time. It is therefore very important to understand and visualize the behaviour of the virus in order to stay safe and reduce fatalities. A space-time statistic is an effective approach for analyzing a disease's behaviour over a given time [18, 26]. It helps in studying the number of cases in a particular location at a given time. In the past, space-time scan statistics have been used for analyzing chikungunya and dengue fever in Colombia and Panama [22], pointing to areas with increased criminal activity, and detecting hot spots of West Nile virus infection in Italy [24]. They are also useful for analyzing the recurrence intervals of the data using software such as SaTScan [17]. Prospective distribution is a widely used approach to analyze disease cases. This approach treats each case as an individual, follows it over a given time, and collects its data as characteristics. It then detects active or emerging clusters of the current day while disregarding past clusters, which is very helpful for understanding the disease's behaviour. The current solution does not include the testing rate when calculating the number of expected cases [11].
The purpose of this paper is to improve the cluster detection accuracy, the processing time, the number of clusters, and the relative risk by using the ECDeNR algorithm. This research aims to increase the cluster detection accuracy by including areas with a small number of cases and a low testing rate [3]. The proposed Modified Likelihood ratio function adds new important features to the system for detecting clusters. For this, the Relative positive case count is used to calculate the number of expected cases, which helps to detect small-sized clusters and increases the accuracy of cluster detection. Furthermore, the Modified Relative Risk function is used with the Monte Carlo simulation to detect secondary clusters. To ensure higher accuracy of the relative risk of each cluster, the proportion of positive tests is used, which reduces the noisy data in the clusters.
The remainder of the paper is organized as follows. Section two gives the literature review. Section three describes the major components and sub-components of the proposed system along with the related diagrams. Section four discusses the results of this study. Finally, section five concludes the research and highlights future work.
Literature review
The purpose of the literature review is to survey existing papers and their limitations in order to improve the current system. This section reviews the different papers and the related prospective analysis work.
Guliyev [9] examined COVID-19 cases using two variables, recovered cases and the death rate, together with their spatial spillover effects. To determine the relationship between these variables and their spatial effects, the work used a spatial panel data model. Guliyev [9] identified the most consistent and efficient model for capturing spatial effects according to the LR test, the maximum pseudo-R2, and the minimum BIC and AICc values. The model identifies the actual impacts and spatial interactions of the factor components on COVID-19. However, it cannot model the death rate because of the high proportion of zeros in the dataset, and the time period considered is short. Future work will be carried out with a larger dataset. Balamchi and Torabi [1] improved the detection of repeated events in spatial data. Their research offered a spatial model for repeated events, known as the spatial compound Poisson model, which analyzes the repeated events as well as the spatial variation of the data by incorporating spatial random effects. The work provided better knowledge of the spatial trend of a disease and its risk factors for future prevention, and the performance of the model increased by 4.04%. However, the research used only two covariates, sex and year, to account for the exact number of incidences, which may reduce the performance of the system. In the future, the model should use more covariates and extend the process to binary data as well. Saeed et al. [29] improved the estimation of location-wise alcohol-related driving crash rates by analyzing spatial effects and edge effects arising from common roads that cross region boundaries. Saeed et al. [29] used the spatial Durbin model to account for spatial dependencies, evaluated alternative spatial weight structures, and considered edge effects.
This research provided a framework for macroscopic spatial analysis of different kinds of road crashes at different severity levels, which helps identify effective safety intervention programs to minimize accidents. However, the estimated driving crash rate is still not sufficiently accurate. The research can be extended to examine road crashes based on neighborhood dynamics to improve crash rate accuracy. Lansiaux [20] found, using the Pearson correlation test, that sunlight exposure is remarkably correlated with the mortality rate of COVID-19. Sunlight exposure may have a protective effect on COVID-19 mortality. These findings may help in preventing and overcoming the COVID-19 pandemic. However, the work does not include two important measurements, time and direct vitamin D measurements, which affects the accuracy of the results. Hammad et al. [10] investigated why patients with acute and chronic conditions are at higher risk from COVID-19 by analyzing blood pressure, heart rate, troponin, left ventricle ejection fraction (LVEF), and new Q-wave to assess severity. Hammad et al. [10] provided a result that is 11% more accurate than the previous system. However, the paper used a very small dataset (only 143 patients), which is not sufficient to generalize the behavior of COVID-19. Further research can apply the same analysis methods to a larger dataset.
Corizzo et al. [4] proposed a new algorithm for detecting anomalies in data collected from multiple sensors positioned in different locations. This research achieved up to a 13.56% reduction in RMSE compared to the baseline scenario, which increased the accuracy of anomaly detection. The approach can sometimes lead to poor results if the sensors are geographically far from each other. To account for spatial autocorrelation in the system, future research can study the adoption of statistical indicators in the learning process. Krivoruchko and Gribov [15] improved the accuracy of computing the chordal distance between two geolocations. The work compared the accuracy of the distances produced by different models. They provided a new, computationally efficient, and accurate Kernel evolution algorithm (EBK), which doubled the accuracy of the calculated distance between two geolocations. However, the research depends entirely on the ArcGIS software for simulating and mapping GIS models, and an improper implementation of the modeling within the system can produce invalid results.
Leevy et al. [21] analyzed the effect on an existing predictive model of including training datasets from several year-groupings. This research provided well-calculated data showing how the distribution of the original dataset changes over time by grouping the data by year and applying various data processing algorithms. However, the research was conducted only on the dataset collected from 2013 to 2015 and was limited to physicians who were active throughout these three years, which excluded some potentially useful data. Future work will inspect the effect of using other learners, metrics, and class ratios on big data from domains other than healthcare. Lakhani et al. [19] identified priority areas for palliative care in Melbourne with a remarkably high number of adults with disabilities and barriers to accessing essential health services. It provided a framework for finding the preferred regions for palliative care services during the COVID-19 pandemic. The model supports the health of vulnerable populations in the preferred regions that have limited access to health services during a pandemic. However, the paper considered only a small area (Melbourne). Future research can use a larger dataset covering larger areas, which will give a more generalized result. Mollalo et al. [23] investigated the county-level variation of COVID-19 cases across the USA by compiling 35 environmental, topographic, demographic, and socioeconomic variables. The work supported the substantial impact of healthcare professionals during the pandemic. However, the dataset used is at the county level while the calculation is done at the sub-county level, which did not produce accurate results. Future research can use sub-county-level data and should include other variables to improve the quality of the service and the overall effort to combat the pandemic.
Cordes and Castro [3] identified clusters of high positivity rates and low testing rates by using spatial scan statistics. The research provided a list of areas with limited access to testing but a high number of cases, which is essential for understanding the risk and allocating resources in the COVID-19 pandemic. The fine spatial resolution is the major strength of this research, as it gives a better idea of which nearby regions have a higher case burden. However, it only explained the relationship between COVID-19 testing patterns and their dependent factors. More input parameters must be used and examined with a larger dataset. Hohl et al. [11] conducted daily surveillance of COVID-19 to detect and characterize emerging clusters in the USA by applying the prospective space-time scan statistic. This work offered a web application that lets the user track the space-time distribution of significant clusters. It improved on the previous work and enhanced the accuracy of cluster detection at an increased temporal resolution. However, it generates circular clusters, which is not a good choice in areas with significant spatial heterogeneity and decreases the accuracy of cluster detection in practice. Future research will work on detecting clusters of irregular shape. Rongyao et al. [27] proposed a framework for joint disease diagnosis and conversion time prediction. The work investigated distinguishing severe cases from mild cases and predicting the time it takes for a mild case to convert to a severe case. The proposed method was evaluated against six comparison methods on synthetic multi-modality datasets and a COVID-19 dataset, based on binary classification and regression performance. The method of Rongyao et al. [27] is sensitive to the selection of the tuning parameters used in the objective function and focuses only on binary classification.
State of art
Hohl et al. [11] proposed a model for detecting COVID-19 clusters using Poisson prospective space-time analysis, implemented in the SaTScan software, by building cylindrical clusters. The model calculates the expected cases and elevated risk by considering the total population and active cases. Expected cases were calculated using the population within a cluster (p), the total number of cases in the US (C), and the total population (P). The authors use 999 Monte Carlo simulations for significance testing, which increases the chance of finding secondary clusters and reduces uncertainty [30]. In the first half of the study period the number of clusters was in the range of 6–10, whereas in the second half it was in the range of 23–24 [11]. This solution has some limitations: it cannot detect small-sized clusters with a low number of tests, it does not implement noise reduction for clusters, and its average cluster detection accuracy is limited. The average cluster detection accuracy is 83.32% and the processing time is 7.37 minutes. Figure 1 shows the block diagram of the state of art system; the blue borders mark its good features and the red border marks its limitation.
Fig. 1.
The block diagram of state of art system [11]. The blue borders show the good features of this state of art solution, and the red border refers to the limitation of it
The system consists of four major stages, namely Data Collection and Preparation, Data Modelling, Data Analysis, and Data Visualization. The following sections explain each stage of the state of art and give its limitations with their justification.
Data preparation and collection
Data are collected from the public COVID-19 case data for the USA provided by Johns Hopkins University for the selected study period (January 22nd – June 5th, 2020). To preserve the integrity of the data, cases from international cruise ships were removed [11]. Finally, daily new cases were calculated and grouped on a weekly basis for cluster calculation [11]. Grouping the daily case data by week affects the daily nature of the temporal window under consideration, and the calculation may lose useful information such as the daily number of cases and the daily number of deaths, which affects the accuracy of the result.
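The preparation step can be illustrated with a minimal Python sketch that derives daily new cases from a cumulative series and then groups them by week. The function names, the toy series, and the choice to clip negative corrections to zero are assumptions made for illustration; they are not taken from [11].

```python
from typing import List


def daily_new_cases(cumulative: List[int]) -> List[int]:
    """Difference a cumulative case series into daily new cases.

    Assumption: a downward revision of the cumulative count is
    clipped to zero new cases for that day.
    """
    new_cases = []
    previous = 0
    for count in cumulative:
        new_cases.append(max(count - previous, 0))
        previous = count
    return new_cases


def weekly_totals(daily: List[int], days_per_week: int = 7) -> List[int]:
    """Sum daily counts into consecutive blocks of `days_per_week` days."""
    return [
        sum(daily[i:i + days_per_week])
        for i in range(0, len(daily), days_per_week)
    ]


# Hypothetical cumulative counts for one county over eight days.
cumulative_counts = [1, 3, 3, 6, 10, 15, 21, 28]
daily = daily_new_cases(cumulative_counts)
weekly = weekly_totals(daily)
```

Keeping the daily series next to the weekly totals makes the information lost by weekly grouping explicit: the weekly totals no longer show on which days the cases occurred.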
Data modelling
Hohl et al. [11] used Poisson prospective space-time analysis, which detects the most likely clusters from a large number of cylindrical candidate clusters. The research restricted the spatial and temporal scanning windows to 10% and 50%, respectively. Expected cases were calculated using the population within a cluster (p), the total number of cases in the US (C), and the total population (P). The use of Poisson prospective space-time analysis is the main feature of this stage, as it analyzes the cases by considering their geographical location and their behaviour over the study period. However, the model does not relate the expected cases to the testing rate, even though the number of cases depends directly on the number of tests performed.
Data analysis
For the data analysis, the paper considers a null hypothesis H0 and an alternative hypothesis HA. First, a likelihood ratio test is performed against the null hypothesis to find the most elevated clusters, those with a likelihood ratio > 1. The researchers then run 999 Monte Carlo simulations, randomizing the spatial and temporal window, to obtain a likelihood ratio for each run and candidate cluster; these ratios form a distribution under H0 [11]. The 999 Monte Carlo simulations are the main feature of this stage and are performed to find the emerging clusters. By randomizing the locations and time window, the procedure calculates the likelihood ratio for each run, and candidate clusters are detected along with their relative risk (RR). Selecting only clusters with a likelihood ratio > 1 keeps only the most elevated clusters, which are flagged as having elevated risk. However, this model is limited in that it does not consider the Modified Relative Risk (MRR), which includes the proportion of positive tests in the calculation.
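The significance test can be sketched as follows. This is a minimal illustration of the rank-based Monte Carlo p-value commonly used with scan statistics, not the implementation of [11] or of SaTScan; the function name and the toy null statistics are hypothetical.

```python
from typing import Sequence


def monte_carlo_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Rank-based p-value of an observed test statistic.

    `simulated` holds the maximum likelihood ratio obtained in each
    replication generated under H0. The observed value is counted as
    one more replication, so with 999 replications the smallest
    attainable p-value is 1 / 1000.
    """
    at_least_as_extreme = sum(1 for value in simulated if value >= observed)
    return (at_least_as_extreme + 1) / (len(simulated) + 1)


# Hypothetical example: the observed statistic exceeds all 999 null values.
null_statistics = [1.0] * 999
p_value = monte_carlo_p_value(5.0, null_statistics)
```

With 999 replications, a cluster whose statistic exceeds every simulated value receives a p-value of 0.001, which is why replication counts of the form 10^k - 1 are convenient.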
Data visualization
The calculated relative risks are presented in the tabular format whereas, for the cluster's visualization, researchers have built a web application named Covid19Scan. It consists of a map and a slider divided into weekly steps. It shows that the number of clusters changes from 0 to 23 during the study period [11]. The pseudocode and the flowchart of the state of art algorithm are shown in Table 1 and Fig. 2, respectively.
Table 1.
Poisson prospective space-time algorithm
Fig. 2.
The Flowchart of Poisson Prospective space-time algorithm
The state of art model detects clusters with a minimum of 5 cases and a minimum duration of 2 days. The Poisson prospective space-time scan statistic algorithm is implemented in the data modeling phase to determine the number of expected cases, as shown in Eq. 1, and the likelihood ratio in Eq. 2 [11]. However, the accuracy can still be improved with better cluster detection techniques.

$$\mu = \frac{p \cdot C}{P} \qquad (1)$$

where
- p = the population inside the cylinder;
- C = the total number of cases; and
- P = the total population from the U.S census website.
The number of expected cases (μ) is an objective function that is calculated for each cluster in the data modeling phase to perform the likelihood test against the null hypothesis [11]. Calculated in this way, it reduces the accuracy of cluster detection and is prone to error.
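As a worked illustration, the expected case count can be computed as a population-proportional share of all cases. The sketch below assumes that Eq. 1 has the standard form used in Poisson scan statistics, μ = p · C / P; the function name and the numbers are hypothetical.

```python
def expected_cases(p: float, C: float, P: float) -> float:
    """Expected cases in a cylinder under the null hypothesis.

    p: population inside the cylinder
    C: total number of cases in the study area
    P: total population of the study area
    Assumed form: mu = p * C / P (cases proportional to population).
    """
    if P <= 0:
        raise ValueError("total population P must be positive")
    if not 0 <= p <= P:
        raise ValueError("cylinder population p must lie in [0, P]")
    return p * C / P


# Hypothetical numbers: 1,000 of 100,000 residents live in the cylinder
# and 500 cases were observed overall.
mu = expected_cases(p=1_000, C=500, P=100_000)
```

In this example the cylinder holds 1% of the population and is therefore expected to hold 1% of the cases, regardless of how many tests were performed there, which is the limitation discussed in the text.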
$$\frac{L(Z)}{L_0} = \frac{\left(\frac{n_Z}{\mu(Z)}\right)^{n_Z}\left(\frac{N-n_Z}{N-\mu(Z)}\right)^{N-n_Z}}{\left(\frac{N}{\mu(T)}\right)^{N}} \qquad (2)$$

where
- $L(Z)$ = Likelihood function L(Z) for candidate cylinder Z;
- $L_0$ = Likelihood function for H0;
- $n_Z$ = The number of cases inside the cylinder;
- $\mu(Z)$ = The expected number of cases in cylinder Z;
- $\mu(T)$ = The total number of expected cases in the study area across all periods; and
- $N$ = The number of observed cases for the entire study area during the entire study period.
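The likelihood ratio test can be sketched numerically. The code below assumes the standard Poisson scan-statistic form of the ratio, built from the symbols defined above, and works on the log scale to avoid overflow. Returning 0 for cylinders without excess cases mirrors the "likelihood ratio > 1" selection described in the text; that choice, the function name, and the default μ(T) = N are assumptions for illustration.

```python
import math
from typing import Optional


def poisson_log_lr(n_z: float, mu_z: float, N: float,
                   mu_T: Optional[float] = None) -> float:
    """Log of L(Z)/L0 for one candidate cylinder.

    n_z : observed cases inside the cylinder
    mu_z: expected cases inside the cylinder
    N   : observed cases in the whole study area and period
    mu_T: total expected cases; defaults to N (expected counts
          calibrated to the observed total, an assumption)
    """
    if mu_T is None:
        mu_T = N
    if not 0 < mu_z < N:
        raise ValueError("mu_z must lie strictly between 0 and N")
    if n_z <= mu_z:
        return 0.0  # no excess risk: the ratio is treated as 1
    inside = n_z * math.log(n_z / mu_z)
    outside = 0.0
    if N > n_z:
        outside = (N - n_z) * math.log((N - n_z) / (N - mu_z))
    null_term = N * math.log(N / mu_T)
    return inside + outside - null_term


# Hypothetical cylinder: 20 observed cases against 10 expected, 100 overall.
log_lr = poisson_log_lr(n_z=20, mu_z=10.0, N=100)
is_elevated = log_lr > 0.0  # equivalent to a likelihood ratio > 1
```

A likelihood ratio greater than 1 corresponds to a positive log ratio, so candidate cylinders can be ranked directly on the log scale.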
The relative risk is the risk within a county divided by the risk outside and computed by Hohl et al. [11] for each cluster in a county as in Eq. 3:
| 3 |
where
- RRcty = The relative risk;
- e = The total number of cases for a given county; and
- E = The number of observed cases in U.S.
Without the testing rate, it is difficult to determine the actual number of expected cases and to calculate the relative risk within a cluster compared to outside it. Even though the expected number can be calculated without the testing rate, it does not represent the actual number. In addition, without the Modified Relative Risk, the Monte Carlo simulation will miss some potential clusters, and there is a high risk of including noisy data.
Proposed System
After reviewing a range of methods for space-time analysis of COVID-19 cases, we analyzed the pros and cons of each method. Accuracy, cluster shape, processing time, relative risk, and expected cases were the main issues considered. On this basis we selected the work of Hohl et al. [11] as the basis for our proposed solution. Poisson prospective space-time analysis analyzes the cases based on spatial location and time window, and it produces cylindrical clusters that cover both space and time [25]. It therefore helps to analyze the cases by considering their geographical location and their behavior over the study period [2]. A prospective space-time scan statistic is beneficial because it detects active or emerging clusters of the present while neglecting clusters from the past [7]. In addition, the proposed solution increases the cluster detection accuracy by including areas with a small number of cases and a low testing rate. The proposed system uses the differential information about the Relative positive case count (RC) and the proportion of positive test cases [3] to find clusters. This is a completely new feature adapted from the work of Cordes and Castro [3]. The approach calculates the testing rate (T), 0 < T ≤ 1, and uses it when calculating the number of expected cases to improve accuracy. For this reason, we selected the work of Cordes and Castro [3] as the second-best solution. The block diagram of the proposed system is given in Figure 3. The proposed system consists of the same four major stages as the state of art system: Data collection and preparation, Data modelling, Data analysis, and Data visualization. The following paragraphs give more details of each stage.
Fig. 3.
The Block diagram of the proposed system for daily visualization of COVID-19 cases using Enhanced cluster detection and noise reduction algorithm. [The green borders refer to the new parts in our proposed system].
Data collection and preparation
This stage is the same as in Hohl et al. [11], as explained in the previous section.
Data modelling
For modelling, the proposed system uses the Enhanced cluster detection and noise reduction (ECDeNR) algorithm to find the most likely clusters from a large number of cylindrical candidate clusters. The relative positive case count is calculated using the number of confirmed cases and the testing rate. The expected number of cases is then calculated using the population within a cluster (p), the total number of daily cases, the total population (P), and the relative positive case count (RC).
Data analysis
For the data analysis, this research considers a null hypothesis H0 and an alternative hypothesis HA. First, a likelihood ratio test is performed against the null hypothesis to find the most elevated clusters, those with a likelihood ratio > 1. We then calculated the Modified Relative Risk Function (MRRF) and ran 999 Monte Carlo simulations, randomizing the spatial and temporal window, to obtain a likelihood ratio for each run and candidate cluster; these ratios form a distribution under H0 [11]. To reduce noisy data, we modified the relative risk equation using the Proportion of positive tests (PPR) from [3]. As shown in Figure 3, the MRRF treats scattered cases as noise and removes them, which improves the accuracy of cluster detection.
Data visualization
The calculated relative risks are presented in tabular format, whereas for cluster visualization we used the SaTScan software, which is widely used for cluster detection in both disease and syndromic surveillance [8]. The complete procedure, the Enhanced cluster detection and noise reduction (ECDeNR) algorithm, is presented in Table 2, and the flowchart of the proposed algorithm is given in Figure 4.
Table 2.
Enhanced cluster detection and noise reduction algorithm
Fig. 4.
The Flowchart of Enhanced cluster detection and noise reduction algorithm.
A. Proposed equation
The number of expected cases as calculated by Hohl, et al. [11] is given in Eq. 1. This calculation has a limitation because it does not include an independent variable such as testing rate. This limitation is solved using the Relative Positive Case Count (RC) proposed by Cordes and Castro [3]. So, the RC is calculated in Eq. 4 to overcome the limitation of Eq. 1:
| 4 |
where
- RC = Relative Positive case count;
- T = testing rate; and
- n = number of positive tests.
The number of expected cases depends on many independent variables, such as the testing rate and the number of positive tests. This limitation is overcome by using the RC [3]. We therefore modified Eq. 1, and the result is given in Eq. 5.
| 5 |
where
- p = the population inside the region X;
- P = the total population;
- RC = Relative Positive case count; and
- M = Modified number of expected cases in region X.
Using the modified number of expected cases addresses the limitation of Eq. 2. We therefore modified Eq. 2 to obtain Eq. 6.
$$\frac{L(Z)}{L_0} = \frac{\left(\frac{n_Z}{M\mu(Z)}\right)^{n_Z}\left(\frac{N-n_Z}{N-M\mu(Z)}\right)^{N-n_Z}}{\left(\frac{N}{M\mu(T)}\right)^{N}} \qquad (6)$$

where
- $L(Z)$ = Likelihood function L(Z) for candidate cylinder Z;
- $L_0$ = likelihood function for H0;
- $n_Z$ = the number of cases inside the cylinder;
- $M\mu(Z)$ = Modified expected number of cases in cylinder Z;
- $M\mu(T)$ = Modified total number of expected cases in the study area across all periods; and
- $N$ = the number of observed cases for the entire study area during the entire study period.
The relative risk (RRcty) given in Eq. 3, as calculated by [11], is affected by noisy data and has lower accuracy; this is addressed by including the proportion of positive tests given by [3]. The proportion of positive tests (PPR) is calculated in Eq. 7 to overcome this limitation:
$$PPR = \frac{n}{N} \qquad (7)$$

where
- N = number of tests performed; and
- n = number of positive tests.
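For illustration, the proportion of positive tests can be computed per location as below. The sketch assumes that Eq. 7 is the usual ratio of positive tests (n) to tests performed (N); the function name and the numbers are hypothetical.

```python
def proportion_positive(n_positive: int, n_tests: int) -> float:
    """Proportion of positive tests, assumed to be PPR = n / N.

    n_positive: number of positive tests (n)
    n_tests   : number of tests performed (N)
    """
    if n_tests <= 0:
        raise ValueError("the number of tests performed must be positive")
    if not 0 <= n_positive <= n_tests:
        raise ValueError("positive tests must lie between 0 and n_tests")
    return n_positive / n_tests


# Hypothetical county: 50 positive results out of 1,000 tests.
ppr = proportion_positive(50, 1_000)
```

Two counties with the same number of confirmed cases can have very different PPR values if one of them tests far less, which is the information the confirmed case count alone does not carry.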
Therefore, to compute the relative risk for each cluster county we proposed Eq. (8) instead of Eq. (3).
| 8 |
where
- PPR = Proportion of positive test; and
- MRRcty = Modified relative risk for each cluster county.
Finally, the proposed enhanced cluster detection and noise reduction formula is given in Eq. 9:
| 9 |
where
- ECDeNR = Enhanced cluster detection and noise reduction;
- =
Modified relative risk for each cluster county; and
- =
Modified likelihood ratio.
B. Area of improvement
We proposed two equations: Eqs. 6 and 8. The first is the Modified Likelihood ratio function, which is used for identifying clusters of elevated disease risk. With the help of Eqs. 5 and 6, the Relative Positive Case count (RC) makes it possible to detect small-sized clusters and increases the number of clusters detected. It also detects more emerging clusters at the beginning of the study period. The second is the Modified Relative Risk function (MRRF), which is the risk within a county divided by the risk outside. Although Poisson prospective space-time analysis calculates the relative risk for each cluster, it does not filter noisy data, which results in inaccurate clustering. With the help of Eqs. 7 and 8, the Proportion of positive tests (PPR) reduces the noisy data in the clusters and improves the accuracy. Additionally, we group the cases daily instead of weekly to preserve useful information. As a result, the number of detected clusters increases significantly. The proposed system can detect and characterize emerging clusters to identify high-risk areas, and it also helps to detect small-sized clusters. With the Monte Carlo simulation, the Modified Likelihood ratio function is calculated for each cluster by using the Modified Relative Risk. To remove noisy data, the proportion of positive tests is used to transform the relative risk; the state-of-the-art system does not provide such a noise elimination process. Furthermore, our proposed method increases the accuracy of cluster detection in areas with a small number of cases. When calculating the Modified Relative Risk function, it loops through several clusters by randomizing times and locations. This leads to the successful detection of a large number of small-sized clusters, increases the cluster detection accuracy, and removes noise.
As explained in the literature review of this paper, the currently available solutions that apply prospective space-time analysis to clustering have used only confirmed cases. None of these solutions has considered the testing rate or the proportion of positive tests (PPR). Our proposed solution reduces noise by using the proportion of positive tests along with the relative risk. It also increases the detection accuracy for small-sized clusters at an early stage by using the Relative positive case count (RC) instead of relying solely on confirmed cases.
Result and discussion
SaTScan v9.6 was used in the implementation of this research [11]. The machine used for the implementation runs Windows 10 with an Intel Core i3-10110U CPU at a 2.10 GHz clock speed and 8 GB of RAM. For simulation purposes, we selected a total of 20 states, consisting of the 10 states with the lowest case counts (Alaska, Montana, Hawaii, Wyoming, Vermont, etc.) and the 10 states with the highest case counts (Massachusetts, California, Illinois, New Jersey, New York, etc.), for the period from May 5th to June 5th, 2020. Two freely available datasets were used: the first is the COVID-19 Data Repository [13] by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [6], and the second is the state-level population data from the U.S. Census website [31]. We removed the cases from international cruise ships to maintain the integrity of the data. The sampled states have different numbers of confirmed cases: among them, Alaska has the lowest number of confirmed cases (523), whereas New York has the highest (376,208). We ran the Enhanced cluster detection and noise reduction algorithm in SaTScan and set the time precision to daily. Instead of the confirmed case count, we supplied the Relative Positive case count for each location. Based on the number of confirmed cases, we ran 999 Monte Carlo simulations to detect the clusters. Table 3 shows an implementation sample of the proposed method and the state of art solution. The criteria used in Table 3 are: No. of Clusters, Average relative risk, Accuracy based on no. of clusters (%), and Processing Time (Minute).
Table 3.
Implementation sample of the proposed technique
| S.N. | Location (confirmed cases 0-5k) | Population | Total confirmed cases | Duration | State of art: no. of clusters | State of art: average relative risk | State of art: accuracy based on no. of clusters (%) | State of art: processing time (minutes) | Proposed: no. of clusters | Proposed: average relative risk | Proposed: accuracy based on no. of clusters (%) | Proposed: processing time (minutes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Alaska | 731545 | 523 | May 5th - May 12th | 8 | 11.92 | 83.22 | 3.7 | 10 | 14.9 | 87.25 | 2.17 |
| | | | | May 13th - May 19th | 7 | 10.76 | 82.38 | 3.64 | 9 | 13.83 | 86.53 | 2.03 |
| | | | | May 20th - May 26th | 9 | 13.17 | 82.81 | 3.8 | 13 | 19.02 | 89.43 | 2.32 |
| | | | | May 27th - Jun 2nd | 6 | 9.9 | 81.43 | 3.03 | 8 | 13.2 | 85.8 | 1.88 |
| | | | | Average | 7.5 | 11.4375 | 82.46 | 3.5425 | 10 | 15.2375 | 87.2525 | 2.1 |
The state of art and the proposed solutions were compared on the samples with the help of graphs and data reports. The results for the COVID-19 case samples are reviewed in Tables 4 and 5. All samples in the tables contain the results obtained while calculating the number of expected cases, detecting the emerging clusters, and performing the likelihood test against the null hypothesis. The results are divided according to the data analysis phases to show the impact of the number of clusters on the accuracy of cluster detection. The results are presented in terms of accuracy, processing time, number of clusters, and relative risk. Accuracy is calculated as the ratio of true positives (the number of detected clusters) to all samples (the maximum number of possible clusters), where the number of detected clusters is obtained by applying the enhanced Poisson prospective space-time analysis to each location, and the maximum number of possible clusters is the number of counties with active cases at a given time. We carried out a comprehensive test on 20 test sets; each test has 4 cases based on different time windows. The final result was calculated by taking the average over all test cases, as shown in Table 3.
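The accuracy ratio and the averaging over time windows described above can be sketched in Python. The four rows below are the Alaska values reported in Table 3; the function names are hypothetical, and the accuracy function simply encodes the ratio stated in the text without reproducing the reported accuracy figures.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float]


def detection_accuracy(detected_clusters: int, max_possible_clusters: int) -> float:
    """Accuracy (%) as detected clusters over the maximum possible clusters."""
    if max_possible_clusters <= 0:
        raise ValueError("max_possible_clusters must be positive")
    return 100.0 * detected_clusters / max_possible_clusters


def column_means(rows: Sequence[Row]) -> List[float]:
    """Average each metric over the time windows of one location."""
    return [mean(column) for column in zip(*rows)]


# Alaska rows from Table 3:
# (no. of clusters, average relative risk, accuracy %, processing time in minutes)
alaska_state_of_art = [
    (8, 11.92, 83.22, 3.7),
    (7, 10.76, 82.38, 3.64),
    (9, 13.17, 82.81, 3.8),
    (6, 9.9, 81.43, 3.03),
]
alaska_proposed = [
    (10, 14.9, 87.25, 2.17),
    (9, 13.83, 86.53, 2.03),
    (13, 19.02, 89.43, 2.32),
    (8, 13.2, 85.8, 1.88),
]

state_of_art_average = column_means(alaska_state_of_art)
proposed_average = column_means(alaska_proposed)
```

Up to floating-point rounding, the two averages match the "Average" row of Table 3 (7.5, 11.4375, 82.46, 3.5425 for the state of art and 10, 15.2375, 87.2525, 2.1 for the proposed system).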
Table 4.
Comparison of accuracy, processing time, number of clusters, and relative risk for cluster detection (Sample 1: Confirmed Cases between 0 – 5000)
| S.N. | Location (confirmed cases 0–5k) | Population | Total Confirmed Cases | Duration (Days) | State of Art: No. of Clusters | State of Art: Average Relative Risk | State of Art: Accuracy Based on No. of Clusters (%) | State of Art: Processing Time (Minutes) | Proposed System: No. of Clusters | Proposed System: Average Relative Risk | Proposed System: Accuracy Based on No. of Clusters (%) | Proposed System: Processing Time (Minutes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Alaska | 731545 | 523 | May 5th – May 12th | 8 | 11.92 | 83.22 | 3.7 | 10 | 14.9 | 87.25 | 2.17 |
| | | | | May 13th – May 19th | 7 | 10.76 | 82.38 | 3.64 | 9 | 13.83 | 86.53 | 2.03 |
| | | | | May 20th – May 26th | 9 | 13.17 | 82.81 | 3.8 | 13 | 19.02 | 89.43 | 2.32 |
| | | | | May 27th – Jun 2nd | 6 | 9.9 | 81.43 | 3.03 | 8 | 13.2 | 85.8 | 1.88 |
| | | | | Average | 7.5 | 11.4375 | 82.46 | 3.5425 | 10 | 15.2375 | 87.2525 | 2.1 |
| 2 | Montana | 1068778 | 541 | May 5th – May 12th | 9 | 13.29 | 84.02 | 3.9 | 11 | 16.24 | 87.98 | 2.32 |
| | | | | May 13th – May 19th | 8 | 11.9 | 82.25 | 3.24 | 10 | 14.88 | 87.25 | 2.17 |
| | | | | May 20th – May 26th | 7 | 10.99 | 81.42 | 3.43 | 9 | 14.13 | 86.53 | 2.03 |
| | | | | May 27th – Jun 2nd | 6 | 9.64 | 81.16 | 2.82 | 8 | 12.85 | 85.8 | 1.88 |
| | | | | Average | 7.5 | 11.455 | 82.2125 | 3.3475 | 9.5 | 14.525 | 86.89 | 2.1 |
| 3 | Hawaii | 1415872 | 664 | May 5th – May 12th | 7 | 10.94 | 80.4 | 2.95 | 10 | 15.63 | 87.25 | 2.03 |
| | | | | May 13th – May 19th | 8 | 11.83 | 82.5 | 4.06 | 10 | 14.79 | 87.25 | 2.17 |
| | | | | May 20th – May 26th | 8 | 12.07 | 83.8 | 3.08 | 11 | 16.6 | 87.98 | 2.17 |
| | | | | May 27th – Jun 2nd | 8 | 11.94 | 81.54 | 3.32 | 11 | 16.42 | 87.98 | 2.17 |
| | | | | Average | 7.75 | 11.695 | 82.06 | 3.3525 | 10.5 | 15.86 | 87.615 | 2.135 |
| 4 | Wyoming | 578759 | 933 | May 5th – May 12th | 12 | 16.37 | 85.42 | 4.15 | 16 | 21.83 | 91.6 | 2.76 |
| | | | | May 13th – May 19th | 10 | 14.14 | 83 | 3.85 | 13 | 18.38 | 89.43 | 2.47 |
| | | | | May 20th – May 26th | 10 | 14.31 | 83.9 | 3.78 | 13 | 18.6 | 89.43 | 2.47 |
| | | | | May 27th – Jun 2nd | 11 | 15.61 | 85.68 | 3.55 | 15 | 21.29 | 90.88 | 2.61 |
| | | | | Average | 10.75 | 15.1075 | 84.5 | 3.8325 | 14.25 | 20.025 | 90.335 | 2.5775 |
| 5 | Vermont | 623989 | 1027 | May 5th – May 12th | 11 | 15.23 | 85.65 | 3.54 | 13 | 18 | 89.43 | 2.61 |
| | | | | May 13th – May 19th | 12 | 16.39 | 83.55 | 3.69 | 16 | 21.85 | 91.6 | 2.76 |
| | | | | May 20th – May 26th | 11 | 15.4 | 83.53 | 4.41 | 14 | 19.6 | 90.15 | 2.61 |
| | | | | May 27th – Jun 2nd | 11 | 15.16 | 85.43 | 4.13 | 14 | 19.29 | 90.15 | 2.61 |
| | | | | Average | 11.25 | 15.545 | 84.54 | 3.9425 | 14.25 | 19.685 | 90.3325 | 2.6475 |
| 6 | West Virginia | 1792147 | 2119 | May 5th – May 12th | 9 | 13.34 | 83.61 | 3.66 | 12 | 17.79 | 88.7 | 2.32 |
| | | | | May 13th – May 19th | 11 | 15.36 | 84.75 | 4.38 | 15 | 20.95 | 90.88 | 2.61 |
| | | | | May 20th – May 26th | 8 | 11.9 | 81.8 | 3.45 | 11 | 16.36 | 87.98 | 2.17 |
| | | | | May 27th – Jun 2nd | 9 | 13.01 | 83.5 | 4.16 | 11 | 15.9 | 87.98 | 2.32 |
| | | | | Average | 9.25 | 13.4025 | 83.415 | 3.9125 | 12.25 | 17.75 | 88.885 | 2.355 |
| 7 | Maine | 1344212 | 2482 | May 5th – May 12th | 12 | 16.54 | 86.18 | 4.26 | 16 | 22.05 | 91.6 | 2.76 |
| | | | | May 13th – May 19th | 9 | 13 | 83.63 | 3.46 | 11 | 15.89 | 87.98 | 2.32 |
| | | | | May 20th – May 26th | 10 | 14.15 | 82.39 | 3.97 | 13 | 18.4 | 89.43 | 2.47 |
| | | | | May 27th – Jun 2nd | 9 | 12.92 | 81.98 | 3.84 | 11 | 15.79 | 87.98 | 2.32 |
| | | | | Average | 10 | 14.1525 | 83.545 | 3.8825 | 12.75 | 18.0325 | 89.2475 | 2.4675 |
| 8 | North Dakota | 762062 | 2745 | May 5th – May 12th | 12 | 16.61 | 85.92 | 3.79 | 16 | 22.15 | 91.6 | 2.76 |
| | | | | May 13th – May 19th | 11 | 15.47 | 83.85 | 3.82 | 15 | 21.1 | 90.88 | 2.61 |
| | | | | May 20th – May 26th | 12 | 16.37 | 84.82 | 4.24 | 16 | 21.83 | 91.6 | 2.76 |
| | | | | May 27th – Jun 2nd | 10 | 14.47 | 82.36 | 3.41 | 12 | 17.36 | 88.7 | 2.47 |
| | | | | Average | 11.25 | 15.73 | 84.2375 | 3.815 | 14.75 | 20.61 | 90.695 | 2.65 |
| 9 | Idaho | 1787065 | 3111 | May 5th – May 12th | 12 | 16.31 | 85.62 | 4.45 | 16 | 21.75 | 91.6 | 2.76 |
| | | | | May 13th – May 19th | 12 | 16.27 | 85.13 | 3.82 | 16 | 21.69 | 91.6 | 2.76 |
| | | | | May 20th – May 26th | 11 | 15.36 | 83.89 | 4.22 | 14 | 19.55 | 90.15 | 2.61 |
| | | | | May 27th – Jun 2nd | 12 | 16.46 | 86.32 | 3.87 | 15 | 20.58 | 90.88 | 2.76 |
| | | | | Average | 11.75 | 16.1 | 85.24 | 4.09 | 15.25 | 20.8925 | 91.0575 | 2.7225 |
| 10 | Oregon | 4217737 | 4570 | May 5th – May 12th | 15 | 19.8 | 88.21 | 4.06 | 19 | 25.08 | 93.78 | 3.2 |
| | | | | May 13th – May 19th | 12 | 16.54 | 85.84 | 4.58 | 15 | 20.68 | 90.88 | 2.76 |
| | | | | May 20th – May 26th | 12 | 16.73 | 83.87 | 4.15 | 16 | 22.31 | 91.6 | 2.76 |
| | | | | May 27th – Jun 2nd | 16 | 21.26 | 88.32 | 4.42 | 22 | 29.23 | 95.95 | 3.35 |
| | | | | Average | 13.75 | 18.5825 | 86.56 | 4.3025 | 18 | 24.325 | 93.0525 | 3.0175 |
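The "Average" rows in Table 4 appear to be plain arithmetic means over the four weekly windows of each state. As a minimal sketch (the function and variable names are illustrative and not part of the original analysis), the state-of-the-art averages for Alaska can be reproduced from the four rows above:

```python
from statistics import mean

# State-of-the-art results for Alaska, one tuple per weekly window (Table 4):
# (number of clusters, average relative risk, accuracy in %, processing time in minutes)
alaska_state_of_art = [
    (8, 11.92, 83.22, 3.70),
    (7, 10.76, 82.38, 3.64),
    (9, 13.17, 82.81, 3.80),
    (6, 9.90, 81.43, 3.03),
]


def window_average(rows):
    """Return the column-wise arithmetic mean over the time windows."""
    return tuple(mean(column) for column in zip(*rows))


avg_clusters, avg_rr, avg_accuracy, avg_minutes = window_average(alaska_state_of_art)
print(avg_clusters, avg_rr, avg_accuracy, avg_minutes)
```

Up to floating-point rounding, the four values agree with the Alaska "Average" row for the state of the art (7.5, 11.4375, 82.46, and 3.5425).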
Table 5.
Comparison of accuracy, processing time, no. of clusters and relative risk for cluster detection (Sample 1: confirmed cases between 50000 - 400000)
| S.N. | Location (confirmed cases 50k–400k) | Population | Total Confirmed Cases | Week | State of Art: No. of Clusters | State of Art: Average Relative Risk | State of Art: Accuracy Based on No. of Clusters (%) | State of Art: Processing Time (Minutes) | Proposed System: No. of Clusters | Proposed System: Average Relative Risk | Proposed System: Accuracy Based on No. of Clusters (%) | Proposed System: Processing Time (Minutes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Maryland | 6892503 | 56770 | May 5th – May 12th | 75 | 83.55 | 80.5 | 10 | 86 | 95.8 | 90.22 | 7.87 |
| | | | | May 13th – May 19th | 67 | 74.37 | 78.74 | 8.93 | 80 | 88.8 | 88.6 | 7.31 |
| | | | | May 20th – May 26th | 67 | 75.24 | 78.74 | 8.93 | 78 | 87.59 | 88.06 | 6.99 |
| | | | | May 27th – Jun 2nd | 61 | 69.66 | 77.42 | 8.13 | 68 | 77.65 | 85.36 | 5.95 |
| | | | | Average | 67.5 | 75.705 | 78.85 | 8.9975 | 78 | 87.46 | 88.06 | 7.03 |
| 2 | Florida | 39512223 | 61488 | May 5th – May 12th | 62 | 71.05 | 77.64 | 8.27 | 72 | 82.51 | 86.44 | 7.1 |
| | | | | May 13th – May 19th | 77 | 87.7 | 80.94 | 10.27 | 85 | 96.81 | 89.95 | 8.13 |
| | | | | May 20th – May 26th | 77 | 85.16 | 80.94 | 10.27 | 85 | 94.01 | 89.95 | 7.67 |
| | | | | May 27th – Jun 2nd | 66 | 75.04 | 78.52 | 8.8 | 77 | 87.55 | 87.79 | 6.93 |
| | | | | Average | 70.5 | 79.7375 | 79.51 | 9.4025 | 79.75 | 90.22 | 88.5325 | 7.4575 |
| 3 | Michigan | 12671821 | 64821 | May 5th – May 12th | 67 | 74.71 | 78.74 | 8.93 | 78 | 86.98 | 88.06 | 7.05 |
| | | | | May 13th – May 19th | 82 | 91.02 | 82.04 | 10.93 | 91 | 101.01 | 91.57 | 8.34 |
| | | | | May 20th – May 26th | 79 | 88.24 | 81.38 | 10.53 | 91 | 101.64 | 91.57 | 8.51 |
| | | | | May 27th – Jun 2nd | 68 | 76.02 | 78.96 | 9.07 | 80 | 89.44 | 88.6 | 7.35 |
| | | | | Average | 74 | 82.5 | 80.28 | 9.87 | 85 | 94.77 | 89.95 | 7.81 |
| 4 | Texas | 8882190 | 72548 | May 5th – May 12th | 86 | 96.15 | 82.92 | 11.47 | 98 | 109.57 | 93.46 | 9.56 |
| | | | | May 13th – May 19th | 76 | 83.98 | 80.72 | 10.13 | 90 | 99.45 | 91.3 | 8.45 |
| | | | | May 20th – May 26th | 83 | 94.7 | 82.26 | 11.07 | 96 | 109.53 | 92.92 | 9.46 |
| | | | | May 27th – Jun 2nd | 76 | 86.34 | 80.72 | 10.13 | 87 | 98.84 | 90.49 | 8.05 |
| | | | | Average | 80.25 | 90.29 | 81.66 | 10.7 | 92.75 | 104.35 | 92.04 | 8.88 |
| 5 | Pennsylvania | 19453561 | 78815 | May 5th – May 12th | 83 | 93.13 | 82.26 | 11.07 | 91 | 102.11 | 91.57 | 9.05 |
| | | | | May 13th – May 19th | 82 | 91.92 | 82.04 | 10.93 | 97 | 108.73 | 93.19 | 8.96 |
| | | | | May 20th – May 26th | 82 | 92.09 | 82.04 | 10.93 | 92 | 103.32 | 91.84 | 8.23 |
| | | | | May 27th – Jun 2nd | 86 | 98.81 | 82.92 | 11.47 | 101 | 116.04 | 94.27 | 9.5 |
| | | | | Average | 83.25 | 93.99 | 82.32 | 11.1 | 95.25 | 107.55 | 92.72 | 8.94 |
| 6 | Massachusetts | 6892503 | 100419 | May 5th – May 12th | 95 | 105.64 | 84.9 | 12.67 | 106 | 117.87 | 95.62 | 9.69 |
| | | | | May 13th – May 19th | 99 | 111.18 | 85.78 | 13.2 | 111 | 124.66 | 96.97 | 10.23 |
| | | | | May 20th – May 26th | 97 | 107.38 | 85.34 | 12.93 | 115 | 127.31 | 98.05 | 10.64 |
| | | | | May 27th – Jun 2nd | 100 | 110.3 | 86 | 13.33 | 116 | 127.95 | 98.32 | 11.28 |
| | | | | Average | 97.75 | 108.63 | 85.51 | 13.03 | 112 | 124.45 | 97.24 | 10.46 |
| 7 | California | 39512223 | 125783 | May 5th – May 12th | 90 | 102.42 | 83.8 | 12 | 102 | 116.08 | 94.54 | 9.75 |
| | | | | May 13th – May 19th | 96 | 107.52 | 85.12 | 12.8 | 111 | 124.32 | 96.97 | 10.35 |
| | | | | May 20th – May 26th | 92 | 102.03 | 84.24 | 12.27 | 109 | 120.88 | 96.43 | 10.33 |
| | | | | May 27th – Jun 2nd | 87 | 98.48 | 83.14 | 11.6 | 104 | 117.72 | 95.08 | 10.35 |
| | | | | Average | 91.25 | 102.61 | 84.08 | 12.17 | 106.5 | 119.75 | 95.76 | 10.2 |
| 8 | Illinois | 12671821 | 125915 | May 5th – May 12th | 82 | 92.74 | 82.04 | 10.93 | 98 | 110.84 | 93.46 | 9.03 |
| | | | | May 13th – May 19th | 85 | 96.39 | 82.7 | 11.33 | 99 | 112.27 | 93.73 | 9.39 |
| | | | | May 20th – May 26th | 96 | 108.58 | 85.12 | 12.8 | 111 | 125.55 | 96.97 | 10.21 |
| | | | | May 27th – Jun 2nd | 82 | 92.33 | 82.04 | 10.93 | 95 | 106.97 | 92.65 | 8.99 |
| | | | | Average | 86.25 | 97.51 | 82.98 | 11.5 | 100.75 | 113.91 | 94.2 | 9.41 |
| 9 | New Jersey | 8882190 | 163336 | May 5th – May 12th | 90 | 99.27 | 83.8 | 12 | 106 | 116.92 | 95.62 | 10.11 |
| | | | | May 13th – May 19th | 88 | 98.56 | 83.36 | 11.73 | 102 | 114.24 | 94.54 | 9.41 |
| | | | | May 20th – May 26th | 95 | 104.88 | 84.9 | 12.67 | 106 | 117.02 | 95.62 | 10.29 |
| | | | | May 27th – Jun 2nd | 84 | 94.67 | 82.48 | 11.2 | 101 | 113.83 | 94.27 | 10.01 |
| | | | | Average | 89.25 | 99.35 | 83.64 | 11.9 | 103.75 | 115.5 | 95.01 | 9.96 |
| 10 | New York | 19453561 | 376208 | May 5th – May 12th | 85 | 96.99 | 82.7 | 11.33 | 100 | 114.11 | 94 | 9.59 |
| | | | | May 13th – May 19th | 98 | 108.19 | 85.56 | 13.07 | 112 | 123.65 | 97.24 | 11.03 |
| | | | | May 20th – May 26th | 92 | 104.24 | 84.24 | 12.27 | 107 | 121.24 | 95.89 | 10.53 |
| | | | | May 27th – Jun 2nd | 93 | 106.58 | 84.46 | 12.4 | 106 | 121.48 | 95.62 | 10.2 |
| | | | | Average | 92 | 104 | 84.24 | 12.27 | 106.25 | 120.12 | 95.69 | 10.34 |
These results were compared across the stages of the space-time analysis of COVID-19 cases, namely the data modelling and data analysis phases mentioned above. The proposed solution improves the accuracy of cluster detection by detecting small-sized clusters, and improves the relative risk by using the proportion of positive tests in the calculation and by filtering out noisy data. All of these preprocessing steps are performed before running the 999 Monte Carlo simulations (data analysis phase), which helps to detect the emerging clusters. The proposed solution also reduces the processing time by a small margin.
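The role of the 999 Monte Carlo simulations can be illustrated with a short, generic sketch of Monte Carlo hypothesis testing for a Poisson scan statistic. This is not SaTScan's implementation and not the modified functions of the proposed system: the function names, the toy case counts, and the hand-picked candidate `zones` are all hypothetical, and a real analysis would scan space-time cylinders rather than a fixed list of zones. The sketch assumes every location has a positive expected count and that the expected counts sum to the total number of cases.

```python
import math
import random


def poisson_llr(c, mu, total):
    """Log likelihood ratio for a zone with c observed and mu expected cases.

    Returns 0 unless the zone shows an excess of cases (c > mu).
    """
    if c <= mu:
        return 0.0
    inside = c * math.log(c / mu)
    outside = (total - c) * math.log((total - c) / (total - mu)) if c < total else 0.0
    return inside + outside


def max_llr(cases, expected, zones):
    """Largest log likelihood ratio over all candidate zones."""
    total = sum(cases)
    return max(
        poisson_llr(sum(cases[i] for i in zone), sum(expected[i] for i in zone), total)
        for zone in zones
    )


def monte_carlo_p_value(cases, expected, zones, n_sim=999, seed=0):
    """Rank the observed statistic among n_sim data sets simulated under the null."""
    rng = random.Random(seed)
    total = sum(cases)
    observed = max_llr(cases, expected, zones)
    locations = range(len(cases))
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Null hypothesis: each case falls in a location with probability
        # proportional to that location's expected count.
        simulated = [0] * len(cases)
        for i in rng.choices(locations, weights=expected, k=total):
            simulated[i] += 1
        if max_llr(simulated, expected, zones) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical toy data: five locations with equal expected counts.
cases = [30, 4, 5, 6, 5]
expected = [10.0, 10.0, 10.0, 10.0, 10.0]
zones = [{0}, {1}, {2}, {3}, {4}, {0, 1}, {3, 4}]
p_value = monte_carlo_p_value(cases, expected, zones)
print(p_value)
```

With 999 replications the smallest attainable p-value is 1/1000 = 0.001, which is one reason 999 is a common choice for this kind of test.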
Figure 5 shows the average accuracy based on the number of clusters for each of the 20 datasets using the current Poisson prospective space-time analysis technique versus the Enhanced cluster detection and noise reduction algorithm.
Fig. 5.
The average accuracy based on the number of clusters for each of the 20 datasets using the current Poisson prospective space-time analysis technique (blue) versus the Enhanced cluster detection and noise reduction algorithm (red).
Figure 6 shows the average processing time for cluster detection, in minutes, for each of the 20 datasets using the current Poisson prospective space-time analysis technique versus the Enhanced cluster detection and noise reduction algorithm.
Fig. 6.
The average processing time for cluster detection, in minutes, for each of the 20 datasets using the current Poisson prospective space-time analysis technique (blue) versus the Enhanced cluster detection and noise reduction algorithm (red).
Figure 7 shows the average number of clusters for each of the 20 datasets using the current Poisson prospective space-time analysis technique versus the proposed Enhanced cluster detection and noise reduction algorithm.
Fig. 7.
The average number of clusters for each of the 20 datasets using the current Poisson prospective space-time analysis technique (blue) versus the proposed Enhanced cluster detection and noise reduction algorithm (red).
Finally, Fig. 8 shows the average relative risk for each of the 20 datasets using the current Poisson prospective space-time analysis technique versus the proposed Enhanced cluster detection and noise reduction algorithm.
Fig. 8.
The average relative risk for each of the 20 datasets using the current Poisson prospective space-time analysis technique (blue) versus the proposed Enhanced cluster detection and noise reduction algorithm (red).
The use of the Modified Likelihood ratio function in the data modelling phase and the Modified Relative Risk function in the data analysis phase improved the accuracy of cluster detection. Including the testing rate when calculating the expected number of cases allows the system to detect small-sized clusters, and using the proportion of positive tests removes noisy data and improves the relative risk of a cluster. In conclusion, the combination of these techniques improved the cluster detection of COVID-19 cases, giving an average accuracy of 91.35% and an average processing time of 5.69 minutes, which can assist in designing better strategies to control COVID-19.
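The accuracy is calculated using Eq. 10 [11]:

$$\text{Accuracy} = \frac{\text{True Positives}}{\text{True Positives} + \text{True Negatives}} \times 100\% \qquad (10)$$

where
- True Positives = the number of clusters identified by the algorithm; and
- True Negatives = the number of clusters not identified by the algorithm.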
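As a small illustration of the accuracy measure described in this section (the ratio of detected clusters to the maximum number of possible clusters, where the latter is the number of counties with active cases), the calculation can be sketched as follows. The function name and the example counts are hypothetical and do not come from the experiments.

```python
def cluster_detection_accuracy(detected_clusters, possible_clusters):
    """Accuracy in percent: detected clusters over the maximum number of possible clusters."""
    if possible_clusters <= 0:
        raise ValueError("possible_clusters must be positive")
    if not 0 <= detected_clusters <= possible_clusters:
        raise ValueError("detected_clusters must lie between 0 and possible_clusters")
    return 100.0 * detected_clusters / possible_clusters


# Hypothetical example: 21 clusters detected where 24 counties have active cases.
example_accuracy = cluster_detection_accuracy(21, 24)
print(example_accuracy)  # 87.5
```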
A range of techniques has been implemented for the cluster detection of COVID-19 to achieve the desired accuracy, processing time, and number of clusters. This research overcomes limitations of the state-of-the-art solution, with an average accuracy of 91.35% against the current accuracy of 83.32%. It also reduces the average processing time to 5.69 minutes from the current 7.36 minutes. Furthermore, the average number of clusters is 53 against 45. Lastly, it improves the average relative risk to 61.9 against the current relative risk of 52.71. The proposed system maintains better accuracy, a higher number of clusters, lower processing time, and improved relative risk even in complex environments such as more active and emerging disease hotspots. Table 6 compares the state-of-the-art solution with the proposed one and lists the contributions of the proposed solution. The results show improvement in accuracy, processing time, number of clusters, and relative risk over the state of the art in different time windows. With the Modified Likelihood ratio function and the Modified Relative Risk function in the model, the proposed solution raises the average cluster detection accuracy to 91.35%, which is 8.03 percentage points higher than the current best solution [11], and lowers the average processing time to 5.69 minutes, which is 1.67 minutes less than the current solution [11]. Together with the removal of noisy data in the data analysis phase, the average relative risk of the clusters increases by 9.19. The number of clusters, accuracy, processing time, and average relative risk are calculated by simulating the analysis in SaTScan.
We quantify the improvement in cluster detection accuracy, processing time, number of clusters, and average relative risk by running the state-of-the-art and proposed algorithms with the same configuration in SaTScan.
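The reported differences follow directly from the averages quoted above. The snippet below is only an arithmetic check on those averages; the dictionary keys are illustrative names.

```python
# Average results quoted in the text for the two solutions.
state_of_art = {"accuracy_pct": 83.32, "minutes": 7.36, "clusters": 45, "relative_risk": 52.71}
proposed = {"accuracy_pct": 91.35, "minutes": 5.69, "clusters": 53, "relative_risk": 61.90}

# Absolute change of each metric (proposed minus state of the art).
delta = {key: round(proposed[key] - state_of_art[key], 2) for key in proposed}

# Processing time saved, as a fraction of the state-of-the-art time.
relative_time_saving = 1 - proposed["minutes"] / state_of_art["minutes"]

print(delta)
print(f"{relative_time_saving:.1%}")
```

The accuracy gain is 8.03 percentage points, the processing time drops by 1.67 minutes (roughly 23% of the state-of-the-art time), the average number of clusters rises by 8, and the average relative risk rises by 9.19.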
Table 6.
Comparison table of State-of-art and Proposed solution
| | Proposed Solution | State of Art Solution |
|---|---|---|
| Name of the solution | Enhanced cluster detection and Noise reduction algorithm (ECDeNR) | Cluster detection & characterize emerging clusters based on Poisson prospective space-time Analysis. |
| Accuracy | 91.35% | 83.32% |
| Processing time | 5.69 Minute | 7.36 Minute |
| Average Number of clusters | 53 | 45 |
| Average Relative Risk | 61.9 | 52.71 |
| Proposed equation | The Modified Likelihood ratio function and the Modified Relative Risk function for each within-cluster county. | The Likelihood ratio function for a given cluster, Hohl et al. (2020) [11], and the relative risk function for each within-cluster county. |
| Contribution 1 | The Modified Likelihood ratio function adds important new features to the system for detecting clusters. The Relative positive case count is used to calculate the number of expected cases, which helps to detect small-sized clusters and increases the accuracy of cluster detection. | The state-of-the-art system does not detect small clusters with a low number of tests. |
| Contribution 2 | The Modified Relative Risk function is used with the Monte Carlo simulation to detect secondary clusters. To ensure higher accuracy of the relative risk of each cluster, the proportion of positive tests is used, which reduces the noisy data of clusters. | The state-of-the-art system does not provide a noise reduction process. |
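For reference, the unmodified functions named in the State of Art column are usually written as below in the Poisson prospective space-time scan literature (for example in [11]). This is a sketch in generic notation that may differ from the symbols used elsewhere in this paper: $c$ and $\mu$ are the observed and expected cases in a candidate cylinder, $p$ is the population inside it, $C$ and $P$ are the total cases and total population of the study area, and $c_i$ and $e_i$ are the observed and expected cases of a within-cluster county $i$. The modified functions of the proposed solution are not reproduced here.

```latex
% Expected cases in a cylinder under the null hypothesis
\mu = p \cdot \frac{C}{P}

% Likelihood ratio of a candidate cylinder (applies when c > \mu, otherwise LR = 1)
LR = \left(\frac{c}{\mu}\right)^{c} \left(\frac{C - c}{C - \mu}\right)^{C - c}

% Relative risk of a within-cluster county i
RR_i = \frac{c_i / e_i}{(C - c_i) / (C - e_i)}
```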
Conclusion and future work
In conclusion, Poisson prospective space-time analysis is an effective and widely used [16] technique for analysing disease cases, because it detects active and emerging clusters. However, it has comparatively low accuracy in detecting clusters when there is a small number of cases [14]. Cluster detection is based on the number of confirmed cases and the population, so it misses other key information such as the number of tests performed. In this research, we improve the cluster detection accuracy, number of clusters, relative risk, and processing time of Poisson prospective space-time analysis. We developed the Modified Likelihood ratio function, a new feature adapted from the second-best solution by Cordes and Castro [3]. It yields a more accurate number of expected cases by using the Relative positive case count, which improves the cluster detection accuracy, and it forms part of the method named Enhanced cluster detection and noise reduction. In addition, the proportion of positive tests is another feature adapted from the second-best solution [3] and used in the proposed solution. It reduces the noisy data of clusters and further enhances the cluster detection accuracy. As a result, the proposed solution improves the average accuracy by 8.03 percentage points, the average number of clusters by 8, and the average relative risk by 9.19, and decreases the average processing time by 1.67 minutes. In the future, the updated 2020 population data could be used in the calculation instead of the 2019 population data from the US Census website. Furthermore, arbitrarily shaped clusters could represent the affected areas more effectively than circular clusters. The noise reduction process using the proportion of positive tests can also be refined further, which could lead to more precise cluster detection with less processing time.
Finally, the experimental analysis of the proposed work could be strengthened by analysing the potential reasons behind the observed results.
Appendix 1 Abbreviations
| COVID-19 | Coronavirus disease of 2019 |
| U.S. | United States |
| RAM | Random-access memory |
| CPU | Central processing unit |
| PPR | Proportion of Positive test |
| LR | Likelihood Ratio |
| RC | Relative positive case count |
| ECDeNR | Enhanced Cluster detection and Noise Reduction |
| MRRF | Modified Relative Risk function |
| MLRF | Modified Likelihood ratio function |
Appendix 2 Equations
| Equation | Number |
| The number of expected cases () | (11) |
| Likelihood ratio | (12) |
| the Relative Risk | (13) |
| Relative Positive case count | (14) |
| Modified () | (15) |
| Modified Likelihood ratio | (16) |
| The Proportion of positive test | (17) |
| Modified Relative Risk | (18) |
| the proposed enhanced cluster detection and noise reduction | (19) |
| The accuracy | (20) |
Declarations
Conflicts of interests
None.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Balamchi S, Torabi M. Spatial modeling of repeated events with an application to disease mapping. Spatial Stat. 2020;4:1–16. doi: 10.1016/j.spasta.2020.100425.
- 2. Chen C-C, Teng Y-C, Lin B-C, Fan IC, Chan T-C. Online platform for applying space–time scan statistics for prospectively detecting emerging hot spots of dengue fever. Int J Health Geogr. 2016;15(1):43. doi: 10.1186/s12942-016-0072-6.
- 3. Cordes J, Castro MC. Spatial analysis of COVID-19 clusters and contextual factors in New York City. Spatial Spatio-Temporal Epidemiol. 2020;34:1–8. doi: 10.1016/j.sste.2020.100355.
- 4. Corizzo R, Ceci M, Japkowicz N. Anomaly detection and repair for accurate predictions in geo-distributed Big Data. Big Data Res. 2019;16:18–35. doi: 10.1016/j.bdr.2019.04.001.
- 5. Desjardins MR, Hohl A, Delmelle EM. Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters. Appl Geogr. 2020;118:1–7. doi: 10.1016/j.apgeog.2020.102202.
- 6. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533–534. doi: 10.1016/S1473-3099(20)30120-1.
- 7. Greene SK, Peterson ER, Kapell D, Fine AD, Kulldorff M. Daily reportable disease spatiotemporal cluster detection, New York City, New York, USA, 2014–2015. Emerg Infect Dis. 2016;22(10):1808–1812. doi: 10.3201/eid2210.160097.
- 8. Güemes A, et al. A syndromic surveillance tool to detect anomalous clusters of COVID-19 symptoms in the United States. medRxiv. 2020;20:1–24. doi: 10.1101/2020.08.18.20177295.
- 9. Guliyev H. Determining the spatial effects of COVID-19 using the spatial panel data model. Spat Stat. 2020;38:1–10. doi: 10.1016/j.spasta.2020.100443.
- 10. Hammad TA, et al. Impact of COVID-19 pandemic on ST-elevation myocardial infarction in a non-COVID-19 epicenter. Catheter Cardiovasc Interv. 2020;22:1–8. doi: 10.1002/ccd.28997.
- 11. Hohl A, Delmelle EM, Desjardins MR, Lan Y. Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States. Spatial Spatio-Temporal Epidemiol. 2020;34:1–8. doi: 10.1016/j.sste.2020.100354.
- 12. Huang C, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506. doi: 10.1016/S0140-6736(20)30183-5.
- 13. Johns Hopkins University. COVID-19: Novel Coronavirus (COVID-19) Cases. https://github.com/CSSEGISandData/COVID-19
- 14. Jones RC, Liberatore M, Fernandez JR, Gerber SI. Use of a prospective space-time scan statistic to prioritize shigellosis case investigations in an urban jurisdiction. Public Health Rep. 2006;121(2):133–139. doi: 10.1177/003335490612100206.
- 15. Krivoruchko K, Gribov A. Distance metrics for data interpolation over large areas on Earth’s surface. Spatial Stat. 2020;35:1–27. doi: 10.1016/j.spasta.2019.100396.
- 16. Kulldorff M. A spatial scan statistic. Commun Stat-Theory Methods. 1997;26(6):1481–1496. doi: 10.1080/03610929708831995.
- 17. Kulldorff M, Kleinman K. Comments on 'A critical look at prospective surveillance using a scan statistic' by T. Correa, M. Costa, and R. Assunção. Stat Med. 2015;34(7):1094–1095. doi: 10.1002/sim.6430.
- 18. Kulldorff M, Athas WF, Feurer EJ, Miller BA, Key CR. Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico. Am J Public Health. 1998;88(9):1377–1380. doi: 10.2105/AJPH.88.9.1377.
- 19. Lakhani A. Which Melbourne metropolitan areas are vulnerable to COVID-19 based on age, disability, and access to health services? Using spatial analysis to identify service gaps and inform delivery. J Pain Symptom Manag. 2020;60(1):41–44. doi: 10.1016/j.jpainsymman.2020.03.041.
- 20. Lansiaux É, Pébaÿ PP, Picard J-L, Forget J. Covid-19 and vit-d: Disease mortality negatively correlates with sunlight exposure. Spatial Spatio-Temporal Epidemiol. 2020;35:1–5. doi: 10.1016/j.sste.2020.100362.
- 21. Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N. Investigating the relationship between time and predictive model maintenance. J Big Data. 2020;7(1):1–19. doi: 10.1186/s40537-020-00312-x.
- 22. Mahase E. Coronavirus: covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate. BMJ. 2020;368:641–642. doi: 10.1136/bmj.m641.
- 23. Mollalo A, Vahedi B, Rivera KM. GIS-based spatial modeling of COVID-19 incidence rate in the continental United States. Sci Total Environ. 2020;728:1–8. doi: 10.1016/j.scitotenv.2020.138884.
- 24. Mulatti P, et al. Retrospective space–time analysis methods to support West Nile virus surveillance activities. Epidemiol Infect. 2015;143(1):202–213. doi: 10.1017/S0950268814000442.
- 25. Neill DB, Moore AW, Sabhnani M, Daniel K. Detection of emerging space-time clusters. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining; 2005. doi: 10.1145/1081870.1081897.
- 26. Robertson C, Nelson TA, MacNab YC, Lawson AB. Review of methods for space–time disease surveillance. Spatial Spatio-Temporal Epidemiol. 2010;1(2):105–116. doi: 10.1016/j.sste.2009.12.001.
- 27. Rongyao H, Gan Jiangzhang, Zhu Xiaofeng, Liu Tong, Shi Xiaoshuang. Multi-task multi-modality SVM for early COVID-19 diagnosis using chest CT data. Inform Process Manag. 2022;59(1):102782. doi: 10.1016/j.ipm.2021.102782.
- 28. Ruan Q, Yang K, Wang W, Jiang L, Song J. Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care Med. 2020;46(5):846–848. doi: 10.1007/s00134-020-05991-x.
- 29. Saeed TU, Nateghi R, Hall T, Waldorf BS. Statistical analysis of area-wide alcohol-related driving crashes: a spatial econometric approach. Geogr Anal. 2020;52(3):394–417. doi: 10.1111/gean.12216.
- 30. Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geogr. 2005;4(1):1–15. doi: 10.1186/1476-072X-4-11.
- 31. U.S. Census Bureau (2010–2019). Index of /programs-surveys/popest/datasets/2010-2019/counties/totals. https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/










