. 2020 Aug 29;96:106692. doi: 10.1016/j.asoc.2020.106692

A data-driven understanding of COVID-19 dynamics using sequential genetic algorithm based probabilistic cellular automata

Sayantari Ghosh a, Saumik Bhattacharya b,
PMCID: PMC7455552  PMID: 32904415

Abstract

The COVID-19 pandemic is severely impacting the lives of billions across the globe. Even after massive protective measures such as nation-wide lockdowns, discontinuation of international flight services and rigorous testing, the infection is still spreading steadily, causing thousands of deaths and a serious socio-economic crisis. Identifying the major factors behind these infection-spreading dynamics is therefore becoming crucial to minimize the impact and lifetime of COVID-19 and of any future pandemic. In this work, a probabilistic cellular automata based method is employed to model the infection dynamics for a significant number of countries. This study proposes that cellular automata provide an excellent platform for accurate data-driven modelling of the infection spread, with a sequential genetic algorithm efficiently estimating the parameters of the dynamics. To the best of our knowledge, this is the first attempt to understand and interpret COVID-19 data using cellular automata optimized through a genetic algorithm. It is demonstrated that the proposed methodology is both flexible and robust, and can model the daily active cases, the total number of infected people and the total number of deaths through systematic parameter estimation. Elaborate analyses are performed on COVID-19 statistics of forty countries from different continents, whose infection spread evolves in markedly divergent ways because of demographic and socioeconomic factors. The substantial predictive power of the model is established, with conclusions on the key players in the pandemic dynamics.

Keywords: Epidemiological model, Probabilistic cellular automata, Genetic algorithm, Real data modelling

1. Introduction

Since its outbreak in Wuhan, China, coronavirus disease 2019 (COVID-19) has spread across the world within a few months. Due to its explosive growth and considerable fatality rate, the World Health Organization (WHO) declared COVID-19 a pandemic and a global health emergency [1]. According to the available statistics in June, 2020, the total number of infections by SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), the causative agent of this disease, is approaching 19 million around the world, with around 700,000 deaths in 213 countries and territories and no effective vaccine available so far. Beyond respiratory symptoms including pneumonia, dry cough, cold and sneezing [2], [3], it has been reported to cause liver and gastrointestinal tract maladies, kidney dysfunction and heart inflammation in cases of severe infection [4], [5], [6]. This highly infectious disease is transmitted from person to person through respiratory droplets produced by an infected person. Fomite-mediated and nosocomially acquired infections are also being identified as important sources of viral diffusion [7], [8], [9]. A typical incubation time from exposure to symptoms has been reported for COVID-19, while infection transmission from asymptomatic individuals has been observed as well [10], [11], [12].

Immediately after human-to-human transmission was detected, government agencies in various countries started implementing mitigation strategies to control the epidemic. The measures include social distancing, restrictions on domestic and international travel, cancellation of social events, and the shutdown of public and commercial activities, all of which reduce the possibility of physical human contact. Moreover, contact tracing, aggressive testing, and hospital or home quarantine for infected individuals and suspected cases have been used to track and prevent further spread. However, these strategies directly contribute to enormous economic loss. Accurately estimating the dynamics of this novel disease is emerging as a challenging problem in this context. The immense disruption caused by COVID-19, which has thrown the health, economy and lives of billions of people around the globe into disorder, has brought the need for accurate modelling of infectious diseases into focus. The effect and effectiveness of the complex interplay between differing length-scales and time-scales and the applied control strategies can only be understood and predicted with the help of precisely designed quantitative models.

1.1. Models for understanding COVID-19 statistics

With tremendous effort from researchers around the world, a spectrum of mathematical and computational approaches is being used to understand and predict COVID-19 statistics from different perspectives. Broadly, the studies can be divided into two categories: (i) data science and machine learning approaches and (ii) differential equation based mathematical modelling techniques. The first group of studies relies mostly on data mining from national or international repositories (e.g., WHO, country-specific data centres) or popular social media platforms to forecast active cases and mortality [13], [14], [15], [16], [17]. The major goal of these studies is to estimate and predict the time evolution of the disease using specific computational concepts such as Monte Carlo decision making, fuzzy rule induction and deep learning [18], [19], [20], [21], [22]. Some of these studies also explore the impact of disease control interventions, such as travel restrictions [23], patient quarantining and isolation [24], medical facilities [25], and social distancing and administrative responsibility [15], on the epidemic spreading rate. Though these models are quite effective, they depend entirely on data, so their performance is strongly tied to data quality. As comprehensively reviewed by [26], several data-dependent models are prone to a high risk of bias, which is very probable for imprecise, short time series data.

Having given effective predictions for past pandemics [27], [28], [29], the traditional approaches of the mathematical theory of epidemiological dynamics have also driven several researchers to study COVID-19 dynamics. Theoretical modelling has long been used to understand and predict the outbreak probability and seriousness of a disease, and to provide key information for controlling its intensity [30], [31], [32], [33]. Most of the mathematical models being used to investigate COVID-19 dynamics [34], [35], [36], [37] are variants of the classical deterministic susceptible-infectious-recovered (SIR) model introduced by Kermack and McKendrick [38]. Formulated as a set of nonlinear ordinary differential equations (ODE), the SIR model compartmentalizes the population: the susceptible subpopulation declines over time as its members are infected by the infectious subpopulation, and infected individuals then recover from (and gain immunity to) the disease. Being a powerful and computationally favourable tool for analysing epidemics, variants of this methodology are common in understanding real epidemic data [39], [40]. Though these models capture the disease transmission dynamics, they are deterministic and rely on the assumption of homogeneous mixing, forgoing spatial information.

To model the real-world dynamics of a disease that spreads only through close contact, the tool needs to accommodate neighbourhood information. Moreover, the platform must take into account the stochasticity of real dynamics, the spatial spread of infection and the inherent heterogeneity of the population, which are major limitations of the works mentioned above. The research gap thus points towards a methodology that addresses these issues to understand and predict neighbourhood-dependent, person-to-person probabilistic transmission of COVID-19, supported by extensive computational tools for parametric optimization.

1.2. Motivation and contributions

In this study, we propose a probabilistic cellular automata based dynamical model, optimized through a sequential genetic algorithm, for an accurate assessment of the extent of COVID-19 dynamics. The major motivation for using cellular automata (CA) is their ability to depict extremely complex macroscopic outcomes while relying only on local interactions among a multitude of single individuals [41], [42]. This methodology gives a direct correspondence to the physical system and also rectifies the major drawbacks of ODE models by (i) tracking individual contact processes, (ii) giving room for probabilistic individual behaviour, and (iii) capturing neighbourhood as well as global spatial information. For these reasons, CA based approaches have been successfully used as a competent substitute method to simulate physical, biological, environmental and social contagion-like spreading [43], [44], [45], [46]. For studying past epidemics as well as interpreting COVID-19, some studies have proposed cellular automata as an alternative method [47], [48], [49], [50]. However, capturing and interpreting the behaviour of real data through CA needs large-scale parameter optimization, which can be time consuming as well as sub-optimal. Thus, despite being extremely flexible and powerful, CA has not yet been optimized to understand and interpret COVID-19 data for countries worldwide. To explore this, a genetic algorithm (GA) is employed in this study. GA is a well-known method for generating an optimal parameter subset through stochastic search procedures based on the principle of survival of the fittest [51], [52], [53], [54], [55]. Crossover and mutation, two key operations of a genetic algorithm, help to optimize the parameter set efficiently in a limited number of steps.
Cellular automata coupled with a genetic algorithm have been used before to explore evolutionary aspects of game-theoretical problems [56], but to the best of our knowledge, analysing and developing an understanding of real pandemic data such as COVID-19 using an optimized CA platform has not been attempted yet. The main contributions of this work are as follows:

  • To build a CA model which is probabilistic, so that it can take into account demographic variations, neighbourhood diversity and the uncertainties of real dynamics.

  • To create an easily implementable framework where optimization using GA will be done sequentially for all parameters associated with the transition rules of the CA model for real data interpretation.

  • To interpret and understand COVID-19 disease transmission dynamics with an optimized CA framework, which can be extended for prediction as well.

Through this, on one hand, one can track the individual contact process through time and space; on the other hand, a self-adapting process of evolutionary strategies is created by designing the chromosome with parametric genes and establishing a fitness function that is maximized over the generations. The main limitations of the state-of-the-art algorithms and the major contributions of the proposed method are listed in Table 1. The main rationale behind this approach is that it is extremely difficult to find the optimal parameters of a complex spatial epidemiological model using random search or analytical techniques. The proposed GA based framework searches the parameter space more efficiently for the optimal performance of the entire algorithm.

Table 1.

Comparison of the proposed method with the state-of-the-art COVID-19 models.

| Basic methodology | Differential equation models | Data science approaches |
|---|---|---|
| References | [33], [34], [35], [36], [37], [39], [40] | [13], [14], [15], [16], [17], [18], [19], [20], [21], [22] |
| Limitations | (a) Homogeneous mixing. (b) Most models are considered as deterministic. | (a) No way to track person-to-person transmission. (b) No neighbourhood consideration. |

Contribution: the proposed method
(a) accommodates heterogeneity in population;
(b) includes stochasticity and probabilistic dynamics;
(c) estimates optimum epidemic dynamics parameters;
(d) considers neighbourhood and demography explicitly;
(e) performs robust prediction with limited data.

The rest of this article is organized as follows. Section 2 presents the proposed epidemiological model, the probabilistic cellular automata and the sequential genetic algorithm used in this work. Section 3 discusses the results in detail: the optimized CA model is used to analyse active infections, total infections and total deaths caused by COVID-19 for several countries, considering demographic and spatial population density variations. Section 4 contains concluding remarks.

2. Proposed methodology

An object process diagram of the proposed method is depicted in Fig. 1(a). The methodology starts with the infection spreading according to the SEIQR epidemiological model in a random human population over a 2D grid, initialized on a country-specific basis. The parameters of the epidemiological model are continuously optimized using the proposed sequential genetic algorithm to match the real country-specific infection spread data. The proposed methodology consists of three distinct parts: (A) the epidemiological model that governs the infection spreading, (B) probabilistic cellular automata (PCA) to model the dynamics of the pandemic spread, and (C) optimization of the parameters associated with PCA using a genetic algorithm (GA) to fit real-world data.

Fig. 1.


An overview of the dynamics: (a) Object process diagram of the proposed model; (b) The schematic diagram of the disease transmission dynamics in the form of a modified SEIQR model. Transition probabilities $p_{se}$, $p_{ei}$, $p_{iq}$, $p_{ir}$ and $p_{qr}$ are pointed out. The associated state transition delays are indicated on the timeline of the disease dynamics. (c) Time evolution of the spatial lattice during spread of the infection in a population. The colours of the respective subpopulations (i.e., susceptible, exposed, infected, quarantined and removed) are the same as depicted in (a).

2.1. Epidemiological model

In the epidemiological model, the entire population is partitioned into five distinct parts. At the very beginning, every person is healthy but vulnerable to the infection; these people form the susceptible (S) subpopulation. At time instance $t=0$, some people are exposed to the infection from a known or unknown source. These exposed people show no particular symptoms, but they can spread the infection to susceptible people; these asymptomatic people are referred to as the exposed (E) subpopulation. At $t=0$, there are also some people who have clear symptoms and can likewise spread the infection among susceptible people; these symptomatic people are considered the infected (I) subpopulation. After an incubation period, some of the exposed people show symptoms and move to subpopulation I. Depending on health facilities and testing time, infected people are detected with some average delay and put into quarantine. Quarantined people cannot spread the infection to others, though they themselves remain in the infectious stage; they are denoted as the quarantined (Q) subpopulation. Both the quarantined and the infected (but undetected) people eventually leave the infectious stage, after which they no longer contribute to the infection-spreading dynamics; they are denoted as the removed (R) subpopulation. The removed subpopulation contains two kinds of people: those who have recovered completely and can neither infect nor be infected in future, and those who have died due to the severity of the infection. A schematic diagram of the transitions, probabilities and timelines of the infection dynamics is shown in Fig. 1(b).
In the analysis, normalized subpopulations are considered, and each normalized subpopulation is denoted by the corresponding lowercase character. For example, the normalized susceptible and infected subpopulations are denoted by $s$ and $i$ respectively. As shown in Fig. 1(c), this epidemiological time evolution is implemented on a 2D lattice using PCA, as discussed below.

2.2. Probabilistic cellular automata

Let $L$ be a finite subset of $\mathbb{Z}^2$ at time instance $t$, denoted as $L \subset \mathbb{Z}^2$, which defines a regular 2D lattice. Every point $x \in L$ on this lattice can acquire one of a finite number of states $A$. In this particular problem, the set $A$ can be defined as $A=\{0,s,e,i,q,r\}$, where the terms $s$, $e$, $i$, $q$ and $r$ denote the possible states of infection as discussed in Section 2.1, and $0$ denotes no human occupant or an empty space. At time $t=0$, $n_i^0$ points are randomly selected on $L$ and assigned the state $a_i$, where $i \in A$. The total initial population is defined as $N=\sum_{i \in A \setminus \{0\}} n_i^0$. At any instance of time $t$, $n_i^t$, $i \in A \setminus \{0\}$, denotes the total number of people in the respective state $a_i$.
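The random initialization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `init_lattice` and `census`, the single-character state labels and the use of a seeded generator are all assumptions made for the example.

```python
import random

# State labels follow the set A = {0, s, e, i, q, r}; the string encoding is an assumption.
EMPTY, S, E, I, Q, R = "0", "s", "e", "i", "q", "r"

def init_lattice(size, counts, seed=None):
    """Return a size x size grid with counts[state] cells placed at random sites."""
    rng = random.Random(seed)
    total = sum(counts.values())
    if total > size * size:
        raise ValueError("population exceeds number of lattice sites")
    sites = rng.sample(range(size * size), total)  # distinct random positions
    grid = [[EMPTY] * size for _ in range(size)]
    pos = 0
    for state, n in counts.items():
        for site in sites[pos:pos + n]:
            grid[site // size][site % size] = state
        pos += n
    return grid

def census(grid):
    """Count the cells in each state, i.e. n_i^t for the current grid."""
    tally = {}
    for row in grid:
        for cell in row:
            tally[cell] = tally.get(cell, 0) + 1
    return tally

grid = init_lattice(100, {S: 2500, E: 50, I: 4}, seed=0)
population = sum(n for state, n in census(grid).items() if state != EMPTY)
```

Here `population` plays the role of $N$: it sums the occupied states only and leaves the empty sites out.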

For the neighbourhood criterion, a modified-Moore neighbourhood or $d$-neighbourhood is used. A finite subset $\Omega_d \subset \mathbb{Z}^2$ is defined, centred on the origin $0=(0,0)$, with $4d(d+1)$ neighbouring cells (the $(2d+1)\times(2d+1)$ block around the origin, not counting the origin itself). A general probabilistic cellular automaton (PCA) is a stochastic process that describes a sequence of mappings $\Lambda_t^a: L \to a$, $a \in A$, where any particular state $\Lambda_t^a(x)$ of $x \in L$ at a particular time instance $t$ depends, with certain probabilities, on the previous states of the $d$-neighbourhood of $x$, denoted as $x+\Omega_d=\{x+\omega : \omega \in \Omega_d\}$. More precisely, in COVID-19 infection spread, $\Lambda_t^E(x)$ is decided by $\Lambda_{t-1}(x+\omega)$, $\omega \in \Omega_d$. The other mappings $\Lambda_t^a(x)$, $a \in A \setminus E$, depend on the sequence of states $\Lambda_\kappa^a(x)$, $0 \le \kappa < t$.
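As a quick check of the neighbourhood size, the offsets of a $d$-neighbourhood can be enumerated directly. The sketch below assumes a square block of radius $d$ with the centre cell excluded, which is the reading consistent with the stated cardinality $4d(d+1)$. The helper names and the choice to drop off-grid sites at the boundary are assumptions; the source does not state its boundary handling.

```python
def d_neighbourhood(d):
    """Offsets (dx, dy) within Chebyshev distance d of the origin, origin excluded."""
    return [(dx, dy)
            for dx in range(-d, d + 1)
            for dy in range(-d, d + 1)
            if (dx, dy) != (0, 0)]

def neighbours(x, y, d, size):
    """Lattice sites in the d-neighbourhood of (x, y); off-grid sites are dropped."""
    return [(x + dx, y + dy)
            for dx, dy in d_neighbourhood(d)
            if 0 <= x + dx < size and 0 <= y + dy < size]

# The offset count matches 4d(d+1) for every radius tried here.
sizes = {d: len(d_neighbourhood(d)) for d in range(1, 6)}
```

For $d=1$ this reduces to the usual 8-cell Moore neighbourhood.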

2.2.1. Transitional probabilities

The transition probability $p_{a_i a_j}^t$ denotes the probability of transition at time $t$ from state $a_i$ to state $a_j$, where $a_i, a_j \in A$. Without any loss of generality, $p_{a_i a_j}^t$ is denoted as $p_{ij}^t$ and the transition from state $a_i$ to $a_j$ as $a_{ij}$ in the rest of the discussion for simpler notation. In cases where $a_i \ne a_j$, $p_{ij}^t$ is referred to as the state transitional probability, and if $a_i = a_j$, $p_{ii}^t$ is called the self transitional probability.

If a state transition $a_{ij}$, $i \ne j$, happens in $x$ at time $t$ following the transition probability $p_{ij}^t$, and the transition $a_{ij}$ has a transitional delay $\tau_{ij}$, then

$$p_{ij}^t = \begin{cases} 0 & \text{if } t < t_{ui} + \tau_{ij} \\ p_{ij} & \text{if } t \ge t_{ui} + \tau_{ij} \end{cases}$$

where $t_{ui}$ is the time instance when the transition $a_{ui}$, $u \ne i$, happened. In this infection diffusion model, only the state transitional probabilities $p_{se}^t$, $p_{ei}^t$, $p_{iq}^t$, $p_{qr}^t$ and $p_{ir}^t$ are considered to be nonzero at certain instances of time, and for all the other transitional probabilities, $\tau_{ij}$ is set to infinity, where $p_{ij}$ and $\tau_{ij}$ are user-defined parameters. However, for the transition $a_{se}$, $t_{ui}$ and $\tau_{ij}$ are set to zero, and for $x \in L$, let us define $p_{se}^t = p_{ij} = 1 - p_{ss}^t$ and the self-transition probability $p_{ss}^t = (1-p_i)^{i_{t-1}} (1-p_e)^{e_{t-1}}$, where $i_{t-1}$ and $e_{t-1}$ are the number of cells in states $i$ and $e$ respectively in the $\Omega_d$ neighbourhood of $x$ at time $t-1$. The probabilities $p_e$ and $p_i$ are defined as 'infection probabilities', which can be considered as the probabilities that a susceptible person becomes exposed to the infection when that person meets an exposed or an infected person respectively.
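The two rules above, the delay gate on a transition and the neighbour-dependent exposure probability, are small enough to write out. The following is an illustrative sketch only; the function and argument names are hypothetical and not taken from the source.

```python
def delayed_probability(p_ij, t, t_entered, tau_ij):
    """p_ij^t: zero until tau_ij steps after the cell entered state a_i, then p_ij."""
    return p_ij if t >= t_entered + tau_ij else 0.0

def exposure_probability(p_e, p_i, n_exposed, n_infected):
    """p_se^t = 1 - (1 - p_i)^i * (1 - p_e)^e, with i and e counted in the neighbourhood."""
    stay_susceptible = (1.0 - p_i) ** n_infected * (1.0 - p_e) ** n_exposed
    return 1.0 - stay_susceptible

# One exposed and one infected neighbour, with p_e = 0.1 and p_i = 0.2 (illustrative values).
p = exposure_probability(0.1, 0.2, n_exposed=1, n_infected=1)
```

With no exposed or infected neighbours the product is 1, so a susceptible cell stays susceptible with certainty; each additional infectious neighbour multiplies in another chance of escaping infection.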

An empty cell does not contribute to the infection spread, and thus the self transitional probability $p_{00}^t=1$, $\forall t$. Among the total removed population $r_t$ at time instance $t$, a population fraction $p_\beta r_t$ is considered to recover from the infection at time instance $t$ and acquire long-term immunity towards the disease, and a population fraction $(1-p_\beta) r_t$ is considered to be deceased. The removed population $r_t$ is not considered further in the infection dynamics, and it is taken that $p_{rr}^{t'}=1$, $\forall t' > t$.

2.3. Parameter optimization using GA

Though PCA has the potential to model the probabilistic transition of states on a spatial lattice, the main challenge in using it to model a real-world scenario is finding the optimal parameters for the PCA. As the search space for the proposed PCA model is very large, it is practically impossible to search for the optimal parameter setting manually in order to analyse the characteristics of the infection spread from real data. Thus, a genetic algorithm (GA) is applied to find the optimal parameter set for a given real time series.

Let us assume a discrete time signal $y[n]$, $0 \le n \le (T-1)$, associated with the real-world infection spread. The PCA model is denoted by $G(\Theta)$, where $\Theta=[\theta_1,\theta_2,\dots,\theta_h]$ denotes the set of parameters used for the PCA model. If $\hat{y}[n]$, $0 \le n \le (T-1)$, is the time evolution of the desired variable in the model $G(\Theta)$, then the objective is to find an optimal parameter set $\Theta$ such that $\hat{y}[n] \approx y[n]$, $\forall n$. To apply GA, each $\theta_i$, $1 \le i \le h$, is encoded as a string of binary digits $b_i$ [54], [55], assuming that $\theta_i$ has a bound $|\theta_i|<\zeta_i$, $1 \le i \le h$. This binary string is referred to as a gene, and the concatenation of the genes in the order of appearance of the respective $\theta_i$ in $\Theta$ is called the chromosome. For example, if $B$ is the chromosome corresponding to the parameter set $\Theta$, $G(B)$ is equivalent to $G(\Theta)$. A collection of $N_g$ chromosomes of estimated parameters, often referred to as the gene pool, is evaluated at every time step (called a generation). In our work, the error of each chromosome is evaluated using the $\ell_1$ norm distance. At the $i$th generation, the error of the $j$th chromosome $B_j^i$ is computed as

$$e_j^i = \| y - \hat{y}_j^i \|_1 = \sum_{n=0}^{T-1} \left| y[n] - \hat{y}_j^i[n] \right|$$

where $\hat{y}_j^i$ is the estimated output of $G(B_j^i)$ in vector form and $\hat{y}_j^i[n]$ is the value of $\hat{y}_j^i$ at time instance $n$. At each generation, GA finds $\min(e_j^i)$, $\forall j$, and tries to make $e_j^i \to 0$ as $i \to \infty$. In the proposed framework, some of the parameters are probabilities with a range of 0 to 1, and some are associated with time (in days), which are discrete integers greater than or equal to zero in our case. Thus, the parameters are initialized randomly while keeping their domain restrictions intact.
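To make the encoding and the error measure concrete, here is a minimal sketch. The source does not specify the bit width or the quantization scheme, so the 8-bit fixed-point encoding over a `[lo, hi]` interval is an assumption, as are all function names.

```python
def encode(value, lo, hi, n_bits=8):
    """Quantize value in [lo, hi] to an n_bits binary string (one gene)."""
    levels = (1 << n_bits) - 1
    step = round((value - lo) / (hi - lo) * levels)
    return format(step, "0{}b".format(n_bits))

def decode(gene, lo, hi):
    """Map a binary gene back to a parameter value in [lo, hi]."""
    levels = (1 << len(gene)) - 1
    return lo + int(gene, 2) / levels * (hi - lo)

def to_chromosome(theta, bounds, n_bits=8):
    """Concatenate the genes in the order the parameters appear in theta."""
    return "".join(encode(v, lo, hi, n_bits) for v, (lo, hi) in zip(theta, bounds))

def l1_error(y, y_hat):
    """e = sum_n |y[n] - y_hat[n]| for two equal-length series."""
    return sum(abs(a - b) for a, b in zip(y, y_hat))

# Two illustrative parameters: a probability in [0, 1] and a delay in [0, 30] days.
chromosome = to_chromosome([0.25, 7], [(0.0, 1.0), (0, 30)])
error = l1_error([1, 2, 3], [1, 1, 5])
```

Quantization introduces a rounding error of at most half a step, so a decoded gene only approximates the original parameter value.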

For mating, two chromosomes, often referred to as parents, are selected from the gene pool considering their 'fitness'. For the two selected parents, a crossover point or splice point is selected at $b_i$, $1 \le i \le h$, in both chromosomes, and a crossover [55] happens that produces two offspring. In our approach, the fitness $f_j^i$ of each chromosome is defined as the inverse of its error at a particular generation. At each generation, the $F$ best chromosomes, those with the maximum fitness, are selected from the gene pool for mating. Following the idea of [52], $\rho F$ parents are kept in the next generation along with the new chromosomes to ensure that the error in the next generation is always less than or equal to that of the current generation. After selecting $\rho F$ chromosomes from the parents, $N_g - \rho F$ children are produced from mating to keep the size of the gene pool constant. After the offspring are generated, $s$ genes are randomly selected in the parameter space and small perturbations are added individually to mimic mutation.

As shown by several researchers [57], the homogeneity of the gene pool increases with the generations, and as the perturbations due to mutation are typically small, reducing the error becomes a problem after a few generations. Thus, to restrict homogeneity in the gene pool, a small number $\mu$ of offspring are selected from the $N_g - \rho F$ generated offspring and replaced with randomly generated chromosomes to maintain diversity. This step is called 'diversification' of the gene pool.
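The selection, elitism, crossover, mutation and diversification steps can be put together in one generation update. The sketch below is a simplified illustration under stated assumptions: chromosomes are lists of real-valued genes rather than binary strings, crossover cuts on a gene boundary, mutation perturbs `n_mutations` genes per child with Gaussian noise of width `sigma`, and all function and argument names are hypothetical.

```python
import random

def next_generation(pool, error_fn, random_chromosome, n_parents, rho,
                    n_mutations, n_diverse, sigma=0.01, rng=random):
    """Produce the next gene pool, keeping its size constant."""
    ranked = sorted(pool, key=error_fn)           # lowest error = highest fitness
    parents = ranked[:n_parents]                  # the F best chromosomes
    n_keep = max(1, int(rho * n_parents))         # rho * F parents survive unchanged
    survivors = [list(c) for c in parents[:n_keep]]
    n_children = len(pool) - n_keep
    children = []
    while len(children) < n_children:
        mum, dad = rng.sample(parents, 2)
        cut = rng.randrange(1, len(mum))          # splice point on a gene boundary
        for child in (mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]):
            for g in rng.sample(range(len(child)), n_mutations):
                child[g] += rng.gauss(0.0, sigma)  # small perturbation as mutation
            children.append(child)
    children = children[:n_children]
    for k in rng.sample(range(len(children)), n_diverse):
        children[k] = random_chromosome()         # diversification step
    return survivors + children
```

Because the best chromosome always survives unchanged, the minimum error cannot increase from one generation to the next. Clipping genes back into their valid ranges (for example probabilities into [0, 1]) is left out here and would be applied after mutation.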

In our problem, the parameters $\Theta$ of the PCA model $G(\Theta)$ are the state transitional probabilities $p_{ei}$, $p_{iq}$, $p_{ir}$, $p_{qr}$, the infection probabilities $p_e$ and $p_i$, the state transition delays $\tau_{ei}$, $\tau_{iq}$, $\tau_{qr}$, $\tau_{ir}$, the neighbourhood $d$, and the death probability $p_\beta$, as mentioned in Section 2.2. As optimizing this many parameters simultaneously might be challenging and require a huge amount of resources, we propose a variant of GA with a sequential evolution mechanism in which the parameters are optimized sequentially rather than simultaneously. Let us define a set of generations as an era. For the first era, containing a small number of generations, the traditional GA methodology discussed so far is followed to obtain a set of initial parameters. From the next era onward, two parameters are picked per era and optimized sequentially. Mutation and crossover are restricted to the two respective genes, whereas parent selection is based on the performance of the entire chromosome. This newly proposed sequential optimization of the parameters of PCA using GA is defined as PCA-GA. The proposed approach can optimize a large number of parameters efficiently using limited resources. All the notations used in PCA-GA are briefly summarized in Table 2.
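One way to picture the era mechanism is as a schedule of "active" gene indices, with crossover and mutation touching only those genes. The sketch below is an assumption-heavy illustration: the source does not state the order in which parameter pairs are visited, so consecutive pairs are used here, and the function names, the uniform per-gene crossover and the Gaussian mutation are all hypothetical choices.

```python
import random

def era_schedule(n_genes, n_eras):
    """Era 0 frees every gene; later eras cycle through consecutive pairs of genes."""
    pairs = [tuple(range(k, min(k + 2, n_genes))) for k in range(0, n_genes, 2)]
    schedule = [tuple(range(n_genes))]
    for era in range(1, n_eras):
        schedule.append(pairs[(era - 1) % len(pairs)])
    return schedule

def restricted_offspring(mum, dad, active, sigma=0.01, rng=random):
    """Child that can differ from mum only in the genes listed in `active`."""
    child = list(mum)
    for g in active:
        donor = mum if rng.random() < 0.5 else dad    # crossover on active genes only
        child[g] = donor[g] + rng.gauss(0.0, sigma)   # mutation on active genes only
    return child

schedule = era_schedule(5, 4)
child = restricted_offspring([1, 2, 3, 4], [9, 9, 9, 9], active=(1, 2),
                             rng=random.Random(1))
```

Parent selection would still rank whole chromosomes by their error, so the frozen genes continue to influence which parents are allowed to mate.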

Table 2.

Descriptions of the parameters used in the proposed work.

| Notation | Description |
|---|---|
| $L$ | Spatial lattice |
| $A$ | Set of possible states on the lattice |
| $A \setminus \{0\}$ | Set of epidemiological states |
| $n_i^t$ | Total number of people in state $a_i$ at time $t$ |
| $\Omega_d$ | $d$-neighbourhood of $x \in L$ |
| $\Lambda_t^a$ | Mapping $L \to a$ at time $t$ |
| $p_{ij}^t$ | Probability at time $t$ that $x \in L$ moves from $a_i$ to $a_j$ |
| $\tau_{ij}$ | Transitional delay for $x$ to move from $a_i$ to $a_j$ |
| $e_t$, $i_t$ | Number of exposed and infected people in the $d$-neighbourhood of $x$ at time $t$ |
| $p_e$, $p_i$ | Probabilities that an exposed or an infected person spreads the infection to a susceptible person when they meet |
| $\Theta$ | Set of all the parameters of the PCA method |
| $B$ | Binary encoded representation of $\Theta$ |
| $G(\Theta)$ | The PCA model with parameters $\Theta$ |
| $y$ | Time series of an epidemiological state in a country |
| $\hat{y}$ | Time series estimate of an epidemiological state from PCA |
| $e_j^i$ | Estimation error of the $j$th chromosome in the $i$th generation |
| $N_g$ | Total number of chromosomes in the gene pool |
| $F$ | Number of parents selected for mating from $N_g$ |
| $p_\beta$ | Fraction of $r_t$ that recovers from the disease |
| $\rho$ | Fraction of the $F$ parents that lives on in the next generation |

The proposed PCA-GA has a complexity that can be approximated as $O(N_g T_g O(f))$, where $N_g$ is the size of the gene pool, $T_g$ is the total number of generations and $O(f)$ is the complexity of measuring the fitness in the GA. For a large enough $N_g$, $T_g$ is considered a comparatively smaller constant, and thus the complexity of the entire algorithm is mainly governed by $N_g$ and $O(f)$. The complexity of estimating the fitness can be approximated as $O(f)=O(T+8N\tau T)$ for the Moore neighbourhood criterion, where $N$ is the total population on the 2D grid. The length of the original time series $T$, and $\tau$, the maximum of $\tau_{ij}$, are both constant, and thus $O(f)$ can be represented as $O(N)$.

Though GA has been selected as the strategy to optimize the parameters of the proposed PCA model, the generalized construction of the framework means that other meta-heuristic methods could also be employed to search the parameters of the spatially driven SEIQR model, which is the main focus of this work. However, the presence of mutation and diversification in GA helps in searching for better solutions, as the search space is extremely large.

3. Results

To validate the effectiveness of the proposed PCA-GA framework, the actual statistics of COVID-19 spread up to 20th June, 2020 in different countries are used. Several aspects were considered in finalizing the data-set from the available data of 213 countries. First, 102 countries were dropped because of their small number of reported cases (fewer than 1000 reported cases up to 20th June 2020). Of the remaining countries, some, such as Iran, Greece and Paraguay, were removed due to data inconsistency, and finally 40 countries were randomly selected while ensuring the following points:

  • At least 2 countries from each continent got selected to maintain demographic diversity in our data.

  • Care has been taken to maintain significant variation in population density, which we believe to be a major factor contributing to disease transmission.

  • It was ensured that countries from three distinct stages of COVID-19 infection are considered: (i) where the infection is significantly diminished, (ii) where the peak infection has been reached but substantial infection still persists, and (iii) where consistent growth in infection is occurring.

With this widely varying spectrum of time series data, we proceed to quantitative calibration and interpretation through the proposed methodology. All data samples are taken from the website worldometers.info.

To identify the major contributing factors in the dynamics of infection spread, three available time series are collected for every country under consideration: daily active cases, total number of infected cases and total number of deaths. Of these three series, the daily active cases time series is used for model formulation, and the rest are used for model validation. It is important to mention that the quarantined population $q_t$ is the relevant observable here, as the infected and exposed populations $i_t$ and $e_t$ remain latent and undetected in the population. The reported daily active case data are associated with the lifetime of the infection, and are used in this study to check the effectiveness of the proposed framework as follows. By applying PCA-GA to the daily active case data of a particular country, the parameter set $\Theta$ that gives the minimum $\ell_1$ error is extracted. To validate the optimized parameters and assess the robustness of the algorithm, the results generated by $G(\Theta)$ for the total infected and deceased states are then compared with the real-world data. Here it must be noted that the optimal parameters $\Theta$ remain unaltered and no further optimization is performed.

3.1. Experimental setup

For all the simulations, PCA is initialized with a fixed lattice size of 100 × 100 with $n_e=50$ and $n_i=4$. The populations $n_q$ and $n_r$ are set to zero at $t=0$. The susceptible population $n_s$ is initialized depending on the population density of a country as follows: among the countries considered in our study, $n_s=2500$ is selected for the country with the lowest population density (Canada), and $n_s=6000$ is fixed for the country with the highest population density (Singapore). For any other country, $n_s$ is assigned within this range using logarithmic scaling based on the population of that country. As each of the parameters of PCA-GA has physical relevance, the sequential search is initiated with the following restrictions on ranges. It is important to note that in our problem, genes associated with probabilities are initialized in the range [0,1] and clipped accordingly during the optimization process. The state transition delays $\tau_{ei}$ (incubation period) and $\tau_{iq}$ (testing delay) are considered to be within the range (0,30). The transition delays $\tau_{ir}$ and $\tau_{qr}$ (the corresponding recovery periods) are initialized in the range (20,100). All the simulations are executed on a system with an Intel Core i7 8700K processor, 64 GB RAM and an NVIDIA GeForce RTX 2080 GPU (8 GB), using Python and the numpy package.
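The logarithmic scaling of $n_s$ between the two anchor countries could look like the sketch below. The source does not give the exact formula, so linear interpolation on a log scale is an assumption, as is the use of population density as the scaling variable; the function name and the density values in the example are purely illustrative and are not real country data.

```python
import math

def susceptible_count(density, density_min, density_max, n_min=2500, n_max=6000):
    """Interpolate n_s between n_min and n_max on a log-density scale."""
    frac = (math.log(density) - math.log(density_min)) / (
        math.log(density_max) - math.log(density_min))
    return round(n_min + frac * (n_max - n_min))

# Illustrative densities only: the geometric midpoint lands halfway between the anchors.
n_s_low = susceptible_count(1.0, 1.0, 100.0)
n_s_mid = susceptible_count(10.0, 1.0, 100.0)
n_s_high = susceptible_count(100.0, 1.0, 100.0)
```

A log scale keeps very dense countries from crowding the lattice while still separating sparsely populated ones from each other.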

3.2. Estimation of parameters using active cases

The daily active cases can be defined as the ct=ct1+qtrt where ct is the number of active cases at time instance t having the initialization c0=0. In Fig. 2, the active cases of 20 different countries are shown along with the respective estimated active cases using PCA-GA model. For the countries shown in Fig. 2, the first peak of the infection is already crossed and a steady fall in the infection spread is observed. It can also be seen that some of the active cases of the countries like China, Israel, Switzerland, follow smooth bell-shaped curves, whereas for some countries, like Australia, Cyprus, Hungary etc., the times series data deviates from bell-shaped curves with substantial degree of noises. In all the cases, PCA-GA has successfully captures the trend of the time series data estimating the parameters of the epidemiological process. To measure the goodness of the model estimation, three different metrics has been used to measure the quality of the estimated values. The root mean square (RMSE) distance, correlation distance and chi-square distance [58], [59], [60], denoted as dl, dc and dχ respectively, are computed between the real data and the estimated values from the PCA-GA model to evaluate the effectiveness of the optimized model. For two vectors u and v, we define

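$$d_l = \sqrt{\frac{1}{T}\sum_{i=1}^{T}(u_i - v_i)^2}, \qquad d_c = 1 - \frac{(u - \bar{u})\cdot(v - \bar{v})}{\lVert u - \bar{u}\rVert_2 \, \lVert v - \bar{v}\rVert_2}, \qquad d_\chi = \sum_{i=1}^{T}\frac{(u_i - v_i)^2}{v_i}$$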

where $T$ is the length of each vector, $u_i$ and $v_i$ are the $i$th elements of $u$ and $v$ respectively, $\bar{u}$ and $\bar{v}$ are the corresponding means, and $(\cdot)$ denotes the dot product of two vectors. As shown in Fig. 3(a), the proposed model performs well in modelling the real data. When evaluated over all the countries considered in this work, the fits were poor for only 0% to 12.5% of the cases, depending on the evaluation metric. It is important to mention that all the distance measures are evaluated on normalized data.
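
The three distances can be computed with a few lines of numpy. The sketch below is illustrative rather than the authors' implementation: the normalization (division by the series maximum) and the small constant `eps` that guards against division by zero are assumptions, since the text only states that the data are normalized. Which of the two series plays the role of $v$ in the chi-square distance is likewise not specified here.

```python
import numpy as np


def normalize(x):
    """Scale a series by its maximum (assumed normalization, not stated in the paper)."""
    x = np.asarray(x, dtype=float)
    return x / x.max()


def rmse_distance(u, v):
    """d_l: root mean square distance between two equally long vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sqrt(np.mean((u - v) ** 2)))


def correlation_distance(u, v):
    """d_c: one minus the Pearson correlation of the two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    uc, vc = u - u.mean(), v - v.mean()
    return float(1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc)))


def chi_square_distance(u, v, eps=1e-12):
    """d_chi: sum of squared differences scaled by v; eps is an added safeguard."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum((u - v) ** 2 / (v + eps)))


# Illustrative series only, not real case data.
real = normalize([1, 2, 4, 8, 4, 2, 1])
estimate = normalize([1, 2, 5, 8, 4, 2, 1])
distances = (
    rmse_distance(real, estimate),
    correlation_distance(real, estimate),
    chi_square_distance(real, estimate),
)
```

All three distances are zero for identical series and grow as the estimate departs from the data, which is what allows the fixed thresholds of Fig. 3(a) to be applied across countries.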

Fig. 2.

Time series data for active cases (blue) of COVID-19 pandemic in different countries where the peaks of the infection spread of the first wave have been passed, and estimated active cases (red) from proposed PCA-GA method.

Fig. 3.

Parameter estimations and goodness of model estimation: (a) RMSE, correlation and $\chi^2$ distances, $d_l$, $d_c$ and $d_\chi$, for all 40 countries considered in this work, with the goodness of agreement with the model estimations shown in percentage. The colours green, orange and red signify the level of agreement. Values in (0:0.05) for $d_l$, (0:0.01) for $d_c$ and (0:1) for $d_\chi$ are considered good (green). Values in (0.05:0.08) for $d_l$, (0.01:0.1) for $d_c$ and (1:3) for $d_\chi$ are considered moderate (orange). Values above the moderate range are considered poor (red). For all three metrics, 65–75% of the countries show good agreement with the model estimation; (b) and (c) show boxplots of the best-fit state transition probabilities and state transition delays respectively, for all 20 countries shown in Fig. 2. The height of each box represents the interquartile range (IQR). The dark line inside the box represents the median. The lower and upper whiskers extend to the lowest and highest values within 1.5 IQR of the first and third quartile, respectively.

In Fig. 2, an interesting point to notice is that the peaks of the active cases are located at markedly different time instances, and the other properties of the observed distributions, like variance, skewness etc., also vary drastically. The fundamental differences between the fitted curves are quantified with the help of boxplots of the parameters in Fig. 3(b)–(c) by analysing basic statistical properties. The reported boxplots are specifically for the countries selected in Fig. 2. It can be noted that $p_e$, $p_i$ and $p_{ei}$ exhibit a wide variability in Fig. 3(b). During our analysis, a strong positive correlation of $p_e$ and $p_i$ with population density has also been observed. It can thus be inferred that the variation in population density across the considered countries causes the wide range of these parameters. It can also be concluded that a high population density increases the probability of transmission of the disease. The considerable difference between the mean magnitudes of the infection-associated probabilities ($p_e$, $p_i$ and $p_{ei}$) and the recovery-related probabilities ($p_{iq}$, $p_{ir}$ and $p_{qr}$) indicates the sharper rise and slower fall of the active case curves, which results in a skewed distribution in most of the cases (see Fig. 2). In Fig. 3(c), it is also shown that $\tau_{ei}$, which is identified as the incubation time in the model, exhibits a range of 3–14 days with a mean of 7.3 days, which aligns with the cases observed all around the world [61]. In this figure, a wide variability in the range of $\tau_{ir}$ and $\tau_{qr}$ is observed, which points to substantial differences in the health infrastructure of these countries.
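
The boxplot statistics used in Fig. 3(b)–(c) (median, IQR and whiskers within 1.5 IQR of the quartiles) can be reproduced with numpy as sketched below. The helper name `boxplot_summary` and the sample delay values are hypothetical and serve only to illustrate the rule; they are not the fitted parameters of the study.

```python
import numpy as np


def boxplot_summary(values):
    """Median, quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "iqr": float(iqr),
        # Whiskers end at the most extreme data points that lie inside the fences.
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": x[(x < low_fence) | (x > high_fence)].tolist(),
    }


# Hypothetical best-fit delays (in days) for nine countries, for illustration only.
summary = boxplot_summary([3, 5, 6, 7, 7, 8, 9, 14, 40])
```

Values beyond the whiskers are reported separately, which mirrors how parameter sets far from the bulk of the countries can be flagged for closer inspection.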

Here it must be mentioned that, while performing this statistical analysis with all 40 countries, some countries were found to be consistent outliers (not included in Fig. 3(b)–(c)) in terms of four transitional parameters: $p_{ir}$, $p_{qr}$, $\tau_{ir}$ and $\tau_{qr}$. On analysing the active case distributions of these outliers, it was found that the time series data for all these countries have a saturating trend, where the daily active cases do not show an average descent with time. Some such cases are shown in Fig. 4. Even for these data, which have a drastically different qualitative trend compared to the countries shown in Fig. 2, the proposed PCA-GA framework has successfully captured the trend of the real time series data.

Fig. 4.

Time series data for active cases (blue) of COVID-19 pandemic in different countries where the cases are saturating, and estimated active cases (red) from proposed PCA-GA method.

There are also certain countries, like India, Brazil, Chile, Mexico, etc., in which the infection started spreading later than in countries like China or Italy, and in which the daily active cases are still growing almost exponentially. As shown in Fig. 5, PCA-GA is able to estimate the time series data for these countries, where the infection is spreading rapidly. The dynamics of COVID-19 spread in these countries are of particular interest, as the prediction of the peak positions might help immensely in understanding the maximum socioeconomic impact of the disease at a given time in that geographical location.

Fig. 5.

Time series data for active cases (blue) of COVID-19 pandemic in different countries where the cases are increasing exponentially, and estimated active cases (red) from proposed PCA-GA method.

3.3. Validation of the proposed model

While analysing complex dynamics like the spread of a pandemic, it is not always sufficient to model only the input real data. The optimized model should be robust and should provide meaningful interpretations without further retraining or parameter tuning for real-world applications. To validate the robustness and the effectiveness of the proposed algorithm, the optimized model is now employed for three different tasks. First, the robustness of the optimized model is checked by estimating the total number of infected cases, followed by the total number of death cases, without any further training, tuning or supervision. Finally, to further validate the efficiency of the model, its performance is evaluated on a prediction task by training the model on partitioned data and evaluating its future predictions without any further optimization.

3.3.1. Total number of infected

The total number of infected cases $z_t$ at time instance $t$ can be defined as $z_t = \sum_{i=0}^{t} q_i$. This cumulative sum indicates the total number of people who suffered from the disease at any point of time. For a country where the first wave of the infection has passed, e.g., Croatia, Italy, etc., $z_t$ approximately follows a sigmoid function, whereas for countries like India, Mexico etc., where the infection has not reached the peak, $z_t$ follows an exponential function. As PCA-GA is optimized using the time series of daily active cases $c_t$, $z_t$ is used to validate the parameters learnt by the sequential GA framework in the following way. Once a particular country is selected, $\Theta$ is estimated using PCA-GA with the actual $c_t$. Next, $\hat{z}_t$ is calculated from $G(\Theta)$ without any further fine-tuning of the parameters, and compared with the actual $z_t$. In Fig. 6, the total cases (blue) of six such countries are shown along with the best-fit results obtained from PCA-GA (red), which show an excellent agreement with the data. It must be mentioned that for all three dynamical stages of infection spreading discussed in Section 3.2, i.e., where the first wave of infection has passed, where the active cases are currently almost saturated, or where the active cases are increasing rapidly, our estimated $\hat{z}_t$ closely matches $z_t$ without any further parameter optimization. When evaluated over all 40 countries for the number of infected people, the proposed method gives average $d_l$, $d_c$ and $d_\chi$ of 0.037, 0.006 and 0.53 respectively, which demonstrates the robustness of the model.
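
The relation between the simulated daily series and the two quantities compared with data can be sketched as follows. This is a minimal illustration, assuming that $q_t$ and $r_t$ are the numbers of individuals newly entering the quarantined and removed states on day $t$, as produced by a simulation run; the function names and the sample values are hypothetical.

```python
import numpy as np


def active_cases(q, r):
    """c_t = c_{t-1} + q_t - r_t with c_0 = 0."""
    q, r = np.asarray(q, dtype=float), np.asarray(r, dtype=float)
    c = np.zeros(len(q))
    for t in range(1, len(q)):
        c[t] = c[t - 1] + q[t] - r[t]
    return c


def total_infected(q):
    """z_t = sum of q_i for i = 0..t, i.e. a running total of q."""
    return np.cumsum(np.asarray(q, dtype=float))


# Hypothetical daily counts from one simulation run, for illustration only.
q_daily = [0, 3, 5, 2, 0]
r_daily = [0, 0, 1, 2, 4]
c_hat = active_cases(q_daily, r_daily)   # array([0., 3., 7., 7., 3.])
z_hat = total_infected(q_daily)          # array([ 0.,  3.,  8., 10., 10.])
```

Because the cumulative series is derived from the same simulated $q_t$ that produces the active cases, comparing it with the reported totals tests the fitted parameters on a quantity they were not optimized for.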

Fig. 6.

Total infected cases (blue) of COVID-19 pandemic in different countries, and estimated total cases (red) from proposed PCA-GA method.

3.3.2. Total death cases

To further validate the ‘goodness’ of the estimated parameters, the parameter set $\Theta$ optimized over the daily active cases of a particular country is taken, and the identical parameter values are used to compare the estimated total deaths with the actual total deaths of that country. Death in the population is the prime concern in the COVID-19 pandemic, and as mentioned in Section 2.2.1, the daily deceased population is a fraction of $r_t$ in our model. So, the total estimated death cases can be defined as $\hat{d}_t = (1 - p_\beta) \sum_{i=0}^{t} r_i$, where $p_\beta$ and $r_i$ for $0 \le i \le t$ are given by $\Theta$ and $G(\Theta)$ respectively. Fig. 7 compares the actual total death cases $d_t$ with the estimated total death cases $\hat{d}_t$ for $\Theta$, the identical set of parameters used previously for estimating the active cases as well as the total cases. The same countries shown in Fig. 6 have been selected to show the robustness of the parameter set $\Theta$ estimated using the proposed technique. Excellent agreement with the data has been found in this case as well; when evaluated over all 40 countries for the total number of death cases, the proposed method gives average $d_l$, $d_c$ and $d_\chi$ of 0.041, 0.006 and 0.48 respectively.
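
The estimated cumulative deaths follow directly from the simulated removals. The sketch below assumes that the daily removed counts $r_i$ are available as an array from a simulation run and that $p_\beta$ is read from the fitted parameter set; the function name and the numerical values are hypothetical.

```python
import numpy as np


def total_deaths(r, p_beta):
    """Estimated cumulative deaths: (1 - p_beta) times the running sum of r."""
    if not 0.0 <= p_beta <= 1.0:
        raise ValueError("p_beta is a probability and must lie in [0, 1]")
    return (1.0 - p_beta) * np.cumsum(np.asarray(r, dtype=float))


# Hypothetical daily removals and p_beta, for illustration only.
d_hat = total_deaths([0, 0, 1, 2, 4], p_beta=0.9)
```

No parameter is refitted at this step, so any agreement with the reported deaths is a consequence of the values already estimated from the active cases.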

Fig. 7.

Total deaths (blue) of COVID-19 pandemic in different countries, and estimated total deaths (red) from proposed PCA-GA method.

3.3.3. Prediction related to infection spread

Prediction of future events is always challenging in data modelling [62]. For the final stage of validation of the methodology, the predictive power of the model has been tested. As the impact of this pandemic is far-reaching and varies with the socioeconomic context, a reasonably accurate prediction of the dynamics of the infection spread can be crucial and useful in many ways. As PCA-GA successfully estimates the optimal parameter set $\Theta$, this set of parameters can also be utilized to predict the future course of the infection in that country.

To validate the capacity of the prediction strategy, the daily active cases of a country, $c_t$, are truncated to $c_P$ by keeping the first ‘P’ values. PCA-GA is applied to $c_P$ to estimate the parameters $\Theta_P$. Then $\Theta_P$ is used to predict the daily active cases $\hat{c}_t$. As shown in Fig. 8, for two countries, Israel and Switzerland, the daily active case information up to 54 and 43 days respectively is used in an attempt to predict the daily active cases up to 100 days. In the figure, the estimated curve (shown in red) is optimized using all the real data points available, whereas the predicted curve (shown in black) is optimized using the truncated real data. It can be observed that the predictive estimation closely follows the real active case data, even though only about 50% of the data points are used for parameter estimation. For Israel and Switzerland, the 100-day prediction of the algorithm produces $(d_l, d_c, d_\chi)$ as (0.056, 0.008, 0.95) and (0.028, 0.005, 0.43) respectively. As prediction of the spread of the infection is one of the most challenging tasks, the predictive ability of the proposed algorithm is compared with different baseline methods to better understand its performance. As only very few data points are available in the truncated data, the fast decision tree learning algorithm [63] and random forest regression perform poorly and give $(d_l, d_c, d_\chi)$ as (0.43, 0.49, 243.82) and (0.439, 0.51, 252.6) respectively for the truncated time series of Switzerland. SVM regression with an RBF kernel performs satisfactorily on the same truncated data and produces $(d_l, d_c, d_\chi)$ as (0.09, 0.02, 27.8). However, the proposed PCA-GA algorithm significantly outperforms the baseline algorithms and produces $(d_l, d_c, d_\chi)$ as (0.028, 0.005, 0.43).
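
The truncate, fit, predict and score protocol described above can be sketched independently of the model being tested. In the sketch below, `fit` and `simulate` are placeholders for any parameter estimator and simulator; the log-quadratic bell curve used to exercise the protocol is a deliberately simple stand-in and is not PCA-GA. Scoring over the full horizon and normalizing by the peak of the real series are assumptions, and the synthetic series is not real case data.

```python
import numpy as np


def rmse_distance(u, v):
    """Root mean square distance between two equally long series."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sqrt(np.mean((u - v) ** 2)))


def truncated_forecast_score(series, P, fit, simulate):
    """Fit on the first P points, simulate the whole horizon, score on normalized data."""
    series = np.asarray(series, dtype=float)
    theta_P = fit(series[:P])                    # parameters from truncated data only
    predicted = simulate(theta_P, len(series))   # forecast over the full horizon
    scale = series.max()                         # assumed normalization by the real peak
    return rmse_distance(series / scale, predicted / scale)


# Stand-in model (NOT PCA-GA): a bell curve fitted as a quadratic in log-space.
def fit_log_quadratic(truncated):
    t = np.arange(len(truncated))
    return np.polyfit(t, np.log(truncated), 2)


def simulate_log_quadratic(theta, horizon):
    return np.exp(np.polyval(theta, np.arange(horizon)))


# Synthetic, noise-free bell-shaped series (illustrative only).
days = np.arange(100)
synthetic = 1000.0 * np.exp(-((days - 50) ** 2) / (2 * 15.0 ** 2))
score = truncated_forecast_score(synthetic, 54, fit_log_quadratic, simulate_log_quadratic)
```

Keeping the protocol separate from the model makes it straightforward to score different predictors on the same truncated series with the same distance.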

Fig. 8.

Prediction of daily active cases from truncated data. For Israel and Switzerland, real data up to 54 and 43 days respectively have been used to predict the daily active cases for 100 days. For prediction, the average of 50 independent PCA-GA simulations is considered.

3.4. Prediction for exponentially rising active cases

As the PCA-GA methodology has been elaborately validated in Section 3.3, it is employed in this section for the prediction of consistently rising real epidemic data. Though the parameter estimation works well even when only minimal information about the peak position in $c_t$ is available, the prediction task becomes really challenging when $c_t$ is exponential in nature. For a particular country where $c_t$ is rising almost exponentially, the best set of parameters $\Theta$ is first detected by PCA-GA, with fitness $f$ and error $e$. As the drop of the infection depends heavily on the transitional probabilities $p_{ir}$, $p_{qr}$ and the state transitional delays $\tau_{ir}$ and $\tau_{qr}$, these parameters are tuned to find a region of predictions bounded by the possible best-case and worst-case scenarios. While estimating the best-case and worst-case scenarios, $p_{ir}$ and $p_{qr}$ are chosen equal to the maximum and minimum $p_{ir}$ and $p_{qr}$, respectively, observed in the continent to which the country belongs. The reason behind this strategy is that the parameters related to the infection spreading are different in each continent, as also observed in [64]. In the best-case scenario, the transitional delays $\tau_{ir}$ and $\tau_{qr}$ are reduced to obtain the best-case transitional delays $\tau_{ir}^{b}$ and $\tau_{qr}^{b}$ respectively, such that the fitness remains within 90% of $f$, where $\tau_{ir}$ and $\tau_{qr}$ are the corresponding optimized delays available in $\Theta$. For the worst-case scenario, we consider $\tau_{ir}^{w} = \tau_{ir} + \alpha_{ir}$ and $\tau_{qr}^{w} = \tau_{qr} + \alpha_{qr}$, where $\alpha_{ir} = \tau_{ir} - \tau_{ir}^{b}$ and $\alpha_{qr} = \tau_{qr} - \tau_{qr}^{b}$.
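
The construction of the best-case and worst-case recovery delays can be sketched as a simple search. The sketch below makes several assumptions: the fitness is treated as a function of a single delay with larger values being better, "within 90% of $f$" is read as fitness no smaller than 0.9 times the optimized fitness, and the delay is reduced in steps of one day. The toy fitness function and all names are hypothetical; in the actual framework each fitness evaluation would require a PCA simulation.

```python
def best_case_delay(tau_opt, fitness, f_opt, tolerance=0.90, step=1):
    """Reduce a delay from its optimized value while fitness stays >= tolerance * f_opt."""
    tau = tau_opt
    while tau - step > 0 and fitness(tau - step) >= tolerance * f_opt:
        tau -= step
    return tau


def worst_case_delay(tau_opt, tau_best):
    """Mirror the best-case reduction: worst = optimized + (optimized - best)."""
    alpha = tau_opt - tau_best
    return tau_opt + alpha


# Toy fitness peaking at the optimized delay of 30 days (illustrative only).
def toy_fitness(tau):
    return 1.0 - ((tau - 30) / 20.0) ** 2


tau_opt = 30
tau_best = best_case_delay(tau_opt, toy_fitness, f_opt=toy_fitness(tau_opt))
tau_worst = worst_case_delay(tau_opt, tau_best)
```

The two delays are symmetric about the optimized value, so the width of the resulting prediction band reflects how sharply the fitness falls off around the optimum.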

Fig. 9 depicts the prediction of the daily active cases using the method discussed so far. In Fig. 9, the black dotted line indicates the prediction using the optimal parameters $\Theta$ estimated using PCA-GA. The orange line indicates the best-case scenario, in which the maximum of the daily active cases is minimized given the real data. The red line indicates the worst-case scenario based on the specific conditions mentioned above. The best-case and worst-case scenarios act as the limits of an area (shaded in pink) of probable future states. Any curve inside the pink region that contains the real data could be the future evolution of the daily active cases, given the real time series data, which is currently in an exponentially rising state. This indicates that for India, which is now one of the biggest epicentres of COVID-19 in South-eastern Asia, the disease can start to decline very soon if vigorous measures from the government and complete support from the public can be achieved. It also shows that the maximum number of active cases on a day, which puts a direct burden on the health infrastructure of the country, can be restricted below 750,000 if people adhere to the mitigation strategies indicated by the government and the recovery rate remains at its current value. In that case, the peak of the disease is expected to pass between mid-September and mid-October, and the first wave of the disease can be over by March 2021. But these predictions also imply that the range of future states that are possible for exponentially rising daily active cases not only depends on the evolution of the epidemic so far, but is also highly affected by the consistency and implementation efficiency of the mitigation strategies.

Fig. 9.

Prediction of the course of the disease: exponentially rising daily active cases for India (blue) till 20th July, 2020 are used for parameter estimation and prediction.

4. Conclusion

The COVID-19 outbreak has had a massive impact all across the globe. Even after nation-wide lockdowns, extensive testing strategies and medical support, the spread of the virus has overwhelmed several countries. Thus, it is becoming more and more important to understand the nature of the infection spread and the key parameters that control it. In this work, we proposed a probabilistic cellular automata model to understand and depict the spread of COVID-19 using an appropriate choice of loss functions and an evolutionary optimization framework. The parameters of this cellular automata model are optimized using a sequential evolutionary genetic algorithm. It has been shown that this self-adapting methodology can be highly flexible and has the power to accurately estimate the time trajectories of epidemics. The model works with physically interpretable parameters, which are accessible for analysis, data collection and further experiment, and can be readily identified with ground reality. The model has been successfully employed to optimize all these parameters simultaneously for the daily active cases, and the same parameters reproduce the total infected cases and total deaths robustly. The performance of the model has been demonstrated for a large number of countries with huge diversity in population density, continent and available healthcare infrastructure. The predictive strength of the model has also been validated extensively, and the model has been used to estimate the course of the pandemic for countries where the infection peak has not yet been reached. It is important to mention that the motivation of the work was to develop a data-driven, generalized, spatial framework that can be used to estimate relevant epidemiological parameters. The methodology is sufficiently powerful and flexible that physical interpretations of the results obtained from these analyses can have wide-ranging implications.
Once the data is properly interpreted with the proposed methodology, interesting realistic features can be identified for specific countries. For example, in a pandemic situation, easily relatable factors like population clusters, variable population density, variable health facilities at different places of a country etc. can be studied to understand and predict the emergence of new hotspots, which can be used to design selective area containment strategies. While we propose and establish the applicability and strength of this framework in this work, we wish to address these application perspectives in our upcoming research studies.

With this proposed platform, the impact of individuality on the contagion process can be explicitly studied, which might be directly related to questions like behavioural differences during lockdown, the influence of rumours, differences of opinion on vaccination etc. As the effects of more complex dynamical factors like periodic lockdowns or population clusters are not considered in the present model, its prediction capability is not satisfactory, in its present form, for time series data with abrupt discontinuities. The proposed framework could be enhanced with other $\ell_p$ norm distances and different optimization techniques like the multi-objective genetic algorithm or the strength Pareto evolutionary algorithm. Other swarm-based optimization techniques can also be explored for further refinement of the model. The potential of the proposed approach can be utilized to better understand the spreading and control of diseases beyond the pandemic the world is currently facing, by keeping track of the spatial information of the dynamics, incorporating realistic behavioural aspects, and optimizing in terms of demographic as well as socioeconomic features.

CRediT authorship contribution statement

Sayantari Ghosh: Conceptualization, Methodology, Software, Validation, Writing - review & editing. Saumik Bhattacharya: Conceptualization, Methodology, Software, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • 1. World Health Organization. 2020. Coronavirus disease (COVID-2019) Situation Reports. Available at URL https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports. (Accessed June 2020)
  • 2. Jin X., Lian J.-S., Hu J.-H., Gao J., Zheng L., Zhang Y.-M., Hao S.-R., Jia H.-Y., Cai H., Zhang X.-L. Epidemiological, clinical and virological characteristics of 74 cases of coronavirus-infected disease 2019 (COVID-19) with gastrointestinal symptoms. Gut. 2020;69(6):1002–1009. doi: 10.1136/gutjnl-2020-320926.
  • 3. Pan L., Mu M., Yang P., Sun Y., Wang R., Yan J., Li P., Hu B., Wang J., Hu C. Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study. Am. J. Gastroenterol. 2020;115 doi: 10.14309/ajg.0000000000000620.
  • 4. Cheng Y., Luo R., Wang K., Zhang M., Wang Z., Dong L., Li J., Yao Y., Ge S., Xu G. Kidney disease is associated with in-hospital death of patients with COVID-19. Kidney Int. 2020 doi: 10.1016/j.kint.2020.03.005.
  • 5. Han C., Duan C., Zhang S., Spiegel B., Shi H., Wang W., Zhang L., Lin R., Liu J., Ding Z. Digestive symptoms in COVID-19 patients with mild disease severity: clinical presentation, stool viral RNA testing, and outcomes. Am. J. Gastroenterol. 2020 doi: 10.14309/ajg.0000000000000664.
  • 6. Zheng Y.-Y., Ma Y.-T., Zhang J.-Y., Xie X. COVID-19 and the cardiovascular system. Nature Rev. Cardiol. 2020;17(5):259–260. doi: 10.1038/s41569-020-0360-5.
  • 7. Wang C., Pan R., Wan X., Tan Y., Xu L., Ho C.S., Ho R.C. Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China. Int. J. Environ. Res. Publ. Health. 2020;17(5):1729. doi: 10.3390/ijerph17051729.
  • 8. Wang Y., Wang Y., Chen Y., Qin Q. Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures. J. Med. Virol. 2020;92(6):568–576. doi: 10.1002/jmv.25748.
  • 9. van Doremalen N., Bushmaker T., Morris D.H., Holbrook M.G., Gamble A., Williamson B.N., Tamin A., Harcourt J.L., Thornburg N.J., Gerber S.I. Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1. New Engl. J. Med. 2020;382(16):1564–1567. doi: 10.1056/NEJMc2004973.
  • 10. Bai Y., Yao L., Wei T., Tian F., Jin D.-Y., Chen L., Wang M. Presumed asymptomatic carrier transmission of COVID-19. JAMA. 2020;323(14):1406–1407. doi: 10.1001/jama.2020.2565.
  • 11. Nishiura H., Kobayashi T., Miyama T., Suzuki A., Jung S., Hayashi K., Kinoshita R., Yang Y., Yuan B., Akhmetzhanov A.R. Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). medRxiv. 2020 doi: 10.1016/j.ijid.2020.03.020.
  • 12. Yu P., Zhu J., Zhang Z., Han Y. A familial cluster of infection associated with the 2019 novel coronavirus indicating possible person-to-person transmission during the incubation period. The J. Infect. Dis. 2020;221(11):1757–1761. doi: 10.1093/infdis/jiaa077.
  • 13. Giordano G., Blanchini F., Bruno R., Colaneri P., Di Filippo A., Di Matteo A., Colaneri M. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nature Med. 2020:1–6. doi: 10.1038/s41591-020-0883-7.
  • 14. Yang Z., Zeng Z., Wang K., Wong S.-S., Liang W., Zanin M., Liu P., Cao X., Gao Z., Mai Z. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thoracic Dis. 2020;12(3):165. doi: 10.21037/jtd.2020.02.64.
  • 15. Volpert V., Banerjee M., Petrovskii S. On a quarantine model of coronavirus infection and data analysis. Math. Model. Nat. Phenom. 2020;15:24.
  • 16. Li C., Chen L.J., Chen X., Zhang M., Pang C.P., Chen H. Retrospective analysis of the possibility of predicting the COVID-19 outbreak from internet searches and social media data, China, 2020. Eurosurveillance. 2020;25(10) doi: 10.2807/1560-7917.ES.2020.25.10.2000199.
  • 17. Li L., Yang Z., Dang Z., Meng C., Huang J., Meng H., Wang D., Chen G., Zhang J., Peng H. Propagation analysis and prediction of the COVID-19. Infect. Dis. Model. 2020;5:282–292. doi: 10.1016/j.idm.2020.03.002.
  • 18. Fong S.J., Li G., Dey N., Crespo R.G., Herrera-Viedma E. Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction. Appl. Soft Comput. 2020 doi: 10.1016/j.asoc.2020.106282.
  • 19. Chatterjee K., Chatterjee K., Kumar A., Shankar S. Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model. Med. J. Armed Forces India. 2020 doi: 10.1016/j.mjafi.2020.03.022.
  • 20. Fong S.J., Li G., Dey N., Crespo R.G., Herrera-Viedma E. 2020. Finding an accurate early forecasting model from small dataset: A case of 2019-ncov novel coronavirus outbreak. arXiv preprint arXiv:2003.10776.
  • 21. Baltas G., Prieto Rodríguez F.A., Frantzi M., García Alonso C., Rodríguez Cortés P. Loyola Tech; 2020. Monte Carlo Deep Neural Network Model for Spread and Peak Prediction of COVID-19.
  • 22. Khatua D., De A., Kar S., Samanta E., Seikh A.A., Guha D. 2020. A fuzzy dynamic optimal model for COVID-19 epidemic in India based on granular differentiability. Available at SSRN 3621640.
  • 23. Liu P., Beeler P., Chakrabarty R.K. COVID-19 Progression timeline and effectiveness of response-to-spread interventions across the United States. medRxiv. 2020
  • 24. Traini M.C., Caponi C., De Socio G.V. Modelling the epidemic 2019-nCoV event in Italy: a preliminary note. medRxiv. 2020
  • 25. Lai S., Bogoch I.I., Ruktanonchai N.W., Watts A., Lu X., Yang W., Yu H., Khan K., Tatem A.J. Assessing spread risk of Wuhan novel coronavirus within and beyond China, January-April 2020: a travel network-based modelling study. medRxiv. 2020
  • 26. Wynants L., Van Calster B., Bonten M.M., Collins G.S., Debray T.P., De Vos M., Haller M.C., Heinze G., Moons K.G., Riley R.D. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369 doi: 10.1136/bmj.m1328.
  • 27. Bauch C.T., Lloyd-Smith J.O., Coffee M.P., Galvani A.P. Dynamically modeling SARS and other newly emerging respiratory illnesses: past, present, and future. Epidemiology. 2005:791–801. doi: 10.1097/01.ede.0000181633.80269.4c.
  • 28. Shinde G.R., Kalamkar A.B., Mahalle P.N., Dey N., Chaki J., Hassanien A.E. Forecasting models for coronavirus disease (COVID-19): A survey of the state-of-the-art. SN Comput. Sci. 2020;1(4):1–15. doi: 10.1007/s42979-020-00209-9.
  • 29. Althouse B.M., Lessler J., Sall A.A., Diallo M., Hanley K.A., Watts D.M., Weaver S.C., Cummings D.A. Synchrony of sylvatic dengue isolations: a multi-host, multi-vector sir model of dengue virus transmission in Senegal. PLoS Negl. Trop. Dis. 2012;6(11) doi: 10.1371/journal.pntd.0001928.
  • 30. Anderson R.M., May R.M. Oxford university press; 1992. Infectious Diseases of Humans: Dynamics and Control.
  • 31. Hethcote H.W. Asymptotic behavior in a deterministic epidemic model. Bull. Math. Biol. 1973;35:607–614. doi: 10.1007/BF02458365.
  • 32. Behncke H. Optimal control of deterministic epidemics. Opt. Control Appl. Methods. 2000;21(6):269–285.
  • 33. Bhattacharya S., Gaurav K., Ghosh S. Viral marketing on social networks: An epidemiological perspective. Physica A. 2019;525:478–490.
  • 34. Liu Y., Gayle A.A., Wilder-Smith A., Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. J. Travel Med. 2020 doi: 10.1093/jtm/taaa021.
  • 35. Shim E., Tariq A., Choi W., Lee Y., Chowell G. Transmission potential and severity of COVID-19 in South Korea. Int. J. Infect. Dis. 2020 doi: 10.1016/j.ijid.2020.03.031.
  • 36. Kucharski A.J., Russell T.W., Diamond C., Liu Y., Edmunds J., Funk S., Eggo R.M., Sun F., Jit M., Munday J.D. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect. Dis. 2020 doi: 10.1016/S1473-3099(20)30144-4.
  • 37. Peng L., Yang W., Zhang D., Zhuge C., Hong L. 2020. Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv preprint arXiv:2002.06563.
  • 38. Kermack W.O., McKendrick A.G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A. 1927;115(772):700–721.
  • 39. Rachah A., Torres D.F. 2017. Analysis, simulation and optimal control of a SEIR model for ebola virus with demographic effects. arXiv preprint arXiv:1705.01079.
  • 40. Berge T., Lubuma J.-S., Moremedi G., Morris N., Kondera-Shava R. A simple mathematical model for Ebola in Africa. J. Biol. Dyn. 2017;11(1):42–74. doi: 10.1080/17513758.2016.1229817.
  • 41. Toffoli T., Margolus N. MIT press; 1987. Cellular Automata Machines: A New Environment for Modeling.
  • 42. Wolfram S. CRC Press; 2018. Cellular Automata and Complexity: Collected Papers.
  • 43. Boccara N., Cheong K., Oram M. A probabilistic automata network epidemic model with births and deaths exhibiting cyclic behaviour. J. Phys. A: Math. Gen. 1994;27(5):1585.
  • 44. Beauchemin C., Samuel J., Tuszynski J. A simple cellular automaton model for influenza a viral infections. J. Theoret. Biol. 2005;232(2):223–234. doi: 10.1016/j.jtbi.2004.08.001.
  • 45. Fuks H., Lawniczak A.T. Individual-based lattice model for spatial spread of epidemics. Discrete Dyn. Nat. Soc. 2001;6
  • 46. Willox R., Grammaticos B., Carstea A., Ramani A. Epidemic dynamics: discrete-time and cellular automaton models. Physica A. 2003;328(1–2):13–22.
  • 47. Eosina P., Djatna T., Khusun H. A cellular automata modeling for visualizing and predicting spreading patterns of dengue fever. Telkomnika. 2016;14(1):228.
  • 48. Pokkuluri K.S., Nedunuri S.U.D. A novel cellular automata classifier for COVID-19 prediction. J. Health Sci. 2020;10(1):34–38.
  • 49. Dascalu M., Malita M., Barbilian A., Franti E., Stefan G.M. Enhanced cellular automata with autonomous agents for Covid-19 pandemic modeling. Romanian J. Inf. Sci. Technol. 2020;23:S15–S27.
  • 50. Ghosh S., Bhattacharya S. 2020. Computational model on COVID-19 pandemic using probabilistic cellular automata. arXiv preprint arXiv:2006.11270.
  • 51. Wright A.H. Foundations of Genetic Algorithms, vol. 1. Elsevier; 1991. Genetic algorithms for real parameter optimization; pp. 205–218.
  • 52. Yao L., Sethares W.A. Nonlinear parameter estimation via the genetic algorithm. IEEE Trans. Signal Process. 1994;42(4):927–935.
  • 53. Katare S., Bhan A., Caruthers J.M., Delgass W.N., Venkatasubramanian V. A hybrid genetic algorithm for efficient parameter estimation of large kinetic models. Comput. Chem. Eng. 2004;28(12):2569–2581.
  • 54. Gulsen M., Smith A., Tate D. A genetic algorithm approach to curve fitting. Int. J. Prod. Res. 1995;33(7):1911–1923.
  • 55. Karr C.L., Weck B., Massart D.-L., Vankeerberghen P. Least median squares curve fitting using a genetic algorithm. Eng. Appl. Artif. Intell. 1995;8(2):177–189.
  • 56. Schimit P.H. Evolutionary aspects of spatial prisoner’s dilemma in a population modeled by continuous probabilistic cellular automata and genetic algorithm. Appl. Math. Comput. 2016;290:178–188.
  • 57. Holland J.H. MIT press; 1992. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence.
  • 58. Liao T. Clustering of time series data—a survey. Pattern Recognit. 2005;38(11):1857–1874.
  • 59. Gao J., Sultan H., Hu J., Tung W.-W. Denoising nonlinear time series by adaptive filtering and wavelet shrinkage: a comparison. IEEE Signal Process. Lett. 2009;17(3):237–240.
  • 60. Salem O., Liu Y., Mehaoua A. 2014 IEEE International Conference on Communications (ICC) IEEE; 2014. Anomaly detection in medical wsns using enclosing ellipse and chi-square distance; pp. 3658–3663.
  • 61. World Health Organization. 2020. Coronavirus Disease (COVID-2019) Situation Reports. Available at URL: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200402-sitrep-73-covid-19.pdf. (Accessed June 2020)
  • 62. Acharjya D., Anitha A. A comparative study of statistical and rough computing models in predictive data analysis. Int. J. Amb. Comput. Intell. (IJACI) 2017;8(2):32–51.
  • 63. Su J., Zhang H. AAAI, vol. 6. 2006. A fast decision tree learning algorithm; pp. 500–505.
  • 64. Miller A., Reandelar M.J., Fasciglione K., Roumenova V., Li Y., Otazu G.H. Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study. MedRxiv. 2020

Articles from Applied Soft Computing are provided here courtesy of Elsevier
