2023 Jan 20;3(2):85–96. doi: 10.1016/j.imed.2023.01.002

Big data technology in infectious diseases modeling, simulation, and prediction after the COVID-19 outbreak

Honghao Shi 1, Jingyuan Wang 1, Jiawei Cheng 1, Xiaopeng Qi 2, Hanran Ji 2, Claudio J Struchiner 3,4, Daniel AM Villela 5, Eduard V Karamov 6,7, Ali S Turgiev 6,7
PMCID: PMC9851724  PMID: 36694623

Abstract

After the outbreak of COVID-19, the interaction between infectious disease systems and social systems has challenged traditional infectious disease modeling methods. Depending on their research purposes and data, researchers have either improved the structure and data of the compartment model or used agent-based and artificial intelligence (AI) based models to solve epidemiological problems. The main modeling methods are compartment subdivision, dynamic parameters, agent-based models, and AI-related methods. Researchers have studied six categories of factors: human mobility, nonpharmaceutical interventions (NPIs), age, medical resources, human response, and vaccines. Using these modeling methods, they have quantitatively analyzed the impact of social systems and made suggestions about the future transmission of infectious diseases and about prevention and control strategies. This review is organized around a research structure of purpose, factor, data, model, and conclusion. It focuses on post-COVID-19 research on infectious disease prediction and simulation. It summarizes the various improvement methods and analyzes which improvements match specific research purposes.

Keywords: Infectious disease model, Data embedding, Social system, Dynamic, Modeling the social systems

1. Introduction

Researchers use infectious disease models to study how infectious diseases spread, how fast they spread, and their spatiotemporal characteristics. The most commonly used infectious disease models are population-level compartment models, such as SIR [1] and SEIR [2]. These models divide the population into several groups according to disease state, such as susceptible, infected, and removed. They then use differential equations to define how individuals flow between groups.

Since 1927, SEIR and other compartment models have been successfully applied to the analysis of measles [3], SARS [4], and influenza A (H1N1) [5].

The three main elements of compartment models are compartments, transformations (between compartments), and parameters. After the outbreak of COVID-19, all three elements faced limitations. These limitations stem from a common cause: the influence of social systems on infectious disease systems. Examples include nonpharmaceutical interventions, vaccination strategies, the intensity of population activity, and the age distribution of those infected. Each of these effects can be attributed to a single factor in the social system. Researchers first determine the factor to be studied and then look for data that can characterize it. They then apply a modeling method and finally fulfill the research purpose through experiments.

In this review, Section 2 introduces the basic concepts, application status, and limitations of infectious disease models. Section 3 categorizes the new modeling methods, and Section 4 categorizes the research factors, their corresponding data, and modeling methods. Section 5 introduces preparedness for future outbreaks.

2. Traditional compartment model

2.1. What is a compartment model

A compartment model is a mathematical model that uses a set of compartments, parameters, and transformations to model the development of an infectious disease. The mathematical representation and practical significance of these three elements are shown in Table 1:

Table 1.

The mathematical representation and practical significance of compartments, parameters, and transformations

Types | Practical significance | Mathematical representation
Compartment | A state such as susceptible, infectious, or dead | A function F(t) of time t. F(t) is the compartment's value at time t, that is, the number of individuals in the corresponding state
Parameters | A numerical feature, such as infection rate, mortality rate, or time to onset | A floating-point number
Transformation | A process in the development of the epidemic, such as infection or cure | A differential equation. Its left side is the derivative of a compartment with respect to time t. Its right side is an expression of compartment values, parameters, and constants

All compartments and parameters appear as variables in the transformations. A compartment model can therefore be represented as a system of differential equations.

Take the simplest SI model [6] as an example. It contains two compartments, S (susceptible) and I (infectious). It has one parameter, the transmission rate β, and two transformations. The differential equations of the SI model are

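$$\frac{dS}{dt} = -\beta \cdot \frac{S \cdot I}{N} \tag{2.1.1}$$
$$\frac{dI}{dt} = \beta \cdot \frac{S \cdot I}{N} \tag{2.1.2}$$
$$S + I = N \tag{2.1.3}$$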
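The SI system (2.1.1)–(2.1.3) can be solved numerically. Below is a minimal sketch that uses forward Euler with a fixed step `dt`. The function name, step size, β, and initial values are illustrative assumptions, not values from the text. Because the same amount leaves S and enters I at every step, S + I stays equal to N.

```python
def simulate_si(beta, s0, i0, days, dt=0.1):
    """Forward-Euler integration of the SI model (2.1.1)-(2.1.2).

    Returns a list of (S, I) pairs, one per whole day (day 0 included).
    """
    n = s0 + i0                      # constant population, equation (2.1.3)
    s, i = float(s0), float(i0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I in one step
            s -= new_infections
            i += new_infections
        trajectory.append((s, i))
    return trajectory


traj = simulate_si(beta=0.3, s0=999, i0=1, days=60)
for day in (0, 20, 40, 60):
    s, i = traj[day]
    print(f"day {day:2d}: S={s:7.1f}  I={i:7.1f}  S+I={s + i:.1f}")
```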

2.2. SEIR compartment model

Before the outbreak of COVID-19, the most widely used epidemic model was the SEIR compartment model [2]. The differential equations of the SEIR model are

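$$\frac{dS}{dt} = -\beta \cdot \frac{S \cdot I}{N} \tag{2.2.1}$$
$$\frac{dE}{dt} = \beta \cdot \frac{S \cdot I}{N} - \alpha \cdot E \tag{2.2.2}$$
$$\frac{dI}{dt} = \alpha \cdot E - \gamma \cdot I \tag{2.2.3}$$
$$\frac{dR}{dt} = \gamma \cdot I \tag{2.2.4}$$
$$S + E + I + R = N \tag{2.2.5}$$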

In this system of equations, the meaning of each compartment, parameter, transformation, and constant N is shown in Table 2.

Table 2.

The meaning of SEIR compartment model

Types | Mathematical representation | Practical significance
Compartment | S | Susceptible
Compartment | E | Exposed
Compartment | I | Infectious
Compartment | R | Removed (death + recovered)
Parameter | β | Transmission rate
Parameter | α | Reciprocal of the latent period
Parameter | γ | Removal rate
Constant | N | Population

The SEIR model defines four states of the epidemic transmission process and the transformations between them. Suppose a set of initial compartment values and parameter values can be found such that the simulated compartment values match the observed time series. For example, I(t) would match the real number of active patients. In that case, the current epidemic is developing as the model describes, and the model can be used for epidemic prediction.
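Here is a minimal forward-Euler sketch of equations (2.2.1)–(2.2.4). It produces the simulated compartment time series that would be compared with observed data. The function names, step size, and parameter values (a 5-day latent period and a 7-day infectious period) are illustrative assumptions.

```python
def seir_step(state, beta, alpha, gamma, n, dt):
    """One forward-Euler step of equations (2.2.1)-(2.2.4)."""
    s, e, i, r = state
    infection = beta * s * i / n     # S -> E
    onset = alpha * e                # E -> I (alpha = 1 / latent period)
    removal = gamma * i              # I -> R
    return (
        s - infection * dt,
        e + (infection - onset) * dt,
        i + (onset - removal) * dt,
        r + removal * dt,
    )


def simulate_seir(beta, alpha, gamma, initial, days, dt=0.1):
    """Return one (S, E, I, R) tuple per day, starting from `initial`."""
    n = sum(initial)                 # constant population, equation (2.2.5)
    state = tuple(float(x) for x in initial)
    steps_per_day = round(1 / dt)
    out = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = seir_step(state, beta, alpha, gamma, n, dt)
        out.append(state)
    return out


series = simulate_seir(beta=0.5, alpha=1 / 5, gamma=1 / 7,
                       initial=(9990, 0, 10, 0), days=180)
peak_day = max(range(len(series)), key=lambda d: series[d][2])
print("peak of I on day", peak_day)
```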

2.3. How to use the SEIR model to predict an epidemic

2.3.1. Definition of the epidemic prediction task

First, the epidemic prediction task using the SEIR model is defined as follows:

  • Through method f, the Parameters are extracted from the Original Data.

  • Through method h, the Initial Values of the compartments (as well as N) are determined from the Original Data.

  • The Parameters and Initial Values constitute the SEIR Input.

  • The SEIR Input is computed into the SEIR Output according to the equations of the SEIR model.

  • Through method g, the SEIR Output is turned into the results of the task.

When researchers use the SEIR model for epidemic prediction, they start from the original data. They use method f to extract the parameters and method h to obtain the initial compartment values and constant values. The parameters, the initial compartment values, and the constants together form the input of the SEIR model. Solving SEIR's differential equations gives the model output, such as the time series of the number of infectious individuals. Finally, another method g extracts the result of the epidemic prediction task.

In practice, the original data should contain at least the population and time series of the numbers of infected and cured people. In this case, h and g are relatively simple:

  • Method h: Take the initial number of infected people as the initial value of the I compartment. Take the initial sum of cured and dead people as the initial value of the R compartment, and the population as N. Then estimate the initial value of the E compartment in one of two ways: as $coe_{EI}$ times the initial value of the I compartment [7], or as the accumulated number of infected people over the next few days [8]. Finally, compute the initial value of the S compartment from S + E + I + R = N;

  • Method g: Take the I compartment as the number of active patients and the sum of the I and R compartments as the cumulative number of patients.
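The two list items above describe methods h and g, which can be written almost directly as code. The sketch below uses hypothetical function and field names. It treats `coe_ei` as a user-supplied coefficient, following the first option attributed to [7]; the alternative from [8] is not shown. Method g assumes the SEIR output is a list of (S, E, I, R) tuples.

```python
from dataclasses import dataclass


@dataclass
class InitialValues:
    s: float
    e: float
    i: float
    r: float
    n: float


def method_h(infected0, cured0, dead0, population, coe_ei):
    """Initial compartment values from the first day of observed data."""
    i = infected0
    r = cured0 + dead0
    e = coe_ei * i                   # E estimated as coe_EI times the initial I
    s = population - e - i - r       # from S + E + I + R = N
    return InitialValues(s=s, e=e, i=i, r=r, n=population)


def method_g(seir_output):
    """Map SEIR output, a list of (S, E, I, R), to observable quantities."""
    existing = [i for (_, _, i, _) in seir_output]         # active patients
    cumulative = [i + r for (_, _, i, r) in seir_output]   # I + R
    return existing, cumulative


init = method_h(infected0=40, cured0=5, dead0=1, population=100_000, coe_ei=2.0)
print(init)
```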

In fact, h and g vary substantially when researchers use external data to study new scenarios, such as country-specific settings and population mobility [9], [10]. Therefore, the method f for extracting parameters is introduced first.

2.3.2. How to extract parameters

In different studies, the parameter extracting method f can be generally divided into three categories: a priori, operator calculation, and fitting. It is worth noting that different parameters in the same study may also be extracted in different ways. For example, a study may use the a priori method to extract α and the fitting method to extract β.

The a priori method sets parameter values directly from information, facts, or data outside the parameter-extraction process. For example, several studies use the reciprocal of the latent period as the value of α [11], [12]. The latent period is derived from clinical data and is independent of the SEIR model.

Operator calculation uses external data to compute parameter values through a few formulas or a simple mathematical model. For example, Kissler and colleagues [13] used strain-level incidence proxies and generation interval distributions to estimate the daily effective reproduction number ($R_u$). They then used $R_u$ to determine parameters in the SEIR model:

$$R_u = \sum_{t=u}^{u+i_{\max}} \frac{b(t)\, g(t-u)}{\sum_{a=0}^{i_{\max}} b(t-a)\, g(a)} \tag{2.3.1}$$

where $b(t)$ is the strain-level incidence proxy on day $t$, and $g(a)$ is the value of the generation interval distribution at lag $a$. $i_{\max}$ is the maximum generation interval, set as the first day by which over 99% of the generation interval distribution had been captured [13].
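Equation (2.3.1) and the 99% rule for $i_{\max}$ can be sketched in plain Python. The function names and the example generation-interval distribution and incidence series are hypothetical, not data from [13]. Days before the start of the series are skipped in the denominator. Two quick checks follow from the formula. With constant incidence and a distribution that sums to 1, $R_u$ is 1. With incidence doubling daily and a generation interval of exactly one day, $R_u$ is 2.

```python
def truncate_generation_interval(g_full, coverage=0.99):
    """Keep g(0..i_max): i_max is the first lag where cumulative mass exceeds coverage."""
    cumulative = 0.0
    for a, p in enumerate(g_full):
        cumulative += p
        if cumulative > coverage:
            return list(g_full[:a + 1])
    return list(g_full)


def effective_reproduction_number(b, g, u):
    """R_u from equation (2.3.1).

    b[t]: incidence proxy on day t; g[a]: generation-interval probability at lag a
    (a = 0..i_max). Requires u + i_max < len(b).
    """
    i_max = len(g) - 1
    r_u = 0.0
    for t in range(u, u + i_max + 1):
        # Incidence-weighted generation-interval mass behind the cases on day t.
        denominator = sum(b[t - a] * g[a] for a in range(i_max + 1) if t - a >= 0)
        if denominator > 0:
            r_u += b[t] * g[t - u] / denominator
    return r_u


# Hypothetical generation-interval distribution and growing incidence proxy.
g = truncate_generation_interval([0.1, 0.3, 0.3, 0.2, 0.08, 0.015, 0.005])
incidence = [round(50 * 1.1 ** t) for t in range(40)]
print("i_max =", len(g) - 1, " R_20 =", round(effective_reproduction_number(incidence, g, u=20), 3))
```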

Operator calculation suits external factors that have little to do with the scale and stage of the epidemic. Examples are temperature, humidity, ultraviolet radiation, and other climatic factors. Several studies in leading journals have consistently used the ERA5 dataset to account for climate factors [14], [15], [16]. They calculate β with the formula $\beta_t = \exp\{a \cdot d_t + \log(d_{\max} - d_{\min})\} + d_{\min}$, where $\{d_t\}$ is the temperature or UV series from the ERA5 dataset.
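The climate formula above is a one-line operator. The following is a minimal sketch that evaluates it exactly as written; the coefficient `a`, the bounds, and the daily series are hypothetical values. Algebraically, the expression equals $(d_{\max} - d_{\min}) e^{a d_t} + d_{\min}$. So $\beta_t = d_{\max}$ when $d_t = 0$, and for $a < 0$ it decays toward $d_{\min}$ as $d_t$ grows.

```python
import math


def climate_beta(d_t, a, d_max, d_min):
    """beta_t = exp{a * d_t + log(d_max - d_min)} + d_min (requires d_max > d_min).

    Equivalent to (d_max - d_min) * exp(a * d_t) + d_min.
    """
    return math.exp(a * d_t + math.log(d_max - d_min)) + d_min


# Hypothetical daily climate series d_t (e.g. humidity) and made-up coefficients.
daily_d = [0.004, 0.006, 0.010, 0.015]
print([round(climate_beta(d, a=-100.0, d_max=0.6, d_min=0.3), 4) for d in daily_d])
```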

Fitting uses an optimization method to compute a set of optimal or near-optimal parameters. With these parameters, the simulated compartment values are as close as possible to the observed data. Fitting is the main method for extracting parameters. It is also the only method whose results are strongly tied to the actual course of the epidemic. For example, L. Xue and others [17] modeled COVID-19 in Wuhan, Toronto, and Italy and used the Markov chain Monte Carlo (MCMC) method to estimate the model parameters. Because the fitting method is based on real data, it is also called a data-driven method.
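The core idea of fitting can be sketched with a much simpler technique than MCMC: a least-squares grid search over β, with α and γ fixed a priori. The sketch uses synthetic, noise-free "observations" generated at β = 0.4, so the fit should recover that value. All names and values are hypothetical. MCMC, as used in [17], returns a posterior distribution rather than a single point estimate.

```python
def simulate_infectious(beta, alpha, gamma, initial, days, dt=0.1):
    """Daily I(t) from forward-Euler integration of the SEIR model."""
    s, e, i, r = (float(x) for x in initial)
    n = s + e + i + r
    steps = round(1 / dt)
    out = [i]
    for _ in range(days):
        for _ in range(steps):
            infection, onset, removal = beta * s * i / n, alpha * e, gamma * i
            s, e, i, r = (s - infection * dt, e + (infection - onset) * dt,
                          i + (onset - removal) * dt, r + removal * dt)
        out.append(i)
    return out


def sum_squared_error(beta, observed, alpha, gamma, initial):
    """Distance between simulated and observed I(t)."""
    simulated = simulate_infectious(beta, alpha, gamma, initial, len(observed) - 1)
    return sum((sim - obs) ** 2 for sim, obs in zip(simulated, observed))


def fit_beta(observed, alpha, gamma, initial, grid):
    """Least-squares estimate of beta over a grid of candidate values."""
    return min(grid, key=lambda b: sum_squared_error(b, observed, alpha, gamma, initial))


initial = (9990, 0, 10, 0)
observed = simulate_infectious(0.4, 1 / 5, 1 / 7, initial, days=30)  # synthetic data
grid = [k / 1000 for k in range(50, 1001)]
beta_hat = fit_beta(observed, 1 / 5, 1 / 7, initial, grid)
print("fitted beta:", beta_hat)
```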

2.3.3. Weaknesses of the traditional compartment model

After the outbreak of COVID-19, the SEIR model showed two limitations as an infectious disease model: it cannot model real social systems, and it cannot model dynamics.

The SEIR model covers the whole contact-exposure-onset-removal process of infectious disease development. However, its assumptions about compartments and individuals are too idealized. This is reflected in the following:

  • Individuals in the same compartment are identical. For example, infected individuals transmit the disease to susceptible individuals at an average rate, and each individual has the same importance in the transmission chain.

  • Individuals have no agency. They do not change their behavior in response to the epidemic, and no nonpharmaceutical interventions (NPIs) are introduced as it develops.

  • Compartments are defined by epidemiological principles rather than by the data actually observed. For example, the model has an infected compartment, but in reality only confirmed-case data are available. The error introduced by substituting one for the other cannot be ignored.

With increased human mobility and the introduction of NPIs, the complex, dynamic spread of COVID-19 has diverged significantly from SEIR's single, static assumptions. At the same time, SEIR cannot fully exploit the frontline data that have become available. Henrik Salje and others [18] modeled the different ages of individuals within compartments and accounted for the special transmission patterns of this setting in France. Vadim A. Karatayeva and others [19] modeled how changes in population mobility change the transmission rate. They then ran simulation experiments on the effect of NPIs based on this model.

3. Methods to improve the traditional compartment model in the COVID-19 era

Improvements for “modeling the dynamics” were well known before the COVID-19 outbreak. However, their use together with big data to solve infectious disease modeling problems proliferated after the outbreak. We review these methods here and review specific research purposes with specific data in Section 4.

3.1. Modeling the dynamics

Dynamics means that the epidemic model varies with external factors such as spatial factors, temporal factors, and characteristics. By the scale of variation, methods for modeling dynamics fall into two categories: multistage models and dynamic parameters. Multistage models allow larger variation but require more complete data and stronger justification. Dynamic parameters change only the parameters of the model, which is more feasible if the model is reasonably designed. In practice, multistage models are often used to review specific epidemics. Dynamic parameters suit broader research such as data analysis, simulation, prediction, and regression. The differences between multistage models and dynamic parameters are shown in Table 3.

Table 3.

The difference between multistage model and dynamic parameters

Aspects | Multistage model | Dynamic parameters
Scale of variation | Whole model | Parameters only
Required data | Large and complex | Usually simple arrays, sometimes geospatial or graph data
Research focus | When to divide stages and how to model each stage | How to use array data to make parameters dynamic
Research interest | Reviewing specific epidemics | Broader scientific purposes
Popularity | Low; few studies | High; most research on modeling dynamics

3.1.1. Multistage models

A multistage model uses different models for different stages of the epidemic according to certain rules. Its dynamics usually lie in the time dimension. They are driven by NPIs, an external factor with great influence.

Xingjie Hao and Chaolong Wang [20] published a retrospective study of COVID-19 in Wuhan in Nature. Using the SAPPHIRE framework, they distinguished between symptomatic and asymptomatic infections. They also added a presymptomatic compartment between the exposed and infected compartments. With this framework, they incorporated the timing of NPIs such as Wuhan's city closure and built a five-stage SEPIR-class epidemic model. The transformations are the same in every stage. At the start of each stage, however, the compartment values are reset from the end of the previous stage and the real data. The parameters are also completely refitted with the Markov chain Monte Carlo method. The cutoff points between the five stages are set a priori by NPIs and have nothing to do with the confirmed case data or the SEPIR model itself. Examples are 2020.1.22 (Wuhan's city closure) and 2020.2.2 (the addition of clinical diagnosis criteria).

A study published in PNAS by Daniel Duque and others [21] on social distancing and COVID-19 hospital surges also used a multistage model. Unlike Hao and Wang, they proposed a strategy that triggers short-term shelter-in-place orders when hospital admissions exceed a threshold. In other words, they used a data indicator to obtain cutoff points automatically, rather than specifying them manually based on external factors.

In multistage models, cutoff points are obtained in one of two ways: manually, or automatically from an indicator. Manual cutoffs handle irregular, sudden external influences such as NPIs. Examples are Wang's review of the epidemic in Wuhan [20] and Maier's research on how Chinese NPIs controlled COVID-19 [22]. Automatic cutoffs, as in Duque's research on hospital surges [21], handle external factors with a certain regularity. The indicator is usually a threshold, such as a hospital admissions threshold or a government response index threshold [23]. The differences between them are summarized in Table 4.

Table 4.

The summary of multistage model

Aspects | Manual | Automatic
Basis | External human intervention | An external indicator crossing a threshold
Scenarios | NPIs | A wide range of external factors

Finally, at least the parameters of each stage's model are refitted, which is a larger-scale variation than dynamic parameters.
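Manual and automatic cutoffs can be contrasted in a small SEIR sketch. For brevity it only switches β between stages. The studies above instead refit each stage's model, and Hao and Wang also reset compartment values from real data. The manual run starts a new stage on a fixed day. The automatic run applies a lower "lockdown" β on any day that starts with I above a threshold, loosely mirroring the idea in [21], which used hospital admissions rather than I. Every parameter value here is hypothetical.

```python
import bisect


def stage_of(day, cutoffs):
    """Index of the manual stage containing `day` (cutoffs = sorted start days of stages 1, 2, ...)."""
    return bisect.bisect_right(cutoffs, day)


def simulate_multistage(stage_betas, cutoffs, alpha, gamma, initial, days,
                        threshold=None, lockdown_beta=None, dt=0.1):
    """SEIR whose beta changes between stages; returns daily I and the triggered days."""
    s, e, i, r = (float(v) for v in initial)
    n = s + e + i + r
    steps = round(1 / dt)
    infectious, triggered = [i], []
    for day in range(days):
        beta = stage_betas[stage_of(day, cutoffs)]       # manual cutoff points
        if threshold is not None and i > threshold:      # automatic cutoff point
            beta = lockdown_beta
            triggered.append(day)
        for _ in range(steps):
            infection, onset, removal = beta * s * i / n, alpha * e, gamma * i
            s, e, i, r = (s - infection * dt, e + (infection - onset) * dt,
                          i + (onset - removal) * dt, r + removal * dt)
        infectious.append(i)
    return infectious, triggered


# Hypothetical parameters: 5-day latent period, 7-day infectious period.
init = (9990, 0, 10, 0)
free, _ = simulate_multistage([0.6], [], 1 / 5, 1 / 7, init, 120)
manual, _ = simulate_multistage([0.6, 0.2], [20], 1 / 5, 1 / 7, init, 120)
auto, on_days = simulate_multistage([0.6], [], 1 / 5, 1 / 7, init, 120,
                                    threshold=200, lockdown_beta=0.1)
print(round(max(free)), round(max(manual)), round(max(auto)), len(on_days))
```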

3.1.2. Dynamic parameters

Dynamic parameters use external data to turn a parameter of the infectious disease model from a single number into an array or even a matrix. This is done by means of a priori information, operators, or complex submodels. Dynamic parameters are more a modeling foundation than an improvement. They are to COVID-19 models what the E compartment is to the SEIR model: a feature common to most models. Their ultimate purposes vary widely. The direct reason for using them is that external factors affect model parameters, and dynamic parameters let researchers study the relationship between those factors and the epidemic. Compared with multistage models, research based mainly on dynamic parameters goes beyond reviewing and analyzing past epidemics. It also runs simulation experiments with the models and draws scientific conclusions from them. For example, Serina Chang and others [24] divided the places where people contact each other and become infected into census block groups (CBGs) and points of interest (POIs). A CBG is a geographic unit with a population of 600-3,000. POIs are places people frequent, such as restaurants, grocery stores, and places of worship. The transmission rate $\beta_{base}$ of home infections is the same for all CBGs, whereas the transmission rate of each POI $p_j$ is

$$\beta_{p_j}(t) = \varphi \cdot d_{p_j}^2 \cdot \frac{V_{p_j}(t)}{a_{p_j}} \tag{3.1.1}$$

where $\varphi$ is a transmission constant shared by all POIs, $a_{p_j}$ is the physical area of $p_j$, and $V_{p_j}(t)$ is the number of visitors at time $t$. $d_{p_j}$ is the average fraction of an hour that a visitor spends at $p_j$. In this case, the transmission rate β is dynamic in the spatial dimension, and through $V_{p_j}(t)$ it also varies over time.
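Equation (3.1.1) is a direct operator calculation, sketched below. The POI names, dwell fractions, areas, hourly visitor counts, and the constant φ are all hypothetical. The sketch shows that β grows linearly with the number of visitors and quadratically with dwell time, and shrinks as the area grows.

```python
def poi_transmission_rate(phi, dwell_fraction, visitors, area):
    """beta_{p_j}(t) = phi * d_{p_j}^2 * V_{p_j}(t) / a_{p_j}, equation (3.1.1)."""
    return phi * dwell_fraction ** 2 * visitors / area


# Hypothetical POIs: (dwell fraction d, area a, hourly visitor counts V(t)).
pois = {
    "grocery store": (0.4, 20000.0, [30, 80, 120, 60]),
    "restaurant": (0.8, 3000.0, [10, 45, 70, 20]),
}
phi = 0.05  # hypothetical transmission constant shared by all POIs
for name, (d, a, visits) in pois.items():
    betas = [poi_transmission_rate(phi, d, v, a) for v in visits]
    print(name, [f"{b:.2e}" for b in betas])
```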

As described in the section on parameter extraction, researchers can also obtain dynamic parameters through a priori and operator calculation methods. In Chang's research [24], multiplication and division operators bring the visitor data into the transmission rate, which makes the parameter dynamic. Researchers also use models from mathematics and computer science to extract dynamic parameters, such as Bayesian regression models, RNN-class models, and GNN-class models. These are the complex submodel methods.

Researchers who introduce complex submodels focus more than traditional infectious disease modeling on how to design the submodels themselves. The goal is to let SEIR-like epidemic models capture the link between social systems and infectious disease systems. More broadly, this applies to any dynamic model, meaning a model of a dynamic system such as an epidemic; “dynamic” here has a different meaning from that in “dynamic parameters”. For example, Chang and others [25] discussed in detail the impact of several policies and their combinations on social distance (SD) and on the spread of the epidemic. The policies were no interventions (NI), case isolation (CI), home quarantine (HQ), and school closures (SC). The resulting SD values are then used to extract dynamic parameters in the SEIR model.

In computer science, machine learning models are particularly suitable for processing given data and discovering valuable information or conclusions. In research that uses complex submodels to obtain dynamic parameters, most researchers use machine learning models rather than complex but traditional mathematical models. With the introduction of machine learning methods, the compartment settings and transformations of the original SEIR model have also been modified in various ways.

For instance, Amray Schwabe and others [26] presented a paper at the KDD conference that uses machine learning models to process massive mobility data. They used the results to obtain dynamic parameters for infectious disease models. Their M2H (mobility to Hawkes process) model fits and applies a Hawkes process using external data such as case data and mobility data. They showed that it outperformed the SEIR model both in reproducing past case data and in epidemic prediction.
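For readers unfamiliar with Hawkes processes, here is a generic discrete-time self-exciting intensity with an exponential kernel. Past cases raise the expected number of new cases, and their influence fades with time. This is not the M2H model, whose exact formulation is not given here. In a mobility-driven variant, the `reproduction` term could itself be a function of mobility data, but that is an assumption. All names and values are hypothetical.

```python
import math


def exponential_kernel(lag, decay):
    """Discrete kernel w(lag) for lag >= 1, normalised so the weights sum to 1."""
    return (1 - math.exp(-decay)) * math.exp(-decay * (lag - 1))


def hawkes_intensity(counts, mu, reproduction, decay):
    """Expected cases on day t = len(counts), given daily counts for days 0..t-1.

    lambda_t = mu + reproduction * sum_{s < t} counts[s] * w(t - s)
    """
    t = len(counts)
    excitation = sum(c * exponential_kernel(t - s, decay) for s, c in enumerate(counts))
    return mu + reproduction * excitation


counts = [5, 8, 12, 20]  # hypothetical daily case counts
print(round(hawkes_intensity(counts, mu=0.5, reproduction=1.3, decay=0.5), 2))
```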

3.2. Modeling the real social systems

Compared with modeling dynamics, modeling real social systems covers a wider range of studies and can draw on more methods. Modeling dynamics mainly changes the parameters, one of the three elements of the compartment model. Modeling real social systems instead requires redesigning the compartments and the transformations between them. In terms of modeling methods, approaches to modeling real social systems fall into two categories: compartment subdivision and metapopulation models.

3.2.1. Subdivided models

Subdivided models further divide one or more SEIR compartments according to certain rules. Mathematically, compartment subdivision replaces a compartment $C$ with $C_1, C_2, \ldots, C_n$ and adds the matching transformations. From the perspective of infectious disease modeling, compartment subdivision can be horizontal or vertical. Horizontal subdivision addresses differences between individuals in the same compartment. Vertical subdivision addresses the mismatch between SEIR's compartments and the real data. “Horizontal” and “vertical” describe the direction of subdivision in the schematics.

The schematic of horizontal subdivision is shown in Figure 1.

Figure 1.


Schematic of horizontal subdivision.

In the system of differential equations of the compartment model, horizontal subdivision takes the form of Equations (3.2.1) to (3.2.10):

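Before subdivision: (3.2.1)
$$\frac{dX}{dt} = -v_{xy} \tag{3.2.2}$$
$$\frac{dY}{dt} = v_{xy} - v_{yz} \tag{3.2.3}$$
$$\frac{dZ}{dt} = v_{yz} \tag{3.2.4}$$
$$X + Y + Z = N \tag{3.2.5}$$
After subdivision: (3.2.6)
$$\frac{dX}{dt} = -\sum_{i=1}^{n} v_{xy_i} \tag{3.2.7}$$
$$\frac{dY_i}{dt} = v_{xy_i} - v_{y_i z} \tag{3.2.8}$$
$$\frac{dZ}{dt} = \sum_{i=1}^{n} v_{y_i z} \tag{3.2.9}$$
$$X + \sum_{i=1}^{n} Y_i + Z = N \tag{3.2.10}$$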

In horizontal subdivision, the subcompartments are “equal in status,” and there are no transformations between them. Horizontal subdivision therefore suits external factors that are unrelated to epidemiological status but can affect the development of infectious diseases. Examples include age [27], [28], [29], occupation [30], and work intensity [31]. For example, Alessio Andronico and others [27] highlighted the impact of age structure on hospitalizations. They compared histograms of the age distribution of hospitalized cases in metropolitan France and French Guiana. The two regions differed in their proportions of hospitalizations and deaths. From these differences, the authors argued that the age structure of the population must be considered, and their model successfully predicted the differences. Andronico and colleagues subdivided the compartments horizontally into 8 age groups, each covering 10 years.
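Here is a minimal Euler sketch of the horizontal form (3.2.7)–(3.2.9), with X as susceptible, $Y_i$ as infectious people in subgroup i (for example, an age group), and Z as removed. The choice of flows is an assumption for illustration. New infections are split across subgroups by fixed `shares` that sum to 1, and each subgroup has its own removal rate. The subgroups never exchange individuals, and X + ΣY_i + Z stays equal to N. The shares and rates are made up.

```python
def horizontal_step(x, ys, z, beta, shares, gammas, n, dt):
    """One forward-Euler step of (3.2.7)-(3.2.9)."""
    force = beta * x * sum(ys) / n                     # total X -> Y flow
    inflows = [share * force for share in shares]      # v_{x y_i}
    outflows = [g * y for g, y in zip(gammas, ys)]     # v_{y_i z}
    x_new = x - sum(inflows) * dt
    ys_new = [y + (v_in - v_out) * dt
              for y, v_in, v_out in zip(ys, inflows, outflows)]
    z_new = z + sum(outflows) * dt
    return x_new, ys_new, z_new


# Hypothetical three-group example.
shares, gammas = [0.3, 0.5, 0.2], [1 / 5, 1 / 7, 1 / 10]
x, ys, z = 9970.0, [10.0, 10.0, 10.0], 0.0
for _ in range(1000):  # 100 days with dt = 0.1
    x, ys, z = horizontal_step(x, ys, z, 0.3, shares, gammas, 10000.0, 0.1)
print(round(x), [round(y) for y in ys], round(z))
```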

The schematic of vertical subdivision is shown in Figure 2.

Figure 2.


Schematic of vertical subdivision.

In the system of differential equations of the compartment model, vertical subdivision takes the form of Equations (3.2.11) to (3.2.22):

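Before subdivision: (3.2.11)
$$\frac{dX}{dt} = -v_{xy} \tag{3.2.12}$$
$$\frac{dY}{dt} = v_{xy} - v_{yz} \tag{3.2.13}$$
$$\frac{dZ}{dt} = v_{yz} \tag{3.2.14}$$
$$X + Y + Z = N \tag{3.2.15}$$
After subdivision: (3.2.16)
$$\frac{dX}{dt} = -v_{xy_1} \tag{3.2.17}$$
$$\frac{dY_1}{dt} = v_{xy_1} - v_{y_1 y_2} \tag{3.2.18}$$
$$\frac{dY_i}{dt} = v_{y_{i-1} y_i} - v_{y_i y_{i+1}}, \quad \text{where } 1 < i < n \tag{3.2.19}$$
$$\frac{dY_n}{dt} = v_{y_{n-1} y_n} - v_{y_n z} \tag{3.2.20}$$
$$\frac{dZ}{dt} = v_{y_n z} \tag{3.2.21}$$
$$X + \sum_{i=1}^{n} Y_i + Z = N \tag{3.2.22}$$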

The subcompartments produced by vertical subdivision transform into one another in sequence. Vertical subdivision therefore suits more detailed epidemiological states, such as presymptomatic, confirmed, isolated, and hospitalized [21]. In fact, extending the SI model to the SEIR model can itself be regarded as a vertical subdivision, in which the I compartment is subdivided into the E, I, and R subcompartments. For example, Laura Di Domenico and others [32] discussed the consequences of relaxing school control measures in France. They proposed an SEIR-based compartment model that vertically subdivided the later states of severely infected patients by treatment and medical condition, including hospitalization and ICU treatment. This let them use hospital and ICU admission data to refine the model, rather than placing all such patients in the I compartment as infected people.
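The chain form (3.2.17)–(3.2.21) can be sketched generically: each $Y_i$ empties into $Y_{i+1}$, and the last one empties into Z. The sketch uses a hypothetical SEPIR-like chain, Y = [E, P (presymptomatic), I], in which only P and I transmit. The choice of infection flow $v_{xy_1}$, the exit rates, and β are illustrative assumptions.

```python
def vertical_step(x, ys, z, beta, infectious, rates, n, dt):
    """One forward-Euler step of (3.2.17)-(3.2.21) for X -> Y_1 -> ... -> Y_n -> Z.

    infectious[i] marks which subcompartments transmit; rates[i] is the exit rate
    of Y_i, so v_{y_i y_{i+1}} = rates[i] * Y_i and v_{y_n z} = rates[-1] * Y_n.
    """
    contagious = sum(y for y, flag in zip(ys, infectious) if flag)
    entry = beta * x * contagious / n              # v_{x y_1}
    exits = [r * y for r, y in zip(rates, ys)]     # outflow of each Y_i
    inflows = [entry] + exits[:-1]                 # Y_i receives Y_{i-1}'s outflow
    ys_new = [y + (v_in - v_out) * dt for y, v_in, v_out in zip(ys, inflows, exits)]
    return x - entry * dt, ys_new, z + exits[-1] * dt


x, ys, z = 9990.0, [0.0, 10.0, 0.0], 0.0           # Y = [E, P, I], hypothetical
for _ in range(1000):                              # 100 days with dt = 0.1
    x, ys, z = vertical_step(x, ys, z, beta=0.4, infectious=[False, True, True],
                             rates=[1 / 4, 1 / 2, 1 / 5], n=10000.0, dt=0.1)
print(round(x), [round(y) for y in ys], round(z))
```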

The differences between horizontal and vertical subdivision are summarized in Table 5.

Table 5.

The summary of horizontal and vertical subdivision

Aspects | Horizontal subdivision | Vertical subdivision
Purpose | Addresses differences between individuals | Matches the real data
Relationship between subcompartments | None | Can transform in sequence
Subdivision criterion | Not related to epidemiological status | Related to epidemiological status

Horizontal and vertical subdivision are not mutually exclusive, and researchers often combine the two to suit their research purposes. For example, Pinto Neto O and others [7] proposed a SUEIHCDR model that uses elaborate compartment subdivision. They add 4 compartments to SEIR: U (unsusceptible), H (hospitalized), C (critical), and D (death). Starting from the SI model, they did the following:

  • Subdividing S horizontally into S and U, modeling people who are less susceptible (or not susceptible at all).

  • Subdividing I vertically into E-I-R, as the SEIR model does. The R here refers specifically to recovered rather than removed.

  • Subdividing R horizontally into R and H, modeling the population requiring hospitalization. The transformation from I to R thus changes meaning from recovery in general to natural recovery.

  • Subdividing H vertically into H, C, and D, modeling hospitalized patients whose condition progressively deteriorates until death.

  • Adding the H→R and C→H transformations to model the recovery of hospitalized or critically ill patients.

3.2.2. Metapopulation models

Subdivided models addressed the limitations of individual differences and data mismatch. Researchers then went on to use metapopulation models to model the process of “transmission” more accurately [33], [34]. In the SEIR model, “transmission” occurs uniformly between the susceptible and infected compartments. In human society, however, it proceeds along social networks. The metapopulation model treats each node of the social network as a population within which transmission is uniform, as in the SEIR model. Between populations, individuals flow along every edge of the network. Researchers first obtain data on these flows and on policies (e.g., not allowing infected patients to move). They then build a metapopulation model of the infectious disease on this network.

A general metapopulation model differential equation system is as follows:

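$$\frac{dS}{dt} = -\frac{\mathrm{multiply}(S, \beta \cdot I)}{N} \tag{3.2.23}$$
$$S = (S_1, S_2, \ldots, S_n)^T \tag{3.2.24}$$
$$\beta = (\beta_{ij})_{n \times n} \tag{3.2.25}$$
$$I = (I_1, I_2, \ldots, I_n)^T \tag{3.2.26}$$
where (3.2.27)
$$x = (x_1, x_2, \ldots, x_n)^T \tag{3.2.28}$$
$$y = (y_1, y_2, \ldots, y_n)^T \tag{3.2.29}$$
$$\mathrm{multiply}(x, y) = (x_1 \cdot y_1, x_2 \cdot y_2, \ldots, x_n \cdot y_n)^T \tag{3.2.30}$$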

Although β is used as the propagation matrix, the metapopulation model does not treat it as an arbitrary (and sometimes random) full matrix; it focuses on specific elements or vectors. Let $\beta_{ii}$ denote the propagation coefficient within the $i$th population. Then the row $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{in})$ can be rewritten as $\beta_i = \beta_{ii} \cdot (c_{i1}, c_{i2}, \ldots, 1.0, \ldots, c_{in})$, where:

$$c_{ij} = 1.0, \quad \text{where } i = j \tag{3.2.31}$$
$$c_{ij} = \mathrm{mobility}(i, j), \quad \text{where } i \neq j \tag{3.2.32}$$
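The construction (3.2.31)–(3.2.32) and the element-wise derivative (3.2.23) are sketched below with nested lists. Row i of the matrix collects the infection pressure on population i from every population j. One assumption needs flagging: the sketch divides by each population's own size, a common choice. To use a single N exactly as (3.2.23) is written, pass the same value for every entry. The two-population mobility values are hypothetical.

```python
def metapopulation_beta(beta_within, mobility):
    """beta_ij = beta_ii * c_ij, with c_ii = 1.0 and c_ij = mobility(i, j) for i != j.

    beta_within[i] is beta_ii; mobility[i][j] plays the role of mobility(i, j)
    (its diagonal is ignored).
    """
    size = len(beta_within)
    return [[beta_within[i] * (1.0 if i == j else mobility[i][j]) for j in range(size)]
            for i in range(size)]


def susceptible_derivative(s, infected, beta, population):
    """Element-wise dS/dt = -multiply(S, beta . I) / N, equation (3.2.23)."""
    beta_dot_i = [sum(b_ij * i_j for b_ij, i_j in zip(row, infected)) for row in beta]
    return [-s_k * f_k / n_k for s_k, f_k, n_k in zip(s, beta_dot_i, population)]


# Two hypothetical populations linked by asymmetric mobility.
beta = metapopulation_beta([0.5, 0.3], [[0.0, 0.1], [0.2, 0.0]])
print(susceptible_derivative([900.0, 500.0], [100.0, 0.0], beta, [1000.0, 500.0]))
```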

The schematic of the metapopulation model is shown in Figure 3.

Figure 3.


Schematic of metapopulation model.

The metapopulation model gives researchers an entry point for introducing spatiotemporal big data. Epidemic simulations based on the propagation matrix and propagation map can achieve more refined quantitative results than the traditional compartment model. For example, a PNAS paper by Ruiyun Li and other researchers [14] simulated the epidemic after closing several high-traffic populations (social network nodes). They applied NPIs first to the high-traffic populations for 8 weeks and then to the other populations for another 8 weeks, which reduced the epidemic size by 88%. This was more effective than 12 weeks of NPIs on all populations, and it cost less.

The use of metapopulation models must be accompanied by dynamic parameters and is usually accompanied by compartment subdivision; that is, each population is a subdivided model instead of an SEIR model. In research on COVID-19, the model is usually a metapopulation model whose populations are subdivided models and whose parameters are dynamic [29], [35], [36], [37]. Three design choices depend primarily on the purpose of the study and the data the researcher has: how the populations are designed, how compartments are subdivided, and how dynamic parameters are extracted. A good model makes the best use of the data and extracts as much knowledge as possible. It helps researchers model abstract influencing factors and ultimately fulfills the research purpose.

3.3. Beyond compartment models—agent-based models

With the development of computer simulation technology and artificial intelligence technology, researchers have been able to introduce the agent-based model into the field of infectious disease modeling. The modeling object of the traditional model is the compartment, that is, the group. In order to reflect the differences within the group, the researchers subdivided the group. In this way, it is possible to study the factors that lead to such internal differences and their impact on the spread and development of the epidemic. But in subdivided models, even in a metapopulation model, the modeling object is still the population, the group. The individuals inside each compartment are exactly the same and only reflect the characteristics of the group.

In fact, when the amount of data used for model fitting stays the same, adding compartments without limit will only degrade the model's prediction and simulation performance. The agent-based model instead takes the individual as the modeling object and defines each individual's state and behavior. This lets it model the spread of infectious diseases, medical interventions or NPIs, and individuals' subjective behavior. Agent-based models usually simulate dozens up to tens of thousands of individuals at the same time. The research purpose is accomplished through statistics on individual states and behaviors [38], [39], [40].

Where agent-based models are applied, individual states are usually numerous and fine-grained, which makes them hard to model with conventional subdivided models. In an infectious disease model, individual behaviors usually include intrinsic behaviors (onset, death, cure, etc.) and extrinsic behaviors (staying, moving, migrating, etc.). In each time unit, every individual is checked. Transmission and other transitions are then executed with a certain probability for the individuals who meet the conditions. For example, Kim Sneppen and others published a paper in PNAS [41] that used an agent-based model to represent superspreaders. They deployed superspreader agents in specific settings such as schools and workplaces to explore how to formulate epidemic prevention policies when superspreaders exist. Work by Jesús A. Moreno López and others in Science [42] introduced electronic device tracking data to explore interventions in the COVID-19 epidemic. In particular, it examined how age and heterogeneity in the model's parameter settings interact.
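The per-time-step logic described above can be sketched as a small discrete-time agent-based simulation. Each day, every agent is checked and transitions fire with a probability. Infectious agents meet random contacts (an extrinsic behavior), and onset and removal are intrinsic behaviors. A small share of agents are hypothetical "superspreaders" with higher infectiousness, a kind of individual heterogeneity that compartments cannot represent. Every rate, count, and name is an assumption. Platforms such as those cited below are far more detailed.

```python
import random
from collections import Counter


def run_abm(n_agents=1000, n_seed=5, days=100, contacts_per_day=8,
            p_transmit=0.05, p_onset=0.2, p_removal=0.1,
            superspreader_share=0.05, superspreader_factor=10.0, seed=1):
    """Discrete-time S/E/I/R agent-based simulation; returns daily state counts."""
    rng = random.Random(seed)
    state = ["S"] * n_agents
    for k in rng.sample(range(n_agents), n_seed):
        state[k] = "I"
    # Individual heterogeneity: a small share of agents are far more infectious.
    infectiousness = [superspreader_factor if rng.random() < superspreader_share else 1.0
                      for _ in range(n_agents)]
    history = []
    for _ in range(days):
        new_state = list(state)  # synchronous update: read `state`, write `new_state`
        for k in range(n_agents):
            if state[k] == "I":
                # Extrinsic behaviour: meet a few random agents today.
                p = min(1.0, p_transmit * infectiousness[k])
                for other in rng.sample(range(n_agents), contacts_per_day):
                    if new_state[other] == "S" and rng.random() < p:
                        new_state[other] = "E"
                # Intrinsic behaviour: removal (recovery or death).
                if rng.random() < p_removal:
                    new_state[k] = "R"
            elif state[k] == "E" and rng.random() < p_onset:
                new_state[k] = "I"  # intrinsic behaviour: onset
        state = new_state
        history.append(Counter(state))
    return history


history = run_abm()
print(dict(history[-1]))
```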

A consortium of Russian research centers has developed an agent-based model of the spread of COVID-19. Since an epidemic develops as a kind of chain reaction, the authors treated the simulation process as an analog of the method used to solve the neutron transport equation for a heterogeneous medium in the multigroup approximation [43]. The model has shown good predictive power for large metropolitan areas (Wuhan, New York, and Moscow) and is currently being adapted to simulate other viral respiratory infections and for countrywide use.

Building on this research, researchers moved agent-based models from theory to application, producing a series of open-source agent-based simulation platforms, notably covasim [44] and openABM-covid19 [45]. These platforms cover multiple scenarios. Besides model design and simulation runs, they parameterize engineering aspects such as scenario characteristics, data analysis methods, and data visualization methods, and each provides complete open-source code, a home page, published papers, and a documentation website with a user manual.

3.4. AI techniques in improving COVID-19 modeling

Researchers also use AI techniques to overcome the static and overly idealized scenarios assumed by traditional models. Combining a compartment model with AI yields a hybrid physics-ML model, which can be built in the following ways [46]:

  • Residual modeling,

  • Output of physical model as input to ML model,

  • Replacing part of a physical model with ML,

  • Combining predictions from both physical model and ML model,

  • ML informing or augmenting physics model for inverse modeling.

When AI techniques are combined with a compartment model to study infectious diseases, these approaches can be grouped by how the model produces its output and how it applies physical laws. In the first three, AI plays an auxiliary role, and the compartment model still handles the model's input and output. In the latter two, AI plays the dominant role, and input and output are handled under the guidance of physical rules (the principles of the compartment model). In addition, some models are built purely from AI methods and have nothing to do with compartment models. We call these three types, in turn, assisted, coexistence, and pure AI:

  • Assisted: The compartment model carries out all processes, such as modeling, parameter calculation, and simulation experiments. AI methods are used only to process part of the external data so that compartment values or parameters can change dynamically.

  • Coexistence: Design, modeling, and simulation experiments are carried out by the AI model, but it uses the principles of the compartment model, such as how the population is divided into compartments or the physical laws described by the dynamic equations.

  • Pure AI: Studies that belong to the field of infectious disease modeling but have nothing to do with compartment models.

In this review, we briefly discuss and analyze the first two categories.

3.4.1. AI-assisted compartment model

Dynamic parameters are the primary way in which AI assists compartment models. Whereas other dynamic parameter models use numerical values or operators, here a complete, complex, and independent AI model produces the dynamic parameters, which are then fed into the compartment model. Salah Ghamizi and others [47] published a paper at KDD’20 on DN-SEIR, a data-driven approach to evaluating the effective reproduction number of the COVID-19 epidemic. They built an AI model (a DNN) to predict the reproduction number (Rt) and then used Rt to drive the SEIR model.
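The assisted pattern can be sketched as an SEIR integrator that takes the reproduction number from an external callable. In this sketch the DNN of [47] is replaced by a hypothetical hand-written function `rt_predictor`, and the conversion β(t) = Rt·γ, the parameter values, and the forward-Euler scheme are assumptions made for illustration, not details of DN-SEIR.

```python
import math


def rt_predictor(day):
    """Hypothetical stand-in for a trained model: any callable day -> Rt.
    Here Rt decays from 2.5 toward 0.8, as if interventions took hold."""
    return 0.8 + 1.7 * math.exp(-day / 30.0)


def seir_with_dynamic_rt(rt_fn, population=1_000_000, e0=100, i0=50,
                         sigma=1 / 5.2, gamma=1 / 7, days=180, dt=0.1):
    """Forward-Euler SEIR in which beta(t) = Rt * gamma is supplied externally.
    Returns one (day, S, E, I, R) tuple per day."""
    s, e, i, r = population - e0 - i0, float(e0), float(i0), 0.0
    out = []
    steps_per_day = round(1 / dt)
    for day in range(days):
        beta = rt_fn(day) * gamma  # dynamic parameter from the external model
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        out.append((day, s, e, i, r))
    return out
```

Because the compartment model only consumes `rt_fn`, the AI component can be swapped without touching the dynamics, which is what keeps AI in the auxiliary position.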

3.4.2. Coexistence of AI and compartment model

Introducing kinetic equations into AI models falls under a broader research area: AI with physical knowledge. In infectious disease modeling, such models incorporate SEIR, or other differential-equation variants, into an AI model in the form of the loss function. Lijing Wang and others [48] proposed a causal-based graph neural network (CausalGNN) that learns spatiotemporal embeddings in a latent space, where graph input features and epidemiological context are combined via a mutual learning mechanism. In their model, the graph of disease dynamics is encoded along the time dimension through feature encoding, spatiotemporal encoding, and finally causal encoding layers. At the top layer, the encoding results are fed into the SIRD compartment model, and SIRD’s results are used as the input of the next time unit.
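The "equations in the loss function" idea can be sketched generically, without reproducing CausalGNN's architecture. Assume some network has produced a predicted SIRD trajectory (here just a list of tuples); the loss penalizes both its misfit to observations and its violation of the SIRD equations, using forward differences. The function names, the weight `lam`, and the finite-difference scheme are illustrative assumptions.

```python
def sird_rhs(s, i, r, d, beta, gamma, mu, n):
    """Right-hand side of the SIRD equations:
    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - (gamma+mu)*I,
    dR/dt = gamma*I, dD/dt = mu*I."""
    infection = beta * s * i / n
    return (-infection, infection - (gamma + mu) * i, gamma * i, mu * i)


def physics_informed_loss(pred, obs, params, n, dt=1.0, lam=1.0):
    """pred, obs: lists of (S, I, R, D) tuples, one per time step.
    Returns data misfit + lam * mean squared SIRD residual."""
    beta, gamma, mu = params
    data_loss = sum((p - o) ** 2
                    for p_t, o_t in zip(pred, obs)
                    for p, o in zip(p_t, o_t)) / len(obs)
    physics_loss = 0.0
    for t in range(len(pred) - 1):
        deriv = sird_rhs(*pred[t], beta, gamma, mu, n)
        for k in range(4):
            finite_diff = (pred[t + 1][k] - pred[t][k]) / dt
            physics_loss += (finite_diff - deriv[k]) ** 2
    physics_loss /= max(len(pred) - 1, 1)
    return data_loss + lam * physics_loss
```

In a trainable model, `pred` would be the network output and this scalar would be minimized by gradient descent; a trajectory that obeys the equations and matches the data scores zero.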

4. Data, factors, and modeling methods of research interests

As noted at the end of the section “Meta-population Model,” a good model makes the best use of data, extracts as much knowledge as possible, helps researchers model abstract influencing factors, and finally accomplishes the research purpose. Indeed, since the outbreak of COVID-19, research on epidemic modeling has been inseparable from data, factors, and models. Current research is not limited to the infectious disease itself; it uses external data and models to model one or more types of influencing factors, runs simulation experiments, and draws conclusions. Data, factors, and models are referred to as DFM hereinafter. Of the three, the model is the component that exploits the data, sits at the core of modeling the factors, serves as the tool for the research purpose, and is the most complex part. Therefore, the classification, principles, and application cases of models have been introduced in detail in Section 2. What follows is a brief introduction to data and factors.

4.1. Data

The data introduced into infectious disease models can be divided by content and source into case data and noncase data, and by format into point feature data, array data, and geospatial data.

Mathematically, case data include the exposure, infection, and removal data associated with the compartments; in practice they fall into three categories: confirmed, recovered, and dead. The Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) is dedicated to building geographic information systems (GIS) and collecting data, and after the outbreak of COVID-19 it established a database of cases for countries around the world and for states in the United States [49]. The database is resolved down to the country level, is updated every day, and is extremely widely used; related papers have been cited more than 6,900 times.

In addition, to build vertically subdivided models and dynamic parameters, researchers introduced a type of “detailed case data.” Beyond the basic confirmed, recovered, and death counts, it also includes critical illness, ICU admissions, hospitalizations, asymptomatic infections, close contacts, nucleic acid tests, vaccinations, etc. Case data relate directly to the compartments themselves and come mainly from reports and epidemiological investigations by frontline staff. This type of data is mainly used as the ground truth in the fitting process and as a reference against which some results of the simulation experiments are compared.

For example, Joshua S. Weitz [50] introduced a detailed US case dataset for his social-system self-feedback model. That dataset includes 10 types of data: positive, negative, hospitalization, ICU, ventilator, recovery, data quality rating (confidence level), death, data recording time, and floating space; each type has 3 statistical dimensions: daily new, current, and cumulative.

By definition, noncase data include any data that are not case data but are used by researchers to model infectious diseases. The discussion of noncase data therefore focuses on data format: each format represents a class of information, has similar processing operators, and is applied in the model in the same way.

Point feature data are feature data in key-value format. Each object, such as a country, a POI, or even a climate zone, has an ID and several attribute dimensions. Such data are processed into feature vectors by traditional statistical and data mining methods and then used to derive dynamic parameters from the features or to build metapopulation models. For example, the mobility data between census block groups (CBGs) and POIs can be described as a point feature dataset in which the objects are CBGs or POIs and the attributes are traffic flows, population, etc. [24].

Array data (or sequence data) are data expanded along the time dimension. Each moment contains several attributes, which can be embedded into a feature vector. Sequence data are very common in case data; for example, daily confirmed counts form a sequence. In noncase data, sequence data are mainly used to extract dynamic parameters. Methods for processing sequence data include regression, fitting, and RNN-class deep learning models that extract sequence information.
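As one concrete instance of "regression on sequence data to extract a dynamic parameter", the sketch below fits a log-linear regression to a window of daily case counts to estimate the exponential growth rate r, then converts it to a reproduction number with the SIR approximation R = 1 + r·D (valid for an exponentially distributed infectious period with mean D). This particular method, the window length, and the function names are assumptions chosen for illustration, not a technique attributed to the reviewed papers.

```python
import math


def growth_rate(daily_cases):
    """Least-squares slope of log(cases) against day: exponential growth rate r per day.
    Days with zero cases are skipped; at least two positive days are required."""
    pts = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    var = sum((t - mean_t) ** 2 for t, _ in pts)
    return cov / var


def reproduction_number_sir(r, infectious_days):
    """SIR approximation: R = 1 + r * D."""
    return 1 + r * infectious_days


def sliding_rt(daily_cases, window=7, infectious_days=7):
    """Turn a case sequence into a dynamic-parameter series, one value per window."""
    return [reproduction_number_sir(growth_rate(daily_cases[k:k + window]), infectious_days)
            for k in range(len(daily_cases) - window + 1)]
```

For a series growing at 10% per day with D = 7 days, every window yields R ≈ 1.7; real case data would first need smoothing for reporting artifacts such as weekday effects.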

Geospatial data are based on GIS and contain descriptions of geographic information. Besides geographic information, each object also includes several attributes, which can also be embedded into a feature vector. For instance, Carleton and others [15] used geospatial data on UV radiation (the ERA-5 dataset) to study the influence of UV radiation on the epidemic.

Case and noncase data are summarized in Table 6 and Table 7.

Table 6.

The difference between case and noncase data

Aspects | Case data | Noncase data
Source | Medical and epidemiological staff | Everywhere
Content | The number of individuals corresponding to each compartment | Everything
Usage | As the true value | Extracting parameters and building metapopulation models

Table 7.

The difference between 3 types of noncase data

Aspects | Point feature data | Array data | Geospatial data
Format | Key-value | Array (sequence) | Mostly shapefile or Stata data
How to use | Embedding and bringing into formula | Regression, fitting, and RNN-class deep learning models | Preprocessing and bringing into formula

4.2. Factors

A factor is an abstract object that can be described in natural language and serves as a bridge between research interests and data. To achieve the research purpose, researchers identify several factors, find the corresponding data, and use a model to complete the study. For example, in the authoritative paper Reconstruction of the full transmission dynamics of COVID-19 in Wuhan [20], which reviewed the Wuhan epidemic, the researchers proposed three main factors that must be considered in the development of the COVID-19 epidemic in Wuhan: “pre-symptomatic infected individuals,” “NPIs,” and “human mobility.” They then used detailed case data and several noncase datasets to design a model with compartment subdivision, a multistage structure, and dynamic parameters, and reconstructed the Wuhan epidemic. On this basis, they used the model results to calculate several indicators and drew valuable conclusions, such as COVID-19’s epidemiological characteristics compared with SARS and MERS and the effectiveness of NPIs in Wuhan.

4.3. The relationship between DFM and research interests

In the field of epidemic modeling, once a research question has been posed, the research is divided into two parts: model building and simulation experiments. The research process can be summarized in the following steps:

  • Research interest

  • Factors

  • Origin Data

  • Models

  • Results of the task

The steps from “Factors” to “Origin Data” and from “Origin Data” to “Models” have been discussed in Section 2, and the research methods involved are described in Section 3. The following therefore discusses the relationship between factors, data, and models (DFM) and the research interest from the perspective of research content.

4.4. Factors and their DFM

Each type of factor corresponds to certain types of data and is associated with several models. Based on a review of about 100 papers from Science, Nature, PNAS, The Lancet, SIGKDD, TKDE, and AAAI, the 6 main categories of research factors are summarized below.

4.4.1. Human mobility

Human mobility refers to people moving from one place to another.

The SEIR model assumes that individuals are uniformly distributed in an idealized space and move completely randomly at a uniform speed. After the outbreak of COVID-19, the error introduced by such unrealistic assumptions could no longer be ignored. Human mobility not only directly affects the epidemic's ability to spread but also serves as the channel through which most NPIs, such as isolation, curfews, and school closures, indirectly affect transmission. It is therefore necessary to model human mobility.

Two types of data are used to model mobility: traffic data with origins and destinations, and comprehensive human mobility indices. The former are geospatial noncase data used with metapopulation models; the latter are array-type noncase data used with dynamic parameters.
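How origin-destination traffic data enter a metapopulation model can be sketched as a two-stage daily update: well-mixed SIR transmission inside each patch (city), then redistribution of every compartment along the flows. The patch structure, the proportional-movement rule, and the parameters are assumptions for illustration; the models in [8] and [24] are considerably more detailed.

```python
def metapop_sir_step(S, I, R, flows, beta, gamma):
    """One daily step of a metapopulation SIR model.

    S, I, R: per-patch counts; flows[a][b]: travellers per day from patch a to b
    (origin-destination data). Transmission is well mixed within each patch.
    Returns the updated [S, I, R] lists.
    """
    n = len(S)
    S, I, R = list(S), list(I), list(R)
    # 1) Local SIR transmission inside each patch (one forward-Euler day).
    for a in range(n):
        pop = S[a] + I[a] + R[a]
        new_inf = beta * S[a] * I[a] / pop
        new_rec = gamma * I[a]
        S[a] -= new_inf
        I[a] += new_inf - new_rec
        R[a] += new_rec
    # 2) Mobility: each compartment travels in proportion to its share of the patch.
    pops = [S[a] + I[a] + R[a] for a in range(n)]
    moved = []
    for X in (S, I, R):
        Y = list(X)
        for a in range(n):
            for b in range(n):
                if a != b and flows[a][b]:
                    m = X[a] * flows[a][b] / pops[a]
                    Y[a] -= m
                    Y[b] += m
        moved.append(Y)
    return moved
```

Scaling every entry of `flows` by a factor below 1 is one simple way to mimic the traffic-restriction scenarios that such models are used to simulate.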

Shengjie Lai’s team from Fudan University and others [8] regarded human mobility as a manifestation of NPIs and explored the effect of nonpharmaceutical interventions in containing COVID-19 in China by modeling human mobility. Using mobile phone signaling data and travel data, such as high-speed rail and airline ticket sales, they calculated population flows between cities in China. Based on these flows, they modeled human mobility, which better reflects epidemic transmission under China’s NPIs than the homogeneous SEIR model does. James D. Munday’s team from the London School of Hygiene and Tropical Medicine (LSHTM) [51] used similar data and methods. However, to study COVID-19 transmission under school reopening strategies in England, they built a metapopulation model on a school–household network rather than an ordinary social network. Using non-public data from the UK Department for Education (DfE), they constructed a network of schools linked through households: each edge is weighted by the number of unique contacts between two schools that occur through shared households. For example, if 2 children in a household attend school i and 2 attend school j, this corresponds to 4 unique contacts between school i and school j.
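The edge-weight rule in the school example (2 × 2 = 4 unique contacts) generalizes to summing n_i × n_j over households for every pair of schools. A minimal sketch follows; the household representation and the function name are hypothetical, since the DfE data used in [51] are not public.

```python
from collections import defaultdict
from itertools import combinations


def school_network(households):
    """households: list of dicts {school_id: number_of_children_attending}.

    Edge weight between schools i and j = sum over households of n_i * n_j,
    i.e. the unique child pairs linking the two schools through a shared household.
    Keys are (school_a, school_b) with school_a < school_b.
    """
    weights = defaultdict(int)
    for household in households:
        for (si, ni), (sj, nj) in combinations(sorted(household.items()), 2):
            weights[(si, sj)] += ni * nj
    return dict(weights)
```

A household with 2 children at school i and 2 at school j contributes 4 to edge (i, j), matching the worked example above; households whose children all attend one school add no edges.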

Among comprehensive human mobility indices, Google Mobility [52] is a dataset that is released free of charge and open to use until the end of the COVID-19 pandemic. It divides human activities into 6 categories by place: grocery & pharmacy, parks, transit stations, retail & recreation, residential, and workplaces, and for each category records how activity intensity on a given day differs from a baseline value. Pierre Nouvellet and his team [53] used comprehensive mobility indices such as Google Mobility to model the effective reproduction number at the time of infection (Rt,i) dynamically, and found that the initial reduction in mobility was associated with a significant decrease in transmission in 73% of the settings they analyzed.

In summary, the human mobility factor is represented by traffic flow data and human mobility indices, which are modeled with metapopulation models and dynamic parameters, respectively.
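A mobility index becomes a dynamic parameter once it is linked to Rt. The sketch below averages the non-residential category changes for one day and applies a log-linear link Rt = R0 · exp(k · index / 100). The simplified field names (the published CSV uses longer column names), the choice of categories, the link function, and the coefficient `k` are all hypothetical and are not the model of [53].

```python
import math

# Hypothetical simplified keys for the five non-residential categories.
NON_RESIDENTIAL = ("grocery_pharmacy", "parks", "transit_stations",
                   "retail_recreation", "workplaces")


def mobility_index(record, categories=NON_RESIDENTIAL):
    """Mean percent change from baseline across the chosen categories for one day."""
    return sum(record[c] for c in categories) / len(categories)


def rt_from_mobility(index, r0, k):
    """Hypothetical log-linear link: Rt = R0 * exp(k * index / 100).
    A 0% change returns R0; negative changes (less mobility) lower Rt."""
    return r0 * math.exp(k * index / 100.0)
```

In practice `r0` and `k` would be estimated by fitting the resulting Rt series to case data rather than set by hand.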

4.4.2. NPI

Nonpharmaceutical interventions (NPIs) are measures that people, mainly governments and authorities, proactively introduce to intervene in an epidemic.

Of the six types of factors, NPIs have the most extensive data sources and the most available modeling methods. Just as dynamic parameters hold a fundamental position in COVID-19 models, research on COVID-19 must consider NPIs directly or indirectly. Studies that consider NPIs indirectly model them through other factors, especially human mobility or human response; studies that consider NPIs directly use multistage models [20].

In Section 4.4, factors are associated with research interests, and simulation and regression experiments are reviewed in detail. When dealing with NPIs, many researchers simulate their effects through the settings of the simulation experiments and model and analyze NPIs that way. For example, after using a metapopulation model to capture other factors such as human mobility, researchers simulate a lockdown policy by manually reducing the inbound and outbound traffic of a given population [24].

4.4.3. Ages

Age is a special and highly influential “individual difference within the compartment,” and the one most often considered. In the basic compartment model, individuals in the same compartment are identical, and many modeling approaches address this inappropriate assumption. Among them, age receives the most attention for two reasons: compared with other factors such as income and social background, age data carry less private information and are easy to collect and use; and COVID-19 is highly age-sensitive, with large differences in severity, mortality, and clinical manifestations between age groups.

The data used to represent age are point feature data containing the proportion of each age group in a specific population. Depending on the purpose of the study, these can be the age distribution of the entire population or of a specific diseased or susceptible population. There are two main methods for modeling age: unified processing through a weighted-sum operator, and horizontal compartment subdivision by age.
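The weighted-sum operator collapses age-specific rates into one population-level value, Σₐ pₐ · rateₐ, where pₐ is the share of age group a. A minimal sketch, with hypothetical age bands and illustrative (not empirical) rates:

```python
def weighted_rate(age_shares, age_rates):
    """Population-level rate = sum over age groups of share * age-specific rate."""
    total = sum(age_shares.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("age shares must sum to 1")
    return sum(age_shares[a] * age_rates[a] for a in age_shares)


# Illustrative values only: the same age-specific rates applied to two populations.
rates = {"0-59": 0.001, "60+": 0.01}
older_population = weighted_rate({"0-59": 0.5, "60+": 0.5}, rates)
younger_population = weighted_rate({"0-59": 0.8, "60+": 0.2}, rates)
```

The older population ends up with the higher overall rate (0.0055 versus 0.0028), which is how the operator transfers age structure into a single compartment-model parameter.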

For example, Davies and others [11] published a paper in Science that systematically analyzed the B.1.1.7 variant outbreak in England along multiple dimensions, such as age, region, and medical characteristics. They used differences in S gene target failure (SGTF) in PCR tests across age groups and built a model on this basis. Zhang Juanjuan, Yu Hongjie, and others [54] studied the dynamics of the COVID-19 outbreak in China. Starting from contact patterns, they built a detailed age-specific contact matrix and then built a model on it to complete the study.
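With horizontal subdivision by age, a contact matrix determines R0 through the next-generation matrix: with `contacts[a][b]` the mean daily contacts of one person in group a with people in group b, transmission probability per contact p, and infectious period D, one infective in group b causes p·D·`contacts[b][a]` infections in group a, and R0 is the dominant eigenvalue. This is a textbook sketch assuming a fully susceptible population and age-independent susceptibility, not the model of [54].

```python
def dominant_eigenvalue(matrix, iters=500):
    """Power iteration with max-norm normalization for a non-negative square matrix."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[c] * v[c] for c in range(len(v))) for row in matrix]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam


def r0_from_contacts(contacts, p_transmit, infectious_days):
    """R0 as the dominant eigenvalue of K[a][b] = p * D * contacts[b][a]."""
    n = len(contacts)
    k = [[p_transmit * infectious_days * contacts[b][a] for b in range(n)]
         for a in range(n)]
    return dominant_eigenvalue(k)
```

Because R0 depends on the dominant eigenvalue rather than on average contacts, a change in contact patterns within one age group (for example, school closures) can shift R0 more or less than a naive average suggests.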

4.4.4. Medical resources

Medical resources include beds, ICU beds, nurses, doctors, ventilators, vaccines, etc. Research on medical resources builds on epidemic prediction to explore whether medical resources are sufficient and how to allocate them.

These studies are relatively self-contained: on top of the compartment model used for epidemic modeling, they add a submodel dedicated to forecasting medical resource needs. The data used to forecast medical resources fall into two categories: clinical statistics (point feature data) and refined case data (array data). Clinical statistics include the severe disease rate, mortality rate, ICU utilization rate, average length of ICU treatment, etc., which may be overall values or broken down by age group and region. On top of the traditional diagnosis, recovery, and death counts, refined case data add information such as each case's admission time, onset time, and ICU admission time.

The Institute for Health Metrics and Evaluation (IHME) provides a representative study in this area [55]. They first constructed a compartment model to predict the epidemic and then used age data to construct series of deaths for different age groups. Using the admission-to-death interval of patients who died in each age group, they calculated when those patients had been admitted, then aligned and summed these series on the time axis to obtain the admission series for all patients who died. From this, they inferred the total number of hospitalized patients using the in-hospital death rate, and then predicted medical resource demand from the length of stay of each category (normal, severe) and the average consumption of each resource.
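The back-calculation steps above can be sketched as three small functions: shift each age group's death series back by its admission-to-death delay and sum, scale by the in-hospital death rate, and convert admissions to bed occupancy with a length of stay. This simplified sketch uses a single fixed delay per age group and a single bed category; the function names and all numbers are hypothetical and not IHME's code.

```python
def admissions_of_decedents(deaths_by_age, delay_by_age, horizon):
    """Shift each age group's daily deaths back by its admission-to-death delay and sum."""
    adm = [0.0] * horizon
    for age, series in deaths_by_age.items():
        lag = delay_by_age[age]
        for t, d in enumerate(series):
            if 0 <= t - lag < horizon:
                adm[t - lag] += d
    return adm


def total_admissions(decedent_admissions, hospital_fatality_rate):
    """All admissions = admissions of eventual decedents / in-hospital death rate."""
    return [a / hospital_fatality_rate for a in decedent_admissions]


def beds_in_use(admissions, length_of_stay):
    """Each admission occupies one bed for length_of_stay consecutive days."""
    beds = [0.0] * len(admissions)
    for t, a in enumerate(admissions):
        for k in range(t, min(t + length_of_stay, len(admissions))):
            beds[k] += a
    return beds
```

Extending the sketch to the normal/severe split would mean running `beds_in_use` once per category with its own share of admissions and length of stay, then multiplying by per-bed resource consumption.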

4.4.5. Human response

Human response refers to people proactively taking action, out of psychological factors such as fear of death and illness, to avoid infection as far as possible. Such responses affect the course of an infectious disease through the mechanisms established by the infectious disease model.

For example, Weitz and others [50] abstract the population's fear of death and infection as an awareness factor and calculate its value in real time from the values of the compartments. They use this factor to correct the parameters of the compartment model, building a self-feedback mechanism on the dynamic parameter method that models the human response factor.
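A self-feedback mechanism of this kind can be sketched as a SIRD model whose transmission rate is damped by recent deaths, β_eff = β / (1 + (δ/δ_c)^k), with δ the number of deaths on the previous day. This saturating form, the one-day lag, and the parameters `delta_c` and `k` are assumptions for illustration in the spirit of [50], not a reproduction of that model.

```python
def awareness_sird(beta, gamma, mu, n, i0, days, delta_c, k=2, dt=0.1):
    """SIRD with death-driven awareness: beta_eff = beta / (1 + (deaths/delta_c) ** k).

    Uses the previous day's deaths as the awareness signal.
    delta_c = float("inf") switches the feedback off.
    Returns (peak number infected, cumulative deaths).
    """
    s, i, r, d = n - i0, float(i0), 0.0, 0.0
    deaths_yesterday = 0.0
    peak_i = i
    steps = round(1 / dt)
    for _ in range(days):
        beta_eff = beta / (1.0 + (deaths_yesterday / delta_c) ** k)
        deaths_today = 0.0
        for _ in range(steps):
            new_inf = beta_eff * s * i / n * dt
            new_rec = gamma * i * dt
            new_dead = mu * i * dt
            s -= new_inf
            i += new_inf - new_rec - new_dead
            r += new_rec
            d += new_dead
            deaths_today += new_dead
        deaths_yesterday = deaths_today
        peak_i = max(peak_i, i)
    return peak_i, d
```

Comparing a run with finite `delta_c` against one with `float("inf")` shows the qualitative effect of the feedback: awareness lowers the epidemic peak.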

4.4.6. Vaccine

Vaccines form a broad class of research interests. Vaccine research first adds vaccine-related compartments or parameters to the modeling process, and then uses the improved compartment model with vaccines to address more detailed and specific epidemiological problems.

A study from Prof. Liu’s research team at LSHTM illustrates the data, methods, and models of vaccine research well. It examines how COVID-19 epidemic characteristics, population age structure, government policies, and population movement influence optimal vaccine prioritization strategies [56]. In the data phase, in addition to the general characteristics above, the study introduced data on the distribution rates of 4 vaccines based on COVAX vaccine distribution data, and then built an improved version of the CovidM compartment model with a vaccine immunity compartment and vaccine subgroups. The new compartments, such as the clinical compartments, supply the characteristic values used in the comparison. Finally, these values and statistical experiments were used to compare the advantages and disadvantages of different vaccine prioritization strategies on cLE, cQALY, and five other indicators.
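Adding a vaccine-related compartment can be sketched as an SIR model with a vaccine-immune compartment V fed by a daily dose supply. The simplifications (doses reach only susceptibles, all-or-nothing protection with probability `efficacy`, permanent immunity) and all parameter values are assumptions for illustration; CovidM and the study in [56] are far richer.

```python
def sirv(beta, gamma, n, i0, days, daily_doses, efficacy=0.9, dt=0.1):
    """SIR with a vaccine-immune compartment V; returns final (S, I, R, V).

    Simplifications: doses go only to susceptibles, protection is all-or-nothing
    (a fraction `efficacy` of doses moves people from S to V), immunity is permanent.
    """
    s, i, r, v = n - i0, float(i0), 0.0, 0.0
    steps = round(1 / dt)
    for _ in range(days):
        for _ in range(steps):
            new_inf = beta * s * i / n * dt
            new_rec = gamma * i * dt
            protected = min(daily_doses * efficacy * dt, max(s - new_inf, 0.0))
            s -= new_inf + protected
            i += new_inf - new_rec
            r += new_rec
            v += protected
    return s, i, r, v
```

Comparing final R (cumulative infections) with and without doses gives the simplest vaccine simulation experiment; prioritization questions require splitting each compartment further by age or risk group.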

4.4.7. Other societal aspects, such as the economy

Other factors include the economy, pollen, UV radiation, and further factors that are not widely studied. There is no universal standard for the data introduced to study these factors or for the modeling methods used.

For example, using the predictions of basic infectious disease model simulations combined with pollen concentration data, the authors of [57] showed through statistical experiments that higher pollen concentrations are associated with higher COVID-19 transmission rates. Using a multipopulation model, Bonaccorsi and others [58] built a refined infectious disease model from Italian case data and population flow data, and then cross-analyzed infectious disease indicators and economic indicators to quantify how the economic changes after the outbreak of COVID-19 relate to changes in population activity.

4.5. From DFM to research interest: simulation and regression experiments

To achieve their research purposes, researchers make extensive use of simulation and regression experiments to process the outputs of infectious disease models.

A simulation experiment is based on the infectious disease model itself: researchers change some of the real input data, or use hypothetical data as input, and achieve the research purpose by comparing the model results produced by the different inputs.

Regression experiments are set up outside the infectious disease model: the model's output serves as the input to mathematical and statistical methods that accomplish the research purpose. Since this survey mainly studies the infectious disease model itself, the experimental part is only briefly described.
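A simulation experiment in this sense can be sketched as running the same model under several hypothetical inputs and comparing outputs. Here each hypothetical NPI scenario scales R0 by (1 − reduction) in a basic SIR model, and the output compared is the attack rate; the scenario definition, the Euler integration, and the parameter values are illustrative assumptions.

```python
def sir_attack_rate(r0, gamma=0.1, n=1_000_000, i0=10, days=365, dt=0.1):
    """Fraction of the population ever infected in a forward-Euler SIR run."""
    beta = r0 * gamma
    s, i = n - i0, float(i0)
    for _ in range(round(days / dt)):
        new_inf = beta * s * i / n * dt
        s -= new_inf
        i += new_inf - gamma * i * dt
    return 1 - s / n


def scenario_experiment(r0, reductions):
    """Counterfactual inputs: one run per hypothetical contact-reduction level."""
    return {red: sir_attack_rate(r0 * (1 - red)) for red in reductions}


results = scenario_experiment(2.5, [0.0, 0.3, 0.6])
```

With R0 = 2.5 and no reduction, the attack rate approaches the classic SIR final-size value of about 0.89, and it falls as the assumed reduction grows; a regression experiment would then take such outputs as inputs to a statistical analysis.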

5. Preparedness for future outbreaks

The need to predict the timing and intensity of outbreaks of infectious diseases has been acknowledged for quite some time and has emerged as an even stronger lesson from the COVID-19 experience. Current initiatives to address this question depend on strategies for integrating theoretical process models with transmission patterns observed empirically.

Promising research in this area relies on artificial intelligence-based tools [59], [60]. A prediction pipeline combines distinct methodologies, namely machine learning and causal diagrams, and applies them to understanding the effect of large-scale drivers (e.g., climate, behavior) on the evolutionary and ecological trajectories leading to the next pandemic. The study of emergent diseases has become a major research topic in biomedicine. Its progress can be attributed to decisive contributions made possible by reconstructing host–pathogen networks with machine learning strategies. Interpretable (non-black-box) machine learning has proved particularly instrumental in establishing a virtuous model–lab–field feedback loop that challenges and improves predictive models. The most successful cutting-edge examples focus on the structural and biochemical interactions between pathogens and cell receptors that lead to human–pathogen compatibility, and the main goal of such studies is to assess the zoonotic potential of this association. Machine learning approaches that predict protein folding structures are expected to join the current toolbox, which already contains pipelines that harness whole-genome data from individuals and populations. Actionable predictions require robust and interpretable outputs from pipelines based on machine learning strategies. This might be achievable by equipping AI algorithms with the capability to find causes [61]. Artificial intelligence and causal inference have both made formidable progress in recent decades but are only now finding common ground.

Such a framework has been applied to assess the causal role of environmental changes as drivers of vector-borne diseases [62]. Santos et al. examined this relationship in detail using the spread of visceral leishmaniasis (VL) in São Paulo state (Brazil) as the case study. A two-step approach estimated the causal effects (overall, direct, and indirect) of deforestation on the occurrence of the VL vector, canine visceral leishmaniasis (CVL), and human visceral leishmaniasis (HVL).
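To make the overall/direct/indirect decomposition concrete, the sketch below uses the generic linear product-of-coefficients approach: fit M = aX and Y = c′X + bM, take the total effect c from Y = cX, and report the indirect effect a·b. For ordinary least squares on the same sample, c = c′ + a·b holds exactly. This is a standard textbook method swapped in for illustration, not the two-step approach of [62]; the variables and the synthetic data are hypothetical, and a causal reading requires no unmeasured confounding.

```python
import random


def _cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def linear_mediation(x, m, y):
    """Total, direct, and indirect effects of x on y through mediator m (linear OLS)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx                              # X -> M
    total = sxy / sxx                          # Y ~ X
    det = sxx * smm - sxm ** 2
    direct = (smm * sxy - sxm * smy) / det     # coefficient on X in Y ~ X + M
    b = (sxx * smy - sxm * sxy) / det          # coefficient on M in Y ~ X + M
    return {"total": total, "direct": direct, "indirect": a * b}


# Synthetic data only: "deforestation" -> "vector presence" -> "human cases".
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(5000)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.5 * xi + 1.2 * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]
effects = linear_mediation(x, m, y)
```

On these synthetic data the estimates land near the generating values (direct ≈ 0.5, indirect ≈ 0.8 × 1.2 = 0.96); nonlinear outcomes such as disease occurrence need the more general counterfactual definitions of direct and indirect effects.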

Integrating machine learning and causal inference raises researchers' expectations in this field and might provide the methodological tools needed to face the challenges we encounter when studying the effect of large-scale drivers (e.g., climate) on human diseases. To achieve this, we need to conceive causal diagrams that address evolutionary and ecological processes, an area that remains almost untouched [63]. Such diagrams provide the framework needed to describe the causal effects of events that take place at distinct interacting levels, such as ecological (deforestation), evolutionary (founder effect, niche construction), and genetic (genetic basis of vector–pathogen competence), thus helping us forecast the future host–pathogen landscape under climate change and intervention strategies.

6. Search strategies

6.1. Search words

COVID-19 modeling, SARS-CoV-2, compartment model, agent-based model, AI infectious disease modeling, vaccine modeling, infectious disease prediction, and simulation.

6.2. Search resource

  • Journals and Conferences: Nature, Science, PNAS, KDD, AAAI, TKDE, The Lancet, NEJM, JAMA.

  • Databases: Google Mobility, Oxford Government Response Index, IHME medical resource.

  • Websites: GitHub, Google, Readthedocs (docs for covasim/openABM, etc.).

6.3. Inclusion and exclusion criteria

  • Date: studies published after 2020.01.01,

  • Exposure of interest: infectious disease modeling,

  • Geographic location of study: None,

  • Language: English,

  • Participants: at least 1 professor,

  • Peer review: None,

  • Reported outcomes: None,

  • Setting: None,

  • Study design: build a model to simulate/predict/analyze COVID-19,

  • Type of Publication: Nature/Science, PNAS, Top Journals/Conferences in Epidemiology or Computer Science.

7. Conclusion

After the outbreak of COVID-19, traditional compartment models such as SEIR, commonly used to model infectious diseases, encountered severe limitations. The main source of these limitations is the various influences that the social system exerts on the infectious disease system. To analyze these effects, researchers summarize them into abstract factors, find data that represent those factors, apply appropriate modeling methods to the data, and complete the study through experiments.

This review does not group papers by research purpose. Instead, it follows the workflow of infectious disease modeling research, maps data, methods, and models to research interests, and surveys each stage of that workflow. In terms of modeling methods, researchers use compartment subdivision, dynamic parameters, agent-based models, and AI-related methods. Compartment subdivision and dynamic parameters optimize the structure and the parameter values of the compartment model, respectively, while the agent-based model changes the compartment model’s assumptions about the “compartment.” AI-assisted and AI-led models offer a new avenue for modeling infectious diseases.

In terms of factors, researchers have studied 6 categories: human mobility, NPIs, age, medical resources, human response, and vaccines. Through data and modeling methods, they try to make models as close to reality as possible, so as to quantitatively analyze the impact of social systems and to offer recommendations on the future transmission of infectious diseases and on prevention and control strategies.

Conflicts of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

We received project support and design guidance from National Key R&D Program of China (Grant No. 2021ZD0111201), The National Natural Science Foundation of China (Grant Nos. 82161148011, 72171013), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Refs. 441057/2020-9, 309569/2019-2), CJS - CNPq, Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), and The Russian Foundation for Basic Research, Project number 21-51-80000.

Author contributions

Honghao Shi completed the review research, chapter design, and article writing. Jingyuan Wang provided guidance and revisions for all sections. Jiawei Cheng assisted with the research and writing of Section 3 on compartment segmentation and dynamic models. Xiaopeng Qi and Hanran Ji assisted with the specific application and business of Section 4. Claudio J. Struchiner and Daniel A.M. Villela completed the work on future epidemic prediction, supplementing the application scenarios of this article. Eduard V. Karamov and Ali S. Turgiev completed the design, application, and research of the agent-based model.

References

  • 1. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford: Oxford University Press; 1992.
  • 2. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character. 1927;115(772):700–721.
  • 3. Keeling MJ, Grenfell BT. Understanding the persistence of measles: reconciling theory, simulation and observation. Proceedings of the Royal Society of London. Series B: Biological Sciences. 2002;269(1489):335–343. doi: 10.1098/rspb.2001.1898.
  • 4. Lipsitch M, Cohen T, Cooper B, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300(5627):1966–1970. doi: 10.1126/science.1086616.
  • 5. Balcan D, Hu H, Goncalves B, et al. Seasonal transmission potential and activity peaks of the new influenza A (H1N1): a Monte Carlo likelihood analysis based on human mobility. BMC Medicine. 2009;7(1):1–12. doi: 10.1186/1741-7015-7-45.
  • 6. Hurley M, Jacobs G, Gilbert M. The basic SI model. New Directions for Teaching and Learning. 2006;2006(106):11–22.
  • 7. Pinto Neto O, Kennedy DM, Reis JC, et al. Mathematical model of COVID-19 intervention scenarios for São Paulo–Brazil. Nat Commun. 2021;12(1):1–13. doi: 10.1038/s41467-020-20687-y.
  • 8. Lai S, Ruktanonchai NW, Zhou L, et al. Effect of non-pharmaceutical interventions to contain COVID-19 in China. Nature. 2020;585(7825):410–413. doi: 10.1038/s41586-020-2293-x.
  • 9. Candido DS, Claro IM, De Jesus JG, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science. 2020;369(6508):1255–1260. doi: 10.1126/science.abd2161.
  • 10. Tian H, Liu Y, Li Y, et al. An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science. 2020;368(6491):638–642. doi: 10.1126/science.abb6105.
  • 11. Davies NG, Abbott S, Barnard RC, et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science. 2021;372(6538):eabg3055. doi: 10.1126/science.abg3055.
  • 12. Volz E, Mishra S, Chand M, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature. 2021;593(7858):266–269. doi: 10.1038/s41586-021-03470-x.
  • 13. Kissler SM, Tedijanto C, Goldstein E, et al. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science. 2020;368(6493):860–868. doi: 10.1126/science.abb5793.
  • 14. Li R, Chen B, Zhang T, et al. Global COVID-19 pandemic demands joint interventions for the suppression of future waves. Proc Natl Acad Sci U S A. 2020;117(42):26151–26157. doi: 10.1073/pnas.2012002117.
  • 15. Carleton T, Cornetet J, Huybers P, et al. Global evidence for ultraviolet radiation decreasing COVID-19 growth rates. Proc Natl Acad Sci U S A. 2021;118(1). doi: 10.1073/pnas.2012370118.
  • 16. Baker RE, Yang W, Vecchi GA, et al. Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic. Science. 2020;369(6501):315–319. doi: 10.1126/science.abc2535.
  • 17. Xue L, Jing S, Miller JC, et al. A data-driven network model for the emerging COVID-19 epidemics in Wuhan, Toronto and Italy. Math Biosci. 2020;326:108391. doi: 10.1016/j.mbs.2020.108391.
  • 18. Salje H, Tran Kiem C, Lefrancq N, et al. Estimating the burden of SARS-CoV-2 in France. Science. 2020;369(6500):208–211. doi: 10.1126/science.abc3517.
  • 19. Karatayev VA, Anand M, Bauch CT. Local lockdowns outperform global lockdown on the far side of the COVID-19 epidemic curve. Proc Natl Acad Sci U S A. 2020;117(39):24575–24580. doi: 10.1073/pnas.2014385117.
  • 20. Hao X, Cheng S, Wu D, et al. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature. 2020;584(7821):420–424. doi: 10.1038/s41586-020-2554-8.
  • 21. Duque D, Morton DP, Singh B, et al. Timing social distancing to avert unmanageable COVID-19 hospital surges. Proc Natl Acad Sci U S A. 2020;117(33):19873–19878. doi: 10.1073/pnas.2009033117.
  • 22. Maier BF, Brockmann D. Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China. Science. 2020;368(6492):742–746. doi: 10.1126/science.abb4557.
  • 23. Hale T, Angrist N, Goldszmidt R, et al. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat Hum Behav. 2021;5(4):529–538. doi: 10.1038/s41562-021-01079-8.
  • 24. Chang S, Pierson E, Koh PW, et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature. 2021;589(7840):82–87. doi: 10.1038/s41586-020-2923-3.
  • 25. Chang SL, Harding N, Zachreson C, et al. Modelling transmission and control of the COVID-19 pandemic in Australia. Nat Commun. 2020;11(1):1–13. doi: 10.1038/s41467-020-19393-6.
  • 26. Schwabe A, Persson J, Feuerriegel S. Predicting COVID-19 spread from large-scale mobility data. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021.
  • 27. Andronico A, Tran Kiem C, Paireau J, et al. Evaluating the impact of curfews and other measures on SARS-CoV-2 transmission in French Guiana. Nat Commun. 2021;12(1):1–8. doi: 10.1038/s41467-021-21944-4.
  • 28. Sebhatu A, Wennberg K, Arora-Jonsson S, et al. Explaining the homogeneous diffusion of COVID-19 nonpharmaceutical interventions across heterogeneous countries. Proc Natl Acad Sci U S A. 2020;117(35):21201–21208. doi: 10.1073/pnas.2010625117.
  • 29. Brett TS, Rohani P. Transmission dynamics reveal the impracticality of COVID-19 herd immunity strategies. Proc Natl Acad Sci U S A. 2020;117(41):25897–25903. doi: 10.1073/pnas.2008087117.
  • 30. Marziano V, Guzzetta G, Rondinone BM, et al. Retrospective analysis of the Italian exit strategy from COVID-19 lockdown. Proc Natl Acad Sci U S A. 2021;118(4). doi: 10.1073/pnas.2019617118.
  • 31. Britton T, Ball F, Trapman P. A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2. Science. 2020;369(6505):846–849. doi: 10.1126/science.abc6810.
  • 32. Di Domenico L, Pullano G, Sabbatini CE, et al. Modelling safe protocols for reopening schools during the COVID-19 pandemic in France. Nat Commun. 2021;12(1):1–10. doi: 10.1038/s41467-021-21249-6.
  • 33. Thurner S, Klimek P, Hanel R. A network-based explanation of why most COVID-19 infection curves are linear. Proc Natl Acad Sci U S A. 2020;117(37):22684–22689. doi: 10.1073/pnas.2010398117.
  • 34. Schlosser F, Maier BF, Jack O, et al. COVID-19 lockdown induces disease-mitigating structural changes in mobility networks. Proc Natl Acad Sci U S A. 2020;117(52):32883–32890. doi: 10.1073/pnas.2012326117.
  • 35. Azimi P, Keshavarz Z, Laurent JGC, et al. Mechanistic transmission modeling of COVID-19 on the Diamond Princess cruise ship demonstrates the importance of aerosol transmission. Proc Natl Acad Sci U S A. 2021;118(8). doi: 10.1073/pnas.2015482118.
  • 36. Wong F, Collins JJ. Evidence that coronavirus superspreading is fat-tailed. Proc Natl Acad Sci U S A. 2020;117(47):29416–29418. doi: 10.1073/pnas.2018490117.
  • 37. Kortessis N, Simon MW, Barfield M, et al. The interplay of movement and spatiotemporal variation in transmission degrades pandemic control. Proc Natl Acad Sci U S A. 2020;117(48):30104–30106. doi: 10.1073/pnas.2018286117.
  • 38. Ali ST, Wang L, Lau EH, et al. Serial interval of SARS-CoV-2 was shortened over time by nonpharmaceutical interventions. Science. 2020;369(6507):1106–1109. doi: 10.1126/science.abc9004.
  • 39. Nishi A, Dewey G, Endo A, et al. Network interventions for managing the COVID-19 pandemic and sustaining economy. Proc Natl Acad Sci U S A. 2020;117(48):30285–30294. doi: 10.1073/pnas.2014297117.
  • 40. Lau MS, Grenfell B, Thomas M, et al. Characterizing superspreading events and age-specific infectiousness of SARS-CoV-2 transmission in Georgia, USA. Proc Natl Acad Sci U S A. 2020;117(36):22430–22435. doi: 10.1073/pnas.2011802117.
  • 41. Sneppen K, Nielsen BF, Taylor RJ, et al. Overdispersion in COVID-19 increases the effectiveness of limiting nonrepetitive contacts for transmission control. Proc Natl Acad Sci U S A. 2021;118(14). doi: 10.1073/pnas.2016623118.
  • 42. Moreno López JA, Arregui García B, Bentkowski P, et al. Anatomy of digital contact tracing: role of age, transmission setting, adoption, and case detection. Sci Adv. 2021;7(15):eabd8750. doi: 10.1126/sciadv.abd8750.
  • 43. Rykovanov GN, Lebedev SN, Zatsepin OV, et al. Agent-based simulation of the COVID-19 epidemic in Russia. Her Russ Acad Sci. 2021;92(4):479–487. doi: 10.1134/S1019331622040219.
  • 44. Kerr CC, Stuart RM, Mistry D, et al. Covasim: an agent-based model of COVID-19 dynamics and interventions. PLoS Comput Biol. 2021;17(7). doi: 10.1371/journal.pcbi.1009149.
  • 45. Hinch R, Probert WJ, Nurtay A, et al. OpenABM-Covid19: an agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing. PLoS Comput Biol. 2021;17(7). doi: 10.1371/journal.pcbi.1009146.
  • 46. Willard J, Jia X, Xu S, et al. Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Computing Surveys (CSUR). 2021. doi: 10.48550/arXiv.2003.04919.
  • 47. Ghamizi S, Rwemalika R, Cordy M, et al. Data-driven simulation and optimization for COVID-19 exit strategies. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
  • 48. Wang L, Adiga A, Chen J, et al. CausalGNN: causal-based graph neural networks for spatio-temporal epidemic forecasting. 2022.
  • 49. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533–534. doi: 10.1016/S1473-3099(20)30120-1.
  • 50. Weitz JS, Park SW, Eksin C, et al. Awareness-driven behavior changes can shift the shape of epidemics away from peaks and toward plateaus, shoulders, and oscillations. Proc Natl Acad Sci U S A. 2020;117(51):32764–32771. doi: 10.1073/pnas.2009911117.
  • 51. Munday JD, Sherratt K, Meakin S, et al. Implications of the school-household network structure on SARS-CoV-2 transmission under school reopening strategies in England. Nat Commun. 2021;12(1):1–11. doi: 10.1038/s41467-021-22213-0.
  • 52. Google LLC. Google COVID-19 community mobility reports. Available from https://www.google.com/covid19/mobility/.
  • 53. Nouvellet P, Bhatia S, Cori A, et al. Reduction in mobility and COVID-19 transmission. Nat Commun. 2021;12(1):1–9. doi: 10.1038/s41467-021-21358-2.
  • 54. Zhang J, Litvinova M, Liang Y, et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science. 2020;368(6498):1481–1486. doi: 10.1126/science.abb8001.
  • 55. IHME COVID-19 Health Service Utilization Forecasting Team, Murray CJL. Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries. medRxiv. 2020.
  • 56. Liu Y, Sandmann FG, Barnard RC, et al. Optimising health and economic impacts of COVID-19 vaccine prioritisation strategies in the WHO European Region: a mathematical modelling study. Lancet Reg Health Eur. 2022;12:100267. doi: 10.1016/j.lanepe.2021.100267.
  • 57. Damialis A, Gilles S, Sofiev M, et al. Higher airborne pollen concentrations correlated with increased SARS-CoV-2 infection rates, as evidenced from 31 countries across the globe. Proc Natl Acad Sci U S A. 2021;118(12):e2019034118. doi: 10.1073/pnas.2019034118.
  • 58. Bonaccorsi G, Pierri F, Cinelli M, et al. Economic and social consequences of human mobility restrictions under COVID-19. Proc Natl Acad Sci U S A. 2020;117(27):15530–15535. doi: 10.1073/pnas.2007658117.
  • 59. Albery GF, Becker DJ, Brierley L, et al. The science of the host–virus network. Nat Microbiol. 2021;6(12):1483–1492. doi: 10.1038/s41564-021-00999-5.
  • 60. Zhang C, Matsen IV FA. Generalizing tree probability estimation via Bayesian networks. 2018. doi: 10.48550/arXiv.1805.07834.
  • 61. Peters J, Janzing D, Schölkopf B. Elements of causal inference: foundations and learning algorithms. The MIT Press; 2017.
  • 62. Santos CVBD, Sevá ADP, Werneck GL. Does deforestation drive visceral leishmaniasis transmission? A causal analysis. Proc Biol Sci. 2021;288(1957):20211537. doi: 10.1098/rspb.2021.1537.
  • 63. Otsuka J. Causal foundations of evolutionary genetics. Brit J Philos Sci. 2016;67(1):247–269. doi: 10.1093/bjps/axu039.

Articles from Intelligent Medicine are provided here courtesy of Elsevier
