2022 Dec 23;11(1):e511. doi: 10.1002/sta4.511

A modified SEIR model with a jump in the transmission parameter applied to COVID‐19 data on Wuhan

Tian Bai 1, Dianpeng Wang 1, Wenlin Dai 2
PMCID: PMC9874617  PMID: 36713680

Abstract

In December 2019, Wuhan, the capital of Hubei Province, was struck by an outbreak of COVID‐19. Numerous studies have been conducted to fit COVID‐19 data and make statistical inferences. In applications, functions of the parameters in the model are usually used to assess severity of the outbreak. Because of the strategies applied during the struggle against the pandemic, the trend of the parameters changes abruptly. However, time‐varying parameters with a jump have received scant attention in the literature. In this study, a modified SEIR model is proposed to fit the actual situation of the COVID‐19 epidemic. In the proposed model, the dynamic propagation system is modified because of the high infectivity during incubation, and a time‐varying parametric strategy is suggested to account for the utility of the intervention. A corresponding model selection algorithm based on the information criterion is also suggested to detect the jump in the transmission parameter. A real data analysis based on the COVID‐19 epidemic in Wuhan and a simulation study demonstrate the plausibility and validity of the proposed method.

Keywords: COVID‐19, model selection, SEIR model, time‐varying parameter

1. INTRODUCTION

In December 2019, an outbreak of the novel coronavirus disease (COVID‐19) was reported in Wuhan, the capital of Hubei Province. This coronavirus is a new strain for humans and belongs to the RNA coronaviridae family. The infection of COVID‐19 can cause pneumonia, severe acute respiratory syndrome, and even death. The total number of infections increased at a fast rate in the early stage of the pandemic. Experts of the Health Commission and virologists analysed infected individuals and found that COVID‐19 was highly infectious with a median incubation period of 5.1 days (Lauer et al., 2020). Scientists found that rapid detection and isolation are significantly helpful for defeating the epidemic. According to the advice of Professor Nanshan Zhong, Wuhan took certain strategies to curb the spread of the pandemic. A major problem for statisticians is assessing the performance and timeliness of these interventions.

The dynamic modelling of pandemic data is an important tool for understanding the propagation of pandemics. Compartmental models provide a framework for infectious disease dynamics, and one typical representative is the SEIR model. The SEIR model (Anderson & May, 1992) accounts for the incubation period by dividing the population into susceptible, exposed, infected, and removed individuals. It then describes the dynamics among these groups using a set of differential equations (DEs). Many scientists have used these models to simulate infectious pandemics such as SARS (see Dye & Gay, 2003; Huang et al., 2004; Ng et al., 2003; Small et al., 2004; Wang & Zhao, 2003; Xu et al., 2005), Middle East respiratory syndrome (see Eifan et al., 2017; Kwon & Jung, 2016), and HIV (see Tang et al., 2019; Zakary, Rachik, et al., 2016; Zakary, Larrache, et al., 2016). A number of compartmental models have been proposed for predicting the trend of COVID‐19 by generalizing the SEIR model (e.g., see Ghostine et al., 2021; Hao et al., 2020; He et al., 2020; Ndaïrou et al., 2020; Quick et al., 2021; Tian et al., 2021). Beyond prediction, scenario analysis has also received extensive attention. For example, Yang et al. (2020) utilized both the SEIR model and the recurrent neural network approach to predict the latent trend of COVID‐19 in Wuhan without interventions; Qian et al. (2020) proposed a Bayesian model for evaluating the effects of COVID‐19 lockdown policies in a global context and made counterfactual scenario analysis by utilizing a two‐layer Gaussian process prior. Many other works employed time‐varying parameters for policy utility analysis (e.g., see Ajbar et al., 2021; Calafiore et al., 2020; Sun et al., 2020; Wang et al., 2020).

There exist two types of heterogeneity in the COVID‐19 data on Wuhan. The first type is due to the intensification of intervention strategies, especially the quarantine policy announced on January 23, 2020. As a result, the transmission and removal rates in the SEIR model should not remain constant. Chowell et al. (2004) investigated the SEIR model with a smooth transition in the transmission rate to account for intervention strategies. Lekone and Finkenstädt (2006) proposed a modified transmission rate with a more parsimonious parameterization model and suggested using Markov chain Monte Carlo methods for parameter inference. Qian et al. (2020) proposed a time‐varying transmission rate in the lower‐layer Gaussian process to account for the impact of policy changes over time; Sun et al. (2020) proposed a variant SEIR model, which substituted quarantined individuals for recovered individuals, and allowed a time‐varying effective reproduction number for forecasting the long‐term trend of COVID‐19 in China; Calafiore et al. (2020) utilized basis function expansion to fit the time‐varying parameters in the SIRD (D denotes deceased individuals) model. None of the above methods allowed the transmission rate to be discontinuous to account for the sharp change caused by intervention strategies. The second type of heterogeneity is caused by the implementation of a new diagnostic criterion, which led to the obvious jump on February 13, 2020. He et al. (2020) addressed this problem by dividing the pandemic period into two stages and fitting the model separately, which may be deficient since the information from the first stage is completely ignored when modelling the second stage.

Motivated by this problem, a modified SEIR model is proposed in this paper. The key novelty of the proposed method is that we suggest a potential jump for the time‐varying transmission parameter and provide the corresponding model selection algorithm based on the information criterion to detect the jump. Moreover, we note that medical efficiency is significantly improved by many factors, including the construction of Leishenshan and Fangcang Hospitals, medical aid from across the country, and the implementation of a catch‐all policy. All these measures make treatment more efficient. Thus, the variation in the removal rate should not be neglected. A time‐varying removal rate is also considered to characterize such a feature. To address the second type of heterogeneity caused by changes in diagnostic criterion, we consider a segmented structure in this modified SEIR model, which accounts for the correlation between the two periods. Further, the traditional SEIR model in the literature does not consider the fact that COVID‐19 has been shown to be highly infectious during the incubation period. We thus modified the differential equations of the SEIR model to overcome this drawback.

The remainder of this paper is organized as follows. In Section 2, we review the traditional SEIR model. A modified time‐varying parametric SEIR model with a jump in the transmission parameter is proposed. The model selection algorithm used to detect the jump is also provided in Section 2. In Section 3, the modified SEIR model is applied to COVID‐19 data on Wuhan released by the Health Commission. Further, real data analysis and simulation studies are used to demonstrate the performance of the new model. Some conclusions and prospects for future research are presented in Section 4.

2. METHODS

Assume that Y(t) is the number of confirmed COVID‐19 cases at time t; then, Y(t) can be formulated as follows:

$$
Y(t) = \mu(t) + \epsilon_t, \tag{1}
$$

where μ(t) is the mean curve and ϵ_t is the error due to realistic conditions such as untimely diagnosis. Usually, the mean curve μ(t) is fitted using a propagation dynamics model such as the SEIR model.

2.1. Traditional SEIR model

The traditional SEIR model divides the population into four dynamic subpopulations: susceptible, exposed, infected, and removed, which can be described as follows:

  • S(t) denotes the susceptible population. An individual is susceptible before infection; that is, they are likely to be infected by infected individuals.

  • E(t) denotes the exposed population who are infected but do not show typical symptoms of infection. In the COVID‐19 case, the exposed population is highly infectious.

  • I(t) denotes the infected population who are confirmed to have been infected and have a certain probability of infecting other susceptible individuals.

  • R(t) denotes the removed population who have left the system and are no longer affected by the pandemic (recovery, death, or quarantine).

The corresponding first‐order ordinary non‐linear differential equations can be derived as follows:

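$$
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta \frac{S(t) I(t)}{N}, \\
\frac{dE(t)}{dt} &= \beta \frac{S(t) I(t)}{N} - \gamma_1 E(t), \\
\frac{dI(t)}{dt} &= \gamma_1 E(t) - \gamma_2 I(t), \\
\frac{dR(t)}{dt} &= \gamma_2 I(t),
\end{aligned} \tag{2}
$$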

where β>0 is the transmission rate from the susceptible population to the exposed population, γ1>0 is the transition rate from the exposed population to the infected population, and γ2>0 is the removal rate, that is, the rate at which infected individuals leave the infected population. Note that 1/γ1 is the mean duration of the incubation period and 1/γ2 is the mean time until an infected patient is removed from the system due to recovery, death, or quarantine. Let N be the total population size; then, we have

N=S(t)+E(t)+I(t)+R(t). (3)
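As a minimal sketch of how system (2) can be integrated numerically, the snippet below uses `scipy.integrate.odeint` and checks the conservation law (3). The parameter values, the initial state, and the names `seir_rhs`, `y0`, and `days` are illustrative assumptions, not the estimates reported in this paper.

```python
import numpy as np
from scipy.integrate import odeint


def seir_rhs(state, t, beta, gamma1, gamma2, N):
    """Right-hand side of the traditional SEIR system (2)."""
    S, E, I, R = state
    new_exposed = beta * S * I / N          # only I is infectious in (2)
    dS = -new_exposed
    dE = new_exposed - gamma1 * E
    dI = gamma1 * E - gamma2 * I
    dR = gamma2 * I
    return [dS, dE, dI, dR]


# Illustrative (assumed) values, not the paper's estimates.
N = 1_000_000.0
y0 = [N - 100.0 - 27.0, 100.0, 27.0, 0.0]   # S(0), E(0), I(0), R(0)
days = np.arange(0, 78)                      # daily time grid
traj = odeint(seir_rhs, y0, days, args=(0.4, 1 / 5.1, 0.1, N))

# The four compartments should always add up to N, as stated in (3).
total = traj.sum(axis=1)
```

Each row of `traj` holds S, E, I, R on one day; the four derivatives sum to zero, which is why `total` stays at N up to integration error.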

2.2. A modified SEIR model

From (2), it is clear that the exposed population is not infectious, which is inconsistent with the situation of the COVID‐19 pandemic. Assume the exposed population has the same infection rate as the infected population; then, the SEIR model can be modified as

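$$
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta \frac{S(t) \left( I(t) + E(t) \right)}{N}, \\
\frac{dE(t)}{dt} &= \beta \frac{S(t) \left( I(t) + E(t) \right)}{N} - \gamma_1 E(t), \\
\frac{dI(t)}{dt} &= \gamma_1 E(t) - \gamma_2 I(t), \\
\frac{dR(t)}{dt} &= \gamma_2 I(t),
\end{aligned} \tag{4}
$$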

where βS(t)(I(t)+E(t))/N is the number of individuals exposed per unit of time. When the exposed population is not infectious, this term becomes βS(t)I(t)/N and leads to the traditional SEIR model. The framework of the modified model can be described as shown in Figure 1.

FIGURE 1. Individual transition between the states of the SEIR model
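The difference between systems (2) and (4) can be sketched by switching the infectious pool between I and I+E. The flag `exposed_infectious` and all numerical values below are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.integrate import odeint


def seir_rhs(state, t, beta, gamma1, gamma2, N, exposed_infectious):
    """SEIR right-hand side: system (4) if exposed_infectious, else system (2)."""
    S, E, I, R = state
    infectious = I + E if exposed_infectious else I
    new_exposed = beta * S * infectious / N
    return [-new_exposed,
            new_exposed - gamma1 * E,
            gamma1 * E - gamma2 * I,
            gamma2 * I]


# Illustrative (assumed) values, not the paper's estimates.
N = 1_000_000.0
y0 = [N - 127.0, 100.0, 27.0, 0.0]
days = np.arange(0, 61)
params = (0.3, 0.18, 0.1, N)

trad = odeint(seir_rhs, y0, days, args=params + (False,))   # system (2)
mod = odeint(seir_rhs, y0, days, args=params + (True,))     # system (4)

# With identical parameters, the susceptible pool is depleted faster
# when the exposed population also transmits the infection.
s_trad_day30, s_mod_day30 = trad[30, 0], mod[30, 0]
```

With the same β, the modified system spreads faster, so a β fitted under (4) is not directly comparable with one fitted under (2).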

In (4), the transmission parameters, β and γ2, are constant, which is also inconsistent with the situation of COVID‐19. Over time, scientists are becoming much more knowledgeable about the virus. Appropriate strategies are being formulated by Wuhan, and medical resources and clinical experience are also becoming richer. Therefore, it is unsuitable to use constant parameters to describe the transmission and removal rates of the pandemic. To model this situation, we propose a generalized time‐varying SEIR model based on (4), in which the transmission parameters β and γ2 are functions of time t.

For the transmission parameter β, assume that it is a constant in the early stage of the pandemic due to the lack of intervention strategies and then decays exponentially, which is formulated as

$$
\beta(t) =
\begin{cases}
\beta_1, & t < t^{*}, \\
\beta_2 \times \exp\{-q (t - t^{*})\}, & t \ge t^{*},
\end{cases} \tag{5}
$$

where β1 is the initial transmission rate of COVID‐19 during the early stage of the pandemic, t* is the time node at which intervention strategies take effect, β2 = β1 + Δ is the transmission rate at time t*, and q is the rate at which the transmission rate decays for t > t*. When β2 = β1 in (5), the model becomes similar to that proposed in Chowell et al. (2004) and Lekone and Finkenstädt (2006), which is formulated as

$$
\beta(t) =
\begin{cases}
\beta, & t < t^{*}, \\
\beta \times \exp\{-q (t - t^{*})\}, & t \ge t^{*}.
\end{cases} \tag{6}
$$

Matabuena et al. (2021) considered a similar piecewise parameter structure when predicting the trend of COVID‐19 in Spain, but the cut‐off points between stages were pre‐defined, which is subjective and may neglect the lag effect of intervention strategies. In Section 3.1, we use a data‐driven approach to estimate the time node t* adaptively.
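The piecewise transmission rate in (5) can be sketched as a small vectorized function. The name `beta_t` and the argument `t_star` (the time node) are hypothetical. The example values are the estimates from Table 2 together with the selected time node of day 21. The sketch assumes exponential decay after the time node, as described in the text.

```python
import numpy as np


def beta_t(t, beta1, beta2, q, t_star):
    """Transmission rate (5): constant beta1 before t_star, then a jump to
    beta2 followed by exponential decay at rate q."""
    t = np.asarray(t, dtype=float)
    after = beta2 * np.exp(-q * (t - t_star))
    return np.where(t < t_star, beta1, after)


days = np.arange(0, 77)
with_jump = beta_t(days, beta1=0.31, beta2=0.18, q=0.031, t_star=21)

# Setting beta2 equal to beta1 removes the jump and gives the continuous form (6).
no_jump = beta_t(days, beta1=0.31, beta2=0.31, q=0.031, t_star=21)
```

In `with_jump`, the rate drops from 0.31 to 0.18 between day 20 and day 21 and decays afterwards; in `no_jump`, it is continuous at day 21.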

The construction of Leishenshan and Fangcang Hospitals, medical aid from across the country, and implementation of a catch‐all policy all make treatment more efficient, and therefore, the variation in the removal rate should not be neglected. To better capture the trend of the development of the pandemic and better account for the significant improvement in medical efficiency, we assume that the removal rate γ2(t) shows an increasing S‐trend as medical efficiency improves, which can be formulated as

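$$
\gamma_2(t) = \left( \frac{1}{I_{\min}} - \frac{1}{I_{\max}} \right) \times \frac{1}{1 + \exp(-(t - a) \times b)} + \frac{1}{I_{\max}}, \tag{7}
$$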

where Imax,Imin are the maximum and minimum values of the infection period of COVID‐19, which can be determined by experts' experience. a and b are the position and scale parameters to be estimated, respectively.
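A short sketch of the S‐shaped removal rate follows, assuming the logistic reading of (7) in which γ2(t) rises from 1/Imax to 1/Imin as t increases (for b > 0). The function name `gamma2_t` is hypothetical. The default bounds Imin = 3 and Imax = 20 and the values of a and b are those reported in Section 3.1 and Table 2.

```python
import numpy as np


def gamma2_t(t, a, b, i_min=3.0, i_max=20.0):
    """Logistic removal rate: tends to 1/i_max for small t and to 1/i_min for large t."""
    t = np.asarray(t, dtype=float)
    low, high = 1.0 / i_max, 1.0 / i_min
    return (high - low) / (1.0 + np.exp(-(t - a) * b)) + low


days = np.arange(0, 77)
rates = gamma2_t(days, a=31.14, b=0.17)

# At t = a the curve sits exactly halfway between its two limits.
midpoint = float(gamma2_t(31.14, a=31.14, b=0.17))
```

Here a locates the steepest increase of the removal rate and b controls how sharp that increase is.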

A time‐varying removal rate is commonly seen in the COVID‐19 data analysis literature; see, for example, Calafiore et al. (2020); Ajbar et al. (2021); Long et al. (2021); Jo et al. (2020). Burckhardt et al. (2022) mentioned that the removal rate may increase with improved treatments and expanded vaccine access. Besides, assuming an S‐trend for γ2(t) is quite reasonable regarding that the removal rate is a bounded increasing function of time. Moreover, the model fitting results to be shown in Section 3 also justify such a choice since the estimated removal rate matches the practical intervention quite well.

2.3. Parameter estimation and residual analysis

In the existing methods (see Chowell et al., 2004; He et al., 2020; Lekone & Finkenstädt, 2006; Ng et al., 2003; Youssef et al., 2020), the population size N is often treated as a known value. However, N is not equal to the total urban population and needs to be estimated. Moreover, He et al. (2020) and Youssef et al. (2020) fixed the initial model state E(0) at a given value. Since the observed data do not include the number of people in the incubation period or people at risk of infection, the initial state E(0) is also estimated in our study. It is reasonable to use the sum of cumulative deaths and cumulative cures on the first day as the estimate of R(0) in the early stage. According to data released by the Health Commission (available at https://ncov.dxy.cn/ncovh5/view/pneumonia), I(0) and R(0) are already known. Then, S(0) can be calculated as

$$
S(0) = N - E(0) - I(0) - R(0). \tag{8}
$$

Let y1, y2, …, yn be the numbers of confirmed cases at times t1, t2, …, tn and θ be the unknown parameters in the SEIR model, including the initial condition E(0), total population size N, and state transfer parameters β1, β2, γ1, q, a, and b. Given the time node t* in models (5) and (6), the model parameters θ can be estimated by minimizing the sum of squared errors as follows:

$$
\hat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \left( y_i - I(i) \right)^2, \tag{9}
$$

where I(i) is the number of infections in the SEIR model at time i. The time node t* must be estimated in practical applications. In the next section, we provide an algorithm to determine t* and detect the jump in the parameter β(t).

Let ϵ_t = y_t − I(t) be the residuals of the model at time t. The residuals are not independent and exhibit autocorrelation (see Figure 8). We use an ARMA(p,q) model to analyse the residuals as follows:

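$$
\epsilon_n = \alpha_1 \epsilon_{n-1} + \alpha_2 \epsilon_{n-2} + \cdots + \alpha_p \epsilon_{n-p} + \varepsilon_n - \lambda_1 \varepsilon_{n-1} - \cdots - \lambda_q \varepsilon_{n-q}, \tag{10}
$$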

where ϵ = (ϵ_1, …, ϵ_n)^T is the vector of the SEIR model residuals and {ε_t} is a sequence of independent and identically distributed innovations, that is, ε_t ∼ N(0, σ²). We assume that the parameters α = (α1, α2, …, αp) and λ = (λ1, λ2, …, λq) satisfy the stationarity and invertibility conditions and that α(u) and λ(u) have no common root. When the model parameters are chosen reasonably, the autocorrelation of the model residuals can be efficiently eliminated.

FIGURE 8. Left: residuals of the SEIR model. Middle: autocorrelation coefficients of the SEIR model residuals. Right: partial autocorrelation coefficients of the SEIR model residuals
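To illustrate the kind of autocorrelation an ARMA model is meant to absorb, the sketch below simulates an ARMA(1,1) series with the sign convention of (10) and computes its sample autocorrelation function with NumPy only. The series is synthetic, and the coefficients and function names (`simulate_arma11`, `sample_acf`) are assumptions, not quantities fitted in this paper.

```python
import numpy as np


def simulate_arma11(n, alpha, lam, sigma, rng):
    """Simulate x_k = alpha * x_{k-1} + e_k - lam * e_{k-1}, as in (10) with p = q = 1."""
    e = rng.normal(0.0, sigma, size=n + 1)   # i.i.d. Gaussian innovations
    x = np.zeros(n + 1)
    for k in range(1, n + 1):
        x[k] = alpha * x[k - 1] + e[k] - lam * e[k - 1]
    return x[1:]


def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[: n - h], x[h:]) / denom for h in range(max_lag + 1)])


rng = np.random.default_rng(0)
series = simulate_arma11(5000, alpha=0.8, lam=0.2, sigma=1.0, rng=rng)
acf = sample_acf(series, max_lag=10)
```

A slowly decaying `acf`, as in this synthetic series, is the pattern that motivates modelling the SEIR residuals with (10) rather than treating them as independent.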

2.4. Model selection algorithm

In Section 2.2, we proposed a modified SEIR model. In this new model, the transmission parameter β(t) is time‐varying and has a jump at the time node t*. However, the jump Δ = β2 − β1 and time node t* must be determined based on data. When Δ = 0, model (5) becomes model (6). This problem involves model selection for DE models. Traditional subset selection methods such as the Akaike information criterion (AIC) (Akaike, 1998) and Bayesian information criterion (BIC) (Schwarz, 1978) can be used to select a DE model (e.g., see Bortz & Nelson, 2006; Eguchi & Uehara, 2021; Miao et al., 2009; Miao et al., 2012). When the number of candidate models is too large and the computational cost of estimating parameters in each DE model becomes unacceptable, Zhang et al. (2015) considered the combination of a least squares approximation and the adaptive Lasso for model selection.

For our circumstance, only two models are under consideration so we use the AIC to select a better one. Thus, the question can be formulated as the following hypothesis test:

$$
H_0: \Delta = 0 \quad \text{versus} \quad H_1: \Delta \neq 0. \tag{11}
$$

Under H0, model (6) is used, and under H1, model (5) is applied. Here, we propose a model selection algorithm based on the AIC. Given t*, the likelihood under H0 and H1 can be derived as

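$$
\begin{aligned}
L_{H_0}(\theta_{H_0}, \Omega_{H_0} \mid t^{*}) &= |A(\Omega_{H_0})|^{-\frac{1}{2}} (2\pi)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2} \epsilon^{T} A(\Omega_{H_0})^{-1} \epsilon \right\}, \\
L_{H_1}(\theta_{H_1}, \Omega_{H_1} \mid t^{*}) &= |A(\Omega_{H_1})|^{-\frac{1}{2}} (2\pi)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2} \epsilon^{T} A(\Omega_{H_1})^{-1} \epsilon \right\},
\end{aligned} \tag{12}
$$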

where θHj represents the parameters of the SEIR model and ΩHj represents the parameters of the ARMA model under the Hj hypothesis, j = 0, 1. ϵ = (ϵ_1, …, ϵ_n)^T is the vector of the SEIR model residuals and A(ΩHj) = E(ϵϵ^T) is the n × n covariance matrix, which depends on the ARMA model parameter ΩHj. Then, the corresponding AIC under H0 and H1 can be calculated as

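$$
\mathrm{AIC}_{H_0}(t^{*}) = -2 \log\left( L_{H_0}(\hat{\theta}_0, \hat{\Omega}_0) \right) + 2 k_{H_0}, \tag{13}
$$

and

$$
\mathrm{AIC}_{H_1}(t^{*}) = -2 \log\left( L_{H_1}(\hat{\theta}_1, \hat{\Omega}_1) \right) + 2 k_{H_1}, \tag{14}
$$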

where θ̂j and Ω̂j are the least squares estimates of the parameters under Hj, j = 0, 1, for the given t*, and kHj is the corresponding model complexity. Let t̃H0 be the time node that minimizes AICH0(t*), that is,

$$
\tilde{t}_{H_0} = \arg\min_{t^{*}} \mathrm{AIC}_{H_0}(t^{*}), \tag{15}
$$

and the corresponding AIC value is AICH0 = AICH0(t̃H0). t̃H1 and AICH1 are defined similarly. The model is then selected as follows:

  • If AICH0 < AICH1, model (6) is used and the estimate of t* is t̃H0,

  • Otherwise, model (5) is used and the estimate of t* is t̃H1.
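The selection rule can be sketched in a few lines: evaluate the Gaussian log‐likelihood in (12) for a residual vector and a covariance matrix, turn it into an AIC as in (13) and (14), and compare the two minimized values. The helper names are hypothetical. The two AIC values passed to `select_model` are the ones reported in Section 3.1.

```python
import numpy as np


def gaussian_loglik(resid, cov):
    """Log of the likelihood in (12) for residual vector resid and covariance cov."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    _, logdet = np.linalg.slogdet(cov)             # log |A|
    quad = resid @ np.linalg.solve(cov, resid)     # resid^T A^{-1} resid
    return -0.5 * (logdet + n * np.log(2.0 * np.pi) + quad)


def aic(loglik, k):
    """AIC = -2 log L + 2 k, with k the number of fitted parameters."""
    return -2.0 * loglik + 2.0 * k


def select_model(aic_h0, aic_h1):
    """Apply the decision rule: no jump (6) if AIC_H0 is smaller, else jump (5)."""
    return "model (6)" if aic_h0 < aic_h1 else "model (5)"


# Sanity check on a toy case: zero residuals with identity covariance.
toy_loglik = gaussian_loglik(np.zeros(3), np.eye(3))

# Decision for the AIC values reported for the Wuhan data.
chosen = select_model(1085.99, 1059.82)
```

In practice the covariance matrix would come from the fitted ARMA parameters; building it is outside the scope of this sketch.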

3. RESULTS

In this section, the proposed method is applied to Wuhan pandemic data and the analysis results are used to demonstrate its performance. Simulation studies are also conducted to demonstrate the efficiency of pandemic prevention and control.

3.1. Real data analysis

We obtained the pandemic data on Wuhan from the Health Commission, with one‐day intervals, including the cumulative number of confirmed cases, death cases, and cured cases. The formula for calculating the number of confirmed cases yt on day t is

Confirmed cases y_t = Cumulative confirmed cases − (Cumulative deaths + Cumulative cures).
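As a small worked example of this formula, the snippet below derives the series of currently confirmed cases from three cumulative series. The daily figures are hypothetical placeholders, not the Health Commission data.

```python
import numpy as np

# Hypothetical cumulative counts for three consecutive days (placeholders only).
cum_confirmed = np.array([45, 62, 121])
cum_deaths = np.array([2, 2, 3])
cum_cured = np.array([16, 19, 24])

# Currently confirmed cases: cumulative confirmed minus cumulative deaths and cures.
active = cum_confirmed - (cum_deaths + cum_cured)
```

The resulting `active` series plays the role of y_t, and the sum of deaths and cures on the first day plays the role of R(0).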

Since a low number of COVID‐19 cases were identified until January 16, 2020, we chose data between January 16, 2020, and April 1, 2020, in our analysis. The sequence of confirmed cases yt is illustrated in Figure 2. The number of confirmed cases on January 16, 2020, was 27 and this reached 38,007 on February 19, 2020, when it peaked. Furthermore, there was an obvious jump in the data on February 13, 2020, the date on which clinical diagnoses were used as an identification measure of COVID‐19 infection. The cumulative number of confirmed cases on this day was 32,994, including 20,630 cases diagnosed using the original technique and 12,364 newly diagnosed cases. He et al. (2020) addressed this problem by dividing the pandemic situation into two stages and fitting the SEIR model separately. However, the separated model did not consider the correlation between these two stages, as the cumulative death and cumulative cure data are continuous.

FIGURE 2. The scatterplot of the number of confirmed cases in Wuhan. The red point represents the obvious jump on February 13, 2020

To address the heterogeneity caused by the jump, we use a segmented SEIR model in this study. First, we fit the data before February 13, 2020, using the proposed time‐varying model and then modify the model state based on the pandemic data released on February 13, 2020. We use this modified state as the initial state to fit the next stage of the SEIR model. The initial states of the segmented SEIR model can be formulated as follows:

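$$
\begin{aligned}
& S_i(0) + E_i(0) + I_i(0) + R_i(0) = N, \quad i = 1, 2; \\
& S_2(0) = S_1(28), \quad E_2(0) = E_1(28); \\
& I_2(0) = I_1(28) + 12364, \quad R_2(0) = R_1(28),
\end{aligned} \tag{16}
$$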

where i=1 represents the initial model states before the jump and i=2 represents the initial model states after the jump. S1(28), E1(28), I1(28), and R1(28) represent the susceptible, exposed, infected, and removed populations on February 13, 2020, and 12,364 is the number of newly clinically diagnosed cases caused by the use of clinical diagnoses.

Regarding the model estimation, we solve the ODE using the odeint function in the Python package scipy.integrate. A grid search is used to select t*. Specifically, we substitute each candidate value of t* into the model, estimate the remaining parameters according to (9), then compute the model MSE, and select the t* corresponding to the smallest MSE. For a given t*, we estimate the other parameters θ̂ using the Dual Annealing method (Xiang et al., 1997). We choose 10 random initial starting points for the optimization and select the one with the smallest model MSE. Although global optimization methods are usually less computationally efficient than gradient methods, they are more flexible and can handle multi‐modal, non‐smooth, and even discrete objective functions. All optimizations use the dual_annealing function of the scipy.optimize package in Python, with the Nelder–Mead method (Nelder & Mead, 1965) as the local search method.
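The grid search described above can be sketched on synthetic data as follows. This is a deliberately reduced illustration under several assumptions: only β1, β2, and q are fitted, N, E(0), γ1 and a constant γ2 are held at hypothetical values, a single annealing run with a small budget replaces the 10 random starts, and the default local search of `dual_annealing` is used. All names (`rhs`, `infected_curve`, `sse`, `t_star`) are hypothetical.

```python
import math
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import dual_annealing

# Hypothetical fixed quantities (assumptions for this sketch only).
N, E0, I0, R0 = 1e6, 100.0, 27.0, 0.0
GAMMA1, GAMMA2 = 0.18, 0.1
DAYS = np.arange(0, 40)


def rhs(state, t, beta1, beta2, q, t_star):
    """Modified SEIR system (4) with the jump transmission rate (5)."""
    S, E, I, R = state
    beta = beta1 if t < t_star else beta2 * math.exp(-q * (t - t_star))
    new_exposed = beta * S * (I + E) / N
    return [-new_exposed, new_exposed - GAMMA1 * E,
            GAMMA1 * E - GAMMA2 * I, GAMMA2 * I]


def infected_curve(params, t_star):
    y0 = [N - E0 - I0 - R0, E0, I0, R0]
    return odeint(rhs, y0, DAYS, args=(*params, t_star))[:, 2]


def sse(params, t_star, y):
    """Sum of squared errors, the objective in (9)."""
    return float(np.sum((y - infected_curve(params, t_star)) ** 2))


# Noise-free synthetic observations generated with t_star = 21.
y_obs = infected_curve((0.31, 0.18, 0.031), 21)

bounds = [(0.0, 1.0)] * 3          # search box for beta1, beta2, q
results = {}
for t_star in (17, 21, 25):        # candidate time nodes
    res = dual_annealing(sse, bounds, args=(t_star, y_obs),
                         maxiter=30, maxfun=1500, seed=0)
    results[t_star] = res.fun / len(DAYS)   # model MSE for this candidate

best_t_star = min(results, key=results.get)
```

With such a small optimization budget the sketch may not recover the generating parameters exactly; for real data a finer grid of candidates and larger `maxiter` would be needed.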

We choose the number of confirmed cases on January 16, 2020, as the estimation of I1(0) and the sum of cumulative deaths and cumulative cures on January 16, 2020, as the estimation of R1(0). Based on the results of the clinical data analysis, we obtain an average treatment duration of 20 days, which contains the 14‐day observation period mandated by China's health authorities. Therefore, we determine Imax=20 and Imin=3 when modelling the data. Some appropriate initial conditions for the parameters are listed in Table 1, where we assume 0<γ1<1/3 based on the fact that the median incubation period is 5.1 days, 0<a<77 is the range of time t, and the total population size N always contains the exposed population E(t).

TABLE 1. Initial conditions for the parameters

Parameter    Initial conditions
N            [10,000, 10,000,000]
E(0)         [0, 10,000]
β1           [0, 1]
β2           [0, 1]
γ1           [0, 1/3]
a            [0, 77]
b            [0, 100,000]
q            [0, 1]

According to the model selection algorithm provided in Section 2.4, we obtain AICH0 = 1085.99 and AICH1 = 1059.82. Because AICH0 > AICH1, model (5) with a jump in β(t) is selected. The estimate of the time node t* is t̃H1 = 21 and the search details are shown in Figure 3. The corresponding parameter estimates for the SEIR model are presented in Table 2. The estimated value of N is 3,393,863.90, which is approximately one‐third of the population of Wuhan. The estimated incubation period is 1/γ1 ≈ 5.58 days, which is close to the actual incubation period of 5.1 days obtained based on clinical observations.

FIGURE 3. Trend of the model MSE for each t*. When t* = 21, which represents February 6, 2020, the MSE achieves its minimum value
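Day indices in this section count from January 16, 2020 (day 0). A minimal helper, with the hypothetical name `day_to_date`, makes the mapping between indices and calendar dates explicit.

```python
from datetime import date, timedelta

START = date(2020, 1, 16)   # first day of the analysed series (day 0)


def day_to_date(day):
    """Convert a day index of the series into a calendar date."""
    return START + timedelta(days=day)


time_node = day_to_date(21)                    # selected time node
criterion_change = day_to_date(28)             # jump in the diagnostic criterion
last_index = (date(2020, 4, 1) - START).days   # index of the final observation
```

The series therefore spans day indices 0 to 76, that is, 77 daily observations, in line with the range assumed for the parameter a.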

TABLE 2. Parameter estimations of the SEIR model

Parameter    Estimation
N            3,393,863.90
E(0)         209.41
β1           0.31
β2           0.18
γ1           0.18
a            31.14
b            0.17
q            0.031
MSE          69,245.76
RMSE         263.15

The trend of the transmission rate β(t) is displayed in the left panel of Figure 4. Wuhan announced its quarantine policy on January 23, 2020, and decided to construct the Leishenshan Hospital on January 25, 2020. On February 3, 2020, Wuhan started construction of the Fangcang Hospitals and put them into use the next day. The estimated time node is February 6, 2020, which is about two incubation periods after the quarantine measures were applied in Wuhan.

FIGURE 4. Parameter estimations of the SEIR model. Left: trend of the transmission rate β(t). Middle: trend of the removal rate γ2(t). Right: trend of the effective regeneration number Reff(t), where the black dotted line represents Reff=1

The trend of removal rates γ2(t) is displayed in the middle panel of Figure 4. This shows that the change in γ2(t) in the initial stage of the pandemic was relatively slow, which indicates that the removal rate was low in the early stage and that the magnitude of the change was small due to the shortage of medical resources. However, there was a rapid upward trend around day a=31 (February 15, 2020), just a week after Leishenshan Hospital was put into use (February 8, 2020), which demonstrates that the investments in medical resources and clinical treatment were effective.

The effective regeneration number Reff(t) is used to determine the severity of an epidemic. The estimation of Reff(t) using the proposed SEIR model based on the Wuhan data is also shown in the right panel of Figure 4. In the early stage of the pandemic, the effective regeneration number was approximately 6, which indicates that COVID‐19 is highly infectious and lethal. With the intensification of intervention strategies and improvement in medical efficiency in Wuhan, the transmission of the virus was curbed and the medical level gradually improved. There was a rapid increase in γ2(t) and a rapid decrease in β(t), which resulted in a rapid decrease in Reff(t). At t0=28 (February 13, 2020), the effective regeneration number was less than 1, which means that the pandemic was under control. This is quite similar to the results of existing works such as Wang et al. (2020) and Rahman et al. (2020). The results of the model are consistent with clinical experience, experts' understanding, and the actual situation, which demonstrates the suitability of our proposed method. These results also show that the timely intervention strategies taken by Wuhan played a decisive role in curbing the pandemic.

The estimation of the mean function μ(t) using the new time‐varying parametric SEIR model and real data is shown in Figure 5. The solid red and blue lines represent the model predictions and the scatterplot is the observed data. This figure shows that the proposed model is an excellent fit for the Wuhan pandemic data and that any differences are subtle.

FIGURE 5. Predictions of confirmed cases yt based on the time‐varying parametric SEIR model

To investigate the robustness of the proposed model, we conduct a small simulation. In each replicate, we multiply each raw observation by a random value drawn from a uniform distribution on (0.9, 1.1). We repeat the simulation 1000 times and record the estimated time node t* and the corresponding model MSE. The selected t* and model MSE are shown in Figure 6. The median of t* is 20, just one day earlier than our estimate t̃H1 = 21. Moreover, the median MSE is close to the value in Table 2. In conclusion, the model is robust to perturbation of the observations.

FIGURE 6. Robustness of the proposed model. Left: histogram for estimated time node t*; Right: boxplot for model MSE, with the green dotted line representing the MSE reported in Table 2
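The perturbation step of this robustness check can be sketched as below. Only the generation of the 1000 perturbed data sets is shown; refitting the model to each replicate is omitted. The short observation vector and the name `perturb` are hypothetical.

```python
import numpy as np


def perturb(y, rng, low=0.9, high=1.1):
    """Multiply every observation by an independent Uniform(low, high) factor."""
    y = np.asarray(y, dtype=float)
    return y * rng.uniform(low, high, size=y.shape)


rng = np.random.default_rng(2020)
y_raw = np.array([27.0, 41.0, 94.0, 200.0, 380.0])     # placeholder observations

# One perturbed copy of the data per simulation replicate.
replicates = np.stack([perturb(y_raw, rng) for _ in range(1000)])
ratios = replicates / y_raw
```

Each row of `replicates` would then be passed through the same grid search and fitting procedure as the original data.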

3.2. Comparison

As mentioned above, there are two types of heterogeneity in the COVID‐19 data on Wuhan: that due to intervention strategies, especially the quarantine policy announced on January 23, 2020, and the obvious jump on February 13, 2020, which resulted from the implementation of a new diagnostic criterion. To investigate the performance of the proposed model, we compare it with four other models in terms of prediction accuracy.

  • (I) The original SEIR model with a noninfectious exposed population E and the parameters β and γ2 constant. We denote this model as SEIR;

  • (II) The modified SEIR model with an infectious exposed population E and a constant parameter β. We denote this model as MSEIRC;

  • (III) The modified SEIR model with a continuous time‐varying β(t). We denote this model as MSEIRN;

  • (IV) The two‐stage modified SEIR model, which divides the pandemic situation into two stages and fits the model separately. For each stage, we assume constant β and γ2. This model shares a similar idea with the one proposed in He et al. (2020). We denote this model as MSEIRS;

  • (V) The modified SEIR model with a jump in β(t), which is our proposed model. We denote this model as MSEIRJ.

We fit the above five models to the COVID‐19 data of Wuhan. For prediction accuracy, the following root mean squared error (RMSE) metric is used:

$$
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{I}(i) \right)^2 },
$$

where y1, y2, …, yn are the numbers of confirmed cases at times t1, t2, …, tn and Î(i) is the predicted number of infections at time i. The RMSEs of Models (I)–(V) are about 593.41, 406.22, 394.07, 324.14, and 263.15, respectively. Our model outperforms the other methods by providing the smallest RMSE. MSEIRC modifies the original SEIR model by assuming an infectious exposed population E and a time‐varying γ2, leading to a significant reduction in RMSE. MSEIRN improves on MSEIRC by allowing a continuous time‐varying transmission rate β(t), but the improvement is small. MSEIRS performs much better than MSEIRC and MSEIRN since it handles the heterogeneity by dividing the data into two stages and fitting the SEIR model separately, which in effect allows discontinuity in the estimated parameter functions. However, MSEIRS needs a pre‐defined time node to divide the process rather than an adaptive estimate, and it neglects the correlation between the two stages. The proposed model, MSEIRJ, further outperforms MSEIRS. In particular, MSEIRJ assumes that one jump exists in the time‐varying transmission parameter β(t), leading to a sharp decrease that captures transmission heterogeneity. Moreover, MSEIRJ estimates the time lag of interventions adaptively. The estimated time node is February 6, 2020, about two incubation periods after the quarantine measures were announced in Wuhan.
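The RMSE metric is simple enough to state directly in code. The function name `rmse` and the toy vectors are hypothetical; the last line only checks that the MSE and RMSE reported in Table 2 are consistent with each other.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error between observations y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


toy = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # sqrt(4 / 3)

# Consistency of the values in Table 2: RMSE is the square root of the MSE.
table2_rmse = float(np.sqrt(69245.76))
```

Because the RMSE is expressed in numbers of cases, it can be compared directly with the size of the confirmed-case series.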

The residuals of Models (II)–(V) are shown in Figure 7; Model (I) is omitted because of its poor performance. Our proposed model MSEIRJ provides a stable prediction throughout the whole pandemic. In the early stage, the four models perform similarly. Large differences appear after the obvious jump on February 13, 2020. On February 13, the absolute residuals of MSEIRC and MSEIRN reach their peaks because these models cannot handle the heterogeneity. MSEIRS has a zero residual on this day since it uses the real data as the initial conditions for the second stage. For MSEIRJ, the prediction performance is acceptable owing to the sharp decrease of β(t) estimated adaptively on February 6. As the pandemic develops, a constant parameter captures less information than a time‐varying parameter, and MSEIRS performs much worse than MSEIRJ during the latter part of the pandemic. Overall, our proposed model handles the heterogeneity in the data much better.

FIGURE 7. Comparison of residuals for four different models. The red nodes denote the record times

3.3. Residual analysis

The autocorrelation coefficients and partial autocorrelation coefficients of the model residuals are displayed in Figure 8, which shows a strong autocorrelation. In Section 2.3, the ARMA time series model is suggested to fit the residuals. Here, we use the augmented Dickey–Fuller test to check stationarity. The p‐value of the augmented Dickey–Fuller test for the residuals is 2.1×10^−5, so the unit‐root null hypothesis is rejected and the SEIR model residuals satisfy the stationarity assumption. Meanwhile, the Durbin–Watson test and Shapiro–Wilk test were used to test the correlation and normality of the ARMA model residuals, and the corresponding statistics are D=1.962 and W=0.991, respectively, which indicates that the ARMA model residuals meet the assumption of independent normality. With model selection, which also uses the AIC, the ARMA(1,1) model is chosen to fit the residuals under the alternative hypothesis H1. The autocorrelation coefficients and partial autocorrelation coefficients of the ARMA model residuals are displayed in Figure 9, which indicates that the autocorrelation of the model residuals is eliminated. Finally, our model MSE is 40,389.63 and RMSE is 200.97, which are low relative to the magnitude of the case counts.
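Two of the diagnostics named above can be sketched with NumPy and SciPy alone: the Durbin–Watson statistic computed from its definition and the Shapiro–Wilk test via `scipy.stats.shapiro`. The residual vector here is synthetic white noise, an assumption used only to show what values close to D ≈ 2 and W ≈ 1 look like. The augmented Dickey–Fuller test is not included in this sketch.

```python
import numpy as np
from scipy.stats import shapiro


def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))


rng = np.random.default_rng(0)
white = rng.normal(0.0, 1.0, size=77)     # synthetic stand-in for ARMA residuals

dw = durbin_watson(white)                  # close to 2 when there is no lag-1 correlation
w_stat, p_value = shapiro(white)           # W close to 1 supports normality
```

A Durbin–Watson value near 2, such as the reported D=1.962, indicates little first‐order autocorrelation, while values near 0 or 4 indicate strong positive or negative autocorrelation.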

FIGURE 9. ARMA model predictions. Left: predictions of the ARMA model. Middle: autocorrelation coefficients of the ARMA model residuals. Right: partial autocorrelation coefficients of the ARMA model residuals.

3.4. Utility analysis of the intervention strategies

To show the efficiency of the intervention strategies in curbing the pandemic, we conduct a small simulation study. Assuming that the intervention strategies are delayed by δ days, where 0 ≤ δ ≤ 14, the pandemic situation is simulated using a time‐varying parametric SEIR model with the parameter values in Table 2. For a fair comparison, for each 1 ≤ δ ≤ 14, the number of new clinical diagnoses at the jump is set equal to that for δ=0, which is 12,364. In reality, this number could be higher as δ increases because the intervention strategies are delayed.

Figure 10 shows the relationship between the number of days δ by which the intervention strategies are delayed and the number of confirmed cases y_t. Figure 11 shows the relationship between δ and the maximum pandemic size. As δ increases, the maximum pandemic size increases rapidly. Specifically, when δ=14, which means a two‐week delay in the intervention strategies, the final cumulative number of confirmed cases is 172,293, which is 4.6 times that when δ=0. Thus, for the Wuhan pandemic, timely intervention strategies played a crucial role in the prevention and control of COVID‐19.
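The delay experiment can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a generic SEIR system in which the exposed compartment is also infectious, a piecewise‐constant β(t) that drops at a jump day shifted by δ, and forward‐Euler integration. All parameter values and names (`simulate_final_size`, `jump_day`, `beta_before`, `beta_after`, and so on) are hypothetical and are not the estimates in Table 2, and the sketch does not impose the fairness constraint on the number of new diagnoses at the jump.

```python
def simulate_final_size(delay_days, jump_day=20, beta_before=0.3, beta_after=0.05,
                        sigma=1 / 5.1, gamma=0.1, population=1.0e7,
                        initial_exposed=100.0, horizon_days=300, steps_per_day=10):
    """Forward-Euler integration of a generic SEIR model in which both the
    exposed (E) and infectious (I) compartments transmit the disease.
    beta(t) jumps from beta_before to beta_after at day jump_day + delay_days.
    Returns the cumulative flow from E into I, used here as a proxy for the
    final cumulative number of cases. All defaults are hypothetical."""
    dt = 1.0 / steps_per_day
    s = population - initial_exposed
    e, i, r = initial_exposed, 0.0, 0.0
    cumulative_cases = 0.0
    for step in range(horizon_days * steps_per_day):
        t = step * dt
        # Piecewise-constant transmission rate with a delayed jump.
        beta = beta_before if t < jump_day + delay_days else beta_after
        new_exposed = beta * s * (e + i) / population * dt
        new_infectious = sigma * e * dt
        new_removed = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        cumulative_cases += new_infectious
    return cumulative_cases


baseline = simulate_final_size(0)
for delay in (0, 7, 14):
    size = simulate_final_size(delay)
    print(f"delay={delay:2d} days  final size={size:12.0f}  ratio={size / baseline:.2f}")
```

Because the epidemic is still growing exponentially while β(t) remains at its pre‐intervention level, each additional day of delay multiplies the number of infections that the intervention must then suppress, which is the mechanism behind the rapid growth of the final size with δ.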

FIGURE 10. Relationship between the delayed days δ and number of confirmed cases y_t in Wuhan.

FIGURE 11. Relationship between the delayed days δ and maximum size of the Wuhan pandemic.

4. CONCLUSION

In this study, a novel time‐varying parametric SEIR model is proposed for COVID‐19. The contributions of the new model are as follows. First, the model accounts for the fact that COVID‐19 is also infectious during the incubation period by modifying the differential equations. Second, time‐varying parametric models are adopted for the transmission parameters to describe the changes in the dynamic propagation of the pandemic caused by medical resource investments and intervention strategies. Unlike the models proposed by Chowell et al. (2004) and Lekone and Finkenstädt (2006), our model uses a discontinuous time‐varying transmission rate β(t) with a jump. A model selection algorithm based on the AIC is also provided to detect the jump. Third, in addition to the intervention strategies captured by the time‐varying transmission rate β(t), we assume that the removal rate γ2(t) is a function of time and propose an S‐trend model to account for the improvement in medical efficiency. Although the S‐trend model is our subjective choice, the parameter estimates of the removal rate in the application suggest that the generalized form of the S‐trend is reasonable for COVID‐19.

The proposed method is applied to Wuhan data, and the strong autocorrelation of the SEIR model residuals is modelled using a time series method. The results show that our method fits the pandemic trend of Wuhan well, with a small model MSE. The estimates of the model parameters are reasonable. The comparison results show that the proposed model rests on reasonable assumptions and handles heterogeneity well. According to the results of the new method, intervention strategies such as quarantining and constructing Fangcang Hospital were efficient. In Wuhan, the intervention strategies taken by the local government were effective and timely in curbing the pandemic, which was largely under control by February 13, 2020. A small simulation based on the new model with the estimated parameter values shows the impact of the timeliness of these measures on the development of the epidemic: if the intervention strategies had been delayed by 14 days, the pandemic would have been 4.6 times larger. Therefore, such timely intervention strategies played a decisive role in curbing the spread of the pandemic. Overall, the proposed model outperforms the competing models considered here and provides results consistent with existing research.

Uncertainty quantification (UQ) for the proposed model is an important and challenging problem. There are two main ways to carry out a UQ analysis. One is to derive the asymptotic properties of the estimator θ̂; a bootstrap approach can help to obtain the empirical distributions of the parameter estimates. The other is to use the Bayesian framework (e.g., see Bu et al., 2022; D'Agostino McGowan et al., 2021; Taghizadeh et al., 2020). By eliminating the mean trend and specifying proper priors, we could compute the posterior distribution of the model parameters based on the likelihood function. The specific parameter inference and UQ analysis will be investigated in future work.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China under Grants 11801034, 12171033 and 11901573 and by the Beijing Natural Science Foundation under Grant Z200001.

Bai, T., Wang, D., & Dai, W. (2022). A modified SEIR model with a jump in the transmission parameter applied to COVID‐19 data on Wuhan. Stat, 11(1), e511. 10.1002/sta4.511

Tian Bai and Dianpeng Wang are joint first authors.


DATA AVAILABILITY STATEMENT

The data used in this paper are openly available at https://ncov.dxy.cn/ncovh5/view/pneumonia.

REFERENCES

  1. Ajbar, A., Alqahtani, R. T., & Boumaza, M. (2021). Dynamics of an SIR‐based COVID‐19 model with linear incidence rate, nonlinear removal rate, and public awareness. Frontiers in Physics, 215, 1–13.
  2. Akaike, H. (1998). Information Theory and an Extension of the Maximum Likelihood Principle. Selected Papers of Hirotugu Akaike (pp. 199–213). Springer New York.
  3. Anderson, R. M., & May, R. M. (1992). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
  4. Bortz, D. M., & Nelson, P. W. (2006). Model selection and mixed‐effects modeling of HIV infection dynamics. Bulletin of Mathematical Biology, 68(8), 2005–2025.
  5. Bu, F., Aiello, A. E., Xu, J., & Volfovsky, A. (2022). Likelihood‐based inference for partially observed epidemics on dynamic networks. Journal of the American Statistical Association, 117(537), 510–526.
  6. Burckhardt, R. M., Dennehy, J. J., Poon, L. L. M., Saif, L. J., & Enquist, L. W. (2022). Are COVID‐19 vaccine boosters needed? The science behind boosters. Journal of Virology, 96(3), e01973–21.
  7. Calafiore, G. C., Novara, C., & Possieri, C. (2020). A time‐varying SIRD model for the COVID‐19 contagion in Italy. Annual Reviews in Control, 50, 361–372.
  8. Chowell, G., Hengartner, N. W., Castillo‐Chavez, C., Fenimore, P. W., & Hyman, J. M. (2004). The basic reproductive number of Ebola and the effects of public health measures: The cases of Congo and Uganda. Journal of Theoretical Biology, 229(1), 119–126.
  9. D'Agostino McGowan, L., Grantz, K. H., & Murray, E. (2021). Quantifying uncertainty in mechanistic models of infectious disease. American Journal of Epidemiology, 190(7), 1377–1385.
  10. Dye, C., & Gay, N. (2003). Modeling the SARS Epidemic. Science, 300(5627), 1884–1885.
  11. Eguchi, S., & Uehara, Y. (2021). Schwartz‐type model selection for ergodic stochastic differential equation models. Scandinavian Journal of Statistics, 48(3), 950–968.
  12. Eifan, S. A., Nour, I., Hanif, A., Zamzam, A. M. M., & AlJohani, S. M. (2017). A pandemic risk assessment of Middle East respiratory syndrome coronavirus (MERS‐CoV) in Saudi Arabia. Saudi Journal of Biological Sciences, 24(7), 1631–1638.
  13. Ghostine, R., Gharamti, M., Hassrouny, S., & Hoteit, I. (2021). An extended SEIR model with vaccination for forecasting the COVID‐19 pandemic in Saudi Arabia using an ensemble Kalman filter. Mathematics, 9(6), 636.
  14. Hao, X., Cheng, S., Wu, D., Wu, T., Lin, X., & Wang, C. (2020). Reconstruction of the full transmission dynamics of COVID‐19 in Wuhan. Nature, 584(7821), 420–424.
  15. He, S., Peng, Y., & Sun, K. (2020). SEIR modeling of the COVID‐19 and its dynamics. Nonlinear Dynamics, 101(3), 1667–1680.
  16. Huang, D.‐S., Guan, P., & Zhou, B.‐S. (2004). Research on fitting of SIR model on prevalence of SARS in Beijing city. Chinese Journal of Disease Control and Prevention, 8(5), 398–401.
  17. Jo, H., Son, H., Hwang, H. J., & Jung, S. Y. (2020). Analysis of COVID‐19 spread in South Korea using the SIR model with time‐dependent parameters and deep learning. medRxiv.
  18. Kwon, C.‐M., & Jung, J. U. (2016). Applying discrete SEIR model to characterizing MERS spread in Korea. International Journal of Modeling, Simulation, and Scientific Computing, 7(4), 1643003.
  19. Lauer, S. A., Grantz, K. H., Bi, Q., Jones, F. K., Zheng, Q., Meredith, H. R., Azman, A. S., Reich, N. G., & Lessler, J. (2020). The incubation period of coronavirus disease 2019 (COVID‐19) from publicly reported confirmed cases: Estimation and application. Annals of Internal Medicine, 172(9), 577–582.
  20. Lekone, P. E., & Finkenstädt, B. F. (2006). Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics, 62(4), 1170–1177.
  21. Long, J., Khaliq, A. Q. M., & Furati, K. M. (2021). Identification and prediction of time‐varying parameters of COVID‐19 model: A data‐driven deep learning approach. International Journal of Computer Mathematics, 98(8), 1617–1632.
  22. Matabuena, M., Rodríguez‐Mier, P., García‐Meixide, C., & Leborán, V. (2021). COVID‐19: Estimation of the transmission dynamics in Spain using a stochastic simulator and black‐box optimization techniques. Computer Methods and Programs in Biomedicine, 211, 106399.
  23. Miao, H., Dykes, C., Demeter, L. M., & Wu, H. (2009). Differential equation modeling of HIV viral fitness experiments: Model identification, model selection, and multimodel inference. Biometrics, 65(1), 292–300.
  24. Miao, H., Jin, X., Perelson, A. S., & Wu, H. (2012). Evaluation of multitype mathematical models for CFSE‐labeling experiment data. Bulletin of Mathematical Biology, 74(2), 300–326.
  25. Ndaïrou, F., Area, I., Nieto, J. J., & Torres, D. F. M. (2020). Mathematical modeling of COVID‐19 transmission dynamics with a case study of Wuhan. Chaos, Solitons & Fractals, 135, 109846.
  26. Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.
  27. Ng, T. W., Turinici, G., & Danchin, A. (2003). A double epidemic model for the SARS propagation. BMC Infectious Diseases, 3(1), 1–16.
  28. Qian, Z., Alaa, A. M., & van der Schaar, M. (2020). When and How to Lift the Lockdown? Global COVID‐19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., & Lin, H. (Eds.), Advances in Neural Information Processing Systems, Vol. 33: Curran Associates, Inc., pp. 10729–10740.
  29. Quick, C., Dey, R., & Lin, X. (2021). Regression models for understanding COVID‐19 epidemic dynamics with incomplete data. Journal of the American Statistical Association, 116(536), 1561–1577.
  30. Rahman, B., Sadraddin, E., & Porreca, A. (2020). The basic reproduction number of SARS‐CoV‐2 in Wuhan is about to die out, how about the rest of the world? Reviews in Medical Virology, 30(4), e2111.
  31. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
  32. Small, M., Shi, P., & Tse, C. K. (2004). Plausible models for propagation of the SARS virus. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 87(9), 2379–2386.
  33. Sun, J., Chen, X., Zhang, Z., Lai, S., Zhao, B., Liu, H., Wang, S., Huan, W., Zhao, R., Zheng, Y., & Ng, M. T. A. (2020). Forecasting the long‐term trend of COVID‐19 epidemic using a dynamic model. Scientific Reports, 10(1), 1–10.
  34. Taghizadeh, L., Karimi, A., & Heitzinger, C. (2020). Uncertainty quantification in epidemiological models for the COVID‐19 pandemic. Computers in Biology and Medicine, 125, 104011.
  35. Tang, L., Sun, K., Chen, F. F., & Li, D. M. (2019). Progress on estimation and projection of HIV epidemics. Zhonghua liu Xing Bing xue za zhi= Zhonghua Liuxingbingxue Zazhi, 40(6), 731–738.
  36. Tian, T., Tan, J., Jiang, Y., Wang, X., & Zhang, H. (2021). Evaluate the risk of resumption of business for the states of New York, New Jersey and Connecticut via a pre‐symptomatic and asymptomatic transmission model of COVID‐19. Journal of Data Science, 19(2), 178–196.
  37. Wang, D., & Zhao, X. (2003). Empirical analysis and forecasting for SARS epidemic situation. Beijing da xue xue bao. Yi xue ban= Journal of Peking University. Health sciences, 35, 72–74.
  38. Wang, H., Wang, Z., Dong, Y., Chang, R., Xu, C., Yu, X., Zhang, S., Tsamlag, L., Shang, M., Huang, J., Wang, Y., Xu, G., Shen, T., Zhang, X., & Cai, Y. (2020). Phase‐adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China. Cell Discovery, 6(1), 1–8.
  39. Wang, K., Zhao, S., Li, H., Song, Y., Wang, L., Wang, M. H., Peng, Z., Li, H., & He, D. (2020). Real‐time estimation of the reproduction number of the novel coronavirus disease (COVID‐19) in China in 2020 based on incidence data. Annals of Translational Medicine, 8(11), 1–7.
  40. Xiang, Y., Sun, D. Y., Fan, W., & Gong, X. G. (1997). Generalized simulated annealing algorithm and its application to the Thomson model. Physics Letters A, 233(3), 216–220.
  41. Xu, G.‐X., Feng, E.‐M., Wang, Z.‐T., Tan, X.‐X., & Zhi‐Long, X. (2005). SEIR dynamic model of SARS epidemic and parameter identification. Journal of Natural Science of Heilongjiang University, 4, 43–46+51.
  42. Yang, Z., Zeng, Z., Wang, K., Wong, S.‐S., Liang, W., Zanin, M., Liu, P., Cao, X., Gao, Z., Mai, Z., Liang, J., Liu, X., Li, S., Li, Y., Ye, F., Guan, W., Yang, Y., Li, F., Luo, S., …, & He, J. (2020). Modified SEIR and AI prediction of the epidemics trend of COVID‐19 in China under public health interventions. Journal of Thoracic Disease, 12(3), 165.
  43. Youssef, H. M., Alghamdi, N. A., Ezzat, M. A., El‐Bary, A. A., & Shawky, A. M. (2020). A modified SEIR model applied to the data of COVID‐19 spread in Saudi Arabia. AIP Advances, 10(12), 125210.
  44. Zakary, O., Larrache, A., Rachik, M., & Elmouki, I. (2016). Effect of awareness programs and travel‐blocking operations in the control of HIV/AIDS outbreaks: A multi‐domains SIR model. Advances in Difference Equations, 2016(1), 1–17.
  45. Zakary, O., Rachik, M., & Elmouki, I. (2016). On the impact of awareness programs in HIV/AIDS prevention: An SIR model with optimal control. International Journal of Computer Applications, 133(9), 1–6.
  46. Zhang, X., Cao, J., & Carroll, R. J. (2015). On the selection of ordinary differential equation models with application to predator‐prey dynamical models. Biometrics, 71(1), 131–138.
