Scientific Reports. 2022 Apr 23;12:6675. doi: 10.1038/s41598-022-10723-w

A new estimation method for COVID-19 time-varying reproduction number using active cases

Agus Hasan 1, Hadi Susanto 2,3, Venansius Tjahjono 4, Rudy Kusdiantara 5, Endah Putri 4, Nuning Nuraini 5, Panji Hadisoemarto 6
PMCID: PMC9035172  PMID: 35461352

Abstract

We propose a new method to estimate the time-varying effective (or instantaneous) reproduction number of the novel coronavirus disease (COVID-19). The method is based on a discrete-time stochastic augmented compartmental model that describes the virus transmission. A two-stage estimation method is used, with case uncertainties assumed to follow a Gaussian distribution: the Extended Kalman Filter (EKF) estimates the reported state variables (active and removed cases), and a low-pass filter based on a rational transfer function removes short-term fluctuations of the reported cases. Our method does not require information regarding serial intervals, which makes the estimation procedure simpler without reducing the quality of the estimate. We show that the proposed method is comparable to common approaches, e.g., age-structured and new-case-based sequential Bayesian models. We also apply it to COVID-19 cases in the Scandinavian countries Denmark, Sweden, and Norway, where the positive rates were below the 5% threshold recommended by WHO.

Subject terms: Viral infection, Computational models

Introduction

The coronavirus disease 2019 (COVID-19), a disease outbreak of atypical pneumonia that originated in Wuhan, China [1], had caused at least 250 million confirmed cases globally, including an estimated 5 million deaths, in approximately 221 countries and territories by November 2021. The World Health Organization (WHO) declared the COVID-19 crisis a pandemic on 11 March 2020.

In modelling the disease’s transmission, as well as in informing and evaluating control policies, it is particularly important to estimate its reproduction number. Early estimates of the COVID-19 basic reproduction number R0, which denotes the transmission potential of an infectious disease when introduced to a completely susceptible population, ranged from 1.4 to 6.49 [2]. The effective (or instantaneous) reproduction number Rt, on the other hand, reflects the extent of transmission in the presence of population immunity or intervention. Thus, the estimation of Rt is important for evaluating the success of public health measures. However, estimation of Rt is sensitive to the model structure and parameter assumptions [3]. As a case in point, after more individual case information and travel data were incorporated, the estimate of R0 in Wuhan was revised upward from 2.2–2.7 to 5.7 [4]. On the other hand, data unavailability or poor data quality often hinders the use of certain estimation methods; for example, serial interval data are usually needed to estimate Rt (e.g., Fraser [5], Wallinga and Teunis [6], Cauchemez et al. [7], White and Pagano [8]).

Calculating an accurate value of Rt, especially when the epidemic has not yet reached its peak, requires precise assumptions and data estimates. Nishiura et al. [9] discussed a likelihood-based approach to estimate Rt from early epidemic growth data, while Cazelles et al. [10] used stochastic models for the disease dynamics coupled with a particle Markov chain Monte Carlo algorithm. Using the compartmental Susceptible-Infectious-Recovered (SIR) model, Bettencourt and Ribeiro [11] used incidence data to estimate R0 and Rt. In this paper, taking the Susceptible-Infectious-Recovered-Dead (SIRD) model as a reference, we develop a novel approach to estimate Rt of COVID-19. It uses the numbers of infected or active (I), recovered (R), and death (D) cases, which are readily available for all affected countries. This method does not require information regarding serial intervals, which makes the estimation procedure simpler without reducing the quality of the estimate. We assume that population testing is sufficient, in the sense that the positive rate is below the 5% threshold recommended by WHO. This ensures that the quality of the infection data is acceptable, since asymptomatic carrier transmission is often underestimated [12].

The reproduction number is estimated from reported cases under uncertainties using a two-stage estimation method based on the Extended Kalman Filter (EKF) and a low-pass filter. The method considers not only the nominal number of reported cases but also its daily pattern. To show our method’s practical ability, we apply it to COVID-19 cases in the Scandinavian countries Denmark, Sweden, and Norway, and compare the results with two commonly used Bayesian methods due to Bettencourt and Ribeiro [11] and Cori et al. [5, 13]. We show that the results are indeed comparable. Note that a similar approach, developed independently, can be found in [14]. The difference lies in the computational technique used to estimate the reproduction number: in this paper we use the EKF, while in [14] a Kalman smoother was used.

A discrete-time stochastic augmented compartmental model

Our estimation method is based on the compartmental SIRD model that can be written as the following first-order nonlinear differential equations:

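$$\dot{S}(t) = -\beta \frac{I(t) S(t)}{N}, \tag{1}$$
$$\dot{I}(t) = \beta \frac{I(t) S(t)}{N} - (\gamma + \kappa) I(t), \tag{2}$$
$$\dot{R}(t) = \gamma I(t), \tag{3}$$
$$\dot{D}(t) = \kappa I(t), \tag{4}$$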

where S, I, R, and D denote the numbers of susceptible, active, recovered, and deceased individuals, respectively. N is the total population, β is the average number of contacts per person per unit time, and γ and κ are the recovery and death rates. Note that the value of β is time-varying due to intervention, i.e., β = β(t). To use the model, we require information on the average infectious time Ti and the Case Fatality Rate (CFR), so that

$$\gamma = \frac{1 - \mathrm{CFR}}{T_i}, \qquad \kappa = \frac{\mathrm{CFR}}{T_i}. \tag{5}$$

For COVID-19, we take Ti = 9, as the infectious period lasts on average 9 days (7–11 days, 95% CI) [15], while the CFR is assumed to be around 1%. The time-varying effective reproduction number is then given by:

$$R_t(t) = \frac{S(t)}{N} \frac{\beta(t)}{\gamma + \kappa} \approx \frac{\beta(t)}{\gamma + \kappa}. \tag{6}$$

The approximation holds under the assumption that government intervention is taken at an early stage, so that the susceptible population remains approximately equal to the total population, i.e., S(t) ≈ N. This is the case especially for emerging diseases. We modify the SIRD model by augmenting the system with the following two equations:

$$\dot{E}(t) = (\gamma + \kappa) I(t) - E(t), \qquad \dot{R}_t(t) = 0. \tag{7}$$

The former equation takes into account the daily number of newly reported cases E, while the latter says that the effective reproduction number Rt is assumed to be a piecewise constant function with a jump at every 1-day interval.

Discretizing the model using the forward Euler method, we obtain the following discrete-time augmented SIRD model:

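$$S(k+1) = \left(1 - \frac{(\gamma + \kappa)\Delta t}{N} R_t(k) I(k)\right) S(k), \tag{8}$$
$$I(k+1) = \left(1 - (\gamma + \kappa)\Delta t\right) I(k) + \frac{(\gamma + \kappa)\Delta t}{N} R_t(k) I(k) S(k), \tag{9}$$
$$R(k+1) = R(k) + \gamma \Delta t\, I(k), \tag{10}$$
$$D(k+1) = D(k) + \kappa \Delta t\, I(k), \tag{11}$$
$$E(k+1) = (\gamma + \kappa)\Delta t\, I(k) + (1 - \Delta t) E(k), \tag{12}$$
$$R_t(k+1) = R_t(k). \tag{13}$$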
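The step from (1)–(4) to (8)–(13) can be made concrete with a short sketch. Substituting β(k) = Rt(k)(γ+κ), which follows from the approximation in (6), into the forward Euler update S(k+1) = S(k) − Δt β(k) I(k) S(k)/N gives (8), and the remaining equations follow in the same way. The Python below is only an illustration and is not taken from the authors' MATLAB code; the function names `sird_rates` and `f`, as well as the population and case numbers in the usage lines, are assumptions made here.

```python
from typing import List, Tuple


def sird_rates(t_i: float = 9.0, cfr: float = 0.01) -> Tuple[float, float]:
    """Recovery and death rates from Eq. (5)."""
    gamma = (1.0 - cfr) / t_i
    kappa = cfr / t_i
    return gamma, kappa


def f(x: List[float], n_pop: float, gamma: float, kappa: float,
      dt: float = 0.01) -> List[float]:
    """One forward Euler step of the augmented SIRD model, Eqs. (8)-(13)."""
    s, i, r, d, e, rt = x
    a = (gamma + kappa) * dt            # removal fraction per time step
    infect = a / n_pop * rt * i * s     # new infections in this step
    return [
        s - infect,                     # Eq. (8)
        (1.0 - a) * i + infect,         # Eq. (9)
        r + gamma * dt * i,             # Eq. (10)
        d + kappa * dt * i,             # Eq. (11)
        a * i + (1.0 - dt) * e,         # Eq. (12)
        rt,                             # Eq. (13): Rt held constant
    ]


# Usage with hypothetical numbers (not data from the paper).
gamma, kappa = sird_rates()             # Ti = 9 days, CFR = 1%
x0 = [5_799_900.0, 100.0, 0.0, 0.0, 0.0, 2.0]
x1 = f(x0, n_pop=5_800_000.0, gamma=gamma, kappa=kappa)
```

With Ti = 9 the removal rate is γ + κ = 1/9 per day, so under (6) the reproduction number is roughly 9β. The step also conserves S + I + R + D, which is a convenient sanity check on any implementation.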

Our method computes a new estimate of Rt based on newly reported cases. Since the reporting frequency is low (possibly once a day), the reported data can be interpolated using, e.g., a modified Akima cubic Hermite interpolation, so that they match the time step Δt. In our simulation, the time step is chosen as Δt = 0.01, i.e., 100 time steps per day. The confidence interval of our estimated Rt is determined by computing the reproduction number for different values of the infectious period Ti within a certain interval.
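To illustrate how daily reports can be brought onto the finer grid, the sketch below upsamples a daily series to Δt = 0.01. Note that it uses plain linear interpolation as a simpler stand-in, not the modified Akima scheme mentioned above; the function name `upsample_linear` and the sample values are assumptions for illustration only.

```python
from typing import List


def upsample_linear(daily: List[float], dt: float = 0.01) -> List[float]:
    """Linearly interpolate daily values onto a grid with step dt (in days).

    Stand-in for the modified Akima interpolation named in the text.
    """
    steps = round(1.0 / dt)             # sub-steps per day, 100 for dt = 0.01
    out: List[float] = []
    for day in range(len(daily) - 1):
        y0, y1 = daily[day], daily[day + 1]
        for j in range(steps):
            out.append(y0 + (y1 - y0) * j / steps)
    out.append(daily[-1])               # keep the final reported value
    return out


# Three hypothetical daily reports become 201 samples.
fine = upsample_linear([0.0, 100.0, 300.0])
```

A shape-preserving cubic scheme such as modified Akima would give a smoother curve between reports, which matters because the EKF is driven by these interpolated values at every sub-step.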

To simplify the presentation, we define the augmented state vector

$$x(k+1) = \begin{pmatrix} S(k+1) & I(k+1) & R(k+1) & D(k+1) & E(k+1) & R_t(k+1) \end{pmatrix}^{\top}, \tag{14}$$

and as such, the discrete-time augmented SIRD model (8)–(13) can be written as follows

$$x(k+1) = f(x(k)) + w(k), \tag{15}$$

where f is the nonlinear right-hand side of (8)–(13) and w is an uncertainty term that models the inaccuracies due to simplifications in the modelling. The uncertainty is assumed to be zero-mean Gaussian white noise with known covariance QF. This assumption simplifies the calculation, since actual epidemic data usually follow a Gamma distribution. In practice, QF can be considered a tuning parameter for the EKF. Thus, the transmission model becomes a discrete-time stochastic augmented SIRD model.

Reported cases, such as the number of active cases and the cumulative numbers of recovered and deceased cases, can be incorporated into the model using the following output vector

$$y(k+1) = C x(k) + v(k). \tag{16}$$

Here, v denotes uncertainties due to false testing results. We also assume this uncertainty to be zero-mean Gaussian white noise with known covariance RF. Like QF, RF can be considered a tuning parameter. Following the available data, which include I, R, D, and E, the data/measurement matrix C is taken to be

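$$C = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}. \tag{17}$$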

A two-stage filtering method

A two-stage filtering method is used to estimate the daily reproduction number Rt. The method consists of the EKF and a low-pass filter. In the first stage of estimation, the EKF is used to estimate the state variables and the value of Rt under uncertainties in the number of reported cases. Afterwards, the low-pass filter is used to remove short-term fluctuations of the reported cases, which can be caused by delays in reporting. For example, in Denmark 893 recovered patients were suddenly reported on 1 April 2020, whereas no recoveries at all had been reported from 16 February 2020 until then. Such an accumulated delay can cause a falsely decreasing value of Rt.

The EKF is an extension of the Kalman filter to nonlinear systems. The Kalman filter itself is based on recursive Bayesian estimation and is an optimal linear filter. The idea of the EKF is to linearize the nonlinearity around the current estimate. Because of that linearization, the optimality and stability of the EKF cannot be guaranteed. However, if the nonlinearity is not severe, the EKF can give a reasonably good estimate.

Let x̂(k) denote the state vector estimated by the EKF. Applying a first-order Taylor series expansion to f at x̂(k), we obtain

$$f(x(k)) = f(\hat{x}(k)) + J_f(\hat{x}(k)) \left(x(k) - \hat{x}(k)\right), \tag{18}$$

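where J_f(x̂(k)) is the Jacobian matrix of f, given by:

$$J_f(\hat{x}(k)) = \begin{pmatrix}
J_{11}(\hat{x}(k)) & J_{12}(\hat{x}(k)) & 0 & 0 & 0 & J_{16}(\hat{x}(k)) \\
J_{21}(\hat{x}(k)) & J_{22}(\hat{x}(k)) & 0 & 0 & 0 & J_{26}(\hat{x}(k)) \\
0 & \gamma \Delta t & 1 & 0 & 0 & 0 \\
0 & \kappa \Delta t & 0 & 1 & 0 & 0 \\
0 & (\gamma + \kappa)\Delta t & 0 & 0 & 1 - \Delta t & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \tag{19}$$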

where

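$$J_{11}(\hat{x}(k)) = 1 - \frac{(\gamma + \kappa)\Delta t}{N} \hat{R}_t(k) \hat{I}(k), \tag{20}$$
$$J_{12}(\hat{x}(k)) = -\frac{(\gamma + \kappa)\Delta t}{N} \hat{R}_t(k) \hat{S}(k), \tag{21}$$
$$J_{16}(\hat{x}(k)) = -\frac{(\gamma + \kappa)\Delta t}{N} \hat{I}(k) \hat{S}(k), \tag{22}$$
$$J_{21}(\hat{x}(k)) = \frac{(\gamma + \kappa)\Delta t}{N} \hat{R}_t(k) \hat{I}(k), \tag{23}$$
$$J_{22}(\hat{x}(k)) = 1 - (\gamma + \kappa)\Delta t + \frac{(\gamma + \kappa)\Delta t}{N} \hat{R}_t(k) \hat{S}(k), \tag{24}$$
$$J_{26}(\hat{x}(k)) = \frac{(\gamma + \kappa)\Delta t}{N} \hat{I}(k) \hat{S}(k). \tag{25}$$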
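The Jacobian entries (19)–(25) are the partial derivatives of the right-hand sides of (8)–(13) with respect to the six state variables. As a hedged illustration, the sketch below codes both the model step and its Jacobian so that one can be checked against the other numerically. The function names, the state values, and the population size are assumptions made here, not values from the paper.

```python
from typing import List


def f(x: List[float], n_pop: float, gamma: float, kappa: float,
      dt: float) -> List[float]:
    """Discrete-time augmented SIRD step, Eqs. (8)-(13)."""
    s, i, r, d, e, rt = x
    a = (gamma + kappa) * dt
    infect = a / n_pop * rt * i * s
    return [s - infect, (1.0 - a) * i + infect, r + gamma * dt * i,
            d + kappa * dt * i, a * i + (1.0 - dt) * e, rt]


def jacobian(x: List[float], n_pop: float, gamma: float, kappa: float,
             dt: float) -> List[List[float]]:
    """Jacobian of f at x, following Eqs. (19)-(25)."""
    s, i, _r, _d, _e, rt = x
    a = (gamma + kappa) * dt
    c = a / n_pop
    return [
        [1.0 - c * rt * i, -c * rt * s, 0.0, 0.0, 0.0, -c * i * s],
        [c * rt * i, 1.0 - a + c * rt * s, 0.0, 0.0, 0.0, c * i * s],
        [0.0, gamma * dt, 1.0, 0.0, 0.0, 0.0],
        [0.0, kappa * dt, 0.0, 1.0, 0.0, 0.0],
        [0.0, a, 0.0, 0.0, 1.0 - dt, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ]


# Hypothetical state and parameters (Ti = 9 days, CFR = 1%).
n_pop, dt = 1.0e6, 0.01
gamma, kappa = 0.99 / 9.0, 0.01 / 9.0
x = [9.9e5, 1.0e3, 5.0e3, 50.0, 120.0, 2.0]
J = jacobian(x, n_pop, gamma, kappa, dt)
```

Comparing `J` column by column with finite differences of `f` is a cheap way to catch sign or index mistakes before the matrix is used inside the filter.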

The EKF consists of two steps: predict and update. The discrete-time stochastic augmented SIRD model is used to predict the next state and covariance, which are then updated once new data/measurements are obtained. The EKF can be considered one of the simplest dynamic Bayesian networks. While the EKF calculates estimates of the true values of the states recursively over time using incoming measurements and a mathematical process model, recursive Bayesian estimation calculates estimates of an unknown probability density function recursively over time using incoming measurements and a mathematical process model [16]. Let x̂(n|m) denote the estimate of x at time n given observations up to and including time m ≤ n. The Kalman filter algorithm is given as follows [17]

Predict

$$\hat{x}(k+1|k) = f(\hat{x}(k|k)) \tag{26}$$
$$P(k+1|k) = J_f(\hat{x}(k|k))\, P(k|k)\, J_f(\hat{x}(k|k))^{\top} + Q_F(k) \tag{27}$$

Update

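$$\tilde{y}(k+1) = y(k+1) - C \hat{x}(k+1|k) \tag{28}$$
$$K(k+1) = P(k+1|k)\, C^{\top} \left(C P(k+1|k) C^{\top} + R_F(k)\right)^{-1} \tag{29}$$
$$\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + K(k+1)\, \tilde{y}(k+1) \tag{30}$$
$$P(k+1|k+1) = \left(I - K(k+1) C\right) P(k+1|k) \tag{31}$$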
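The predict and update equations (26)–(31) can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the helper names (`matmul`, `transpose`, `madd`, `inverse`, `ekf_step`) are hypothetical, and the usage lines run the filter on a toy scalar system rather than on the SIRD model.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def madd(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    """Return a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ekf_step(x: Vector, p: Matrix, y: Vector,
             f: Callable[[Vector], Vector],
             jac: Callable[[Vector], Matrix],
             c: Matrix, q: Matrix, r: Matrix) -> Tuple[Vector, Matrix]:
    """One EKF iteration: predict (26)-(27), then update (28)-(31)."""
    x_pred = f(x)                                              # (26)
    jf = jac(x)
    p_pred = madd(matmul(matmul(jf, p), transpose(jf)), q)     # (27)
    cx = [sum(cij * xj for cij, xj in zip(row, x_pred)) for row in c]
    innov = [yi - ci for yi, ci in zip(y, cx)]                 # (28)
    ct = transpose(c)
    s = madd(matmul(matmul(c, p_pred), ct), r)
    k = matmul(matmul(p_pred, ct), inverse(s))                 # (29)
    x_new = [xp + sum(kij * vj for kij, vj in zip(krow, innov))
             for xp, krow in zip(x_pred, k)]                   # (30)
    n = len(x)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p_new = matmul(madd(eye, matmul(k, c), -1.0), p_pred)      # (31)
    return x_new, p_new


# Toy scalar check: a static state with prior variance 1, measurement
# variance 1, and a measurement of 2. The gain is 0.5.
x_new, p_new = ekf_step([0.0], [[1.0]], [2.0], lambda v: list(v),
                        lambda v: [[1.0]], [[1.0]], [[0.0]], [[1.0]])
```

For the model in this paper, `f` and `jac` would be the step (8)–(13) and the Jacobian (19), `c` the matrix in (17), and `q`, `r` the tuning matrices QF and RF. An explicit matrix inverse is acceptable here because the innovation covariance is only 5 × 5.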

Here P(k|k) denotes the a posteriori estimate covariance matrix. In the second stage, a low-pass filter based on a rational transfer function is used to remove short-term fluctuations at time step k. It is given by

$$\hat{y}(k) = \frac{1}{y_n} \left(\hat{x}(k) + \hat{x}(k-1) + \cdots + \hat{x}(k - y_n + 1)\right), \tag{32}$$

where yn is the window length along the data. In our case, we choose yn = 3/Δt, i.e., a window of 3 days.
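The filter in (32) is a causal moving average over the last yn samples. A minimal sketch is given below; the function name `moving_average` is hypothetical, and averaging over the samples available so far during the first yn − 1 steps is an assumption made here, since (32) does not say how the start of the series is handled.

```python
from typing import List


def moving_average(xs: List[float], window: int) -> List[float]:
    """Causal moving average, Eq. (32), with a shortened window at the start."""
    out: List[float] = []
    running = 0.0
    for k, x in enumerate(xs):
        running += x
        if k >= window:
            running -= xs[k - window]   # drop the sample leaving the window
        out.append(running / min(k + 1, window))
    return out


# Window length for a 3-day window at dt = 0.01 (300 samples).
dt = 0.01
y_n = round(3 / dt)

# Small worked example with a window of 3 samples.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3)
# smoothed == [1.0, 1.5, 2.0, 3.0, 4.0]
```

Because the window only looks backwards, the smoothed Rt lags the raw EKF estimate by roughly half the window length, which is the usual price of a causal moving average.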

To evaluate the quality of the estimate, we calculate a Relative Root Mean Square Error (RRMSE) between the estimated and reported cases. The RRMSE is defined as

$$\mathrm{RRMSE} = \frac{1}{N_d} \sum_{i=1}^{N_d} \frac{\left(X_i - \hat{X}_i\right)^2}{X_i^2}, \tag{33}$$

where Nd is the number of observed days and X ∈ {I, D, R, E}.
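As a sketch of how the error measure (33) might be evaluated for one compartment, the function below treats each X_i as a scalar daily value. The name `rrmse` is hypothetical, and skipping days on which the reported value is zero (to avoid division by zero) is an assumption made here, not something stated in the text.

```python
from typing import Sequence


def rrmse(reported: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean of squared relative errors, following Eq. (33).

    Days with a reported value of zero are skipped (assumption).
    """
    terms = [((x - xh) ** 2) / (x ** 2)
             for x, xh in zip(reported, estimated) if x != 0]
    if not terms:
        raise ValueError("no non-zero reported values to compare against")
    return sum(terms) / len(terms)


# Hypothetical values: two days, each with a 10% relative error.
err = rrmse([10.0, 20.0], [9.0, 22.0])   # (0.01 + 0.01) / 2
```

The per-compartment values for I, R, D, and E can then be summed to give a total of the kind reported per country.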

Case study: Scandinavian countries

In this section, we apply our method to study the viral transmission of COVID-19 in Denmark, Sweden, and Norway. All datasets and MATLAB code are available on GitHub (https://github.com/agusisma/covid19). As of January 2021, the three Scandinavian countries had higher cumulative testing rates than other parts of the world, with Denmark holding the record at 260 tests per 1000 population. During that time the daily test-positivity rate was below 1% for Denmark and Norway, while Sweden was at 2.9% [18]. These numbers are good indications of the testing capacity in the Scandinavian countries, so the data may describe the dynamics of the transmission better with respect to asymptomatic cases. The countries also took different approaches in their public measures in response to COVID-19; e.g., Sweden did not implement a strict lockdown, unlike its Nordic neighbours.

We plot the observed incidence of COVID-19 in Denmark, Sweden, and Norway in Fig. 1. In the same figure we also plot the estimated numbers computed using our method, and good agreement is obtained. For all estimations, the process and observation covariance matrices are considered as tuning parameters and are chosen as QF = diag(10, 10, 10, 10, 5, 0.2) and RF = diag(100, 10, 10, 5, 1), respectively. These parameters were obtained by trial and error and chosen such that the RRMSE between the estimated and reported data is sufficiently small. The RRMSE values for our case study are shown in Table 1. The method provides relatively small estimation errors for all countries. Norway has the largest error, which can be attributed to the lack of daily updates of the active and recovered cases. In our simulation, we use the same tuning parameters for all three countries; the error can be reduced by using different values of QF and RF.

Figure 1. Comparison between reported and estimated cases for active (I), recovered (R), death (D), and daily new cases (E) from the three Scandinavian countries.

Table 1. RRMSE of the two-stage filtering method for the three Scandinavian countries.

Country    I             R         D             E             Total
Denmark    7.8943e-05    0.1396    1.1640e-04    6.0490e-05    0.1399
Sweden     4.9208e-05    0.0155    1.3375e-04    0.0102        0.0259
Norway     0.0011        0.0682    7.9477e-05    0.1631        0.2326

In applying our method, we also compare it with two commonly used methods to estimate transmission parameters, namely the sequential Bayesian method of Bettencourt and Ribeiro [11], which provides an approximation of the basic reproduction number, and the instantaneous method of Fraser [5], which is implemented with a Bayesian analysis [13, 19]. The former method exploits the newly reported incidence, while the latter uses the distribution of the serial interval.

First, we compare our method with that of Bettencourt and Ribeiro [11], which allows sequential estimation of the basic reproduction number at the initial stage, when the growth is still exponential. While both methods are based on SIR-type compartmental models, Bettencourt and Ribeiro [11] use new incidence data, and the result is filtered using a 5-day moving average filter. In Fig. 2 we plot the comparison, and in Table 2 we summarise the basic reproduction numbers, taken to be the maxima of the curves. The two methods give rather similar estimates, which indicates that our method gives results comparable to those of [11].

Figure 2. Comparison of the estimated reproduction number at the early stage of the pandemic between our proposed method and that of Bettencourt and Ribeiro [11].

Table 2. Estimation of the basic reproduction number R0 using our method and that of Bettencourt and Ribeiro [11].

Country    Current method             Bettencourt and Ribeiro [11]
Denmark    9.6 [95% CI 7.7–11.4]      8.6 [95% CI 6.7–10.5]
Sweden     5.4 [95% CI 4.9–6.4]       6.5 [95% CI 3.3–9.6]
Norway     5.2 [95% CI 4.2–6.1]       4.6 [95% CI 1.3–7.9]

Finally, we plot the time-varying effective reproduction number in Fig. 3. Here, we compare our results with those obtained using the method of Cori et al. [13]. That method utilises the disease serial interval, which we approximate using a shifted Gamma distribution [13] with mean 4.7 and standard deviation 2.9 [20]. The prior for the value of Rt is taken to be a Gamma distribution with mean and standard deviation 5. We do not average out the data of daily new cases, but instead let the likelihood of new cases on a given day depend also on the estimates of the previous 3 days.

Figure 3. Comparison of the estimated time-varying effective reproduction number between our method and that of Cori et al. [13].

In Fig. 3, the two methods give Rt curves with the same trend, indicating that our method is also comparable with [13]. There is a delay of about 4 days in the trend, especially around the time when the reproduction number curve crosses the horizontal axis. The delay arises because the peaks of new daily cases and of active cases also differ by about the same number of days.

A different trend between the methods, especially at later times, appears in Fig. 3c for Norway. The curve from our method is quite smooth, while the one obtained using Cori et al. [13] fluctuates more. The discrepancy is caused by the active and recovered cases, which apparently were not updated regularly, in contrast to the new positive cases needed by the method of [13]. The unreported recovery cases were all released at once on May 22, 2020; see Fig. 1c.

Conclusion and future work

Many mathematical models and estimation methods have been developed to estimate several types of reproduction numbers during epidemic outbreaks. Here, we provide a novel method that exploits reported active, recovered, and death cases, with the SIRD model as the underlying approach. This new method offers several advantages compared to existing methods: (1) from a modelling point of view, the resulting Rt value follows the dynamics of the underlying model, so the method can be developed further if a model of higher complexity is chosen; (2) the estimation method can still be extended from a statistical point of view; and (3) the method does not need information about serial intervals. When the time series data change very little, or instead change drastically, for instance when cases accumulate and are reported at a single time, the resulting Rt value will show the corresponding spikes and serrations. As a result, the most recent dynamics in the data are reflected in the estimate.

By applying the method to COVID-19 cases in the Scandinavian countries and comparing the results with the commonly used methods of [11, 13], we showed that our method is comparable, which is expected to allow fast assessment of the reproduction number in new outbreaks. Using the method to forecast and critically assess incidence data in countries with high under-reporting, such as Indonesia, is left for future work.

Acknowledgements

This work was supported by the National Research and Innovation Agency (BRIN), project number 133/FI/P-KCOVID-19 2B3/IX/2020, 835/IT1 B07/KS.00.00/2020. HS is also supported by Khalifa University through a Faculty Start-Up Grant (No. 8474000351-FSU-2021-011). RK gratefully acknowledges financial support from Riset Peningkatan Kapasitas ITB 2020. Results from this research have been used by BRIN to monitor COVID-19 transmission in Indonesia: https://www.brin.go.id/covid19/covid-meter/.

Author contributions

A.H. proposed the method and wrote the code. H.S., V.T., R.K., E.P., P.H., and N.N. analysed the data, compared the methods, and discussed the results. All authors reviewed the manuscript.

Data availability

Data and codes can be found here: https://github.com/agusisma/covid19.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. The 2019-nCoV Outbreak Joint Field Epidemiology Investigation Team. An outbreak of NCIP (2019-nCoV) infection in China-Wuhan, Hubei province. China CDC Wkly. 2020;2:79–80. doi: 10.46234/ccdcw2020.022.
  • 2. Liu Y, Gayle A, Wilder-Smith A, Rocklov J. The reproductive number of COVID-19 is higher compared to SARS Coronavirus. J. Travel Med. 2020;27:1–4. doi: 10.1093/jtm/taaa021.
  • 3. Delamater P, Street E, Leslie T, Yang Y, Jacobsen K. Complexity of the basic reproduction number (R0). Emerg. Infect. Dis. 2019;25:1–4. doi: 10.3201/eid2501.171901.
  • 4. Sanche S, et al. High contagiousness and rapid spread of severe acute respiratory syndrome Coronavirus 2. Emerg. Infect. Dis. 2020;26:1–8. doi: 10.3201/eid2607.200282.
  • 5. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One. 2007;2:1–12. doi: 10.1371/journal.pone.0000758.
  • 6. Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am. J. Epidemiol. 2004;160:509–516. doi: 10.1093/aje/kwh255.
  • 7. Cauchemez S, et al. Real-time estimates in early detection of SARS. Emerg. Infect. Dis. 2006;12:1–4. doi: 10.3201/eid1201.050593.
  • 8. White L, Pagano M. Transmissibility of the Influenza virus in the 1918 pandemic. PLoS One. 2008;3:1–6. doi: 10.1371/journal.pone.0001498.
  • 9. Nishiura H, Chowell G, Safan M, Castillo-Chavez C. Pros and cons of estimating the reproduction number from early epidemic growth rate of Influenza A (H1N1) 2009. Theoret. Biol. Med. Modell. 2010;7:1–13. doi: 10.1186/1742-4682-7-1.
  • 10. Cazelles B, Champagne C, Dureau J. Accounting for non-stationarity in epidemiology by embedding time-varying parameters in stochastic models. PLoS Comput. Biol. 2018;15:e1007062. doi: 10.1371/journal.pcbi.1006211.
  • 11. Bettencourt L, Ribeiro R. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLoS One. 2008;3:1–9. doi: 10.1371/journal.pone.0002185.
  • 12. Zhao H, Lu X, Deng Y, Tang Y, Lu J. COVID-19: Asymptomatic carrier transmission is an underestimated problem. Epidemiol. Infect. 2020;148:1–3. doi: 10.1017/S0950268820001235.
  • 13. Cori A, Ferguson N, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 2013;178:1505–1512. doi: 10.1093/aje/kwt133.
  • 14. Arroyo-Marioli F, Bullano F, Kucinskas S, Rondon-Moreno C. Tracking R of COVID-19: A new real-time estimation using the Kalman filter. PLoS One. 2021;16:1–16. doi: 10.1371/journal.pone.0244474.
  • 15. Kai-Wang To K, et al. Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: An observational cohort study. Lancet Infect. Dis. 2020;20:565–574. doi: 10.1016/S1473-3099(20)30196-1.
  • 16. Masreliez C, Martin R. Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans. Autom. Control. 1977;22:361–371. doi: 10.1109/TAC.1977.1101538.
  • 17. Simon D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. 1st ed. Wiley; 2006.
  • 18. Yarmol-Matusiak EA, Cipriano LE, Stranges S. A comparison of COVID-19 epidemiological indicators in Sweden, Norway, Denmark, and Finland. Scand. J. Public Health. 2021;49:69–78. doi: 10.1177/1403494820980264.
  • 19. Chowell G, Hyman J, Bettencourt L, Castillo-Chavez C. Mathematical and Statistical Estimation Approaches in Epidemiology. 1st ed. Springer; 2009.
  • 20. Nishiura H, Linton N, Akhmetzhanov A. Serial interval of novel coronavirus (COVID-19) infections. Int. J. Infect. Dis. 2020;93:284–286. doi: 10.1016/j.ijid.2020.02.060.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
