Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2021 Jan 29;17(1):e1008679. doi: 10.1371/journal.pcbi.1008679

Estimating the time-varying reproduction number of COVID-19 with a state-space method

Shinsuke Koyama 1, Taiki Horie 2, Shigeru Shinomoto 2,3,*
Editor: Roger Dimitri Kouyos4
PMCID: PMC7875393  PMID: 33513137

Abstract

After slowing down the spread of the novel coronavirus COVID-19, many countries have started to relax their confinement measures in the face of critical damage to socioeconomic structures. At this stage, it is desirable to monitor the degree to which political measures or social affairs have exerted influence on the spread of disease. Though it is difficult to trace back individual transmission of infections whose incubation periods are long and highly variable, estimating the average spreading rate is possible if a proper mathematical model can be devised to analyze daily event-occurrences. To render an accurate assessment, we have devised a state-space method for fitting a discrete-time variant of the Hawkes process to a given dataset of daily confirmed cases. The proposed method detects changes occurring in each country and assesses the impact of social events in terms of the temporally varying reproduction number, which corresponds to the average number of cases directly caused by a single infected case. Moreover, the proposed method can be used to predict the possible consequences of alternative political measures. This information can serve as a reference for behavioral guidelines that should be adopted according to the varying risk of infection.

Author summary

Society and the media alternate between hope and despair in response to the temporary decrease or increase of daily new COVID-19 infections. The number of cases has been dependent on the political measures that were adopted in each country. Accordingly, there is a strong demand for quantifying the effects of individual measures. The reproduction number, defined as the average number of cases directly caused by a single infected case, is one of the indices of the current infectivity status. To capture the time-varying reproduction number correctly, it is necessary to incorporate the distribution of delays, which are widely dispersed from 2 to 14 days for the case of COVID-19. We have developed a state-space method for estimating the reproduction number solely from an available dataset of the number of daily cases. Our method automatically detects the change-points in the reproduction number. We apply our method to the real data and examine if the detected changes are consistent with the times at which political measures had been taken in each country. Furthermore, our method can be used to predict the number of new cases in the future to examine the possible consequences of alternative political measures.

Introduction

While the novel coronavirus COVID-19 has spread worldwide, different countries have employed various intervention strategies, many of which were later followed by liberalized approaches and relaxed behaviour from individuals. In this situation, it is desirable to monitor the extent to which individual political measures have influenced the spread of the disease in each country and predict the possible consequences of alternative measures.

A fundamental metric representing the degree of the spread of a disease is the reproduction number, R, which is defined as the average number of cases directly caused by a single case [1, 2]. However, it is difficult to trace the concrete processes by which infections have been transmitted among individual people, particularly considering the protection of private information. Thus, a statistical analytical method is needed to infer the underlying process from the available data consisting of the number of daily infected cases, which were obtained by imperfect observation and accompanied by errors.

Mathematical epidemiological studies using the ordinary differential equation (ODE) models, such as the susceptible–infectious–recovered (SIR) model, have contributed to our understanding of causal factor dynamics, the results of which can be used to suggest control measures needed in given situations [35]. While the original study of Kermack and McKendrick in 1927 [6] considered the distribution of delays in the transmission of a disease, the majority of later studies used ODEs in favor of an analytical treatment [7]. Though ODE models also assume the transmission delay, such that the SIR model represents the situation in which delays are distributed exponentially [8], they cannot adopt the specific distribution of delays for each disease. In the case of COVID-19, transmission delays are known to be widely dispersed from 2 to 14 days [912]. To capture the time-varying reproduction number under fluctuating circumstances, it is necessary to incorporate the delay distribution explicitly in the analysis, as previously performed in an analysis using the semi-mechanistic Bayesian hierarchical model [13].

Recently, Chiang, Liu, and Mohler [14] modeled COVID-19 transmission using the Hawkes process [15], in which the delay distribution can be explicitly adopted as a self-exciting kernel. They combined the Hawkes process with spatial and temporal covariates, such as demographic features and Google mobility indices, to explain the variability of the reproduction number, and to forecast future cases and deaths in the USA.

Herein, we establish a state-space method for estimating the time-varying reproduction number by fitting a discrete-time variant of the Hawkes process. While the semi-mechanistic Bayesian hierarchical model [13] requires manual assignment of the change-points, our method automatically detects the change-points solely from a given series of the number of daily cases. We first apply the method to synthetic data to confirm that the method properly detects the change-points embedded in the simulation. Here, the proposed method is compared with a conventional method in terms of performance estimation of the time-varying reproduction ratio. Then, we apply the proposed method to real data and examine whether the detected changes are consistent with the times at which political measures had been implemented in each country. The proposed method can also predict the number of new cases in the future to examine any possible consequences of alternative political measures.

Methods

We have developed a state-space method of estimating the temporally changing reproduction number from a given series of the number of daily new infections, by introducing a discrete-time variant of the Hawkes process as a basic model describing the transmission of disease. The state-space method describes the evolution of a system by a set of first-order difference equations of state variables. The state variables can be inferred from measured data using a recursive Bayesian estimation [16].

The rate process on a daily basis

The (original, continuous-time) Hawkes process describes a self-excitation process in terms of the instantaneous occurrence rate λ(t) as

λ(t)=μ+Rtk<tϕ(t-tk), (1)

where μ is the spontaneous occurrence rate, and the second term represents a self-excitation effect such that the occurrence of an event adds the probability of future events (Fig 1). R is the reproduction number representing the average number of events induced by a single event, tk is the occurrence time of a past (kth) event, and ϕ(t) is a kernel representing the distribution of the transmission delays, satisfying the normalization 0ϕ(t)dt=1. Events {t1, t2, …} are derived randomly in time from the rate λ(t).

Fig 1. Schematic description of the Hawkes process Eq (1).

Fig 1

The occurrence rate λ(t) is increased according to past events occurred at times t = tk (k = 1, 2, …) with the transmission delays ttk distributed with ϕ(ttk). R is the reproduction number that represents the average number of events induced by a single event.

For the case of COVID-19, however, exact timing of infection event is not available. To deal with the numbers of daily new cases that are practically available, we convert the original Hawkes process Eq (1) into a discrete-time variant representing the expected number of events on a daily basis:

λj=μ+i=1j-1νiRiϕj-i, (2)

where λj is the expected number of events on jth day. The first term μ′ on the right-hand-side refers to the expected number of spontaneous occurrences on a daily basis. The second term represents the self-excitation process, the manner in which νi events that have occurred on a day i exerted influence with the delay of ji days. Here, we assume that the reproduction number may change and represent the daily dependence as {Ri}i. ϕji represents a distribution of the transmission delays d = ji, satisfying the normalization d=1ϕd=1.

The number of events νi or ν is derived from a distribution specified with the mean rate λi or λ. It would be natural to assume the Poisson distribution p(ν|λ)=λνν!e-λ. However, real data are subject to erroneous observation and accordingly they tend to be over-dispersed, or the sample variance exceeds the sample mean. Here, we incorporate over-dispersed data using the negative binomial distribution in the following form [17]:

p(ν|λ,ρ)=Γ(ν+λρ)Γ(ν+1)Γ(λρ)(ρ1+ρ)ν(11+ρ)λρ, (3)

where ρ(> 0) represents the degree of over-dispersion, or the variance is Var(ν) = (1 + ρ)λ. The Poisson distribution is in the limit of ρ → 0.

The COVID-19 model parameters were chosen as follows: the spontaneous occurrence of infection is absent, μ′ = 0 because there is no spontaneous occurrence for COVID-19 except at the initial occurrence in China. The virus is transmitted between individuals during close contact, and each individual is determined to have an episode of infection. The duration between symptom onsets of successive cases is referred to as the serial interval [18], which is slightly different from the incubation period [19, 20]. It is reported that the distribution of the serial intervals is suitably approximated with the log-normal distribution function of the mean 4.7 days and SD 2.9 days for COVID-19 [12]. We have adopted this distribution as the transmission delay kernel ϕd. The distribution of transmission delays on a daily basis is given as the difference of a cumulative distribution function of the log-normal distribution, ϕd = Φd − Φd−1, where

Φd=12erfc(-logd-μ2σ2), (4)

where the parameters μ and σ are given in terms of the mean m = 4.7 and the SD s = 2.9 as μ=log(m2/s2+m2) and σ=log(1+s2/m2) (Fig 2).

Fig 2. The distribution of transmission delays.

Fig 2

A bar histogram represents the distribution of transmission delays on a daily bases ϕd, which was converted from the log-normal distribution with the mean 4.7 days and SD 2.9 days (a magenta line).

The system equation

To detect change-points in the reproduction number {Ri}i in Eq (2), we introduce a method of estimating stepwise dynamics [21]. We assume that system’s state xi obeys the evolution

xi=xi-1+ξi, (5)

with the Cauchy random number ξ:

p(ξ|γ)=γπ(ξ2+γ2). (6)

We assume that the reproduction number is given as Ri = f(xi) with the non-negative function. Here we adopted a ramp function f(x) = max(0, x).

State inference

We constructed a state-space model for estimating the temporally changing reproduction number Ri from a given dataset of daily confirmed cases {ν1, …, νT}. The basic procedure of constructing the state-space method is similar to the one we developed for estimating exogenous and endogenous factors in a chain of point events [22].

To put the model in the state-space form, we take the summation in Eq (2) over the last L days,

λj=μ+i=j-Lj-1νif(xi)ϕj-i, (7)

and introduce a concatenated state vector,

Xi(xi-1,,xi-L)T, (8)

so that the rate process (7) depends only on the current state Xi. Accordingly, the state Xi obeys the evolution

Xi=FXi-1+Gξi-1, (9)

where

F=(100100010),G=(100). (10)

We have chosen L = 30 in the following analysis because the transmission delay kernel ϕd is negligible at d = 30 (Fig 2).

The posterior distribution of system’s state Xi, given a set of daily new cases until ith day Yi ≔ {ν1, …, νi} is obtained using Bayes’ theorem as

p(Xi|Yi)=p(νi|Xi)p(Xi|Yi-1)p(νi|Yi-1). (11)

Here, p(Xi|Yi−1) may be obtained using a system model p(Xi|Xi−1) and the posterior distribution on day i − 1, p(Xi−1|Yi−1), as

p(Xi|Yi-1)=p(Xi|Xi-1)p(Xi-1|Yi-1)dXi-1. (12)

Starting from the initial distribution p(X1|Y0), we iterate Eqs (11) and (12) to compute p(Xi|Yi−1) and p(Xi|Yi) for i = 1, 2, …, T.

Then, we compute the distribution of system’s states {X1, …, XT}, given an entire set of occurrences YT ≔ {ν1, …, νT} with

p(Xi|YT)=p(Xi|Yi)p(Xi+1|YT)p(Xi+1|Xi)p(Xi+1|Yi)dXi+1 (13)

in reverse order as i = T − 1, T − 2, …, 1, using the distribution functions p(Xi|Yi) and p(Xi|Yi−1), which were obtained with Eqs (11) and (12).

We then take the median of the posterior distribution p(Xi+ 1|YT) for the estimate of the state X^i+1. The estimate of the reproduction number, R^i, is then given by the first element of f(X^i+1). With the estimated reproduction number, we obtain the estimated total rate as

λ^j=μ+i=1j-1νiR^iϕj-i. (14)

We devised an algorithm that performs the integrations in Eqs (11), (12) and (13) numerically using a sequential Monte Carlo method [23, 24].

To avoid bias in estimating the state, which is caused by outliers in the data, we may discard the preassigned outliers and treat them as “missing observations” [24], for which the posterior distribution of Xi, conditional on Yi, is set to p(Xi|Yi) = p(Xi|Yi−1) without applying the Bayesian update (11).

Results

Analysis of synthetic data

Firstly, we evaluated the functionality of the state-space method by applying it to synthetic data. For this purpose, we constructed simulations of the Hawkes process mimicking prototypical evolutions in several countries. In the simulations, we took μ′ = 0 and began with a few infections as initial seeds, mimicking those who introduced the disease into each country. With an initial reproduction number R > 1, the daily cases initially grew exponentially. To reproduce a variety of evolutions in different countries, we have evaluated several schedules of the reproduction number {Ri}i.

Fig 3 depicts three prototypical cases: (A) the rapid increase is followed by a slow decrease; (B) the rapid increase is followed by a rapid decrease, and then it started to increase again; (C) the increase is followed by a decrease, and then another large increase. For each type of time-varying reproduction number {Ri}i, the Hawkes process was simulated over an interval of length T = 120 days to generate daily cases {ν1, …, νT}. In the simulation, the parameter of the negative binomial distribution Eq (3) was set to ρ = 50.

Fig 3. Synthetic daily cases generated by simulating the Hawkes-type count process and the estimated reproduction number.

Fig 3

(A) Rapid increase followed by a slow decrease. (B) Increase followed by a rapid decrease, and then an increase. (C) Slow increase followed by a decrease, and then another large increase. In the upper panel plotting the number of daily cases (purple line), the rate estimated by the state-space method λ^i is also plotted (blue line). In the lower panel, the reproduction number R^i estimated with the state-space method is plotted in reference to the true reproduction number Ri (purple line). The blue solid line and the shaded area represent the median and 95% range of the posterior distribution, respectively. The reproduction numbers estimated by Wallinga and Teunis (WT: orange line) and by Cori et al. (EpiEstim: green line) are also plotted for reference.

Parameters of the state-space method

For each series of simulated data, we have applied the state-space method, or performed the sequential Monte Carlo algorithm to compute the posterior distributions of the reproduction number for each day, R^i. Here, the over-dispersion parameter ρ of the negative binomial distribution Eq (3) is determined from a given dataset as

ρ^=1Ti=1T(νi-λ¯i)2λ¯i-1, (15)

where λ¯i=j=-33νi+j/7 represents the mean daily cases averaged over a week.

To verify the convergence of the posterior estimate of R^i concerning the number of particles, we computed the standard error of R^i with 100 cases of the Monte Carlo estimation (Table 1). We observed that 106 particles may provide a reasonably accurate estimate of the reproduction number.

Table 1. Convergence of the posterior estimate of R^i.
particles 103 104 105 106
standard error 0.165 0.057 0.028 0.010

The standard error of R^i computed with 100 cases of the sequential Monte Carlo algorithm applied to the synthetic data in Fig 3A.

The state-space method possesses a hyperparameter γ that characterizes the Cauchy distribution of the system equation Eq (6). We tested different values for the hyperparameter γ as 10−2, 10−3, and 10−4, and observed that the estimated reproduction number R^i is sensitive to the value of γ while the estimated total rate λ^i is robust against γ (results not shown). We confirmed that the likelihood was highest for the case of γ = 10−3. Accordingly, we fixed the hyperparameter at γ = 10−3 throughout the following analysis.

With the hyperparameter γ = 10−3 and the parameter ρ=ρ^ determined for each dataset with Eq (15), we performed the sequential Monte Carlo algorithm with 106 particles to compute the posterior distributions of the reproduction number for each day, R^i. Fig 3 depicts the median (solid line) and 95% range (shaded areas) of the posterior distributions. We see that the amplitude of the reproduction number is estimated properly. In particular, the method has successfully detected change-points in {Ri}i that were embedded in the simulation.

Comparison with conventional estimation methods

We compared our method with the following two conventional estimation methods in their ability at estimating the time-varying reproduction number of the synthetic data. A method suggested by Wallinga and Teunis (WT method) estimates the “case reproduction number” [25, 26],

Ri=j=i+1Tνjϕj-ik=1j-1νkϕj-k. (16)

Another method suggested by Cori et al. (EpiEstim) estimates the “instantaneous reproduction number” [27, 28],

Ri=νij=1i-1νjϕi-j, (17)

in a Bayesian framework with a Poisson likelihood and a gamma-distributed prior for Ri. We plotted the estimation results obtained by the WT method and EpiEstim with a sliding window of 7 days in Fig 3. It is observed that both methods are easily influenced by the fluctuation of data and accordingly it is difficult to discern the change points in the original process. Furthermore, the WT method tends to underestimate the current reproduction number at the end of the recorded interval, because it requires data that could be obtained in the future; the reproduction number estimated by EpiEstim is shifted backward in time relative to the WT and our methods because it uses only data from time points before i.

Analysis of real data

Next, we applied the state-space method to real data of daily confirmed cases in several countries. The number of daily new cases in various countries is made available on websites hosted by public research centers such as Our World in Data (https://ourworldindata.org/coronavirus-source-data) and the Humanitarian Data Exchange (https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases). We used data from the former site in this analysis.

Variation by day of the week

In the number of reported infections, a large variation has been observed by day of the week; reported infections tend to be fewer on the weekend than on the weekdays. There might have been variations in the original infectious activity due to human behavior, but it is more likely that this variation was caused by the delay in confirming infections and compiling the results at the weekend. The variation by day of the week is commonly observed, but there are large differences between countries, presumably due to the cultural difference in weekly activities (Fig 4).

Fig 4. Variations by day of the week in the number of reported infections {βi} computed for several countries.

Fig 4

Before analyzing a sequence of daily cases of a given country, we process the data as follows; we first obtained the gross daily variation βi in a week by averaging over the entire infection record (from March 1 2020 to the present), so that the average over a week is normalized as

17SundaySaturdayβi=1. (18)

Then, we convert the original data of daily infections {n1, …, nT} to an adjusted dataset {ν1, …, νT} by

νi=ni/βi, (19)

to which we apply the proposed state-space method.

Diversity in the spread of the contagion

In Fig 5, we show original daily new infection cases {n1, …, nT} and the adjusted dataset {ν1, …, νT} of several countries. Below each panel of the daily cases, we demonstrate the reproduction number R^i estimated with the proposed state-space method:

Fig 5. Number of daily new cases and the reproduction number R^i estimated using the state-space method.

Fig 5

Italy, Japan, Saudi Arabia, and the USA.

  • Italy

    A rapid increase in new cases was followed by a slow decrease. The estimated reproduction number was R^>1 at the outset and dropped to R^<1. It is interesting to note that the drop in the reproduction number occurred after political measures, such as lockdown and border closure, were enforced.

  • Japan

    The number of cases is found to be relatively small compared to those in Europe. An increase in the number of cases was followed by a rapid decrease, and then by a second increase. Accordingly, the reproduction number exhibited a drop from R^>1 to R^<1, and then it increased to R^>1.

  • Saudi Arabia

    The number of new cases repeatedly moved up and down, and the estimated reproduction number R^ changed accordingly. It is observed that Ramadan has promoted increased reproduction number, as it may have facilitated human contact.

  • The United States

    A rapid increase in new cases is followed by a very slow decrease, and then another growth. The estimated reproduction number R^ was higher than unity at the beginning, dropped off to near unity due to the confinement measures taken, but then it exceeded unity again. The political measures taken were found to vary by state, making it difficult to interpret the data from this country as a whole.

We also compared the proposed method with a conventional WT method [26] and EpiEstim [28] by applying them to these real data (Fig 5). It is also observed that the WT method and EpiEstim are easily influenced by the fluctuation of data. Results of other four countries are also shown in Fig 6.

Fig 6. Number of daily new cases and the reproduction number R^i estimated using the state-space method.

Fig 6

France, Australia, Iran, and Brazil.

The reproduction number at the initial phase

There have been debates about why infection rate and mortality rate change by orders of magnitude across different countries. Though these numbers likely reflect the confinement measures taken in individual countries, there might also have been differences across nations in susceptibility to COVID-19, reflecting not only genetic resistance but also lifestyle and cultural differences, such as shaking hands or hugging.

Because most governments did not implement serious confinement measures at the initial phase, the initial exponential increase of infections might reflect the natural susceptibility of citizens of each country. We realized that the estimated reproduction number was stable in a certain period before each country took confinement measures such as a lockdown or social distancing. Fig 7A depicts the reproduction numbers estimated with the proposed state-space method for 10 days until the day before the confinement measures of each country. The initial variation in the numbers of daily new cases is depicted in Fig 7B, indicating that the estimated reproduction number is correlated to the slope in the log plot. Here we have selected the period shifted by 5 days, by taking account of the typical transmission delays. We can observe that countries in different regions tend to cluster, indicating that the susceptibility tended to be similar between nations in the same region.

Fig 7. The difference in the reproduction number at the initial phase.

Fig 7

(A) The reproduction numbers estimated with the proposed state-space method for 10 days until the day before the lockdown measures of each country. Days are counted from 12 days before the confinement measures. (B) Initial variation in the numbers of daily new cases; ni divided by n6. The period is shifted by 5 days, by taking account of the typical transmission delay.

The minimum reproduction number achieved in each country

The degree of a drop in the estimated reproduction number could reflect the impact of non-pharmaceutical interventions such as a lockdown. It might be possible to quantify the effectiveness of political interventions in each country in terms of the relative percentage reduction in the reproduction number [13]. Here we searched for the minimum reproduction number averaged over 10 days that was achieved in each country. Fig 8A depicts the reproduction number for 10 days whose average takes minimum in each country. The variation in the numbers of daily new cases is depicted in Fig 8B, indicating that the estimated reproduction number is correlated to the slope in the log plot.

Fig 8. The minimum reproduction number achieved in each country.

Fig 8

(A) The reproduction number for 10 days whose average takes minimum in each country. (B) Variation in the numbers of daily new cases; ni divided by n6. The period is shifted by 5 days, by taking account of the typical transmission delay.

Future prediction

Using the proposed method, it is also possible to predict the number of new cases in the future. This can be done by simulating the converted Hawkes process Eq (2) with the parameters estimated from the given data. One may adopt the reproduction number Ri in near future as constant at the value of an endpoint of estimation if the current conditions are assumed to be maintained. Alternatively, one may also examine various time schedules of Ri, by assuming possible choices of relaxation or confinement measures.

In Fig 9 we applied the forecasting method to the data of Japanese daily cases. Assuming that we are on June 30, 2020, we have estimated the reproduction number R^i using the daily cases until that day. To predict the number of daily cases from July 1 to August 1, we ran the Hawkes process 100 times to obtain the expected daily cases. Firstly, we have assumed that the reproduction number remains the value obtained for the last day R = 1.4. Occasionally the reproduction number has not changed drastically in July, and accordingly, the predicted number of new cases is similar to the real data obtained in July.

Fig 9. Predicting the number of new cases in the future.

Fig 9

The forecasting method was applied to the data of Japanese daily cases, assuming that we were on June 30, 2020. We ran the Hawkes process 100 times to obtain the expected daily cases, by assuming that the reproduction number remains constant R = 1.4, which was obtained using the previous data (orange line). We also examined the cases in which the reproduction number is decreased to R = 0.7 due to confinement measures (green line) or increased to R = 1.8 by liberalization (red line).

We have also tested the cases in which the reproduction number is decreased to R = 0.7 due to confinement measures, or increased to R = 1.8 by liberalization. In this way, we may examine what might occur if political interventions are taken.

Discussion

Society and the media currently alternate between hope and despair in response to the temporary decrease or increase of daily new COVID-19 infections, which came out after a long latency period. To make an objective assessment of the current status, we have developed a state-space method for estimating the control status, in particular quantifying the time-dependent reproduction number R.

Pros and cons

We have adopted the Hawkes process, or its discrete-time variant, in describing the variables underlying the transmission of disease. In contrast to ordinary differential equation models such as SIR or SEIR models, the Hawkes process is advantageous in that it explicitly specifies the distribution of transmission delays. However, the Hawkes process does not account for the finite size effect, in which infected and recovered people represent a finite fraction of the entire population. There have been some models that incorporate the finite population size effect into the Hawkes process, as has been done with the SIR or SEIR models [8, 29]. To analyze the current status of COVID-19, however, we do not take the finite size effect into account, as the fraction of the recovered or removed people is still less than a few % of the entire population.

We have converted the original Hawkes process Eq (1) into a discrete-time variant representing the expected number of events on a daily basis Eq (2) because the exact timing of infection event is not available for the case of COVID-19. It is noteworthy that Cheysson and Lang also developed a method for estimating parameters of the Hawkes process from counts data [30]. However, their method is based on a spectral likelihood, assuming stationarity in the underlying process. In contrast to this, we directly modeled the count time series and combined it with the state-space model to accommodate nonstationary data.

We introduced the Cauchy distribution Eq (6) into our analysis, assuming the stepwise changes in the reproduction number Ri. Accordingly, we were able to detect change-points from the posterior distribution taking on stepwise characteristics. As discussed by Kitagawa [21], the use of the heavy-tailed distribution enables us to express change-points, in contrast to using a Gaussian noise, which results in gentle changes. However, a drawback of the Cauchy distribution is that it causes slow convergence in the Monte Carlo simulation [31].

Interestingly, the drop in the reproduction number occurred after political measures, such as lockdown and border closure. It should be also noted that there may be an additional latency between the times at which political measures were conducted and the changes in the reproduction number, which may reflect the behavior change. This delay may also be country-specific. Therefore, it could be interesting to investigate the delay in the change-points in the reproduction number following social events.

When inferring the transmission of disease from daily confirmed cases, we have considered potentially erroneous observations made in the real data. We took into account counting errors by assuming a negative binomial distribution that represents the over-dispersion. We also took into account the variation by day of the week and adjusted the data by compensating for the periodic dependency. Note that there may still be an underestimation of infection numbers, as asymptomatic cases may have been overlooked. Though this is unavoidable unless the inspection is enforced, it is reported that the infections caused by asymptomatic people are relatively small (about 6%) for COVID-19 [19].

We have assumed that the transmission delay is a serial interval defined as the duration between symptom onsets of successive cases and adopted the log-normal distribution with the mean 4.7 days and SD 2.9 days, as suggested in reference [12]. As our mathematical formulation is general, it is possible to search for a more suitable transmission kernel ϕd without relying on such external knowledge, if the numbers of daily cases are accurately provided.

Here, we set the spontaneous occurrence rate to zero (μ′ = 0) in the analysis of real data. However, imported cases might be involved in the data. Also, we did not address censoring for incomplete observation of the epidemic process in particular at the initial stage. These may cause bias in the estimations of the reproduction number at the early stage of the epidemic.

The most crucial assumption in the majority of mathematical model studies, including this study, is the mean-field assumption, in which all individuals are assumed to interact uniformly. Though difficult to incorporate, it is desirable to consider the heterogeneity of the real-world community in analyzing the communicability of disease.

Despite these assumptions, the proposed state-space method may be of worth in assessing the status of the disease systematically, based on reported daily confirmed cases. This method might serve as a reference for governments adopting variable regulations that should be changed according to current infection circumstances.

Acknowledgments

We thank Shin Takagi, Masaki Ogura, Ryota Kobayashi, Takaaki Aoki, and Hideaki Shimazaki for their constructive comments on this manuscript, and Hidetaka Manabe for his technical assistance in developing a web-application program.

Data Availability

The application program and example datasets are available at our website https://s-shinomoto.com/COVID/ and the site hosted publicly on GitHub, accessible via https://github.com/shigerushinomoto.

Funding Statement

This work has been funded by New Energy and Industrial Technology Development Organization (NEDO) (to SS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Milligan GN, Barrett AD. An Essential Guide. Wiley Online Library; 2015. [Google Scholar]
  • 2. Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, et al. Pandemic potential of a strain of influenza A (H1N1): early findings. science. 2009;324(5934):1557–1561. 10.1126/science.1176062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Hethcote HW. The mathematics of infectious diseases. SIAM review. 2000;42(4):599–653. 10.1137/S0036144500371907 [DOI] [Google Scholar]
  • 4. Brauer F. Mathematical epidemiology: Past, present, and future. Infectious Disease Modelling. 2017;2(2):113–127. 10.1016/j.idm.2017.02.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Blackwood JC, Childs LM. An introduction to compartmental modeling for the budding infectious disease modeler. Letters in Biomathematics. 2018;5(1):195–221. 10.30707/LiB5.1Blackwood [DOI] [Google Scholar]
  • 6. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london Series A, Containing papers of a mathematical and physical character. 1927;115(772):700–721. [Google Scholar]
  • 7. Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet. 2020;395(10225):689–697. 10.1016/S0140-6736(20)30260-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Rizoiu MA, Mishra S, Kong Q, Carman M, Xie L. SIR-Hawkes: linking epidemic models and Hawkes processes to model diffusions in finite populations. In: Proceedings of the 2018 World Wide Web Conference; 2018. p. 419–428.
  • 9. Jung Sm, Akhmetzhanov AR, Hayashi K, Linton NM, Yang Y, Yuan B, et al. Real-time estimation of the risk of death from novel coronavirus (COVID-19) infection: inference using exported cases. Journal of clinical medicine. 2020;9(2):523 10.3390/jcm9020523 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Linton NM, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov AR, Jung Sm, et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data. Journal of clinical medicine. 2020;9(2):538 10.3390/jcm9020538 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med. 2020;382:1199–1207. 10.1056/NEJMoa2001316 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Nishiura H, Linton NM, Akhmetzhanov AR. Serial interval of novel coronavirus (2019-nCoV) infections. medRxiv 2020020320019497 [preprint]. 2020;.
  • 13. Flaxman S, Mishra S, Gandy A, Unwin HJT, Coupland H, Mellan TA, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584(7820):257–261. 10.1038/s41586-020-2405-7 [DOI] [PubMed] [Google Scholar]
  • 14.Chiang WH, Liu X, Mohler G. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates. medRxiv 2020060620124149 [preprint]. 2020;. [DOI] [PMC free article] [PubMed]
  • 15. Hawkes AG. Spectra of some self-exciting and mutually exciting point processes. Biometrika. 1971;58(1):83–90. 10.1093/biomet/58.1.83 [DOI] [Google Scholar]
  • 16. Durbin J, Koopman S. Time series analysis by state space methods. Oxford University Press; 2001. [Google Scholar]
  • 17. Koyama S, Fujiwara Y. Modeling event cascades using networks of additive count sequences. Journal of Statistical Mechanics: Theory and Experiment. 2019;2019(2):023402 10.1088/1742-5468/aafa7c [DOI] [Google Scholar]
  • 18. Fine PEM. The interval between successive cases of an infectious disease. Am J Epidemiol. 2003;158:1039–1047. 10.1093/aje/kwg251 [DOI] [PubMed] [Google Scholar]
  • 19. Ferretti L, Wymant C, Kendall M, Zhao L, Nurtay A, Abeler-Dörner L, et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science. 2020;368 (6491). 10.1126/science.abb6936 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. He X, Lau EHY, W P et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat Med. 2020;26:672–675. 10.1038/s41591-020-0869-5 [DOI] [PubMed] [Google Scholar]
  • 21. Kitagawa G. Non-Gaussian State-Space Modeling of Nonstationary Time Series. Journal of the American Statistical Association. 1987;82:1032–1041. 10.1080/01621459.1987.10478534 [DOI] [Google Scholar]
  • 22. Koyama S, Shinomoto S. Statistical physics of discovering exogenous and endogenous factors in a chain of events. Physical Review Research. 2020;. 10.1103/PhysRevResearch.2.043358 [DOI] [Google Scholar]
  • 23. Kantas N, Doucet A, Singh SS, Maciejowski J, Chopin N. On Particle Methods for Parameter Estimation in State-Space Models. Statistical Science. 2015;30:328–351. 10.1214/14-STS511 [DOI] [Google Scholar]
  • 24. Kitagawa G. Introduction to Time Series Modeling. Chapman and Hall/CRC; 2010. [Google Scholar]
  • 25. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE. 2007; 10.1371/journal.pone.0000758 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology. 2004;160:509–516. 10.1093/aje/kwh255 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Nishiura H, Klinkenberg D, Roberts M, Heesterbeek JA. Early epidemiological assessment of the virulence of emerging infectious diseases: a case study of an influenza pandemic. PLoS One. 2009;4(8). 10.1371/journal.pone.0006852 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178:1505–1512. 10.1093/aje/kwt133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Onaga T, Shinomoto S. Emergence of event cascades in inhomogeneous networks. Scientific reports. 2016;6:33321 10.1038/srep33321 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Cheysson F, Lang G. Strong mixing condition for Hawkes processes and application to Whittle estimation from count data. arXiv preprint arXiv:200304314. 2020;.
  • 31. Ghosh J, Li Y, Mitra R. On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression. Bayesian Anal. 2018;13(2):359–383. 10.1214/17-BA1051 [DOI] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008679.r001

Decision Letter 0

Thomas Leitner, Roger Dimitri Kouyos

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

5 Nov 2020

Dear Prof. Shinomoto,

Thank you very much for submitting your manuscript "Estimating the time-varying reproduction number of COVID-19 with a state-space method" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Thomas Leitner

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This is a really interesting paper, and I am glad more work is being added applying Hawkes processes to Infectious diseases. The method is very nice, and i think novel to my knowledge. I do like the embedding of epidemiological approaches in state space. There are however some issues which means their results can be hard to justify. The use of case data when not linking cases to infections means the authors do not consider any incubation period or the possibility of asymptomatic transmission. This dilutes the validity of their reproduction number estimates. They really need to either connect R->infection->cases. I could not see the choice of phi being discussed. I assume its exponential, but if so this is wrong for infectious diseases. Hawkes processes are fundamentally linked to this phi, without it the model is incomplete.

I do comment the authors on their method, aside from a few missed details it is interesting and novel, but the could do much more to make their model more useful and accurate by considering the epidemiology. So the authors are best either focusing on a methods paper, and in which case a comparison to existing approachs (in addition to WT) would be well received, and of-course heavy caveating of the results. As an application paper, i would like a more careful consideration of the data.

I hope the authors will not begruge me, I realise these revisions would require quite a bit more work, but I believe they will strengthen the paper and ultimately help people using this in the future. However in general, I am very supportive of this work and belive it to be a good addition to the literature.

other comments

1) We have applied the state-space method <- this is introduced without me as the reader knowing what the authors are referring to

2) Limitations of using World in data should be added, factors like reporting accuracy and delay etc

3) It is interesting to note that the drop in the reproduction number occurred after political measures, such as lockdown and border closure, were enforced. <- A few points about why this might be the case would be good. Factors like slow adherence, but more plausibly that cases are lagged ahead of infections by an incubation period of 5 days or so. Obviously reporting is the biggest issue, testing was poor in most of europe in April.

4) A discussion of why the reproduction numbers differ at the start is good but i would like the authors to consider if its their modelling assumptions or data censoring rather than real effects.

5) The choice of kernel needs to be fully described in the main text. it underpins most of the dynamics

6) The sensitivity analysis of lambda should be presented in this paper

7) Posterior convergence and stats should be provided in the paper

8) Why is L 30

9) The choice of spontaneous occurenceis wrong, importation happened atleast for part of the period this paper considers

10) What is phi

minor comments which i leave to the authors to consider

1) The Kernel definition in (2) is nonstandard for a Hawkes process and can lead to the impression that its a discrete model

2) I do like the decomposition of the kernel into R and events, its novel

3) Might be worth mentioning the US for type C

4) The authors might consider looking at the HawkesN variant - they will find its like to SIR models interesting

Samir Bhatt

Reviewer #2: uploaded

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Samir Bhatt

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: report_koyama_horie_shinomoto.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008679.r003

Decision Letter 1

Thomas Leitner, Roger Dimitri Kouyos

6 Jan 2021

Dear Prof. Shinomoto,

We are pleased to inform you that your manuscript 'Estimating the time-varying reproduction number of COVID-19 with a state-space method' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Thomas Leitner

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I am happy with the authors changes and their explanations. Thanks for their thoroughness.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Samir Bhatt

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008679.r004

Acceptance letter

Thomas Leitner, Roger Dimitri Kouyos

23 Jan 2021

PCOMPBIOL-D-20-01505R1

Estimating the time-varying reproduction number of COVID-19 with a state-space method

Dear Dr Shinomoto,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Alice Ellingham

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: report_koyama_horie_shinomoto.pdf

    Attachment

    Submitted filename: Response to Reviewer.pdf

    Data Availability Statement

    The application program and example datasets are available at our website https://s-shinomoto.com/COVID/ and the site hosted publicly on GitHub, accessible via https://github.com/shigerushinomoto.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS

    RESOURCES