2020 Jul 28;77(3):929–941. doi: 10.1111/biom.13325

Estimation of incubation period and generation time based on observed length‐biased epidemic cohort with censoring for COVID‐19 outbreak in China

Yuhao Deng 1, Chong You 2, Yukun Liu 3, Jing Qin 4, Xiao‐Hua Zhou 2,5
PMCID: PMC7362037  PMID: 32627172

Abstract

The incubation period and generation time are key characteristics in the analysis of infectious diseases. The commonly used contact‐tracing–based estimation of incubation distribution is highly influenced by the individuals' judgment on the possible date of exposure, and might lead to significant errors. On the other hand, interval censoring–based methods are able to utilize a much larger set of traveling data but may encounter biased sampling problems. The distribution of generation time is usually approximated by observed serial intervals. However, it may result in a biased estimation of generation time, especially when the disease is infectious during incubation. In this paper, the theory from renewal process is partially adopted by considering the incubation period as the interarrival time, and the duration between departure from Wuhan and onset of symptoms as the mixture of forward time and interarrival time with censored intervals. In addition, a consistent estimator for the distribution of generation time based on incubation period and serial interval is proposed for incubation‐infectious diseases. A real case application to the current outbreak of COVID‐19 is implemented. We find that the incubation period has a median of 8.50 days (95% confidence interval [CI] [7.22; 9.15]). The basic reproduction number in the early phase of COVID‐19 outbreak based on the proposed generation time estimation is estimated to be 2.96 (95% CI [2.15; 3.86]).

Keywords: deconvolution, interval censoring, mixture distribution, renewal process, serial interval

1. INTRODUCTION

In epidemiology, the incubation period is the time between the infection of an individual by a pathogen and the manifestation of symptoms, while the generation time is defined as the time between the infection of a primary case and that of its secondary cases (Fine, 2003; Svensson, 2007). Both are vital clinical characteristics that depict an epidemic and are essential for policy making. For example, a good understanding of the incubation period informs the choice of quarantine length, and a good understanding of the generation time is essential in estimating the transmission potential of a disease as measured by the basic reproduction number $R_0$ (Farewell et al., 2005; Wallinga and Lipsitch, 2007; Nishiura, 2010).

In most of the literature, such as Li et al. (2020) and Guan et al. (2020), the distribution of the incubation period is described either through a parametric model, for example, log‐normal or Weibull, or through its empirical distribution based on incubation periods observed in contact‐tracing data. However, contact‐tracing data are usually difficult to obtain, and can be highly influenced by the individual's judgment on the possible date of exposure rather than the actual date of exposure, which might not be accurately monitored and determined, leading to significant errors (Cowling et al., 2007).

An alternative approach to study incubation period is to take advantage of the mechanism of truncation or censoring. Lui et al. (1988), De Gruttola and Lagakos (1989), Struthers and Farewell (1989), and Kuo et al. (1991) estimated incubation distribution of contagious diseases using external truncation or censoring information. Kuk and Ma (2005) studied the incubation period of SARS by deconvolution, but the proposed method was only feasible for the disease that is noninfectious during incubation period, which is not the case for COVID‐19. It also assumed that the ability of infectiousness is uniform during the infectious period, which is a strong assumption. In the studies of Lessler et al. (2009) and Reich et al. (2009), double censoring was used to characterize the problem caused by daily reports rather than continuous observed symptoms onset time. Nishiura and Inaba (2011) used 72 confirmed imported cases that traveled to Japan from Hawaii during the early phase of the 2009 H1N1 pandemic to estimate the incubation by addressing censoring and infection age.

For COVID‐19, Backer et al. (2020) and Linton et al. (2020) used confirmed cases detected outside Wuhan to estimate the distribution of the incubation by interval‐censoring likelihood. In their studies, for each selected case, a censored interval for incubation period was obtained by travel histories and dates of symptoms onset, and the distribution of incubation was then estimated by fitting censored intervals into Weibull, Gamma, and log‐normal. However, such estimations may lead to biased estimations of incubation period due to the biased sampling issues. Qin et al. (2020) adopted the theory from renewal process and carefully selected the studying cohort to overcome the biased sampling problems but fitted a continuous parametric model with discrete observations, while the discreteness of data is in fact a sort of interval censoring caused by daily reports.

To the best of our knowledge, generation time is usually directly estimated by the time difference between symptoms onset of successive cases in a chain of transmission rather than the actual time of infection, that is, the serial interval. This is because it is challenging to obtain the infection dates of both the primary case and its secondary cases in a chain of transmission, while the dates of symptoms onset are relatively easier to obtain. However, the distribution of the serial interval may be biased for estimating generation time, especially when the disease is infectious during incubation, in which case the variance could be overestimated (Britton and Scalia Tomba, 2019). As a result, the subsequent quantities estimated based on the generation time are biased. For example, the basic reproduction number, indicating the spreading ability of an infectious disease, would be underestimated. Note that COVID‐19 is incubation‐infectious, hence the estimation of generation time simply based on observed serial intervals is not consistent.

To overcome the issues aforementioned, in this paper we estimate the distribution of incubation period using the well‐studied renewal process where there exists a censoring event within the incubation period. Vardi (1982a, 1982b, 1989) discussed nonparametric maximum likelihood estimation based on length‐biased sampling and renewal process with incomplete renewal data, and further the multiplicative censoring problem. A brief review can be found in Qin (2017). Issues related to the length‐biased sampling and interval‐censoring sampling are both taken into consideration in the estimation of incubation distribution in this study. We have shown that under mild assumptions, parameters in the incubation distribution are identifiable and enjoy desirable asymptotic properties. Furthermore, a consistent estimator for the distribution of generation time is also proposed based on incubation period and observed serial interval for incubation‐infectious and incubation‐noninfectious diseases, respectively. Our approaches increase available sample size and utilize censored information in the early phase of an epidemic outbreak.

The rest of this paper is organized as follows. Section 2 describes the motivating data. In Section 3, we propose algorithms to estimate the distribution of incubation period and show that under mild assumptions the model parameters are identifiable and enjoy desirable asymptotic properties. In Section 4, we propose algorithms to estimate the distribution of generation time. Simulation studies are performed in Section 5, and the results of the analysis of the current COVID‐19 outbreak in China are shown in Section 6. Further discussion is given in Section 7.

2. MOTIVATING DATA

The COVID‐19 outbreak in Wuhan, China, has attracted worldwide attention (Huang et al., 2020; Tu et al., 2020; Wang et al., 2020). Publicly available data were collected from provincial and municipal health commissions in China and ministry of health in other countries and areas. The following details were collected on each confirmed case: case ID, region, age, gender, date of symptoms onset, date of diagnosis, history of travel, or previous residency in Wuhan, and, if available, related information regarding contact history with other confirmed cases. As of March 31, 2020, a total of 14,829 lab‐confirmed COVID‐19 cases were reported outside Hubei Province by the National Health Commission of China.

In the collected data, 645 chains of transmission were found, and $n=198$ of them have their dates of symptoms onset available, which can be used to calculate serial intervals (You et al., 2020). These 198 observed serial intervals, $\{s_j, j=1,\ldots,n\}$, range from −13 to 21 days, with a mean of 4.6 days and quartiles of 1, 4, and 7 days. The same subset of the data used in Qin et al. (2020) is considered in this study for the estimation of the incubation period. This subset includes the confirmed cases that left Wuhan between January 19 and January 23, 2020, and excludes cases that developed symptoms before leaving Wuhan. There is a total of $m=1211$ cases that meet these criteria in the collected data. These 1211 observed durations between departure from Wuhan and symptoms onset outside Hubei Province, $\{t_j, j=1,\ldots,m\}$, range from 0 to 22 days with a mean of 5.4 days and quartiles of 2, 5, and 8 days. It is worth noting that Bi et al. (2020) reported that 191 travelers developed symptoms 4.9 days on average after arriving in Shenzhen (Guangdong Province, China).

It is arguable that people who left Wuhan might have higher chance to be infected on the day of departure since it is easier to be exposed to the human‐to‐human transmitted virus in a crowded environment. Hence in our dataset, there might be two types of individuals: (a) those who got infected during their stay in Wuhan and developed symptoms outside Hubei Province, and (b) those who got infected at the time of leaving Wuhan, for example, at the airport, railway station, or on the way from Wuhan to their destinations. Thus, the observed durations between departure from Wuhan and symptoms onset are from a mixture of two distributions: the time between departure from Wuhan and symptoms onset (forward time), and the complete incubation period. Note that the selected cohort is length‐biased since the ones with shorter incubation periods who got infected were less likely to be captured as they had higher chance to develop symptoms before leaving Wuhan. The length‐biased issue cannot be tested easily in the data but has naturally arisen from the data collection process, since only those who developed symptoms after departure from Wuhan can be collected.

3. ESTIMATION OF INCUBATION PERIOD

In this section, the distribution of incubation is estimated through the theory of renewal processes and interval censoring with a mixture distribution. Here we have to assume that the distribution of the incubation period is the same for the Wuhan residents who had a schedule to leave Wuhan and for the general population. Furthermore, given an individual who got infected in Wuhan and developed symptoms outside Wuhan, it is reasonable to assume that the event of departing from Wuhan is independent of the events of infection and manifestation of symptoms. Hence, we can consider the incubation period, a continuous random variable $I$, as the interarrival time, that is, the sum of the backward and forward times, and the duration between departure from Wuhan and onset of symptoms as the forward time $V$ in a renewal process (see Figure 1 for an illustration). Suppose that $I$ and $V$ are continuous and let $f_I(\cdot)$ be the probability density function (pdf) of the incubation period, and $h(\cdot)$ be the pdf of the forward time. According to Qin (2017) and Qin et al. (2020), we have

$$h(t)=\frac{S(t)}{E(I)}=\frac{\int_t^{+\infty} f_I(y)\,dy}{\int_0^{+\infty} y f_I(y)\,dy},\quad t>0, \tag{1}$$

where S(·) is the survival function and E(I) is the expectation of I.
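As a rough numerical illustration of Equation (1), the following sketch evaluates the forward‐time density for a Weibull incubation period. The Weibull family is chosen here only because its survival function has a closed form; the parameter values (shape 2, scale 8) are borrowed from simulation setting (b) in Section 5, and all function names are hypothetical. The sketch checks that $h$ integrates to one and is non‐increasing.

```python
import math

def weibull_survival(t, k, lam):
    """Survival function S(t) = P(I > t) of a Weibull(shape k, scale lam)."""
    return math.exp(-((t / lam) ** k)) if t > 0 else 1.0

def weibull_mean(k, lam):
    """E(I) for a Weibull(shape k, scale lam) distribution."""
    return lam * math.gamma(1.0 + 1.0 / k)

def forward_density(t, k, lam):
    """Forward-time density h(t) = S(t) / E(I), as in Equation (1)."""
    return weibull_survival(t, k, lam) / weibull_mean(k, lam)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

k, lam = 2.0, 8.0  # illustrative values only (assumption of this sketch)
area = simpson(lambda t: forward_density(t, k, lam), 0.0, 80.0)
grid = [forward_density(0.5 * i, k, lam) for i in range(60)]
is_monotone = all(a >= b for a, b in zip(grid, grid[1:]))

print(area)         # numerically close to 1
print(is_monotone)  # h never increases on the grid
```

The monotonicity is what motivates the mixture model that follows: a histogram of durations with an interior mode cannot come from the forward time alone.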

FIGURE 1.

Illustration of complete incubation period and forward time. Note. Red circle: getting infected; blue column: departure from Wuhan; red cross: symptoms onset. The shaded area is the period during which our cohort sample departed from Wuhan. This figure shows five types of individuals. Only those who departed from Wuhan in the shaded area were collected in our cohort. (A) Symptoms onset in Wuhan, not in our cohort; (B and C) captured in our cohort with infection before departure; (D) captured in our cohort with infection at departure; (E) infection outside Wuhan, not in our cohort. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

Note that $I$ is not observable in our dataset, but $V$ is observable with observations $\{t_j\}$, $j=1,\ldots,m$. From Equation (1), we can see that the forward time $V$ should have a monotonically decreasing density. However, the observed density of $\{t_j\}$ does not seem to be monotone (see Figure 3). A possible explanation is that the $\{t_j\}$ are not observations of $V$ only but of a mixture of $V$ and $I$. As mentioned above, due to the nature of a human‐to‐human infectious disease, it is easier to get infected at the airport or train station, or on the flight, train, or bus, so that the infection occurs at departure. In such a case, the duration between departure from Wuhan and onset of symptoms is no longer the forward time but the complete incubation period. Taking this possibility into account, let $\pi$ be the (unknown) probability of getting infected at the departure time from Wuhan, and $1-\pi$ be the probability of getting infected before departure. Therefore, the duration between departure from Wuhan and symptoms onset follows a mixture distribution with density

$$Q(t;\theta,\pi)=\pi f_I(t;\theta)+(1-\pi)h(t;\theta),\quad t>0, \tag{2}$$

where θ is the model parameter in fI(·) and h(·).

FIGURE 3.

COVID‐19 data analysis result. Note. Upper: twice the log‐likelihood ratio, $2[\max_{\theta,\pi}\ell(\theta,\pi)-\max_{\theta}\ell(\theta,\pi)]$, versus $\pi$. The dashed line is at 2.71, the 90% quantile of the chi‐squared distribution with 1 degree of freedom. The horizontal coordinate of the crossover point is the 95% upper bound of $\pi$ by likelihood ratio, since $0.5+0.5\chi^2(2.71,1)=0.95$ (mixed chi‐squared distribution), where $\chi^2(\cdot,1)$ is the cdf of the chi‐squared distribution with 1 degree of freedom. Lower: incubation estimation; red line: forward time fit; blue line: incubation period fit; black line: mixed observed time fit (covered by the red line). This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

To account for the rounding error caused by daily reports, we let $t_j^+=t_j+0.5$ and $t_j^-=t_j-0.5$. The parameters $\theta$ and $\pi$ can then be estimated by directly maximizing the likelihood function with interval censoring, that is,

$$L(\theta,\pi;t_1,\ldots,t_m)=\prod_{j=1}^{m}\left[\pi\{F_I(t_j^+;\theta)-F_I(t_j^-;\theta)\}+(1-\pi)\{H(t_j^+;\theta)-H(t_j^-;\theta)\}\right], \tag{3}$$

where $F_I$ and $H$ are the cumulative distribution functions (cdfs) of $I$ and $V$, respectively. We denote the maximum likelihood estimate of $(\theta,\pi)$ by $(\hat\theta,\hat\pi)=\arg\sup_{\theta,\pi}\ell(\theta,\pi)$, where $\ell(\theta,\pi)=\log L(\theta,\pi;t_1,\ldots,t_m)$. Web Appendix B provides an alternative interpretation of the likelihood function.
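The interval‐censored mixture likelihood can be sketched in a few lines. The version below is only an illustration under stated assumptions: it uses a Weibull incubation model (one of the three families considered in this paper), obtains the forward‐time cdf $H$ by numerical integration of the survival function, and uses made‐up durations; the function names are hypothetical and it is not the authors' implementation.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def weibull_cdf(t, k, lam):
    """F_I(t) for a Weibull(shape k, scale lam); zero for t <= 0."""
    return 1.0 - math.exp(-((t / lam) ** k)) if t > 0 else 0.0

def forward_cdf(t, k, lam):
    """H(t) = int_0^t S(u) du / E(I), the cdf of the forward time V."""
    if t <= 0:
        return 0.0
    mean = lam * math.gamma(1.0 + 1.0 / k)
    return simpson(lambda u: math.exp(-((u / lam) ** k)), 0.0, t) / mean

def cell_prob(t, k, lam, pi):
    """One factor of the likelihood: mass on the interval (t - 0.5, t + 0.5)."""
    lo, hi = t - 0.5, t + 0.5
    p_incubation = weibull_cdf(hi, k, lam) - weibull_cdf(lo, k, lam)
    p_forward = forward_cdf(hi, k, lam) - forward_cdf(lo, k, lam)
    return pi * p_incubation + (1.0 - pi) * p_forward

def loglik(days, k, lam, pi):
    """Log-likelihood of daily-rounded durations under the mixture model."""
    return sum(math.log(cell_prob(t, k, lam, pi)) for t in days)

days = [0, 2, 5, 5, 8, 11]  # made-up durations, for illustration only
print(loglik(days, k=2.0, lam=8.0, pi=0.1))
```

In practice this function would be passed to a numerical optimizer over $(\theta,\pi)$ with $\pi$ constrained to $[0,1]$; the optimizer is omitted here.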

In general, it is difficult to derive asymptotic properties of the estimator for interval‐censoring cases (see Gentleman and Geyer, 1994; Lehmann and Romano, 2006). However, the asymptotic properties can be proved under our particular setting, in which we have identical interval lengths for all observations, namely $t_j^+-t_j^-=1$ for $j=1,\ldots,m$. Let $(t_j^-,t_j^+)$ for $j=1,\ldots,m$ be independently and identically distributed observations from the mixture model (2). Define a pseudo‐pdf for the mixture model (2) as

$$Q_p(t_j;\theta,\pi)=\pi\{F_I(t_j^+;\theta)-F_I(t_j^-;\theta)\}+(1-\pi)\{H(t_j^+;\theta)-H(t_j^-;\theta)\}. \tag{4}$$

It is straightforward to show that $\int_{-\infty}^{+\infty}Q_p(t;\theta,\pi)\,dt=1$ by the Fubini theorem. For notational simplicity, let $f_I^p(t_j;\theta)=F_I(t_j^+;\theta)-F_I(t_j^-;\theta)$ and $h^p(t_j;\theta)=H(t_j^+;\theta)-H(t_j^-;\theta)$. The corresponding pseudo‐log‐likelihood (loglik) for the mixture model is

$$\ell(\theta,\pi)=\sum_{j=1}^{m}\log\{\pi f_I^p(t_j;\theta)+(1-\pi)h^p(t_j;\theta)\}. \tag{5}$$

Define two likelihood ratio functions:

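$$R_1(\theta,\pi)=2\{\sup_{\theta,\pi}\ell(\theta,\pi)-\ell(\theta,\pi)\}=2\{\ell(\hat\theta,\hat\pi)-\ell(\theta,\pi)\},$$

$$R_2(\pi)=2\{\sup_{\theta,\pi}\ell(\theta,\pi)-\sup_{\theta}\ell(\theta,\pi)\}=2\{\ell(\hat\theta,\hat\pi)-\sup_{\theta}\ell(\theta,\pi)\}.$$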

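Let $(\theta_0,\pi_0)$ be the true parameter value. For notational simplicity, let $g(t;\varphi)$ denote the density in (4) with $\varphi=(\theta,\pi)$, that is, $g(t;\varphi)=Q_p(t;\theta,\pi)$. In addition, let $q_\theta$ denote the dimension of $\theta$, $\nabla_\varphi=\partial/\partial\varphi$, and $\nabla_{\varphi\varphi}=\partial^2/(\partial\varphi\,\partial\varphi^\top)$. The upcoming expectations are taken with respect to the true density $g(t;\varphi_0)$, where $\varphi_0=(\theta_0,\pi_0)$. To establish the asymptotic result, we make the following regularity condition.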

Condition 1

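Let $T\sim g(t;\varphi_0)$, and

  • (a) $E\|\nabla_\varphi\log\{g(T;\varphi_0)\}\|<\infty$;

  • (b) $U=-E[\nabla_{\varphi\varphi}\log\{g(T;\varphi_0)\}]$ is finite and nonsingular;

  • (c) $E[\nabla_{\varphi\varphi}\log\{g(T;\varphi)\}]$ is continuous for $\varphi$ in a neighborhood of $\varphi_0$.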

The nonsingularity of U in Condition 1(b) excludes the cases where at least one of θ and π is not identifiable. Theorem 1 shows the asymptotic properties of the estimator (θ^,π^) if the true parameter value is an interior point in the parameter space, while Theorem 2 shows the case if π0 is at the boundary.

Theorem 1

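Suppose that $g(t;\varphi)$ and $\varphi_0$ satisfy Condition 1, and that $(\theta_0,\pi_0)$ is an interior point in the parameter space. As $m\to\infty$, (a) $\sqrt{m}(\hat\theta-\theta_0,\hat\pi-\pi_0)\to_d N(0,U^{-1})$, where $\to_d$ means convergence in distribution; (b) $R_1(\theta_0,\pi_0)\to_d\chi^2_{q_\theta+1}$; (c) $R_2(\pi_0)\to_d\chi^2_1$.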

We partition $U$ as $U=(U_{ij})_{1\le i,j\le 2}$, where $U_{11}$ is a $q_\theta\times q_\theta$ matrix. Let $x^+=\max(x,0)$, $x^-=\min(x,0)$, $Y_1\sim N(0,I_{q_\theta})$, and $Y_2\sim N(0,1)$, with $Y_1$ and $Y_2$ independent of each other.

Theorem 2

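Suppose that $g(t;\varphi)$ and $\varphi_0$ satisfy Condition 1, and that $\theta_0$ is an interior point in the parameter space of $\theta$ and $\pi_0=1$. As $m\to\infty$,

(a) $$\sqrt{m}\begin{pmatrix}\hat\theta-\theta_0\\ \hat\pi-\pi_0\end{pmatrix}\to_d\begin{pmatrix}U_{11}^{-1/2}Y_1-U_{11}^{-1}U_{12}\,(U_{22}-U_{12}^\top U_{11}^{-1}U_{12})^{-1/2}(Y_2)^-\\ (U_{22}-U_{12}^\top U_{11}^{-1}U_{12})^{-1/2}(Y_2)^-\end{pmatrix};$$

(b) $R_1(\theta_0,\pi_0)\to_d\frac{1}{2}\chi^2_{q_\theta}+\frac{1}{2}\chi^2_{q_\theta+1}$; and (c) $R_2(\pi_0)\to_d\frac{1}{2}\chi^2_0+\frac{1}{2}\chi^2_1$.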

If $\pi_0=0$, then $(Y_2)^-$ on the right‐hand side of the formula in (a) should be replaced with $(Y_2)^+$.

The proof of Theorems 1 and 2 is given in Web Appendix C. We can easily verify that the interval‐censored mixture distribution (4) for Gamma, Weibull (except when shape parameter of Gamma or Weibull is 1, ie, the exponential distribution), or log‐normal distribution satisfies Condition 1 and thus the above two theorems hold for our estimates.

4. ESTIMATION OF GENERATION TIME

In this section, we study the estimation of generation time based on serial interval and incubation time under proper assumptions. The estimation of generation time applies only to the symptomatic population. Suppose an infector got infected at calendar time $T_0$ and showed symptoms at $T_1$. This infector infected an infectee at calendar time $T_2$, and the infectee showed symptoms at $T_3$. Let $G=T_2-T_0$ denote the generation time, $S=T_3-T_1$ denote the serial interval, and $I_1=T_1-T_0$ and $I_2=T_3-T_2$ be the incubation periods of the infector and infectee, respectively. It is straightforward to see that $G=S+I_1-I_2$.
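A small simulation can make the identity $G=S+I_1-I_2$ concrete. The sketch below draws the generation time and the two incubation periods independently (an assumption of this sketch, with hypothetical Gamma parameters), forms $S=G-I_1+I_2$, and compares the sample variance of $S$ with $\mathrm{Var}(G)+2\mathrm{Var}(I)$, which is what independence implies. It also reports the share of negative serial intervals.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

random.seed(1)
n = 100_000
shape_g, scale_g = 4.0, 1.5    # hypothetical generation-time Gamma (mean 6, variance 9)
shape_i, scale_i = 5.0, 1.25   # Gamma incubation with shape 5 and rate 0.8

# random.gammavariate takes (shape, scale).
gen = [random.gammavariate(shape_g, scale_g) for _ in range(n)]
inc1 = [random.gammavariate(shape_i, scale_i) for _ in range(n)]  # infector
inc2 = [random.gammavariate(shape_i, scale_i) for _ in range(n)]  # infectee

# Serial interval implied by G = S + I1 - I2.
serial = [g - i1 + i2 for g, i1, i2 in zip(gen, inc1, inc2)]

var_g = sample_variance(gen)
var_i = sample_variance(inc1)
var_s = sample_variance(serial)
share_negative = sum(s < 0 for s in serial) / n

print(var_s, var_g + 2 * var_i)  # the two values should be close
print(share_negative)            # a visible fraction of negative serial intervals
```

The serial interval has the same mean as the generation time but a larger variance, which is why fitting a distribution directly to serial intervals distorts the generation time distribution.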

If a disease is noninfectious during the incubation period (eg, SARS; Lipsitch et al., 2003), then we can naturally assume [inline formula missing in source] and [inline formula missing in source]. Then it follows that $f_G=f_S$, where $f_G$ and $f_S$ are the pdfs of $G$ and $S$, respectively, and the generation time can be estimated by the serial interval without inducing bias. However, this case does not apply to COVID‐19, as there were reported asymptomatic infections (Rothe et al., 2020). Instead, we assume [inline formula missing in source], [inline formula missing in source], [inline formula missing in source]. The first part states that the incubation period of the primary case is independent of its generation time. This is true if the disease is infectious during the incubation period and, in addition, the ability to pass the pathogens to a susceptible host is independent of whether the symptoms are being developed. The rest is straightforward due to the standard assumption of independence between individuals. In addition, we assume that the distributions of incubation period, generation time, and serial interval are homogeneous among all individuals. Furthermore, to ensure that the observed serial intervals reflect the serial interval of the general population, we assume that the missingness (failure to establish contact tracing) is independent of the length of the serial interval. Hence, we obtain that

$$f_G*f_I*f_{-I}=f_S, \tag{6}$$

where the symbol $*$ represents convolution, and $f_G$, $f_S$, $f_I$, and $f_{-I}$ are the pdfs of $G$, $S$, $I$, and $-I$, respectively. Thus, $f_G$ is identifiable through the characteristic function (chf), that is, the Fourier transform. The chf of $G$ is $\phi_G(t)=\phi_S(t)/\{\phi_I(t)\phi_{-I}(t)\}$, where $\phi_S(t)$, $\phi_I(t)$, and $\phi_{-I}(t)$ are the chfs of $S$, $I$, and $-I$, respectively. By the continuous inversion formula (Durrett, 2019), the pdf of the generation time is

$$f_G(y)=\frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-ity}\,\frac{\phi_S(t)}{|\phi_I(t)|^2}\,dt, \tag{7}$$

where $i=\sqrt{-1}$, $\phi_I(t)$ can be approximated through the estimated distribution of $I$ introduced in the previous section, and $\phi_S(t)$ can be estimated from the observed serial intervals, $\{s_1,\ldots,s_n\}$, along with a proper kernel $K(\cdot)$, that is,

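$$\hat\phi_S(t)=\int_{-\infty}^{+\infty}e^{ity}\,\frac{1}{nh_n}\sum_{j=1}^{n}K\!\left(\frac{y-s_j}{h_n}\right)dy=\frac{1}{n}\sum_{j=1}^{n}e^{its_j}\phi_K(th_n), \tag{8}$$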

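where $h_n$ is the bandwidth. Note that $G$ must be positive, so to account for the boundary bias, Karunamuni (2009) proposed to use the boundary kernel $K_c(t;y)=a_0(y)K(t)+a_1(y)K'(t)$ with

$$\begin{pmatrix}a_0(y)\\ a_1(y)\end{pmatrix}=\begin{pmatrix}\int_{-\infty}^{y/h_n}K(t)\,dt & \int_{-\infty}^{y/h_n}K'(t)\,dt\\ \int_{-\infty}^{y/h_n}tK(t)\,dt & \int_{-\infty}^{y/h_n}tK'(t)\,dt\end{pmatrix}^{-1}\begin{pmatrix}1\\ 0\end{pmatrix}$$

at the point $y>0$. Denote the Fourier transform $\phi_{K_c}(t)=\int e^{itu}K_c(u)\,du$. Hence, a consistent estimator for $f_G$ is defined as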

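$$\hat f_G(y)=\frac{1}{2n\pi}\sum_{j=1}^{n}\int_{-M_n}^{M_n}\mathrm{Re}\left\{\frac{e^{it(s_j-y)}\phi_{K_c}(th_n)}{|\hat\phi_I(t)|^2}\right\}dt, \tag{9}$$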

where $M_n\to\infty$ and $h_n\to0$ as $n\to\infty$, and Re is the operator taking the real part of a complex value. This estimator is consistent at any interior point in the support of $G$, provided that the model for the incubation period $I$ is correctly specified (Liu and Taylor, 1989). Specifying a kernel density is equivalent to specifying a kernel chf, and possible choices are the Vallée Poussin (Fejér) kernels or Cesàro kernels (Devroye, 1989; Anastassiou, 2000). Note that the generation time must be positive. To correct the bias of deconvolution at the boundary $G=0$, a second‐order correction to remove the boundary effect was proposed by Karunamuni (2000, 2009). The density function $\hat f_G$ can also be obtained by imposing a parametric model on the generation time and fitting the implied density to the serial intervals, which relies heavily on model specification. More details about the conditions and properties of deconvolution are given in Web Appendix D.
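The deconvolution estimator can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses the plain kernel chf $(1-t^2)_+^3$ in place of the boundary‐corrected $\phi_{K_c}$, so it does not remove the bias near $G=0$, and it assumes a Gamma incubation period with shape `alpha` and rate `beta`, for which $|\phi_I(t)|^2=(1+t^2/\beta^2)^{-\alpha}$. Because the kernel chf vanishes outside $|th_n|\le1$, the integration range is $[-1/h_n,1/h_n]$. The serial intervals in the usage lines are made up.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def kernel_chf(t):
    """Kernel characteristic function (1 - t^2)_+^3."""
    return (1.0 - t * t) ** 3 if abs(t) < 1.0 else 0.0

def deconv_density(y, serial, h, alpha, beta):
    """Plain-kernel deconvolution estimate of f_G(y), without boundary correction.

    Assumes Gamma(shape alpha, rate beta) incubation, so that
    1 / |phi_I(t)|^2 = (1 + t^2 / beta^2) ** alpha.
    """
    n = len(serial)

    def integrand(t):
        weight = kernel_chf(t * h) * (1.0 + (t / beta) ** 2) ** alpha
        # Real part of exp(i t (s_j - y)), averaged over the sample.
        return weight * sum(math.cos(t * (s - y)) for s in serial) / n

    # The integrand is even in t: integrate over [0, 1/h] and double,
    # which turns the factor 1/(2*pi) into 1/pi.
    return simpson(integrand, 0.0, 1.0 / h) / math.pi

serial = [-2.0, 1.0, 3.0, 4.0, 4.0, 6.0, 7.0, 9.0, 12.0]  # made-up serial intervals
for y in (2.0, 5.0, 8.0):
    print(y, deconv_density(y, serial, h=2.0, alpha=4.97, beta=0.55))
```

Setting `alpha=0` switches off the deconvolution step and reduces the function to an ordinary kernel density estimate of the serial interval, which is a convenient sanity check.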

5. SIMULATION STUDY

In this numerical study, we assess the performances of our proposed method and the following methods in estimation of incubation period:

  • 1.

    The renewal process–based mixture model in Qin et al. (2020), denoted as Qin's method hereafter. The original method in Qin et al. (2020) is not suitable for our simulation because the mixture proportion $\pi$ was prefixed; we therefore alter their method to estimate $\pi$ simultaneously. As a result, Qin's method here is actually an improved version of the method in Qin et al. (2020), and the only difference between Qin's method and our method is that ours treats the observed $t_j$'s as censored intervals.

  • 2.

    The interval censoring–based method in Backer et al. (2020) and Linton et al. (2020), which is denoted as IC method hereafter.

To produce simulation settings similar to the collected COVID‐19 dataset, we consider three settings for the incubation period $I$ in the following numerical examples: (a) Gamma distribution $\Gamma(5, 0.8)$, (b) Weibull distribution $W(2, 8)$, and (c) log‐normal distribution $LN(1.8, 0.4^2)$. The density functions of these distributions are given in Web Appendix A. For each setting, we mimic the length‐biased sampling process by letting the time from infection to departure, $C$, follow the uniform distribution on $(0,30)$ and recording the time from departure to symptoms onset (forward time) $V=I-C$ if $I>C$, until the designated sample size is achieved. The simulated values of $V$ are then rounded up to the nearest integers. We vary the sample size $m$ over 600, 1200, and 1800, and $\pi$ over 0 and 0.2. Each setting is repeated 1000 times.
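The sampling scheme just described can be sketched for the Weibull setting (b). Two points are assumptions of this sketch rather than statements from the text: how the proportion $\pi$ enters (here, each observation is a complete incubation period with probability $\pi$ and a forward time otherwise), and the reading of "rounded up" as a ceiling (replace `math.ceil` with `round` if nearest‐integer rounding is intended). Function names are hypothetical.

```python
import math
import random

def draw_forward_time(k, lam, rng, window=30.0):
    """Rejection step: keep V = I - C only when symptoms start after departure."""
    while True:
        incubation = rng.weibullvariate(lam, k)  # arguments are (scale, shape)
        gap = rng.uniform(0.0, window)           # infection-to-departure time C
        if incubation > gap:
            return incubation - gap

def simulate_durations(m, k, lam, pi, rng, window=30.0):
    """Draw m unrounded durations from departure to symptoms onset."""
    durations = []
    for _ in range(m):
        if rng.random() < pi:
            # Infected at departure: the complete incubation period is observed.
            durations.append(rng.weibullvariate(lam, k))
        else:
            durations.append(draw_forward_time(k, lam, rng, window))
    return durations

rng = random.Random(2020)
raw = simulate_durations(1200, k=2.0, lam=8.0, pi=0.2, rng=rng)
days = [math.ceil(v) for v in raw]  # daily reporting, read here as a ceiling
print(min(days), max(days), sum(days) / len(days))
```

With $\pi=0$ the unrounded sample mean should be close to $E(I^2)/\{2E(I)\}$, the mean of the forward time, which is a quick check that the rejection step reproduces length‐biased sampling.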

Table 1 summarizes the estimates of parameters in incubation distribution using the Qin's method, interval‐censoring method, and our proposed method. We can see that when π=0, our proposed method and Qin's method provide similar results. For π=0.2, our approach has smaller bias in Weibull setting. Due to the fact that the loglik is too flat near the maximum, the estimates may be a little biased in finite sample. With larger sample size, the bias is getting smaller. The IC method does not perform well in our simulation as it does not take the length‐biased sampling issue and the cross infection probability π into consideration.

TABLE 1.

Estimation of incubation distribution in simulation

(a) Gamma incubation, $f_I(t;\theta)=\beta^{\alpha}t^{\alpha-1}e^{-\beta t}/\Gamma(\alpha)$; $\alpha=5$, $\beta=0.8$

| $\pi$ | $m$ | Proposed $\hat\alpha$ (SE) | Proposed $\hat\beta$ (SE) | Proposed $\hat\pi$ (SE) | Qin's $\hat\alpha$ (SE) | Qin's $\hat\beta$ (SE) | Qin's $\hat\pi$ (SE) | IC $\hat\alpha$ (SE) | IC $\hat\beta$ (SE) | IC $\hat\pi$ (SE) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | 4.43 (1.22) | 0.76 (0.14) | 0.11 (0.14) | 4.47 (1.13) | 0.76 (0.13) | 0.10 (0.12) | 9.79 (1.24) | 1.07 (0.14) | 0 (0) |
| 0 | 1200 | 4.47 (0.94) | 0.76 (0.10) | 0.09 (0.11) | 4.49 (0.89) | 0.76 (0.10) | 0.08 (0.09) | 9.72 (0.88) | 1.06 (0.10) | 0 (0) |
| 0 | 1800 | 4.55 (0.77) | 0.77 (0.08) | 0.07 (0.09) | 4.54 (0.75) | 0.76 (0.08) | 0.07 (0.08) | 9.70 (0.82) | 1.06 (0.08) | 0 (0) |
| 0.2 | 600 | 5.36 (1.64) | 0.83 (0.17) | 0.20 (0.15) | 5.37 (1.62) | 0.83 (0.17) | 0.19 (0.14) | 9.61 (1.21) | 1.00 (0.13) | 0 (0) |
| 0.2 | 1200 | 5.33 (1.28) | 0.83 (0.13) | 0.19 (0.11) | 5.33 (1.29) | 0.82 (0.13) | 0.18 (0.11) | 9.51 (0.83) | 0.99 (0.09) | 0 (0) |
| 0.2 | 1800 | 5.25 (1.08) | 0.82 (0.10) | 0.19 (0.10) | 5.25 (1.10) | 0.82 (0.11) | 0.19 (0.10) | 9.49 (0.67) | 0.99 (0.07) | 0 (0) |

(b) Weibull incubation, $f_I(t;\theta)=k(t/\lambda)^{k-1}\exp\{-(t/\lambda)^{k}\}/\lambda$; $k=2$, $\lambda=8$

| $\pi$ | $m$ | Proposed $\hat k$ (SE) | Proposed $\hat\lambda$ (SE) | Proposed $\hat\pi$ (SE) | Qin's $\hat k$ (SE) | Qin's $\hat\lambda$ (SE) | Qin's $\hat\pi$ (SE) | IC $\hat k$ (SE) | IC $\hat\lambda$ (SE) | IC $\hat\pi$ (SE) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | 1.92 (0.24) | 7.49 (0.86) | 0.11 (0.17) | 2.00 (0.19) | 7.89 (0.47) | 0.02 (0.05) | 3.49 (0.24) | 11.99 (0.30) | 0 (0) |
| 0 | 1200 | 1.93 (0.19) | 7.57 (0.72) | 0.09 (0.14) | 2.00 (0.13) | 7.95 (0.34) | 0.01 (0.03) | 3.57 (0.17) | 12.00 (0.20) | 0 (0) |
| 0 | 1800 | 1.93 (0.16) | 7.62 (0.64) | 0.07 (0.12) | 2.00 (0.11) | 7.97 (0.27) | 0.01 (0.02) | 3.56 (0.14) | 12.00 (0.17) | 0 (0) |
| 0.2 | 600 | 2.07 (0.29) | 8.20 (1.05) | 0.19 (0.19) | 2.18 (0.22) | 8.70 (0.71) | 0.10 (0.11) | 3.48 (0.21) | 12.42 (0.29) | 0 (0) |
| 0.2 | 1200 | 2.04 (0.23) | 8.17 (0.88) | 0.19 (0.15) | 2.17 (0.16) | 8.73 (0.59) | 0.09 (0.09) | 3.46 (0.15) | 12.44 (0.20) | 0 (0) |
| 0.2 | 1800 | 2.03 (0.20) | 8.14 (0.80) | 0.19 (0.14) | 2.17 (0.14) | 8.73 (0.53) | 0.09 (0.08) | 3.45 (0.12) | 12.43 (0.16) | 0 (0) |

(c) Log‐normal incubation, $f_I(t;\theta)=\exp\{-(\log t-\mu)^2/(2\sigma^2)\}/(\sqrt{2\pi\sigma^2}\,t)$; $\mu=1.8$, $\sigma=0.4$

| $\pi$ | $m$ | Proposed $\hat\mu$ (SE) | Proposed $\hat\sigma$ (SE) | Proposed $\hat\pi$ (SE) | Qin's $\hat\mu$ (SE) | Qin's $\hat\sigma$ (SE) | Qin's $\hat\pi$ (SE) | IC $\hat\mu$ (SE) | IC $\hat\sigma$ (SE) | IC $\hat\pi$ (SE) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600 | 1.73 (0.10) | 0.42 (0.04) | 0.08 (0.10) | 1.74 (0.10) | 0.42 (0.04) | 0.07 (0.09) | 2.18 (0.02) | 0.33 (0.02) | 0 (0) |
| 0 | 1200 | 1.74 (0.08) | 0.42 (0.03) | 0.06 (0.07) | 1.75 (0.07) | 0.42 (0.03) | 0.05 (0.07) | 2.18 (0.02) | 0.33 (0.02) | 0 (0) |
| 0 | 1800 | 1.75 (0.07) | 0.42 (0.03) | 0.05 (0.06) | 1.76 (0.06) | 0.41 (0.03) | 0.04 (0.06) | 2.18 (0.01) | 0.33 (0.01) | 0 (0) |
| 0.2 | 600 | 1.82 (0.11) | 0.39 (0.04) | 0.18 (0.11) | 1.83 (0.10) | 0.39 (0.04) | 0.18 (0.11) | 2.22 (0.02) | 0.33 (0.02) | 0 (0) |
| 0.2 | 1200 | 1.82 (0.08) | 0.39 (0.03) | 0.19 (0.08) | 1.82 (0.08) | 0.39 (0.03) | 0.18 (0.08) | 2.22 (0.02) | 0.33 (0.01) | 0 (0) |
| 0.2 | 1800 | 1.81 (0.06) | 0.40 (0.03) | 0.19 (0.07) | 1.81 (0.06) | 0.40 (0.03) | 0.19 (0.07) | 2.22 (0.01) | 0.33 (0.01) | 0 (0) |

Note. Estimates and standard error. The first panel is our proposed method: mixture distribution with censoring. The second panel is Qin's method. The third panel is the IC method.

For generation time estimation, we assume that both generation time and incubation period follow Gamma distributions. The means and variances of these two periods are listed in Figure 2. We generate 200 serial intervals. Note that some serial intervals may be negative. We choose the kernel chf $\phi_K(t)=(1-t^2)_+^3$, and according to Karunamuni (2009), $\phi_{K_c}(t;y)=\{a_0(y)-ia_1(y)t\}(1-t^2)^3I(|t|\le1)$, where

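$$\begin{pmatrix}a_0(y)\\ a_1(y)\end{pmatrix}=\begin{pmatrix}\displaystyle\int_{-\infty}^{y/h}\frac{48[t(t^2-15)\cos(t)+3(5-2t^2)\sin(t)]}{\pi t^7}\,dt & \displaystyle\int_{-\infty}^{y/h}\frac{-48[5t(2t^2-21)\cos(t)+(t^4-45t^2+105)\sin(t)]}{\pi t^8}\,dt\\ \displaystyle\int_{-\infty}^{y/h}\frac{48[t(t^2-15)\cos(t)+3(5-2t^2)\sin(t)]}{\pi t^6}\,dt & \displaystyle\int_{-\infty}^{y/h}\frac{-48[5t(2t^2-21)\cos(t)+(t^4-45t^2+105)\sin(t)]}{\pi t^7}\,dt\end{pmatrix}^{-1}\times\begin{pmatrix}1\\ 0\end{pmatrix}.$$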
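The closed‐form integrands above come from inverting the kernel chf $(1-t^2)_+^3$. As a hedged numerical check (function names are hypothetical), the sketch below compares the closed form $K(x)=48[x(x^2-15)\cos x+3(5-2x^2)\sin x]/(\pi x^7)$ with a direct numerical inversion, $K(x)=\pi^{-1}\int_0^1(1-t^2)^3\cos(tx)\,dt$.

```python
import math

def kernel_closed_form(x):
    """K(x) = 48[x(x^2 - 15)cos x + 3(5 - 2x^2) sin x] / (pi x^7), for x != 0."""
    num = x * (x * x - 15.0) * math.cos(x) + 3.0 * (5.0 - 2.0 * x * x) * math.sin(x)
    return 48.0 * num / (math.pi * x ** 7)

def kernel_by_inversion(x, n=2000):
    """(1/pi) * int_0^1 (1 - t^2)^3 cos(t x) dt by the composite Simpson rule."""
    step = 1.0 / n

    def f(t):
        return (1.0 - t * t) ** 3 * math.cos(t * x)

    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / (3.0 * math.pi)

for x in (0.5, 2.0, 7.5):
    print(x, kernel_closed_form(x), kernel_by_inversion(x))
```

The closed form suffers from cancellation for very small $|x|$, so in practice the inversion integral (or the limit $K(0)=16/(35\pi)$) is the safer choice near zero.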

The results are displayed in Figure 2. The cyan line is the fitted Gamma density using observed positive serial interval data. The red line is the estimated generation time density by deconvolution. We can see that the estimated density of generation time by deconvolution is closer to the true density than the fit to the serial intervals, although the deconvolution estimate may be negative in some regions.

FIGURE 2.

Histogram of serial interval data and density of generation time in simulation. Note. The expectation and variance of generation time and incubation period are listed in each subfigure. Black line: true density; cyan line: Gamma fit of S by deleting negative observations; red line: estimated density. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

6. ANALYSIS RESULTS ON THE COVID‐19 OUTBREAK

In this section, we analyze the real data of COVID‐19 outbreak, originated from Wuhan, China. As described in Section 2, the times between departure from Wuhan and symptoms onset were collected for the 1211 cases that got infected in Wuhan and developed symptoms outside Hubei Province; see Figure 3 for the histogram of the collected observations.

Table 2 summarizes the estimates of the model parameters defined in Section 3 and the quantiles of the incubation distribution, with their 95% confidence intervals (CIs) obtained by nonparametric bootstrap. The last two columns list the loglik and goodness‐of‐fit (GoF) $\chi^2$ statistic of each parametric distribution of the incubation period; a higher loglik and a lower GoF statistic indicate a better fit of the model. The number in parentheses after the GoF statistic is the P‐value of the GoF test, and all three models fit well. More details about the GoF test are in Web Appendix E. A likelihood ratio test about $\pi$ can be conducted based on the 50:50 mixture of a point mass at 0 and a chi‐squared distribution with 1 degree of freedom to infer the magnitude of $\pi$ (Self and Liang, 1987; Susko, 2013). At significance level 0.05, the critical value is 2.71. Although the point estimate of $\pi$ is zero, the loglik is flat in the region $\pi\in[0,0.2]$, so that a null hypothesis such as $H_0:\pi>0.1$ or $H_0:\pi<0.1$ cannot be rejected at significance level 0.05, since $2[\max_\theta\ell(\theta,0)-\max_\theta\ell(\theta,0.1)]<2.71$ (illustrated in Figure 3). Our model estimated that about 1% of patients have incubation periods longer than 21 days. This might influence the length of the quarantine period in regions with a severe epidemic.

TABLE 2.

Estimation of incubation distribution for COVID‐19 data

(a) Gamma incubation

| | $\alpha$ | $\beta$ | $\pi$ | Mean | 0.25 Q | Median | 0.75 Q | 0.90 Q | 0.99 Q | loglik | GoF (P‐value) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | 4.97 | 0.55 | 0.00 | 9.10 | 6.13 | 8.50 | 11.43 | 14.57 | 21.17 | −3260 | 13.46 (0.41) |
| 95% CI | [3.75; 6.25] | [0.45; 0.66] | [0.00; 0.15] | [7.86; 9.66] | [4.97; 6.80] | [7.22; 9.15] | [10.02; 11.98] | [13.22; 15.10] | [19.63; 22.07] | | |

(b) Weibull incubation

| | $k$ | $\lambda$ | $\pi$ | Mean | 0.25 Q | Median | 0.75 Q | 0.90 Q | 0.99 Q | loglik | GoF (P‐value) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | 2.04 | 9.70 | 0.00 | 8.60 | 5.26 | 8.10 | 11.39 | 14.61 | 20.53 | −3260 | 14.09 (0.37) |
| 95% CI | [1.72; 2.26] | [7.88; 10.25] | [0.00; 0.27] | [7.03; 9.08] | [3.84; 5.86] | [6.40; 8.67] | [9.52; 11.91] | [12.69; 15.11] | [18.58; 21.38] | | |

(c) Log‐normal incubation

| | $\mu$ | $\sigma$ | $\pi$ | Mean | 0.25 Q | Median | 0.75 Q | 0.90 Q | 0.99 Q | loglik | GoF (P‐value) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | 2.17 | 0.39 | 0.00 | 9.44 | 6.70 | 8.74 | 11.39 | 14.46 | 21.82 | −3263 | 15.44 (0.28) |
| 95% CI | [2.08; 2.24] | [0.35; 0.43] | [0.00; 0.00] | [8.81; 9.99] | [6.03; 7.33] | [8.02; 9.36] | [10.76; 11.94] | [13.78; 15.02] | [20.59; 22.81] | | |

Note. Model parameter estimates, incubation quantiles (with 95% CIs), log‐likelihood, goodness‐of‐fit statistic in the distribution estimation of incubation period.

Figure 3 plots twice the loglik ratio, $2[\max_{\theta,\pi}\ell(\theta,\pi)-\max_{\theta}\ell(\theta,\pi)]$, versus $\pi$. The dashed line is at 2.71, the 90% quantile of the chi‐squared distribution with 1 degree of freedom. The horizontal coordinate of the crossover point is the 95% upper bound of $\pi$ by likelihood ratio, since $0.5+0.5\chi^2(2.71,1)=0.95$ (mixed chi‐squared distribution), where $\chi^2(\cdot,1)$ is the cdf of the chi‐squared distribution with 1 degree of freedom.
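The arithmetic behind the 2.71 threshold can be checked directly. The sketch below uses the fact that the chi‐squared cdf with 1 degree of freedom equals $\mathrm{erf}(\sqrt{x/2})$ and evaluates the 50:50 mixture of a point mass at zero and a chi‐squared distribution with 1 degree of freedom; the function names are hypothetical.

```python
import math

def chi2_1_cdf(x):
    """cdf of the chi-squared distribution with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

def mixed_chi2_cdf(x):
    """cdf of the 50:50 mixture of a point mass at 0 and chi-squared(1)."""
    return 0.5 + 0.5 * chi2_1_cdf(x) if x >= 0 else 0.0

print(round(chi2_1_cdf(2.71), 3))      # 0.9
print(round(mixed_chi2_cdf(2.71), 3))  # 0.95
```

So the 90% quantile of the ordinary chi‐squared distribution serves as the 95% critical value when the parameter under test lies on the boundary.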

From the last two columns of Table 2, the Gamma distribution fits slightly better than the other two distributions, having the smallest GoF statistic. The corresponding incubation period has an estimated mean of 9.10 days and a median of 8.50 days, and possesses a heavy tail. About 10% of infected individuals would develop symptoms after 14.57 days and 1% after 21.17 days. Although the CI of π is relatively wide, the resulting quantiles of the incubation period vary little, as shown in Table 2. Figure 3 visualizes the estimate on the histogram of the time between leaving Wuhan and symptoms onset.
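The Gamma quantiles quoted above can be approximately reproduced from the rounded estimates α = 4.97 and β = 0.55 in Table 2. The sketch below treats β as a rate parameter (an assumption, consistent with the reported mean α/β ≈ 9) and implements the Gamma cdf with the standard series for the regularized incomplete gamma function. Because the inputs are rounded to two decimals, the outputs differ slightly from the table.

```python
import math


def gamma_cdf(t, shape, rate):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    x = rate * t
    if x <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def gamma_quantile(p, shape, rate):
    """Invert the CDF by bisection on [0, 100] days."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ALPHA, BETA = 4.97, 0.55  # rounded point estimates from Table 2(a)

mean = ALPHA / BETA
median = gamma_quantile(0.50, ALPHA, BETA)
q90 = gamma_quantile(0.90, ALPHA, BETA)
q99 = gamma_quantile(0.99, ALPHA, BETA)
tail_14 = 1.0 - gamma_cdf(14.0, ALPHA, BETA)  # share with incubation above 14 days

print(f"mean={mean:.2f} median={median:.2f} q90={q90:.2f} q99={q99:.2f}")
print(f"P(incubation > 14 days) = {tail_14:.3f}")
```

The last line gives the share of cases that a 14‐day quarantine would miss under this fitted distribution, which is the quantity behind the remark on quarantine length.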

For the estimation of the distribution of generation time, we choose the kernel characteristic function (chf) $\phi_K(t) = (1 - t^2)_+^3$ in (9) with bandwidth $h = 2$. The estimated probability density of generation time based on the estimated Gamma incubation period is displayed in Figure 4. We can see that the distribution of generation time has a much smaller variance than that of the serial interval.
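To make the kernel choice concrete, the sketch below evaluates the chf (1 − t²)³ on [−1, 1] and recovers the corresponding kernel density by numerical Fourier inversion. It assumes the usual convention in deconvolution kernel estimators that the bandwidth enters as the chf evaluated at ht, so that with h = 2 all frequencies with |t| ≥ 0.5 are discarded; estimator (9) itself is not reproduced here.

```python
import math


def kernel_chf(t):
    """Kernel characteristic function: (1 - t^2)^3 on (-1, 1), zero elsewhere."""
    return (1.0 - t * t) ** 3 if abs(t) < 1.0 else 0.0


def kernel_density(x, n_grid=2000):
    """K(x) = (1 / 2 pi) * integral over [-1, 1] of cos(t x) phi_K(t) dt (Simpson's rule)."""
    a, b = -1.0, 1.0
    step = (b - a) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        t = a + i * step
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * math.cos(t * x) * kernel_chf(t)
    return total * step / 3.0 / (2.0 * math.pi)


H = 2.0  # bandwidth quoted in the text

print(kernel_density(0.0))     # analytically 16 / (35 * pi)
print(kernel_chf(H * 0.25))    # frequency 0.25 is kept (down-weighted)
print(kernel_chf(H * 0.5))     # frequency 0.5 is cut off entirely
```

A compactly supported chf keeps the inversion integral finite even though the deconvolution divides by a chf that decays at high frequencies, which is why such kernels are common in this setting.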

FIGURE 4.

Estimated generation time density (red line) using 71 observed serial intervals in the COVID‐19 outbreak. Note. The black line is the density of serial interval data. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

Based on the daily reported new cases from January 20 to January 30, 2020, the early phase of the COVID‐19 outbreak, the exponential epidemic growth rate r (Malthusian coefficient) is estimated at 0.275 (SE 0.042) by fitting a least‐squares line to the log of the daily number of newly confirmed cases outside Hubei Province. Daily new cases are regressed rather than cumulative ones because only then can the residuals be regarded as independent. Note that the confirmed cases in Hubei Province were excluded here because there may be a significant underestimation of the number of infected individuals in Hubei Province, and the first confirmed case outside Hubei Province was reported on January 20, 2020 (Imai et al., 2020; You et al., 2020). Hence the basic reproduction number can be calculated according to the Euler‐Lotka equation in moment generating form

$$\hat{R}_0 = \frac{1}{\int_0^{+\infty} e^{-rt} f_G(t)\,dt}. \tag{10}$$

The point estimate of the basic reproduction number is 2.96 with 95% CI [2.15; 3.86]. Note that the estimate of $R_0$ is 2.18 when serial interval data are used instead of generation time, which severely underestimates the infectiousness of the disease.
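The two steps behind Equation (10) can be sketched as follows: a least‐squares slope on log daily counts gives r, and the Euler‐Lotka integral converts r into a reproduction number. Everything numerical here is an assumption for illustration. The daily counts are synthetic (exactly exponential at rate 0.275), and the generation time is a hypothetical Gamma density with shape 4 and rate 0.8, whereas the paper uses a nonparametric deconvolution estimate. The output is therefore not expected to match 2.96.

```python
import math


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def gamma_pdf(t, shape, rate):
    """Gamma density with the given shape and rate (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return math.exp(
        shape * math.log(rate) + (shape - 1) * math.log(t) - rate * t - math.lgamma(shape)
    )


def r0_euler_lotka(r, density, upper=100.0, n_grid=20000):
    """R0 = 1 / integral_0^inf exp(-r t) f_G(t) dt, truncated at `upper` (Simpson's rule)."""
    step = upper / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        t = i * step
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * math.exp(-r * t) * density(t)
    return 1.0 / (total * step / 3.0)


# Synthetic counts for 11 days, growing exactly at rate 0.275; NOT the reported case data.
days = list(range(11))
counts = [20.0 * math.exp(0.275 * d) for d in days]
r_hat = ols_slope(days, [math.log(c) for c in counts])

SHAPE, RATE = 4.0, 0.8  # hypothetical generation time (mean 5 days), for illustration only
r0_numeric = r0_euler_lotka(r_hat, lambda t: gamma_pdf(t, SHAPE, RATE))
r0_closed = (1.0 + r_hat / RATE) ** SHAPE  # closed form for a Gamma generation time

print(f"r = {r_hat:.3f}, R0 (numeric) = {r0_numeric:.4f}, R0 (closed form) = {r0_closed:.4f}")
```

The closed form makes the dependence on the generation time distribution visible: for a fixed r, a different mean or spread changes the integral and hence $R_0$, which is why substituting serial intervals for generation times shifts the estimate.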

7. DISCUSSION

In this paper, we proposed an estimator of the incubation distribution that requires only information on travel histories and dates of symptoms onset. Unlike the approach in Kuk and Ma (2005), our estimation of the incubation period is feasible regardless of whether the disease is infectious during the incubation period. It enhances the estimation by increasing the available sample size and utilizing censored information. We also took into consideration the mixture distribution of forward time and complete incubation period, as well as the interval censoring caused by daily reports; hence the result should be more robust than that in Qin et al. (2020).

According to the theory of renewal processes, the density of the forward time should be a decreasing function, as it is proportional to the survival function of the incubation period. If the density of the observed time between departure from Wuhan and symptoms onset is unimodal, this might be because (a) the observations come from a mixture of forward time and full incubation period, or (b) time is discretized. Hence, an estimation using the mixture distribution together with the censored intervals is recommended if the observed density is not monotonically decreasing. The mixture distribution is robust in incubation analysis in that the potential problem due to the existence of short‐term tourists can be addressed by introducing π into the model. In addition, observing fewer zeros than ones is still reasonable even if no full incubation period is mixed into the cohort (when $\pi = 0$), as the probability of being captured in our cohort is reduced by half if the “scheduled” departure from Wuhan and symptoms onset occur on the same day. This is well reflected in the interval‐censoring setting, since $F_I(0^{+};\theta) - F_I(0^{-};\theta)$ is just equal to $F_I(0^{+};\theta)$.
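The renewal‐theory argument can be checked numerically. The sketch below assumes a Gamma incubation period with the rounded Table 2 estimates (β treated as a rate), writes the forward‐time density as the survival function divided by the mean, and forms the mixture density π f + (1 − π) S/μ. The value π = 0.5 is purely illustrative, and the function names are not from the paper.

```python
import math


def gamma_pdf(t, shape, rate):
    """Gamma density with the given shape and rate (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return math.exp(
        shape * math.log(rate) + (shape - 1) * math.log(t) - rate * t - math.lgamma(shape)
    )


def gamma_cdf(t, shape, rate):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    x = rate * t
    if x <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def forward_time_pdf(v, shape, rate):
    """Forward (residual) time density from renewal theory: survival function / mean."""
    if v < 0:
        return 0.0
    return (1.0 - gamma_cdf(v, shape, rate)) / (shape / rate)


def mixture_pdf(t, shape, rate, pi):
    """Mixture of full incubation period (weight pi) and forward time (weight 1 - pi)."""
    return pi * gamma_pdf(t, shape, rate) + (1.0 - pi) * forward_time_pdf(t, shape, rate)


ALPHA, BETA = 4.97, 0.55                 # rounded estimates from Table 2(a)
grid = [0.5 * k for k in range(61)]      # 0 to 30 days in half-day steps

forward = [forward_time_pdf(v, ALPHA, BETA) for v in grid]
is_decreasing = all(a >= b for a, b in zip(forward, forward[1:]))

mixed = [mixture_pdf(t, ALPHA, BETA, 0.5) for t in grid]  # illustrative pi = 0.5
mode_of_mixture = grid[mixed.index(max(mixed))]

print(is_decreasing, mode_of_mixture)
```

The pure forward‐time density decreases from its value at zero, while the mixture with a substantial π has an interior mode, matching explanation (a) for a unimodal observed density.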

Compared with the estimated incubation period in Li et al. (2020), Backer et al. (2020), and Linton et al. (2020), our estimation yields a longer incubation period. This is possibly because we avoided the selection bias by considering a longer follow‐up period after departure from Wuhan and thereby capturing the cases with long incubation periods. However, a limitation arises from possible violation of the assumption that the individuals included in the study were infected either in Wuhan or on the way from Wuhan to their destination. Violation of this assumption (eg, a family departed from Wuhan together and infection occurred inside the family at the destination) leads to an overestimation of the incubation period.

Furthermore, a consistent estimator of the generation time distribution was proposed under two different scenarios through deconvolution. The efficiency of deconvolution is influenced by the choice of kernel function and the corresponding bandwidth. For a relatively small sample size, the estimate of the density function can be negative because it is obtained by integrating a complex‐valued function. The choice of kernel and bandwidth is ad hoc for a finite sample size. One possible approach to selecting the kernel and bandwidth is to conduct simulations using a prior distribution of generation time.

In previous studies of the basic reproduction number of COVID‐19, Zhao et al. (2020a, 2020b) estimated $R_0$ at 2.56 (95% CI [2.49; 2.63]) through the exponential growth using the distribution of serial interval, which might result in an underestimation due to the use of serial intervals rather than generation time (the serial intervals of SARS and MERS were used in their study rather than that of COVID‐19 due to the lack of information). Jung et al. (2020) estimated $R_0$ at 2.1 (95% CI [2.0; 2.2]) and 3.2 (95% CI [2.7; 3.7]), also through the exponential growth, under two scenarios using exported cases. Some other estimates were based on dynamic models, such as Wu et al. (2020) at 2.68 (95% CI [2.47; 2.86]) and Read et al. (2020) at 3.11 (95% CI [2.39; 4.13]). Our estimate of $R_0$ is a little higher than the estimates obtained by exponential growth rate models that used serial intervals. Note that the recall bias embedded in epidemiological investigation is inevitable as long as contact‐tracing data are used for analysis, which might affect the estimation of generation time and basic reproduction number. It is worth mentioning that there is a large proportion of asymptomatic infected cases (Li et al., 2020). Whether the exponential growth rate can reflect the growth of all infected cases is untestable. If the asymptomatic cases have longer generation time, then the real distribution of generation time would be more variable and $R_0$ would be overestimated.

Supporting information

Web Appendices A, B, C, D, and E, referenced in Sections 3–6, are available with this paper at the Biometrics website on Wiley Online Library.

ACKNOWLEDGMENTS

We thank Dr. Dean Follmann from National Institute of Allergy and Infectious Diseases for comments that greatly improved the manuscript. We thank Qiushi Lin at Peking University for collecting the data. We also thank Yuan Zhang at Peking University for finding a condition for continuous inversion formula. This work was supported by National Natural Science Foundation of China grant (82041023, 11771144, 11971300, 11871287), Zhejiang University special scientific research fund for COVID‐19 prevention and control, the State Key Program of the National Natural Science Foundation of China (71931004), the development fund for Shanghai talents, the 111 project (B14019), and the Fundamental Research Funds for the Central Universities.

Deng Y, You C, Liu Y, Qin J, Zhou X‐H. Estimation of incubation period and generation time based on observed length‐biased epidemic cohort with censoring for COVID‐19 outbreak in China. Biometrics. 2021;77:929–941. 10.1111/biom.13325

DATA AVAILABILITY STATEMENT

The data and R code that support the findings in this paper are openly available on GitHub at https://github.com/naiiife/wuhan (Deng et al., 2020).

REFERENCES

  1. Anastassiou, G. (2000) Handbook of Analytic‐Computational Methods Applied Mathematics. Boca Raton, FL: Chapman & Hall/CRC.
  2. Backer, J.A., Klinkenberg, D. and Wallinga, J. (2020) Incubation period of 2019 novel coronavirus (2019‐nCoV) infections among travellers from Wuhan, China, 20–28 January 2020. Eurosurveillance, 25, 2000062.
  3. Bi, Q., Wu, Y., Mei, S., Ye, C., Zou, X., Zhang, Z. et al. (2020) Epidemiology and transmission of COVID‐19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases. Available at: 10.1016/S1473-3099(20)30287-5.
  4. Britton, T. and Scalia Tomba, G. (2019) Estimation in emerging epidemics: biases and remedies. Journal of the Royal Society Interface, 16, 20180670.
  5. Cowling, B.J., Muller, M.P., Wong, I.O., Ho, L.‐M., Louie, M., McGeer, A. et al. (2007) Alternative methods of estimating an incubation distribution: examples from severe acute respiratory syndrome. Epidemiology, 18, 253–259.
  6. De Gruttola, V. and Lagakos, S.W. (1989) Analysis of doubly‐censored survival data, with application to AIDS. Biometrics, 45, 1–11.
  7. Deng, Y., You, C., Lin, Q. and Hu, T. (2020) Forward time for departure from Wuhan and some serial intervals of COVID‐19. Available at: https://github.com/naiiife/wuhan.
  8. Devroye, L. (1989) Consistent deconvolution in density estimation. The Canadian Journal of Statistics, 17, 235–239.
  9. Durrett, R. (2019) Probability: Theory and Examples, Vol. 49. Cambridge: Cambridge University Press.
  10. Farewell, V., Herzberg, A., James, K., Ho, L. and Leung, G. (2005) SARS incubation and quarantine times: when is an exposed individual known to be disease free? Statistics in Medicine, 24, 3431–3445.
  11. Fine, P.E. (2003) The interval between successive cases of an infectious disease. American Journal of Epidemiology, 158, 1039–1047.
  12. Gentleman, R. and Geyer, C.J. (1994) Maximum likelihood for interval censored data: consistency and computation. Biometrika, 81, 618–623.
  13. Guan, W.‐J., Ni, Z.‐Y., Hu, Y., Liang, W.‐H., Ou, C.‐Q., He, J.‐X., Liu, L. et al. (2020) Clinical characteristics of 2019 novel coronavirus infection in China. New England Journal of Medicine. Available at: 10.1101/2020.02.06.20020974.
  14. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y. et al. (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 395, 497–506.
  15. Imai, N., Dorigatti, I., Cori, A., Riley, S. and Ferguson, N.M. (2020) Estimating the potential total number of novel coronavirus (2019‐nCoV) cases in Wuhan City, China. Available at: https://www.imperial.ac.uk/mrc‐global‐infectious‐disease‐analysis/news–wuhan‐coronavirus/.
  16. Jung, S.‐M., Akhmetzhanov, A.R., Hayashi, K., Linton, N.M., Yang, Y., Yuan, B. et al. (2020) Real‐time estimation of the risk of death from novel coronavirus (COVID‐19) infection: inference using exported cases. Journal of Clinical Medicine, 9, 523.
  17. Karunamuni, R.J. (2000) Boundary bias correction for nonparametric deconvolution. Annals of the Institute of Statistical Mathematics, 52, 612–629.
  18. Karunamuni, R.J. (2009) Deconvolution boundary kernel method in nonparametric density estimation. Journal of Statistical Planning & Inference, 139, 2269–2283.
  19. Kuk, A.Y. and Ma, S. (2005) The estimation of SARS incubation distribution from serial interval data using a convolution likelihood. Statistics in Medicine, 24, 2525–2537.
  20. Kuo, J., Taylor, J. and Detels, R. (1991) Estimating the AIDS incubation period from a prevalent cohort. American Journal of Epidemiology, 133, 1050–1057.
  21. Lehmann, E.L. and Romano, J.P. (2006) Testing Statistical Hypotheses. Berlin: Springer Science & Business Media.
  22. Lessler, J., Reich, N.G., Cummings, D.A. and NYCD of Health and MHSII Team (2009) Outbreak of 2009 pandemic influenza A (H1N1) at a New York City school. New England Journal of Medicine, 361, 2628–2636.
  23. Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y. et al. (2020) Early transmission dynamics in Wuhan, China, of novel coronavirus‐infected pneumonia. New England Journal of Medicine, 382, 1199–1207.
  24. Li, R., Pei, S., Chen, B., Song, Y., Zhang, T., Yang, W. et al. (2020) Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS‐CoV‐2). Science, 368, 489–493.
  25. Linton, N.M., Kobayashi, T., Yang, Y., Hayashi, K., Akhmetzhanov, A.R., Jung, S.‐M. et al. (2020) Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data. Journal of Clinical Medicine, 9, 538.
  26. Lipsitch, M., Cohen, T., Cooper, B., Robins, J.M., Ma, S., James, L. et al. (2003) Transmission dynamics and control of severe acute respiratory syndrome. Science, 300, 1966–1970.
  27. Liu, M.C. and Taylor, R.L. (1989) A consistent nonparametric density estimator for the deconvolution problem. Canadian Journal of Statistics, 17, 427–438.
  28. Lui, K.‐J., Peterman, T.A., Lawrence, D.N. and Allen, J.R. (1988) A model‐based approach to characterize the incubation period of paediatric transfusion‐associated acquired immunodeficiency syndrome. Statistics in Medicine, 7, 395–401.
  29. Nishiura, H. (2010) Time variations in the generation time of an infectious disease: implications for sampling to appropriately quantify transmission potential. Mathematical Biosciences & Engineering, 7, 851–869.
  30. Nishiura, H. and Inaba, H. (2011) Estimation of the incubation period of influenza A (H1N1‐2009) among imported cases: addressing censoring using outbreak data at the origin of importation. Journal of Theoretical Biology, 272, 123–130.
  31. Qin, J. (2017) Biased Sampling, Over‐Identified Parameter Problems and Beyond. Berlin: Springer.
  32. Qin, J., You, C., Lin, Q., Hu, T., Yu, S. and Zhou, X.‐H. (2020) Estimation of incubation period distribution of COVID‐19 using disease onset forward time: a novel cross‐sectional and forward follow‐up study. Science Advances. Available at: 10.1101/2020.03.06.20032417.
  33. Read, J.M., Bridgen, J.R., Cummings, D.A., Ho, A. and Jewell, C.P. (2020) Novel coronavirus 2019‐nCoV: early estimation of epidemiological parameters and epidemic predictions. Available at: https://www.medrxiv.org/content/10.1101/2020.01.23.20018549v1.full.pdf.
  34. Reich, N.G., Lessler, J., Cummings, D.A. and Brookmeyer, R. (2009) Estimating incubation period distributions with coarse data. Statistics in Medicine, 28, 2769–2784.
  35. Rothe, C., Schunk, M., Sothmann, P., Bretzel, G., Froeschl, G., Wallrauch, C. et al. (2020) Transmission of 2019‐nCoV infection from an asymptomatic contact in Germany. New England Journal of Medicine, 382, 970–971.
  36. Self, S.G. and Liang, K.‐Y. (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82, 605–610.
  37. Struthers, C.A. and Farewell, V.T. (1989) A mixture model for time to AIDS data with left truncation and an uncertain origin. Biometrika, 76, 814–817.
  38. Susko, E. (2013) Likelihood ratio tests with boundary constraints using data‐dependent degrees of freedom. Biometrika, 100, 1019–1023.
  39. Svensson, Å. (2007) A note on generation times in epidemic models. Mathematical Biosciences, 208, 300–311.
  40. Tu, W., Tang, H., Chen, F., Wei, Y., Xu, T., Liao, K. et al. (2020) Epidemic update and risk assessment of 2019 novel coronavirus — China, January 28, 2020. China CDC Weekly, 2, 83–86.
  41. Vardi, Y. (1982a) Nonparametric estimation in renewal processes. The Annals of Statistics, 10, 772–785.
  42. Vardi, Y. (1982b) Nonparametric estimation in the presence of length bias. The Annals of Statistics, 10, 616–620.
  43. Vardi, Y. (1989) Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika, 76, 751–761.
  44. Wallinga, J. and Lipsitch, M. (2007) How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences, 274, 599–604.
  45. Wang, C., Horby, P.W., Hayden, F.G. and Gao, G.F. (2020) A novel coronavirus outbreak of global health concern. The Lancet, 395, 470–473.
  46. Wu, J.T., Leung, K. and Leung, G.M. (2020) Nowcasting and forecasting the potential domestic and international spread of the 2019‐nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet, 395, 689–697.
  47. You, C., Deng, Y., Hu, W., Sun, J., Lin, Q., Zhou, F. et al. (2020) Estimation of the time‐varying reproduction number of COVID‐19 outbreak in China. International Journal of Hygiene and Environmental Health, 228, 113555.
  48. You, C., Lin, Q. and Zhou, X.‐H. (2020) An estimation of the total number of cases of NCIP (2019‐nCoV)—Wuhan, Hubei Province, 2019‐2020. China CDC Weekly, 2, 87–91.
  49. Zhao, S., Lin, Q., Ran, J., Musa, S.S., Yang, G., Wang, W. et al. (2020a) Preliminary estimation of the basic reproduction number of novel coronavirus (2019‐nCoV) in China, from 2019 to 2020: a data‐driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases, 92, 214–217.
  50. Zhao, S., Musa, S.S., Lin, Q., Ran, J., Yang, G., Wang, W. et al. (2020b) Estimating the unreported number of novel coronavirus (2019‐nCoV) cases in China in the first half of January 2020: a data‐driven modelling analysis of the early outbreak. Journal of Clinical Medicine, 9, 388.



Articles from Biometrics are provided here courtesy of Wiley
