Skip to main content
Biology logoLink to Biology
. 2020 Mar 8;9(3):50. doi: 10.3390/biology9030050

Understanding Unreported Cases in the COVID-19 Epidemic Outbreak in Wuhan, China, and the Importance of Major Public Health Interventions

Zhihua Liu 1,, Pierre Magal 2,3,*,, Ousmane Seydi 4,, Glenn Webb 5,
PMCID: PMC7150940  PMID: 32182724

Abstract

We develop a mathematical model to provide epidemic predictions for the COVID-19 epidemic in Wuhan, China. We use reported case data up to 31 January 2020 from the Chinese Center for Disease Control and Prevention and the Wuhan Municipal Health Commission to parameterize the model. From the parameterized model, we identify the number of unreported cases. We then use the model to project the epidemic forward with varying levels of public health interventions. The model predictions emphasize the importance of major public health interventions in controlling COVID-19 epidemics.

Keywords: corona virus, reported and unreported cases, isolation, quarantine, public closings, epidemic mathematical model

1. Introduction

An epidemic outbreak of a new human coronavirus, termed the novel coronavirus COVID-19, has occurred in Wuhan, China. The first cases occurred in early December, 2019, and, by 29 January 2020, more than 7000 cases had been reported in China [1]. Early reports advise that COVID-19 transmission may occur from an infectious individual, who is not yet symptomatic [2]. Evidently, such asymptomatic infectious cases are not reported to medical authorities. For epidemic influenza outbreaks, reported cases are typically only a fraction of the total number of the symptomatic infectious individuals. For the current epidemic in Wuhan, it is likely that intensive efforts by Chinese public health authorities have reduced the number of unreported cases.

Our objective is to develop a mathematical model, which recovers from data of reported cases, the number of unreported cases for the COVID-19 epidemic in Wuhan. For this epidemic, a modeling approach has been developed in [3], which did not consider unreported cases. Our work continues the investigation in [4,5] of the fundamental problem of parameter identification in mathematical epidemic models. We address the following fundamental issues concerning this epidemic: How will the epidemic evolve in Wuhan with respect to the number of reported cases and unreported cases? How will the number of unreported cases influence the severity of the epidemic? How will public health measures, such as isolation, quarantine, and public closings, mitigate the final size of the epidemic?

2. Results

2.1. The Model and Data

Our model consists of the following system of ordinary differential equations:

S(t)=τS(t)[I(t)+U(t)],I(t)=τS(t)[I(t)+U(t)]νI(t),R(t)=ν1I(t)ηR(t),U(t)=ν2I(t)ηU(t). (1)

Here, tt0 is time in days, t0 is the beginning date of the epidemic, S(t) is the number of individuals susceptible to infection at time t, I(t) is the number of asymptomatic infectious individuals at time t, R(t) is the number of reported symptomatic infectious individuals (i.e., symptomatic infectious with sever symptoms) at time t, and U(t) is the number of unreported symptomatic infectious individuals (i.e., symptomatic infectious with mild symptoms) at time t. This system is supplemented by initial data

S(t0)=S0>0,I(t0)=I0>0,R(t0)=0andU(t0)=U00. (2)

Figure 1 is the flow chart of the model (1). The parameters of the model are listed in Table 1.

Figure 1.

Figure 1

Diagram flux.

Table 1.

Parameters of the model.

Symbol Interpretation Method
t0 Time at which the epidemic started fitted
S0 Number of susceptible at time t0 fixed
I0 Number of asymptomatic infectious at time t0 fitted
U0 Number of unreported symptomatic infectious at time t0 fitted
τ Transmission rate fitted
1/ν Average time during which asymptomatic infectious are asymptomatic fixed
f Fraction of asymptomatic infectious that become reported symptomatic infectious fixed
ν1=fν Rate at which asymptomatic infectious become reported symptomatic fitted
ν2=(1f)ν Rate at which asymptomatic infectious become unreported symptomatic fitted
1/η Average time symptomatic infectious have symptoms fixed

We use three sets of reported data to model the epidemic in Wuhan: First, data from the Chinese CDC for mainland China (Table 2), second, data from the Wuhan Municipal Health Commission for Hubei Province (Table 3), and third, data from the Wuhan Municipal Health Commission for Wuhan Municipality (Table 4). These data vary but represent the epidemic transmission in Wuhan, from which almost all the cases originated in the larger regions.

Table 2.

Reported case data 20–29 January 2020, reported for mainland China by the Chinese CDC [1].

Date           January 20 21 22 23 24 25 26 27 28 29
Confirmed cases (cumulated) for mainland China 291 440 571 830 1287 1975 2744 4515 5974 7711
Mortality cases (cumulated) for mainland China 9 17 25 41 56 80 106 132 170

Table 3.

Reported case data 23–31 January 2020, reported for Hubei Province by the Wuhan Municipal Health Commission [6].

Date           January 23 24 25 26 27 28 29 30 31
Confirmed cases (cumulated) for Hubei 549 729 1052 1423 2714 3554 4586 5806 7153
Mortality cases (cumulated) for Hubei 24 39 52 76 100 125 162 204 249

Table 4.

Reported case data 23–31 January 2020, reported for Wuhan Municipality by the Wuhan Municipal Health Commission [6].

Date           January 23 24 25 26 27 28 29 30 31
Confirmed cases (cumulated) for Wuhan 495 572 618 698 1590 1905 2261 2639 3215
Mortality cases (cumulated) for Wuhan 23 38 45 63 85 104 129 159 192

2.2. Comparison of Model (1) with the Data

For influenza disease outbreaks, the parameters τ, ν, ν1, ν2, η, as well as the initial conditions S(t0), I(t0), and U(t0), are usually unknown. Our goal is to identify them from specific time data of reported symptomatic infectious cases. To identify the unreported asymptomatic infectious cases, we assume that the cumulative reported symptomatic infectious cases at time t consist of a constant fraction along time of the total number of symptomatic infectious cases up to time t. In other words, we assume that the removal rate ν takes the following form: ν=ν1+ν2, where ν1 is the removal rate of reported symptomatic infectious individuals, and ν2 is the removal rate of unreported symptomatic infectious individuals due to all other causes, such as mild symptom, or other reasons.

The cumulative number of reported symptomatic infectious cases at time t, denoted by CR(t), is

CR(t)=ν1t0tI(s)ds. (3)

Our method is the following: We assume that CR(t) has the following special form:

CR(t)=χ1expχ2tχ3. (4)

We evaluate χ1,χ2,χ3 using the reported case data in Table 2, Table 3 and Table 4. We obtain the model starting time of the epidemic t0 from (4):

CR(t0)=0χ1expχ2t0χ3=0t0=1χ2lnχ3lnχ1.

We fix S0=11.081×106, which corresponds to the total population of Wuhan. We assume that the variation in S(t) is small during the period considered, and we fix ν,η,f. By using the method in Section 4, we can estimate the parameters ν1,ν2,τ and the initial conditions U0 and I0 from the cumulative reported cases CR(t) given (4). We then construct numerical simulations and compare them with data.

The evaluation of χ1, χ2, χ3, and t0, using the cumulative reported symptomatic infectious cases in Table 2, Table 3 and Table 4, is shown in Table 5 and in Figure 2 below.

Table 5.

Estimation of the parameters χ1, χ2, χ3, and t0 by using the cumulated reported cases in Table 2, Table 3 and Table 4.

Name of the Parameter χ1 χ2 χ3 t0
From Table 2 for China 0.16 0.38 1.1 5.12
From Table 3 for Hubei 0.23 0.34 0.1 2.45
From Table 4 for Wuhan 0.36 0.28 0.1 4.52

Figure 2.

Figure 2

In the left side figures, the dots correspond to tlnCR(t)+χ3, and in the right side figures, the dots correspond to tCR(t), where CR(t) is taken from the cumulated confirmed cases in Table 2 (top), in Table 3 (middle), and in Table 4 (bottom). The straight line in the left side figures corresponds to tlnχ1+χ2t. We first estimate the value of χ3 and then use a least square method to evaluate χ1 and χ2. We observe that the data for China in Table 2 and Hubei in Table 3 provides a good fit for CR(t) in (4), while the data for Wuhan in Table 4 does not provide a good fit for CR(t) in (4).

Remark 1.

According to Table 2, Table 3 and Table 4, the time t=0 will correspond to 31 December. Thus, in Table 5, the value t0=5.12 means that the starting time of the epidemic is 5 January, the value t0=2.45 means that the starting time of the epidemic is 28 December, and t0=4.52 means that the starting time of the epidemic is 26 December.

Remark 2.

As long as the number of reported cases follows (1), we can predict the future values of CR(t). For χ1=0.16, χ2=0.38, and χ3=1.1, we obtain

30January31January1February2February3February4February5February6February851012,39018,05026,29038,29055,77081,240118,320

The actual number of reported cases for China are 8163 confirmed for 30 January, 11,791 confirmed for 30 January, and 14,380 confirmed for 1 February. Thus, the exponential formula (4) overestimates the number reported after day 30.

From now on, we fix the fraction f of symptomatic infectious cases that are reported. We assume that between 80% and 100% of infectious cases are reported. Thus, f varies between 0.8 and 1. We assume 1/ν, and the average time during which the patients are asymptomatic infectious varies between one day and seven days. We assume that 1/η, the average time during which a patient is symptomatic infectious, varies between one day and seven days. Thus, we fix f,ν,η. Since f and ν are known, we can compute

ν1=fνandν2=(1f)ν. (5)

Moreover, by following the approach described in the supplementary information, we should have

I0=χ1χ2expχ2t0fν=χ3χ2fν, (6)
τ=χ2+νS0η+χ2ν2+η+χ2, (7)

and

U0=ν2η+χ2I0=(1f)νη+χ2I0. (8)

By using the approach described in the supplementary material, the basic reproductive number for model (1) is given by

R0=τS0ν1+ν2η.

By using ν2=(1f)ν and (7), we obtain

R0=χ2+ννη+χ2(1f)ν+η+χ21+(1f)νη. (9)

2.3. Numerical Simulations

We can find multiple values of η, ν and f which provide a good fit for the data. For application of our model, η, ν and f must vary in a reasonable range. For the corona virus COVID-19 epidemic in Wuhan at its current stage, the values of η, ν and f are not known. From preliminary information, we use the values

f=0.8,η=1/7,ν=1/7.

By using formula (9) for the basic reproduction number, we obtain from the data in Table 2 that R0=4.13. Using model (1) and the values in Table 5, we plot the graph of tCR(t), tU(t) and the data for the confirmed cumulated cases in Figure 3, according to Table 2 for China, Table 3 for Hubei, and Table 4 for Wuhan. We observe from these figures that the data for China and Hubei fit the model (1), but the data for Wuhan do not fit the model (1) because the model (4) is not a good model for the data for Wuhan in Table 4. The data for Wuhan do not fit an exponential function.

Figure 3.

Figure 3

In these figures, we use f=0.8,η=1/7,ν=1/7, and S0=11.081×106. The remaining parameters are derived by using (6)–(8). In (a), we plot the number of tCR(t) (black solid line) and tU(t) (blue dotted) and the data (red dotted) corresponding to the confirmed cumulated cases for mainland China in Table 2. We use χ1=0.16, χ2=0.38, χ3=1.1, t0=5.12 and S0=11.081×106 which give τ=4.44×1008, I0=3.62, U0=0.2 and R0=4.13. In (b), we plot the number of tCR(t) (black solid line) and tU(t) (blue dotted) and the data (red dotted) corresponding to the confirmed cumulated case for Hubei province in Table 3. We use χ1=0.23, χ2=0.34, χ3=0.1 and t0=2.45 and S0=11.081×106 which give τ=4.11×1008I0=0.3, U0=0.02 and R0=3.82. In (c), we plot the number of tCR(t) (black solid line) and tU(t) (blue dotted) and the data (red dotted) corresponding to the confirmed cumulated cases for Wuhan in Table 4. We use χ1=0.36, χ2=0.28, χ3=0.1, t0=4.52, and S0=11.08×106, which give τ=3.6×1008, I0=0.25, U0=0.02, and R0=3.35.

In what follows, we plot the graphs of tCR(t), tU(t), and tR(t) for Wuhan by using model (1). We define the turning point tp as the time at which the red curve (i.e., the curve of the non-cumulated reported infectious cases) reaches its maximum value. For example, in the figure below, the turning point tp is day 54, which corresponds to 23 February for Wuhan.

In the following, we take into account the fact that very strong isolation measures have been imposed for all China since 23 January. Specifically, since 23 January, families in China are required to stay at home. In order to take into account such a public intervention, we assume that the transmission of COVID-19 from infectious to susceptible individuals stopped after 25 January. Therefore, we consider the following model: for tt0,

S(t)=τ(t)S(t)[I(t)+U(t)],I(t)=τ(t)S(t)[I(t)+U(t)]νI(t)R(t)=ν1I(t)ηR(t)U(t)=ν2I(t)ηU(t) (10)

where

τ(t)=4.44×1008,fort[t0,25],0,fort>25. (11)

The figure below takes into account the public health measures, such as isolation, quarantine, and public closings, which correspond to models (10) and (11). By comparison of Figure 4a with Figure 5, we note that these measures greatly mitigate the final size of the epidemic, and shift the turning point about 24 days before the turning point without these measures.

Figure 4.

Figure 4

Figure 4

In this figure, we plot the graphs of tCR(t) (black solid line), tU(t) (blue solid line) and tR(t) (red solid line). We use again f=0.8,η=1/7,ν=1/7, and S0=11.081×106. In (a), we use χ1=0.16, χ2=0.38, χ3=1.1, t0=5.12 for the parameter values for China which give τ=4.44×1008 for t[t0,25] and τ=0 for t>25, I0=3.62, U0=0.2. In (b), we use χ1=0.23, χ2=0.34, χ3=0.1 and t0=2.45, for the parameter values obtained from the data for Hubei province, which give τ=4.11×1008 for t[t0,25] and τ=0 for t>25, I0=0.3, U0=0.02. In (c), we use χ1=0.36, χ2=0.28, χ3=0.1, and t0=4.52 for the parameter values obtained from the data for Wuhan, which give τ=3.6×1008 for t[t0,25] and τ=0 for t>25, I0=0.25, U0=0.02. The cumulated number of reported cases goes up to 7000 in (b), 4000 in (b) and 1400 in (c), and the turning point is around 30 January in (ac).

Figure 5.

Figure 5

In this figure, we plot the graphs of tCR(t) (black solid line), tU(t) (blue solid line) and tR(t) (red solid line). We use f=0.8,η=1/7,ν=1/7, and S0=11.081×106. The remaining parameters are derived by using (6)–(8). We obtain τ=4.44×1008, I0=3.62 and U0=0.2. The cumulated number of reported cases goes up to 8.5 million people and the turning point is day 54. Thus, the turning point is 23 February (i.e., 54–31).

3. Discussion

An epidemic outbreak of a new human coronavirus COVID-19 has occurred in Wuhan, China. For this outbreak, the unreported cases and the disease transmission rate are not known. In order to recover these values from reported medical data, we present the mathematical model (1) for outbreak diseases. By knowledge of the cumulative reported symptomatic infectious cases, and assuming (1) the fraction f of asymptomatic infectious that become reported symptomatic infectious cases, (2) the average time 1/ν asymptomatic infectious are asymptomatic, and (3) the average time 1/η symptomatic infectious remain infectious, we estimate the epidemiological parameters in model (1). We then make numerical simulations of the model (1) to prodict forward in time the severity of the epidemic. We observe that public health measures, such as isolation, quarantine, and public closings, greatly reduce the final size of the epidemic, and make the turning point much earlier than without these measures. We observe that the predictive capability of model (1) requires valid estimates of the parameters f, ν, and η, which depend on the input of medical and biological epidemiologists. Our results can contribute to the prevention and control of the COVID-19 epidemic in Wuhan.

As a consequence of our study, we note that public health measures, such as isolation, quarantine, and public closings, greatly reduce the final size of this epidemic, and make the turning point much earlier than without these measures. With our method, we fix η, ν, and f and get the same turning point for the three data sets in Table 2, Table 3 and Table 4. We choose f=0.8, which means that around 80% of cases are reported in the model, since cases are very well documented in China. Thus, we only assume that a small fraction, 20%, were not reported. This assumption may be confirmed later on.

We also vary the parameters η, ν, and f, and we do not observe a strong variation of the turning point. Nevertheless, the number of reported cases are very sensitive to the data sets, as shown in the figures. Formula (4) for CR(t) is very descriptive until 26 January for the reported case data for China and Hubei but is not reasonable for Wuhan data. This suggests that the turning point is very robust, while the number of cases is very sensitive. We can find multiple values of η, ν, and f that provide a good fit for the data. This means that η, ν, and f should also be evaluated using other methods. The values 1/η=7 days and 1/ν=7 days are taken from information concerning earlier corona viruses, and are used now by medical authorities [2].

The predictive capability of models (1) and (10) requires valid estimates of the parameters f (fraction of asymptomatic infectious that become reported as symptomatic infectious), the parameter 1/ν (average time asymptomatic infectious are asymptomatic), and the parameter 1/η (average time symptomatic infectious remain infectious). In Figure 4, we graph R0 as a function of f and 1/ν for the data in Table 2, to illustrate the importance of these values in the evolution of the epidemic. The accuracy of these values depend on the input of medical and biological epidemiologists.

In influenza epidemics, the fraction f of reported cases may be significantly increased by public health reporting measures, with greater efforts to identify all current cases. Our model reveals the impact of an increase in this fraction f in the value of R0, as evident in Figure 6 above, for the COVID-19 epidemic in Wuhan.

Figure 6.

Figure 6

In this figure, we use 1/η=7 days, and we plot the basic reproductive number R0 as a function of f and 1/ν using (9) with χ2=0.38, which corresponds to the data for China in Table 2. If both f and 1/ν are sufficiently small, R0<1. The red plane is the value of R0=4.13.

4. Materials and Methods

4.1. Method to Estimate the Parameters of (1) from the Number of Reported Cases

In the following the parameters f,ν and η are fixed.

Step 1: Since f and ν, we know that

ν1=fνandν2=(1f)ν.

Step 2: By using Equation (3), we obtain

CR(t)=ν1I(t)χ1χ2expχ2t=ν1I(t) (12)

and

expχ2texpχ2t0=I(t)I(t0),

and therefore

I(t)=I0expχ2tt0. (13)

Moreover, by using (12) at t=t0,

I0=χ1χ2expχ2t0fν=χ3χ2fν. (14)

Step 3: In order to evaluate the parameters of the model, we replace S(t) by S0=11.081×106 on the right-hand side of (1) (which is equivalent to neglecting the variation of susceptibles due to the epidemic, which is consistent with the fact that tCR(t) grows exponentially). Therefore, it remains to estimate τ and η in the following system:

I(t)=τS0[I(t)+U(t)]νI(t)U(t)=ν2I(t)ηU(t). (15)

By using the first equation, we obtain

U(t)=1τS0I(t)+νI(t)I(t),

and therefore, by using (13), we must have

I(t)=I0expχ2tt0andU(t)=U0expχ2tt0,

so, by substituting these expressions into (15), we obtain

χ2I0=τS0[I0+U0]νI0χ2U0=ν2I0ηU0. (16)

Remark 3.

Here, we fix τ in such a way that the value χ2 becomes the dominant eigenvalue of the linearized Equation (21) and (I0,U0) is the positve eigenvector associated with this dominant eigenvalue χ2. Thus, we apply implicitly the Perron–Frobenius theorem. Moreover, the exponentially growing solution (I(t),U(t)) that we consider (which is starting very close to (0,0)) follows the direction of the positive eigenvector associated with the dominant eigenvalue χ2.

By dividing the first equation of (16) by I0 we obtain

χ2=τS01+U0I0ν

and hence

U0I0=χ2+ντS01. (17)

By using the second equation of (16), we obtain

U0I0=ν2η+χ2. (18)

By using (17) and (18), we obtain

τ=χ2+νS0η+χ2ν2+η+χ2. (19)

By using (18), we can compute

U0=ν2η+χ2I0=(1f)νη+χ2I0. (20)

4.2. Computation of the Basic Reproductive Number R0

In this section, we apply results in Diekmann, Heesterbeek, and Metz [7] and Van den Driessche and Watmough [8]. The linearized equation of the infectious part of the system is given by

I(t)=τS0[I(t)+U(t)]νI(t),U(t)=ν2I(t)ηU(t). (21)

The corresponding matrix is

A=τS0ντS0ν2η

and the matrix A can be rewritten as

A=VS

where

V=τS0τS0ν20andS=ν00η.

Therefore, the next generation matrix is

VS1=τS0ντS0ην2ν0

which is a Leslie matrix, and the basic reproductive number becomes

R0=τS0ν1+ν2η. (22)

By using (19), we obtain

R0=χ2+νS0η+χ2ν2+η+χ2S0ν1+ν2η

and, by using ν2=(1f)ν, we obtain

R0=χ2+ννη+χ2(1f)ν+η+χ21+(1f)νη. (23)

Acknowledgments

We thank Arnaud Ducrot for discussion and technical assistance.

Author Contributions

Z.L., O.S., P.M., and G.W. conceived and designed the study. P.M. and O.S. analyzed the data, carried out the analysis, and performed numerical simulations, Z.L. and G.W. conducted the literature review. All authors participated in writing and reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by NSFC and CNRS (Grant Nos. 11871007 and 11811530272) and the Fundamental Research Funds for the Central Universities.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Chinese Center for Disease Control and Prevention. [(accessed on 30 January 2020)]; Available online: http://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_l11809/
  • 2.New England Journal of Medicine, Letter to the Editor. [(accessed on 30 January 2020)]; Available online: https://www.nejm.org/doi/full/10.1056/NEJMc2001468.
  • 3.Tang B., Wang X., Li Q., Bragazzi N.L., Tang S., Xiao Y., Wu J. Estimation of the Transmission Risk of 2019-nCov and Its Implication for Public Health Interventions. [(accessed on 30 January 2020)];J. Clin. Med. 2020 9:462. doi: 10.3390/jcm9020462. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3525558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Magal P., Webb G. The parameter identification problem for SIR epidemic models: Identifying Unreported Cases. J. Math. Biol. 2018;77:1629–1648. doi: 10.1007/s00285-017-1203-9. [DOI] [PubMed] [Google Scholar]
  • 5.Ducrot A., Magal P., Nguyen T., Webb G. Identifying the Number of Unreported Cases in SIR Epidemic Models. Math. Med. Biol. J. IMA. 2019 doi: 10.1093/imammb/dqz013. [DOI] [PubMed] [Google Scholar]
  • 6.Wuhan Municipal Health Commission. [(accessed on 30 January 2020)]; Available online: http://wjw.wuhan.gov.cn/front/web/list3rd/yes/802.
  • 7.Diekmann O., Heesterbeek J.A.P., Metz J.A.J. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J. Math. Biol. 1990;28:365–382. doi: 10.1007/BF00178324. [DOI] [PubMed] [Google Scholar]
  • 8.Van den Driessche P., Watmough J. Reproduction numbers and subthreshold endemic equilibria for compartmental models of disease transmission. Math. Biosci. 2002;180:29–48. doi: 10.1016/S0025-5564(02)00108-6. [DOI] [PubMed] [Google Scholar]

Articles from Biology are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES