Biology. 2020 Jun 17;9(6):132. doi: 10.3390/biology9060132

Unreported Cases for Age Dependent COVID-19 Outbreak in Japan

Quentin Griette 1,*, Pierre Magal 1,*, Ousmane Seydi 2
PMCID: PMC7345722  PMID: 32560572

Abstract

We investigate the age-structured data for the COVID-19 outbreak in Japan. We consider a mathematical model for the epidemic with unreported infectious patients, with and without age structure. In particular, we build a new mathematical model and a new computational method to fit the data by using age-class-dependent exponential growth at the early stage of the epidemic. This allows us to take into account differences in the response of patients to the disease according to their age. This model also allows for a heterogeneous response of the population to the social distancing measures taken by the local government. We fit this model to the observed data and obtain a snapshot of the effective transmissions occurring inside the population at different times, which indicates where and among whom the disease propagates after the start of public mitigation measures.

Keywords: coronavirus, age-structured data, reported and unreported cases, isolation, quarantine, public closings, epidemic mathematical model

1. Introduction

COVID-19, the disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first appeared in Wuhan, China, and the first cases were notified to WHO on 31 December 2019 [1,2]. Beginning in Wuhan as an epidemic, it then spread very quickly and was characterized as a pandemic on 11 March 2020 [1]. Symptoms of this disease include fever, shortness of breath, and cough, and a non-negligible proportion of infected individuals may develop severe forms of the symptoms leading to their transfer to intensive care units and, in some cases, death; see e.g., Guan et al. [3] and Wei et al. [4]. Both symptomatic and asymptomatic individuals can be infectious [4,5,6], which makes the control of the disease particularly challenging.

The virus is characterized by its rapid progression among individuals, most often exponential in the first phase, but also by a marked heterogeneity in populations and geographic areas [7,8,9]. The number of reported cases worldwide exceeded 3 million as of 3 May 2020 [10]. The heterogeneity of the number of cases and of the severity according to the age groups, especially for children and elderly people, aroused the interest of several researchers [11,12,13,14,15]. Indeed, several studies have shown that the severity of the disease increases with the age and co-morbidity of hospitalized patients (see e.g., To et al. [15] and Zhou et al. [8]). Wu et al. [16] have shown that the risk of developing symptoms increases by 4% per year in adults aged between 30 and 60 years old, while Davies et al. [17] found that there is a strong correlation between chronological age and the likelihood of developing symptoms. Since completely asymptomatic individuals can also be contagious, a higher probability of developing symptoms does not necessarily imply greater infectiousness: Zou et al. [6] found that, in some cases, the viral load in asymptomatic patients was similar to that in symptomatic patients. Moreover, while adults are more likely to develop symptoms, Jones et al. [18] found that the viral loads in infected children do not differ significantly from those of adults.

These findings suggest that a study of the dynamics of inter-generational spread is fundamental to better understand the spread of the coronavirus and most importantly to efficiently fight the COVID-19 pandemic. To this end the distribution of contacts between age groups in society (work, school, home, and other locations) is an important factor to take into account when modeling the spread of the epidemic. To account for these facts, some mathematical models have been developed [13,14,17,19,20]. In Ayoub et al. [19] the authors studied the dependence of the COVID-19 epidemic on the demographic structures in several countries but did not focus on the contacts distribution of the populations. In [13,14,17,20] a focus on the social contact patterns with respect to the chronological age has been made by using the contact matrices provided in Prem et al. [21]. While Ayoub et al. [19], Chikina and Pegden [20] and Davies et al. [17] included the example of Japan in their study, their approach is significantly different from ours. Indeed, Ayoub et al. [19] use a complex mathematical model to discuss the influence of the age structure on the infection in a variety of countries, mostly through the basic reproduction number R0. They use parameter values from the literature and from another study of the same group of authors [22], where the parameter identification is done by a nonlinear least-square minimization. Chikina and Pegden [20] use an age-structured model to investigate age-targeted mitigation strategies. They rely on parameter values from the literature and do discuss using age-structured temporal series to fit their model. Finally, Davies et al. [17] also discuss age-related effects in the control of the COVID epidemic, and use statistical inference to fit an age-structured SIR variant to data; the model is then used to discuss the efficiency of different control strategies. 
We provide a new, explicit computational solution for the parameter identification of an age-structured model. The model is based on the SIUR model developed in Liu et al. [23], which accounts for a differentiated infectiousness for reported and unreported cases (contrary to, for instance, other SIR-type models). In particular, our method is significantly different from nonlinear least-squares minimization and does not involve statistical inference.

In this article we focus on an epidemic model with unreported infectious symptomatic patients (i.e., with mild or no symptoms). Our goal is to investigate the age-structured data of the COVID-19 outbreak in Japan. In Section 2 we present the age-structured data and in Section 3 the mathematical models (with and without age structure). One of the difficulties in fitting the model to the data is that the growth rate of the epidemic is different in each age class, which led us to adapt our earlier method presented in Liu et al. [23]. The new method is presented in Appendix A. In Section 4 we present the comparison of the model with the data. In the last section we discuss our results.

2. Data

Patient data in Japan have been made public since the early stages of the epidemic with the quarantine of the Diamond Princess in the Port of Yokohama. We used data from the website covid19japan.com (https://covid19japan.com, accessed 6 May 2020), which is based on reports from national and regional authorities. Patients are labeled “confirmed” when they test positive for COVID-19 by PCR. Interestingly, the age class of the patient is provided for 13,660 out of 13,970 confirmed patients (97.8% of the confirmed population) as of 29 April. The age distribution of the infected population is represented in Figure 1 compared to the total population per age class (data from the Statistics Bureau of Japan estimate for 1 October 2019). In Figure 2 we plot the number of reported cases per 10,000 people of the same age class (i.e., the number of infected patients divided by the population of the age class times 10,000). Both datasets are given in Table 1 and a statistical summary is provided by Table 2. Note that the high proportion of 20–60 years old confirmed patients may indicate that the severity of the disease is lower for those age classes than for older patients, and therefore the disease transmits more easily in those age classes because of a higher number of asymptomatic individuals. Elderly infected individuals might transmit less because they are identified more easily. The cumulative number of deaths (Figure 3) is another argument in favor of this explanation. We also reconstructed the time evolution of the reported cases in Figure 4 and Figure 5. Note that the steepest curves are precisely those of the 20–60-year-olds, probably because they are economically active and therefore have a high contact rate with the population.

Figure 1.

In this figure we plot in blue bars the age distribution of the Japanese population per 10,000 people and in orange bars the age distribution of the number of reported cases of SARS-CoV-2 per 10,000 patients on 29 April (based on the total of 13,660 reported cases). We observe that 77% of the confirmed patients belong to the 20–60 years age class.

Figure 2.

In this figure we plot the number of infected patients for each age class per 10,000 individuals of the same age class (i.e., the number of infected individuals divided by the population of the age class times 10,000). The figure shows that individuals are more or less likely to become infected depending on their age class. The bars describe the susceptibility of people to SARS-CoV-2 depending on their age class.

Table 1.

The age distribution of Japan is taken from the Statistics Bureau of Japan [24]. The numbers of cases and deaths come from Prefectural Governments and the Japan Ministry of Health, Labour and Welfare.

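| Age group | [0,10[ | [10,20[ | [20,30[ | [30,40[ | [40,50[ | [50,60[ | [60,70[ | [70,80[ | [80,90[ | [90,100[ |
|---|---|---|---|---|---|---|---|---|---|---|
| Population of the age class (2019) | 9,859,515 | 11,171,044 | 12,627,964 | 14,303,042 | 18,519,755 | 16,277,853 | 16,231,582 | 15,926,926 | 8,939,954 | 2,309,313 |
| Population of the age class per 10,000 people | 781 | 885 | 1000 | 1133 | 1467 | 1290 | 1286 | 1262 | 709 | 183 |
| Confirmed cases | 211 | 327 | 2216 | 2034 | 2220 | 2355 | 1566 | 1289 | 857 | 304 |
| Deaths | 0 | 0 | 0 | 2 | 6 | 4 | 7 | 37 | 49 | 9 |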
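
The per-class rates shown in Figure 2 and the share of deaths quoted for Figure 3 can be recomputed directly from Table 1. The following is a minimal sketch; the variable names are illustrative and are not taken from the authors' code.

```python
# Table 1 data (Japan, confirmed cases and deaths as of 29 April 2020).
age_groups = ["[0,10[", "[10,20[", "[20,30[", "[30,40[", "[40,50[",
              "[50,60[", "[60,70[", "[70,80[", "[80,90[", "[90,100["]
population = [9_859_515, 11_171_044, 12_627_964, 14_303_042, 18_519_755,
              16_277_853, 16_231_582, 15_926_926, 8_939_954, 2_309_313]
confirmed = [211, 327, 2216, 2034, 2220, 2355, 1566, 1289, 857, 304]
deaths = [0, 0, 0, 2, 6, 4, 7, 37, 49, 9]

# Reported cases per 10,000 people of the same age class (quantity of Figure 2).
cases_per_10k = [10_000 * c / n for c, n in zip(confirmed, population)]

# Share of deaths occurring in the three oldest classes (70 to 100 years).
death_share_70_plus = sum(deaths[7:]) / sum(deaths)

for group, rate in zip(age_groups, cases_per_10k):
    print(f"{group:>9}: {rate:.2f} reported cases per 10,000")
print(f"Deaths aged 70-100: {100 * death_share_70_plus:.0f}%")
```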

Table 2.

Statistical summary of the data from Table 1.

| Dataset | Japanese Population | Infected | Deceased |
|---|---|---|---|
| First Quartile | 28 | 28 | 68 |
| Median | 48 | 44 | 75 |
| Third Quartile | 67 | 59 | 81 |

Figure 3.

Cumulative number of SARS-CoV-2-induced deaths per age class (red bars). We observe that 83% of deaths occur among patients between 70 and 100 years old.

Figure 4.

Time evolution of the cumulative number of reported cases of SARS-CoV-2 per age class. The vertical axis represents the total number of cumulative reported cases in each age class.

Figure 5.

Time evolution of the cumulative number of reported cases of SARS-CoV-2 per age class. The vertical axis represents the total number of cumulative reported cases in each age class.

3. Methods

3.1. SIUR Model

The model consists of the following system of ordinary differential equations:

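$$
\begin{aligned}
S'(t) &= -\tau(t)\, S(t)\, \frac{I(t)+U(t)}{N},\\
I'(t) &= \tau(t)\, S(t)\, \frac{I(t)+U(t)}{N} - \nu I(t),\\
R'(t) &= \nu_1 I(t) - \eta R(t),\\
U'(t) &= \nu_2 I(t) - \eta U(t).
\end{aligned}
\tag{1}
$$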

This system is supplemented by initial data

$$
S(t_0) = S_0 > 0, \quad I(t_0) = I_0 > 0, \quad R(t_0) \ge 0 \quad \text{and} \quad U(t_0) = U_0 \ge 0. \tag{2}
$$

Here t ≥ t0 is time in days, t0 is the starting date of the epidemic in the model, S(t) is the number of individuals susceptible to infection at time t, I(t) is the number of asymptomatic infectious individuals at time t, R(t) is the number of reported symptomatic infectious individuals at time t, and U(t) is the number of unreported symptomatic infectious individuals at time t. A flow chart of the model is presented in Figure 6.

Figure 6.

Compartments and flow chart of the model.

Asymptomatic infectious individuals I(t) are infectious for an average period of 1/ν days. Reported symptomatic individuals R(t) are infectious for an average period of 1/η days, as are unreported symptomatic individuals U(t). We assume that reported symptomatic infectious individuals R(t) are reported and isolated immediately, and cause no further infections. The asymptomatic individuals I(t) can also be viewed as having a low-level symptomatic state. All infections are acquired from either I(t) or U(t) individuals. A summary of the parameters involved in the model is presented in Table 3.

Table 3.

Parameters of the model.

Symbol Interpretation Method
t0 Time at which the epidemic started fitted
S0 Number of susceptible at time t0 fixed
I0 Number of asymptomatic infectious at time t0 fitted
U0 Number of unreported symptomatic infectious at time t0 fitted
τ(t) Transmission rate at time t fitted
D First day of public intervention fitted
μ Intensity of the public intervention fitted
1/ν Average time during which asymptomatic infectious are asymptomatic fixed
f Fraction of asymptomatic infectious that become reported symptomatic infectious fixed
ν1=fν Rate at which asymptomatic infectious become reported symptomatic fixed
ν2=(1f)ν Rate at which asymptomatic infectious become unreported symptomatic fixed
1/η Average time symptomatic infectious have symptoms fixed

Our study begins in the second phase of the epidemic, i.e., after the pathogen has succeeded in surviving in the population. During this second phase τ(t) ≡ τ0 is constant. When strong government measures such as isolation, quarantine, and public closings are implemented, the third phase begins. The actual effects of these measures are complex, and we use a time-dependent decreasing transmission rate τ(t) to incorporate these effects. The formula for τ(t) is

$$
\begin{cases}
\tau(t) = \tau_0, & 0 \le t \le D,\\
\tau(t) = \tau_0 \exp\left(-\mu (t - D)\right), & D < t.
\end{cases}
\tag{3}
$$

The date D is the first day of public intervention and μ characterises the intensity of the public intervention.

A similar model has been used to describe the epidemic in mainland China, South Korea, Italy, and other countries, and gives reasonable trajectories for the evolution of the epidemic based on actual data [23,25,26,27,28,29]. Compared with these models, we added a scaling with respect to the total population size N, for consistency with the age-structured model (12). This only changes the value of the parameter τ and does not impact the qualitative or quantitative behavior of the model.
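
To make the structure of system (1) with the transmission rate (3) concrete, here is a minimal simulation sketch. It uses a fixed-step Runge-Kutta integrator written from scratch; the integrator, the function names, and all numerical values (including the time origin and the initial data) are illustrative assumptions, not the fitted values or the authors' implementation.

```python
import math

def tau(t, tau0, D, mu):
    """Transmission rate (3): constant until day D, then exponential decay."""
    return tau0 if t <= D else tau0 * math.exp(-mu * (t - D))

def siur_rhs(t, y, p):
    """Right-hand side of system (1) for y = (S, I, R, U)."""
    S, I, R, U = y
    new_infections = tau(t, p["tau0"], p["D"], p["mu"]) * S * (I + U) / p["N"]
    nu1 = p["f"] * p["nu"]          # rate towards reported symptomatic
    nu2 = (1.0 - p["f"]) * p["nu"]  # rate towards unreported symptomatic
    return (-new_infections,
            new_infections - p["nu"] * I,
            nu1 * I - p["eta"] * R,
            nu2 * I - p["eta"] * U)

def rk4(rhs, y0, t0, t1, dt, p):
    """Classical fixed-step Runge-Kutta 4 integrator; returns a list of (t, y)."""
    n = int(round((t1 - t0) / dt))
    t, y = t0, tuple(y0)
    out = [(t, y)]
    for _ in range(n):
        k1 = rhs(t, y, p)
        k2 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k1)), p)
        k3 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k2)), p)
        k4 = rhs(t + dt, tuple(a + dt * b for a, b in zip(y, k3)), p)
        y = tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
        t += dt
        out.append((t, y))
    return out

# Illustrative values only (hypothetical, not the fitted parameters).
params = {"N": 126.8e6, "tau0": 0.2, "D": 60.0, "mu": 0.6,
          "f": 0.8, "nu": 1 / 7, "eta": 1 / 7}
y0 = (126.8e6, 1.0, 0.0, 0.2)   # S0, I0, R(t0), U0
trajectory = rk4(siur_rhs, y0, 0.0, 120.0, 0.1, params)
t_end, (S, I, R, U) = trajectory[-1]
print(f"day {t_end:.0f}: S={S:.0f}, I={I:.2f}, R={R:.2f}, U={U:.2f}")
```

With these values the infectious compartments grow exponentially until day D and decline afterwards, which is the qualitative behavior the three phases described above are meant to capture.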

3.2. Comparison of the Model (1) with the Data

At the early stages of the epidemic, the infectious components of the model I(t), U(t) and R(t) must be exponentially growing. Therefore, we can assume that

$$
I(t) = I_0 \exp\left(\chi_2 (t - t_0)\right).
$$

The cumulative number of reported symptomatic infectious cases at time t, denoted by CR(t), is

$$
CR(t) = \nu_1 \int_{t_0}^{t} I(s)\, ds. \tag{4}
$$

Since I(t) is an exponential function and CR(t0)=0 it is natural to assume that CR(t) has the following special form:

$$
CR(t) = \chi_1 \exp\left(\chi_2 t\right) - \chi_3. \tag{5}
$$

As in our early articles [23,26,27,28,29], we fix χ3=1 and we evaluate the parameters χ1 and χ2 by using an exponential fit to

$$
\chi_1 \exp\left(\chi_2 t\right) \simeq CR_{\text{data}}(t).
$$

We use only early data for this part, from day t=d1 until day t=d2, because we want to catch the exponential growth of the early epidemic and avoid the influence of saturation arising at later stages.
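
The text does not specify how the exponential fit is carried out. One common choice, assumed here purely for illustration, is ordinary least squares on the logarithm of the cumulative data over the window [d1, d2]. The function name and the synthetic data below are hypothetical.

```python
import math

def fit_exponential(days, cumulative):
    """Least-squares fit of log(CR) = log(chi1) + chi2 * t; returns (chi1, chi2)."""
    logs = [math.log(c) for c in cumulative]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    chi2 = sxy / sxx
    chi1 = math.exp(mean_y - chi2 * mean_t)
    return chi1, chi2

# Synthetic check: data generated from a known exponential should be recovered.
days = list(range(20, 31))                       # hypothetical window [d1, d2]
data = [179.0 * math.exp(0.085 * t) for t in days]
chi1, chi2 = fit_exponential(days, data)
print(f"chi1 = {chi1:.1f}, chi2 = {chi2:.3f}")
```

A log-linear fit weights relative rather than absolute errors, so a direct nonlinear fit of the untransformed data would generally give slightly different values on real, noisy data.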

Remark 1.

The estimated parameters χ1 and χ2 will vary if we change the interval [d1, d2].

Once χ1,χ2,χ3 are known, we can compute the starting time of the epidemic t0 from (5) as:

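$$
CR(t_0) = 0 \;\Longleftrightarrow\; \chi_1 \exp\left(\chi_2 t_0\right) - \chi_3 = 0 \;\Longrightarrow\; t_0 = \frac{1}{\chi_2} \left( \ln \chi_3 - \ln \chi_1 \right).
$$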
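
As a small worked example, the formula for t0 can be evaluated with the values χ1=179, χ2=0.085 and χ3=1 quoted in the caption of Figure 8. The result is expressed in days relative to the time origin of the fit; converting it to a calendar date requires knowing which date corresponds to t=0, which is not stated here.

```python
import math

def epidemic_start(chi1, chi2, chi3=1.0):
    """Solve chi1 * exp(chi2 * t0) - chi3 = 0 for t0."""
    return (math.log(chi3) - math.log(chi1)) / chi2

t0 = epidemic_start(chi1=179.0, chi2=0.085, chi3=1.0)
print(f"t0 = {t0:.1f} days relative to the time origin of the fit")
```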

We fix S0 = 126.8 × 10^6, which corresponds to the total population of Japan. The quantities I0, R0, and U0 correspond to the values taken by I(t), R(t) and U(t) at t=t0 (and in particular R0 should not be confused with the basic reproduction number ℛ0). We fix the fraction f of symptomatic infectious cases that are reported. We assume that between 80% and 100% of infectious cases are reported. Thus, f varies between 0.8 and 1. We assume that the average time during which the patients are asymptomatic infectious 1/ν varies between 1 day and 7 days. We assume that the average time during which a patient is symptomatic infectious 1/η varies between 1 day and 7 days. In other words we fix the parameters f, ν, η. Since f and ν are known, we can compute

$$
\nu_1 = f \nu \quad \text{and} \quad \nu_2 = (1 - f)\, \nu. \tag{6}
$$

Computing further (see below for more details), we should have

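$$
I_0 = \frac{\chi_1 \chi_2 \exp\left(\chi_2 t_0\right)}{f \nu} = \frac{\chi_3 \chi_2}{f \nu}, \tag{7}
$$

$$
\tau = \frac{N \left(\chi_2 + \nu\right)}{S_0} \cdot \frac{\eta + \chi_2}{\nu_2 + \eta + \chi_2}, \tag{8}
$$

$$
R_0 = \frac{\nu_1}{\eta + \chi_2}\, I_0 = \frac{f \nu}{\eta + \chi_2}\, I_0, \tag{9}
$$

and

$$
U_0 = \frac{\nu_2}{\eta + \chi_2}\, I_0 = \frac{(1 - f)\, \nu}{\eta + \chi_2}\, I_0. \tag{10}
$$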
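
For readers who want to see where these expressions come from, the following is a short sketch of the computation. It assumes that S(t) stays close to S0 during the early phase and that I, R and U all grow like exp(χ2(t − t0)); these assumptions are made explicit here for the sketch and the argument is not quoted from the authors.

```latex
% Sketch under the assumptions S(t) \approx S_0 and
% I(t) = I_0 e^{\chi_2 (t - t_0)}, R(t) = R_0 e^{\chi_2 (t - t_0)}, U(t) = U_0 e^{\chi_2 (t - t_0)}.
\begin{align*}
CR'(t) = \chi_1 \chi_2 e^{\chi_2 t} = \nu_1 I(t)
  &\;\Longrightarrow\; I_0 = \frac{\chi_1 \chi_2 e^{\chi_2 t_0}}{f \nu} = \frac{\chi_3 \chi_2}{f \nu}
  \quad \text{since } \chi_1 e^{\chi_2 t_0} = \chi_3, \\
R'(t) = \nu_1 I(t) - \eta R(t)
  &\;\Longrightarrow\; \chi_2 R_0 = \nu_1 I_0 - \eta R_0
  \;\Longrightarrow\; R_0 = \frac{\nu_1}{\eta + \chi_2}\, I_0, \\
U'(t) = \nu_2 I(t) - \eta U(t)
  &\;\Longrightarrow\; \chi_2 U_0 = \nu_2 I_0 - \eta U_0
  \;\Longrightarrow\; U_0 = \frac{\nu_2}{\eta + \chi_2}\, I_0, \\
I'(t) = \tau S_0 \frac{I(t) + U(t)}{N} - \nu I(t)
  &\;\Longrightarrow\; \chi_2 I_0 = \tau S_0 \frac{I_0 + U_0}{N} - \nu I_0 \\
  &\;\Longrightarrow\; \tau = \frac{N (\chi_2 + \nu)}{S_0} \cdot \frac{I_0}{I_0 + U_0}
   = \frac{N (\chi_2 + \nu)}{S_0} \cdot \frac{\eta + \chi_2}{\nu_2 + \eta + \chi_2}.
\end{align*}
```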

By using the approach described in Diekmann et al. [30], van den Driessche and Watmough [31], the basic reproductive number for model (1) is given by

$$
\mathcal{R}_0 = \frac{\tau S_0}{\nu N} \left( 1 + \frac{\nu_2}{\eta} \right).
$$

By using (8) we obtain

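$$
\mathcal{R}_0 = \frac{\chi_2 + \nu}{\nu} \cdot \frac{\eta + \chi_2}{\nu_2 + \eta + \chi_2} \left( 1 + \frac{\nu_2}{\eta} \right). \tag{11}
$$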
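
Formulas (6) to (11) are simple enough to evaluate by hand, but a short script makes the dependence on the fixed parameters explicit. The sketch below uses f=0.8, 1/ν=1/η=7 days, χ2=0.085 and χ3=1 (the values quoted in the caption of Figure 8) and assumes N=S0; the function name and the dictionary keys are illustrative.

```python
def siur_parameters(chi2, chi3, f, nu, eta, S0, N):
    """Evaluate formulas (6)-(11) for the model without age structure."""
    nu1 = f * nu                                                      # (6)
    nu2 = (1.0 - f) * nu                                              # (6)
    I0 = chi3 * chi2 / nu1                                            # (7)
    tau = N * (chi2 + nu) / S0 * (eta + chi2) / (nu2 + eta + chi2)    # (8)
    R_init = nu1 / (eta + chi2) * I0                                  # (9), initial value R(t0)
    U0 = nu2 / (eta + chi2) * I0                                      # (10)
    basic_R0 = tau * S0 / (nu * N) * (1.0 + nu2 / eta)                # basic reproduction number
    return {"I0": I0, "tau": tau, "R_init": R_init, "U0": U0,
            "basic_R0": basic_R0}

# Assumption: N = S0 (the whole population is initially susceptible).
values = siur_parameters(chi2=0.085, chi3=1.0, f=0.8, nu=1 / 7, eta=1 / 7,
                         S0=126.8e6, N=126.8e6)
for name, value in values.items():
    print(f"{name} = {value:.4f}")
```

The initial value R(t0) is stored under the key `R_init` to keep it distinct from the basic reproduction number, which the text warns should not be confused with it.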

3.3. Model SIUR with Age Structure

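In what follows we will denote by N1, …, N10 the number of individuals in the age classes [0,10[, …, [90,100[, respectively. The model for the number of susceptible individuals S1(t), …, S10(t), respectively for the age classes [0,10[, …, [90,100[, is the following

$$
\begin{cases}
S_1'(t) = -\tau_1 S_1(t) \left[ \phi_{1,1} \dfrac{I_1(t)+U_1(t)}{N_1} + \ldots + \phi_{1,10} \dfrac{I_{10}(t)+U_{10}(t)}{N_{10}} \right],\\
\quad \vdots\\
S_{10}'(t) = -\tau_{10} S_{10}(t) \left[ \phi_{10,1} \dfrac{I_1(t)+U_1(t)}{N_1} + \ldots + \phi_{10,10} \dfrac{I_{10}(t)+U_{10}(t)}{N_{10}} \right].
\end{cases}
\tag{12}
$$

The model for the number of asymptomatic infectious individuals I1(t), …, I10(t), respectively for the age classes [0,10[, …, [90,100[, is the following

$$
\begin{cases}
I_1'(t) = \tau_1 S_1(t) \left[ \phi_{1,1} \dfrac{I_1(t)+U_1(t)}{N_1} + \ldots + \phi_{1,10} \dfrac{I_{10}(t)+U_{10}(t)}{N_{10}} \right] - \nu I_1(t),\\
\quad \vdots\\
I_{10}'(t) = \tau_{10} S_{10}(t) \left[ \phi_{10,1} \dfrac{I_1(t)+U_1(t)}{N_1} + \ldots + \phi_{10,10} \dfrac{I_{10}(t)+U_{10}(t)}{N_{10}} \right] - \nu I_{10}(t).
\end{cases}
\tag{13}
$$

The model for the number of reported symptomatic infectious individuals R1(t), …, R10(t), respectively for the age classes [0,10[, …, [90,100[, is

$$
\begin{cases}
R_1'(t) = \nu_1^{1} I_1(t) - \eta R_1(t),\\
\quad \vdots\\
R_{10}'(t) = \nu_1^{10} I_{10}(t) - \eta R_{10}(t).
\end{cases}
\tag{14}
$$

Finally the model for the number of unreported symptomatic infectious individuals U1(t), …, U10(t), respectively in the age classes [0,10[, …, [90,100[, is the following

$$
\begin{cases}
U_1'(t) = \nu_2^{1} I_1(t) - \eta U_1(t),\\
\quad \vdots\\
U_{10}'(t) = \nu_2^{10} I_{10}(t) - \eta U_{10}(t).
\end{cases}
\tag{15}
$$

In each age class [0,10[, …, [90,100[ we assume that there is a fraction f1, …, f10 of asymptomatic infectious individuals who become reported symptomatic infectious (i.e., with severe symptoms) and a fraction (1−f1), …, (1−f10) who become unreported symptomatic infectious (i.e., with mild symptoms). Therefore we define

$$
\begin{cases}
\nu_1^{1} = \nu f_1 \quad \text{and} \quad \nu_2^{1} = \nu (1 - f_1),\\
\quad \vdots\\
\nu_1^{10} = \nu f_{10} \quad \text{and} \quad \nu_2^{10} = \nu (1 - f_{10}).
\end{cases}
\tag{16}
$$

In this model τ1, …, τ10 are the respective transmission rates for the age classes [0,10[, …, [90,100[.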

The matrix ϕij represents the probability for an individual in the class i to meet an individual in the class j. In their survey, Prem and co-authors [21] present a way to reconstruct contact matrices from existing data and provide such contact matrices for a number of countries including Japan. Based on the data provided by Prem et al. [21] for Japan we construct the contact probability matrix ϕ. More precisely, we inferred contact data for the missing age classes [80,90[ and [90,100[. The precise method used to construct the contact matrix γ is detailed in Appendix B. An analogous contact matrix for Japan has been proposed by Munasinghe, Asai and Nishiura [32]. The contact matrix γ we used is the following

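$$
\gamma_{ij} =
\begin{pmatrix}
4.03 & 0.92 & 0.47 & 1.69 & 0.83 & 0.92 & 0.78 & 0.56 & 0.57 & 0.57\\
0.71 & 8.06 & 1.38 & 1.36 & 1.96 & 1.74 & 0.75 & 0.86 & 0.74 & 0.57\\
0.55 & 1.05 & 4.63 & 2.25 & 1.84 & 1.92 & 0.94 & 0.46 & 0.74 & 0.73\\
1.52 & 1.20 & 2.54 & 4.97 & 2.98 & 2.40 & 1.76 & 0.99 & 0.53 & 0.73\\
0.69 & 1.42 & 1.93 & 2.87 & 3.91 & 2.76 & 1.35 & 1.33 & 0.95 & 0.53\\
0.34 & 0.48 & 1.20 & 1.46 & 1.61 & 2.97 & 1.40 & 0.98 & 1.23 & 0.95\\
0.28 & 0.18 & 0.20 & 0.52 & 0.38 & 0.77 & 2.67 & 1.72 & 0.92 & 1.23\\
0.12 & 0.10 & 0.09 & 0.18 & 0.19 & 0.25 & 0.76 & 1.99 & 1.18 & 0.93\\
0.09 & 0.10 & 0.08 & 0.09 & 0.13 & 0.17 & 0.27 & 0.64 & 1.61 & 1.19\\
0.09 & 0.09 & 0.10 & 0.08 & 0.09 & 0.13 & 0.17 & 0.27 & 0.64 & 1.61
\end{pmatrix},
\tag{17}
$$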

where the ith row of the matrix γ gives the average number of contacts made during one day by an individual in the age class i with individuals in each age class j. Notice that the highest numbers of contacts occur within the same age class. The matrix of conditional probability ϕ of contact between age classes is given by (18) and we plot a visual representation of this matrix in Figure 7.

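$$
\phi_{ij} =
\begin{pmatrix}
0.35 & 0.08 & 0.04 & 0.14 & 0.07 & 0.08 & 0.06 & 0.04 & 0.05 & 0.05\\
0.03 & 0.44 & 0.07 & 0.07 & 0.10 & 0.09 & 0.04 & 0.04 & 0.04 & 0.03\\
0.03 & 0.06 & 0.30 & 0.14 & 0.12 & 0.12 & 0.06 & 0.03 & 0.04 & 0.04\\
0.07 & 0.06 & 0.12 & 0.25 & 0.15 & 0.12 & 0.08 & 0.05 & 0.02 & 0.03\\
0.03 & 0.07 & 0.10 & 0.16 & 0.22 & 0.15 & 0.07 & 0.07 & 0.05 & 0.03\\
0.02 & 0.03 & 0.09 & 0.11 & 0.12 & 0.23 & 0.11 & 0.07 & 0.09 & 0.07\\
0.03 & 0.02 & 0.02 & 0.05 & 0.04 & 0.08 & 0.30 & 0.19 & 0.10 & 0.13\\
0.02 & 0.01 & 0.01 & 0.03 & 0.03 & 0.04 & 0.13 & 0.34 & 0.20 & 0.16\\
0.02 & 0.02 & 0.01 & 0.02 & 0.02 & 0.03 & 0.06 & 0.14 & 0.36 & 0.27\\
0.02 & 0.02 & 0.03 & 0.02 & 0.02 & 0.03 & 0.05 & 0.08 & 0.19 & 0.48
\end{pmatrix}.
\tag{18}
$$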

Figure 7.

Graphical representation of the contact matrix ϕ. The intensity of blue in the cell (i,j) indicates the conditional probability that, given a contact between an individual of age group i and another individual, the latter belongs to the age class j. The matrix was reconstructed from the data of Prem et al. [21], with the method described in Appendix B.
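
The relation between the contact matrix γ in (17) and the conditional probability matrix ϕ in (18) can be illustrated with a short script. The sketch below assumes that ϕ is obtained by dividing each row of γ by its sum, which is the natural reading of "conditional probability given a contact"; the exact construction is described in Appendix B, and the published entries of ϕ can differ in the last digit because the entries of γ printed in (17) are rounded.

```python
# Contact matrix gamma from (17): row i lists the average daily contacts of an
# individual in age class i with individuals of each age class j.
gamma = [
    [4.03, 0.92, 0.47, 1.69, 0.83, 0.92, 0.78, 0.56, 0.57, 0.57],
    [0.71, 8.06, 1.38, 1.36, 1.96, 1.74, 0.75, 0.86, 0.74, 0.57],
    [0.55, 1.05, 4.63, 2.25, 1.84, 1.92, 0.94, 0.46, 0.74, 0.73],
    [1.52, 1.20, 2.54, 4.97, 2.98, 2.40, 1.76, 0.99, 0.53, 0.73],
    [0.69, 1.42, 1.93, 2.87, 3.91, 2.76, 1.35, 1.33, 0.95, 0.53],
    [0.34, 0.48, 1.20, 1.46, 1.61, 2.97, 1.40, 0.98, 1.23, 0.95],
    [0.28, 0.18, 0.20, 0.52, 0.38, 0.77, 2.67, 1.72, 0.92, 1.23],
    [0.12, 0.10, 0.09, 0.18, 0.19, 0.25, 0.76, 1.99, 1.18, 0.93],
    [0.09, 0.10, 0.08, 0.09, 0.13, 0.17, 0.27, 0.64, 1.61, 1.19],
    [0.09, 0.09, 0.10, 0.08, 0.09, 0.13, 0.17, 0.27, 0.64, 1.61],
]

# Assumed construction: normalize each row so that it sums to one.
row_sums = [sum(row) for row in gamma]
phi = [[g / s for g in row] for row, s in zip(gamma, row_sums)]

# Check that the largest entry of each row sits on the diagonal.
diagonal_dominates = all(max(row) == row[i] for i, row in enumerate(gamma))

for row in phi:
    print(" ".join(f"{p:.2f}" for p in row))
print("Most contacts within the same age class:", diagonal_dominates)
```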

4. Results

4.1. Model without Age Structure

The daily number of reported cases from the model can be obtained by computing the solution of the following equation:

$$
DR'(t) = \nu_1 I(t) - DR(t), \quad \text{for } t \ge t_0 \quad \text{and} \quad DR(t_0) = DR_0. \tag{19}
$$

In Figure 8 and Figure 9 we employ the method presented previously in Liu et al. [29] to fit the data for Japan without age structure.

Figure 8.

Cumulative number of cases. We plot the cumulative data (red dots) and the best fits of the model CR(t) (black curve) and CU(t) (green curve). We fix f=0.8, 1/η=7 days and 1/ν=7 days and we apply the method described in Liu et al. [29]. The best fit is d1 = 2 April, d2 = 5 April, D = 27 April, μ=0.6, χ1=179, χ2=0.085, χ3=1 and t0 = 13 January.

Figure 9.

Daily number of cases. We plot the daily data (black dots) with DR(t) (blue curve). We fix f=0.8, 1/η=7 days and 1/ν=7 days and we apply the method described in Liu et al. [29]. The best fit is d1 = 2 April, d2 = 5 April, D = 27 April, μ=0.6, χ1=179, χ2=0.085, χ3=1 and t0 = 13 January.

The model to compute the cumulative number of deaths from the reported individuals is the following

$$
D'(t) = \eta_D\, p\, R(t), \quad \text{for } t \ge t_0 \quad \text{and} \quad D(t_0) = 0, \tag{20}
$$

where ηD is the death rate of reported infectious symptomatic individuals and p is the case fatality rate (namely the fraction of deaths among reported infectious individuals).

In the simulation we chose 1/ηD=6 days, and the case fatality rate p=0.0286 is computed by using the cumulative number of confirmed cases and the cumulative number of deaths (as of 29 April) as follows

$$
p = \frac{\text{cumulative number of deaths}}{\text{cumulative number of reported cases}} = \frac{393}{13744}. \tag{21}
$$
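
The ratio in (21) is a one-line computation; the snippet below simply evaluates it, together with the combined coefficient ηD·p that multiplies R(t) in (20) when 1/ηD=6 days. The variable names are illustrative.

```python
cumulative_deaths = 393
cumulative_reported_cases = 13_744

# Case fatality rate (21): deaths divided by reported cases.
p = cumulative_deaths / cumulative_reported_cases

# Coefficient multiplying R(t) in the death equation (20), with 1/eta_D = 6 days.
eta_D = 1 / 6
death_coefficient = eta_D * p

print(f"p = {p:.4f} ({100 * p:.2f}% of reported cases)")
print(f"eta_D * p = {death_coefficient:.5f} per day")
```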

In Figure 10 we plot the cumulative number of deaths D(t) by using the same simulations as in Figure 8 and Figure 9.

Figure 10.

In this figure we plot the data for the cumulative number of deaths (black dots), and our best fits for D(t) (red curves).

4.2. Model with Age Structure

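In order to describe the confinement for the age structured model (12)–(15) we will use for each age class i=1,…,10 a different transmission rate having the following form

$$
\begin{cases}
\tau_i(t) = \tau_i, & 0 \le t \le D_i,\\
\tau_i(t) = \tau_i \exp\left(-\mu_i (t - D_i)\right), & D_i < t.
\end{cases}
\tag{22}
$$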

The date Di is the first day of public intervention for the age class i and μi is the intensity of the public intervention for each age class.

In Figure 11 we plot the cumulative number of reported cases as given by our model (12)–(15) (solid lines), compared with reported cases data (black dots). We used the method described in Appendix A to estimate the parameters τi from the data. In Figure 12 we plot the cumulative number of unreported cases (solid lines) as given by our model with the same parameter values, compared to the existing data of reported cases (black dots).

Figure 11.

We plot a comparison between the model (12)–(15) and the age structured data from Japan by age class. We took 1/ν=1/η=7 days for each age class. Our best fit is obtained for fi which depends linearly on the age class until it reaches 90%, with f1=0.1, f2=0.2, f3=0.3, f4=0.4, f5=0.5, f6=0.6, f7=0.7, f8=0.8, f9=0.9, and f10=0.9. The values we used for the first day of public intervention are Di = 13 April for the 0–20 years age classes (i=1,2), Di = 11 April for the age classes from [20,30[ to [60,70[ (i=3,4,5,6,7), and Di = 16 April for the remaining age classes. We fit the data from 30 March to 20 April to derive the value of χ1i and χ2i for each age class. For the intensity of confinement we use the values μ1=μ2=0.4829, μ3=μ4=0.2046, μ5=μ6=0.1474, μ7=0.0744, μ8=0.1736, μ9=μ10=0.1358. By applying the method described in Appendix A, we obtain τ1=0.1630, τ2=0.1224, τ3=0.3028, τ4=0.2250, τ5=0.1520, τ6=0.1754, τ7=0.1289, τ8=0.1091, τ9=0.1211 and τ10=0.1642. The matrix ϕ is the one defined in (18).

Figure 12.

Cumulative number of unreported cases as given by the fit of the model (12)–(15) to Japanese data. The solid curves represent the solution of the model and the black dots correspond to the reported cases data. Parameters are the same as in Figure 11.

In order to understand the role of transmission network between age groups in this epidemic, we plot in Figure 13 the transmission matrices computed at different times. The transmission matrix is the following

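$$
C(t) = \operatorname{diag}\left(\tau_1(t), \tau_2(t), \ldots, \tau_{10}(t)\right) \times \phi, \tag{23}
$$

where the matrix ϕ describes contacts and is given in (18), and the transmission rates are the ones fitted to the data as in Figure 11:

$$
\tau_i(t) = \tau_i \exp\left(-\mu_i\, (t - D_i)^{+}\right),
$$

where $(x)^{+} = \max(x, 0)$ denotes the positive part, so that this expression coincides with (22).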
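
To see how quickly the fitted transmission rates separate across age classes, the sketch below evaluates τi(t) at the four dates used in Figure 13, with the values of τi, μi and Di quoted in the caption of Figure 11. Working with calendar dates (year 2020) avoids having to fix a time origin; the helper `transmission_matrix` shows the row scaling in (23) for any matrix ϕ passed to it. Function names are illustrative.

```python
import math
from datetime import date

tau0 = [0.1630, 0.1224, 0.3028, 0.2250, 0.1520,
        0.1754, 0.1289, 0.1091, 0.1211, 0.1642]
mu = [0.4829, 0.4829, 0.2046, 0.2046, 0.1474,
      0.1474, 0.0744, 0.1736, 0.1358, 0.1358]
# First day of public intervention per age class (13, 11 and 16 April 2020).
D = [date(2020, 4, 13)] * 2 + [date(2020, 4, 11)] * 5 + [date(2020, 4, 16)] * 3

def tau_at(day):
    """Transmission rates tau_i(t) = tau_i * exp(-mu_i * max(t - D_i, 0))."""
    return [rate * math.exp(-m * max((day - d).days, 0))
            for rate, m, d in zip(tau0, mu, D)]

def transmission_matrix(tau_t, phi):
    """C(t) = diag(tau_1(t), ..., tau_n(t)) x phi: scale row i of phi by tau_i(t)."""
    return [[rate * entry for entry in row] for rate, row in zip(tau_t, phi)]

for day in (date(2020, 4, 11), date(2020, 4, 16),
            date(2020, 4, 23), date(2020, 5, 16)):
    rates = tau_at(day)
    print(day.isoformat(), " ".join(f"{r:.4f}" for r in rates))
```

Passing the 10 × 10 matrix ϕ of (18) as `phi` together with `tau_at(day)` gives the matrices plotted in Figure 13.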

Figure 13.

Rate of contact between age classes according to the fitted data. For each age class in the y-axis we plot the rate of contacts between one individual of this age class and another individual of the age class indicated on the x-axis. (a) is the rate of contacts before the start of public measures (11 April). (b) is the rate of contacts at the date of effect of the public measures for the last age class (16 April). (c) is the rate of contacts one week later (23 April). (d) is the rate of contacts one month later (16 May). In this figure we use τ1=0.1630, τ2=0.1224, τ3=0.3028, τ4=0.2250, τ5=0.1520, τ6=0.1754, τ7=0.1289, τ8=0.1091, τ9=0.1211 and τ10=0.1642, μ1=μ2=0.4829, μ3=μ4=0.2046, μ5=μ6=0.1474, μ7=0.0744, μ8=0.1736, μ9=μ10=0.1358, and D1=D2 = 13 April, D3=D4=D5=D6=D7 = 11 April, D8=D9=D10 = 16 April.

During the early stages of the epidemic, the transmission seems to be evenly distributed among age classes, with a little bias towards younger age classes (Figure 13a). Younger age classes seem to react more quickly to social distancing policies than older classes, therefore their transmission rate drops rapidly (Figure 13b,c); one month after the start of social distancing measures, the transmission mostly occurs within elderly classes (60–100 years, Figure 13d).

5. Discussion

The recent COVID-19 pandemic has led many local governments to enforce drastic control measures in an effort to stop its progression. Those control measures were often taken in a state of emergency and without any real visibility concerning the later development of the epidemic, to prevent the collapse of the health systems under the pressure of severe cases. Mathematical models can help to see more clearly what the future of the pandemic could be, provided that the particularities of the pathogen under consideration are correctly identified. In the case of COVID-19, one of the features of the pathogen which makes it particularly dangerous is the existence of a high contingent of unidentified infectious individuals who spread the disease without notice. This makes non-intensive containment strategies such as quarantine and contact-tracing relatively inefficient but also renders predictions by mathematical models particularly challenging.

Early attempts to reconstruct the epidemic by using SIUR models were performed in Liu et al. [23,26,27,28], who used them to fit the behavior of the epidemic in many countries, by including undetected cases in the mathematical model. Here we extend our modeling effort by adding the time series of deaths into the equation. In Section 4 we present an additional fit of the number of disease-induced deaths coming from symptomatic (reported) individuals (see Figure 10). In order to fit the data properly, we were forced to reduce the length of stay in the R-compartment to 6 days (on average), meaning that death induced by the disease should occur on average faster than recovery. A shorter period between infection and death (compared to remission) has also been observed, for instance, by Verity et al. [7].

The major improvement in this article is to combine our early SIUR model with chronological age. Early results using age structured SIR models were obtained by Kucharski et al. [33] but no unreported individuals were considered and no comparison with age-structured data was performed. Indeed in this article we provide a new method to fit the data and the model. The method extends our previous method for the SIUR model without age (see Appendix A).

The data presented in Section 2 suggest that the chronological age plays a very important role in the expression of the symptoms. The largest part of the reported patients are between 20 and 60 years old (see Figure 1), while the largest part of the deceased are between 60 and 90 years old (see Figure 3). This suggests that the symptoms associated with COVID-19 infection are more severe in elderly patients, which has been reported in the literature several times (see e.g., Lu et al. [12], Zhou et al. [8]). In particular, the probability of developing reported symptoms (our parameter f) should in fact depend on the age class.

Indeed, the best match for our model (see Figure 11) was obtained under the assumption that the proportion of symptomatic individuals among the infected increases with the age of the patient. This linear dependency of f as a function of age is consistent with the observations of Wu et al. [16] that the severity of the symptoms increases linearly with age. As a consequence, unreported cases are a majority for young age classes (for age classes less than 50 years) and become a minority for older age classes (more than 50 years), see Figure 12. Moreover, our model reveals the fact that the policies used by the government to reduce contacts between individuals have strongly heterogeneous effects depending on the age classes. Plotting the transmission matrix at different times (see Figure 13) shows that younger age classes react more quickly and more efficiently than older classes. This may be due to the fact that the number of contacts in a typical day is higher among younger individuals. As a consequence, we predict that one month after the effective start of public measures, the new transmissions will almost exclusively occur in elderly classes. The observation that younger age classes play a major role in the transmission of the disease has been highlighted several times in the literature, see e.g., Davies et al. [17], Cao et al. [11], Kucharski et al. [33] for the COVID-19 epidemic, but also Mossong et al. [34] in a more general context.

We developed a new model for an age-structured epidemic and provided a new and efficient method to identify the parameters of this model based on observed data. Our method differs significantly from the existing nonlinear least-squares and statistical inference methods and we believe that it produces high-quality results. Moreover, we only use the initial phase of the epidemic for the identification of the epidemiological parameters, which shows that the model itself is consistent with the observed phenomenon and argues against overfitting. Yet our study could be improved in several directions. We only use reported cases which were confirmed by PCR tests, and therefore the number of tests performed could introduce a bias in the observed data, and therefore in our results. We are currently working on an integration of this number of tests in our model. We use a phenomenological model to describe the response of the population in terms of number of contacts to the mitigation measures imposed by the government. This could probably be described more precisely by investigating the mitigation strategies in terms of social network. Nevertheless we believe that our study offers a precise and robust mathematical method which adds to the existing literature.

Acknowledgments

Data from https://covid19japan.com.

Appendix A. Method to Fit the Age Structured Model to the Data

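We first choose two days d1 and d2 between which each cumulative age group grows like an exponential. By fitting the cumulative age classes [0,10[, [10,20[, … and [90,100[ between d1 and d2, for each age class j=1,…,10 we can find $\chi_1^j$ and $\chi_2^j$ such that

$$
CR_j^{\text{data}}(t) \simeq \chi_1^j e^{\chi_2^j t}.
$$

We choose a starting time t0 ≤ d1 and we fix

$$
\chi_3^j = \chi_1^j e^{\chi_2^j t_0}, \quad \forall j = 1, \ldots, n,
$$

and we obtain

$$
\begin{cases}
CR_1(t) = \chi_1^1 e^{\chi_2^1 t} - \chi_3^1,\\
\quad \vdots\\
CR_n(t) = \chi_1^n e^{\chi_2^n t} - \chi_3^n,
\end{cases}
\tag{A1}
$$

where

$$
\chi_j^i \ge 0, \quad \forall i = 1, \ldots, n, \quad \forall j = 1, 2, 3.
$$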

Figure A1.

We plot an exponential fit for each age class using the data from Japan.

We assume that

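$$
\begin{cases}
CR_1'(t) = \nu_1^1 I_1(t),\\
\quad \vdots\\
CR_n'(t) = \nu_1^n I_n(t),
\end{cases}
\tag{A2}
$$

where

$$
\nu_1^i = \nu f_i \quad \text{and} \quad \nu_2^i = \nu (1 - f_i), \quad \forall i = 1, \ldots, n.
$$

Therefore we obtain

$$
I_j(t) = I_j^{\star} e^{\chi_2^j t}, \tag{A3}
$$

where

$$
I_j^{\star} := \frac{\chi_1^j \chi_2^j}{\nu_1^j}.
$$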

By assuming that the number of susceptible individuals remains constant we have

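$$\begin{cases} I_1'(t) = \tau_1 S_1 \left[ \phi_{11} \dfrac{I_1(t)+U_1(t)}{N_1} + \cdots + \phi_{1n} \dfrac{I_n(t)+U_n(t)}{N_n} \right] - \nu I_1(t), \\ \qquad \vdots \\ I_n'(t) = \tau_n S_n \left[ \phi_{n1} \dfrac{I_1(t)+U_1(t)}{N_1} + \cdots + \phi_{nn} \dfrac{I_n(t)+U_n(t)}{N_n} \right] - \nu I_n(t), \end{cases} \tag{A4}$$

and

$$\begin{cases} U_1'(t) = \nu_2^1 I_1(t) - \eta U_1(t), \\ \qquad \vdots \\ U_n'(t) = \nu_2^n I_n(t) - \eta U_n(t). \end{cases} \tag{A5}$$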

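If we assume that the $U_j(t)$ have the following form

$$U_j(t) = U_j^{\star} e^{\chi_2^j t}, \tag{A6}$$

then by substituting in (A5) we obtain

$$U_j^{\star} = \frac{\nu_2^j I_j^{\star}}{\eta + \chi_2^j}. \tag{A7}$$

The cumulative number of unreported cases $CU_j(t)$ is computed as

$$CU_j'(t) = \nu_2^j I_j(t),$$

and we used the following initial condition:

$$CU_j(0) = CU_j^{\star} = \int_{-\infty}^{0} \nu_2^j I_j^{\star} e^{\chi_2^j s}\, ds = \frac{\nu_2^j I_j^{\star}}{\chi_2^j}.$$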
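As a small illustration, the amplitudes in (A3), (A7) and the initial condition for $CU_j$ follow directly from the fitted $\chi_1^j$, $\chi_2^j$ and the parameters $\nu$, $\eta$, $f_j$. The Python sketch below uses hypothetical values for $\chi_1^j$ and $\chi_2^j$; the function name is also an assumption made for this example.

```python
def initial_amplitudes(chi1, chi2, nu, eta, f):
    """Return (I*, U*, CU*) for one age class.

    I*  = chi1 * chi2 / nu1          from (A2)-(A3)
    U*  = nu2 * I* / (eta + chi2)    from (A7)
    CU* = nu2 * I* / chi2            initial cumulative unreported cases
    """
    nu1 = nu * f          # reporting rate
    nu2 = nu * (1.0 - f)  # rate of becoming unreported
    i_star = chi1 * chi2 / nu1
    u_star = nu2 * i_star / (eta + chi2)
    cu_star = nu2 * i_star / chi2
    return i_star, u_star, cu_star

# Hypothetical fitted values for one age class; 1/nu = 1/eta = 7 days.
chi1, chi2 = 2.0, 0.15
nu = eta = 1.0 / 7.0
f = 0.8

i_star, u_star, cu_star = initial_amplitudes(chi1, chi2, nu, eta, f)
print(i_star, u_star, cu_star)
```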

We define the error between the data and the model as follows

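$$\begin{cases} \varepsilon_1(t) = I_1'(t) - \tau_1 S_1 \left[ \phi_{11} \dfrac{I_1(t)+U_1(t)}{N_1} + \cdots + \phi_{1n} \dfrac{I_n(t)+U_n(t)}{N_n} \right] + \nu I_1(t), \\ \qquad \vdots \\ \varepsilon_n(t) = I_n'(t) - \tau_n S_n \left[ \phi_{n1} \dfrac{I_1(t)+U_1(t)}{N_1} + \cdots + \phi_{nn} \dfrac{I_n(t)+U_n(t)}{N_n} \right] + \nu I_n(t), \end{cases} \tag{A8}$$

or equivalently

$$\begin{cases} \varepsilon_1(t) = \left(\chi_2^1 + \nu\right) I_1^{\star} e^{\chi_2^1 t} - \tau_1 S_1 \left[ \phi_{11} \dfrac{I_1^{\star}+U_1^{\star}}{N_1} e^{\chi_2^1 t} + \cdots + \phi_{1n} \dfrac{I_n^{\star}+U_n^{\star}}{N_n} e^{\chi_2^n t} \right], \\ \qquad \vdots \\ \varepsilon_n(t) = \left(\chi_2^n + \nu\right) I_n^{\star} e^{\chi_2^n t} - \tau_n S_n \left[ \phi_{n1} \dfrac{I_1^{\star}+U_1^{\star}}{N_1} e^{\chi_2^1 t} + \cdots + \phi_{nn} \dfrac{I_n^{\star}+U_n^{\star}}{N_n} e^{\chi_2^n t} \right]. \end{cases} \tag{A9}$$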

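Let the matrix $\phi$ be fixed. We look for the vector $\tau = (\tau_1, \ldots, \tau_n)$ which solves the minimization problem

$$\min_{\tau \in \mathbb{R}^n} \sum_{j=1,\ldots,n} \int_{d_1}^{d_2} \varepsilon_j(t)^2\, dt.$$

Define for each $j=1,\ldots,n$

$$K_j(t) := \left(\chi_2^j + \nu\right) I_j^{\star} e^{\chi_2^j t}$$

and

$$H_j(t) := S_j \left[ \phi_{j1} \frac{I_1^{\star}+U_1^{\star}}{N_1} e^{\chi_2^1 t} + \cdots + \phi_{jn} \frac{I_n^{\star}+U_n^{\star}}{N_n} e^{\chi_2^n t} \right],$$

so that

$$\varepsilon_j(t) = K_j(t) - \tau_j H_j(t).$$

Hence for each $j=1,\ldots,n$

$$\int_{d_1}^{d_2} \varepsilon_j(t)^2\, dt = \int_{d_1}^{d_2} K_j(t)^2\, dt - 2 \tau_j \int_{d_1}^{d_2} K_j(t) H_j(t)\, dt + \tau_j^2 \int_{d_1}^{d_2} H_j(t)^2\, dt,$$

and by setting

$$0 = \frac{\partial}{\partial \tau_j} \int_{d_1}^{d_2} \varepsilon_j(t)^2\, dt = -2 \int_{d_1}^{d_2} K_j(t) H_j(t)\, dt + 2 \tau_j \int_{d_1}^{d_2} H_j(t)^2\, dt$$

we deduce that

$$\tau_j = \frac{\int_{d_1}^{d_2} K_j(t) H_j(t)\, dt}{\int_{d_1}^{d_2} H_j(t)^2\, dt}. \tag{A10}$$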
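Formula (A10) can be evaluated numerically once $K_j$ and $H_j$ are known. The Python sketch below is an illustration only: it approximates the two integrals with a composite Simpson rule (closed-form integration of the exponentials would work equally well), and all numerical values, including the two-class setting and the choice $S_j = N_j$, are hypothetical.

```python
import math

def simpson(func, a, b, steps=200):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def fit_tau(chi2, i_star, u_star, S, N, phi, nu, d1, d2):
    """Return the list of tau_j given by (A10) for a fixed matrix phi."""
    n = len(chi2)
    taus = []
    for j in range(n):
        def K(t, j=j):
            return (chi2[j] + nu) * i_star[j] * math.exp(chi2[j] * t)

        def H(t, j=j):
            return S[j] * sum(
                phi[j][k] * (i_star[k] + u_star[k]) / N[k] * math.exp(chi2[k] * t)
                for k in range(n)
            )

        numerator = simpson(lambda t: K(t) * H(t), d1, d2)
        denominator = simpson(lambda t: H(t) ** 2, d1, d2)
        taus.append(numerator / denominator)
    return taus

# Hypothetical two-class example.
chi2 = [0.10, 0.12]
i_star = [50.0, 80.0]
u_star = [20.0, 10.0]
N = [1.0e6, 2.0e6]
S = list(N)  # susceptibles assumed constant and equal to the class size
phi = [[0.7, 0.3], [0.4, 0.6]]
nu = 1.0 / 7.0
d1, d2 = 0.0, 26.0

taus = fit_tau(chi2, i_star, u_star, S, N, phi, nu, d1, d2)
print(taus)
```

Because the objective decouples across age classes, each $\tau_j$ is obtained independently from a one-dimensional least-squares problem.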

Remark A1.

It does not seem possible to estimate the contact matrix $\phi$ by using a similar optimization method. Indeed, if we look for a matrix $\phi = (\phi_{ij})$ which minimizes

$$\min_{\phi \in M_n(\mathbb{R})} \sum_{j=1,\ldots,n} \int_{d_1}^{d_2} \varepsilon_j(t)^2\, dt,$$

it turns out that

$$\sum_{j=1,\ldots,n} \int_{d_1}^{d_2} \varepsilon_j(t)^2\, dt = 0$$

whenever $\phi$ is diagonal. Therefore the optimum is reached for any diagonal matrix. Moreover, by similar considerations, if several of the growth rates $\chi_2^j$ are equal, we can find a multiplicity of optima (possibly with $\phi$ not diagonal). This means that trying to optimize over the matrix $\phi$ does not yield significant and reliable information.
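The claim about diagonal matrices can be checked by a short computation. The sketch below assumes in addition that $\phi_{jj} \neq 0$ and that $\tau_j$ is chosen by (A10). In that case $K_j$ and $H_j$ are both proportional to $e^{\chi_2^j t}$, so the error cancels exactly.

```latex
% Sketch: a diagonal \phi (with \phi_{jj} \neq 0) gives zero error.
\begin{align*}
H_j(t) &= S_j\,\phi_{jj}\,\frac{I_j^{\star}+U_j^{\star}}{N_j}\,e^{\chi_2^j t},
\qquad
K_j(t) = \left(\chi_2^j+\nu\right) I_j^{\star}\,e^{\chi_2^j t},\\
\tau_j &= \frac{\int_{d_1}^{d_2} K_j(t) H_j(t)\,dt}{\int_{d_1}^{d_2} H_j(t)^2\,dt}
        = \frac{\left(\chi_2^j+\nu\right) I_j^{\star}\,N_j}
               {S_j\,\phi_{jj}\,\left(I_j^{\star}+U_j^{\star}\right)},\\
\varepsilon_j(t) &= K_j(t)-\tau_j H_j(t) = 0 \quad \text{for all } t .
\end{align*}
```

Any rescaling of $\phi_{jj}$ is absorbed by $\tau_j$, which is one way to see that the data alone do not determine the diagonal entries.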

In Figure A2 below, we present an example of the application of our method to the Japanese data. We use the period from 20 March to 15 April.

Figure A2.

We plot a comparison between the model (12)–(15) (without public intervention) and the age structured data from Japan. We set $1/\nu = 1/\eta = 7$ days and let $f_i$ depend on the age class, with $f_1=0.1$, $f_2=0.2$, $f_3=0.4$, $f_4=0.4$, $f_5=0.6$, $f_6=0.6$, $f_7=0.8$, $f_8=0.8$, $f_9=0.8$, and $f_{10}=0.9$. We obtain $\tau_1=0.1264$, $\tau_2=0.1655$, $\tau_3=0.3538$, $\tau_4=0.2966$, $\tau_5=0.1513$, $\tau_6=0.1684$, $\tau_7=0.1251$, $\tau_8=0.1168$, $\tau_9=0.1015$, $\tau_{10}=0.1258$. The matrix $\phi$ is the one defined in (18).

Appendix B. Construction of the Contact Matrix

The survey [21] presents reconstructed contact matrices for a number of countries including Japan for the 5-year age classes [0,5), [5,10), ..., [75,80) at various locations (work, school, home, and other locations) and a compilation of those contact matrices to account for all locations. The precise description of the compilation is presented in the paper. Note that this paper is a follow-up of Mossong et al. [34] where the survey procedure is described (including the data collection protocol) for several European countries participating in the POLYMOD study.

The data is publicly available online (Prem et al. [21], Supporting dataset, DOI: https://doi.org/10.1371/journal.pcbi.1005697.s002) and is presented in the form of a zipped collection of spreadsheets, containing the data for several countries in columns X1 X2 ... X16. Each column gives the average number of contacts of one individual of the corresponding age class (0–5 years for X1, 5–10 years for X2, etc.) with individuals of the age class indicated by the row (the first row is 0–5 years, the second is 5–10 years, etc.). Since the age span covered by the study stops at 80, we had to infer the number of contacts for people over the age of 80. We postulated that most people aged 80 or more are retired and that their behavior does not differ significantly from the behavior of people in the age class [75,80). Therefore we completed the missing columns by copying the last available information and shifting it to the bottom. We repeated the procedure for the rows. We believe that the introduced bias is kept to a minimum since the numerical values are relatively low compared to the diagonal.

Because we use 10-year age classes and the data is given in 5-year age classes, we had to combine adjacent columns to recover the average number of contacts. To combine columns together, we used the weighted average

$$C_i = \frac{N_{2(i-1)+1}\, \widetilde{C}_{2(i-1)+1} + N_{2(i-1)+2}\, \widetilde{C}_{2(i-1)+2}}{N_{2(i-1)+1} + N_{2(i-1)+2}},$$

where the column $C_i$ corresponds to the average number of contacts of an individual taken at random in the age class $[10(i-1),10i)$ and $\widetilde{C}_i$ is the average number of contacts of an individual taken at random in the age class $[5(i-1),5i)$. To combine two rows, we simply use the sum of the data

$$L_i = \widetilde{L}_{2(i-1)+1} + \widetilde{L}_{2(i-1)+2}.$$
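The aggregation from 5-year to 10-year classes can be illustrated on a toy array. In the Python sketch below, the function names are hypothetical, the 4×4 array and the populations are made up, and the weights $N_k$ are assumed to be the population sizes of the 5-year age classes.

```python
def combine_columns(matrix, pop5):
    """Merge adjacent column pairs with a population-weighted average.

    pop5[k] is assumed to be the population of the k-th 5-year class.
    """
    n_pairs = len(matrix[0]) // 2
    combined = []
    for row in matrix:
        new_row = []
        for i in range(n_pairs):
            a, b = 2 * i, 2 * i + 1  # 0-based version of 2(i-1)+1 and 2(i-1)+2
            weighted = pop5[a] * row[a] + pop5[b] * row[b]
            new_row.append(weighted / (pop5[a] + pop5[b]))
        combined.append(new_row)
    return combined

def combine_rows(matrix):
    """Merge adjacent row pairs by summing them entry by entry."""
    return [
        [x + y for x, y in zip(matrix[2 * i], matrix[2 * i + 1])]
        for i in range(len(matrix) // 2)
    ]

# Toy 4x4 array of 5-year contacts and hypothetical 5-year populations.
contacts5 = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pop5 = [1.0, 3.0, 2.0, 2.0]

contacts10 = combine_rows(combine_columns(contacts5, pop5))
print(contacts10)  # [[7.5, 11.0], [23.5, 27.0]]
```

Columns are averaged because they describe the individual making the contacts, while rows are summed because contacts with two adjacent age classes add up.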

The matrix $\gamma$ in (17) is the transpose of the array obtained by the former procedure applied to the “all locations” dataset. Then $\phi$ is obtained by normalizing each row of $\gamma$ so that it sums to 1, i.e.,

$$\phi_{ij} = \frac{\gamma_{ij}}{\sum_{k=1}^{10} \gamma_{ik}}.$$
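The last two steps, transposition and row normalization, are simple enough to sketch directly. The Python example below uses a made-up 2×2 array in place of the aggregated "all locations" data; the function names are hypothetical.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]

def normalize_rows(gamma):
    """Divide every entry by the sum of its row, so each row sums to 1."""
    return [[value / sum(row) for value in row] for row in gamma]

# Made-up aggregated array (stands in for the 10x10 array).
aggregated = [
    [2.0, 6.0],
    [1.0, 3.0],
]

gamma = transpose(aggregated)
phi = normalize_rows(gamma)
print(phi)
```

After normalization, $\phi_{ij}$ can be read as the fraction of the contacts of class $i$ that are made with class $j$.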

Author Contributions

P.M. and O.S. designed the original study; Q.G., P.M. and O.S. participated in the adaptation to the age-structured data; Q.G. and P.M. wrote the computer code; all authors actively contributed to the initial version and revisions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

Q.G. and P.M. acknowledge the support of ANR flash COVID-19 MPCUII.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.WHO Timeline—COVID-19. [(accessed on 21 May 2020)]; Available online: https://www.who.int/news-room/detail/27-04-2020-who-timeline---covid-19.
  • 2.World Health Organization Pneumonia of Unknown Cause—China. [(accessed on 21 May 2020)];Disease Outbreak News. 2020 Jan 5; Available online: https://www.who.int/csr/don/05-january-2020-pneumonia-of-unkown-cause-china/en/
  • 3.Guan W.J., Ni Z.Y., Hu Y., Liang W.H., Ou C.Q., He J.X., Liu L., Shan H., Lei C.L., Hui D.S.C., et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N. Engl. J. Med. 2020;382:1708–1720. doi: 10.1056/NEJMoa2002032.
  • 4.Wei W.E., Li Z., Chiew C.J., Yong S.E., Toh M.P., Lee V.J. Presymptomatic Transmission of SARS-CoV-2—Singapore, January 23–March 16, 2020. Morb. Mortal. Wkly. Rep. 2020;69:411. doi: 10.15585/mmwr.mm6914e1.
  • 5.Rothe C., Schunk M., Sothmann P., Bretzel G., Froeschl G., Wallrauch C., Zimmer T., Thiel V., Janke C., Guggemos W., et al. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. N. Engl. J. Med. 2020;382:970–971. doi: 10.1056/NEJMc2001468.
  • 6.Zou L., Ruan F., Huang M., Liang L., Huang H., Hong Z., Yu J., Kang M., Song Y., Xia J., et al. SARS-CoV-2 viral load in upper respiratory specimens of infected patients. N. Engl. J. Med. 2020;382:1177–1179. doi: 10.1056/NEJMc2001737.
  • 7.Verity R., Okell L.C., Dorigatti I., Winskill P., Whittaker C., Imai N., Cuomo-Dannenburg G., Thompson H., Walker P.G.T., Fu H., et al. Estimates of the severity of coronavirus disease 2019: A model-based analysis. Lancet Infect. Dis. 2020 doi: 10.1016/S1473-3099(20)30243-7.
  • 8.Zhou F., Yu T., Du R., Fan G., Liu Y., Liu Z., Xiang J., Wang Y., Song B., Gu X., et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet. 2020;395:1054–1062. doi: 10.1016/S0140-6736(20)30566-3.
  • 9.World Health Organization Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19) [(accessed on 6 May 2020)];2020 Available online: https://www.who.int/publications-detail/report-of-the-who-china-joint-mission-on-coronavirus-disease-2019-(covid-19)
  • 10.World Health Organization Coronavirus Disease 2019 (COVID-19): Situation Report, 104. [(accessed on 21 May 2020)];2020 Available online: https://apps.who.int/iris/handle/10665/332058.
  • 11.Cao Q., Chen Y.C., Chen C.L., Chiu C.H. SARS-CoV-2 infection in children: Transmission dynamics and clinical characteristics. J. Formos. Med. Assoc. 2020;119:670–673. doi: 10.1016/j.jfma.2020.02.009.
  • 12.Lu X., Zhang L., Du H., Zhang J., Li Y.Y., Qu J., Zhang W., Wang Y., Bao S., Li Y., et al. SARS-CoV-2 Infection in Children. N. Engl. J. Med. 2020;382:1663–1665. doi: 10.1056/NEJMc2005073.
  • 13.Prem K., Liu Y., Russell T.W., Kucharski A.J., Eggo R.M., Davies N., Flasche S., Clifford S., Pearson C.A.B., Munday J.D., et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: A modelling study. Lancet Public Health. 2020;5 doi: 10.1016/S2468-2667(20)30073-6.
  • 14.Singh R., Adhikari R. Age-structured impact of social distancing on the COVID-19 epidemic in India. arXiv. 2020 arXiv:2003.12055.
  • 15.To K.K.W., Tsang O.T.Y., Leung W.S., Tam A.R., Wu T.C., Lung D.C., Yip C.C.-Y., Cai J.-P., Chan J.M.-C., Chik T.S.-H., et al. Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: An observational cohort study. Lancet Infect. Dis. 2020;20:565–574. doi: 10.1016/S1473-3099(20)30196-1.
  • 16.Wu J.T., Leung K., Bushman M., Kishore N., Niehus R., de Salazar P.M., Cowling B.J., Lipsitch M., Leung G.M. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat. Med. 2020;26:506–510. doi: 10.1038/s41591-020-0822-7.
  • 17.Davies N.G., Klepac P., Liu Y., Prem K., Jit M., Eggo R.M., CMMID COVID-19 Working Group Age-dependent effects in the transmission and control of COVID-19 epidemics. medRxiv. 2020 doi: 10.1038/s41591-020-0962-9.
  • 18.Jones T.C., Mühlemann B., Veith T., Biele G., Zuchowski M., Hoffmann J., Stein A., Edelmann A., Corman V.M., Drosten C. An analysis of SARS-CoV-2 viral load by patient age. medRxiv. 2020 doi: 10.1101/2020.06.08.20125484.
  • 19.Ayoub H.H., Chemaitelly H., Seedat S., Mumtaz G.R., Makhoul M., Abu-Raddad L.J. Age could be driving variable SARS-CoV-2 epidemic trajectories worldwide. medRxiv. 2020 doi: 10.1101/2020.04.13.20059253.
  • 20.Chikina M., Pegden W. Modeling strict age-targeted mitigation strategies for COVID-19. arXiv. 2020 arXiv:2004.04144. doi: 10.1371/journal.pone.0236237.
  • 21.Prem K., Cook A.R., Jit M. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Comput. Biol. 2017;13:e1005697. doi: 10.1371/journal.pcbi.1005697.
  • 22.Ayoub H.H., Chemaitelly H., Mumtaz G.R., Seedat S., Awad S.F., Makhoul M., Abu-Raddad L.J. Characterizing key attributes of the epidemiology of COVID-19 in China: Model-based estimations. medRxiv. 2020 doi: 10.1101/2020.04.08.20058214.
  • 23.Liu Z., Magal P., Seydi O., Webb G. Understanding unreported cases in the 2019-nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions. Biology. 2020;9:50. doi: 10.3390/biology9030050.
  • 24.Portal Site of Official Statistics of Japan Website Reference Table for the Year 2019: Computation of Population by Age (Single Years) and Sex—Total Population, Japanese Population. [(accessed on 6 May 2020)];2020 Available online: http://www.stat.go.jp/english/data/jinsui/index.htm.
  • 25.Griette Q., Liu Z., Magal P. Estimating the last day for COVID-19 outbreak in mainland China. medRxiv. 2020 doi: 10.1101/2020.04.14.20064824.
  • 26.Liu Z., Magal P., Seydi O., Webb G. Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data. Math. Biosci. Eng. 2020;17:3040–3051. doi: 10.3934/mbe.2020172.
  • 27.Liu Z., Magal P., Seydi O., Webb G. A COVID-19 epidemic model with latency period. Infect. Dis. Model. 2020 doi: 10.1016/j.idm.2020.03.003.
  • 28.Liu Z., Magal P., Seydi O., Webb G. A model to predict COVID-19 epidemics with applications to South Korea, Italy, and Spain. SIAM News. 2020;53:4.
  • 29.Liu Z., Magal P., Webb G. Predicting the number of reported and unreported cases for the COVID-19 epidemic in China, South Korea, Italy, France, Germany and United Kingdom. medRxiv. 2020 doi: 10.1101/2020.04.09.20058974.
  • 30.Diekmann O., Heesterbeek J.A.P., Metz J.A. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J. Math. Biol. 1990;28:365–382. doi: 10.1007/BF00178324.
  • 31.van den Driessche P., Watmough J. Reproduction numbers and subthreshold endemic equilibria for compartmental models of disease transmission. Math. Biosci. 2002;180:29–48. doi: 10.1016/S0025-5564(02)00108-6.
  • 32.Munasinghe L., Asai Y., Nishiura H. Quantifying heterogeneous contact patterns in Japan: A social contact survey. Theor. Biol. Med. Model. 2019;16:6. doi: 10.1186/s12976-019-0102-8.
  • 33.Kucharski A.J., Russell T.W., Diamond C., Liu Y., Edmunds J., Funk S., Eggo R.M., Sun F., Jit M., Munday J.D., et al. Early dynamics of transmission and control of COVID-19: A mathematical modelling study. Lancet Infect. Dis. 2020;20:553–558. doi: 10.1016/S1473-3099(20)30144-4.
  • 34.Mossong J., Hens N., Jit M., Beutels P., Auranen K., Mikolajczyk R., Massari M., Salmaso S., Tomba G.S., Wallinga J., et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 2008;5:e74. doi: 10.1371/journal.pmed.0050074.

Articles from Biology are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
