PLOS ONE. 2020 Jul 6;15(7):e0234763. doi: 10.1371/journal.pone.0234763

Prediction of COVID-19 spreading profiles in South Korea, Italy and Iran by data-driven coding

Choujun Zhan 1, Chi K Tse 2,¤,*, Zhikang Lai 3, Tianyong Hao 1, Jingjing Su 4
Editor: Delia Goletti 5
PMCID: PMC7337285  PMID: 32628673

Abstract

This work applies a data-driven coding method to predict the COVID-19 spreading profile in any given population that shows an initial phase of epidemic progression. The method draws on historical data on COVID-19 spreading in 367 cities in China and on the parameters of the augmented Susceptible-Exposed-Infected-Removed (SEIR) model obtained for each city. From these, a set of profile codes is formed that represents a variety of transmission mechanisms and contact topologies. The data from an early outbreak in a given population are compared with the complete set of historical profiles. The best-fit profiles are then selected, and their profile codes are used to predict the future progression of the epidemic in that population. Application of the method to the data collected for South Korea, Italy and Iran shows that peaks of infection cases are expected to occur before mid April, the end of March and the end of May 2020. It also shows that the percentage of the population infected in each city or region will be less than 0.01%, 0.5% and 0.5% for South Korea, Italy and Iran, respectively.

1 Introduction

The 2019 Coronavirus Disease (COVID-19) is a highly contagious disease caused by infection with the SARS-CoV-2 virus. The disease began to spread in China in mid December 2019 [1]. As intercity travel escalated around the Lunar New Year period, the number of infected individuals began to soar in mid January 2020 [2, 3]. During the early phase of the outbreak there was little vigilance or awareness of the disease, so no travel restrictions were in place and the disease spread almost unobstructed. Travel restrictions were implemented throughout China from January 24, 2020, and have proven effective in curbing the spread of the virus. However, international traffic did not cease. Infectious individuals, who may or may not have shown symptoms at the time of travel, carried the virus they had contracted to different countries [4, 5]. Recent studies have also shown that travel restrictions contributed to controlling the spread of COVID-19, both within China and globally [4, 6, 7]. By May 17, 2020, China had confirmed a total of 82,954 cases of COVID-19 infection, with the death toll reaching 4,634. While most cities in China began to see declining numbers of infected cases from late February, other countries started to report surging numbers of cases in some cities or regions. As of May 17, 2020, the cumulative number of COVID-19 cases worldwide was 4,804,011. South Korea, Italy and Iran had each reported surges of infected cases within two weeks in late February and early March. Moreover, the global case fatality rate increased from around 3% (March 8, 2020) to 6.84% (May 14, 2020).

Our recent work was made available on February 19, 2020 [3]. In it, intercity travel data obtained from Baidu Migration [8] were integrated into the traditional Susceptible-Exposed-Infected-Removed (SEIR) model [9–11] to account for inflow and outflow traffic among 367 cities in China. For each of the 367 cities, three sets of rates were identified by fitting the augmented SEIR model to historical data from the National Health Commission of China: susceptible-to-exposed infection rates, exposed-to-infected transition rates and recovery rates. The predicted spreading profiles of the 367 Chinese cities (available on February 19, 2020 [3]) have been highly consistent with the actual profiles. This holds for both the times of infection peaks and the percentages of infected individuals in the 367 cities.

In this work, we build on the data and estimated parameters obtained for 367 cities in our prior work [3] and establish a library of spreading profiles (sets of codes). Suppose a new outbreak has occurred in a given population. The numbers of infected and recovered cases during the early phase of the spread form an initial, incomplete profile. This profile is compared with the full historical profiles obtained previously. By selecting the best-fit historical profiles, we then identify candidate parameters for predicting the future spreading profile in that population (of a city or region).

It should be emphasized that the historical profiles cover a wide range of possible spreading dynamics. They represent a variety of contact topologies and transmission mechanisms, including the travel effects integrated into the model of the 367 cities. A new outbreak in a given population is therefore likely to follow one profile, or a combination of profiles, based on the augmented model proposed in our previous work [3]. Hence it can be reconstructed from the historical sets of profiles.

In this work, we develop a procedure for selecting historical profiles, identifying best-fit parameters and constructing the future profile. We have applied the procedure to predict the epidemic progression in the cities of South Korea, Italy and Iran. In most South Korean cities, the first wave of epidemic progression peaked between early March and early April 2020. In Italy, the peak came between late April and mid May 2020, while Iran would peak in late April 2020. We have also investigated the number of infected individuals in each city or region. Our method gives the average number of individuals eventually infected, together with a predicted deviation range at the 95% confidence level.
For South Korea, we predicted that Daegu and Gyeongsangbuk-do (North Gyeongsang Province) would have around 7,619 and 1,287 people eventually infected, respectively. These figures correspond to 0.306% and 0.062% of their respective populations. In each of the other South Korean cities, fewer than 300 people would be infected, i.e., less than 0.01% of the city population. For Italy, we predicted that Lombardy and Emilia-Romagna would eventually have about 90,000 and 30,000 infected cases, respectively (i.e., 0.802% and 0.604% of the region's population). The number of people eventually infected in each of the other Italian regions would be below 10,000. Iran would have more than 9,000 and 6,000 confirmed cases in Tehran and Isfahan, respectively (i.e., 0.39% and 0.2% of the province's population). The number of people infected in most other Iranian provinces would exceed 1,000 (>0.1% of the population). If control measures remain in place, the progression trends in these three countries indicate that the number of infected people in most regions will peak before the end of May 2020. Hence, the first wave of the disease is expected to come under control before the end of May 2020.

In the remainder of the paper, we first introduce the official daily infection data used in this study. We then briefly review the augmented SEIR model, mainly to introduce the model parameters used to predict spreading profiles. Next, we explain the key procedure for matching historical profiles and predicting future spreading profiles. We then apply the method to predict the peaks and extents of the outbreaks in South Korea, Italy and Iran. Finally, we discuss our estimates of the propagation and assess how reasonable they are, given the measures the authorities have taken to control the spread of this new disease.

2 Data

The World Health Organization has set the alert level for COVID-19 to the highest. It has made data on the epidemic publicly available in a series of situation reports and in other formats [12]. Our data cover individual cities and regions in South Korea and Italy from February 19, 2020, to May 12, 2020, and in Iran from February 19, 2020, to March 22, 2020. They include the number of infected cases, the cumulative number of infected cases, the number of recovered cases and death tolls. Data organized in convenient formats are also available elsewhere [13–15]. Fig 1 shows sample data for Daegu and Gyeongsangbuk-do (South Korea), Lombardy and Emilia-Romagna (Italy), and Tehran and Mazandaran (Iran). Note that these data correspond to initial stages of the epidemic, as the numbers of infected cases were still climbing as of March 6, 2020.

Fig 1. Samples of data.


At present, there are two types of tests for confirming COVID-19 infected cases. One type confirms the presence of the SARS-CoV-2 virus in the body of an individual, commonly by detecting the viral RNA through a polymerase chain reaction (PCR) [16]. The other type establishes the presence of antibodies in an individual. It shows whether the person tested has been infected in the past, regardless of whether he or she carries the virus at the time of testing. In this work, the official number of infected cases includes only individuals who have tested positive for the presence of the SARS-CoV-2 virus.

3 Method

The augmented SEIR model

The travel-data augmented SEIR model [3] describes the spreading dynamics as a basic fourth-order dynamical system that accounts for intercity travel in China. Consider a city $j$ of population $P_j$. The model has four states:

- the number of susceptible individuals $S_j(t)$;
- the number of exposed individuals (infectious but without symptoms) $E_j(t)$;
- the number of infected individuals $I_j(t)$;
- the number of recovered or removed individuals $R_j(t)$.

The model takes the following form in discrete time [3, 17]:

$$X_j(t+1) = F_{\mathrm{aSEIR}}\big(X_j(t), M_j, \mu_j\big) \tag{1}$$

where $X_j(t) = [S_j(t)\;\, E_j(t)\;\, I_j(t)\;\, R_j(t)]^T$ is the state vector on day $t$, $F_{\mathrm{aSEIR}}(\cdot)$ is the travel-data augmented function, $M_j$ is the set of inflow and outflow travel strengths for city $j$, and $\mu_j$ is the set of parameters for city $j$, i.e.,

$$\mu_j = [\alpha_j, \beta_j, \kappa_j, \gamma_j, \delta_j, k_I] \tag{2}$$

where:

- $\beta_j$ is the rate at which a susceptible individual is infected by an infected individual in city $j$;
- $\alpha_j$ is the rate at which a susceptible individual is infected by an exposed individual in city $j$;
- $\kappa_j$ is the rate at which an exposed individual becomes infected in city $j$;
- $\gamma_j$ is the recovery rate in city $j$;
- $k_I$ is the probability of an infected individual moving from one city to another;
- $\delta_j$ is the eventual percentage of the population infected in city $j$.

The eventual infected population in city $j$ is then given by $N_j^s = \delta_j P_j$.
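The exact form of $F_{\mathrm{aSEIR}}$ is given in [3] and is not reproduced here; this includes the travel terms $M_j$ and the mobility probability $k_I$. The following minimal Python sketch makes the roles of $\alpha_j$, $\beta_j$, $\kappa_j$ and $\gamma_j$ concrete. It advances a single city by one day under a plain discrete-time SEIR update. It is an assumption-laden stand-in, not the authors' model:

- travel flows are omitted;
- new exposures are assumed to scale with $S_j/P_j$;
- the helper names (`SEIRState`, `seir_step`, `simulate`) and the parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SEIRState:
    S: float  # susceptible
    E: float  # exposed (infectious, no symptoms)
    I: float  # infected
    R: float  # recovered or removed


def seir_step(x: SEIRState, P: float, alpha: float, beta: float,
              kappa: float, gamma: float) -> SEIRState:
    """Advance a single, isolated city by one day.

    Hypothetical stand-in for F_aSEIR: the travel flows M_j and k_I of the
    augmented model in [3] are omitted, and delta_j plays no role here.
    """
    # Susceptibles exposed through contact with infected (beta) and exposed (alpha) people
    new_exposed = (beta * x.I + alpha * x.E) * x.S / P
    new_infected = kappa * x.E  # exposed -> infected
    new_removed = gamma * x.I   # infected -> recovered/removed
    return SEIRState(
        S=x.S - new_exposed,
        E=x.E + new_exposed - new_infected,
        I=x.I + new_infected - new_removed,
        R=x.R + new_removed,
    )


def simulate(x0: SEIRState, P: float, days: int, **params: float) -> list[SEIRState]:
    """Iterate seir_step for the given number of days, keeping every state."""
    traj = [x0]
    for _ in range(days):
        traj.append(seir_step(traj[-1], P, **params))
    return traj


# Illustrative parameter values only, not fitted to any city.
traj = simulate(SEIRState(S=999_990.0, E=10.0, I=0.0, R=0.0), P=1_000_000.0,
                days=120, alpha=0.2, beta=0.3, kappa=0.2, gamma=0.1)
peak = max(range(len(traj)), key=lambda t: traj[t].I)
print(f"infected count peaks on day {peak}")
```

With travel switched off, the four compartments always sum to $P_j$. This gives a quick sanity check on any implementation of the update.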

To facilitate comparison and matching of profiles, we introduce the normalized states $\bar{I}_j(t) \triangleq I_j(t)/N_j^s$, $\bar{E}_j(t) \triangleq E_j(t)/N_j^s$, $\bar{S}_j(t) \triangleq S_j(t)/N_j^s$ and $\bar{R}_j(t) \triangleq R_j(t)/N_j^s$. Thus, (1) can be represented in normalized form as

$$\bar{X}_j(t+1) = F_{\mathrm{aSEIR}}\big(\bar{X}_j(t), M_j, \mu_j\big) \tag{3}$$

where $0 \le |\bar{X}_j(t)| \le 1$. The above model accounts for the human migration effect as well as the necessary transmission mechanism. We may therefore take the basic set of parameters to represent the characteristics of the propagation profile of city $j$. The complete set of parameters has been identified for 367 cities in China [3]. These parameter sets serve as codes for the various COVID-19 propagation profiles observed so far. For brevity, we do not repeat those results here.
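Normalizing by $N_j^s = \delta_j P_j$ makes cities of very different size comparable. Two cities whose counts differ only by this scale factor map to the same normalized curve. Here is a minimal Python sketch with made-up counts, $\delta$ and $P$ values; the function name `normalize` is hypothetical:

```python
def normalize(counts: list[float], delta: float, P: float) -> list[float]:
    """Return counts / N^s, where N^s = delta * P is the eventual infected population."""
    if delta <= 0 or P <= 0:
        raise ValueError("delta and P must be positive")
    Ns = delta * P
    return [c / Ns for c in counts]


# Two hypothetical cities: N^s = 20 for the small one, N^s = 2,000 for the large one.
small = normalize([2, 10, 20], delta=0.001, P=20_000)
large = normalize([200, 1_000, 2_000], delta=0.002, P=1_000_000)
print(small)
print(large)
```

The two raw series differ by a factor of 100. After normalization they coincide, which is the property that (4) relies on.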

Two cities may differ in population size and in the percentage of their population eventually infected. The rates of infection and recovery, however, should be similar across a group of cities, i.e., $\mu_i \approx \mu_j$. Thus, in normalized form, we have

$$\big\|\bar{X}_i(t) - \bar{X}_j(t)\big\| < \epsilon \quad \text{for some } \epsilon > 0, \tag{4}$$

for cities i and j within a group of cities having similar parameter sets. This also means

$$N_j^s X_i(t) \approx N_i^s X_j(t) \quad \text{or} \quad \delta_j P_j X_i(t) \approx \delta_i P_i X_j(t) \tag{5}$$

for the group of cities having similar rates of infection and recovery. Suppose the historical archive adequately covers the possible dynamical profiles. We can then make a fast prediction for any city $o$ by fitting an incomplete set of data from an early outbreak stage in city $o$, using the model parameters already obtained. The procedure is detailed in the following subsection.
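The step from (4) to (5) is a rescaling that follows from the definition $\bar{X}_j(t) = X_j(t)/N_j^s$. Multiplying the difference in (4) by the positive constant $N_i^s N_j^s$ also shows how the tolerance scales:

$$
\begin{aligned}
\big\|N_j^s X_i(t) - N_i^s X_j(t)\big\|
  &= N_i^s N_j^s \left\| \frac{X_i(t)}{N_i^s} - \frac{X_j(t)}{N_j^s} \right\| \\
  &= N_i^s N_j^s \big\|\bar{X}_i(t) - \bar{X}_j(t)\big\|
   < N_i^s N_j^s\,\epsilon .
\end{aligned}
$$

The approximation in (5) therefore holds up to an absolute error of $N_i^s N_j^s\,\epsilon$. Substituting $N^s = \delta P$ gives its second form.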

Prediction method

The proposed data-driven prediction algorithm is based on the set of historical data of the spreading profiles of COVID-19 in 367 cities in China, namely, 367 sets of normalized time series of the form:

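$$\bar{I}_i^{(c)} = \{\bar{I}_i(1), \bar{I}_i(2), \ldots, \bar{I}_i(K_i)\}, \qquad \bar{R}_i^{(c)} = \{\bar{R}_i(1), \bar{R}_i(2), \ldots, \bar{R}_i(K_i)\} \tag{6}$$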

where $i = 1, 2, \ldots, 367$, and $K_i$ is the length of the data recorded in city $i$. The superscript "(c)" denotes data of Chinese cities.

Now, suppose an outbreak occurs in city $o$, and only $k_o$ days of data have been obtained, in normalized form:

$$\bar{I}^{(o)} = \{\bar{I}_o(1), \bar{I}_o(2), \ldots, \bar{I}_o(k_o)\}, \qquad \bar{R}^{(o)} = \{\bar{R}_o(1), \bar{R}_o(2), \ldots, \bar{R}_o(k_o)\} \tag{7}$$

where $k_o < K_i$. We assume that the spreading profile of city $o$ is related to that of some city $i$ in the historical archive, as permitted by the validity of (4). We then formulate the following optimization problem to predict the epidemic progression in city $o$:

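$$
\begin{aligned}
P_0:\quad & \min_{i \in \{1, \ldots, N\}} f_i \\
\text{s.t.}\quad & \text{(i)}\;\; f_i = \sum_{j=1}^{k_o} \Big[ w_I \big(\bar{I}_o(j) - \bar{I}_i^{(c)}(j)\big)^2 + w_R \big(\bar{R}_o(j) - \bar{R}_i^{(c)}(j)\big)^2 \Big], \\
& \text{(ii)}\;\; \mu_L \le \mu \le \mu_U, \\
& \text{(iii)}\;\; w_I, w_R > 0,
\end{aligned}
\tag{8}
$$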

where $N$ is the number of Chinese cities in the historical archive, $w_I$ and $w_R$ are weighting coefficients, and $\mu_L$ and $\mu_U$ are the lower and upper bounds of the search space, respectively. Solving this nonlinear optimization problem gives the historical profile whose growth curve most closely resembles that of city $o$; call it city $i$. We then run the augmented SEIR model with the profile code of city $i$ to predict the future spreading trend of city $o$. We can also take the $n$ candidates with the smallest errors as the candidate set for prediction. This gives an average predicted propagation profile and a deviation range based on the $n$ best-fit profile codes.
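The archive is a finite set of 367 normalized profiles. The selection step in (8) can therefore be illustrated as an exhaustive ranking of the archive by $f_i$. The Python sketch below simplifies the method under stated assumptions and is not the authors' solver:

- each historical profile is stored as normalized series;
- day 1 of city $o$ is assumed to align with day 1 of city $i$;
- the parameter bounds (ii) are not modelled;
- the names `profile_error` and `best_matches`, and the toy archive, are hypothetical.

```python
def profile_error(I_o, R_o, I_c, R_c, w_I=1.0, w_R=1.0):
    """Weighted squared error f_i over the k_o days observed in city o (constraint (i))."""
    if w_I <= 0 or w_R <= 0:
        raise ValueError("weights must be positive (constraint (iii))")
    k_o = len(I_o)
    if len(R_o) != k_o or len(I_c) < k_o or len(R_c) < k_o:
        raise ValueError("each historical profile must cover the k_o observed days")
    # Python lists are 0-based, so index t corresponds to day t + 1 in (8).
    return sum(w_I * (I_o[t] - I_c[t]) ** 2 + w_R * (R_o[t] - R_c[t]) ** 2
               for t in range(k_o))


def best_matches(I_o, R_o, archive, n=10, w_I=1.0, w_R=1.0):
    """Rank archive entries by f_i and return the keys of the n best fits."""
    scores = {key: profile_error(I_o, R_o, I_c, R_c, w_I, w_R)
              for key, (I_c, R_c) in archive.items()}
    return sorted(scores, key=lambda key: scores[key])[:n]


# Toy archive of normalized (I, R) series; the city keys are hypothetical.
archive = {
    "A": ([0.1, 0.2, 0.4, 0.6], [0.0, 0.0, 0.1, 0.2]),
    "B": ([0.1, 0.3, 0.6, 0.9], [0.0, 0.1, 0.1, 0.3]),
    "C": ([0.0, 0.1, 0.1, 0.2], [0.0, 0.0, 0.0, 0.0]),
}
I_o = [0.1, 0.2, 0.4]  # k_o = 3 observed days in city o
R_o = [0.0, 0.0, 0.1]
print(best_matches(I_o, R_o, archive, n=2))  # ['A', 'B']
```

The sketch returns the top $n$ keys rather than only the minimizer, as in the use of several best-fit profile codes described above. The results below use $n = 10$.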

4 Results

We apply the procedure described above to the data obtained so far for the cities and regions in South Korea, Italy and Iran listed in Table 1. For each city or region, we identify the 10 best-fit profiles in the historical archive. Their profile codes are then used to generate propagation profiles for the coming days. From these 10 profiles, we produce an average progression profile with a deviation range at the 95% confidence level. For Iran, we have data for cities and provinces only up to March 22, 2020. Data after this date were no longer available for individual cities and provinces.
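The text does not spell out how the 95% deviation range is computed from the 10 best-fit trajectories. This sketch assumes one common choice, which is not taken from the paper: a per-day normal approximation, i.e., the mean across members plus or minus 1.96 sample standard deviations. Here is a minimal Python sketch; the name `ensemble_band` and the toy trajectories are hypothetical:

```python
import statistics


def ensemble_band(trajectories, z=1.96):
    """Per-day ensemble mean and mean +/- z * sample standard deviation.

    Assumes a normal approximation across ensemble members (z = 1.96 for ~95%).
    """
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to estimate a spread")
    length = len(trajectories[0])
    if any(len(tr) != length for tr in trajectories):
        raise ValueError("trajectories must have equal length")
    mean, lower, upper = [], [], []
    for day_values in zip(*trajectories):  # values of all members on one day
        m = statistics.fmean(day_values)
        s = statistics.stdev(day_values)
        mean.append(m)
        lower.append(max(0.0, m - z * s))  # counts cannot be negative
        upper.append(m + z * s)
    return mean, lower, upper


# Three toy member trajectories (the results use 10 best-fit profiles).
members = [[0, 10, 20], [0, 12, 24], [0, 14, 28]]
mean, lower, upper = ensemble_band(members)
print(mean)  # [0.0, 12.0, 24.0]
```

The lower bound is clipped at zero because case counts cannot be negative. If the interval is meant for the ensemble mean rather than for individual trajectories, divide the standard deviation by $\sqrt{n}$ before scaling.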

Table 1. Populations of cities, regions or provinces in South Korea, Italy, and Iran.

City/Region/Province Population
South Korea:
Daegu 2,487,823
Seoul 10,018,537
Gwangju 1,472,802
Busan 3,513,361
Gyeongsangbuk-do 2,071,424
Gyeongsangnam-do 2,870,401
Chungcheongbuk-do 1,191,341
Chungcheongnam-do 606,019
Jeollanam-do 1,055,957
Jeollabuk-do 652,858
Gangwon-do 1,135,134
Incheon 2,927,295
Jejudo 666,686
Gyeonggi-do 12,476,073
Daejeon 1,518,024
Ulsan 1,173,568
Sejong 314,126

Italy:
Lombardy 10,078,012
Venetia 4,905,854
Emilia-Romagna 4,459,477
Piedmont 4,356,406
Lazio 5,879,082
Tuscany 3,729,641
Sicily 5,029,675
Trento 539,898
Liguria 1,565,349
Marche 1,532,000
Campania 5,827,000
Abruzzo 1,315,000
Apulia 4,048,000
Umbria 884,600
Molise 330,000
Basilicata 595,727
Friuli Venezia Giulia 1,216,000
Sardinia 1,648,000
Calabria 1,957,000
Aosta Valley 126,200
Bolzano 520,900

Iran (Provinces):
Tehran 13,267,637
Mazandaran 3,283,582
Bushehr 2,712,000
Golestan 1,868,819
Semnan 702,360
Isfahan 5,120,850
Fars 4,851,000
Hormozgan 1,776,000
Bushehr 1,163,400
Gilan 2,530,696
Ardabil 1,270,420
Kurdistan 1,603,000
Markazi 1,429,000
Khuzestan 4,711,000
Lorestan 1,754,000
Razavi Khorasan 5,994,000
Sistan and Baluchestan 2,775,000
East Azerbaijan 3,725,000
West Azerbaijan 3,081,000
Kerman 3,164,718
Qom 1,292,283

Figs 2 to 4 show the official data and the predicted numbers of infected individuals, each with a deviation range around the predicted average trajectory. The countries are South Korea (Fig 2), Iran (Fig 3) and Italy (Fig 4). Figs 5 to 7 show the corresponding cumulative values for Italy (Fig 5), South Korea (Fig 6) and Iran (Fig 7). Fig 8 shows statistics of the infection peaks. Fig 9 shows statistics of the percentage of the population eventually infected and of the number of individuals eventually infected. Our key findings are summarized as follows:

Fig 2. Official and estimated number of infected individuals in some cities or regions in South Korea.


Fig 3. Official and estimated number of infected individuals in some provinces in Iran.


Fig 4. Official and estimated number of infected individuals in some regions in Italy.


Fig 5. Official and estimated cumulative number of infected individuals in some regions in Italy.


Fig 6. Official and estimated cumulative number of infected individuals in some cities or regions in South Korea.


Fig 7. Official and estimated cumulative number of infected individuals in some provinces in Iran.


Fig 8. Statistics of peak times in (a) South Korea; (b) Italy; (c) Iran.


Fig 9. Statistics of proportion of eventual infected population in (a) South Korea; (b) Italy; (c) Iran. Statistics of eventual infected population in (d) South Korea; (e) Italy; (f) Iran.

  1. In most regions of South Korea, Italy and Iran, the number of active infected individuals is expected to peak before the end of May 2020. Fig 8(a) to 8(c) shows the distributions of the peak times:
     - most South Korean cities saw peaks between March 8, 2020 and April 9, 2020;
     - most Italian regions saw peaks between April 15 and mid May;
     - most Iranian provinces would see peaks before the end of May 2020.

  2. For South Korea, our results show that Daegu (population 2,487,823) and Gyeongsangbuk-do (population 2,071,424) are the two hardest-hit areas. They have 7,619 ±2,096 and 1,287 ±197 people eventually infected, i.e., 0.306% ±0.084% and 0.062% ±0.009% of their respective populations. In each of the other South Korean cities, fewer than 300 people will be infected, i.e., below 0.01% of the population.

  3. For Italy, our results show that Lombardy (with a population of 10,078,012) and Emilia-Romagna (with a population of 4,459,477) have the highest number of cases, with about 90,000 and 30,000 people eventually infected, accounting for 0.802% and 0.604% of the region’s population. In other Italian cities or regions, the number of infected people will be below 10,000.

  4. For Iran, we expect Tehran (with a population of 13,267,637) and Isfahan Province (1,292,283) to be the most severely affected, with more than 9,000 (0.067%) and 1,695 ±92 (0.13%) cases eventually, respectively. Each of the other Iranian provinces will see more than 1,000 infected cases eventually. Our prediction shows that about 0.5% of the country's population had been infected by May 14, 2020.

  5. If the authorities continue to impose strict control measures, our model shows that most regions (cities and provinces) of these three countries would see peaks of infection cases before the end of May 2020. Hence, the first wave of the epidemic will come under control by June 2020 in these three countries.
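Findings 1 to 4 rest on two simple read-outs from a predicted trajectory. The first is the day on which active infections peak. The second is the eventual number infected, expressed as a share of the population in Table 1. The following minimal Python sketch computes both; the function names are hypothetical. The usage lines reproduce the Daegu figures of 0.306% and ±0.084% from 7,619 ±2,096 cases and a population of 2,487,823:

```python
def peak_day(active: list[float]) -> int:
    """Index of the first maximum of an active-infection trajectory."""
    if not active:
        raise ValueError("empty trajectory")
    return max(range(len(active)), key=lambda t: active[t])


def percent_of_population(cases: float, population: float) -> float:
    """Cases as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


daegu_pop = 2_487_823
print(round(percent_of_population(7_619, daegu_pop), 3))  # 0.306
print(round(percent_of_population(2_096, daegu_pop), 3))  # 0.084
print(peak_day([0, 5, 9, 9, 3]))                         # 2
```

When two days tie for the maximum, `peak_day` reports the earlier one, because Python's `max` returns the first maximal element it encounters.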

Our prediction for the South Korean cities revealed a very rapid progression of the epidemic. About 5,000 infections emerged within 10 days, and peaks were expected in most cities or regions within about 2 weeks. The Korean authorities tested a very large number of people (140,000 by March 5, 2020) within a short time. This kept large numbers of infected and infectious individuals from going unquarantined [18]. The strategy has the obvious advantage of giving a clear picture of the extent and locations of infected individuals early in the epidemic. The progression is more rapid than typical, which reflects the effectiveness of the control measures taken.

Italy had the second highest death toll after China, reaching 197 on March 6, 2020 [19]. Its fatality rate of about 4% was then the highest in the world. With infection cases soaring to 3,916 on March 6, 2020, Italy implemented control measures to contain the spread of the virus. It shut down schools and suspended public events in regions where outbreaks were reported. With the present set of parameters, the epidemic is expected to progress at a typical pace unless more stringent measures are put in place.

The situation in Iran is also critical, with the number of infected cases escalating to over 4,000 in less than 2 weeks. Iran had reported the deaths of two lawmakers as of March 7, 2020. It has been struggling to control the contagion, which has spread to 31 provinces [20]. The progression profile is again typical, with a peak expected in around 3 weeks. As in Italy, most cities show typical spreading profiles. The peaks and subsequent decline are not expected to come sooner unless more stringent measures are implemented to control the contagion.

Finally, recovery rates vary with the effectiveness of treatment. Judging from the predicted trends in Figs 2, 3 and 4, the epidemic in all three countries is expected to subside by the end of May, with South Korea expected to recover sooner than the others.

5 Conclusion

The spreading of the 2019 Coronavirus Disease (COVID-19) has evolved into a global contagion. It reached 87 countries within two months and more than 190 countries by May 14, 2020. Confirmed infection cases in South Korea, Italy and Iran surged in late February and early March and kept growing over the following two months. By May 14, 2020, they had reached 10,926, 222,104 and 112,725, respectively. The global fatality rate increased from below 3% on March 8, 2020 to more than 6% on May 14, 2020.

In this study, we build on the results of our previous work [3]. That work established a library of parameters of an augmented SEIR model, corresponding to the historical spreading profiles of 367 cities in China. This library forms a set of profile codes that cover a variety of possible epidemic progression profiles. We compare the early, incomplete progression data for a specific population with the historical profiles. A nonlinear optimization procedure then selects a few candidate profiles from the historical archive. The profile codes of the selected profiles are used to estimate the future progression for that population. We apply this method to predict the spreading profiles for South Korea, Italy and Iran, and in particular to estimate the proportion of the population infected in these countries. Results show that the three countries would see infection peaks in most cities or provinces before the end of May 2020, with South Korea's cases peaking much earlier than the others. The percentage of the population eventually infected will be less than 0.3%, 0.5% and 0.5% for South Korea, Italy and Iran, respectively.
The epidemic is expected to come under control before June 2020 in these countries. Depending on the effectiveness of treatment, particular cities may see full recovery or zero infection sooner or later than others. The epidemic progression in South Korean cities is more rapid than typical, suggesting that the authorities may have taken effective measures to control the spread. The predicted progressions for Italy and Iran, on the other hand, are typical of the profiles in the historical archive. Unless more stringent measures are taken, their peaks and subsequent declines are unlikely to come sooner or faster than the predicted trajectories.

Finally, we stress that the proposed data-driven coding method can be applied to predict epidemic progression in any given population. Its accuracy depends on whether the available data are adequate to identify a reliable match in the historical archive. In this study, the profiles predicted in early March turned out to be consistent with the actual data collected up to mid May for the three countries.

However, the data collected have limited coverage, so the model may not perform satisfactorily in two situations:

- a new epidemic whose spreading characteristics differ significantly, i.e., whose model parameters differ significantly from those in the historical database;
- a population or country with a significantly different contact topology, social behavior, travel patterns, effectiveness of control or climate.

In fact, the Hong Kong data deviated in late March from the profile predicted in February. The deviation was traced to an unexpected surge in inbound travelers, mostly overseas students returning from the UK and USA. Superspreader events or other unexpected events could likewise make the actual spreading pattern deviate from the predicted profile. Furthermore, the model does not consider the effects of different control strategies. We will continue to enhance the database, both to widen the coverage of possible progression profiles and to incorporate the effects of different control measures on these profiles.

Supporting information

S1 File

(ZIP)

Data Availability

The data underlying the results presented in the study are available from https://www.who.int/emergencies/diseases/novel-coronavirus-2019.

Funding Statement

CZ was supported by National Science Foundation of China Project 61703355 (http://www.nsfc.gov.cn/) and by the Science and Technology Program of Guangdong (201904010224). CKT was supported by City University of Hong Kong under Special Fund 9380114 (https://www.cityu.edu.hk/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Decision Letter 0

Delia Goletti

23 Apr 2020

PONE-D-20-06792

Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding

PLOS ONE

Dear Prof. Chi K. TSE, the manuscript is very interesting.

Please revise according to the referees' points:

1. present the results more clearly

2. update the results with today's data

3. give a perspective of your findings

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jun 07 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Delia Goletti, M.D., Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the submitted manuscript, the authors provide a model for predicting SARS-CoV-2 spreading in three main countries: South Korea, Italy, and Iran. Their model is based on a previous publication by the same authors and on data collected in 367 Chinese cities. The data and the conclusion are clearly presented and the manuscript is easy to read, but some points have to be clarified and better discussed, especially in the light of updated observed data of viral spreading in the regions under investigation.

One point that could be better explained, to make the authors’ hypothesis easier for the broad readership of PLOS ONE to understand, is the set of variables taken into account in constructing the model. Indeed, some social, political, or climatic variables could affect viral spreading differently in different countries and/or cities. The effect of such factors should at least be discussed in the conclusions.

Another important point to discuss is the difference between the presented predictions and the actual situation (see, for example, lines 200-205). We understand that the authors performed the analysis with data up to the 6th of March, but they now have access to the observed cases in the different countries and regions, and some of the observations deviate substantially from the model of spread presented here. The information on observed cases should be updated and discussed, proposing criteria for adjusting the presented model.

In the paragraph at lines 212-215, on the basis of which data do the authors conclude that the epidemic will end before June 2020?

Minor points:

Speaking of SARS-CoV-2 spreading would be more appropriate than COVID-19 spreading.

It would also be appropriate to state that by “infected individuals” the authors mean people who tested positive, especially when presenting the data as the proportion of the population infected.

Indeed, only seroprevalence studies can actually estimate the proportion of the population that has been infected.

The names of Italian administrative regions should be corrected in text and figures:

Lombardy, Emilia Romagna, Apulia, Basilicata.

Reviewer #2: The manuscript describes “Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding.” The title is interesting; however, the manuscript is not well organized or easy to understand because the results are not well explained. The population of each city, province, and region (with a reference) should be mentioned in the results. For example, in lines 194-197, the populations of Daegu and Seoul should be mentioned. If Seoul’s population is x million with y people eventually infected, that is z% of the city’s population. In addition, it seems that the places mentioned in Iran are provinces (not cities), so the population of the whole province should be considered; their names should also be spelled correctly (e.g. Fig. 6, f, h, i, k, l). Moreover, some parts of the manuscript lack supporting references, which should be added. Finally, the limitations of the present study should be addressed.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 6;15(7):e0234763. doi: 10.1371/journal.pone.0234763.r002

Author response to Decision Letter 0


19 May 2020

(Already uploaded with the revised manuscript)

We would like to thank the Editor and all reviewers for their helpful comments and suggestions, which have been taken into consideration in the revision of this paper.

Reviewer: 1

Comments to the Author

In the submitted manuscript, the authors provide a model for predicting SARS-CoV-2 spreading in three main countries: South Korea, Italy, and Iran. Their model is based on a previous publication by the same authors and on data collected in 367 Chinese cities. The data and the conclusion are clearly presented and the manuscript is easy to read, but some points need to be clarified and better discussed, especially in light of updated observed data on viral spreading in the regions under investigation.

Authors’ Response: Thank you for the positive support and encouragement.

One point that could be better explained, to make the authors’ hypothesis easier for the broad readership of PLOS ONE to understand, is the set of variables taken into account in constructing the model. Indeed, some social, political, or climatic variables could affect viral spreading differently in different countries and/or cities. The effect of such factors should at least be discussed in the conclusions.

Authors’ Response: Thank you for the positive support and encouragement. The point raised is indeed very legitimate for a data-driven model. We have included a brief discussion at the end of the paper (Conclusion) to highlight this issue; specifically, the performance of any data-driven model depends on the breadth of coverage of the historical data. Thus, we acknowledge that the model needs to be continuously updated to cover different sets of spreading characteristics. At present, the data are limited, and the data-driven model may not perform satisfactorily if it is applied to a new epidemic which has a significantly different set of spreading characteristics (i.e., model parameters being significantly different from those collected in the historical database) or to a population or country having a significantly different contact topology, travel patterns, effectiveness of government control, or climate.

Another important point to discuss is the difference between the presented predictions and the actual situation (see, for example, lines 200-205). We understand that the authors performed the analysis with data up to the 6th of March, but they now have access to the observed cases in the different countries and regions, and some of the observations deviate substantially from the model of spread presented here. The information on observed cases should be updated and discussed, proposing criteria for adjusting the presented model.

Authors’ Response: The manuscript was finished on March 8, 2020. Two months have now passed, and we have updated the results based on new datasets; all figures have been updated. We found that the general profile pattern remains the same for the three countries under study, despite some adjustments in the predicted number of infected cases and the exact timing of the peaks. We believe that the progression profile is basically captured by the historical data collected from the 367 cities. However, deviations can still be expected from outlier events, such as superspreader events and irregular travel patterns, that depart from the general patterns. For instance, we found a significant deviation in the Hong Kong data compared with a similar prediction conducted earlier: there was an unexpected surge in the number of infected cases in mid March due to an unexpectedly large number of inbound travelers, as overseas students returned from the UK and USA.

Regarding the updates in this revised version, we should mention that Iran stopped publishing detailed data for individual provinces after March 22. Thus, we could not obtain provincial data for forecasting beyond March 22. The main updates are highlighted in blue text in the Abstract, Section 4 and the Conclusion of the revised paper.

In the paragraph at lines 212-215, on the basis of which data do the authors conclude that the epidemic will end before June 2020?

Authors’ Response: We have provided further explanation in the revised paper, as follows. Based on the progression trends of the epidemic in these three countries, and provided that control measures remain in place, our model shows that the number of confirmed cases of COVID-19 infection in most regions of these three countries will peak before the end of May 2020. Hence, the first wave of the epidemic would come under control before the end of May 2020. This is mentioned on Page 3 and Page 7 of the revised paper.

Minor points:

Speaking of SARS-CoV-2 spreading would be more appropriate than COVID-19 spreading.

Authors’ Response: According to WHO (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it), the disease is abbreviated as COVID-19 and the virus is SARS-CoV-2. We have found that the literature more commonly adopts “COVID-19 spreading”. In the revised paper, we have added SARS-CoV-2 in the Introduction but have continued to use only COVID-19 for simplicity.

It would also be appropriate to state that by “infected individuals” the authors mean people who tested positive, especially when presenting the data as the proportion of the population infected. Indeed, only seroprevalence studies can actually estimate the proportion of the population that has been infected.

Authors’ Response: Thank you for pointing this out. In Section 2, we have clarified the kind of data used in our study, which correspond to the number of infected cases. See Section 2 on Page 3 of the revised paper.

The names of Italian administrative regions should be corrected in text and figures:

Lombardy, Emilia Romagna, Apulia, Basilicata.

Authors’ Response: We have tried to unify all the names of the cities and regions of the three countries.

Reviewer: 2

Comments to the Author

The manuscript describes “Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding.” The title is interesting; however, the manuscript is not well organized or easy to understand because the results are not well explained.

Authors’ Response: We have updated the results and improved the descriptions in the revised paper, so that the idea of the proposed data-driven model is more clearly explained to the readers. The updates are highlighted in blue in the revised paper. As suggested, we have included more information, such as the populations of individual regions.

The population of each city, province, and region (with a reference) should be mentioned in the results. For example, in lines 194-197, the populations of Daegu and Seoul should be mentioned. If Seoul’s population is x million with y people eventually infected, that is z% of the city’s population.

Authors’ Response: See Table 1 of the revised paper. We have also modified the description in the text as per your suggestion.

In addition, it seems that the places mentioned in Iran are provinces (not cities), so the population of the whole province should be considered; their names should also be spelled correctly (e.g. Fig. 6, f, h, i, k, l).

Authors’ Response: The populations listed in Table 1 are indeed those of the provinces.

Moreover, some parts of the manuscript lack supporting references, which should be added.

Authors’ Response: The following references have been cited:

• Jia JS, Lu X, Yuan Y, Xu G, Jia J, Christakis NA. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 2020; 29: 1-1. doi: https://doi.org/10.1038/s41586-020-2284-y; 10.1126/science.aba9757.

• Du Z, Wang L, Cauchemez S, Xu X, Wang X, Cowling BJ, Meyers LA. Risk of transportation of 2019 novel coronavirus disease from Wuhan to other cities in China. Emerg Infect Dis 2020; 26(5). doi: 10.3201/eid2605.200146.

• Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet 2020; 395(10225): 689-97.

• Humanity tested. Nat Biomed Eng 2020; 4: 355-356.

• Zhan C, Tse C, Lai Z, Chen X, Mo M. General model for COVID-19 spreading with consideration of intercity migration, insufficient testing and active intervention: application to study of pandemic progression in Japan and USA. Preprint available at medRxiv.org, 2020. doi: https://doi.org/10.1101/2020.03.25.20043380.

Finally, the limitations of the present study should be addressed.

Authors’ Response: Indeed, our data-driven model does have its weaknesses. Due to the limited coverage of the data collected, the data-driven model may not perform satisfactorily if it is applied to a new epidemic which has a significantly different set of spreading characteristics (i.e., model parameters being significantly different from those collected in the historical database) or to a population or country having a significantly different contact topology, social behavior, travel patterns, effectiveness of control, or climate. In fact, a deviation was observed in the Hong Kong data in late March when compared with the profile predicted earlier in February, and the deviation was found to result from an unexpected surge in inbound travelers, who were mostly overseas students returning from the UK and USA. Thus, if superspreader events or other unexpected events occur, the spreading profile predicted by the data-driven model may deviate from the actual pattern. We have provided some comments in the Conclusion.

Attachment

Submitted filename: Responses_to_reviewers_file.pdf

Decision Letter 1

Delia Goletti

3 Jun 2020

Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding

PONE-D-20-06792R1

Dear Dr. Chi K. TSE, 

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Delia Goletti, M.D., Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Delia Goletti

25 Jun 2020

PONE-D-20-06792R1

Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding

Dear Dr. Tse:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Delia Goletti

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 File

    (ZIP)

    Attachment

    Submitted filename: Responses_to_reviewers_file.pdf

    Data Availability Statement

    The data underlying the results presented in the study are available from https://www.who.int/emergencies/diseases/novel-coronavirus-2019.

