PLOS One. 2021 Mar 3;16(3):e0245519. doi: 10.1371/journal.pone.0245519

Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research

Melissa Koenen 1, Marleen Balvert 1, Ruud Brekelmans 1, Hein Fleuren 1, Valentijn Stienen 1, Joris Wagenaar 1,*
Editor: Abdallah M Samy2
PMCID: PMC7928451  PMID: 33657128

Abstract

Since the onset of the COVID-19 pandemic many researchers and health advisory institutions have focused on virus spread prediction through epidemiological models. Such models rely on virus- and disease characteristics of which most are uncertain or even unknown for SARS-CoV-2. This study addresses the validity of various assumptions using an epidemiological simulation model. The contributions of this work are twofold. First, we show that multiple scenarios all lead to realistic numbers of deaths and ICU admissions, two observable and verifiable metrics. Second, we test the sensitivity of estimates for the number of infected and immune individuals, and show that these vary strongly between scenarios. Note that the amount of variation measured in this study is merely a lower bound: epidemiological modeling contains uncertainty on more parameters than the four in this study, and including those as well would lead to an even larger set of possible scenarios. As the level of infection and immunity among the population are particularly important for policy makers, further research on virus and disease progression characteristics is essential. Until that time, epidemiological modeling studies cannot give conclusive results and should come with a careful analysis of several scenarios on virus- and disease characteristics.

Introduction

The COVID-19 pandemic has disrupted society all across the world. At the time of the SARS-CoV-2 virus outbreak in the city of Wuhan, China went into lockdown. Many countries across the world followed when the virus reached them a few weeks or months later. Since then many researchers and national health institutions have focused on predicting the course of the epidemic, assessing the effects of non-medical interventions in the form of social distancing, and evaluating the possibilities of an exit strategy [1–3]. The epidemiological models underlying these studies rely heavily on virus and disease characteristics such as the case fatality ratio (CFR). Within just a few months researchers made great progress in estimating these characteristics [4–7], and a plethora of data sources and scientific studies rapidly became available [8–10]. These sources, however, report differing estimates of the parameters that describe the virus behavior and disease progression. As a result, many aspects of the SARS-CoV-2 virus’ behavior that forecasting models rely on, including CFR, still remain uncertain or unknown.

The aim of this paper is to obtain a clear view of how the virus spreads, and of its characteristics, in the absence of any social distancing measures. The background is that, in future research, we are interested in how the SARS-CoV-2 virus spreads in low-income countries, slums and refugee camps where, for various reasons, measures can hardly be taken or are not effective at all. We therefore use data from the initial phase of the COVID-19 spread in the Netherlands, for which a relatively large amount of good-quality data is available.

This research consists of two parts. First, we tested the validity of several assumptions on four disease and virus characteristics: the probability of developing symptoms, case fatality ratios, when people develop immunity, and the probability of virus transmission between an infected and a non-infected individual. We did so by simulating the spread of SARS-CoV-2 under a variety of assumptions on these four parameters using data from the Netherlands. Combinations of assumptions that led to a predicted number of ICU admissions and death toll that resembled reality were considered plausible, while scenarios that led to predicted ICU admissions and death toll that substantially differed from reality were considered unrealistic. As such, we obtained a set of realistic assumptions for the four model parameters, which provides better insight into the SARS-CoV-2 virus spread and disease progression.

Second, we assess the sensitivity of the model predictions with respect to the uncertainty in model inputs. Several combinations of assumptions yield a realistic number of estimated daily ICU admissions and deaths and were hence plausible scenarios. However, they gave different predictions in terms of unobservable yet important characteristics, namely the number of infections and the level of immunity among the population. This means that our current knowledge on virus- and disease characteristics is insufficient for an epidemiological modeling study to give conclusive answers concerning disease spread.

In order to carry out our analysis we developed a coarse-grained agent-based simulation model. This model sits midway between a classical SEIR model and an agent-based simulation. As such, it allows for incorporating individual characteristics that are important determinants of virus spread and disease progression, while limiting the computational complexity of the model, which enables us to simulate the entire population of the Netherlands. As mentioned, the model is applied to data from the Netherlands in the time period from February 27, 2020, the day that the first case in the Netherlands was identified, until March 25, 2020, when the first effects of social distancing measures became apparent in the data.

This paper is organized as follows. An overview of our simulation model is presented in the Materials and Methods section. The subsections in Materials and Methods describe the considered scenarios as well as further modeling details. The Results section first presents the analysis of the validity of the model parameters, followed by the sensitivity of the population’s level of infection and immunity with respect to the uncertain model parameters. The paper concludes with a Discussion and Conclusions section.

Materials and methods

In the existing literature, many researchers have already pointed out that classical SEIR models and agent-based simulations need to be adapted in order to capture all the important virus characteristics of COVID-19. Although SEIR models have been commonly used to model disease spread and form the basis of many of today’s COVID-19 epidemiological models [11, 12], they need to be adapted to differentiate between age groups or geographic locations [1, 13–16]. An often used alternative is agent-based simulation [3, 17, 18], which allows for modeling at the individual level rather than aggregating over the entire population. This is important when modeling COVID-19 [19, 20], as it allows for social and travel patterns that depend on age group and location.

The simulation model that we use to validate assumptions on virus spread and disease progression sits midway between a compartmental SEIR model and an agent-based simulation model. Traditional compartmental SEIR models divide the population into several health stages such as susceptible (healthy individuals, denoted as S), exposed (asymptomatic infected individuals who are not able to spread the virus, E), infected (symptomatic infected individuals, I) and recovered (immune or deceased individuals, R). Over time, the health condition of individuals may progress from one health stage to another. Within the population no distinction is made between individuals based on age or other personal characteristics. Virus parameters are often estimated using differential equations. While the spread of SARS-CoV-2 as well as the disease progression behave differently depending on an individual’s age and location (rural or urban), splitting the population into subgroups based on these characteristics complicates the derivation of model parameters and increases the risk of nonidentifiability. In agent-based simulation one simulates the exact daily movements of each individual. As a result, agent-based simulation can take many (virus-related) individual characteristics into account and can simulate the daily contacts of each individual in detail.

While agent-based models provide the level of detail required to model SARS-CoV-2 that is lacking in SEIR models, the computational complexity of agent-based modeling is prohibitive for a population of 17 million people, which is the size of the population of the Netherlands. We therefore propose an intermediate modeling form: a coarse-grained agent-based simulation model which uses the idea of health stages to simulate agents, where agents are characterized only by their age group and geographic region. The model is akin to agent-based simulations [3, 17, 18] in that distinctive individuals are simulated who commute between their region of residence and region of employment. The main difference is that we do not include social interactions at an individual level but aggregate over groups of people with the same age, region of residence and work region. This allows us to simulate on a large scale, i.e. to simulate all inhabitants of a country or state, while including individual characteristics such as age, region of residence and commute patterns that are highly relevant when modeling the spread of SARS-CoV-2 [7, 21]. Compared to an agent-based simulation our model reduces the number of assumptions one needs to make, as we aim to simulate on a country level. Furthermore, it gives us more freedom than when using the compartment model.

The disease progression as modeled in classical SEIR models, assuming that a population consists of susceptible (S), exposed (E), infectious (I) and recovered (R) individuals, is insufficient to reflect COVID-19 [4, 11]. We extended the disease progression with several disease stages (Fig 1). First, while classical SEIR models assume that infectious individuals are symptomatic and hence observable, for COVID-19 asymptomatic individuals can be infectious as well [22–25]. We therefore split (I) into two subgroups: asymptomatic (I-a) and symptomatic (I-s). Second, it is unclear whether all infections lead to immunity [11, 26]. We tested the effects of assuming that some infected individuals do not develop immunity but return to the susceptible group instead (dotted lines in Fig 1). In order to explicitly model recovered patients that obtained immunity, recovered (R) was replaced by two states: immune (IM) and deceased (D). Third, since we tested the validity of our model outcomes based on, among other things, the number of daily ICU admissions, we included a stage “ICU admission” (ICU-a) [3, 11]. In the Netherlands, only patients with severe symptoms who have a chance of survival are admitted to the ICU [27]. For patients with severe symptoms but a low chance of survival we used the stage “ICU refusal” (ICU-r). Finally, we used the classical compartments susceptible (S) and exposed (E).

Fig 1. Progression of disease stages in our simulation model.

The remainder of this section is organized as follows. First an overview of the tested assumptions is given, followed by the commute and contact patterns used in this study. Next the computation of the transmission probability, i.e. the probability that a non-infected individual gets infected when they meet an infectious individual, is explained. After that the health stages that an infected agent goes through are discussed. This section ends with an explanation of the initialization of the simulation model.

Scenarios

The aim of this study was to test the validity of four important disease characteristics and assumptions. First, we considered the probability of developing symptoms after infection. A major fraction of infections is asymptomatic and often goes unnoticed, so this probability is largely unknown [22]. We therefore tested four scenarios in which the probability of developing symptoms is 0.375, 0.5, 0.625 or 0.75, representing a wide range of possibilities.

Second, we assessed three scenarios for case fatality ratios. In the first scenario, we estimated the probability that a symptomatic individual dies, P(D|I-s), as the ratio between the death toll and the number of symptomatic individuals as estimated for the Netherlands. We used the death toll reported by the Dutch National Institute for Public Health and Environment (RIVM) [28] combined with the excess death rates reported by the Dutch Statistics Bureau [29]. The number of symptomatic individuals was estimated using data reported by Sanquin, the Dutch blood bank, that tested donated blood for antibodies to estimate the fraction of the population that had been infected [30]. For details see the S1 Appendix in S1 File. The second and third scenario were based on case fatality ratios per age group obtained from a study in China with approximately 72,000 cases [31]. The case counts in this study may either include all infections, or only symptomatic cases since these are observable. Therefore, in the second scenario, we assumed that this CFR reflects the death rate among all infected individuals, P(D|E), and in the third scenario we assumed that CFR reflects the fatality ratio among all symptomatic individuals, P(D|I-s).

Third, it is yet unknown which fraction of the non-lethal infections leads to immunity. For this we considered four scenarios. In the first scenario, all infections lead to immunity. The second scenario assumed that only those individuals who develop symptoms (I-s) become immune, while others return to the group of susceptible individuals (S). In the third and fourth scenario, we assumed that having symptoms leads to immunity in only 50% and 25% of the cases, respectively, while the remaining individuals return to the susceptible group. The latter three scenarios are indicated by the dotted lines in Fig 1.

Fourth, the virus transmission probability of an infectious individual encountering a susceptible individual is unknown. In total nine different probabilities were evaluated: 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50 and 0.55. Combined with the 48 scenarios for symptom development, CFR and developing immunity, this leads to 432 scenarios in total (see Table 1).

Table 1. Overview of tested scenarios.

Input parameter                     Scenario        Description
Probability of developing symptoms  0.375           37.5% go from E to I-s
                                    0.5             50% go from E to I-s
                                    0.625           62.5% go from E to I-s
                                    0.75            75% go from E to I-s
Case fatality ratio                 Data from NL    P(D|E) based on case counts and death toll in the Netherlands
                                    Literature-E    P(D|E) based on [31]
                                    Literature-I-s  P(D|I-s) based on [31]
Developing immunity                 All             All infections lead to immune
                                    High            All symptomatic infections lead to immune
                                    Medium          50% of symptomatic infections lead to immune
                                    Low             25% of symptomatic infections lead to immune
Virus transmission probability      {0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55}

All 432 combinations are tested. E = Exposed, I-s = Infectious Symptomatic. Note that for the probability of developing immunity, we only considered infections that do not lead to death.
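
The scenario count can be checked by enumerating the grid in Table 1. The following is a minimal sketch; the variable names are illustrative and not taken from the authors' code.

```python
from itertools import product

# Scenario values as listed in Table 1 (variable names are illustrative).
symptom_probs = [0.375, 0.5, 0.625, 0.75]
cfr_scenarios = ["Data from NL", "Literature-E", "Literature-I-s"]
immunity_scenarios = ["All", "High", "Medium", "Low"]
transmission_probs = [0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]

# Every combination of the four uncertain inputs is one scenario.
scenarios = list(product(symptom_probs, cfr_scenarios,
                         immunity_scenarios, transmission_probs))

# Number of combinations before the transmission probability is varied.
n_without_transmission = (len(symptom_probs) * len(cfr_scenarios)
                          * len(immunity_scenarios))

print(n_without_transmission, len(scenarios))
```

The product of 4, 3 and 4 gives the 48 combinations mentioned in the text, and multiplying by the nine transmission probabilities gives 432.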

We used data from the Netherlands to simulate the period between February 27, 2020, when the first case was identified, and March 25, 2020, when the effects of the lockdown became visible in the number of ICU admissions and the death toll. Using only the initial period of the outbreak gives the most accurate view of virus parameters (thus excluding the effect of protection measures).

Daily commute and contact patterns

The Netherlands is divided into 40 regions termed corops, following a statistical division of the Netherlands for research institutions to present their data [32]. All approximately 17 million inhabitants of the Netherlands, termed “agents” in the simulation, have a known corop of residency and corop of employment. The population per corop per age group [33] and commute data between corop regions [34] were obtained from Statistics Netherlands. As the Netherlands is a small and densely populated country with many commuters, these corops are vastly interconnected. We assumed that agents who are unemployed stay in their corop of residency during the day.

The simulation divided each day into two epochs: during the day epoch, most inhabitants are at their work corop, and during the night epoch all agents are in their corop of residency. Each epoch an agent may meet other agents that reside in the same corop during that epoch, potentially leading to a new SARS-CoV-2 infection.

The daily contact pattern of an agent is determined by their age group and was obtained from [35] (Table 2). This paper reports the total number of daily contacts an individual of a certain age group has, whereas the age distribution of the people someone has contact with differs per age group. Consequently, the total number of contacts had to be divided over the different age groups. For this we used the percentage of contacts each age group has with another age group obtained from [36], by converting the amount of contacts found in that study into percentages. Combining these percentages with the total number of contacts per age group gave the contact pattern for each age group as shown in Table 3.

Table 2. Daily contact data obtained from [35].

Age group 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-150
# of daily contacts 12.5 16.1 21.2 21.8 22.1 20.9 15.4 10 9.5

Table 3. Social patterns obtained by combining [35] and [36].

Age group 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-150
0-9 4.75 2.25 1.36 1.36 0.70 0.70 0.45 0.45 0.45
10-19 1.93 6.04 1.87 1.87 1.34 1.34 0.56 0.56 0.56
20-29 1.97 1.99 2.90 4.34 2.84 2.84 1.44 1.44 1.44
30-39 2.03 2.05 2.99 4.40 2.92 2.92 1.48 1.48 1.48
40-49 1.02 1.33 2.41 3.16 3.93 3.93 2.12 2.12 2.12
50-59 0.96 1.25 2.28 2.99 3.72 3.72 2.01 2.01 2.01
60-69 0.60 0.60 0.79 1.32 1.74 1.74 2.86 2.86 2.86
70-79 0.39 0.39 0.51 0.86 1.13 1.13 1.86 1.86 1.86
80-150 0.37 0.37 0.48 0.817 1.07 1.07 1.77 1.77 1.77

The table shows the daily number of contacts an individual whose age group is found on one of the rows has with people from an age group found on one of the columns. For example, someone in the age group 30-39 has on average contact with 2.99 people from the age group 20-29 per day.
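
The construction of a row of Table 3 can be sketched as follows: the total number of daily contacts of an age group (Table 2) is divided over the contacted age groups in proportion to contact counts from a second source. The function name and the raw counts below are hypothetical; the actual counts from [36] are not reproduced here.

```python
import numpy as np

def split_contacts(total_contacts, raw_counts):
    """Divide a total number of daily contacts over contacted age groups.

    total_contacts: total daily contacts of one age group (as in Table 2).
    raw_counts: contact counts per contacted age group from a second
        source; only their relative sizes are used.
    """
    raw_counts = np.asarray(raw_counts, dtype=float)
    shares = raw_counts / raw_counts.sum()  # convert counts to fractions
    return total_contacts * shares

# Hypothetical raw counts for the nine contacted age groups.
raw = [2.0, 2.0, 3.0, 4.0, 3.0, 3.0, 1.0, 1.0, 1.0]
row = split_contacts(21.8, raw)  # 21.8 is the Table 2 total for ages 30-39
```

By construction each row sums to the corresponding total in Table 2, which is also the case (up to rounding) for the rows of Table 3.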

Transmission probability

At every time epoch the model determined for each susceptible agent whether they got infected based on their location and age group. The probability of getting infected depends on the location of the agent, the number of (infected) other agents present in that location, the contact pattern of the agent, and the probability of transmission in case the agent interacts with an infectious agent.

In mathematical terms, we defined the infection probability $p_{a,c,t}$ as the probability that a susceptible agent from age group $a \in A$ gets infected while being in corop $c \in C$ at epoch $t = 1, \dots, T$. This infection probability can be determined as follows:

$$p_{a,c,t} = 1 - \prod_{a' \in A} \left(1 - p_{a,a',c,t}\right)^{[\#\mathrm{EpochContacts}]_{a,a'}},$$

where $[\#\mathrm{EpochContacts}]_{a,a'}$ is the number of contacts an agent of age group $a$ has with agents of age group $a'$ during an epoch (see Section “Daily commute and contact patterns”) and $p_{a,a',c,t}$ represents the probability that a susceptible individual from age group $a \in A$ gets infected through an encounter with an agent from age group $a' \in A$, in corop $c \in C$, at epoch $t = 1, \dots, T$.

The definition of the infection probability was chosen such that it allows for region dependent probabilities as well as social patterns that depend on age groups. This makes the model flexible and realistic: suppose that in a specific corop many agents of a given age group are infected, then an agent who resides in that specific corop and meets many individuals of that age group has a high risk of getting infected.

The probability that a susceptible individual from age group $a$ gets infected through an encounter with an individual of age group $a'$, in corop $c$ at epoch $t$ ($p_{a,a',c,t}$) consisted of three components:

$$p_{a,a',c,t} = P\{E_{aa'}\} \, P\{I_{a',c,t}\} \, P\{T\}.$$

We assumed independence between these components.

  • $P\{E_{aa'}\}$ represented the probability that an individual from age group $a$ encounters an individual from age group $a'$. See Section “Daily commute and contact patterns” on how to obtain this probability from the contact patterns.

  • $P\{I_{a',c,t}\}$ described the fraction of individuals in age group $a'$ in corop $c$ that was contagious at time $t$. This fraction was determined for each epoch separately, and was computed by dividing the number of contagious agents in age group $a'$ in corop $c$ at time $t$ by the total number of agents of age group $a'$ in corop $c$ at time $t$. For the infectious agents we only included those that were in health stages I-a or I-s, as individuals with symptoms so severe that ICU admission is necessary (either ICU-a or ICU-r) were assumed to be too severely ill to (be allowed to) meet others.

  • $P\{T\}$ denoted the probability of virus transmission when a susceptible individual encounters an infectious individual. Since this is an unknown parameter, we tested several values for $P\{T\}$, as shown in Table 1.
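
The two formulas of this section can be combined in a few lines. The sketch below computes the infection probability of one susceptible agent for a single corop and epoch; the function name, argument names and input values are assumptions made for illustration, and the arrays are indexed by the contacted age group.

```python
import numpy as np

def infection_probability(p_encounter, frac_infectious, p_transmission,
                          epoch_contacts):
    """Infection probability for one age group, corop and epoch.

    p_encounter: P{E} per contacted age group.
    frac_infectious: P{I}, fraction of each contacted age group that is
        in stage I-a or I-s in this corop and epoch.
    p_transmission: P{T}, a scalar scenario parameter.
    epoch_contacts: number of contacts with each age group per epoch.
    """
    p_encounter = np.asarray(p_encounter, dtype=float)
    frac_infectious = np.asarray(frac_infectious, dtype=float)
    epoch_contacts = np.asarray(epoch_contacts, dtype=float)
    # Per-encounter infection probability (product of three components).
    p_pair = p_encounter * frac_infectious * p_transmission
    # Probability of escaping infection in every contact, then complement.
    return 1.0 - np.prod((1.0 - p_pair) ** epoch_contacts)

# Hypothetical inputs for two contacted age groups.
p = infection_probability([1.0, 1.0], [0.5, 0.0], 0.5, [2, 3])
```

With these inputs only the first age group contributes, so the result equals 1 - (1 - 0.25)^2.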

Disease progression

The progression of an exposed agent’s health stage from one epoch to the next was assumed to depend only on the agent’s current health stage and his or her age group. Hence, after being infected, the agent’s health stage over time can be interpreted as a discrete time Markov chain where transitions can occur according to Fig 1. This Markov chain is represented by a transition matrix containing the transition probabilities of an agent moving from one health stage in a certain epoch to another health stage in the next epoch. Well-known results from Markov chain analysis [37] were used to compute these probabilities such that certain pre-imposed properties were satisfied.

Two types of properties were used to compute the transition matrices. First, we used the probability that an agent will eventually reach stage j at some point in time, given that the agent’s current state is i. For example, for i = E and j = D, P(D|E) is the probability that an exposed agent will eventually die. This corresponds to the CFR, for which we tested three scenarios as discussed in Section “Scenarios”. We also used assumptions on P(D|ICU-a) and P(ICU-a|I-s). Data regarding ICU admissions and the number of deaths at the ICU [38] were used to construct P(D|ICU-a). Furthermore, data from Sanquin on the percentage of their blood donors that had COVID-19 antibodies [30] (see S1 Appendix in S1 File) were used to obtain an estimate of the number of infections in each age group. Combining the Sanquin data with the number of ICU admissions we estimated P(ICU-a|I-s). The resulting probabilities that were used for estimating the transition matrices are listed in Table 4.

Table 4. Age-dependent probability properties common to all scenarios.

Age group P(D|ICU-a) P(ICU-a|I-s)
0–9 0.32 × 10−2 0.00
10–19 2.88 × 10−2 0.66 × 10−4
20–29 7.99 × 10−2 1.47 × 10−4
30–39 15.67 × 10−2 4.34 × 10−4
40–49 25.90 × 10−2 11.22 × 10−4
50–59 38.60 × 10−2 26.85 × 10−4
60–69 54.03 × 10−2 77.03 × 10−4
70–79 71.93 × 10−2 187.42 × 10−4
80+ 92.39 × 10−2 41.86

The second property type considered was duration, i.e. the expected time an agent spends in a health stage or a collection of health stages, given the current health stage. The values used are listed in Table 5. For instance, a well-known quantity from literature is the incubation time, i.e., the time from infection until developing symptoms. This incubation time corresponds to the expected time spent in health stages E and I-a combined, given that an agent’s current health stage is E (represented in the table by E(E → I-s)). The average incubation time was estimated using an aggregated dataset of clinical outcomes [9] by taking the average of the reported values from the 20 studies with the largest population sizes. The average duration from symptoms to ICU admission (E(I-s → ICU-a), E(I-s → ICU-r)) was estimated based on an average from multiple papers in literature, among which [9] and [39].

Table 5. Average duration properties of Markov chain used to fit the transition matrix.

Stage(s)                             Description                                    Literature  Value (days)
E(E → I-s)                           Average incubation time                        [9]         5.758
E(I-a → I-s)                         Average duration asymptomatic infectious       -           1.5
E(I-s → ICU-a) and E(I-s → ICU-r)    Average duration from symptoms until ICU need  [9, 39]     7.166
E(ICU-a → IM) and E(ICU-a → D)       Average time in ICU care                       [38]        11.3
E(ICU-r → D) and E(ICU-r → IM)       Average time in ICU-r                          -           5
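
Both property types follow from standard results for absorbing Markov chains. The sketch below illustrates this on a reduced chain with hypothetical per-epoch transition probabilities; it is not the fitted transition matrix from the paper, and the reduced set of stages is an assumption made to keep the example small.

```python
import numpy as np

# Hypothetical per-epoch transition probabilities for a reduced chain.
# Transient stages (rows/columns of Q): E, I-s, ICU-a.
# Absorbing stages (columns of R): IM, D.
Q = np.array([[0.80, 0.10, 0.00],
              [0.00, 0.80, 0.05],
              [0.00, 0.00, 0.90]])
R = np.array([[0.10, 0.00],
              [0.15, 0.00],
              [0.06, 0.04]])

# Fundamental matrix N = (I - Q)^-1: N[i, j] is the expected number of
# epochs spent in transient stage j when starting in transient stage i.
N = np.linalg.inv(np.eye(3) - Q)

# Absorption probabilities: B[i, j] is the probability of eventually
# ending in absorbing stage j when starting in transient stage i.
B = N @ R

p_death_given_exposed = B[0, 1]   # plays the role of P(D|E)
p_death_given_icu = B[2, 1]       # plays the role of P(D|ICU-a)
expected_days_in_E = N[0, 0] / 2  # two epochs per day in the simulation
```

In a fitting procedure, quantities such as these would be compared with the target values in Tables 4 and 5.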

Recall that we distinguished between age groups, as many properties of COVID-19, such as the probabilities mentioned above, depend on the patient’s age. Therefore, a transition matrix was estimated for each age group. The Markov properties can be used to fit a transition matrix to empirical values by minimizing a measure of goodness of fit on these properties. We used the sum of squared errors for this purpose after weighting the probability properties by a factor 100 to ensure balanced scaling between probability and duration properties.

The properties combined with the transition structure from Fig 1 do not uniquely determine the transition matrix. We therefore made the additional assumption that apart from the target properties, the transition matrices of the age groups should be as similar as possible. So, rather than fitting the transition matrices for each age group independently, we fitted all transition matrices simultaneously, and extended the goodness-of-fit measure with a penalty on the pair-wise distances between the transition matrices of all age groups. The distance between two transition matrices was measured by the Euclidean norm between the two vectors containing the nonzero probabilities in these transition matrices. The resulting transition matrices are available on our GitHub page https://github.com/zero-hunger-lab/covid-paper-supplement or via the supplementary information.
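
One possible way to write the fitting problem described above is given below. This is a sketch under stated assumptions, not the authors' exact formulation: $P_g$ denotes the transition matrix of age group $g$, $\pi_k(P_g)$ and $d_m(P_g)$ are the probability and duration properties implied by $P_g$, $\hat{\pi}_{k,g}$ and $\hat{d}_m$ are their target values, $v(P_g)$ is the vector of nonzero transition probabilities, and $\lambda \ge 0$ is a hypothetical penalty weight. Whether the factor 100 is applied before or after squaring is also an assumption.

```latex
\min_{\{P_g\}_{g \in G}} \;
\sum_{g \in G} \Big[
  \sum_{k} \big( 100 \, ( \pi_k(P_g) - \hat{\pi}_{k,g} ) \big)^2
  + \sum_{m} \big( d_m(P_g) - \hat{d}_m \big)^2
\Big]
+ \lambda \sum_{g < h} \big\lVert v(P_g) - v(P_h) \big\rVert_2
```

Each $P_g$ is constrained to be a valid transition matrix (nonnegative entries, rows summing to one) with the transition structure of Fig 1.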

Model initialization

We estimated the number of people in each health stage for each corop on the starting date of the simulation, February 27, 2020, following the rationale explained by [40]. The duration from infection till death was estimated to be on average 24 days, based on the estimated incubation time, the time from symptoms to ICU admission, and the time from ICU until death. The number of people that got infected on day t can be estimated as the number of deaths on day t + 24 divided by the CFR. This means that a different start situation was constructed for all CFR scenarios. To construct the start situation at February 27, day s, we used the daily number of deaths from March 6, the day of the first reported death, up to and including March 20, which is 24 days after the start date.
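
The back-calculation of infections from deaths can be sketched as follows. The function name and the example death counts are hypothetical; only the 24-day delay and the division by the CFR come from the text.

```python
def estimate_infections(daily_deaths, cfr, delay_days=24):
    """Estimate new infections per day from later deaths.

    daily_deaths: dict mapping a day index to the number of deaths.
    cfr: case fatality ratio of the scenario (probability of death).
    delay_days: assumed average time from infection until death.
    Returns a dict mapping the infection day to estimated infections.
    """
    return {day - delay_days: deaths / cfr
            for day, deaths in daily_deaths.items()}

# Hypothetical example: deaths on days 24 and 25, CFR of 1%.
infections = estimate_infections({24: 2, 25: 5}, cfr=0.01)
```

Because the estimate scales with 1/CFR, each CFR scenario yields a different initial situation, as noted above.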

The CFR for people under 60 is very low (<0.5%) leading to few deaths in these age groups. We therefore only estimated the number of infected people in the age groups 60-70, 70-80 and 80+. Assuming that the fraction of the population that got infected was the same over all age groups we estimated the number of infections under 60 per day by multiplying this fraction with the population size.

People who got infected before our simulation started, thus on day t ∈ {s − 24, s − 23, ⋯, s − 1}, may have progressed to other stages. We therefore determined in which stage the people who got infected on day t ∈ {s − 24, s − 23, ⋯, s − 1} are at day s by applying the transition matrix to the infected group for all epochs between t and s.
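
Advancing a group of infected people to the start date amounts to repeated multiplication by the per-epoch transition matrix. The sketch below assumes two epochs per day and that the first state of the matrix is E; the three-state matrix is a toy example, not one of the fitted matrices.

```python
import numpy as np

def stage_distribution(transition_matrix, n_infected, days_elapsed,
                       epochs_per_day=2):
    """Expected number of people per stage after a number of days.

    All n_infected people are assumed to start in state 0 (stage E).
    """
    P = np.asarray(transition_matrix, dtype=float)
    start = np.zeros(P.shape[0])
    start[0] = n_infected
    # Apply the transition matrix once per elapsed epoch.
    return start @ np.linalg.matrix_power(P, epochs_per_day * days_elapsed)

# Toy chain with three states; the last state is absorbing.
P_toy = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0]]
after_one_day = stage_distribution(P_toy, n_infected=100, days_elapsed=1)
```

Summing these distributions over all infection days before the start date gives the expected initial occupancy of each health stage.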

The total number of infected individuals was spread over the corops following the distribution of deaths over the country as reported in [30]. All initializations constructed using this approach can be obtained from our GitHub page.

The death toll was crucial for computing the initial health stage distribution of the population. Since the actual death toll is likely higher than the number of reported COVID-19 induced deaths, we computed a lower- and an upper bound on the death toll. Two initial distributions of the population over the health stages were computed, one based on the lower bound, and one based on the upper bound. We used the average of the two as the starting point for our simulation.

The death toll reported by the National Institute for Public Health and the Environment [20] was used as a lower bound. To obtain an upper bound, we used the excess number of deaths in 2020 compared to 2015-2019. This was determined using the weekly number of deaths in the Netherlands as reported by Statistics Netherlands [29]. The excess number of deaths was computed as the number of deaths in a certain week in 2020 minus the average number of deaths in that same week for 2015-2019. The resulting excess number of deaths is 442 for week 12 and 1164 for week 13. The deaths were then distributed over the days in that week following the trend of the reported COVID-19 deaths obtained from [28]. For example, if 10% of the COVID-19 confirmed deaths in week 12 occurred on Monday, then 10% of the 442 excess deaths were added to that number. The resulting daily number of deaths are available on our GitHub page.
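
The two steps of the upper-bound computation can be sketched as follows. Function names and the weekly and daily counts are hypothetical; only the weekly excess of 442 and the 10% example come from the text.

```python
def excess_deaths(deaths_this_week, deaths_same_week_previous_years):
    """Deaths in a given week minus the average for earlier years."""
    baseline = (sum(deaths_same_week_previous_years)
                / len(deaths_same_week_previous_years))
    return deaths_this_week - baseline

def distribute_over_days(weekly_excess, reported_daily_deaths):
    """Spread weekly excess deaths over days, following the daily
    shares of the reported COVID-19 deaths in that week."""
    total = sum(reported_daily_deaths)
    return [weekly_excess * d / total for d in reported_daily_deaths]

# Hypothetical weekly totals for one week in 2020 and in 2015-2019.
example_excess = excess_deaths(1000, [900, 920, 880, 910, 890])

# Hypothetical reported deaths per day; Monday holds 10% of the week.
daily_excess = distribute_over_days(442, [10, 10, 10, 20, 20, 15, 15])
```

In this example Monday receives 10% of the 442 excess deaths, i.e. 44.2.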

Results

Assessing the likelihood of scenarios based on number of deaths and ICU admissions

To assess the validity of each of the scenarios, we first compared the death toll of COVID-19 predicted by the simulation model with an estimate of the actual death toll in the Netherlands. Because the simulation has stochastic components, the predicted death toll was computed as an average over ten different runs to limit the influence of outliers. As the true number of COVID-19 induced deaths is highly uncertain (see e.g. [22]), we employed a conservative lower and upper bound of 602 and 2139, respectively (see S2 Appendix in S1 File for details). All scenarios that led to a predicted total number of deaths between February 27 and March 25 within this range were considered plausible. In this way, we limited the set of realistic parameter combinations to 121. Fig 2 presents a heatmap showing the deviation from the lower and upper bound on the number of deaths for all parameter combinations. A value of 0 means that the simulated number of deaths for that parameter combination was within the lower and upper bound, so the combination was considered acceptable. S3 Fig in S1 File shows the simulated total number of deaths for each of the parameter scenarios.

Fig 2. A heatmap showing by how much the simulated number of deaths up to and including March 25 lies outside the interval [602, 2139] along with its legend on the right.

Columns correspond to virus transmission probabilities, rows represent the various combinations of the probability of developing symptoms, the case fatality ratio and the possibility of developing immunity. “Literature-E” and “Literature-I-s” correspond to the scenarios where the P(D|E) and P(D|I-s), respectively, are based on CFR estimates from the literature.
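
The plausibility check on the death toll can be sketched as follows. The bounds are those stated above; the function name, the sign convention of the deviation and the example run totals are assumptions.

```python
from statistics import mean

LOWER_BOUND, UPPER_BOUND = 602, 2139

def deviation_from_bounds(run_totals):
    """Distance of the average simulated death toll to [602, 2139].

    Returns 0 when the average lies within the bounds, a negative value
    when it is below the lower bound, and a positive value when it is
    above the upper bound.
    """
    avg = mean(run_totals)
    if avg < LOWER_BOUND:
        return avg - LOWER_BOUND
    if avg > UPPER_BOUND:
        return avg - UPPER_BOUND
    return 0

# Hypothetical death totals from ten runs of one scenario.
runs = [950, 1010, 980, 1020, 1000, 990, 1005, 995, 1030, 1020]
plausible = deviation_from_bounds(runs) == 0
```

A scenario is retained for the ICU comparison only when this deviation is 0.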

We further assessed the validity of the set of realistic scenarios based on the daily number of ICU admissions. Since this was a highly reliable parameter, we were able to use the reported values to compute the mean squared error (MSE) between the simulated and the real daily ICU admissions in the Netherlands. Any scenario that yielded an MSE greater than or equal to 225 was considered unreliable. The threshold of 225 was chosen based on visual inspection of plots showing simulated and real daily ICU admissions, which are available at https://covid-results.herokuapp.com. The plots allow for visual comparison of the simulations with each other and with reality, as well as a visual evaluation of the progression of these metrics over time. Furthermore, an MSE of 225 corresponds roughly to a difference between simulated and real daily ICU admissions of 15 on average. Fig 3 shows a heatmap of the MSE for only those scenarios that were considered realistic based on the predicted death toll. MSEs for all scenarios are shown in S4 Fig in S1 File.

Fig 3. A heatmap representing the prediction quality of different combinations of COVID-19 characteristics with respect to ICU occupation.

Quality is measured as the MSE between simulated and real daily ICU admissions up to and including March 25. A * indicates that the combination is not accurate in predicting the number of deaths. The legend on the right indicates that the darker the color, the higher the MSE. Columns correspond to virus transmission probabilities, rows represent the various combinations of the probability of developing symptoms, the case fatality ratio and the possibility of developing immunity. “Literature-E” and “Literature-I-s” correspond to the scenarios where the P(D|E) and P(D|I-s), respectively, are based on CFR estimates from the literature.

After removing all scenarios with an MSE of 225 or more, our simulation still indicated 18 combinations of virus- and disease characteristics to be plausible. Note that the MSE for the ICU admissions differed among these 18 settings.
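As a rough illustration of the ICU check, the sketch below computes the MSE between two daily series and applies the rule that an MSE greater than or equal to 225 marks a scenario as unreliable. The function names are hypothetical and the sketch is not taken from the authors' code; it also makes explicit that the square root of the threshold is 15 admissions per day.

```python
import math

MSE_THRESHOLD = 225  # threshold stated in the text


def mean_squared_error(simulated, observed):
    """Mean squared error between two equally long daily series."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return sum(squared_errors) / len(squared_errors)


def passes_icu_check(simulated, observed, threshold=MSE_THRESHOLD):
    """Keep a scenario only if its MSE is strictly below the threshold."""
    return mean_squared_error(simulated, observed) < threshold


# The threshold corresponds to a root-mean-square difference of 15 per day.
rms_at_threshold = math.sqrt(MSE_THRESHOLD)
```

Because the errors are squared before averaging, a few days with large deviations weigh more heavily than many days with small ones, which is worth keeping in mind when reading the threshold as "15 per day".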

The selected scenarios lead to a wide range of predictions on the number of infected and immune individuals

For the 18 remaining parameter combinations, the left colored panel of Fig 4 shows the percentage of infected individuals up to March 12 according to our simulation. At that time social distancing measures were introduced in the Netherlands, leading to a strong reduction in virus spread. Since we were interested in virus behavior under stable circumstances, i.e., without changes in human behavior, we only considered the number of infected individuals until March 12. Note that the number of infected individuals was affected by social distancing immediately, while the ICU admissions and deaths were affected only after some time, which is why we used a different time horizon for the infections (until March 12) than for ICU admissions and deaths (until March 25). There was substantial variation among the 18 scenarios: the percentage of the population that was infected varied from 2.0% up to 5.0%.

Fig 4. The percentage of infected people on March 12 (penultimate column) and the percentage of immunity among the population based on individuals who were infected no later than March 12 (last column).

The legend right next to the table indicates that the darker the color, the more people are infected or immune. Results are shown only for scenarios that lead to reasonable predictions of the death toll and the daily ICU admissions.

An interesting metric for many policy makers is the fraction of the population that develops immunity. As before, we excluded the effects of social distancing by considering only infections that happened no later than March 12. For these individuals, we simulated the disease progression until they ended up in one of the stages susceptible, immune or deceased. The right colored panel of Fig 4 displays the fraction of the population that develops immunity based on the infections up to March 12. The results show major differences among the 18 scenarios: the percentage of the population that was immune varied between 0.2% and 5.2%.

The percentages of infected individuals up to March 12 and percentages of immune individuals for all parameter combinations are shown in S5 and S6 Figs in S1 File.

Discussion and conclusions

Using an epidemiological simulation model, we evaluated the likelihood of a variety of scenarios for virus spread and disease progression characteristics for the SARS-CoV-2 virus. With the four uncertain input parameters, we were able to identify 18 sets of parameter values that all led to accurately simulated daily ICU admissions and number of deaths. In particular, based on our scenarios, our analysis indicated the following conclusions (Figs 2 and 3):

  1. All Literature-I-s scenarios, where we determined P(D|I-s) based on [31], do not seem realistic, as none of the corresponding scenarios yielded a realistic death toll and number of daily ICU admissions. Both Literature-E and Data from NL were realistic case fatality ratio scenarios, and further research is required.

  2. In order for the simulation results to match reality, a high probability of symptom development required the transmission probability to be low and vice versa. According to our analysis, a low transmission probability combined with a high probability of developing symptoms was equally likely as a high transmission probability combined with a low probability of developing symptoms.

  3. No conclusions regarding the probability of developing immunity could be drawn from this simulation study: the death toll and the number of daily ICU admissions differed only slightly between scenarios with different assumptions on immunity, ceteris paribus. Further research is required to obtain better estimates on the probability of developing immunity.

While our study provided some clear pointers towards assumptions that were likely to reflect actual virus behavior, other important questions remain unanswered. This has several implications. First and foremost, modeling the spread of SARS-CoV-2 does not give conclusive insight into the number of infections and the level of immunity among the population, and requires a thorough analysis of the results for several uncertain scenarios regarding virus- and disease characteristics.

Second, the probability of developing symptoms was highly uncertain, which has major implications for virus spread predictions. When the probability of symptom development was assumed to be low, our simulation model could only reach a realistic number of ICU admissions and death toll if the uncertain parameter reflecting the transmission probability was high. This would imply a high attack rate (the percentage of the population that contracts the disease) during the early phase of the pandemic, with a large fraction of the community having been infected and possibly having developed immunity. On the other hand, if the probability of symptom development is high and the transmission probability is low, only few infections have taken place so far and pre-symptomatic infections are less likely. It is thus very important to gain more insight into the probability of developing COVID-19 symptoms after infection.

Third, the progression of the virus spread in the long run was difficult to predict. If many people have already been infected, and if many infections have led to immunity, the level of immunity among the population has grown rapidly. This means that after a relatively short amount of time the susceptible group will shrink and the death toll and ICU burden will decline rapidly. On the other hand, a low number of infections and a low probability of developing immunity hold the potential for many more infections, and hence deaths, to come. This is crucial for policy makers when choosing e.g. the proper level of social distancing measures and scaling up the ICU capacity.

Fourth, the haziness around immunity has major implications for policy makers. It is unclear when an infection leads to immunity and whether immunity is obtained for life [41]. Some people may even have a good immune response already at the first infection and can therefore be considered immune prior to infection. Immunity is of vital importance for political decision making when developing a vaccination policy or aiming for herd immunity. Hence, developing exit strategies is not possible without further research on how immunity works regarding SARS-CoV-2.

Epidemiological models such as ours are based on many input parameters, most of which are uncertain and only have crude estimates. In our simulation study, we already considered many scenarios by varying only four input parameters: the probability of developing symptoms, the case fatality ratio, the probability of developing immunity, and the virus transmission probability. The best available estimates from literature were used for the other uncertain parameters such as the incubation time, the time until an exposed individual becomes infectious, the probability of ending up in the ICU, and the fraction of symptomatic patients that goes into self-quarantine. Of course, more scenarios can be created by varying some of these uncertain parameters as well, but this will only yield more alternative scenarios that offer a possible explanation of the observed death toll and ICU admissions.

If further research resolves one or more of the uncertainties in virus and/or disease characteristics, our study or a similar one can be used to narrow down the range of possibilities for the other characteristics. Additionally, a better estimate of the death toll allows for further reducing the number of realistic scenarios. For example, if the lower bound on the death toll estimate were increased by 50% and the upper bound reduced by 33%, only 10 scenarios would remain. The bandwidth of the predictions on infections and immunity would then already be much smaller than for the original 18 scenarios: the percentage of the population that was infected varied from 1.8% to 3%, and the level of immunity ranged from 1.3% to 2.8%.
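To make the tightened interval in this example concrete, the following sketch applies the stated percentage changes to the original bounds of 602 and 2139. It assumes that "33%" is meant literally as 0.33 rather than one third, and the helper names are hypothetical.

```python
# Original bounds on the death toll, as stated in the text.
LOWER_BOUND = 602
UPPER_BOUND = 2139


def tighten_bounds(lower, upper, raise_lower_by=0.50, cut_upper_by=0.33):
    """Raise the lower bound and cut the upper bound by the given fractions."""
    return lower * (1 + raise_lower_by), upper * (1 - cut_upper_by)


def within(deaths, lower, upper):
    """True when a simulated death toll lies inside the closed interval."""
    return lower <= deaths <= upper


new_lower, new_upper = tighten_bounds(LOWER_BOUND, UPPER_BOUND)
# new_lower is 903 and new_upper is about 1433, so the interval shrinks
# from a width of 1537 deaths to roughly 530 deaths.
```

Under this reading, a scenario with, say, 1500 simulated deaths would pass the original check but fail the tightened one.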

The values for virus transmission probability reported in this study should be interpreted with caution. First, there is uncertainty whether the transmission probability is constant when one is infected, or if the probability changes over time depending on the disease stage of an individual. This is an example of one of the uncertainties that has not been included in the study and including it will only enlarge the number of plausible scenarios. Second, the daily number of infected individuals in our simulation was determined by multiplying this virus transmission probability with the number of infectious individuals and their number of daily contacts. Here, the definition of a contact is important: including only conversations as contacts results in fewer social contacts than including also e.g. passersby. A restricted definition of contacts naturally corresponds to a higher transmission probability. This was reflected by the model as well: a more restricted definition leads to fewer contacts, hence a higher transmission probability was necessary to achieve the same number of infections, ICU admissions and deaths. We therefore did not consider our validated values for the virus transmission probabilities to be exact and universally applicable. Rather we showed that for a variety of assumptions on CFR, developing immunity and symptom development, and for a given definition of contacts, there exist transmission probabilities that lead to realistic simulation results. Further research is required to have a concrete definition of a contact in combination with the probability of transmitting the virus.
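The dependence of the transmission probability on the definition of a contact can be illustrated with a deterministic simplification of the multiplication described above. The actual simulation is stochastic and far more detailed; the function names and all numbers below are hypothetical and serve only to show the direction of the effect.

```python
def expected_new_infections(transmission_probability, infectious, contacts_per_day):
    """Expected daily new infections: probability x infectious x contacts."""
    return transmission_probability * infectious * contacts_per_day


def required_transmission_probability(target_new_infections, infectious, contacts_per_day):
    """Transmission probability needed to reach a target number of new infections."""
    return target_new_infections / (infectious * contacts_per_day)


# Hypothetical values: 1000 infectious individuals, 400 new infections per day.
INFECTIOUS = 1000
TARGET = 400

# Broad definition of a contact (e.g. including passersby): 20 contacts per day.
p_broad = required_transmission_probability(TARGET, INFECTIOUS, 20)

# Restricted definition (e.g. conversations only): 5 contacts per day.
p_restricted = required_transmission_probability(TARGET, INFECTIOUS, 5)
```

With these made-up numbers the restricted definition needs a transmission probability four times as high (0.08 versus 0.02) to produce the same number of infections, which is why a calibrated value is only meaningful together with the contact definition used.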

In summary, this paper presents a comprehensive study to test the validity of a wide range of SARS-CoV-2 virus- and COVID-19 disease characteristics. A variety of assumptions yielded a realistic number of deaths and daily ICU admissions. However, these scenarios disagreed on the predicted number of infections and immune individuals, two unobservable but important metrics. From this we conclude that the currently available information on the behavior of the SARS-CoV-2 virus is insufficient to accurately model and predict the virus spread, evaluate the effects of social distancing measures in detail and develop social distancing policies or even exit strategies. Note that this does not imply that such studies are not informative: epidemiological forecasting models have helped us understand the severity of the pandemic early on, and are well capable of forecasting the trend of the virus spread. When conducting a forecasting analysis that requires insight into the infections and immunity among a population, we highly recommend analyzing the results for several scenarios on virus- and disease characteristics, reporting the range of results obtained with these scenarios, and drawing conclusions and recommendations based on the full set of scenario-specific outcomes.

Supporting information

S1 File

(ZIP)

S1 Data. The data required to duplicate the research.

It contains the data files in the input folder and the source code of the model in the src folder. To understand the input data, please read the Readme file. Finally, in case you are unable to work with Java, a .jar file is included that runs our code without requiring any knowledge of Java.

(ZIP)

Acknowledgments

We thank Dr. Jean-Luc Murk of the Elisabeth-Tweesteden hospital Tilburg, the Netherlands, for his valuable input and feedback.

Data Availability

The resulting transition matrices are available on our GitHub page https://github.com/zero-hunger-lab/covid-paper-supplement or via the Supplementary information.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Davies NG, Kucharski AJ, Eggo RM, Gimma A, Edmunds WJ, Group CCW, et al. The effect of non-pharmaceutical interventions on COVID-19 cases, deaths and demand for hospital services in the UK: a modelling study. MedRxiv. 2020. doi: 10.1016/S2468-2667(20)30133-X
  • 2. Muller SA, Balmer M, Neumann A, Nagel K. Mobility traces and spreading of COVID-19. medRxiv. 2020.
  • 3. Ferguson N, Laydon D, Nedjati Gilani G, Imai N, Ainslie K, Baguelin M, et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. 2020.
  • 4. Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. Journal of travel medicine. 2020. doi: 10.1093/jtm/taaa021
  • 5. Kucharski AJ, Russell TW, Diamond C, Liu Y, Edmunds J, Funk S, et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The lancet infectious diseases. 2020. doi: 10.1016/S1473-3099(20)30144-4
  • 6. Sebastiani G, Massa M, Riboli E. Covid-19 epidemic in Italy: evolution, projections and impact of government measures. European Journal of Epidemiology. 2020; p. 1.
  • 7. Salje H, Kiem CT, Lefrancq N, Courtejoie N, Bosetti P, Paireau J, et al. Estimating the burden of SARS-CoV-2 in France. Science. 2020. doi: 10.1126/science.abc3517
  • 8. Alamo T, Reina DG, Mammarella M, Abella A. Covid-19: Open-Data Resources for Monitoring, Modeling, and Forecasting the Epidemic. Electronics. 2020;9(5):827. doi: 10.3390/electronics9050827
  • 9. Bertsimas D, Bandi H, Boussioux L, Cory-Wright R, Delarue A, Digalakis V, et al. An Aggregated Dataset of Clinical Outcomes for COVID-19 Patients; 2020.
  • 10. MIDAS Network. https://midasnetwork.us/covid-19/; 2020.
  • 11. Kissler SM, Tedijanto C, Goldstein E, Grad YH, Lipsitch M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science. 2020;368(6493):860–868. doi: 10.1126/science.abb5793
  • 12. Roda WC, Varughese MB, Han D, Li MY. Why is it difficult to accurately predict the COVID-19 epidemic? Infectious Disease Modelling. 2020. doi: 10.1016/j.idm.2020.03.001
  • 13. Peng L, Yang W, Zhang D, Zhuge C, Hong L. Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv preprint arXiv:2002.06563. 2020.
  • 14. Prem K, Liu Y, Russell TW, Kucharski AJ, Eggo RM, Davies N, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health. 2020. doi: 10.1016/S2468-2667(20)30073-6
  • 15. Hou C, Chen J, Zhou Y, Hua L, Yuan J, He S, et al. The effectiveness of quarantine of Wuhan city against the Corona Virus Disease 2019 (COVID-19): A well-mixed SEIR model analysis. Journal of medical virology. 2020. doi: 10.1002/jmv.25827
  • 16. Yang Z, Zeng Z, Wang K, Wong SS, Liang W, Zanin M, et al. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. Journal of Thoracic Disease. 2020;12(3):165. doi: 10.21037/jtd.2020.02.64
  • 17. Wilder B, Charpignon M, Killian JA, Ou HC, Mate A, Jabbari S, et al. The role of age distribution and family structure on COVID-19 dynamics: A preliminary modeling assessment for hubei and lombardy. Available at SSRN. 2020.
  • 18. Mniszewski SM, Del Valle SY, Stroud PD, Riese JM, Sydoriak SJ. EpiSimS simulation of a multi-component strategy for pandemic influenza. In: Proceedings of the 2008 Spring simulation multiconference. Society for Computer Simulation International; 2008. p. 556–563.
  • 19. Chan JFW, Yuan S, Kok KH, To KKW, Chu H, Yang J, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet. 2020;395(10223):514–523. doi: 10.1016/S0140-6736(20)30154-9
  • 20. Riou J, Althaus CL. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Eurosurveillance. 2020;25(4):2000058.
  • 21. de Vlas SJ, Coffeng LE. A phased lift of control: a practical strategy to achieve herd immunity against Covid-19 at the country level. medRxiv. 2020.
  • 22. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020;368(6490):489–493. doi: 10.1126/science.abb3221
  • 23. Zhang W, Cheng W, Luo L, Ma Y, Xu C, Qin P, et al. Secondary Transmission of Coronavirus Disease from Presymptomatic Persons, China. Emerging Infectious Diseases. 2020;26(8). doi: 10.3201/eid2608.201142
  • 24. Hu Z, Song C, Xu C, Jin G, Chen Y, Xu X, et al. Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing, China. Science China Life Sciences. 2020;63(5):706–711. doi: 10.1007/s11427-020-1661-4
  • 25. Rothe C, Schunk M, Sothmann P, Bretzel G, Froeschl G, Wallrauch C, et al. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. New England Journal of Medicine. 2020;382(10):970–971. doi: 10.1056/NEJMc2001468
  • 26. Cobey S. Modeling infectious disease dynamics. Science. 2020;368(6492):713–714. doi: 10.1126/science.abb5659
  • 27. Bakker J, Damen J, Van Zanten A, Hubben JH. Criteria voor opname en ontslag van intensive care afdelingen in Nederland. Ned Tijdschr Geneesk. 2003;147:110–115.
  • 28. RIVM. https://www.rivm.nl/en/novel-coronavirus-covid-19/current-information-about-novel-coronavirus-covid-19, accessed on April 10, 2020; 2020.
  • 29. Statistics Netherlands (CBS). Overledenen; geslacht en leeftijd, per week, https://opendata.cbs.nl/statline/#/CBS/nl/dataset/70895ned/table?ts=1591770380485; Data retrieved on May 12, 2020.
  • 30. Van Dissel JT. Slides “Technische briefing Tweede Kamer, 22 april 2020”. Available from https://www.tweedekamer.nl/debat_en_vergadering/commissievergaderingen/details?id=2020A01701; 2020.
  • 31. The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. The Epidemiological Characteristics of an Outbreak of 2019 Novel Coronavirus Diseases (COVID-19)—China, 2020. China CDC Weekly. 2020;2:113. doi: 10.46234/ccdcw2020.032
  • 32. Indeling van Nederland in 40 COROP-gebieden. https://www.cbs.nl/-/media/_pdf/2019/04/2019ov12_kaart_40-coropgebieden.pdf.
  • 33. Statistics Netherlands (CBS). Regionale kerncijfers Nederland, https://opendata.cbs.nl/statline/#/CBS/nl/dataset/03759ned/table?ts=1591775235782; Data retrieved on March 20, 2020.
  • 34. Statistics Netherlands (CBS). Banen van werknemers naar woon- en werkregio, https://opendata.cbs.nl/statline/#/CBS/nl/dataset/83628NED/table; Data accessed on March 20, 2020.
  • 35. Del Valle SY, Hyman JM, Hethcote HW, Eubank SG. Mixing patterns between age groups in social networks. Social Networks. 2007;29(4):539–554. doi: 10.1016/j.socnet.2007.04.005
  • 36. Wallinga J, Teunis P, Kretzschmar M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. American journal of epidemiology. 2006;164(10):936–944. doi: 10.1093/aje/kwj317
  • 37. Ross SM. Introduction to Probability Models. 11th ed. Academic Press; 2014.
  • 38. Stichting NICE. https://www.stichting-nice.nl/ accessed on April 28, 2020; 2020.
  • 39. Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. Jama. 2020;323(11):1061–1069. doi: 10.1001/jama.2020.1585
  • 40. Pueyo T. https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca, accessed on March 22, 2020; 2020.
  • 41. To KKW, Hung IFN, Ip JD, Chu AWH, Chan WM, Tam AR, et al. COVID-19 re-infection by a phylogenetically distinct SARS-coronavirus-2 strain confirmed by whole genome sequencing. Clinical Infectious Diseases. 2020. doi: 10.1093/cid/ciaa1275

Decision Letter 0

Abdallah M Samy

12 Nov 2020

PONE-D-20-29479

Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research

PLOS ONE

Dear Dr. Wagenaar,

Thank you very much for submitting your manuscript "Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research" (PONE-D-20-29479) for consideration at PLOS ONE. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

Please submit your revised manuscript by Dec 27 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Abdallah M. Samy, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This article entitled "Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research" presents a statistical model adapted from standard epidemiological models.

Its objective is to show us that existing models should be viewed with caution given the lack of knowledge on the viral characteristics of SARS-CoV-2.

The authors therefore offer us a new model, necessarily based on assumptions (not the same as the previous models) and vary certain parameters of interest to explain to you at the end that various variations on distinct parameters can lead to a fairly high prediction. This is the very principle of probabilities and statistics: an accumulation of errors can nevertheless lead to a fair result. And because of this, it is very difficult to say that one model is more or less fair than another, simply because the data and parameters are often inherent in the structure that creates the model.

In the case of this article there is nothing new about varying parameters and checking the impact they have on the dynamics of SARS-CoV-2. It is very judicious to want to take into account the movements of population according to time but where are the elementary parameters accounting for the functioning of the virus, for example it is completely false to suppose that the contagiousness of an individual of a class of age is strictly that of another individual of the same age group at time t. The contagion of each individual very probably evolves according to his viral load and the viral load is not constant over time.

Likewise, to make the model more complex, it would be wise to take into account an infection reduction coefficient which is due to the implementation of health measures (for example homeworking).

Finally, it seems to me very difficult conceptually to vary certain parameters but to fix others while explaining to us that it is very dangerous to set parameters to evaluate the dynamics of a virus. From this perspective, it would have been undoubtedly much more interesting and coherent to try to present a nonparametric model of evolution of the dynamics of SARS-CoV-2.

Reviewer #2: In the present paper titled “Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research”, the authors addressed the validity of various assumptions using an epidemiological simulation model. Result feedback that multiple scenarios all lead to realistic numbers of deaths and ICU admissions, two observable and verifiable metrics, but gave different estimates for the number of infected and immune individuals. To validate the assumption on the spread of virus or disease, the present paper applied a popular classical model called the SEIR model and agent-based simulation which can address the challenges in the SEIR model.

The study was timely, and its output was interesting and shows its originality. The paper was well structured, and the method and materials for the assumptions, the corresponding results and the technical support are sound enough. However, the authors may wish to consider minor revisions as follows to the manuscript:

• The reader may benefit from a definition of the SEIR model and agent-based simulation with short theory.

• In the abstract review, more considerable information should be given to represent the whole contributions of the present manuscript.

• If possible it is suggested to add legend on figures.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Mar 3;16(3):e0245519. doi: 10.1371/journal.pone.0245519.r002

Author response to Decision Letter 0


11 Dec 2020

We would like to thank the Associate Editor and the Referees for the careful consideration of our work and for giving us the opportunity to address the comments on the earlier version of our manuscript. We outline our response to the comments below. We hope our responses address the concerns of both Reviewer 1 and 2 and we remain at the disposal of the referee team for further clarifications.

Before we respond to the comments by Associate Editor and referees, we would like to remark that while preparing the revision we noted an error in the input data used for our simulations. Accidentally, the wrong daily contact patterns were used for one of the age groups. We ran the simulations again with the correct contact patterns and adapted the results in the tables and text of the revised manuscript. The conclusions remain valid: multiple scenarios lead to realistic numbers of verifiable metrics (number of deaths and ICU admissions) but result in varying results for the number of infections and immune individuals.

Response to Associate Editor’s Comments

Comment: Please include the following items when submitting your revised manuscript:

· A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

· A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

· An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Response: We have included a rebuttal letter in which we address all points raised by the editor and the reviewers. We have also included a marked-up copy of our manuscript that highlights all changes made to the original version: text that we no longer use is struck through, and all new text is inserted in red. Finally, we have included our new version without the tracked changes.

Response to reviewer 1’s comments

Comment: This article entitled “Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research” presents a statistical model adapted from standard epidemiological models.

Its objective is to show us that existing models should be viewed with caution given the lack of knowledge on the viral characteristics of SARS-CoV-2.

The authors therefore offer us a new model, necessarily based on assumptions (not the same as the previous models) and vary certain parameters of interest to explain to you at the end that various variations on distinct parameters can lead to a fairly high prediction. This is the very principle of probabilities and statistics: an accumulation of errors can nevertheless lead to a fair result. And because of this, it is very difficult to say that one model is more or less fair than another, simply because the data and parameters are often inherent in the structure that creates the model.

In the case of this article there is nothing new about varying parameters and checking the impact they have on the dynamics of SARS-CoV-2. It is very judicious to want to take into account the movements of population according to time but where are the elementary parameters accounting for the functioning of the virus, for example it is completely false to suppose that the contagiousness of an individual of a class of age is strictly that of another individual of the same age group at time t. The contagion of each individual very probably evolves according to his viral load and the viral load is not constant over time.

Response: Thank you for your comment and insights. We agree that models inherently have statistical properties. Our goal was to provide insight into which combinations of scenarios/parameters seem plausible. This turned out to be notoriously difficult, and it leads to widely varying results on currently unknown outcomes (e.g. immunity and the number of infected people). Statistics inherently leads to variation; however, this variation seems to be even stronger for SARS-CoV-2 because of the lack of knowledge about the virus and the difficulty of filling these gaps. This seems to have been overlooked by the simulation studies published in the past months.

If we included even more uncertain parameters, for example the development of the contagiousness of an infected individual, the number of plausible parameter combinations/scenarios would most likely increase even more. We have tried to explain this more extensively in the new version of the paper in the Introduction section on lines 30-34 on page 2, and in the Conclusion section on lines 416-426 on pages 13-14, with the paragraph:

Epidemiological models such as ours are based on many input parameters, most of which are uncertain and only have crude estimates. In our simulation study, we already considered many scenarios by varying only four input parameters: the probability of developing symptoms, the case fatality ratio, the probability of developing immunity, and the virus transmission probability. The best available estimates from literature were used for the other uncertain parameters such as the incubation time, the time until an exposed individual becomes infectious, the probability of ending up in the ICU, and the fraction of symptomatic patients that goes into self-quarantine. Of course, more scenarios can be created by varying some of these uncertain parameters as well, but this will only yield more alternative scenarios that offer a possible explanation of the observed death toll and ICU admissions.
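The ambiguity described in this paragraph can be illustrated with a toy calculation. The sketch below is not the simulation from the paper: it assumes, purely for illustration, that deaths equal infections times the probability of developing symptoms times the case fatality ratio, and it varies only two of the four parameters. The observed death count and all parameter values are hypothetical.

```python
from itertools import product

# Hypothetical observed metric; not a figure from the paper.
observed_deaths = 1000

# Hypothetical candidate values for two of the uncertain parameters.
p_symptomatic_values = [0.3, 0.5, 0.7]
cfr_values = [0.01, 0.02]

scenarios = []
for p_sym, cfr in product(p_symptomatic_values, cfr_values):
    # Toy assumption: deaths = infections * p_sym * cfr,
    # so every scenario reproduces the same observed death count.
    implied_infections = observed_deaths / (p_sym * cfr)
    scenarios.append(
        {"p_sym": p_sym, "cfr": cfr, "implied_infections": implied_infections}
    )

for scenario in scenarios:
    print(scenario)
```

Every scenario in this grid matches the observed deaths exactly, yet the implied number of infections differs several-fold between scenarios. Adding further uncertain parameters to the grid multiplies the number of such equally consistent scenarios.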

Further, our simulation does not include the infection probability at the individual level, but at an aggregate level where group characteristics are taken into account. As such, the simulation is not able to include contagiousness levels increasing over time. However, it would not change the average outcome of our simulation, because in the end an individual will on average have the same contagiousness level as we included. We have tried to explain this better in the second and third paragraph in the Methods and Materials section on page 3 (lines 68-100).

Comment: Likewise, to make the model more complex, it would be wise to take into account an infection reduction coefficient which is due to the implementation of health measures (for example homeworking).

Response: Thank you for your comment and suggestion of including infection reduction coefficients. First, it would be possible to include a reduction coefficient, but this would only lead to more uncertainty as it is unknown which fraction of the people followed the health measures. Even if the reduction coefficient is known, then there is uncertainty as to how it relates to the total contact reduction of the inhabitants. Including this uncertainty would give us more possible parameter combinations that might be correct. We have included in the new version of the paper in the Introduction on lines 30-34 on page 2 and Conclusion section on lines 416-426 the statement that we already take four levels of uncertainty into account, and including more would only lead to having even more plausible parameter combinations.

Second, it is not the objective of our paper to look at the effects of health measures. More fundamentally, we would like to model the virus behavior as purely as possible in order to apply it later in other settings (e.g. humanitarian settings such as refugee camps and slums). In order to do so, more research is first required to find the correct parameters. See the following added paragraph in the new version of the paper in the Introduction section on lines 15-21:

The aim of this paper is to get a good view on the spread of the virus and its characteristics as the virus spread would behave without taking any social distancing measures. Background is that, in future research, we are interested how the SARS-CoV-2 virus spreads in low income countries, slums and refugee camps where, for various reasons, measures hardly can be taken or are not effective at all. We therefore use data of the initial phase of the COVID-19 spread in the Netherlands where relatively plenty of good quality data is available.

Comment: Finally, it seems to me very difficult conceptually to vary certain parameters but to fix others while explaining to us that it is very dangerous to set parameters to evaluate the dynamics of a virus. From this perspective, it would have been undoubtedly much more interesting and coherent to try to present a nonparametric model of evolution of the dynamics of SARS-CoV-2.

Response: In general, most epidemiological models assume fixed parameters and then evaluate the spread of a virus. Our paper demonstrates that varying four disease and virus characteristics already leads to a wide variety of results, which can all be correct given the current knowledge of the virus. The best available estimates from literature were used for the other uncertain parameters such as the incubation time, the time until an exposed individual becomes infectious, the probability of ending up in the ICU, and the fraction of symptomatic patients that goes into self-quarantine.

The four parameters for which we vary the values are all very uncertain in the literature. If we varied the other parameters as well, that would lead to more variety and thus to more parameter combinations that could be correct. We have tried to explain this more carefully in the new version of the paper in the Introduction on lines 30-34 and in the Conclusion section on lines 416-426.

Furthermore, non-parametric models need much data and do not make use of the structure of the underlying model. The structure of our problem is known and should therefore be used, only the parameters within the model are uncertain. Therefore, we have decided to use the simulation model as presented in the paper. We have tried to explain this reasoning in the first three paragraphs of the Methods and Materials section (lines 58-100) in the new version of the paper.

Response to reviewer 2’s comments

Comment: In the present paper titled “Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research”, the authors addressed the validity of various assumptions using an epidemiological simulation model. Result feedback that multiple scenarios all lead to realistic numbers of deaths and ICU admissions, two observable and verifiable metrics, but gave different estimates for the number of infected and immune individuals. To validate the assumption on the spread of virus or disease, the present paper applied a popular classical model called the SEIR model and agent-based simulation which can address the challenges in the SEIR model.

The study was timely and its output was interesting and shows its originality. The paper was well structured, and the methods and materials for the assumptions, the corresponding results, and the technical support are sound enough. However, the authors may wish to consider minor revisions as follows to the manuscript:

• The reader may benefit from a definition of the SEIR model and agent-based simulation with short theory.

• In the abstract review, more considerable information should be given to represent the whole contributions of the present manuscript.

• If possible it is suggested to add legend on figures.

Response: Thank you for the feedback and the suggestions to improve the paper. We have added a clear definition of a SEIR model and agent-based simulations in the first three paragraphs of the Methods and Materials section on page 3 (lines 58-100) in the new version of the paper:

In the existing literature, many researchers have already pointed out that classical SEIR models and agent-based simulations need to be adapted in order to catch all the important virus characteristics of COVID-19. Although SEIR models have been commonly used to model disease spread and form the basis of many of today’s COVID-19 epidemiological models [11,12], they need to be adapted to differentiate between age groups or geographic locations [1,13-16]. An often used alternative is agent-based simulation [3,17,18], which allows for modeling at the individual level rather than aggregating over the entire population. This is important when modeling COVID-19 [19,20], as it allows for social and travel patterns that depend on age group and location.

The simulation model that we use to validate assumptions on virus spread and disease progression holds the middle between a compartmented SEIR model and an agent-based simulation model. Traditional compartmented SEIR models divide the population into several health stages such as susceptible (healthy individuals, denoted as S), exposed (asymptomatic infected individuals who are not able to spread the virus, E), infected (symptomatic infected individuals, I) and recovered (immune or deceased individuals, R). Over time, the health condition of individuals may progress from one health stage to another. Within the population no distinction is made between individuals based on age or other personal characteristics. Virus parameters are often estimated using differential equations. While the spread of SARS-CoV-2 as well as the disease progression behave differently depending on an individual's age and location (rural or urban), splitting the population into subgroups based on these characteristics complicates the derivation of model parameters and increases the risk of nonidentifiability. In agent-based simulation one simulates the exact daily movements of each individual. As a result agent-based simulation can take many (virus-related) individual characteristics into account and can simulate the daily contacts of each individual in detail.
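For readers unfamiliar with compartmented models, the health-stage progression described above can be sketched as a textbook SEIR model advanced in daily forward-Euler steps. This is a generic illustration, not the model used in the paper; the rate names `beta` (transmission), `sigma` (E to I) and `gamma` (I to R), as well as all numeric values, are assumptions chosen for the example.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance the four SEIR compartments by one forward-Euler step."""
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E: contact with infected individuals
    new_infected = sigma * e * dt         # E -> I: end of the latent period
    new_removed = gamma * i * dt          # I -> R: recovery or death
    return (
        s - new_exposed,
        e + new_exposed - new_infected,
        i + new_infected - new_removed,
        r + new_removed,
    )


def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days):
    """Return the list of (S, E, I, R) states for day 0 up to day `days`."""
    state = (s0, e0, i0, r0)
    history = [state]
    for _ in range(days):
        state = seir_step(*state, beta, sigma, gamma)
        history.append(state)
    return history


# Illustrative values only; none of these come from the paper.
history = simulate_seir(
    s0=9990.0, e0=0.0, i0=10.0, r0=0.0,
    beta=0.4, sigma=0.2, gamma=0.1, days=100,
)
```

Note that the whole population shares a single set of rates here, which is exactly the limitation the paragraph points out: there is no distinction by age or location.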

While agent-based models provide the level of detail required to model SARS-CoV-2 that is lacking in SEIR models, the computational complexity of agent-based modeling is prohibitive for a population of 17 million people, which is the size of the population of the Netherlands. We therefore propose an intermediate modeling form: a coarse-grained agent-based simulation model which uses the idea of health stages to simulate agents, where agents are characterized only by their age group and geographic region. The model is akin to agent-based simulations [3,17,18] in that distinctive individuals are simulated who commute between their region of residence and region of employment. The main difference is that we do not include social interactions at an individual level but aggregate over groups of people with the same age, region of residence and work region. This allows us to simulate on a large scale, i.e. to simulate all inhabitants of a country or state, while including individual characteristics such as age, region of residence and commute patterns that are highly relevant when modeling the spread of SARS-CoV-2 [7,21]. Compared to an agent-based simulation our model reduces the number of assumptions one needs to make, as we aim to simulate on a country level. Furthermore, it gives us more freedom than when using the compartment model.
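The idea of aggregating over groups instead of simulating individuals can be sketched as follows. This is a simplified, deterministic illustration and not the authors' Java implementation: commuting between region of residence and region of work is omitted, and the function name `grouped_step`, the contact table, the group keys and all numeric values are hypothetical.

```python
def grouped_step(groups, contacts, p_transmit, sigma, gamma):
    """One daily update of a SEIR-like model aggregated over groups.

    groups:   maps a group key, e.g. (age_group, region), to S/E/I/R counts
    contacts: maps (g, h) to the average daily number of contacts that a
              member of group g has with members of group h
    """
    # Fraction of each group that is currently infected.
    prevalence = {}
    for key, c in groups.items():
        total = c["S"] + c["E"] + c["I"] + c["R"]
        prevalence[key] = c["I"] / total if total > 0 else 0.0

    updated = {}
    for g, c in groups.items():
        # Expected number of daily contacts with infected individuals.
        infected_contacts = sum(
            contacts.get((g, h), 0.0) * prevalence[h] for h in groups
        )
        # Probability that at least one of those contacts transmits the virus.
        p_infect = 1.0 - (1.0 - p_transmit) ** infected_contacts
        new_e = c["S"] * p_infect
        new_i = sigma * c["E"]
        new_r = gamma * c["I"]
        updated[g] = {
            "S": c["S"] - new_e,
            "E": c["E"] + new_e - new_i,
            "I": c["I"] + new_i - new_r,
            "R": c["R"] + new_r,
        }
    return updated


# Hypothetical two-group example.
young, old = ("young", "urban"), ("old", "rural")
groups = {
    young: {"S": 990.0, "E": 0.0, "I": 10.0, "R": 0.0},
    old: {"S": 500.0, "E": 0.0, "I": 0.0, "R": 0.0},
}
contacts = {(young, young): 8.0, (young, old): 1.0,
            (old, young): 2.0, (old, old): 3.0}
after = grouped_step(groups, contacts, p_transmit=0.05, sigma=0.2, gamma=0.1)
```

Because the state is one set of counts per group rather than one record per person, the cost of a step grows with the number of groups squared and not with the population size, which is what makes a country-scale simulation tractable in this style of model.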

Furthermore, we have changed the abstract such that it contains more information about the contributions of our manuscript and we have added legends to the figures and hope this helps to explain the figures in a better way.

Decision Letter 1

Abdallah M Samy

2 Jan 2021

Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research

PONE-D-20-29479R1

Dear Dr. Wagenaar,

We’re pleased to inform you that your manuscript, "Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research" (PONE-D-20-29479R1), has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Abdallah M. Samy, PhD

Academic Editor

PLOS ONE

Acceptance letter

Abdallah M Samy

11 Feb 2021

PONE-D-20-29479R1

Forecasting the spread of SARS-CoV-2 is inherently ambiguous given the current state of virus research

Dear Dr. Wagenaar:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Abdallah M. Samy

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (ZIP)

    S1 Data. The data required to duplicate the research.

    It contains the data files in the input folder and the source code of the model in the src folder. To understand the input data, please read the Readme file. Finally, if you are unable to work with Java, a .jar file is included that runs our code without requiring knowledge of Java.

    (ZIP)

    Data Availability Statement

    The resulting transition matrices are available on our GitHub page https://github.com/zero-hunger-lab/covid-paper-supplement or via the Supplementary information.


    Articles from PLoS ONE are provided here courtesy of PLOS
