PLOS One. 2021 Apr 14;16(4):e0250110. doi: 10.1371/journal.pone.0250110

COVID-19: Short term prediction model using daily incidence data

Hongwei Zhao 1,*, Naveed N Merchant 2, Alyssa McNulty 1, Tiffany A Radcliff 1, Murray J Cote 1, Rebecca S B Fischer 1, Huiyan Sang 2, Marcia G Ory 1
Editor: John Schieffelin 3
PMCID: PMC8046206  PMID: 33852642

Abstract

Background

Prediction of the dynamics of new SARS-CoV-2 infections during the current COVID-19 pandemic is critical for public health planning of efficient health care allocation and monitoring the effects of policy interventions. We describe a new approach that forecasts the number of incident cases in the near future given past occurrences using only a small number of assumptions.

Methods

Our approach to forecasting future COVID-19 cases involves 1) modeling the observed incidence cases using a Poisson distribution for the daily incidence number, and a gamma distribution for the serial interval; 2) estimating the effective reproduction number assuming its value stays constant during a short time interval; and 3) drawing future incidence cases from their posterior distributions, assuming that the current transmission rate will stay the same, or change by a certain degree.

Results

We apply our method to predicting the number of new COVID-19 cases in a single state in the U.S. and for a subset of counties within the state to demonstrate the utility of this method at varying scales of prediction. Our method produces reasonably accurate results when the effective reproduction number is distributed similarly in the future as in the past. Large deviations from the predicted results can imply that a change in policy or some other factors have occurred that have dramatically altered the disease transmission over time.

Conclusion

We presented a modelling approach that we believe can be easily adopted by others, and immediately useful for local or state planning.

Introduction

Since the World Health Organization declared a pandemic for the novel SARS-CoV-2 2019 virus (COVID-19) on March 11, 2020 [1], the Americas, Europe, South-East Asia and Eastern Mediterranean regions have the most documented cases [2]. Globally, nationally, and at every sub-governmental level, there is a need to monitor the current caseload and project the rate and nature of the spread to guide public health awareness, preparedness, and response. Societies have to deal with many pressing issues such as ensuring adequate supplies of personal protective equipment, considerations about the adequacy of the health care workforce and other health care resources, as well as how to balance restrictive safety guidelines with keeping businesses open and the economy sound. For a novel infectious disease, it is especially important to forecast future cases based on what has happened in the immediate past.

Predictions of the number of cases in a pandemic, and their implications for health care needs and resources, have recently received a lot of attention in the scientific world [3–5], in government agencies [6–8], and in the media [9–11]. With the plethora of models, there is also growing scrutiny [12] of the accuracy of different models, and an appreciation that model parameters need to be refined based on evolving knowledge about the disease trajectory and factors impacting infection and transmission rates.

The different approaches to modeling and forecasting infectious disease epidemics can be characterized as: 1) mechanistic models based on the SEIR (Susceptible, Exposed, Infected, and Recovered) framework [13] or its modified versions [14–16]; 2) time series prediction models such as ARIMA [17], Grey Model [18], and Markov Chain models [19]; and 3) agent-type models (i.e., simulating individual activities for a population) [20]. Even within each category, different types of approaches have been attempted. For SEIR models, there are deterministic models involving differential equations, and stochastic models entailing probability distributions. There are models that are designed to make long-term forecasts, and models that are best used for short-term predictions. For this paper, we primarily focus on short-term predictions based on SEIR concepts intended to forecast incidence cases for the next two to three weeks.

The SEIR model is an extension of the classical SIR model [21], and both SEIR and SIR models are foundations for many epidemiological modeling techniques. The model’s strength lies in its simple approximation of a complex process. For example, a typical SIR model specifies that at a certain time t, the population (with size N) can be classified as people who are susceptible S(t), infected I(t), and recovered R(t) according to the following series of differential equations:

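$$\frac{dS(t)}{dt} = -\beta(t)\,\frac{I(t)\,S(t)}{N},$$
$$\frac{dI(t)}{dt} = \beta(t)\,\frac{I(t)\,S(t)}{N} - \lambda(t)\,I(t),$$
$$\frac{dR(t)}{dt} = \lambda(t)\,I(t),$$
$$S(t) + I(t) + R(t) = N,$$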

where β and λ represent the transmission rate and recovery rate, respectively.
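As a minimal sketch of how such a system behaves, the equations can be integrated numerically with small Euler steps. The function name `simulate_sir`, the step size, and the parameter values below are illustrative assumptions, and β and λ are held constant here for simplicity even though the equations allow them to vary with time.

```python
def simulate_sir(n, i0, beta, recovery, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta is the transmission rate and recovery is the recovery rate
    (lambda in the text); both are assumed constant in this sketch.
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    trajectory = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * i * s / n * dt   # flow S -> I
            new_recoveries = recovery * i * dt       # flow I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical values chosen only to illustrate the dynamics.
trajectory = simulate_sir(n=1_000_000, i0=100, beta=0.3, recovery=0.1, days=60)
```

Because every flow out of one compartment enters another, S(t) + I(t) + R(t) stays equal to N throughout the simulation, which mirrors the last equation above.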

In theory, the time series of population sizes in each state can be used to estimate the parameters in the model according to the system of equations. In practice, modelers rarely have an accurate count of people at each stage, and the parameters could change with time. The problem has been tackled using different approaches. For example, Zhu and Chen [22] considered a statistical transmission model for the early phase of the COVID-19 outbreak; Wu et al. [23] incorporated the possibility of people moving out of the compartments due to migration in a modified SEIR model. However, both approaches assumed that the transmission rate was constant. In many states within the US, and in many counties, we have seen rapid changes in the transmission rate caused by public behavior and public policy. Therefore, it is not realistic to use a model with a constant transmission rate over a long period of time.

Although many approaches to predicting infectious disease transmission have appeared in the literature, we have not found a method that can be used readily for day-to-day short-term forecasting. Godio et al. [24] used SEIR models for predicting epidemic evolution by means of a stochastic solver, which allows a time-dependent transmission rate. They model the transmission rate as a function of community mobility. This approach is more flexible than the constant transmission rate assumption. However, it still cannot capture other dynamic aspects of the environment that impact the transmission rate, such as mask mandates and the adoption of contact tracing, early testing, and isolation. Alternatively, Friston et al. [25] proposed a dynamic causal model framework for COVID-19, where they tried to include every variable that “matters” in the spread of the disease. This model suggested that individuals had four different characteristics: location, infection status, testing results, and clinical status (i.e., how sick they are). Each of these four characteristics contained four different states, and individuals could move from one state to another over time. The main challenge was that the model used many parameters, and identifying accurate initial estimates of all the parameters is difficult for a novel infectious disease with non-specific symptoms and potentially many asymptomatic cases.

The objective of this paper is to provide a method that can be reliably used to make predictions for the epidemic evolution in the next two to three weeks, based on the observed incidence cases only. Because deaths account for a relatively small percentage of the whole population, we ignore death data in our modeling. The motivation for this work originated from pragmatic planning questions posed by local and state officials charged with allocating resources and ensuring population health. Members from the Texas A&M University School of Public Health started to monitor and forecast COVID-19 cases at the beginning of the pandemic, and then used the projected cases to support predictions for hospitalization and related health resource utilization.

Methods

Assuming that we have observed a time series of COVID-19 incidence cases up to a time t, our goal is to predict incidence cases over the next two to three weeks. In an ideal scenario, all data sets would be calibrated to the time of infection (admittedly an impossibility). However, publicly available data sets most often reflect the date of reporting, which may be the date of reporting to the local health department, but more often is the date of reporting up the chain, such as to the State health department. As such, day-to-day variations in reported incidence cases often reflect reporting capacity rather than true variation in infections. In addition, a large data dump might occur because of attempts to process backlogged data. Therefore, we propose smoothing the data (e.g., with a 3-day weighted average) before performing any analysis. In the event of a big data dump, we also need to adjust the data and distribute the cases over time. These adjustments to public databases not only improve model handling but are also valuable for interpretation and application.

Our approach to forecasting future COVID-19 cases involves two main steps. First, we model the observed incidence cases using ideas similar to those in Cori et al. [26]. Assuming a Poisson distribution for the daily incidence number and a gamma distribution for the serial interval, we estimate the model parameter (i.e., the effective reproduction number Re). Second, in the forecasting step, we draw future incidence cases from their posterior predictive distributions, assuming that the current Re will stay the same, decrease 5%, or increase 5%. The upper limit of the 95% posterior credible interval (CI) for the increased-Re scenario, together with the lower limit of the 95% posterior CI for the decreased-Re scenario, constitute our prediction intervals. The detailed description of our methods can be found in S1 Appendix.
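For orientation, the estimation step can be written compactly as follows. This is a sketch of the renewal-equation formulation usually associated with Cori et al. [26], not a transcription of S1 Appendix, which remains the authoritative description. The symbols a and b (prior shape and scale), w(s) (discretized serial interval probabilities), and Λ(t) (total infectiousness at time t) are notation assumed here for illustration.

```latex
\begin{align}
I(t) \mid I(1),\dots,I(t-1) &\sim \mathrm{Poisson}\big(R_e(t)\,\Lambda(t)\big),
  \qquad \Lambda(t) = \sum_{s=1}^{t-1} I(t-s)\,w(s), \\
R_e(t) &\sim \mathrm{Gamma}(a,\, b) \quad \text{(prior, shape } a \text{ and scale } b\text{)}, \\
R_e(t) \mid \text{data} &\sim \mathrm{Gamma}\!\left(a + \sum_{s=t-\tau+1}^{t} I(s),\;
  \left[\frac{1}{b} + \sum_{s=t-\tau+1}^{t} \Lambda(s)\right]^{-1}\right).
\end{align}
```

The last line relies on Re being treated as constant over the window of τ days ending at t, so that the gamma prior combines with the Poisson likelihood in closed form.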

Some basic assumptions are necessary for using our methods. In order to determine the value of the effective reproduction number Re, we assumed that Re has a prior gamma distribution with a shape parameter of 1 and a scale parameter of 5, similar to Cori et al. [26]. We also assumed that the serial interval has a discretized gamma distribution [26] with a mean of 3.95 and a standard deviation of 4.24 [27]. These hyper-parameters are generally fixed in our model and in our projection.
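To make the serial interval assumption concrete, the sketch below converts the stated mean and standard deviation into gamma shape and scale parameters and discretizes the distribution into daily probabilities. The discretization rule used here (probability mass between consecutive whole days, truncated at 30 days and renormalized) is a simplifying assumption for illustration and may differ from the scheme used in [26]; the function names are hypothetical.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / math.gamma(shape + 1.0)  # n = 0 term of sum z^n / Gamma(shape + n + 1)
    total = term
    for n in range(1, 500):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * z ** shape * math.exp(-z)


def discretized_serial_interval(mean=3.95, sd=4.24, max_days=30):
    """Return w[0..max_days-1], where w[s-1] approximates P(serial interval = s days)."""
    shape = (mean / sd) ** 2   # method-of-moments conversion
    scale = sd ** 2 / mean
    raw = [gamma_cdf(s, shape, scale) - gamma_cdf(s - 1, shape, scale)
           for s in range(1, max_days + 1)]
    total = sum(raw)
    return [p / total for p in raw]  # renormalize after truncation


w = discretized_serial_interval()
```

With a mean of 3.95 and a standard deviation of 4.24 the implied shape parameter is below 1, so the daily probabilities decrease monotonically from day 1 onward.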

One parameter that we allow to vary is the time interval τ, which we use to get reliable estimates of Re. In essence, we assume that Re is constant during the interval [t − τ + 1, t] so that we can get a reliable estimate of Re(t) at time t. From our experience, τ = 7 days or τ = 12 days is recommended; the choice depends on the incidence numbers (smaller incidence cases require a larger τ) and the actual dynamic change of the transmission rate (a smaller τ can capture the change better). A detailed discussion of the assumptions and parameters used for our model is provided in the “Choosing Model Parameter” section in S1 Appendix.
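The role of τ can be illustrated with a short sketch of the conjugate gamma update commonly used with this type of model: the posterior shape adds the cases observed in the window, and the posterior rate adds the total infectiousness over the same window. The function names and the toy serial interval below are assumptions for illustration; the prior shape of 1 and scale of 5 are the values stated above.

```python
def total_infectiousness(incidence, w, t):
    """Sum of past incidence weighted by the serial interval, for 0-based day t.

    w[k-1] is the probability that the serial interval equals k days.
    """
    return sum(incidence[t - k] * w[k - 1] for k in range(1, min(t, len(w)) + 1))


def re_posterior(incidence, w, tau, prior_shape=1.0, prior_scale=5.0):
    """Posterior gamma (shape, scale) for Re over the last tau days of the series."""
    t_end = len(incidence) - 1
    window = range(t_end - tau + 1, t_end + 1)
    shape = prior_shape + sum(incidence[s] for s in window)
    rate = 1.0 / prior_scale + sum(total_infectiousness(incidence, w, s) for s in window)
    return shape, 1.0 / rate


# Toy input: a flat epidemic and a hypothetical 5-day uniform serial interval.
history = [100] * 30
toy_w = [0.2] * 5
shape, scale = re_posterior(history, toy_w, tau=7)
posterior_mean = shape * scale
```

For a flat series the posterior mean is close to 1, as expected. A larger τ adds more cases to the shape parameter and therefore narrows the posterior, at the cost of reacting more slowly to changes in transmission.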

Application to COVID-19 data sets

We first demonstrate how to use our methods for predicting COVID-19 cases in Texas, a large and diverse state in the US with a population size of approximately 29 million. We utilize data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. As of November 15, 2020, the total number of reported cases was 1,059,753, corresponding to an attack rate of 38.0 per 1,000 people.

We emphasize the importance of understanding how case reports can be influenced by administrative issues, and the need to adjust our model accordingly. For example, on September 21, 2020, 14,129 cases were reported for Harris County because backlogged data were processed on that day. This artificial spike would influence the estimate of Re and, consequently, the prediction going forward. Therefore, we reassigned those cases from Harris County according to the following rule: we first imputed the number of cases on that day using the average number of cases in the past seven days. Then we evenly spread the extra cases over the previous 31 days, including the index day of September 21. The modified series was treated as the observed series in our subsequent modeling analysis. Another modification we made was to smooth the data series. Because of the high variability of the daily cases, and because reporting was often delayed, especially during weekends, we smoothed the data using the following algorithm, similar to Sun et al. [28]:

$$I(t) = 0.3\,I(t-1) + 0.4\,I(t) + 0.3\,I(t+1), \quad t = 2, 3, \ldots, T-1,$$
$$I(1) = 0.7\,I(1) + 0.3\,I(2),$$
$$I(T) = 0.7\,I(T) + 0.3\,I(T-1),$$

where T is the last time point in the data series upon which a forecast is to be made. The smoothed data series were the data we used for generating our prediction models.
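The two preprocessing steps described above, redistributing a backlog spike and applying the weighted moving average, can be sketched as follows. The function names are hypothetical, and two details are assumptions: the seven-day average is taken over the days before the spike, and the smoothing always reads from the unsmoothed series when computing each new value.

```python
def redistribute_backlog(cases, idx, lookback=7, spread_days=31):
    """Replace a backlog spike at index idx and spread the excess backwards.

    The spike day is imputed with the mean of the preceding `lookback` days;
    the excess is divided evenly over `spread_days` days ending at idx.
    Requires idx >= max(lookback, spread_days - 1).
    """
    baseline = sum(cases[idx - lookback:idx]) / lookback
    extra = cases[idx] - baseline
    adjusted = [float(c) for c in cases]
    adjusted[idx] = baseline
    for d in range(idx - spread_days + 1, idx + 1):
        adjusted[d] += extra / spread_days
    return adjusted


def smooth(series):
    """Apply the 0.3 / 0.4 / 0.3 weighted average, with 0.7 / 0.3 at the ends."""
    n = len(series)
    if n < 2:
        return [float(x) for x in series]
    out = [0.0] * n
    out[0] = 0.7 * series[0] + 0.3 * series[1]
    out[-1] = 0.7 * series[-1] + 0.3 * series[-2]
    for t in range(1, n - 1):
        out[t] = 0.3 * series[t - 1] + 0.4 * series[t] + 0.3 * series[t + 1]
    return out


# Toy series: 100 cases per day with a one-day dump of 3,200 cases on day 35.
raw = [100] * 40
raw[35] = 3200
adjusted = redistribute_backlog(raw, idx=35)
smoothed = smooth(adjusted)
```

The redistribution conserves the total case count: it only moves cases from the spike day to earlier days in the window.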

As mentioned in the detailed “Methods” section in S1 Appendix, we first used the method of Cori et al. [26] to estimate the reproduction number Re(t) at each time t based on the smoothed incidence data in Texas, with a cutoff date of November 15, 2020, and an interval of τ = 7 days. (Results using τ = 12 days are presented in S1–S3 Figs.) The smoothed data series, the estimated Re(t), and its 95% credible intervals (CI) are shown in Fig 1.

Fig 1. Texas incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 7-day intervals.

It is clear from Fig 1 that there were different stages of COVID-19 spread in Texas. Because of the large number of incidence cases, the 95% CIs for the effective reproduction number Re are quite narrow. During the month of April, the case counts were kept very low due to a statewide Shelter-in-Place order that was enacted by the Governor. The estimated Re was close to 1.0 around mid-April. Beginning May 1, 2020, Texas started a phased reopening process, with many restrictions lifted in early June, right after the Memorial Day holiday. The daily incidence cases began to increase dramatically after Memorial Day weekend, and continued to rise throughout June, reaching a peak daily incidence of about 13,000 in early July. During this period, Re gradually increased to a value of 1.325. A statewide mask mandate was implemented on July 3, 2020, and a couple of weeks after that, we started to see a downward trend in the incidence cases. The reproduction number slowly decreased to below 1.0 towards the end of July and during August. Unfortunately, the trend reversed starting in early September, with cases increasing again and a reproduction number above 1.0. The uptick was possibly due to Labor Day weekend gatherings and widespread reopening of in-person options for schools and colleges for the Fall 2020 semester. The epidemic was then kept under control for a while until mid-October, when COVID-19 cases started to increase dramatically both statewide and nationwide.

For illustration purposes, we applied our prediction method at four equally spaced time points that were two months apart: April 15, June 15, August 15, and October 15. We plotted three projection lines corresponding to the predicted mean values when the transmission rate (or equivalently the reproduction number Re) stayed the same, increased 5%, or decreased 5%. We also plotted the prediction intervals (shaded areas) based on the upper 95% CI limits for the 5% increasing Re scenario and the lower 95% CI limits for the 5% decreasing Re scenario. The predicted daily cases and cumulative cases, together with their prediction intervals for the next three weeks, are shown in Figs 2 and 3, respectively.
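The three projection lines and the prediction band can be sketched with a small simulation: draw Re from its posterior gamma distribution, scale it by 1.00, 1.05, or 0.95, and generate future daily counts one day at a time from a Poisson distribution whose mean is Re times the current total infectiousness. Everything named below is an assumption for illustration, including the posterior parameters, the toy serial interval, the use of the 2.5th and 97.5th percentiles for the band, and the normal approximation used for large Poisson means.

```python
import math
import random


def draw_poisson(rng, lam):
    """Poisson draw: Knuth's method for small means, normal approximation otherwise."""
    if lam <= 0:
        return 0
    if lam < 30:
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def simulate_paths(incidence, w, shape, scale, factor=1.0,
                   horizon=21, n_draws=500, seed=1):
    """Simulate future daily counts; each path uses one posterior draw of Re."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_draws):
        re = factor * rng.gammavariate(shape, scale)  # second argument is the scale
        series = list(incidence)
        for _ in range(horizon):
            t = len(series)
            lam = sum(series[t - k] * w[k - 1] for k in range(1, min(t, len(w)) + 1))
            series.append(draw_poisson(rng, re * lam))
        paths.append(series[len(incidence):])
    return paths


def daily_mean(paths):
    return [sum(p[d] for p in paths) / len(paths) for d in range(len(paths[0]))]


def daily_percentile(paths, q):
    out = []
    for d in range(len(paths[0])):
        ordered = sorted(p[d] for p in paths)
        out.append(ordered[round(q * (len(ordered) - 1))])
    return out


# Hypothetical inputs: flat history, 5-day uniform serial interval, posterior near 1.
history, toy_w = [100] * 30, [0.2] * 5
shape, scale = 701.0, 1 / 700.2
same = simulate_paths(history, toy_w, shape, scale, factor=1.00)
up = simulate_paths(history, toy_w, shape, scale, factor=1.05)
down = simulate_paths(history, toy_w, shape, scale, factor=0.95)
mean_same, mean_up, mean_down = daily_mean(same), daily_mean(up), daily_mean(down)
upper_band = daily_percentile(up, 0.975)     # upper limit from the +5% scenario
lower_band = daily_percentile(down, 0.025)   # lower limit from the -5% scenario
```

Combining the upper limit of the increased scenario with the lower limit of the decreased scenario gives a band that is deliberately wider than the interval from any single scenario.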

Fig 2. Texas predicted incidence cases using 7-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

Fig 3. Texas predicted cumulative incidence cases using 7-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

As expected, our predictions performed differently at different times. On April 15, our forecast assuming constant transmission rate matched the observed data very well. On June 15, when Re was increasing rapidly because of the business reopening process and the Memorial Day holiday weekend, the observed cases fell between our predicted curves assuming the same transmission rate and 5% increase in transmission rate. On August 15, we saw a gradual decrease in transmission rate due to a statewide mask mandate, and the forecast with 5% decrease in transmission rate matched the observed data closely. Finally, on October 15, we started to see an increasing trend again, and the forecast assuming 5% increase in transmission rate worked well.

Secondarily, we chose to test the applicability of our model to a smaller geographic region within Texas. We applied our method to predicting the number of cases for the Brazos Valley (BV), a group of seven counties in Texas (i.e., Brazos, Robertson, Burleson, Madison, Grimes, Leon, and Washington counties), which collectively comprise the Bryan-College Station metropolitan area and neighboring counties. The center is Brazos County, where Texas A&M University is located. This area is approximately 100 miles from both Austin and Houston and has a younger population than Texas as a whole. Several healthcare entities and a public health authority in the BV needed timely and accurate forecasts to support planning for local COVID-19 cases.

The BV incidence cases and the estimated reproduction number Re(t) using 12-day intervals are presented in Fig 4. Because of the small number of incidence cases in BV, the CIs for Re were quite wide, making forecasting for BV more challenging. The trend for BV was influenced by the local context, so it did not always follow the trend in Texas. In addition, because of a relatively small population size (approximately 229,000) and sudden population changes caused by college students moving out of the region (in late March, corresponding to the Stay-at-Home order) and then back (in mid-August, corresponding to the start of the Fall semester), we saw more variability in the incidence cases for BV. Therefore, we chose to use 12-day intervals for our modeling approach, but we also provide results using 7-day intervals in S4–S6 Figs for additional information. All other parameters were the same as in the state model, and we made predictions on the same days as we did for the state model. The predicted daily incidence cases and cumulative incidence cases for BV are shown in Figs 5 and 6, respectively.

Fig 4. Brazos Valley incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 12-day intervals.

Fig 5. Brazos Valley predicted incidence cases using 12-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

Fig 6. Brazos Valley predicted cumulative incidence cases using 12-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

On April 15, our prediction assuming a sustained transmission rate agreed well with the observed cases. On June 15, when the transmission rate increased rapidly, the prediction upper bounds approximately followed the observed curve. Our forecast based on past history did not capture the increased case numbers at the end of August when school started, since we had an influx of cases due to thousands of students moving to Brazos County from all over Texas. Starting October 15, although the past trend suggested increasing incidence cases, the observed data matched more closely with the prediction lower bounds. Our model and method produced reasonably accurate results when the Re value is distributed similarly in the future as it is in the past. Large deviations from the predicted results can imply that a change in policy or some other factors have occurred that have dramatically altered the Re value over time.

Conclusion

We have proposed a method that generates predictions for the number of future COVID-19 cases based on current estimates of Re. The major strength of our approach lies in its simplicity, which makes it easy to implement with a small team of modellers. As such, we have incorporated it as part of a dashboard (https://covid19-modeltrac.shinyapps.io/TX-BV-ModelTrac/#section-tx-forecasts), where it can automatically generate forecasting values every day for a future view of three weeks using publicly available data. This transparent and straightforward approach means that the method can be easily adopted by others who want to make similar predictions to help inform local or state-wide decisions using public data sources. Our predicted case numbers can also be used as data inputs alongside other information for predicting health care utilization and health outcomes such as hospitalizations, intensive care unit (ICU) occupancy and corresponding ventilator use, and anticipated fatalities. These projections should be performed routinely to plan for surges and avoid overwhelming health resources. In Texas, for example, hospitals are working together using surge projections to identify and refer patients to available hospital beds [29].

A limitation for any infectious disease prediction model is the complexity inherent in how data are collected. Infectious disease reporting has long been plagued with many challenges. It is important to acknowledge that our model, like many others, relies on detection of infections through testing and reporting. In reality, the journey of a simple data element, from infection to tabulation, has many obstacles and nuances along the way. Some major complexities of the data include: policies about testing algorithms (e.g., which suspect cases are tested); whether screening or surveillance is conducted; which diagnostic test is acceptable or required for reporting; accessibility and availability of testing; and administrative issues such as reporting requirements, procedures, and infrastructure. These elements can vary widely by locale and among populations within a locale. Thus, the available data are likely to represent only some fraction of infections. Understanding the underlying caveats and how local situations contribute to limitations is essential to evaluating the model output. Even so, the opportunity for practical application of our model to provide insight for assessment, planning, and policy-making remains invaluable.

Similar to the widely-adopted method for estimating Re [26], we made a few assumptions, e.g. the incidence I(t) follows a Poisson distribution, with a mean parameter determined by a renewal function involving a serial function w(s). The serial function is assumed to have a discretized gamma distribution. The reproduction number Re varies with time, but we assume that it is constant over a time interval (7 days, or 12 days) in order to obtain a stable estimate for its posterior distribution. Under these assumptions, we can predict the number of cases that could occur in the following two or three weeks, allowing Re to stay the same, increase 5%, or decrease 5%. The assumption that Re behaves similarly in the future as it does now is a major assumption, and is probably inaccurate if we project far into the future. However, we believe it to be a reasonable approximation of the true process if we want to see what happens in the next couple of weeks from the present.

Because Re is related to many factors, it can change dramatically. It is a function of transmission probability, which means it can be affected by a mask mandate. It is also affected by the average number of contacts one person has, hence, we expect that Re might increase when in-person school resumes. In addition, it depends on how many days on average one person is infectious after becoming infected, which can be reduced by contact tracing and early isolation. The number of people that are susceptible or immune is also changing over time. As more people become infected and then become recovered, the effective Re should decrease over time if other factors stay constant. If we want to make more accurate forecasts, we should allow a future Re to be a function of all these different factors. Another way to think about this is that if we make projections according to current values of Re, then any deviations from the current trend can be attributed to factors not explicit in our model, such as a policy implementation, or behavior changes arising from reactions to current situation.
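The dependence described above is often summarized by a textbook heuristic, shown here as a sketch rather than as part of the fitted model. The symbols p (transmission probability per contact), c (average number of contacts per day), D (average number of infectious days), and S(t)/N (susceptible fraction) are notation assumed for illustration.

```latex
R_e(t) \approx p \cdot c \cdot D \cdot \frac{S(t)}{N}
```

With purely hypothetical values p = 0.05, c = 10, D = 5, and S/N = 0.9, this gives Re ≈ 2.25; a measure that cut p by 30% would lower it to about 1.58, and shortening D through earlier isolation would act in the same multiplicative way.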

One contributing factor to Re that can be objectively measured is mobility. If mobility data could provide insight on how Re may vary, incorporating the mobility data in a prediction model could result in better predictions for Re in the future, which in turn would result in better estimates for the number of incidence cases. Finding the trend of Re values in the future using other data sources is a direction of our future research.

In summary, we presented a modelling approach that we believe can be easily adopted by others, and immediately useful for local or state planning. Although many initially downplayed the long-term consequences of COVID-19 [30], it is now clear that new surges are appearing in the US as well as globally [31–33], and that the pandemic spread is likely to last for another year or two [3]. Thus, public health and governmental responses will need to be guided by data that pinpoint where, when, and among whom the new cases are occurring. This information can help guide public health messaging as well as the nature and degree of government responses to mandating public health practices or regulating business operations to limit spread. Timely projections regarding case counts are critical to planning for healthcare resources and assuring available care and best possible outcomes for populations facing the uncertainty of a rapidly emerging infectious disease during a pandemic response.

Supporting information

S1 Appendix. Technical details.

Methods for predicting COVID-19 cases and the selection of model parameters.

(PDF)

S1 Fig. Texas incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 12-day intervals.

(TIF)

S2 Fig. Texas predicted incidence cases using 12-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

(TIF)

S3 Fig. Texas predicted cumulative incidence cases using 12-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

(TIF)

S4 Fig. Brazos Valley incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 7-day intervals.

(TIF)

S5 Fig. Brazos Valley predicted incidence cases using 7-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

(TIF)

S6 Fig. Brazos Valley predicted cumulative incidence cases using 7-day intervals.

Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

(TIF)

Acknowledgments

We are appreciative of the inspiration and insight we have gotten from the Texas A&M Emergency Management Advisory Group, and Public Health Modelling Team.

Data Availability

We used data from a publicly available source, namely the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19).

Funding Statement

The author(s) received no specific funding for this work.

References

Decision Letter 0

John Schieffelin

28 Jan 2021

PONE-D-20-36327

COVID-19: Short term prediction model using daily incidence data

PLOS ONE

Dear Dr. Zhao,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 14 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

John Schieffelin, MD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"We thank Texas A&M University administration for internal funding to support this work."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

 "The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The need for an accessible method to estimate and predict SARS-CoV-2 incidence, both short- and long-term, is very real, and the authors propose an intriguing option to meet that need. However, they themselves point out that the proposed model is less successful when a variety of parameters shift within the projection interval. Under these circumstances the range encompassed by the +/- 5% change seems, in fact, unacceptably wide from an operational perspective, and not reasonable as suggested by the authors. And while the authors identified a number of changes of status that could be reasonably easy to identify (and avoid) for a prediction period, they never mentioned one of the truly problematic elements related to identification of SARS-CoV-2 infections (cases), which is the testing itself: not just the mentioned lag in reporting, but the actual uptake of diagnostic testing and the tremendous variability that can occur in it, influenced by supply shortages and by population interest in and access to testing at a local or state level. The unfortunate reality is that diagnosed and reported infections with SARS-CoV-2 are, in fact, some unknown fraction of true infections, which also changes over time. This model actually gives some evidence of that, providing much tighter ranges that align more closely with actual case counts in the early periods, with far less precision in the late intervals.

Reviewer #2: In the paper PONE-D-20-36327, "COVID-19: Short term prediction model using daily incidence data", Zhao et al. propose a new approach that forecasts the number of incident cases in the near future using a small number of assumptions. They report that the method can produce reasonable results and that large deviations from the predicted results can imply a change in policy or some other factors. The results seem reasonable.

Some similar results have been studied by Jin's group at Fudan (see [CCJL2020], [SZYPCC2020], [P2020]); Jin's model is well suited to Chinese data. But the situation and the data in the USA are more complicated. Zhao's work is interesting.

One suggestion is that, rather than working with the raw number of incident cases, the authors might consider filtering or smoothing it, for example with a 7-day average.
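The smoothing suggested here can be illustrated with a minimal Python sketch. It assumes a trailing (rather than centered) 7-day window and averages over the days available at the start of the series; the function name `trailing_average` and the example counts are hypothetical and not taken from the manuscript.

```python
def trailing_average(counts, window=7):
    """Return the trailing moving average of daily counts.

    Assumption: for the first (window - 1) days, the average is taken
    over the days observed so far instead of being left undefined.
    """
    smoothed = []
    running = 0.0
    for i, count in enumerate(counts):
        running += count
        if i >= window:
            # Drop the day that just fell out of the window.
            running -= counts[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical daily counts with a weekend reporting dip.
daily = [120, 135, 128, 140, 150, 60, 45, 130, 142, 138, 155, 160, 70, 50]
print([round(x, 1) for x in trailing_average(daily)])
```

A trailing window only uses past data, so the smoothed value for the most recent day is available immediately, at the cost of lagging the raw series by a few days.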

[CCJL2020] Chen, Y., Cheng, J., Jiang, Y. and Liu, K. A time delay dynamical model for outbreak of 2019-nCoV and the parameter identification. J. Inverse Ill-Posed Probl., 28(2020), 243–250.

[SZYPCC2020] Shao, N., Zhong, M., Yan, Y., Pan, H., Cheng, J. and Chen, W. Dynamic models for coronavirus disease 2019 and data analysis. Math. Methods Appl. Sci., 43(2020), 4943–4949.

[P2020] Pan, H., Shao, N., Yan, Y., Luo, X., Wang, S., Ye, L., Cheng, J. and Chen, W. Multi-chain Fudan-CCDC model for COVID-19-a revisit to Singapore's case. Quantitative Biology, 2020, 8(4): 325–335.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Apr 14;16(4):e0250110. doi: 10.1371/journal.pone.0250110.r002

Author response to Decision Letter 0


11 Feb 2021

We thank the reviewers and the academic editor for their valuable comments on our submitted manuscript entitled "COVID-19: Short term prediction model using daily incidence data" (PONE-D-20-36327). We have made changes according to your requests and replied to the questions in the file named "Response to Reviewers.docx". We also uploaded two copies of our manuscript: a marked-up copy that highlights the changes made to the original version, and an unmarked version without tracked changes.

Thank you very much for your consideration. We look forward to hearing back from you soon.

Attachment

Submitted filename: Response to Reviewers 11.02.2021.docx

Decision Letter 1

John Schieffelin

31 Mar 2021

COVID-19: Short term prediction model using daily incidence data

PONE-D-20-36327R1

Dear Dr. Zhao,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

John Schieffelin, MD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: In the paper "COVID-19: Short term prediction model using daily incidence data", the authors describe a new approach that forecasts the number of incident cases. They first model the observed incident cases using a Poisson distribution for the daily incidence number and a gamma distribution for the serial interval, then estimate the effective reproduction number assuming its value stays constant during a short time interval, and finally draw future incident cases from their posterior distributions.

The method is interesting and new, and the forecast results and explanation seem reasonable.
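The three steps summarized by the reviewer can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a renewal-equation formulation with a conjugate gamma prior on the reproduction number, a crude discretization of the gamma serial interval, and a normal approximation to the Poisson for large means. All function names, the prior, and the serial interval mean and standard deviation are hypothetical placeholders.

```python
import math
import random


def serial_interval_weights(mean, sd, max_days):
    """Crude discretization: gamma density at days 1..max_days, normalized."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    dens = [s ** (shape - 1) * math.exp(-s / scale) for s in range(1, max_days + 1)]
    total = sum(dens)
    return [d / total for d in dens]


def infectiousness(incidence, weights, t):
    """Weighted sum of past incidence: sum over s of I[t - s] * w[s]."""
    return sum(incidence[t - s] * w
               for s, w in enumerate(weights, start=1) if t - s >= 0)


def posterior_r(incidence, weights, window, prior_shape=1.0, prior_scale=5.0):
    """Gamma posterior (shape, rate) for R, held constant over the last `window` days.

    Assumption: Poisson daily incidence with mean R * infectiousness and a
    gamma prior on R, which makes the posterior a gamma distribution.
    """
    days = range(len(incidence) - window, len(incidence))
    shape = prior_shape + sum(incidence[t] for t in days)
    rate = 1.0 / prior_scale + sum(infectiousness(incidence, weights, t) for t in days)
    return shape, rate


def sample_poisson(lam, rng):
    """Poisson draw: Knuth's method for small means, normal approximation otherwise."""
    if lam < 30:
        limit, k, p = math.exp(-lam), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        return k
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def forecast(incidence, weights, window, horizon, change=1.0, n_draws=500, seed=0):
    """Simulate future daily cases; `change` scales R (e.g. 1.05 for a 5% increase)."""
    rng = random.Random(seed)
    shape, rate = posterior_r(incidence, weights, window)
    paths = []
    for _ in range(n_draws):
        r = rng.gammavariate(shape, 1.0 / rate) * change  # draw R from its posterior
        series = list(incidence)
        for _ in range(horizon):
            lam = r * infectiousness(series, weights, len(series))
            series.append(sample_poisson(lam, rng))
        paths.append(series[len(incidence):])
    return paths


# Hypothetical daily counts; the serial interval mean and SD are placeholders.
incidence = [3, 4, 6, 5, 8, 9, 11, 10, 13, 15, 14, 18, 20, 19, 23, 25, 24, 28, 30, 33, 31]
weights = serial_interval_weights(mean=4.7, sd=2.9, max_days=14)
paths = forecast(incidence, weights, window=7, horizon=10, n_draws=200)
totals = sorted(sum(p) for p in paths)
print("median 10-day total:", totals[len(totals) // 2])
```

Calling `forecast` with `change=1.05` or `change=0.95` gives scenarios analogous to the 5% increase and decrease in transmission rate shown in the supplementary figures; quantiles of the simulated paths would serve as prediction intervals.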

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

John Schieffelin

5 Apr 2021

PONE-D-20-36327R1

COVID-19: Short term prediction model using daily incidence data

Dear Dr. Zhao:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. John Schieffelin

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Technical details.

    Methods for predicting COVID-19 cases and the selection of model parameters.

    (PDF)

    S1 Fig. Texas incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 12-day intervals.

    (TIF)

    S2 Fig. Texas predicted incidence cases using 12-day intervals.

    Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

    (TIF)

    S3 Fig. Texas predicted cumulative incidence cases using 12-day intervals.

    Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

    (TIF)

    S4 Fig. Brazos Valley incidence cases over time (smoothed) and the estimated effective reproduction number Re(t) (95% CI in shaded area) using 7-day intervals.

    (TIF)

    S5 Fig. Brazos Valley predicted incidence cases using 7-day intervals.

    Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

    (TIF)

    S6 Fig. Brazos Valley predicted cumulative incidence cases using 7-day intervals.

    Three solid lines represent the predicted cases corresponding to current rate of transmission sustained, 5% increase in transmission rate, and 5% decrease in transmission rate. The shaded areas indicate prediction intervals.

    (TIF)

    Attachment

    Submitted filename: Response to Reviewers 11.02.2021.docx

    Data Availability Statement

    We used data from a publicly available source, namely the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19.


    Articles from PLoS ONE are provided here courtesy of PLOS
