Abstract
While there are many online data dashboards on COVID-19, there are few analytics available to the public and non-epidemiologists to help them gain a deeper insight into the COVID-19 pandemic and evaluate the effectiveness of social intervention measures. To address the issue, this study describes the methods underlying the development of a real-time, data-driven online Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available infection case and death data, the calculator is used to estimate the effective reproduction number, final epidemic size, and death toll. As a case study, we analyzed the results for Singapore during the “Circuit Breaker” period from April 7, 2020 to the end of May 2020. The calculator shows that the stringent measures imposed had an immediate effect of rapidly slowing down the spread of the coronavirus. After about two weeks, the effective reproduction number fell to about 1.0. It then fluctuated around 1.0 for more than a month.
The COVID-19 Epidemic Calculator is available in the form of an online Google Sheet and the results are presented as Tableau Public dashboards at www.cv19.one. Making the calculator readily accessible online gives the public a tool to meaningfully assess the effectiveness of measures to control the pandemic.
Keywords: COVID-19, Reproduction number, Doubling time, Epidemic calculator, Singapore
1. Introduction
As countries worldwide take drastic measures to contain the COVID-19 pandemic, people need to understand the effectiveness of such interventions. Making sense of epidemiological data can be challenging, given confusing and overlapping terminology. Raw data and statistics on infection numbers (e.g., Fig. 1) do not directly help answer the following questions: (1) Are social distancing measures working? (2) How much longer does it take to flatten the curve? (3) What will be the final death toll? Data analysts in other disciplines such as social sciences, economics, and management could explore how epidemiological trends impact their own areas.
Fig. 1.
Total confirmed cases in Singapore (COVID-19 Situation Report).
This paper describes the methods underlying an online COVID-19 Epidemic Calculator for tracking and estimating COVID-19 growth parameters, including reproduction number, final epidemic size, and death toll. These methods are illustrated using the case example of Singapore. We demonstrate how the calculator can reveal the effect of the strict social distancing measures (“Circuit Breaker”) imposed from April 7, 2020, an effect that is not apparent from the infection numbers alone.
While our methodology is similar in certain aspects to several freely available software packages and programs for calculating the effective reproduction number (e.g., Kevin; Cori et al., 2013; Epiforecasts; Boelle and Obadia, 2015), it differs from that work because we implement real-time, data-driven calculations in the widely used Excel spreadsheet, with sub-minute execution time, even for a 4-month, 100-country data set. Furthermore, the calculator is readily available as an online spreadsheet (COVID-19 Epidemic Calculator) to facilitate sharing and collaboration. The input data for the calculator are obtained from publicly available sources (Our World in Data, Coronavirus Source Data; The COVID Tracking Project; Data on the geographic distribution of COVID-19 cases worldwide) and are automatically updated daily.
1.1. Introduction to terminology
Fig. 2 identifies the different overlapping terms used in epidemiology and illustrates the timeline for the various stages of infection. These terms and variables will be used for calculating parameters that can help us understand and monitor the spread of the COVID-19 infection in a country. Exposed is the state at which an individual first becomes infected but is not yet contagious. The latent period is the time from being infected (exposed) to becoming contagious. An infected person can be contagious even before the onset of symptoms. Data suggests that some people could have infected others 1–3 days before they developed symptoms (Wei et al., 2020; World Health Organization, 2020).
Fig. 2.
Timeline of infection stages with typical parameter estimates for COVID-19 in Singapore.
The incubation period is the time from exposure to the onset of symptoms. The mean incubation period for COVID-19 is estimated to be 5 days (Bi et al., 2020; Lauer et al., 2020). The infectious period is the time from becoming contagious to the time of removal or recovery. Hence, it is the difference between the time of removal and the latent period ($T_{\text{removed}} - T_{\text{latent}}$).
In Singapore, the 14-day average time from the onset of symptoms to removal ranges from 1.5 to 6 days after the start of the Circuit breaker on April 7, 2020 (Fig. 3).
Fig. 3.
Average number of days from onset of symptoms to isolation for community unlinked cases in Singapore (COVID-19 Situation Report).
The serial interval is the time between successive cases in a chain of transmission, that is, the time taken for a primary case to generate a secondary infection (see Fig. 2). For COVID-19 in Singapore, the serial interval between transmission pairs ranges between 3 days and 8 days (Pung et al., 2020). Other researchers have reported serial intervals within the same range (Bi et al., 2020; Du et al., 2020; Li et al., 2020; Nishiura et al., 2020).
1.2. Critical parameters
The rate of infection growth in a population can be estimated using the effective reproduction number. The effective reproduction number is the average number of secondary cases directly infected by a primary case in a population. Social distancing measures should reduce the spread of infection, and this would be reflected by a reduction in the effective reproduction number. Hence, monitoring the effective reproduction number over time allows us to evaluate whether social distancing measures or any other interventions are working. We will demonstrate how to estimate the effective reproduction number using a Bayesian approach. We will also show how to derive estimates of the dates of actual symptom onset and the dates of exposure, which are important for our estimation of the effective reproduction number.
Estimating the time needed to flatten an epidemic curve is an important part of forecasting the scope of an infectious disease outbreak. When new cases are significantly reduced, social distancing restrictions can be relaxed and other less intrusive measures can be put in place. In this study, we will show how logistic and Gompertz models can be used for forecasting the future number of cases and deaths over time using only publicly available data. These numbers will allow us to gauge the vulnerability of the population and quantify the direct health impact of COVID-19.
Since these parameters could help non-epidemiologists understand the spread of COVID-19 in their countries and other countries and regions, the objective of this research study is to develop a readily available online COVID-19 Epidemic Calculator to provide estimates of the critical parameters described here. The interested public can access this online calculator to gain a deeper insight into the COVID-19 pandemic and evaluate the effectiveness of a range of public health and social intervention measures.
2. Methodology
2.1. Method for calculating the effective reproduction number for estimating how fast COVID-19 is spreading in a country
Step 1. Deriving symptom onset dates from confirmation dates
The daily number of reported cases is partly dependent on the number of tests conducted, which may vary due to factors such as testing capacity and the day of the week. To account for this variation, we take a 7-day running average of the reported cases. Other methods of applying a smoothing filter to the time series may be used if appropriate.
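The smoothing step can be sketched as follows. This is a minimal illustration rather than the calculator's spreadsheet formula: it assumes a trailing window (the text does not say whether the window is trailing or centered), and the case counts and the name `running_average` are hypothetical.

```python
def running_average(counts, window=7):
    """Trailing running average; the first days use the shorter window available."""
    smoothed = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)
        chunk = counts[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily reported cases with a day-of-week ripple.
daily_cases = [10, 12, 30, 8, 9, 25, 11, 40, 14, 13]
smoothed_cases = running_average(daily_cases)
print([round(v, 1) for v in smoothed_cases])
```

A centered window would avoid the lag that a trailing window introduces, at the cost of leaving the most recent days without a full window.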
Another issue is the delay between the onset of symptoms and case confirmation (removal or isolation). Case onset dates can be derived if records of onset-to-confirmation dates are available for every individual (e.g. see Fig. 3). Otherwise, case onset dates can be estimated by using the following procedure.
i) For each date, distribute case counts back in time according to a Poisson distribution with a mean of 3 days (symptom onset to removal) as illustrated in Fig. 4.
Fig. 4.
Distributing case counts back in time.
ii) Sum the back-distributed case counts for each date to derive the onset curve as shown in Fig. 5.
Fig. 5.
The onset curve estimates the cases during the onset of symptoms.
iii) Distributing reported cases back in time and recreating the onset curve results in a “right-censored” time series. This means that there are onset cases close to the present date that are yet to be reported. We correct this by estimating the fraction of onset cases on Day (t − a) that have been reported by today (Day t). This fraction is the cumulative distribution function, F, of the Poisson “onset-to-removed” distribution evaluated at a, so dividing by it adjusts the number of onset cases and removes the right censoring.
$$\text{Adjusted onset}(t-a) = \frac{\text{Onset}(t-a)}{F(a)} \tag{1}$$
Consider the example illustrated in Fig. 6. Three days ago, there were 470 reported onset cases. This count is only the fraction of the actual number that has been reported within 3 days. This fraction is equal to the value of the cumulative distribution function of our Poisson distribution at Day 3, which is 65%. Hence, the current count of onset cases on that day represents 65% of the actual total. After adjustment, the actual total is estimated to be (1/0.65) of 470, which is 723. Fig. 7 shows the adjusted onset curve.
Fig. 6.
Adjusting for right-censoring.
Fig. 7.
Onset numbers close to the present date are adjusted for right censoring.
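Steps i) to iii) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the calculator's spreadsheet implementation: the function names, the 14-day cap on the delay, and the daily counts are hypothetical, and counts that would fall before the first day of the series are simply dropped.

```python
import math


def poisson_pmf(k, mean):
    """Probability of a delay of exactly k days."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_cdf(k, mean):
    """Probability of a delay of at most k days."""
    return sum(poisson_pmf(j, mean) for j in range(k + 1))


def back_distribute(counts, mean_delay, max_delay=14):
    """Steps i) and ii): spread each day's count back in time and sum per date."""
    earlier = [0.0] * len(counts)
    for day, count in enumerate(counts):
        for delay in range(max_delay + 1):
            if day - delay >= 0:  # mass before the start of the series is dropped
                earlier[day - delay] += count * poisson_pmf(delay, mean_delay)
    return earlier


def adjust_right_censoring(earlier, mean_delay):
    """Step iii): divide by the fraction expected to have been reported by today."""
    last = len(earlier) - 1
    return [value / poisson_cdf(last - day, mean_delay)
            for day, value in enumerate(earlier)]


reported = [0, 0, 5, 12, 30, 44, 61, 80, 95, 110]  # hypothetical daily counts
onset = back_distribute(reported, mean_delay=3)
adjusted = adjust_right_censoring(onset, mean_delay=3)

# Worked example from the text: 470 onset cases recorded three days ago.
print(round(poisson_cdf(3, 3), 3), round(470 / poisson_cdf(3, 3)))
```

With the unrounded value of the cumulative distribution function (about 0.647), the adjusted total is about 726; the figure of 723 in the text follows from rounding the fraction to 65%. The same functions apply to Step 2 with `mean_delay=5`.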
Step 2. Deriving infection (exposed) dates from onset dates
A procedure similar to that in Step 1 can be applied to the onset counts to derive the infection (exposed) time series. Fig. 8 shows the adjusted exposed time series where the incubation period (from exposure to symptom onset) follows a Poisson distribution with a mean of 5 days.
Fig. 8.
The Adjusted Exposed curve is derived using a Poisson distribution with a mean incubation period of 5 days.
Step 3. Estimating the effective reproduction number, R(t)
The basic reproduction number, R0, is the expected number of infections directly generated by one case given that all individuals are equally susceptible. As the infection spreads, the susceptibility of the population decreases. The effective reproduction number, R(t), is related to the basic reproduction number, R0, by $R(t) = R_0\,S(t)$, where S(t) is the average susceptibility of the population. R(t) is often used as an indicator of the effectiveness of interventions, such as social distancing measures, to contain the spread of a virus. If R(t) is greater than 1.0, the infection is growing at an exponential rate. If R(t) is at 1.0, the spread is sustained at a linear rate. If R(t) is less than 1.0, the infection is spreading at a slower pace and will eventually die out.
Although R(t) cannot be measured directly, it can be estimated in different ways. We describe one method that can be implemented in a spreadsheet without any programming code.
2.1.1. Bayesian approach
The Bayesian approach allows us to continuously update our estimate of a set of parameters, Θ, as more data becomes available.
$$P(\Theta \mid \text{data}) = \frac{P(\text{data} \mid \Theta)\,P(\Theta)}{P(\text{data})} \tag{2}$$
$P(\Theta)$, the prior distribution, represents our prior estimates about the true value of Θ.
$P(\text{data} \mid \Theta)$ is the likelihood distribution. It is also often written as $\mathcal{L}(\Theta \mid \text{data})$, which means the probability of observing the data given Θ. For the method to work, it is necessary to calculate the likelihood distribution for all possible values of Θ.
$P(\text{data})$ is the model evidence and it is the same for all possible hypotheses (values of Θ) being considered.
$P(\Theta \mid \text{data})$ is the posterior distribution and represents our updated estimate of the value of Θ given the observed data.
The main objective of Bayesian inference is to calculate the posterior distribution of our parameters using our prior beliefs updated with our likelihood. From the posterior distribution, we can determine the most likely values of Θ given the observed data. Since we are usually only interested in relative probabilities of different hypotheses, $P(\text{data})$ can be left out of the calculation and we write the model form of Bayes’ theorem as
$$P(\Theta \mid \text{data}) \propto P(\text{data} \mid \Theta)\,P(\Theta) \tag{3}$$
where ∝ means “proportional to”. For estimating Rt, the Bayes' theorem that we use is
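$$P(R_t \mid k_t) \propto P(k_t \mid R_t)\,P(R_t) \tag{4}$$
where the data, $k_t$, is the daily number of cases, and the parameter, $R_t$, is the effective reproduction number.
Equation (4) is updated every day by using yesterday's posterior, $P(R_{t-1} \mid k_{t-1})$, as today's prior, $P(R_t)$. On day two, the equation becomes
$$P(R_2 \mid k_2) \propto P(k_2 \mid R_2)\,P(k_1 \mid R_1)\,P(R_1) \tag{5}$$
So generally,
$$P(R_t \mid k_t) \propto P(R_1)\prod_{i=1}^{t} P(k_i \mid R_i) \tag{6}$$
Assuming a uniform starting prior $P(R_1)$, this reduces to:
$$P(R_t \mid k_t) \propto \prod_{i=1}^{t} P(k_i \mid R_i) \tag{7}$$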
Note that the posterior on any given day is influenced by the distant past as much as by the most recent days. This is fine if we are estimating a static parameter that does not change with time. However, the value of Rt is dynamic and is more closely related to recent values than to older ones. To address this issue, we can adopt Systrom's approach (Kevin) of only incorporating the last m days of the likelihood function:
$$P(R_t \mid k_t) \propto \prod_{i=t-m+1}^{t} P(k_i \mid R_i) \tag{8}$$
2.1.2. Bettencourt & Ribeiro's likelihood function
To calculate the likelihood function $P(k_t \mid R_t)$, we first assume that the number of new infections on any given day can be described by a Poisson probability distribution with a mean of λ. The probability of seeing k new cases is
$$P(k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{9}$$
Bettencourt and Ribeiro (2008) have derived an equation relating Rt to λ.
$$\lambda = k_{t-1}\,e^{\gamma (R_t - 1)} \tag{10}$$
where γ is the reciprocal of the serial interval (see Fig. 2). Fig. 9 shows the variation of λ with Rt for some values of kt-1.
Fig. 9.
Variation of λ with Rt given kt-1.
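As a quick numerical check of the relation between λ and Rt given by Bettencourt and Ribeiro (2008), consider hypothetical values that are not taken from the paper: yesterday's count $k_{t-1} = 100$ and a serial interval of 7 days, so that $\gamma = 1/7$.

```latex
\begin{align*}
\lambda &= k_{t-1}\,e^{\gamma (R_t - 1)} \\
R_t = 2:   \quad \lambda &= 100\,e^{1/7}   \approx 115.4 \\
R_t = 1:   \quad \lambda &= 100\,e^{0}     = 100 \\
R_t = 0.5: \quad \lambda &= 100\,e^{-1/14} \approx 93.1
\end{align*}
```

The expected count for today exceeds yesterday's count only when Rt is above 1, which is the behaviour plotted in Fig. 9.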
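Equations (9) and (10) allow us to reformulate the likelihood function as a Poisson distribution, parameterized by fixing k and varying Rt.
$$P(k_t \mid R_t) = \frac{\lambda^{k_t} e^{-\lambda}}{k_t!}, \quad \lambda = k_{t-1}\,e^{\gamma (R_t - 1)} \tag{11}$$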
Fig. 10 shows that as k increases, the peak value of the likelihood function increases and the distribution becomes less spread out. This means that as the number of infections increases the confidence of our Rt estimate should improve.
Fig. 10.
Variation of the likelihood $P(k_t \mid R_t)$ with Rt given k.
In evaluating the posteriors, it is more convenient to use the logarithm of the likelihood function.
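$$\ln P(k_t \mid R_t) = k_t \ln \lambda - \lambda - \ln(k_t!) \tag{12}$$
To perform the Bayesian update, we can sum the log-likelihoods over the last m days and then exponentiate to get the likelihood. From equations (8) and (12),
$$P(R_t \mid k_t) \propto \exp\left(\sum_{i=t-m+1}^{t} \ln P(k_i \mid R_i)\right) \tag{13}$$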
From the posterior distribution (Fig. 11) we can also obtain the confidence interval for Rt.
Fig. 11.
Variation of posterior with Rt.
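The windowed Bayesian update can be sketched outside a spreadsheet as follows. This is a minimal sketch under stated assumptions, not the calculator's own implementation: it uses a Poisson likelihood with Bettencourt and Ribeiro's mean, a uniform prior on a grid of Rt values, and hypothetical defaults (γ = 1/7, a 7-day window, a grid from 0 to 6). The function names are illustrative, and all case counts are assumed to be positive.

```python
import math


def log_likelihood(k_today, k_yesterday, r, gamma):
    """Poisson log-likelihood of today's count for a candidate value r."""
    lam = k_yesterday * math.exp(gamma * (r - 1.0))
    return k_today * math.log(lam) - lam - math.lgamma(k_today + 1)


def posterior_rt(cases, gamma=1 / 7, window=7, r_max=6.0, steps=601):
    """Posterior over a grid of R values using only the last `window` days."""
    grid = [r_max * i / (steps - 1) for i in range(steps)]
    recent = cases[-(window + 1):]  # window + 1 counts give `window` day pairs
    log_post = []
    for r in grid:
        total = 0.0
        for prev, cur in zip(recent[:-1], recent[1:]):
            total += log_likelihood(cur, prev, r, gamma)
        log_post.append(total)
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def summarize(grid, posterior, mass=0.95):
    """Most likely R and a central interval holding `mass` of the posterior."""
    best = grid[max(range(len(grid)), key=posterior.__getitem__)]
    tail = (1.0 - mass) / 2.0
    cumulative, low, high, found_low = 0.0, grid[0], grid[-1], False
    for r, p in zip(grid, posterior):
        cumulative += p
        if not found_low and cumulative >= tail:
            low, found_low = r, True
        if cumulative >= 1.0 - tail:
            high = r
            break
    return best, low, high


# Hypothetical series growing by a factor of e^(1/7) per day.
growing = [round(100 * math.exp(day / 7)) for day in range(10)]
best, low, high = summarize(*posterior_rt(growing))
print(best, low, high)
```

Working with the sum of log-likelihoods and subtracting its maximum before exponentiating avoids numerical underflow when daily counts are large.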
2.2. Method for forecasting the final total number of cases and deaths
When the growth rate is slowing down (Rt < 1), we can project the final total cases and death counts by fitting publicly available data to a logistic model. The logistic model is often used to describe the shape of the cumulative epidemic curve (Fig. 12), where the number of infected cases grows exponentially at first, then slows down, and finally flattens to a maximum limit. The final epidemic size can be estimated based on this slowing growth.
Fig. 12.
A logistic function. L = 1, k = 1, x0 = 5.
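The curve in Fig. 12 can be reproduced with the standard form of the logistic function, $f(x) = L / (1 + e^{-k(x - x_0)})$. The text does not write the formula out, so this form is an assumption based on the parameter names in the caption.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=5.0):
    """Logistic curve with maximum L, steepness k, and midpoint x0."""
    return L / (1.0 + math.exp(-k * (x - x0)))


# Values at x = 0, 1, ..., 10 for the parameters used in Fig. 12.
values = [round(logistic(x), 3) for x in range(11)]
print(values)
```

The curve passes through half of its maximum at the midpoint x0 and approaches its two asymptotes symmetrically about that point.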
Some researchers (e.g., Levitt et al., 2020; Ohnishi et al., 2020; Pérez et al., 2020; Torrealba-Rodriguez et al., 2020) have also suggested that another parametric model that can be used for forecasting COVID-19 case or death counts is the Gompertz function, defined as
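$$C(t) = C_F \exp\left[\ln\left(\frac{C_0}{C_F}\right) e^{-rt}\right] \tag{14}$$
where $C_0$ is the initial count, $C_F$ is the final count, and r is the growth rate parameter.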
It is a special case of the generalised logistic function. The final value asymptote of the function is approached more slowly by the curve than the initial value asymptote, unlike the simple logistic function in which both asymptotes are approached by the curve symmetrically. For example, Fig. 13 shows the cumulative death over time for a few countries that clearly illustrate the asymmetry.
Fig. 13.
Cumulative COVID-19 death for Singapore, UK, Brazil and USA.
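The asymmetry can be made concrete by locating the point of fastest growth on each curve. The sketch below assumes a Gompertz parameterization in terms of an initial count c0, a final count cf, and a rate r, together with a logistic curve matched to the same three values. Both the parameterization and the numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def gompertz(t, c0, cf, r):
    """Gompertz curve rising from c0 at t = 0 toward the final value cf."""
    return cf * math.exp(math.log(c0 / cf) * math.exp(-r * t))


def logistic(t, c0, cf, r):
    """Logistic curve with the same start value, final value, and rate."""
    return cf / (1.0 + (cf / c0 - 1.0) * math.exp(-r * t))


c0, cf, r = 100.0, 10000.0, 0.05  # hypothetical values

# Time of fastest growth (inflection point) on each curve.
t_gompertz = math.log(math.log(cf / c0)) / r
t_logistic = math.log(cf / c0 - 1.0) / r

share_gompertz = gompertz(t_gompertz, c0, cf, r) / cf  # 1/e, about 0.37
share_logistic = logistic(t_logistic, c0, cf, r) / cf  # 0.5
print(round(share_gompertz, 3), round(share_logistic, 3))
```

Growth on the Gompertz curve peaks when only about 37% of the final count has been reached, compared with 50% for the logistic curve, so the approach to the final value is the slower part of the curve.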
To find the best curve fit to the data and an estimate for $C_F$, we use the maximum likelihood method (Ma, 2020). We assume that the number of reported cases, $x_i$, at time, $t_i$, follows the Poisson distribution and has a mean of $C(t_i)$, where $C(t_i)$ is the calculated number of cases at time, $t_i$.
$$P(x_i) = \frac{C(t_i)^{x_i}\,e^{-C(t_i)}}{x_i!} \tag{15}$$
Then, the log-likelihood function to be maximized is
$$\ln L = \sum_{i}\left[x_i \ln C(t_i) - C(t_i) - \ln(x_i!)\right] \tag{16}$$
We choose the parameter values for $C_F$ and r that maximize the log-likelihood function. This can be done by using the Solver function in Excel. The parameter $C_F$ is estimated over a rolling window of, say, 60 days to obtain a moving update. See Fig. 20.
Fig. 20.
Projected cases for Singapore based on a two-month dataset.
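The maximum likelihood fit can be sketched without a spreadsheet. In the sketch below a coarse grid search stands in for Excel's Solver, the Gompertz parameterization (initial count c0, final count cf, rate r) is an assumption, and the data are synthetic values generated from known parameters so that the fit can be checked. None of the numbers come from the paper.

```python
import math


def gompertz(t, c0, cf, r):
    """Assumed Gompertz form: starts at c0 and saturates at cf."""
    return cf * math.exp(math.log(c0 / cf) * math.exp(-r * t))


def poisson_log_likelihood(observed, c0, cf, r):
    """Sum of Poisson log-probabilities of the observed counts given the curve."""
    total = 0.0
    for t, x in enumerate(observed):
        mean = gompertz(t, c0, cf, r)
        total += x * math.log(mean) - mean - math.lgamma(x + 1)
    return total


def fit_gompertz(observed, cf_grid, r_grid):
    """Grid search for the (cf, r) pair with the highest log-likelihood."""
    c0 = observed[0]  # the first observation fixes the initial count
    candidates = ((cf, r) for cf in cf_grid for r in r_grid)
    return max(candidates,
               key=lambda p: poisson_log_likelihood(observed, c0, p[0], p[1]))


# Synthetic 60-day window generated from known (hypothetical) parameters.
true_c0, true_cf, true_r = 100, 5000, 0.05
observed = [round(gompertz(t, true_c0, true_cf, true_r)) for t in range(60)]

cf_grid = range(3000, 8001, 250)
r_grid = [0.01 * i for i in range(1, 11)]
cf_hat, r_hat = fit_gompertz(observed, cf_grid, r_grid)
print(cf_hat, r_hat)
```

A gradient-based optimizer would refine the estimate beyond the grid resolution; the grid search is used here only because it needs no external library.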
3. Results and discussion
3.1. Evaluating the effectiveness of social distancing measures using effective reproduction number
Fig. 14 shows the most likely values of Rt and the confidence interval over time for Singapore during the Circuit breaker period calculated using the Bayesian method. The serial interval is assumed to be a Gamma distribution with a mean of 7 days and a mode of 4 days (standard deviation = 4.6 days). We can see that Rt changes with time and the confidence interval narrows with more data.
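The stated standard deviation follows from the mean and the mode. As a check, write the Gamma distribution with shape α and scale θ (symbols introduced here for illustration), for which the mean is αθ, the mode is (α − 1)θ, and the standard deviation is √α θ.

```latex
\begin{align*}
\alpha\theta = 7, \quad (\alpha - 1)\theta = 4
  &\;\Rightarrow\; \theta = 7 - 4 = 3, \quad \alpha = \tfrac{7}{3} \\
\sigma = \sqrt{\alpha}\,\theta = 3\sqrt{\tfrac{7}{3}} = \sqrt{21}
  &\approx 4.6 \text{ days}
\end{align*}
```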
Fig. 14.
Effective reproduction number Rt.
The results generally agree with those calculated using the EpiEstim code (Fig. 15) (Cori et al., 2013; Epiforecasts).
Fig. 15.
Results from EpiEstim.
The results clearly show that the Circuit Breaker measures imposed from April 7, 2020 had an immediate effect of rapidly slowing down the spread of COVID-19. We can also see that Rt settled to around 1.0 after about two weeks. The infection rate then remained steady for more than a month. Given that dormitory residents made up the majority of the infected individuals, it can be concluded that individuals in that setting continued to infect others at a ratio of approximately 1 to 1 during the Circuit Breaker period, as depicted in Fig. 16.
Fig. 16.
Effective reproduction number Rt. Rt started to rise again after the end of the Circuit Breaker in Singapore.
One problem with the calculation method is that it can only provide a good estimate for the reproduction number up to about one to two weeks before the current date. This is due to the time lag between infection and confirmation. As we get closer to the present day, the calculated mean value of Rt always tends to 1.
For example, suppose that today is April 10, 2020, and case data is only available up to this date. Fig. 17(a) shows that the calculated Rt values after April 3 do not reflect the true values as shown in Fig. 16.
Fig. 17.
(a) Rt tends to 1 as it approaches the current date. (b) Rt adjusted by extrapolating next week's cases from this week's data.
Since the calculation cannot provide reliable estimates for current Rt based on real time data, it somewhat limits the metric's usefulness for tracking infection spread. To alleviate this limitation, we do an exponential regression on the latest week's case data and project the trend forward by one week (Fig. 18). The results, shown in Fig. 17(b), give a much better estimate for the current values of Rt.
Fig. 18.
An exponential trend line is used to project new cases forward by one week.
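The one-week projection can be sketched as a least-squares fit of a straight line to the logarithm of the latest week's counts, which is one common way to fit an exponential trend. Whether the spreadsheet uses exactly this fit is an assumption, and the function name and data are hypothetical. All counts must be positive.

```python
import math


def project_exponential(last_week, days_ahead=7):
    """Fit ln(cases) = a + b * day by least squares and extrapolate forward."""
    n = len(last_week)
    xs = list(range(n))
    ys = [math.log(c) for c in last_week]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Days n, n + 1, ... continue directly after the fitted week.
    return [math.exp(intercept + slope * (n + d)) for d in range(days_ahead)]


# Hypothetical week of cases growing by 10% per day.
last_week = [100 * 1.1 ** day for day in range(7)]
projection = project_exponential(last_week)
print([round(v) for v in projection])
```

The projected counts are appended to the observed series before Rt is estimated, so that the most recent days no longer sit at the censored end of the data.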
Rt values also reflect the effectiveness of vaccination programmes and the rise of new variants. For example, Fig. 19 shows the Rt history for Israel. Israel launched its COVID-19 vaccination campaign on 20 December 2020. As of June 1, 2021, 60% of the population had received at least one dose. Delta variant cases surged from June 2021 until social restriction measures were reinstated.
Fig. 19.
Effective reproduction number for Israel.
3.2. How much longer does it take to flatten the curve? Forecasting final case and death counts
Fig. 20 shows the projected cases for Singapore calculated according to the method described in Section 2.2 and using a two-month data set.
Table 1 shows the 3-month forecasts for a few countries including Singapore and how they compare with the projections by the Institute for Health Metrics and Evaluation (IHME) at University of Washington Medicine (Institute for Health Metrics and Evaluation (IHME)), and with the actual death toll. The projections by the IHME are based on more complex analytics and consider factors such as changes in social distancing measures, diagnostic capability, and hospital capacity. Even though we did not directly account for these factors, our forecasts of the total number of deaths were within about 23% of the actual numbers, with the largest deviation for the USA. This supports the concurrent validity of our model. Assuming current prevailing conditions in the populations, results from the COVID-19 Epidemic Calculator are likely to be realistic estimates. Our simpler model is more accessible to non-epidemiologists and data analysts in other fields who might want to incorporate it in their own analytics.
Table 1.
A comparison of the projected total deaths from the COVID-19 calculator as at July 25, 2020, using data from the last two months, and the Institute for Health Metrics and Evaluation (IHME).
| Country | D0, initial death count on May 20, 2020 | Total death for Nov 1, 2020 by Gompertz function fit | IHME projection for Nov 1, 2020 | Actual |
|---|---|---|---|---|
| Singapore | 22 | 28 | 27 | 28 |
| USA | 97,222 | 186,285 | 219,864 (141,969–284,123) | 240,265 |
| UK | 35,704 | 47,105 | 51,274 (48,025–60,078) | 46,781 |
| Brazil | 18,894 | 134,781 | 177,235 (81,623–191,443) | 160,104 |
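The deviation of each Gompertz forecast from the actual death toll can be computed directly from the values in Table 1. The short script below is only a convenience for reading the table; the variable and function names are illustrative.

```python
# (Gompertz forecast for Nov 1, 2020, actual death toll), copied from Table 1.
table_1 = {
    "Singapore": (28, 28),
    "USA": (186285, 240265),
    "UK": (47105, 46781),
    "Brazil": (134781, 160104),
}


def percent_deviation(forecast, actual):
    """Signed deviation of the forecast from the actual value, in percent."""
    return 100.0 * (forecast - actual) / actual


deviations = {country: percent_deviation(forecast, actual)
              for country, (forecast, actual) in table_1.items()}
for country, value in deviations.items():
    print(f"{country}: {value:+.1f}%")
```

Negative values mean that the forecast fell short of the actual count; the largest shortfall is for the USA.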
4. Conclusion
This paper describes the methods underlying the online COVID-19 Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available data, the calculator is used to estimate the distributions at time of symptom-onset and infection, effective reproduction number, final epidemic size, and death toll for Singapore and other countries.
The calculator and the associated graphs clearly show that the Circuit Breaker measures imposed from April 7, 2020 in Singapore had an immediate effect of rapidly slowing down the spread of COVID-19. The results also reveal that the effective reproduction number settled to around 1.0 after about two weeks and remained at that level for more than a month. This indicates that the infection rate among the dormitory residents is sustained and not likely to be reduced until this group becomes less susceptible.
The COVID-19 Epidemic Calculator is available in the form of an online spreadsheet (COVID-19 Epidemic Calculator) that imports daily infection data from publicly available sources (Our World in Data, Coronavirus Source Data; The COVID Tracking Project; Data on the geographic distribution of COVID-19 cases worldwide). The results are presented online as dashboards on Tableau Public (Global Covid19 reproduction number and doubling time tracker) (Fig. 21). It has the advantage of fast execution time without the need for any specialized software package or programming script. Users can also interact with the models by changing the parameters. Over more than eighteen months of data, our parameter estimates are in good agreement with those obtained in similar work using different models and software. By making the COVID-19 Epidemic Calculator readily accessible online, we hope to give the public and interested analysts from non-epidemiological disciplines a tool to meaningfully assess our efforts in fighting COVID-19.
Fig. 21.
A visualization of the effective reproduction number for countries on a map. The size of each circle is proportional to the total number of infections. The color of the rings within a circle varies over time from red (R > 1) to white (R = 1) to blue (R < 1), reflecting the rate of growth of the virus.
Disclosure of potential conflicts of interest
The authors declare that they have no conflict of interest.
Ethics approval for research involving human participants
This research study did not involve collection of data from human participants. Ethics approval is not required.
Informed consent
Informed consent was not required since there were no individual participants in this study.
Data and/or code availability
The data used in the calculations are publicly available and the sources are cited in the References section. The code for the calculator can be downloaded from the website, www.cv19.one.
Authors’ contribution statements
Conceptualization: Fook Fah YAP and Minglee YONG; Methodology: Fook Fah YAP and Minglee YONG; Formal analysis and investigation: Fook Fah YAP; Writing - original draft preparation: Fook Fah YAP; Writing - review and editing: Fook Fah YAP and Minglee YONG.
Handling editor: CYIMS- Shao
Footnotes
Peer review under responsibility of KeAi Communications Co., Ltd.
References
- Bettencourt L.M.A., Ribeiro R.M. Real time bayesian estimation of the epidemic potential of emerging infectious diseases. PLoS One. 2008;3:5. doi: 10.1371/journal.pone.0002185.
- Bi Q. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: A retrospective cohort study. The Lancet Infectious Diseases. 2020.
- Boelle P.-Y., Obadia T. R0: Estimation of R0 and real-time reproduction number from epidemics. R package version 1. 2015. https://cran.r-project.org/web/packages/R0/index.html
- Cori A. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133.
- The COVID Tracking Project. https://covidtracking.com/api
- COVID-19 Situation Report. Ministry of Health Singapore. https://covidsitrep.moh.gov.sg/
- COVID-19 Epidemic Calculator. https://1drv.ms/x/s!AqPZ6X_dB4uSkOFgz-viKTJyxCD2wA?e=yI5HT6
- Data on the geographic distribution of COVID-19 cases worldwide. https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
- Du Z. The serial interval of COVID-19 from publicly reported confirmed cases. medRxiv. 2020. doi: 10.3201/eid2606.200357.
- Epiforecasts. https://epiforecasts.io/covid/
- Global Covid19 reproduction number and doubling time tracker. http://www.cv19.one
- Institute for Health Metrics and Evaluation (IHME). COVID-19 Projections. https://covid19.healthdata.org/
- Kevin S. The metric we need to manage COVID-19. Rt: the effective reproduction number. http://systrom.com/blog/the-metric-we-need-to-manage-covid-19/
- Lauer S.A. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application. Annals of Internal Medicine. 2020.
- Levitt M., Scaiewicz A., Zonta F. Predicting the trajectory of any COVID19 epidemic from the best straight line. medRxiv. 2020.
- Li Q. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine. 2020. doi: 10.1056/NEJMoa2001316.
- Ma J. Estimating epidemic exponential growth rate and basic reproduction number. Infectious Disease Modelling. 2020.
- Nishiura H., Linton N.M., Akhmetzhanov A.R. Serial interval of novel coronavirus (COVID-19) infections. International Journal of Infectious Diseases. 2020. doi: 10.1016/j.ijid.2020.02.060.
- Ohnishi A., Namekawa Y., Fukui T. Universality in COVID-19 spread in view of the Gompertz function. medRxiv. 2020.
- Our World in Data. Coronavirus Source Data. https://ourworldindata.org/coronavirus-source-data
- Pérez D., Javier F., Chinarro D., Pino M., Mouhaffel A. Growth forecast of the covid-19 with the gompertz function, case study: Italy, Spain, Hubei (China) and South Korea. International Journal of Advanced Engineering Research and Science. 2020;7:67–77. doi: 10.22161/ijaers.77.8.
- Pung R. Investigation of three clusters of COVID-19 in Singapore: Implications for surveillance and response measures. The Lancet. 2020. doi: 10.1016/S0140-6736(20)30528-6.
- Torrealba-Rodriguez O., Conde-Gutiérrez R.A., Hernández-Javier A.L. Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models. Chaos, Solitons & Fractals. 2020; p. 109946.
- Wei WE, Li Z, Chiew CJ, Yong SE, et al. Presymptomatic transmission of SARS-CoV-2 — Singapore, January 23–March 16, 2020. Morbidity and Mortality Weekly Report, 1 April 2020/69.
- World Health Organization. Report of the WHO-China joint mission on coronavirus disease 2019 (COVID-19) 16-24 February 2020. World Health Organization; Geneva: 2020. Available from: https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-finalreport.pdf





















