Abstract
Purpose:
This paper proposes a methodology and a computational tool to study the COVID-19 pandemic throughout the world and to perform a trend analysis to assess its local dynamics.
Methods:
Mathematical functions are employed to describe the number of cases and demises in each region and to predict their final numbers, as well as the dates of maximum daily occurrences and the local stabilization date. The model parameters are calibrated by numerical optimization. Trend analyses are run, making it possible to assess the effects of public policies. Easy-to-interpret metrics of the quality of the fitted curves are provided. Country-wise data from the European Centre for Disease Prevention and Control (ECDC) on the daily number of cases and demises around the world are used, as well as detailed data from Johns Hopkins University and from the Brasil.io project describing the occurrences in individual United States counties and in Brazilian states and cities, respectively. The U.S. and Brazil were chosen for a more detailed analysis because they were the focus of the pandemic at the time of writing.
Results:
Illustrative results for different countries, U.S. counties, and Brazilian states and cities are presented and discussed.
Conclusion:
The main contributions of this work lie in (i) a straightforward model of the curves to represent the data, which allows automation of the process without requiring interventions from experts; (ii) an innovative approach for trend analysis, whose results provide important information to support authorities in their decision-making process; and (iii) the developed computational tool, which is freely available and allows the user to quickly update the COVID-19 analyses and forecasts for any country, United States county or Brazilian state or city present in the periodic reports from the authorities.
Keywords: COVID-19, Epidemiology, Mathematical modeling, Trend analysis, Forecast, Numerical optimization, Sequential quadratic programming (SQP)
1. Introduction
In December 2019, a series of pneumonia cases of unknown cause emerged in Wuhan, China, with clinical presentations closely resembling viral pneumonia [1]. The Chinese authorities identified a new type of coronavirus (novel coronavirus, named 2019-nCoV), which was isolated on 7 January 2020 [2]. Coronaviruses are a family of viruses that can cause respiratory, hepatic, and neurological diseases in humans and animals [3].
Initially, these infections were thought to result from zoonotic (animal-to-human) transmission. However, the exponential growth of case incidence, with many cases detected in other parts of the world, provided strong evidence of human-to-human secondary transmission [4]. In March 2020, the World Health Organization characterized the coronavirus disease (COVID-19) as a pandemic, having already declared it a public health emergency of international concern in January 2020 [5]. A worrying aspect of the disease is that it is highly contagious, spreading very quickly and overcrowding the health system [6].
Since January 2020, many studies have been conducted to assess the transmission potential of 2019-nCoV nationally and internationally, as well as to forecast its spread. Quarantine measures have been implemented worldwide, as international travel has helped to spread the virus to other parts of the world [6], [7]. At the time of this writing, 57.8 million people worldwide had been infected with COVID-19 and 1.3 million had died [8].
Many efforts have been made to determine an effective treatment and to develop a vaccine [9], [10], [11]. Currently, the results of many studies suggest that early detection, hand washing, self-isolation and household quarantine are effective in mitigating this pandemic [7].
Similar situations, although on a smaller scale, occurred during the Influenza A epidemic in 2009 and during the Middle East Respiratory Syndrome coronavirus (MERS-CoV) epidemic in 2012. The Influenza A virus appeared in April 2009 [12] and caused a pandemic with more than 280,000 deaths worldwide [13]. The MERS epidemic emerged in Saudi Arabia in 2012 [14] and caused thousands of infections in dozens of countries worldwide. This virus, which also belongs to the coronavirus family, has a high fatality rate [15]. Under these scenarios, Dugas et al. [16] created a forecasting model for Influenza A, and Kim et al. [14] formulated a forecasting model for MERS transmission dynamics and estimated transmission rates for several categories of patients. In fact, mathematical models have been widely used to study the transmission dynamics of infectious diseases, enabling the understanding of the disease spread and the optimization of disease control [17].
Forecasting models are used to predict future behavior as a function of past data. This is a widely used method in the implementation of epidemic mathematical models, since it is necessary to know the past behavior of a disease to understand how it will evolve in the future. Accurate forecasts of disease activity could allow for better preparation, such as public health surveillance, development and use of medical countermeasures, and hospital resource management [18].
A similar approach is the concept of trend analysis, which makes it possible to predict future behavior accurately, especially in the short run. A trend is a change over time exhibited by a random variable [19]; trend analyses extract the direction of a trend from past behavior, making it possible to predict future data. For better effectiveness, the predictions should be updated periodically, as soon as new data are available.
The technique of trend analysis is widely used in several areas of science, such as finance [20], [21] and meteorology [19], [22]. In the context of health systems, trend analysis was used by Zhao et al. [23] to analyze malignant mesotheliomas in China, aiming to provide data for their prevention and control; by Soares et al. [24], to predict testicular cancer mortality in Brazil; by Zahmatkesh et al. [25], to forecast the occurrences of breast cancer in Iran; by Mousavizadeh et al. [26], to forecast multiple sclerosis in a region of Iran; and by Yuan et al. [27], to analyze and predict the cases of type 2 diabetes in East Asia.
Modeling and prediction of the dynamics of the COVID-19 pandemic is a subject of great interest. Therefore, a myriad of papers on this theme have been published over the last months. For this purpose, some research groups extended previous epidemiological models to describe the COVID-19 pandemic: Lin et al. [28] created a conceptual model for the COVID-19 outbreak in Wuhan, China, using components from the 1918 influenza pandemic in London, while Paiva et al. [29] proposed a dynamic model to describe the COVID-19 pandemic, based on a model previously developed for the MERS epidemic. Different modeling approaches have been exploited, such as compartment models [30], time series analysis [31], artificial intelligence [32], [33], and regression-based models [34], [35]. This list is far from being exhaustive. For a detailed survey on different modeling approaches in this context, the reader is referred to review papers such as [36] and [37].
It is important to note that the behavior of the pandemic may vary greatly across regions of the world, owing to characteristics such as different social habits (higher or lower physical interaction between citizens), the capacity of the local health system, different governmental actions, and so on. Therefore, the parameters of a mathematical model need to be tailored to the region where the disease behavior is being studied. Furthermore, even in the same region, the conditions may change very quickly, in a matter of weeks or even days (for instance, following the imposition or lifting of a lockdown, or the saturation of the available intensive care unit beds in the hospitals); thus, the model parameters need to be updated very often, usually by an expert. However, these analyses take time and require dedicated work from highly qualified personnel, which limits their availability. It is natural to expect that such analyses are run periodically at the country level, but the same may not be true locally in every municipality. Therefore, in this scenario, it is useful to have a computational tool that performs a quick and automatic analysis and forecast of the disease conditions in any region, following the periodic updates published by the authorities. This is the purpose of the present paper.
The methodology proposed here employs mathematical functions to model the behavior of the pandemic. A numerical optimization algorithm is used to calibrate such models, in an automatic process that does not require intervention from experts. An original trend analysis technique is proposed, making it possible to determine the effects of public policies. Illustrative results at different territorial levels (country, state, and city) are presented and discussed. A computational tool was developed and is available online; it processes data from different countries and subnational data from the United States and Brazil.
When compared to other modeling approaches from the literature, the main advantage of our model lies in the automatic calibration process, which allows the analysis to be run for different regions of the world and is especially useful for regions where experts are not available. The automatic analysis is also useful to update the results as soon as new data become available. Furthermore, the automation also enables the proposed trend analysis, which would not be feasible manually because it requires an impractically large number of parameter estimations with different amounts of data.
In the remainder of the paper, we first present the fundamental mathematical function adopted here, which is an asymmetric sigmoid, as well as its tuning parameters and the most relevant epidemic characteristics that can be inferred from the curve. The procedure to fit the data through optimization is then discussed. Subsequently, we discuss criteria to define the complexity of the model, i.e., whether a symmetric sigmoid is enough to describe the data adequately or an asymmetric one is necessary, and also the number of sigmoids that should be used to fit the data of a given locality. Afterwards, we discuss when convergence to the final value can be considered to have happened. Statistical tools to evaluate the quality of the fit are presented next. After that, results are presented for several localities, illustrating the properties of the proposal and the usage of the software. Finally, an in-depth discussion of the chosen examples is carried out and conclusions are drawn.
2. Methods
This section describes the methods adopted in this study. Section 2.1 discusses the mathematical foundations employed to formulate the model describing the behavior of the pandemic. Section 2.2 explains how the model parameters are estimated. Section 2.3 extends the formulation to analyze a repeated behavior, characterized by multiple occurrences of the fundamental function. Sections 2.4 and 2.5 describe the criteria to evaluate the suitability of the estimated parameters, by quantifying the accuracy of the model output compared to historical data and the convergence properties of the estimations. Sections 2.6 and 2.7 explain the methodologies to select the complexity of the model and to quantify the associated uncertainties. Section 2.8 presents a summary. Finally, the computer implementation is described in Section 2.9.
2.1. Mathematical formulation of the curve to describe the data
In the present paper, the fundamental curve used to describe the historical data is an asymmetric sigmoid, i.e., letting the independent variable be $t$, the dependent variable is given by the function $f(t)$ [38]:
$$f(t) = \frac{A}{\left(1 + \nu\, e^{-\delta (t - t_p)}\right)^{1/\nu}} \qquad (1)$$
with the parameters $A$, $t_p$, $\nu > 0$ and $\delta > 0$. $A$ is the final number of occurrences; $t_p$ is the day with maximum daily occurrences; $\nu$ is a parameter defining how asymmetric the function is; and $\delta$ is a parameter associated with how fast the function converges to its final value. The meaning of these four parameters will become clearer in the forthcoming discussion.
In the present work, the independent variable is the time in days, whereas the dependent variable is either the cumulative number of individuals that were positively tested for SARS-CoV-2 or the cumulative number of individuals deceased with the disease as the cause.
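As a minimal sketch of the curve described above, the Python function below evaluates an asymmetric sigmoid, assuming that (1) has the generalized logistic form $f(t) = A / (1 + \nu e^{-\delta (t - t_p)})^{1/\nu}$. The function name, the argument names and the numerical values are illustrative assumptions; they are not taken from the computational tool described in this paper, which is implemented in MATLAB.

```python
import math


def asymmetric_sigmoid(t, A, t_p, nu, delta):
    """Cumulative occurrences at day t for the assumed asymmetric sigmoid.

    A     : final number of occurrences
    t_p   : day with maximum daily occurrences
    nu    : asymmetry parameter (nu = 1 gives a symmetric sigmoid)
    delta : parameter governing how fast the curve converges to A
    """
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)


# Illustrative (made-up) parameter values.
A, t_p, delta = 1000.0, 50.0, 0.1
for nu in (0.5, 1.0, 2.0):
    fraction = asymmetric_sigmoid(t_p, A, t_p, nu, delta) / A
    print(f"nu={nu}: f(t_p)/A = {fraction:.3f}")
```

Under this assumed form, the fraction of the final value reached at $t_p$ depends only on $\nu$, which is why a single extra parameter is enough to shift the inflection level.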
Notice that
$$\lim_{t \to \infty} f(t) = A, \qquad (2)$$
i.e., modeling the cumulative number of cases/demises by (1) implies convergence to a final value $A$. However, the convergence is asymptotic; it is therefore useful to know when a given fraction of the final number of infected/deceased has been reached. For that purpose, let $t_s$ be the time instant at which a particular value is reached:
$$f(t_s) = \alpha A, \qquad (3)$$
where the parameter $\alpha \in (0, 1)$. Then, by substituting (1), evaluated at $t = t_s$, into (3), one may solve for $t_s$ to find:
$$t_s = t_p - \frac{1}{\delta} \ln\!\left(\frac{\alpha^{-\nu} - 1}{\nu}\right), \qquad (4)$$
where $t_p$, $\nu$ and $\delta$ are the remaining parameters of (1). Therefore, from (4) one can determine the (finite) instant when a certain proportion of the final number of cases/demises is reached, which is a useful figure to evaluate whether the contamination can be considered over or not. In this paper, the settling date of the contamination is computed with $\alpha = 0.98$, corresponding to the day on which the number of occurrences reaches 98% of its final value. The settling ratio of 98% is a standard value used in the analysis of dynamic systems [39].
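The settling date can be illustrated with a short Python sketch. It assumes that (1) is the generalized logistic curve $f(t) = A / (1 + \nu e^{-\delta (t - t_p)})^{1/\nu}$, in which case solving $f(t_s) = \alpha A$ gives $t_s = t_p - \ln((\alpha^{-\nu} - 1)/\nu)/\delta$. All names and numerical values below are hypothetical.

```python
import math


def asymmetric_sigmoid(t, A, t_p, nu, delta):
    """Assumed form of the asymmetric sigmoid (cumulative occurrences)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)


def settling_time(t_p, nu, delta, alpha=0.98):
    """Day at which the sigmoid reaches the fraction alpha of its final value."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return t_p - math.log((alpha ** (-nu) - 1.0) / nu) / delta


# Illustrative (made-up) parameters: the result does not depend on A.
t_s = settling_time(t_p=50.0, nu=1.0, delta=0.1)
print(f"settling day: {t_s:.1f}")
print(f"fraction reached: {asymmetric_sigmoid(t_s, 1.0, 50.0, 1.0, 0.1):.4f}")
```

The settling date does not depend on $A$, so two regions with the same $t_p$, $\nu$ and $\delta$ settle on the same day regardless of the size of the outbreak.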
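The rate at which the number of infections/demises grows can be calculated by differentiating (1) with respect to the independent variable $t$, which yields
$$\frac{df(t)}{dt} = \frac{A\,\delta\,E(t)}{\left(1 + \nu E(t)\right)^{(\nu+1)/\nu}}, \qquad (5)$$
where
$$E(t) = e^{-\delta (t - t_p)}. \qquad (6)$$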
As a matter of fact, the value of (5) on a particular day is an important indicator for healthcare infrastructure decision-making, as a higher value indicates that the upcoming period might stress the healthcare infrastructure, whereas a comparatively lower value indicates that the number of new cases might be accommodated with the existing infrastructure. If the number of individuals who are cured and discharged from the facilities each day exceeds the rate of newly infected individuals, then the capacity of the facilities is enough to treat the ill, and they will not be endangered by a lack of proper treatment.
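Differentiating (5) with respect to $t$ yields
$$\frac{d^2 f(t)}{dt^2} = \frac{A\,\delta^2\, e^{-\delta (t - t_p)} \left(e^{-\delta (t - t_p)} - 1\right)}{\left(1 + \nu\, e^{-\delta (t - t_p)}\right)^{(2\nu+1)/\nu}}. \qquad (7)$$
A sign change in (7) occurs when the term $e^{-\delta (t - t_p)} - 1$ crosses zero, as the remaining terms are all positive for any $t$. Therefore, there is a single inflection point in the curve (1), at $t = t_p$. On the other hand, since $d^2 f/dt^2 > 0$ for $t < t_p$ and $d^2 f/dt^2 < 0$ for $t > t_p$, this point corresponds to the maximum of the rate (5), i.e., the maximum daily number of either infected or deceased individuals. Substituting $t = t_p$ into (1) yields
$$f(t_p) = \frac{A}{(1 + \nu)^{1/\nu}}. \qquad (8)$$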
Notice from (8) that $\nu = 1$ entails $f(t_p) = A/2$, i.e., the sigmoid curve crosses half of the final value at $t = t_p$. This is deemed a symmetric sigmoid. For the sake of understanding, consider two other illustrative values of $\nu$:
(a) for $\nu = 2$, from (8), $f(t_p) = A/\sqrt{3} \approx 0.58A$, that is, the inflection happens at a later stage, when roughly 58% of the final value has been reached;
(b) for $\nu = 0.5$, from (8), $f(t_p) = A/1.5^{2} \approx 0.44A$, in other words, the inflection happens at an earlier stage, when only approximately 44% of the final value has been reached.
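The derivative and the location of its maximum can be checked numerically. The sketch below assumes the generalized logistic form $f(t) = A / (1 + \nu e^{-\delta (t - t_p)})^{1/\nu}$ for (1), whose derivative is $A \delta u / (1 + \nu u)^{(\nu+1)/\nu}$ with $u = e^{-\delta (t - t_p)}$; the function names and parameter values are illustrative assumptions.

```python
import math


def asymmetric_sigmoid(t, A, t_p, nu, delta):
    """Assumed form of the asymmetric sigmoid (cumulative occurrences)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)


def daily_rate(t, A, t_p, nu, delta):
    """Analytical derivative of the assumed sigmoid (occurrences per day)."""
    u = math.exp(-delta * (t - t_p))
    return A * delta * u / (1.0 + nu * u) ** ((nu + 1.0) / nu)


# Illustrative (made-up) parameters.
A, t_p, nu, delta = 1000.0, 50.0, 2.0, 0.1

# On an integer grid of days, the largest rate should occur at t_p.
peak_day = max(range(0, 121), key=lambda t: daily_rate(t, A, t_p, nu, delta))
print(peak_day)

# Cross-check the analytical derivative against a central finite difference.
h = 1e-5
numeric = (asymmetric_sigmoid(60.0 + h, A, t_p, nu, delta)
           - asymmetric_sigmoid(60.0 - h, A, t_p, nu, delta)) / (2.0 * h)
print(abs(numeric - daily_rate(60.0, A, t_p, nu, delta)) < 1e-5)
```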
It is clear from these examples and from (1) that the value of $\nu$ controls the degree of asymmetry in the sigmoid curve, with $\nu = 1$ representing a curve that is symmetric about its inflection point at $t = t_p$. This is illustrated in Fig. 1(a), where (1) is shown for three values of $\nu$, whereas Fig. 1(b) shows (5), i.e., the rate. It is worth remarking that the value of $\nu$ also affects the symmetry of the derivative, with $\nu = 1$ yielding a symmetric bell-shaped curve, in which the acceleration and deceleration phases occur at the same rate. When $\nu < 1$, the deceleration phase of the sigmoid is slower than the acceleration phase; when $\nu > 1$, the opposite occurs.
Fig. 1.
(a) Sigmoid curves and (b) their derivatives for different values of the parameter $\nu$. The remaining parameters ($A$, $t_p$ and $\delta$) are held fixed.
In view of their capability of representing processes with asymmetric acceleration and deceleration phases, asymmetric sigmoid curves are well suited to represent the data of a pandemic. Many factors can contribute to the asymmetry between acceleration and deceleration phases besides the very nature of the disease spread, such as the introduction of policies by health authorities in order to slow down the spread, e.g., reduced social contact. Therefore, this extra degree of freedom brought by the asymmetric sigmoid curve is useful to better represent the data. Moreover, the added complexity with regard to a symmetric curve is due only to the necessity of estimating a single additional parameter, namely $\nu$.
In our context, there are three main sigmoid parameters of interest, which are described in Table 1.
Table 1.
Main sigmoid parameters of interest: the final number of occurrences, the date of maximum daily occurrences, and the settling date.
The next subsection presents the algorithm used to estimate the parameters $A$, $t_p$, $\nu$, and $\delta$ based on measured data from either the number of newly infected individuals per day or the number of deceased per day.
2.2. Parameter estimation
The parameters $A$, $t_p$, $\nu$, and $\delta$ are estimated by solving a constrained optimization problem in which the Integral Time Square Error (ITSE) [39] is minimized, where the error is the difference between the value of $f(t)$ output by (1) and the corresponding data obtained from the authorities on the same day. We consider a time window for $t$ in which data are available for each day. There is a small abuse of notation in restricting the real-valued variable $t$ to assume only integer values coinciding with the number of the day.
Let the vector of parameters to be estimated be defined as
$$\theta = [A \;\; t_p \;\; \nu \;\; \delta]^T, \qquad (9)$$
where the symbol $v^T$ indicates the transpose of a vector $v$. The optimal value of the vector $\theta$, denoted $\theta^*$, is given as
| (10) |
where the argument $\theta$ was explicitly included in $f(t, \theta)$ to emphasize that the parameters may be varied during the optimization process. Note that strict inequalities cannot be implemented in numerical optimization; therefore, for each strict positivity constraint $x > 0$ on a parameter $x$, an arbitrarily small positive real number $\varepsilon$ is chosen and the constraint is approximated as $x \geq \varepsilon$. After the optimization problem is solved to yield $\theta^*$, the optimal values of $A$, $t_p$, $\nu$, and $\delta$ are fixed and used to build the curve.
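To make the cost function concrete, the sketch below evaluates a discrete ITSE, assumed here to be the sum over the days of the window of $t\,(f(t, \theta) - y(t))^2$, where $y(t)$ is the reported cumulative count. The sigmoid form, the parameter ordering in `theta`, the set of parameters kept positive and all names are assumptions made for illustration. The minimization itself, which the paper performs with SQP, is not reproduced.

```python
import math


def asymmetric_sigmoid(t, A, t_p, nu, delta):
    """Assumed form of the asymmetric sigmoid (cumulative occurrences)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)


def itse_cost(theta, days, data):
    """Time-weighted sum of squared errors between the sigmoid and the data."""
    A, t_p, nu, delta = theta  # assumed ordering of the parameter vector
    return sum(t * (asymmetric_sigmoid(t, A, t_p, nu, delta) - y) ** 2
               for t, y in zip(days, data))


def is_feasible(theta, eps=1e-6):
    """Check the relaxed constraints x >= eps that replace x > 0.

    Which parameters are constrained is an assumption: A, nu and delta.
    """
    A, _t_p, nu, delta = theta
    return A >= eps and nu >= eps and delta >= eps


# Synthetic, noise-free data generated from known (made-up) parameters.
days = list(range(1, 101))
theta_true = (1000.0, 50.0, 1.5, 0.12)
data = [asymmetric_sigmoid(t, *theta_true) for t in days]

print(itse_cost(theta_true, days, data))           # zero at the true parameters
print(itse_cost((900.0, 50.0, 1.5, 0.12), days, data) > 0.0)
```

The weight $t$ penalizes errors on recent days more heavily than errors early in the window, which favors fits that track the latest behavior of the data.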
The function $f(t, \theta)$ is nonlinear in the parameter vector $\theta$, and the cost function exacerbates that further, rendering the optimization problem nonlinear. Moreover, the inequality constraints introduce additional difficulty, rendering an analytical solution of the optimization problem impractical. Therefore, numerical methods must be used.
One class of methods suitable for nonlinear constrained optimization is the so-called Sequential Quadratic Programming (SQP) [40], [41], [42]. SQP iteratively approximates the general nonlinear cost function in (10) by a quadratic one, and the constraints by linear ones, which entails a Quadratic Programming (QP) problem. Convex QPs can be solved to global optimality in finite time; therefore, each iteration of the SQP method takes finite time. The solution of the underlying QP approximation is then used to build the next iterate, for which another QP is solved, hence the name Sequential Quadratic Programming. SQP presents good convergence properties, converging quadratically to the optimal solution when the active set does not change [43]. The implementation of SQP used in the present work is that of the function fmincon [44], from the Optimization Toolbox™ of MATLAB®.
In order to take into account the dependence between cases and demises in the same region, we initially estimate the function describing the number of cases and then impose two additional constraints for the function representing the behavior of the demises. Considering that the demises will necessarily occur after the infections, the two additional constraints impose that both the date of maximum daily occurrences and the settling date for the demises must occur after the corresponding dates in the function describing the number of cases.
2.3. Multiple sigmoids
A second wave of spread cannot be ruled out. On the contrary, researchers argue that lifting the social distancing measures might indeed lead to a resurgence of infections [45], [46], [47], [48].
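In order to describe the occurrence of multiple epidemiological waves, we propose to employ a sum of sigmoids. For this purpose, let $n$ be the adopted number of sigmoids. Eq. (1) is then generalized to
$$f(t) = \sum_{j=1}^{n} f_j(t), \qquad (11)$$
where
$$f_j(t) = \frac{A_j}{\left(1 + \nu_j\, e^{-\delta_j (t - t_{p,j})}\right)^{1/\nu_j}}. \qquad (12)$$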
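Similarly, the vector of parameters $\theta$, originally given by (9), is generalized to a column vector with $4n$ parameters, where $n$ is the number of sigmoids, defined as:
$$\theta = [\theta_1^T \;\; \theta_2^T \;\; \cdots \;\; \theta_n^T]^T, \qquad (13)$$
where
$$\theta_j = [A_j \;\; t_{p,j} \;\; \nu_j \;\; \delta_j]^T, \quad j = 1, \ldots, n. \qquad (14)$$
With these extended definitions, Eq. (10) can still be used to estimate the value of $\theta$ by applying the inequality constraints to the parameters of each sigmoid, $j = 1, \ldots, n$.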
It is important to establish the number of sigmoids. For this purpose, the number of switches between deceleration and acceleration phases is evaluated. The rationale behind this assessment is that each sigmoid results in a single acceleration phase and a single deceleration phase, with a clear switching point between them, as discussed in Section 2.1. Therefore, the number of sigmoids can be estimated by counting the number of switches from a deceleration to an acceleration phase. However, this counting requires careful consideration, as one is dealing with real, noisy data. Moreover, recall that identifying acceleration/deceleration requires the second derivative of the cumulative number of either infected or deceased individuals. As is well known, differentiation tends to amplify the effect of noise in the measurements [39]. Therefore, to prevent noise from artificially increasing the number of switches, a common approach is to consider a deadzone [39] in the difference between the acceleration and deceleration.
Let be the set of switching instants from a deceleration to an acceleration phase. Then, for each , the following logic is used to implement an identification of switches with a deadzone:
| (15) |
where the parameter can be adjusted to provide a compromise between noise and detection sensitivity. In the present work, the value was set to persons/day 2.
Thus, the number of sigmoids $n$ is given by the cardinality of the set of switching instants, here denoted $\mathcal{S}$, plus one:
$$n = |\mathcal{S}| + 1. \qquad (16)$$
The value of 1 refers to the first sigmoid.
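The counting procedure can be sketched as follows. Since the exact switching logic of (15) is not reproduced here, the code implements one plausible reading: the acceleration is approximated by second differences of the cumulative series, values inside the deadzone leave the current phase unchanged, and each change from a deceleration phase to an acceleration phase counts as one switch. The function names, the test signal and the deadzone value of 0.5 are illustrative assumptions.

```python
import math


def count_sigmoids(cumulative, deadzone):
    """Estimate the number of waves in a cumulative series using a deadzone."""
    # Second differences approximate the acceleration (occurrences/day^2).
    accel = [cumulative[i + 1] - 2.0 * cumulative[i] + cumulative[i - 1]
             for i in range(1, len(cumulative) - 1)]
    phase = None      # "acc" or "dec"; None until the deadzone is first left
    switches = 0
    for a in accel:
        if a > deadzone:
            if phase == "dec":
                switches += 1   # deceleration -> acceleration: a new wave
            phase = "acc"
        elif a < -deadzone:
            phase = "dec"
        # values inside the deadzone keep the current phase
    return switches + 1         # the extra 1 accounts for the first sigmoid


def logistic(t, A, t_p, delta):
    """Symmetric sigmoid used only to build a synthetic test signal."""
    return A / (1.0 + math.exp(-delta * (t - t_p)))


# Two well-separated synthetic waves, with and without alternating noise.
two_waves = [logistic(t, 1000.0, 50.0, 0.15) + logistic(t, 1000.0, 150.0, 0.15)
             for t in range(221)]
noisy = [v + 0.1 * (-1) ** i for i, v in enumerate(two_waves)]

print(count_sigmoids(two_waves, deadzone=0.5))
print(count_sigmoids(noisy, deadzone=0.5), count_sigmoids(noisy, deadzone=0.0))
```

Without a deadzone, the alternating noise in the flat portions of the series produces many spurious switches, which is the effect the deadzone is meant to suppress.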
Recall, from Table 1, that there are three parameters of interest. The final number of occurrences may be obtained as:
$$A = \sum_{j=1}^{n} A_j, \qquad (17)$$
where $A_j$ is the final value of the $j$-th of the $n$ sigmoids. On the other hand, when a sum of sigmoids is used, there are no analytical expressions to determine the other two parameters of interest, i.e., the date of maximum number of daily occurrences $t_p$ and the settling date $t_s$. In this case, a numerical search algorithm has to be used to find each of these parameters.
The optimization problems to determine these parameters can be posed as follows:
| (18) |
| (19) |
These two optimization problems are solved using the Nelder–Mead algorithm [49]. It should be noted that each problem has only one independent variable (time). Therefore, the search algorithm converges very quickly to the desired solution.
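The sketch below illustrates the numerical search for a sum of sigmoids. It does not use the Nelder–Mead algorithm: as simpler standard-library stand-ins, the date of maximum daily occurrences is found by scanning integer days, and the settling date is found by bisection, which applies because the cumulative curve is monotonically increasing. The sigmoid form, the 98% settling ratio applied to the summed final value, and all names and values are assumptions.

```python
import math


def asymmetric_sigmoid(t, A, t_p, nu, delta):
    """Assumed form of a single asymmetric sigmoid."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)


def daily_rate(t, A, t_p, nu, delta):
    """Derivative of the assumed sigmoid (occurrences per day)."""
    u = math.exp(-delta * (t - t_p))
    return A * delta * u / (1.0 + nu * u) ** ((nu + 1.0) / nu)


def total(t, thetas):
    """Sum of sigmoids; each theta is (A, t_p, nu, delta)."""
    return sum(asymmetric_sigmoid(t, *theta) for theta in thetas)


def final_value(thetas):
    """Final number of occurrences: the sum of the individual final values."""
    return sum(theta[0] for theta in thetas)


def peak_day(thetas, first_day, last_day):
    """Integer day with the largest summed daily rate (exhaustive scan)."""
    return max(range(first_day, last_day + 1),
               key=lambda t: sum(daily_rate(t, *theta) for theta in thetas))


def settling_day(thetas, lo, hi, alpha=0.98, tol=1e-6):
    """Bisection for the day on which the sum reaches alpha times its final value."""
    target = alpha * final_value(thetas)
    if not total(lo, thetas) < target < total(hi, thetas):
        raise ValueError("the interval [lo, hi] does not bracket the settling day")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid, thetas) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two illustrative (made-up) waves.
waves = [(1000.0, 50.0, 1.0, 0.15), (2000.0, 150.0, 1.0, 0.15)]
print(peak_day(waves, 0, 300))
print(round(settling_day(waves, 0.0, 400.0), 2))
```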
The rationale to select the use of one or multiple sigmoids will be explained in Section 2.6, which discusses the complexity of the model.
2.4. Criteria for statistical analysis of the matching between the fitted curve and the data
Two criteria are used to evaluate the degree of fidelity of the fitted curves to the data. The first is the so-called Root Mean Square Error (RMSE), defined as:
| (20) |
| (21) |
From (20) the name of RMSE becomes clear, as it involves the square root of the mean of the squared error (MSE). Notice that, in (21), the values of the curve with the optimal parameters are used to calculate the error between the data and the value returned by the fitted curve. Moreover, the term reflects the number of terms in the summation, as the index starts at and ends at . The RMSE is used in statistical analysis to measure compactly the degree of fidelity between the fitted curve and the data. The lower the value of the RMSE, the better the fitted curve matches the data [50].
In this paper, a normalized version of the RMSE is used, obtained as:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{A}, \qquad (22)$$
where $A$ is the final number of occurrences, as defined in (1) and (17) for one and multiple sigmoids, respectively. This normalization is adopted to allow a fair comparison of the RMSE of different curves.
A second criterion to determine the quality of the representation of the data by the fitted curve, commonly applied in statistics, is the squared correlation coefficient, which varies between $0$ and $1$, with the latter meaning that there exists a perfect linear relationship between the data and the fitted curve points, whereas the former means that there is no linear relationship at all. First, let us define the covariance between the data and the fitted curve as
| (23) |
where and are the mean values of and , respectively, i.e.
| (24) |
in which the symbol can be replaced by either of and , yielding and , respectively. Similarly, the variances of and are
| (25) |
| (26) |
The squared correlation coefficient can then be determined from (23)–(26) as:
| (27) |
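The two criteria of this subsection can be computed as in the following sketch, which uses the textbook definitions of the RMSE and of the squared correlation coefficient and assumes that the normalized RMSE is the RMSE divided by the final number of occurrences. The function name, the returned dictionary keys and the sample values are illustrative.

```python
import math


def fit_metrics(data, fitted, final_value):
    """Return RMSE, normalized RMSE and squared correlation coefficient."""
    n = len(data)
    if n == 0 or n != len(fitted):
        raise ValueError("data and fitted must be non-empty and of equal length")
    mse = sum((y - f) ** 2 for y, f in zip(data, fitted)) / n
    rmse = math.sqrt(mse)

    mean_y = sum(data) / n
    mean_f = sum(fitted) / n
    cov = sum((y - mean_y) * (f - mean_f) for y, f in zip(data, fitted)) / n
    var_y = sum((y - mean_y) ** 2 for y in data) / n
    var_f = sum((f - mean_f) ** 2 for f in fitted) / n
    if var_y == 0.0 or var_f == 0.0:
        raise ValueError("the correlation is undefined for a constant series")

    return {"rmse": rmse,
            "nrmse": rmse / final_value,
            "r2": cov ** 2 / (var_y * var_f)}


# A fitted curve offset by one unit: perfectly correlated, but RMSE = 1.
print(fit_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], final_value=10.0))
```

The example shows why both criteria are reported: a constant offset leaves the squared correlation coefficient at one, whereas the RMSE reveals the mismatch.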
2.5. Criteria for assessment of convergence of the sigmoid towards the final value
Additional criteria are defined to evaluate whether the data are enough to allow the convergence of the estimated values of the parameters . This is carried out by fitting the sigmoid curves to the data for each possible value of . Thus, instead of using all available data as in (10), windows of varying length are used; the minimum length of a window is adopted as 10 to ensure a minimum amount of data to calibrate the curve. Therefore, the sigmoid parameters are estimated within different windows as
| (28) |
where , depending on the number of sigmoids.
The main parameters in Table 1 are then determined from as follows:
-
•
For a single sigmoid, and are directly extracted from in view of (9), whereas is calculated by (4) employing , and extracted from considering (9).
-
•
For multiple sigmoids, (17)–(19) are used to determine , and .
Then, the relative variation of the estimated values of these parameters is calculated for each time window and multiplied over the time window, composing indices to evaluate whether the data are enough to confirm the suitability of the fitted sigmoid. These indices are defined as
| (29) |
where the generic parameter symbol represents one of the parameters of interest, namely the final number of occurrences $A$, the date of maximum daily occurrences $t_p$, or the settling date $t_s$, estimated from a time window containing a given number of days of data. It is clear that, if the data are sufficient and a suitable set of parameters is found, then each of the terms in the product in (29) approaches one. Therefore, the closer the index is to one, the better the fit. Moreover, the minimum operator in (29) ensures that each term in the product is less than or equal to one, from which it follows that the index itself never exceeds one. Analyzing the index for different window sizes makes it possible to conclude whether convergence has occurred. We adopt windows of 7, 14 and 21 days, in order to verify the stability of the predictions over the last one, two and three weeks.
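One plausible reading of such an index is sketched below: for a sequence of estimates of a given parameter, obtained with one more day of data at each step, the index multiplies, over the last $k$ days, the smaller of the two ratios between consecutive estimates. This is an assumption about the structure of (29), not a transcription of it, and the sketch presumes strictly positive estimates. The function name and sample values are hypothetical.

```python
def convergence_index(estimates, k):
    """Product over the last k steps of min(prev/curr, curr/prev).

    estimates : successive estimates of one parameter (oldest first),
                assumed strictly positive
    k         : number of most recent steps to include (e.g. 7, 14 or 21)
    """
    if k < 1 or len(estimates) < k + 1:
        raise ValueError("at least k + 1 estimates are required")
    recent = estimates[-(k + 1):]
    index = 1.0
    for previous, current in zip(recent, recent[1:]):
        # Each factor is at most 1 and equals 1 only if the estimate is unchanged.
        index *= min(previous / current, current / previous)
    return index


# Estimates of the final number of occurrences that stabilize over time.
history = [1500.0, 1300.0, 1150.0, 1100.0, 1090.0, 1088.0, 1088.0, 1088.0]
print(convergence_index(history, k=3))   # close to 1: recent estimates are stable
print(convergence_index(history, k=7))   # lower: early estimates still drifted
```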
2.6. Selecting the complexity of the model
From the previous discussion, it is possible to choose among different curve types (symmetric or asymmetric) and numbers (single or multiple sigmoids). This plays an important role both in the accuracy of the fit and in the complexity of the models (as per the different amounts of parameters to be estimated with each choice).
It should be noted that choosing a more complex model without a significant increase in accuracy may lead to overfitting, that is, fitting the training data so closely that the generalization of the model predictions is compromised [51]. In order to avoid this problem, this paper employs a generalized cross validation (GCV) approach.
The generalized cross validation index (GCVI) is defined as [52]:
$$\mathrm{GCVI} = \frac{\mathrm{MSE}}{\left(1 - p/N\right)^{2}}, \qquad (30)$$
where MSE is the mean squared error defined in (21), $N$ is the number of points used to calibrate the model and $p$ is the number of free parameters. Each asymmetric sigmoid contains four free parameters ($A$, $t_p$, $\nu$, $\delta$), whereas a symmetric sigmoid contains only three ($A$, $t_p$, $\delta$); note that, in a symmetric sigmoid, $\nu$ is constrained to 1.
A more complex model will generally lead to a lower MSE, but also to a larger number of free parameters. The choice of the model with the minimum GCVI value allows a compromise between accuracy and complexity [52]. With this choice, a more substantial gain in the accuracy of the fit has to be obtained to justify a more complex curve.
Particularly for cases of regions where the contagion is in its early stage, there are not enough data to observe a deceleration phase. Therefore, in this case the data are insufficient to support estimation of the asymmetric curves. In these situations, the symmetric curves can be used in the fitting and an automated decision of whether to present results with a symmetric or an asymmetric curve is required. This decision is taken by comparing the GCVI values of the symmetric and asymmetric curves and selecting the one with the lower GCVI value.
Similarly, given the number of sigmoids $n$, two fits are performed, using either one or $n$ sigmoids, and the corresponding GCVI values are calculated. The model with the lower GCVI is then selected.
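The selection rule can be sketched in a few lines. The code assumes the common form of the generalized cross validation index, $\mathrm{MSE}/(1 - p/N)^2$, where $p$ is the number of free parameters and $N$ is the number of calibration points; the function names and the MSE values are made up for illustration.

```python
def gcvi(mse, n_points, n_params):
    """Generalized cross validation index in the assumed form MSE / (1 - p/N)^2."""
    if n_params >= n_points:
        raise ValueError("the number of points must exceed the number of parameters")
    return mse / (1.0 - n_params / n_points) ** 2


def select_model(candidates, n_points):
    """Return the name of the candidate with the lowest GCVI.

    candidates maps a model name to a pair (mse, number of free parameters).
    """
    return min(candidates,
               key=lambda name: gcvi(candidates[name][0], n_points,
                                     candidates[name][1]))


# A small gain in MSE does not justify the extra parameter...
print(select_model({"symmetric": (4.00, 3), "asymmetric": (3.95, 4)}, n_points=50))
# ...whereas a substantial gain does.
print(select_model({"symmetric": (4.00, 3), "asymmetric": (3.00, 4)}, n_points=50))
```

The penalty grows as the number of parameters approaches the number of points, so the rule is most conservative in the early stage of an outbreak, when few data are available.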
2.7. Uncertainty quantification
In order to quantify the uncertainty associated to the predictions, the behavior of the model in the last month is analyzed. For this purpose, the software follows the methodology explained in Section 2.6 to calculate the values of , for , that is, the final number of occurrences predicted with data available on the last day of the analysis and on 10, 20 and 30 days prior to this date.
The ratio is defined as:
| (31) |
and the overall uncertainty is described as:
| (32) |
The curves representing the minimum and maximum predicted number of occurrences are sigmoids whose parameters and are:
| (33) |
| (34) |
The remaining parameters ($t_p$, $\nu$, $\delta$) are the same as those of the nominal sigmoid.
Such sigmoids are considered only in the future, that is, for .
2.8. Summary
The methodology proposed in the current paper is summarized by the flowchart presented in Fig. 2.
Fig. 2.
Flowchart summarizing the methodology proposed in the current paper.
2.9. Implementation
The computer program described in this paper was developed using MATLAB® 2020a, with the Optimization Toolbox™ and the MATLAB Compiler™.
The program uses as data source reports published in spreadsheet format in the websites of the European Centre for Disease Prevention and Control (ECDC) [53], of Johns Hopkins University [54] and of the Brasil.io project [55].
The ECDC reports contain country-wise data for countries around the world, while the reports of Johns Hopkins University and of the Brasil.io project present data on United States counties and on Brazilian states and cities, respectively.
The data report the number of newly infected and deceased people on each date. These numbers are reported separately for each region, which allows an independent analysis of each of them.
3. Results
3.1. The graphical user interface
The computer program may be downloaded from the following link, where the data files updated until 12-Aug-2020 are also available.
The folder contains a “readme” file, which explains the main features and the preliminary steps to use the program. We emphasize that the program may be installed and run directly from the operating system, regardless of whether the user has a licensed MATLAB® installation. Users who have MATLAB® and the required packages installed may instead run a different file from the package directly, without any installation.
A Graphical User Interface (GUI), illustrated in Figs. 3 and 4, will appear. These figures present the main screen of the GUI with data from the European Centre for Disease Prevention and Control and from Johns Hopkins University, respectively. The figures were zoomed in to make their contents easier to read; this is why the names of some U.S. counties appear truncated and why the predictions at the bottom of the figures appear incomplete.
Fig. 3.
Main screen of the Graphical User Interface (GUI), with worldwide data from the European Centre for Disease Prevention and Control (ECDC). Note the list of countries on the left side.
Fig. 4.
Main screen of the Graphical User Interface (GUI), with U.S. data from Johns Hopkins University. Note the list of counties on the left side and the search for “Illinois” on the upper left corner.
When running the GUI for the first time, the user is advised to initially select the option “File: Download New Data File” of the main menu, as described in further detail below.
The GUI contains a main menu with the following options:
• “File: Load Data File” — This option is used to load data from a spreadsheet in the standard formats defined by the ECDC, by Johns Hopkins University and by the Brasil.io project. Upon first reading, the spreadsheet will be converted to a MATLAB® data file with extension .mat, in order to speed up the following readings. When the interface is opened, the last data file is automatically reloaded.
• “File: Last Data Files” — This option is used to reload one of the last data files, as illustrated in Fig. 5.
Fig. 5.
“File: Last Data Files” option from the main menu.
• “File: Download New Data File” — This option shows the menu presented in Fig. 6, where the user may select one of the following websites: ECDC, Johns Hopkins University or Brasil.io project. The user may choose to download the data file directly or to access one of these sites, using the default web browser. It is easier to choose the automatic download; however, we opt to also provide an option to access the websites as an acknowledgment of the work performed by the people responsible for them.
Fig. 6.
Options to download new data files and to access the corresponding websites.
• “File: Quit” — This is a standard option to close the interface.
• “Analysis: Run Trend Analysis” — This option runs the trend analysis, as described in the previous section, and presents its results in an external figure. Examples of results of this analysis are presented in a following subsection.
• “Figures: Export Figure” — This option is used to export the graphs of the main screen to a new figure, making them easier to edit and to copy into external software.
• “Figures: Close all Figures” — This option closes all external figures.
• “About: Info” — This option shows updated information about the interface. It also contains an acknowledgment of the sites used as data sources.
On the left of the main screen (Fig. 3, Fig. 4), there is a list of all available regions, which will be henceforth called the region list. In this list, when analyzing data from the ECDC or from Johns Hopkins University, the user can select the name of the desired country or of the desired U.S. county (in English); when analyzing data from the Brasil.io project, the user can select the name of the Brazilian states and cities (in Portuguese). Brazilian states are identified by their two-letter acronym. The names of the cities are presented without accents; for instance, the cities of “São Paulo”, “Santa Bárbara d’Oeste” and “Santa Fé” are identified as “Sao Paulo”, “Santa Barbara d’Oeste” and “Santa Fe”, respectively.
Above the region list, there is an edit field where the user can type the name of a region to look for in the list. The user can type the region name in full or in part, and may also employ regular expressions. Furthermore, a vertical bar “|” can be used as the “or” operator to search for more than one region; for instance, fran|germ|italy will restrict the countries in the list to France, Germany and Italy. An empty string in the search field restores the complete list of regions. Note the search for “Illinois” in Fig. 4.
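The search behavior described above can be sketched in a few lines of Python. This is only an illustration, not the code of the MATLAB program: the function name `filter_regions` and the sample region names are hypothetical, and case-insensitive matching is an assumption suggested by the lowercase example query.

```python
import re


def filter_regions(regions, query):
    """Return the regions whose names match the query.

    The query is treated as a regular expression, so "|" acts as the
    "or" operator. Matching is case-insensitive (an assumption) and may
    hit any part of the name. An empty query restores the full list.
    """
    if not query.strip():
        return list(regions)
    pattern = re.compile(query, re.IGNORECASE)
    return [name for name in regions if pattern.search(name)]


# Hypothetical sample of the region list
regions = ["Brazil", "France", "Germany", "Italy", "Spain"]

selected = filter_regions(regions, "fran|germ|italy")
everything = filter_regions(regions, "")
print(selected)
print(everything)
```

Because `pattern.search` matches anywhere in the name, partial names such as "fran" are enough to select "France".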
Below the region list, there are three options: “Show predictions”, “Show uncertainties” and “Show dates”. “Show predictions” is used to enable or disable the mathematical modeling (if disabled, only historical data will be shown). “Show uncertainties” is used to calculate and present the uncertainty cone. If “Show dates” is disabled, then sequential numbers are shown in the graphs’ axes, instead of dates.
In the lower left corner of the GUI, there is an edit field where the user can specify the number of days for testing. For instance, if the user specifies a value of 7 days, then the model is calibrated with all data available until one week before the data acquisition, and the remaining days are used to test the model, allowing a comparison between the predictions of the model and the observed data.
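The role of the number of testing days can be illustrated with a minimal hold-out split. The sketch below is written in Python for illustration only; the function name `split_for_testing` and the sample values are hypothetical and are not taken from the program.

```python
from datetime import date, timedelta


def split_for_testing(dates, values, n_test_days):
    """Split a time series into calibration and testing parts.

    The last n_test_days samples are held out for testing; all earlier
    samples are used to calibrate the model.
    """
    if n_test_days < 0 or n_test_days >= len(values):
        raise ValueError("n_test_days must be in [0, len(values))")
    cut = len(values) - n_test_days
    return (dates[:cut], values[:cut]), (dates[cut:], values[cut:])


# Hypothetical accumulated series covering ten consecutive days
dates = [date(2020, 8, 3) + timedelta(days=i) for i in range(10)]
values = [100, 108, 115, 121, 126, 130, 133, 135, 136, 137]

(train_dates, train_values), (test_dates, test_values) = split_for_testing(
    dates, values, n_test_days=3
)
print("calibrated until:", train_dates[-1])
print("tested from:", test_dates[0])
```

The model would then be fitted to the calibration part only, and its predictions compared against the held-out values.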
The main screen presented in Fig. 3, Fig. 4 contains four graphs, representing the accumulated (top) and daily (bottom) number of cases (left) and demises (right). Each graph contains historical data and theoretical curves representing them. Observed data are presented using either circles (accumulated values) or bars (daily values). The model output is represented by the continuous blue line.
Below the graphs, the following predictions are presented, for either cases or demises: final number, date of maximum daily occurrences and settling date (as defined in Table 1). Furthermore, the equation of the best sigmoid (or set of sigmoids, in case more than one wave is identified) matching the accumulated data is presented, as well as the indices RMSE and R^2, defined in the previous section. The predictions are presented in editable fields, so that the user can copy their texts and paste them in external software. An example of such information is presented in Table 2.
Table 2.
Example of the information presented at the bottom of the main screen.
| Canada | Canada |
| ———— | ———— |
| Predictions: | Predictions: |
| Final number of cases: 122407 | Final number of demises: 9030 |
| Date of maximum daily cases: 23-Apr-2020 | Date of maximum daily demises: 02-May-2020 |
| Settling date (cases): 11-Aug-2020 | Settling date (demises): 18-July-2020 |
| ———— | ———— |
| Number of accumulated cases on 12-Aug-2020: 120406 | Number of accumulated demises on 12-Aug-2020: 8991 |
| ———— | ———— |
| Prediction with data from 12-Mar-2020 to 12-Aug-2020 | Prediction with data from 12-Mar-2020 to 12-Aug-2020 |
| Curve: f(t)=(109370 / (1 + 3.56e−02 exp(-((t-42.34)/23.24)))^(1/3.56e−02)) + (13037 / (1 + exp(-(t-137.34)/7.67))) | Curve: f(t)=(9030 / (1 + 5.93e−02 exp(-((t-51.30)/19.77)))^(1/5.93e−02)) |
| Normalized RMSE (cases): 4.904e−03 | Normalized RMSE (demises): 5.499e−03 |
| R^2 (cases): 0.9998 | R^2 (demises): 0.9998 |
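The curve reported in Table 2 for the cases in Canada can be evaluated directly. The sketch below, in Python, transcribes the printed equation, which has the form of a sum of generalized logistic functions. The function names are hypothetical, and the time origin of t is not restated in the table, so the code makes no assumption about which calendar date corresponds to t = 0.

```python
import math


def generalized_logistic(t, final, nu, t0, tau):
    """Evaluate final / (1 + nu * exp(-(t - t0) / tau)) ** (1 / nu)."""
    return final / (1.0 + nu * math.exp(-(t - t0) / tau)) ** (1.0 / nu)


def canada_cases(t):
    """Accumulated cases for Canada, using the coefficients from Table 2."""
    first_wave = generalized_logistic(t, 109370, 3.56e-2, 42.34, 23.24)
    # The second term in Table 2 is a plain logistic, i.e. nu = 1
    second_wave = generalized_logistic(t, 13037, 1.0, 137.34, 7.67)
    return first_wave + second_wave


# For large t the curve tends to the sum of the two amplitudes,
# which matches the final number of cases listed in Table 2.
limit = canada_cases(1000.0)
print(round(limit))
```

The limit of the curve for large t equals 109370 + 13037 = 122407, the final number of cases shown in the table.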
3.2. Illustrative results — time series
In order to illustrate the use of the tool to perform predictions, Fig. 7, Fig. 8, Fig. 9 show the model results for the country of Bolivia, the U.S. county of Los Angeles and the Brazilian capital city of Brasilia, respectively. Data updated on 12-Aug-2020 were used. The number of testing days was set to 21, meaning that the model was calibrated with data until 21-July-2020 and the following three weeks were used to compare the predicted and observed values.
Fig. 7.
Graphs of cases and demises for the country of Bolivia. The figure shows accumulated (top) and daily (bottom) occurrences.
Fig. 8.
Graphs of cases and demises for the U.S. county of Los Angeles. The figure shows accumulated (top) and daily (bottom) occurrences.
Fig. 9.
Graphs of cases and demises for the Brazilian capital city of Brasilia. The figure shows accumulated (top) and daily (bottom) occurrences.
3.3. Illustrative results — trend analysis
As previously mentioned, the trend analysis is run when the user selects the corresponding option in the main menu. Examples of figures resulting from such analysis are presented in Fig. 10, Fig. 11, Fig. 12, which correspond to the Brazilian city of São Paulo (SP), the US county of Cook, Illinois, and the Brazilian state of São Paulo, respectively. Each of these figures contains three subfigures, showing the predicted value of the three parameters of interest described in Table 1.
Fig. 10.
Trend analysis results for the Brazilian city of São Paulo (SP). The abscissa indicates the date of the estimation.
Fig. 11.
Trend analysis results for the U.S. county of Cook, Illinois. The abscissa indicates the date of the estimation.
Fig. 12.
Trend analysis results for the Brazilian state of SP (São Paulo). The abscissa indicates the date of the estimation.
The abscissa of the graphs indicates the date of the estimation, meaning that all data available until that date were used to estimate the value of the parameter under study. It can be seen that, as expected, the values of the estimated parameters vary with the amount of data used to estimate them.
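The idea of re-estimating a parameter with all data available until each date can be sketched as an expanding-window loop. The Python code below is illustrative only. The estimator `toy_final_number` is a hypothetical stand-in that extrapolates geometrically decaying daily increments; it is not the sigmoid fit used by the program, and the data are invented for the example.

```python
def trend_analysis(dates, cumulative, estimator, min_points=3):
    """Re-estimate a quantity using all data available up to each date.

    Returns a list of (date_of_estimation, estimate) pairs.
    """
    results = []
    for end in range(min_points, len(cumulative) + 1):
        estimate = estimator(cumulative[:end])
        results.append((dates[end - 1], estimate))
    return results


def toy_final_number(cumulative):
    """Hypothetical estimator: assume daily increments decay geometrically."""
    previous = cumulative[-2] - cumulative[-3]
    last = cumulative[-1] - cumulative[-2]
    if previous <= 0 or last >= previous:
        return float("nan")  # no decay observed, extrapolation impossible
    ratio = last / previous
    # Sum of the remaining geometric series of daily increments
    return cumulative[-1] + last * ratio / (1.0 - ratio)


# Invented accumulated data whose daily increments halve every day
dates = ["day 1", "day 2", "day 3", "day 4", "day 5"]
cumulative = [0, 8, 12, 14, 15]

trend = trend_analysis(dates, cumulative, toy_final_number)
for estimation_date, estimate in trend:
    print(estimation_date, estimate)
```

Plotting the estimates against the date of estimation gives a graph of the same kind as the subfigures discussed here: a flat line indicates a stable prediction, while drifts or oscillations indicate that new data are still changing the estimate.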
In the title of each subfigure, the values of the stability indices are presented, indicating how stable each prediction is over the last one, two and three weeks, respectively. A value closer to one indicates a more stable prediction.
4. Discussion
Figs. 7, 8 and 9 indicate that the model represents the training data well and that the observed accumulated values (subfigures (a) and (b)) closely follow the values predicted by the model. In the first week ahead, the observed results are very close to the predicted ones. In the following weeks, the results are still close, but the observed values start to drift away from the model outputs, although they remain inside the established tolerance. In fact, forecasts are expected to diminish in accuracy over time. Nevertheless, as will be discussed below, our methodology presents parameters to assess the quality of the predictions and to analyze their trend.
When analyzing the daily experimental curves (subfigures (c) and (d)), it can be seen that there are high-amplitude fluctuations, which may be ascribed to the nature of the observed data and may be associated with non-uniform delays in the official notifications of contaminations and demises. These fluctuations in amplitude are similar to the sensor noise observed when analyzing physical data. In the study of dynamical systems, it is known that integral operations are robust to the presence of noise. By analogy, since our methodology uses the accumulated data (and not the daily values) to estimate the model parameters, it can be concluded that it is less sensitive to daily fluctuations in the data.
The proposed model is able to identify and represent as many epidemiological waves as necessary. For instance, the two peaks observed in the model representation of the number of daily cases in Fig. 4 indicate a clear occurrence of two epidemiological waves in the U.S. county of Cook, Illinois — the daily cases were decreasing until the second week of June, and then started to consistently increase again. On the other hand, only one wave is observed in the Brazilian city of Brasilia (Fig. 9(c)). To the best of the authors’ knowledge, no more than two epidemiological waves have been reported in any region yet. However, it may still happen, especially if there are frequent changes in the public policies, imposing and relieving containment actions. The model is ready to represent this behavior.
Fig. 10, Fig. 11, Fig. 12, with the results of the trend analysis, indicate how the prediction of each parameter of interest varies with time.
Fig. 10-(a) represents the final number of predicted demises in the city of São Paulo (SP). It can be seen that there are oscillations in the predictions until May 5th. These oscillations result from the inclusion of new data and are expected to occur when the pandemic is at an early stage. From May 5th to June 12th, there is a clear increasing tendency in the predicted number of demises. On June 13th, the prediction stabilizes around approximately 15000 demises. Finally, on July 2nd, the predicted number of demises reaches approximately 12600 and has remained stable around this value ever since. Likewise, Figs. 10-(b) and 10-(c) indicate a stable prediction of the date of maximum daily demises and of the settling date since July 2nd.
Similarly, Fig. 11-(a) indicates that the dynamics of the pandemic in the U.S. county of Cook, Illinois, followed a similar pattern. There were oscillations until the end of April, followed by an increasing trend until May 17th, a slightly decreasing tendency and finally a stabilized predicted value of approximately 5000 demises since June 17th.
On the other hand, Fig. 12 indicates that the pandemic is not yet stabilized in the Brazilian state of SP (São Paulo). An increasing tendency can be seen over the last weeks. For instance, Fig. 12-(a) shows that the predicted number of demises was approximately 37000 on July 14th and changed to 44000 on August 11th, indicating an increase of approximately 20% in 28 days.
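The relative increase quoted above can be checked with a one-line computation. The Python sketch below takes the two rounded predictions at face value; the function name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Predicted final number of demises on July 14th and on August 11th
growth = percent_change(37000, 44000)
print(f"{growth:.1f}% in 28 days")  # prints "18.9% in 28 days"
```

The result, about 18.9%, is consistent with the rounded figure of approximately 20% given in the text.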
The stability of such predictions over the last one, two and three weeks may be verified by the values of the stability indices shown in the title of each figure. It can be seen that these values are close to one, indicating stabilized or near-to-stabilization predictions.
These indices are intended to represent the convergence of the estimations. As an additional feature, they may also be used as a measurement of the quality of the prediction: higher values indicate that the pandemic has been following the same predicted behavior over the last weeks. For instance, the values presented in Figs. 10-(a) and 12-(a) can be compared for the city and for the state of São Paulo, respectively. One may infer from these numbers that the predictions obtained in the last three weeks are more stable in the city of São Paulo than in the state with the same name. This is the same conclusion that was reached by analyzing the curves, as described in the previous paragraphs.
It should be emphasized that the values in Figs. 10-(a), 11-(a) and 12-(a) do not refer to the number of demises on the day of the analysis, but rather to the predicted final number of demises, estimated on the basis of all data available until that date. This is an innovative approach for trend analysis in this context and, to the best of the authors’ knowledge, has not been proposed before.
Additionally, the same analyses presented here for the number of demises can be run for the number of infected people.
Typical trends observed in this kind of analysis are (a) oscillations, (b) increasing values and (c) stabilized values. High-amplitude oscillations may occur at the beginning of the pandemic and do not allow any conclusion to be drawn; however, they usually disappear after the first few weeks. Increasing values indicate a need for more compelling action by the authorities, while stabilized values indicate that the pandemic is under control.
The stabilized results for the city of São Paulo and for the county of Cook, Illinois, allow us to conclude that the actions of the local governments to control the pandemic are taking effect. It is of public interest to determine how the disease will spread in each city after the restriction measures are alleviated. For this purpose, the trend analysis should be run again. Should a new increasing tendency be observed, the authorities would be advised to reinstate some containment measures.
It is important to point out that these results, although helpful, should be validated by medical experts and not be considered alone when deciding public policies.
The trend analysis may be run for a country, a state, a county or a city. It provides more useful information when it is run for smaller administrative regions such as a county or a city, because it supports decisions by local authorities based on data specific to the region under consideration.
A limitation of the proposed approach is that it is not adequate to analyze the pandemic in very small cities or counties, because the number of infections and demises is usually very low and does not allow a good fit by the mathematical model proposed here. However, for medium- and large-sized cities or counties, informative results are expected, such as those presented here for the U.S. counties of Cook, Illinois and Los Angeles, California and for the Brazilian cities of Brasilia (DF) and São Paulo (SP).
Moreover, caution must be taken in using the forecast capability, especially for longer time intervals. For instance, the occurrence of a second wave can be detected with our proposed method only once enough data are available, i.e., forecasting a second wave is not possible. After detection, enough data must be fed to the fitting procedure to achieve a trustworthy estimate of the magnitude of this possible new wave, similarly to the analysis at the beginning of a first wave that we showed. Therefore, we do not advise using the tool to support claims that the pandemic is over. It must remain an auxiliary tool to assess the number of infected/deceased individuals in a short period after the last fit, as shown in our results. We also emphasize that updating the model periodically is recommended.
The results presented here are illustrative and correspond to the scenario on the date when the data were acquired, that is, on 12-Aug-2020. These analyses should always employ updated data to increase their reliability. Therefore, the authors recommend that these studies be repeated periodically, at least on a weekly basis. The developed computer program makes this task easy to perform.
5. Conclusion
This paper proposed a methodology and a computational tool to forecast the COVID-19 pandemic throughout the world, providing useful resources for health-care authorities. A user-friendly Graphical User Interface (GUI) in MATLAB® was developed and can be downloaded online for free use. An innovative approach for trend analysis was presented.
Resources in the computational tool allow the user to quickly run analyses for the desired regions. Additional options give access to the official websites of the European Centre for Disease Prevention and Control, of Johns Hopkins University and of the Brasil.io project, in order to download new data as soon as they are published online. To this date, these institutions have been updating their reports on a daily basis.
The analyses run by the program are intended only as an aid and the results should be interpreted with care. They do not replace a careful analysis by experts. Nevertheless, such results may be a very useful tool to assist the authorities in their decision-making process.
The proposed program is in continuous development, and features added in the future will be published and described on the project webpage. The authors would appreciate any feedback and suggestions to improve the computational tool.
The program, in its current version, is able to process detailed information about U.S. counties and about Brazilian states and cities. These two countries were chosen because they have continental dimensions and are currently the focus of the COVID-19 pandemic. Nevertheless, the same resource could be extended to other countries. For this purpose, the main requirement would be to write code to read the data files of other countries and convert them to the format recognized by the program, which is quite simple.
Future works can employ the same methodology and adapt the computer tool to describe the dynamics of other epidemics around the world. In the recent past, no pandemic was as severe as COVID-19, but there were occurrences of other diseases such as Influenza A and MERS-CoV. Should a similar epidemic occur again, the computer program described here would be a resourceful tool.
CRediT authorship contribution statement
Henrique Mohallem Paiva: Conceptualization, Data curation, Methodology, Project administration, Software, Validation, Formal analysis, Investigation, Writing, Visualization, Supervision.
Rubens Junqueira Magalhães Afonso: Conceptualization, Data curation, Methodology, Validation, Formal analysis, Investigation, Writing, Visualization.
Fabiana Mara Scarpelli de Lima Alvarenga Caldeira: Formal analysis, Writing.
Ester de Andrade Velasquez: Formal analysis, Writing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors acknowledge the European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University and the Brasil.io project for making the COVID-19 data publicly available and for allowing its use for research purposes.
Rubens Afonso acknowledges the support of CAPES, Brazil (fellowship proc. #88881.145490/2017-01) and the Federal Ministry for Education and Research of Germany through the Alexander von Humboldt Foundation, Germany.
Henrique Paiva acknowledges the support of the Sao Paulo Research Foundation FAPESP, Brazil.
References
- 1. Huang C., Wang Y., Li X., et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506. doi: 10.1016/S0140-6736(20)30183-5. (published correction appears in Lancet. 30 January 2020)
- 2. World Health Organization (WHO). 2020. Novel coronavirus – China. http://www.who.int/csr/don/12-January-2020-novel-coronavirus-china/en/. (Accessed 09 May 2020)
- 3. Geng H.Y., Tan W.J. A novel human coronavirus: Middle East respiratory syndrome human coronavirus. Sci. China Life Sci. 2013;56(8):683–687. doi: 10.1007/s11427-013-4519-8.
- 4. Linton N.M., Kobayashi T., Yang Y., et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. J. Clin. Med. 2020;9(2):538. doi: 10.3390/jcm9020538.
- 5. World Health Organization (WHO). 2020. Coronavirus disease (COVID-19). Situation report – 51. http://www.who.int/docs/default-source/coronaviruse/situation-reports. (Accessed 09 May 2020)
- 6. Wu J.T., Leung K., Leung G.M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet. 2020;395(10225):689–697. doi: 10.1097/01.ogx.0000688032.41075.a8. (published correction appears in Lancet. 4 February 2020)
- 7. Chinazzi M., Davis J.T., Ajelli M., et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. doi: 10.1126/science.aba9757.
- 8. World Health Organization (WHO). 2020. COVID-19 weekly epidemiological update - 24 November 2020. https://www.who.int/publications/m/item/weekly-epidemiological-update---24-november-2020. (Accessed 30 November 2020)
- 9. Chen L., Xiong J., Bao L., Shi Y. Convalescent plasma as a potential therapy for COVID-19. Lancet Infect. Dis. 2020;20(4):398–400. doi: 10.1016/s1473-3099(20)30141-9.
- 10. Guastalegname M., Vallone A. Could chloroquine/hydroxychloroquine be harmful in coronavirus disease 2019 (COVID-19) treatment? Clin. Infect. Dis. 2020:321. doi: 10.1093/cid/ciaa321. (published online ahead of print, 24 March 2020)
- 11. Lundstrom K. Coronavirus pandemic - therapy and vaccines. Biomedicines. 2020;8:109. doi: 10.3390/biomedicines8050109.
- 12. Sarkar M., Agrawal A.S., Dey R.S., Chattopadhyay S., Mullick R., De P., Chakrabarti S., Chawla-Sarkar M. Molecular characterization and comparative analysis of pandemic H1N1/2009 strains with co-circulating seasonal H1N1/2009 strains from eastern India. Arch. Virol. 2011;156(2):207–217. doi: 10.1007/s00705-010-0842-6.
- 13. Ross R. Center for Infectious Disease Research and Policy; 2012. CDC Estimate of Global H1N1 Pandemic Deaths: 284,000. https://www.cidrap.umn.edu/news-perspective/2012/06/cdc-estimate-global-h1n1-pandemic-deaths-284000. (Accessed 11 May 2020)
- 14. Kim Y., Lee S., Chu C., Choe S., Hong S., Shin Y. The characteristics of Middle Eastern respiratory syndrome coronavirus transmission dynamics in South Korea. Osong Public Health Res. Perspect. 2016;7(1):49–55. doi: 10.1016/j.phrp.2016.01.001.
- 15. Chan J.F., Sridhar S., Yip C.C., Lau S.K., Woo P.C. The role of laboratory diagnostics in emerging viral infections: the example of the Middle East respiratory syndrome epidemic. J. Microbiol. 2017;55(3):172–182. doi: 10.1007/s12275-017-7026-y.
- 16. Dugas A.F., Jalalpour M., Gel Y., Levin S., Torcaso F., Igusa T., Rothman R.E. Influenza forecasting with Google flu trends. PLoS One. 2013:8. doi: 10.1371/journal.pone.0056176.
- 17. Nishiura H. Real-time forecasting of an epidemic using a discrete time stochastic model: a case study of pandemic influenza (H1N1-2009). BioMed. Eng. OnLine. 2011;10:15. doi: 10.1186/1475-925x-10-15.
- 18. Chretien J.P., George D., Shaman J., Chitale R.A., McKenzie F.E. Influenza forecasting in human populations: a scoping review. PLoS One. 2014:9. doi: 10.1371/journal.pone.0094130.
- 19. Longobardi A., Villani P. Trend analysis of annual and seasonal rainfall time series in the Mediterranean area. Int. J. Climatol. 2010;30(10):1538–1546. doi: 10.1002/joc.2001.
- 20. Atan R., Raman S.A., Sawiran M.S., Mohamed N., Mail R. 2010 International Conference on Science and Social Research. 2010. Financial performance of Malaysian local authorities: A trend analysis; pp. 271–276.
- 21. Wen M., Li P., Zhang L., Chen Y. Stock market trend prediction using high-order information of time series. IEEE Access. 2019;7:28299–28308. doi: 10.1109/access.2019.2901842.
- 22. Oliveira P.T., Santos e Silva C.M., Lima K.C. Climatology and trend analysis of extreme precipitation in subregions of Northeast Brazil. Theor. Appl. Climatol. 2017;130(1–2):77–90. doi: 10.1007/s00704-016-1865-z.
- 23. Zhao J., Zuo T., Zheng R., Zhang S., Zeng H., Xia C., Chen W. Epidemiology and trend analysis on malignant mesothelioma in China. Chin. J. Cancer Res. 2017;29(4):361. doi: 10.21147/j.issn.1000-9604.2017.04.09.
- 24. Soares S.C.M., dos Santos K.M.R., de Morais Fernandes F.C.G., Barbosa I.R., de Souza D.L.B. Testicular cancer mortality in Brazil: trends and predictions until 2030. BMC Urol. 2019;19(1):59. doi: 10.1186/s12894-019-0487-z.
- 25. Zahmatkesh B., Keramat A., Alavi N., Khosravi A., Kousha A., Motlagh A.G., Chaman R. Breast cancer trend in Iran from 2000 to 2009 and prediction till 2020 using a trend analysis method. Asian Pac. J. Cancer Prev. 2016;17(3):1493–1498. doi: 10.7314/apjcp.2016.17.3.1493.
- 26. Mousavizadeh A., Dastoorpoor M., Naimi E., Dohrabpour K. Time-trend analysis and developing a forecasting model for the prevalence of multiple sclerosis in Kohgiluyeh and Boyer-Ahmad Province, southwest of Iran. Public Health. 2018;154:14–23. doi: 10.1016/j.puhe.2017.10.003.
- 27. Yuan H., Li X., Wan G., Sun L., Zhu X., Che F., Yang Z. Type 2 diabetes epidemic in East Asia: a 35-year systematic trend analysis. Oncotarget. 2018;9(6):6718. doi: 10.18632/oncotarget.22961.
- 28. Lin Q., Zhao S., Gao D., Lou Y., Yang S., Musa S.S., Wang M., Cai Y., Wang W., Yang L., He D. A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action. Int. J. Infect. Dis. 2020;93:211–216. doi: 10.1016/j.ijid.2020.02.058.
- 29. Paiva H.M., Afonso R.J.M., de Oliveira I.L., Garcia G.F. A data-driven model to describe and forecast the dynamics of COVID-19 transmission. PLoS One. 2020;15(7). doi: 10.1371/journal.pone.0236386.
- 30. Hou C., Chen J., Zhou Y., Hua L., Yuan J., He S., Zhang J. The effectiveness of quarantine of Wuhan city against the Corona virus disease 2019 (COVID-19): A well-mixed SEIR model analysis. J. Med. Virol. 2020. doi: 10.1002/jmv.25827.
- 31. Salgotra R., Gandomi M., Gandomi A.H. Time series analysis and forecast of the COVID-19 pandemic in India using genetic programming. Chaos Solitons Fractals. 2020. doi: 10.1016/j.chaos.2020.109945.
- 32. Ribeiro M.H.D.M., da Silva R.G., Mariani V.C., dos Santos Coelho L. Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil. Chaos Solitons Fractals. 2020. doi: 10.1016/j.chaos.2020.109853.
- 33. Yang Z., et al. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. 2020;12(3):165. doi: 10.21037/jtd.2020.02.64.
- 34. Hernandez-Matamoros A., Fujita H., Hayashi T., Perez-Meana H. Forecasting of COVID19 per regions using ARIMA models and polynomial functions. Appl. Soft Comput. 2020. doi: 10.1016/j.asoc.2020.106610.
- 35. Rath S., Tripathy A., Tripathy A.R. Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab. Syndr. Clin. Res. Rev. 2020. doi: 10.1016/j.dsx.2020.07.045.
- 36. Lin Y.F., Duan Q., Zhou Y., Yuan T., Li P., Fitzpatrick T., et al. Spread and impact of COVID-19 in China: a systematic review and synthesis of predictions from transmission-dynamic models. Front. Med. 2020;7:321. doi: 10.3389/fmed.2020.00321.
- 37. Mohamadou Y., Halidou A., Kapen P.T. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of COVID-19. Appl. Intell. 2020:1–13. doi: 10.1007/s10489-020-01770-9.
- 38. Richards F.J. A flexible growth function for empirical use. J. Exp. Bot. 1959;10(2):290–301. doi: 10.1093/jxb/10.2.290.
- 39. Dorf R.C., Bishop R.H. thirteenth ed. Pearson; London: 2016. Modern Control Systems.
- 40. Gill P.E., Wong E. In: Mixed Integer Nonlinear Programming. Lee J., Leyffer S., editors. Springer; New York: 2012. Sequential quadratic programming methods; pp. 147–224.
- 41. Khan W.U., Ye Z., Chaudhary N.I., Raja M.A.Z. Backtracking search integrated with sequential quadratic programming for nonlinear active noise control systems. Appl. Soft Comput. 2018;73:666–683. doi: 10.1016/j.asoc.2018.08.027.
- 42. Khalilpourazari S., Pasandideh S.H.R., Niaki S.T.A. Optimization of multi-product economic production quantity model with partial backordering and physical constraints: SQP, SFS, SA, and WCA. Appl. Soft Comput. 2016;49:770–791. doi: 10.1016/j.asoc.2016.08.054.
- 43. Nocedal J., Wright S.J. second ed. Springer; New York: 2006. Numerical Optimization.
- 44. MathWorks. 2020. Documentation of the fmincon function. Available at https://www.mathworks.com/help/optim/ug/fmincon.html. (Accessed 12 August 2020)
- 45. Aleta A., Martín-Corral D., Pastore y Piontti A., et al. Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19. Nat. Hum. Behav. 2020. doi: 10.1038/s41562-020-0931-9.
- 46. Leung K., Wu J.T., Liu D., Leung G.M. First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment. Lancet. 2020;395(10223):1382–1393. doi: 10.1016/s0140-6736(20)30746-7.
- 47. López L., Rodó X. The end of social confinement and COVID-19 re-emergence risk. Nat. Hum. Behav. 2020;4:746–755. doi: 10.1038/s41562-020-0908-8.
- 48. Xu S., Li Y. Beware of the second wave of COVID-19. Lancet. 2020;395(10233):1321–1322. doi: 10.1016/S0140-6736(20)30845-X.
- 49. Lagarias J.C., Reeds J.A., Wright M.H., Wright P.E. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 1999;9(1):112–147. doi: 10.1137/S1052623496303470.
- 50. Barnston A.G. Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Weather Forecast. 1992:699–709. doi: 10.1175/1520-0434(1992)007<0699:CATCRA>2.0.CO;2.
- 51. Steyerberg E.W. Clinical Prediction Models. Springer; Cham: 2019. Overfitting and optimism in prediction models; pp. 95–112.
- 52. Paiva H.M., Galvão R.K.H. Wavelet-packet identification of dynamic systems in frequency subbands. Signal Process. 2006;86(8):2001–2008. doi: 10.1016/j.sigpro.2005.09.021.
- 53. European Centre for Disease Prevention and Control (ECDC). 2020. Download today’s data on the geographic distribution of COVID-19 cases worldwide. https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide. (Accessed 12 August 2020)
- 54. Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020;20(5):533–534. doi: 10.1016/s1473-3099(20)30120-1.
- 55.2020. Brasil.io Project. http://brasil.io/. (Accessed 12 August 2020) [Google Scholar]













