Abstract
Dengue fever is a disease with increasing incidence, now occurring in some regions that were not previously affected. Ribeirão Preto and São Paulo, municipalities in São Paulo state, Brazil, have stood out for their high dengue incidence, especially after 2009 and 2013. The current study therefore aims to analyse the temporal behaviour of dengue cases in both municipalities and to forecast the number of cases in the out-of-sample period using time series models, in particular the SARIMA model. The fitted SARIMA models satisfactorily describe the dengue incidence data collected in Ribeirão Preto and São Paulo. However, the out-of-sample forecast confidence intervals are very wide, a fact that is often omitted in published papers. Despite this high variability, health services can use these models to anticipate disease scenarios, but the forecasts should be interpreted with caution, since the magnitude of an epidemic may be underestimated.
Key words: Forecast, dengue, SARIMA
Introduction
Dengue fever shows high levels of infection and is reported in many tropical and subtropical localities populated by Aedes aegypti mosquitoes, the main vector of the disease [1].
In the 21st century, Brazil has become the country with the highest number of reported dengue cases in the world [2]. Within Brazil, the southeastern region has been heavily affected, especially São Paulo state, where high numbers of dengue cases have been reported. In 2015 alone, more than 745 600 cases were reported in São Paulo state, corresponding to 1732 cases per 100 000 inhabitants. Two municipalities in this state, Ribeirão Preto and São Paulo, have stood out because of their high dengue incidence [3–6]. From 2000 to 2015, the annual incidence rate ranged from 9 to 4903 cases per 100 000 inhabitants in Ribeirão Preto and from 1 to 837 cases per 100 000 inhabitants in São Paulo.
Given this scenario, detailed information on when and where dengue outbreaks occurred in the past can be a useful guide for predicting the magnitude and severity of future epidemics, and thus for allocating resources to more effective public health interventions [7]. Time series analysis tools have accordingly been used to predict the occurrence of infectious diseases such as dengue [7–9], malaria [10] and influenza [11]. The current study aims to analyse the temporal behaviour of dengue cases in the municipalities of Ribeirão Preto and São Paulo in order to forecast the monthly number of dengue cases in 2016.
Methodology
Study area
Ribeirão Preto is a municipality in São Paulo state, Brazil, located at latitude 21°10′ S and longitude 47°50′ W. It occupies an area of about 651 km², of which 127 km² lie within the urban perimeter [12, 13]. The Brazilian Institute of Geography and Statistics (IBGE) estimated its population at 674 405 inhabitants [14], and its economy is based on agribusiness, mainly the sugar-alcohol sector and citriculture.
São Paulo is a Brazilian municipality and the capital of São Paulo state, located at latitude 23°32′ S and longitude 46°38′ W. It is the most populous city in Brazil, with more than 12 million inhabitants, and the main financial, mercantile and corporate centre of South America [15–17].
Data collection
The monthly number of dengue cases and the population data for the municipality of Ribeirão Preto were obtained from the database of the City Hall of Ribeirão Preto [18, 19]. For the municipality of São Paulo, the number of dengue cases was obtained from the DATASUS database and the City Hall of São Paulo [20–22].
Statistical analysis
To analyse the monthly number of dengue cases in each city up to 2015, a seasonal autoregressive integrated moving average (SARIMA) time series model was proposed, since it accounts for seasonality, possible non-stationarity and autocorrelation. To satisfy the assumptions of the usual SARIMA model, namely homoscedastic, uncorrelated Gaussian errors, and to accommodate outliers in epidemic periods, the model was fitted to the logarithm of the number of cases plus 1 [5, 8]; the 1 is added so that months with zero cases do not produce an undefined logarithm.
The SARIMA model [23] was chosen after fitting several models with different SARIMA(p, d, q)(P, D, Q) specifications, where d and D are the numbers of ordinary and seasonal differences needed to achieve stationarity, p and P are the non-seasonal and seasonal autoregressive orders, and q and Q are the non-seasonal and seasonal moving average orders. The first candidate model is the one that minimises the Akaike information criterion (AIC) [24], which rewards a high maximised likelihood but penalises each additional parameter, thereby discouraging overfitting.
After fitting the model with the best AIC by maximum likelihood, a residual analysis is performed to check the validity of all model assumptions; the final model must satisfy all of them. The residual analysis consists of a time series plot of the observed and predicted monthly number of cases, a time series plot of the residuals, the residual autocorrelation function, Ljung–Box tests [25] and a residual qq-plot to check normality.
For the final model, the significance of each parameter was tested using the Wald test, and non-significant terms were removed from the model. The final model was then used to forecast the monthly number of dengue cases in 2016, with 95% confidence intervals. All tests used a 5% significance level, and all analyses were performed in R software.
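For reference, the model class described above can be written in backshift notation using the standard multiplicative SARIMA form (as in Shumway & Stoffer [23]). The notation below is ours, not the authors': X_t is the monthly number of cases, y_t its transformed value, B the backshift operator (B y_t = y_{t−1}), s = 12 the seasonal period, p, d, q the non-seasonal orders and P, D, Q the seasonal orders.

$$
\begin{aligned}
y_t &= \ln(X_t + 1),\\
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t &= \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N(0,\sigma^2),\\
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^{p}, \qquad \Phi_P(B^{s}) = 1 - \Phi_1 B^{s} - \cdots - \Phi_P B^{Ps},\\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^{q}, \qquad \Theta_Q(B^{s}) = 1 + \Theta_1 B^{s} + \cdots + \Theta_Q B^{Qs}.
\end{aligned}
$$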
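The AIC trade-off can be illustrated with a deliberately simplified sketch. The authors fitted SARIMA candidates by maximum likelihood in R; the code below instead fits pure AR(p) models by conditional least squares to a simulated AR(2) series (the seed, coefficients and series length are hypothetical, not study data) and computes AIC = −2 log L + 2k, where k counts the AR coefficients plus the error variance. All candidate orders are evaluated on the same effective sample so that their AICs are comparable.

```python
import math
import random


def simulate_ar2(n, phi1, phi2, seed=1):
    """Simulate a zero-mean Gaussian AR(2) series (illustrative data only)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + 100):  # the first 100 values act as burn-in and are discarded
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[-n:]


def solve(a, b):
    """Solve the linear system a @ beta = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = (aug[r][m] - s) / aug[r][r]
    return beta


def ar_aic(x, p, max_p):
    """Gaussian AIC of an AR(p) fitted by conditional least squares on t >= max_p."""
    ys = x[max_p:]
    n = len(ys)
    if p == 0:
        rss = sum(y * y for y in ys)
    else:
        rows = [[x[t - k] for k in range(1, p + 1)] for t in range(max_p, len(x))]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
        beta = solve(xtx, xty)
        rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    sigma2 = rss / n  # maximum likelihood estimate of the innovation variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = p + 1  # AR coefficients plus the variance
    return -2 * loglik + 2 * k


series = simulate_ar2(500, phi1=0.3, phi2=-0.5)
max_p = 4
aics = {p: ar_aic(series, p, max_p) for p in range(max_p + 1)}
best_p = min(aics, key=aics.get)
print(best_p, {p: round(v, 1) for p, v in aics.items()})
```

As in the study, the AIC minimiser is only a first candidate: its residuals still have to pass the diagnostic checks before it is accepted.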
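The Ljung–Box statistic behind the residual check is Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n − k), where r_k is the lag-k sample autocorrelation of the n residuals; under white noise Q is approximately chi-square distributed. A minimal stdlib sketch follows. The degrees of freedom are an assumption: they are often reduced by the number m of fitted ARMA parameters (R's `Box.test` exposes this as `fitdf`), and the paper does not state which value was used, so `fitdf` defaults to 0 here. The chi-square tail probability is computed from the series for the regularised lower incomplete gamma function.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (denominator: total sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom for k in range(1, max_lag + 1)]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable: 1 - P(df/2, x/2).

    Each series term e^{-z} z^{a+n} / Gamma(a+n+1) is evaluated in log space,
    so it stays bounded even for very large statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    lower, n = 0.0, 0
    while True:
        term = math.exp((a + n) * math.log(z) - z - math.lgamma(a + n + 1))
        lower += term
        n += 1
        if (n > z and term < 1e-17) or n > 100_000:
            break
    return min(1.0, max(0.0, 1.0 - lower))


def ljung_box(residuals, h, fitdf=0):
    """Ljung-Box Q statistic up to lag h and its p-value on h - fitdf degrees of freedom."""
    n = len(residuals)
    r = acf(residuals, h)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf(q, h - fitdf)


# A strongly autocorrelated (alternating) sequence should be rejected as white noise.
q, p = ljung_box([1.0, -1.0] * 50, h=15)
print(round(q, 1), p)
```

A large p-value (as reported for both municipalities at lag 15) means the test gives no evidence against uncorrelated residuals; it does not prove independence.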
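For a single coefficient, the Wald statistic is z = estimate / standard error, compared with the standard normal distribution. The sketch below applies this to the Z statistics printed in Table 1 for Ribeirão Preto and reproduces the reported p-values to three decimals. It covers only the single-coefficient case; a joint test of several coefficients (as for the first three AR terms in São Paulo) would use the estimated covariance matrix and a chi-square reference distribution, which is not shown.

```python
from statistics import NormalDist


def wald_p_value(z):
    """Two-sided p-value for a Wald statistic z = estimate / standard error."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Z statistics as printed in Table 1 (Ribeirão Preto). Dividing the rounded
# estimates by the rounded standard errors gives slightly different values.
table1_z = {"AR2": 3.822, "AR4": -2.590, "AR5": -2.385,
            "AR6": -2.999, "SAR1": 2.575, "SAR2": 3.723}

for term, z in table1_z.items():
    p = wald_p_value(z)
    label = "< 0.001" if p < 0.001 else f"= {p:.3f}"
    print(f"{term}: z = {z:+.3f}, p {label}, significant at 5%: {p < 0.05}")
```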
This study was approved by the Ethics Committee of the Federal University of São Paulo under process number 3696290616.
Results
From 2000 to 2016, more than 11 million dengue cases were reported in Brazil, of which 2 080 584 were in São Paulo state. The municipalities of Ribeirão Preto and São Paulo accounted for more than 14% of the state's total. Figure 1 shows the monthly number of dengue cases in both cities over time.
The number of dengue cases shows cyclical behaviour, especially in Ribeirão Preto. In 2010, 2011, 2013 and 2016, Ribeirão Preto recorded a large number of cases, about 101 000 in total, that is, 15 945 cases per 100 000 inhabitants. In contrast, 2000, 2002, 2004, 2012 and 2014 had low incidence, with few reported cases. In São Paulo, 2014 and 2015 had the highest numbers of dengue cases, more than 129 440 in total, representing 1119 cases per 100 000 inhabitants, while 2004 and 2005 had the lowest incidence. Cases also increase in the first months of each year, coinciding with the summer season, so dengue incidence follows an annual seasonal cycle.
Given the seasonality of the disease, SARIMA models with different orders p, d, q, P, D and Q were fitted, and SARIMA(0,1,0)(2,0,0)₁₂ had the lowest AIC for both Ribeirão Preto and São Paulo. The residuals of this model showed significant autocorrelations up to lag 6, indicating that SARIMA(6,1,0)(2,0,0)₁₂ is more appropriate; this model did satisfy all the model assumptions. The next step was to estimate its parameters.
Municipality of Ribeirão Preto
Concerning the data from Ribeirão Preto, we observed that the AR1 and AR3 terms were not significant and were removed from the model. The results for the final fitted model are shown in Table 1.
Table 1.
Parameter | Estimate | Standard error | Z statistic | p-value |
---|---|---|---|---|
AR2 | 0.266 | 0.070 | 3.822 | <0.001 |
AR4 | −0.185 | 0.071 | −2.590 | 0.010 |
AR5 | −0.163 | 0.068 | −2.385 | 0.017 |
AR6 | −0.213 | 0.071 | −2.999 | 0.003 |
SAR1 | 0.194 | 0.076 | 2.575 | 0.010 |
SAR2 | 0.284 | 0.076 | 3.723 | <0.001 |
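Written out, the final Ribeirão Preto model in Table 1 corresponds to the equation below. This is our reconstruction, assuming the multiplicative SARIMA form and R's sign convention for autoregressive coefficients (φ(B) = 1 − φ₁B − …), with AR1 and AR3 fixed at zero; the software call itself is not reported. Here y_t = ln(X_t + 1), X_t is the monthly number of cases, B is the backshift operator and ε_t are Gaussian errors with estimated variance 0.6914.

$$
\left(1 - 0.266B^{2} + 0.185B^{4} + 0.163B^{5} + 0.213B^{6}\right)\left(1 - 0.194B^{12} - 0.284B^{24}\right)(1 - B)\,y_t = \varepsilon_t
$$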
After estimating the parameters, we assessed the adequacy of the model by analysing its residuals (Fig. 2).
Figure 2(a) suggests that the standardised residuals of this model behave as an independent and identically distributed sequence with zero mean and constant variance. The qq-plot in Figure 2(b) shows that the standardised residuals approximately follow a normal distribution. The Ljung–Box test did not reject the hypothesis that all autocorrelations up to lag 15 are zero (p = 0.3395), suggesting that the residuals behave as white noise. This is supported by Figures 2(c) and (d), where the autocorrelation and partial autocorrelation functions show no significant autocorrelations, that is, the autocorrelations are close to zero. Thus, all assumptions were satisfied, and the estimated error variance of the model is 0.6914.
Since the model is adequate, it was used to forecast 2016, as shown in Figure 3.
The SARIMA(6,1,0)(2,0,0)₁₂ model closely fits the dengue series of Ribeirão Preto; however, the out-of-sample forecast confidence intervals are very wide even on the log scale, which shows that the forecasts are imprecise.
Municipality of São Paulo
For the São Paulo data, the first three autoregressive coefficients were not significant (Wald test, p = 0.9024) and were removed from the model. The parameter estimates for the final model are shown in Table 2.
Table 2.
Parameter | Estimate | Standard error | Z statistic | p-value |
---|---|---|---|---|
AR4 | −0.210 | 0.071 | −2.940 | 0.003 |
AR5 | −0.189 | 0.073 | −2.571 | 0.010 |
AR6 | −0.187 | 0.072 | −2.595 | 0.009 |
SAR1 | 0.458 | 0.074 | 6.227 | <0.001 |
SAR2 | 0.252 | 0.077 | 3.258 | 0.001 |
After estimating the parameters, we assessed the adequacy of the model by analysing its residuals (Fig. 4).
As for Ribeirão Preto, the residual analysis for São Paulo indicated that the model was adequate, with uncorrelated residuals up to lag 15 (Ljung–Box test, p = 0.1481), despite a somewhat higher autocorrelation at lag 14, and an approximately normal distribution (Fig. 4). The estimated error variance of the model is 0.4571.
Since the model is adequate, it was used to forecast 2016, as shown in Figure 5.
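Why the intervals are wide even one month ahead can be seen from the error variances alone. If a one-step forecast on the log(cases + 1) scale has standard deviation σ, its back-transformed 95% interval satisfies (upper + 1)/(lower + 1) = exp(2 × 1.96 × σ), whatever the point forecast. The sketch below evaluates this ratio for the two reported variances (0.6914 for Ribeirão Preto, 0.4571 for São Paulo); it ignores parameter uncertainty, and multi-step forecasts have larger variances still.

```python
import math
from statistics import NormalDist


def one_step_ratio(sigma2, level=0.95):
    """(upper + 1) / (lower + 1) of a one-step interval back-transformed from log(cases + 1).

    Equals exp(2 * z * sigma) and does not depend on the point forecast.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(2 * z * math.sqrt(sigma2))


ratio_rp = one_step_ratio(0.6914)  # Ribeirão Preto error variance
ratio_sp = one_step_ratio(0.4571)  # São Paulo error variance
print(round(ratio_rp, 1), round(ratio_sp, 1))
```

The upper limit is therefore roughly 26 times the lower limit (on the cases + 1 scale) in Ribeirão Preto and roughly 14 times in São Paulo, before any additional forecast-horizon uncertainty accumulates.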
Evaluating the forecast number of cases
Figure 6 compares the observed, predicted and forecast numbers of cases on the original scale (not the logarithmic scale). In São Paulo, the predicted number of cases during the 2014 outbreak was lower than observed, since it was the first outbreak in the city, whereas the forecast for 2016 was larger than observed. In Ribeirão Preto, the predicted values are close to the observed numbers of cases, and the forecasts for 2016 are smaller than the observed numbers.
Figure 7 presents the observed and forecast numbers of cases in 2016. The 95% forecast confidence intervals are so wide that their upper limits exceed twice the largest monthly number of cases ever observed, highlighting the large variability of the forecasts. For example, the 95% confidence intervals for the number of cases in April 2016 were [66; 114 479] in Ribeirão Preto and [983; 197 178] in São Paulo, while the observed numbers of cases in that month were 3554 and 4524, respectively. The observed counts fall within the intervals, but the intervals are very wide.
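The April 2016 intervals can be mapped back to the log scale to see what they imply. Assuming each interval was obtained as exp(ŷ ± 1.96 × SE) − 1 from a symmetric Gaussian interval on the log(cases + 1) scale (consistent with the transformation used, but not stated explicitly), the sketch below recovers the implied log-scale point forecast ŷ and standard error, and the implied median forecast exp(ŷ) − 1. The bounds are rounded, so the recovered values are approximate, and they need not coincide with the point forecasts plotted by the authors.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def implied_log_scale(lower, upper):
    """Log(cases + 1) centre and standard error implied by a back-transformed 95% interval."""
    lo, hi = math.log(lower + 1), math.log(upper + 1)
    return (lo + hi) / 2, (hi - lo) / (2 * Z95)


def back_transform(center, se):
    """Median forecast and 95% interval on the count scale."""
    return (math.exp(center) - 1,
            math.exp(center - Z95 * se) - 1,
            math.exp(center + Z95 * se) - 1)


april_2016 = {  # reported 95% interval bounds and observed counts, April 2016
    "Ribeirão Preto": ((66, 114_479), 3554),
    "São Paulo": ((983, 197_178), 4524),
}

results = {}
for city, ((lower, upper), observed) in april_2016.items():
    center, se = implied_log_scale(lower, upper)
    median, lo, hi = back_transform(center, se)
    results[city] = (median, lo, hi, observed, se)
    print(f"{city}: implied median ~ {median:.0f}, log-scale SE ~ {se:.2f}, observed {observed}")
```

Under these assumptions the implied median is roughly 2 770 cases in Ribeirão Preto, below the 3554 observed, and roughly 13 900 in São Paulo, well above the 4524 observed, matching the direction of the errors seen in Figure 6. The implied log-scale standard errors (about 1.9 and 1.35) are also much larger than the one-step values √0.6914 ≈ 0.83 and √0.4571 ≈ 0.68, reflecting the growth of uncertainty with the forecast horizon.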
Discussion and conclusion
Anticipating future scenarios of disease distribution in the population, and recognising the factors capable of interfering with this distribution, supports decision making and planning to reduce the burden of disease [26]. Time series analysis tools, in particular SARIMA models, have therefore been widely used (Table 3) to forecast outbreaks of infectious diseases such as dengue. For comparison, our results are also shown in Table 3.
Table 3.
Reference | Area of study | Period | Data | Model |
---|---|---|---|---|
Luz et al. [8] | Rio de Janeiro, Brazil | 1994–2004 | Monthly | SARIMA(2,0,0)(1,0,0)₁₂ |
Hu et al. [27] | Queensland, Australia | 1993–2003 | Monthly | SARIMA(1,0,0)(2,1,0)₁₂ |
Gharbi et al. [9] | Guadeloupe, French West Indies | 2000–2006 | Weekly | SARIMA(0,1,1)(0,1,1)₅₂ |
Martinez et al. [28] | Campinas, Brazil | 1998–2008 | Monthly | SARIMA(2,1,2)(1,1,1)₁₂ |
Martinez & Silva [5] | Ribeirão Preto, Brazil | 2000–2008 | Monthly | SARIMA(2,1,3)(1,1,1)₁₂ |
Bhatnagar et al. [7] | Rajasthan, India | 2001–2010 | Monthly | SARIMA(0,0,1)(0,1,1)₁₂ |
Dela Cruz et al. [29] | Philippines | 2005–2010 | Monthly | SARIMA(1,0,1)(0,1,1)₁₂ |
Phung et al. [30] | Can Tho, Vietnam | 2003–2010 | Monthly | SARIMA(1,1,1)(1,1,0)₁₂ |
Our results | Ribeirão Preto, Brazil | 2000–2015 | Monthly | SARIMA(6,1,0)(2,0,0)₁₂ |
Our results | São Paulo, Brazil | 2001–2015 | Monthly | SARIMA(6,1,0)(2,0,0)₁₂ |
In the current study, time series analysis allowed the development of SARIMA models that satisfactorily fit the dengue incidence data collected in the municipalities of Ribeirão Preto and São Paulo and that forecast the number of dengue cases for the following year. Our fitted models have a higher autoregressive order than those in the other studies because the residual autocorrelation had to be taken into account to meet all the model assumptions. This means that the choice of an appropriate SARIMA model depends on the particular time series analysed, and in each case a complete residual analysis must be performed after choosing a candidate model with the AIC. In this study, the SARIMA(6,1,0)(2,0,0)₁₂ model gave the best fit for Ribeirão Preto from 2000 to 2015, whereas Martinez & Silva [5] concluded that SARIMA(2,1,3)(1,1,1)₁₂ gave the best fit for dengue incidence in the same municipality from 2000 to 2008. Different series, even for the same municipality, may therefore require different SARIMA models.
The estimates in Table 1 show that the predicted number of dengue cases in Ribeirão Preto depends on the number of cases in previous months through autoregressive terms at lags 2, 4, 5 and 6 and seasonal autoregressive terms at lags 12 and 24. In addition, the monthly number of dengue cases observed in 2016 was considerably higher than the number predicted by the SARIMA model. Out-of-sample forecasts follow the past observed behaviour and may not be reliable for forecasting the number of dengue cases in epidemic years such as 2016, since the large number of reported cases may result from the lack of immunity of a population exposed to a dengue virus for the first time, which makes the outbreak unpredictable [5]. In 2016 alone, more than 35 000 cases of dengue fever were confirmed, the largest epidemic in the city's history.
In São Paulo, the predicted number of dengue cases in a given month depends on the number of cases in previous months through autoregressive terms at lags 4, 5 and 6 and seasonal autoregressive terms at lags 12 and 24 (Table 2). The year 2015 saw the largest epidemic ever recorded in the city of São Paulo, with more than 100 400 confirmed cases. A decline in the number of cases in the following year was therefore expected, owing to the immunity acquired by the population exposed to one of the four dengue virus serotypes [31]. However, a forecasting model assumes that the past distribution pattern will be repeated in the future [32], so the forecast for 2016 followed the trend of 2015 and the expected number of cases for 2016 was higher than the observed number.
Several papers listed in Table 3 also presented out-of-sample forecasts but, in general, they report confidence intervals for the forecasts only on the log scale and omit the corresponding intervals for the number of cases, which would be very wide. Such wide intervals indicate that these forecasts must be used with caution.
Although predictive models for dengue have difficulty maintaining forecast accuracy, because dengue epidemiology is influenced by a combination of factors [32], it is essential to carry out similar research for the early identification of disease outbreaks. Eliminating dengue as a public health burden can only be achieved by integrating vector control and vaccines [33]. Meanwhile, efforts to anticipate disease scenarios may help prioritise the best combination of vector control interventions for the expected magnitude of an epidemic, as well as provide support for structuring health care services.
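Because the seasonal and non-seasonal polynomials of a multiplicative SARIMA model multiply, the fitted Ribeirão Preto model also contains cross-lag terms. The sketch below expands the operator using the Table 1 coefficients, assuming R's sign convention (φ(B) = 1 − φ₁B − …, with φ₁ = φ₃ = 0) and the rounded published values; the helper names are hypothetical. It lists which lags enter the model for the differenced series and for the log counts themselves.

```python
def polymul(a, b):
    """Multiply two backshift polynomials stored as {lag: coefficient} dicts."""
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            out[i + j] = out.get(i + j, 0.0) + ca * cb
    return out


def active_lags(poly, tol=1e-12):
    """Lags k >= 1 whose coefficient is non-zero."""
    return sorted(k for k, c in poly.items() if k >= 1 and abs(c) > tol)


# Table 1 estimates (Ribeirão Preto); AR1 and AR3 were removed, i.e. fixed at zero.
phi = {2: 0.266, 4: -0.185, 5: -0.163, 6: -0.213}
sphi = {1: 0.194, 2: 0.284}  # seasonal AR coefficients acting on B^12 and B^24

ar_poly = {0: 1.0, **{k: -c for k, c in phi.items()}}
sar_poly = {0: 1.0, **{12 * k: -c for k, c in sphi.items()}}
diff_poly = {0: 1.0, 1: -1.0}  # (1 - B): one ordinary difference

differenced = polymul(ar_poly, sar_poly)  # operator on w_t = y_t - y_{t-1}
full = polymul(differenced, diff_poly)    # operator on y_t = log(cases + 1)

print("differenced series:", active_lags(differenced))
print("log counts:", active_lags(full))
```

The products of the non-seasonal lags with the seasonal lags add terms at lags 14 to 18 and 26 to 30, and undoing the differencing spreads the dependence over lags 1 to 7, 12 to 19 and 24 to 31 of the log counts.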
Acknowledgements
This work was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES) (grant number: 1655386) and São Paulo Research Foundation (FAPESP) (grant 2018/04654-9).
Author ORCIDs
Ana Flávia Gabriel, 0000-0002-7643-7298
References
- 1. Liu W et al. (2016) Highly divergent dengue virus type 2 in traveler returning from Borneo to Australia. Emerging Infectious Diseases 22, 2146.
- 2. Teixeira MG et al. (2009) Dengue: twenty-five years since reemergence in Brazil. Cadernos de Saúde Pública 25, S7–S18.
- 3. Hino P et al. (2010) Temporal evolution of dengue fever in Ribeirão Preto, São Paulo State, 1994–2003 [in Portuguese]. Ciência & Saúde Coletiva 15, 233–238.
- 4. Ruediger MA et al. (2016) Dengue numbers in the state and in the municipality of São Paulo [in Portuguese]. Fundação Getúlio Vargas, Diretoria de Análise de Políticas Públicas.
- 5. Martinez EZ and Silva EA (2011) Predicting the number of cases of dengue infection in Ribeirão Preto, São Paulo State, Brazil, using a SARIMA model. Cadernos de Saúde Pública 27, 1809–1818.
- 6. De Masi (2014) Intervention analysis in time series of dengue and leptospirosis of the city of São Paulo: political, administrative, technical and environmental factor impact (thesis). São Paulo, SP, Brazil: University of São Paulo.
- 7. Bhatnagar S et al. (2012) Forecasting incidence of dengue in Rajasthan, using time series analyses. Indian Journal of Public Health 56, 281–285.
- 8. Luz PM et al. (2008) Time series analysis of dengue incidence in Rio de Janeiro, Brazil. The American Journal of Tropical Medicine and Hygiene 79, 933–939.
- 9. Gharbi M et al. (2011) Time series analysis of dengue incidence in Guadeloupe, French West Indies: forecasting models using climate variables as predictors. BMC Infectious Diseases 11, 166.
- 10. Ostovar A et al. (2016) Time series analysis of meteorological factors influencing malaria in South Eastern Iran. Journal of Arthropod-Borne Diseases 10, 222–236.
- 11. Soebiyanto RP, Adimi F and Kiang RK (2010) Modeling and predicting seasonal influenza transmission in warm regions using climatological parameters. PLoS ONE 5, e9450.
- 12. Prefeitura de Ribeirão Preto. Dados Geográficos [cited 10 Oct 2016]. Available at https://www.ribeiraopreto.sp.gov.br/crp/dados/local/i01local.php.
- 13. Fundação Sistema Estadual de Análise de Dados – SEADE. Informações dos municípios paulista [cited 10 Oct 2016]. Available at http://www.perfil.seade.gov.br/#.
- 14. Instituto Brasileiro de Geografia e Estatística [cited 27 Feb 2017]. Available at http://www.ibge.com.br/cidadesat/painel/populacao.php?codmun=354340&search=%7C%7Cinfograficos:-evolucao-populacional-e-piramide-etaria&lang=.
- 15. Centro de Pesquisa Meteorológicas e Climáticas Aplicadas à Agricultura (CEPAGRI) [cited 10 Oct 2016]. Available at http://www.cpa.unicamp.br/outras-informacoes/clima_muni_565.html.
- 16. Fundação Sistema Estadual de Análise de Dados – SEADE. Projeção da população [cited 27 Feb 2017]. Available at http://produtos.seade.gov.br/produtos/projpop/index.php.
- 17. Portal do governo. Panorama do estado de São Paulo [cited 19 Oct 2017]. Available at http://www.saopauloglobal.sp.gov.br/panorama_geral.aspx.
- 18. Prefeitura de Ribeirão Preto. Boletim Epidemiológico [cited 16 Jan 2017]. Available at http://www.ribeiraopreto.sp.gov.br/ssaude/pdf/dengue-2014-casos.pdf.
- 19. Prefeitura de Ribeirão Preto. Boletim Epidemiológico [cited 27 Jan 2018]. Available at https://www.ribeiraopreto.sp.gov.br/ssaude/pdf/boletim_dengue.pdf.
- 20. Ministério da Saúde (BR). Departamento de Informática do SUS (DATASUS) [cited 16 Jan 2017]. Available at http://tabnet.datasus.gov.br/cgi/deftohtm.exe?sinanwin/cnv/dengueSP.def.
- 21. Prefeitura de São Paulo. Boletim Epidemiológico [cited 16 Jan 2017]. Available at http://www.prefeitura.sp.gov.br/cidade/secretarias/upload/casos%20autoctones.pdf.
- 22. Prefeitura de São Paulo. Boletim Epidemiológico [cited 20 Sep 2017]. Available at http://www.prefeitura.sp.gov.br/cidade/secretarias/upload/chamadas/boletim_arboviroses_29_2017_vale_este_1502376904.pdf.
- 23. Shumway RH and Stoffer DS (2011) Time Series Analysis and Its Applications: with R Examples, 3rd Edn. New York: Springer, pp. 201–202.
- 24. Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
- 25. Ljung GM and Box GEP (1978) On a measure of lack of fit in time series models. Biometrika 65, 297–303.
- 26. Antunes JLF and Cardoso MRA (2015) Uso da análise de séries temporais em estudos epidemiológicos. Epidemiologia e Serviços de Saúde 24, 565–576.
- 27. Hu W et al. (2010) Dengue fever and El Nino/Southern Oscillation in Queensland, Australia: a time series predictive model. Occupational and Environmental Medicine 67, 307–311.
- 28. Martinez EZ, Silva EA and Fabbro AL (2011) A SARIMA forecasting model to predict the number of cases of dengue in Campinas, State of São Paulo, Brazil. Revista da Sociedade Brasileira de Medicina Tropical 44, 436–440.
- 29. Dela Cruz AC et al. (2012) Forecasting dengue incidence in the national capital region, Philippines: using time series analysis with climate variables as predictors. Acta Manilana 60, 19–26.
- 30. Phung D et al. (2015) Identification of the prediction model for dengue incidence in Can Tho city, a Mekong Delta area in Vietnam. Acta Tropica 141, 88–96.
- 31. Teixeira MG, Barreto ML and Guerra Z (1999) Epidemiology and preventive measures of Dengue. Informe Epidemiológico do SUS 8, 5–33.
- 32. Hii YL et al. (2012) Forecast of dengue incidence using temperature and rainfall. PLoS Neglected Tropical Diseases 6, e1908.
- 33. Achee NL et al. (2015) A critical assessment of vector control for dengue prevention. PLoS Neglected Tropical Diseases 9, e0003655.