PLoS Negl Trop Dis. 2022 Jan 24;16(1):e0010071. doi: 10.1371/journal.pntd.0010071

Table 2. Mean performance of the baseline dengue incidence prediction model across all cities, for the time period between January 2011 and July 2016.

The baseline, a linear regression applied to the most recently available data points, is assessed under each of the four assumed delays in the reporting of epidemiological information and compared with the random forest and LASSO-based approaches (both using the AR+GT feature set). Numbers in bold represent the best performance for a given model and autoregressive lag on each metric: the lowest value for RMSE and relative RMSE, and the highest value for R^2 and Pearson correlation.

| Reporting Delay | Model and Feature Set | RMSE | Relative RMSE | R^2 | Pearson Correlation |
|---|---|---|---|---|---|
| 8 weeks | Baseline | 153.500 | 1.333 | -3.201 | 0.175 |
| 8 weeks | Lasso, AR+GT | 23.828 | 0.752 | 0.304 | 0.555 |
| 8 weeks | Random Forest, AR+GT | 23.355 | 0.737 | 0.331 | 0.596 |
| 6 weeks | Baseline | 127.146 | 1.135 | -1.054 | 0.372 |
| 6 weeks | Lasso, AR+GT | 22.615 | 0.717 | 0.368 | 0.608 |
| 6 weeks | Random Forest, AR+GT | 22.055 | 0.699 | 0.399 | 0.664 |
| 3 weeks | Baseline | 85.878 | 0.916 | -0.406 | 0.677 |
| 3 weeks | Lasso, AR+GT | 18.859 | 0.602 | 0.556 | 0.752 |
| 3 weeks | Random Forest, AR+GT | 17.613 | 0.562 | 0.612 | 0.793 |
| 1 week | Baseline | 39.984 | 0.483 | 0.649 | 0.870 |
| 1 week | Lasso, AR+GT | 12.259 | 0.393 | 0.811 | 0.901 |
| 1 week | Random Forest, AR+GT | 11.027 | 0.354 | 0.847 | 0.924 |
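The four metrics in the table, and a baseline of the kind the caption describes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. Relative RMSE is assumed here to be RMSE divided by the mean observed value, which may differ from the paper's definition. The number of recent points used by the baseline (`n_points`) is also an assumption.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_rmse(y_true, y_pred):
    """RMSE scaled by the mean observed value (assumed definition)."""
    return rmse(y_true, y_pred) / float(np.mean(np.asarray(y_true, dtype=float)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Negative whenever the predictions do worse than predicting the mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def pearson(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def baseline_forecast(available_cases, delay_weeks, n_points=3):
    """Hypothetical baseline: fit a straight line to the last `n_points`
    available observations and extrapolate `delay_weeks` past the last one."""
    recent = np.asarray(available_cases[-n_points:], dtype=float)
    x = np.arange(n_points)
    slope, intercept = np.polyfit(x, recent, 1)
    return float(slope * (n_points - 1 + delay_weeks) + intercept)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(rmse(observed, predicted))
    print(relative_rmse(observed, predicted))
    print(r_squared(observed, predicted))
    print(pearson(observed, predicted))
    print(baseline_forecast([10.0, 20.0, 30.0], delay_weeks=2))
```

Under this definition R^2 can be negative while the Pearson correlation stays positive, because correlation ignores errors in scale and offset whereas R^2 penalises them. A linear extrapolation over a long reporting delay can track the direction of a trend yet overshoot its magnitude, which is one way the two metrics can diverge.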