Fig 7. Modified Taylor diagrams showing the performance of the models based on three metrics: standard deviation, RMSD, and a correlation coefficient (left: Spearman’s; right: Kendall’s tau), which are represented by the Y, circular, and radial axes, respectively.
Each colored symbol represents an ensemble with a specific number of tree estimators (T) and samples per leaf node (S). Using the pink hexagon as a reference, we can see that the best-performing models are located under the arc defined by RMSD = 2, since they present a high correlation coefficient, a low RMSD, and a standard deviation close to that of the raw tick bites per grid cell (i.e., stdev = 3.15). Among these selected models, two ZIP models and one ZINB model show greater skill in modeling overdispersion (i.e., std. dev. > 4), whereas the small cluster of NB and ZINB models under the arc is better suited to predicting zero-inflation. Experiments with 200–600 samples per leaf node seem to perform best in both diagrams.
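As a minimal sketch of how the three quantities plotted for each symbol could be computed, the Python snippet below derives the standard deviation, RMSD, and two rank correlations (Spearman's and Kendall's tau-b) from paired observed and predicted counts. The function names, the use of the population standard deviation, the uncentered form of RMSD, and the tau-b variant are all assumptions made for illustration; the caption does not state which variants were used.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def rmsd(obs, pred):
    """Root-mean-square difference (uncentered form; an assumption)."""
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties (common in count data)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x) * (pairs + ties_y))


def taylor_metrics(obs, pred):
    """Collect the quantities that position one model on the diagrams."""
    return {
        "stdev": pstdev(pred),  # population std. dev. (an assumption)
        "rmsd": rmsd(obs, pred),
        "spearman": spearman(obs, pred),
        "kendall": kendall_tau_b(obs, pred),
    }


# Toy values for illustration only; these are not data from the study.
toy_obs = [0, 0, 1, 3, 8]
toy_pred = [0, 1, 1, 2, 5]
toy_metrics = taylor_metrics(toy_obs, toy_pred)
```

Under these assumptions, a model would fall under the RMSD = 2 arc when `toy_metrics["rmsd"] < 2`, and its standard deviation can be compared directly against that of the observations.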