PLoS One. 2019 Jan 29;14(1):e0211262. doi: 10.1371/journal.pone.0211262

Table 4. Prediction accuracy performance of different approaches.

Approach                              Prediction of over-reported event
                                      Q1       Q2       Q3       Q4
Sampling approaches
  SRS                                 18.77%   14.98%   22.56%   20.04%
  SRS with district stratification    18.83%   15.21%   23.22%   19.9%
  SRS of offenders & non-offenders    -        34.5%    36.5%    27.87%
  SRS of only offenders               -        44.5%    42.19%   38.81%
Supervised learning
  Logistic Regression                 58.42%   32.84%   31.28%   34.76%
  Naïve Bayes                         55.24%   46.15%   32.05%   41.3%
  SVM                                 64.75%   58.02%   49%      52.26%
  Random Forest                       86.6%    89.18%   84.92%   77.31%
  Random Forest with district         87.84%   86.19%   81.99%   76.96%
  Random Forest with intervention     85.08%   82.29%   77.83%   73.08%

Note: For the SRS approaches, accuracy is the average over 1,000 independent iterations of sampling without replacement; for the supervised learning approaches, it is obtained by 10-fold cross-validation.
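
The two evaluation schemes in the note can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "accuracy" for a sampling approach means the share of sampled units that are truly over-reported events, and the function names, the synthetic labels, and the sample size are all hypothetical.

```python
import random


def srs_hit_rate(labels, sample_size, iterations=1000, seed=0):
    """Average share of positives (1 = over-reported event) found by
    simple random sampling without replacement, over many iterations.
    Assumption: this share is what 'accuracy' means for SRS."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        sample = rng.sample(labels, sample_size)  # draws without replacement
        total += sum(sample) / sample_size
    return total / iterations


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.
    Each index lands in exactly one test fold."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Synthetic, purely illustrative labels: 20% over-reported events.
labels = [1] * 200 + [0] * 800
rate = srs_hit_rate(labels, sample_size=100)
print(f"mean SRS hit rate: {rate:.2%}")

# A classifier would be fitted on each train split and scored on the
# matching test split; the ten scores are then averaged.
splits = list(k_fold_indices(len(labels), k=10))
print(f"number of folds: {len(splits)}")
```

Under these assumptions, the SRS hit rate converges to the prevalence of over-reported events in the sampled population, which would explain why restricting the sampling frame (for example, to offenders only) raises it.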