Table 4. Prediction accuracy of different approaches (prediction of over-reported event).

| Approach | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| **Sampling approaches** | | | | |
| SRS | 18.77% | 14.98% | 22.56% | 20.04% |
| SRS with district stratification | 18.83% | 15.21% | 23.22% | 19.9% |
| SRS of offenders & non-offenders | - | 34.5% | 36.5% | 27.87% |
| SRS of only offenders | - | 44.5% | 42.19% | 38.81% |
| **Supervised learning** | | | | |
| Logistic Regression | 58.42% | 32.84% | 31.28% | 34.76% |
| Naïve Bayes | 55.24% | 46.15% | 32.05% | 41.3% |
| SVM | 64.75% | 58.02% | 49% | 52.26% |
| Random Forest | 86.6% | 89.18% | 84.92% | 77.31% |
| Random Forest with district | 87.84% | 86.19% | 81.99% | 76.96% |
| Random Forest with intervention | 85.08% | 82.29% | 77.83% | 73.08% |

Note: For the SRS approaches, accuracy is the average over 1,000 independent iterations of sampling without replacement; for the supervised learning approaches, it is obtained by 10-fold cross-validation.
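
The averaging procedure described in the note for the SRS rows can be illustrated with a minimal sketch. It assumes that "accuracy" for a sampling approach means the share of sampled cases that turn out to be over-reported, which the table does not state explicitly. The function name `srs_hit_rate`, the synthetic population, and the sample size are all hypothetical and are not taken from the study.

```python
import random


def srs_hit_rate(labels, sample_size, iterations=1000, seed=0):
    """Average share of over-reported cases found by simple random sampling.

    labels: list of 0/1 flags (1 = over-reported); hypothetical data.
    Each iteration draws `sample_size` cases without replacement and
    records the fraction of flagged cases in that draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        sample = rng.sample(labels, sample_size)  # sampling without replacement
        total += sum(sample) / sample_size
    return total / iterations


# Synthetic population (assumption): 200 of 1,000 cases are over-reported.
population = [1] * 200 + [0] * 800
rate = srs_hit_rate(population, sample_size=100)
print(f"mean hit rate over 1,000 draws: {rate:.2%}")
```

Under this reading, the averaged hit rate of plain SRS settles near the base rate of over-reporting in the population, which is why such a figure serves as a natural baseline for the supervised models.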