Table 5. Cross-validated and hold-out scores (%) for the three best- and three worst-performing feature-type combinations for English and Dutch, and for the word n-gram and profanity baselines. Scores are reported as F1, precision (P), recall (R), accuracy (Acc) and area under the ROC curve (AUROC).
| | Feature combination | F1 (CV) | P (CV) | R (CV) | Acc (CV) | AUROC (CV) | F1 (hold-out) | P (hold-out) | R (hold-out) | Acc (hold-out) | AUROC (hold-out) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **English** | | | | | | | | | | | |
| Best three | B + C + D + E | 64.26 | 73.32 | 57.19 | 96.97 | 78.07 | 63.69 | 74.13 | 55.82 | 97.21 | 77.47 |
| | A + B + C | 64.24 | 73.22 | 57.23 | 96.96 | 78.09 | 64.32 | 74.08 | 56.83 | 97.24 | 77.96 |
| | A + C + E | 63.84 | 73.21 | 56.59 | 96.94 | 77.78 | 62.94 | 72.82 | 55.42 | 97.14 | 77.24 |
| Worst three | D | 40.48 | 38.98 | 42.12 | 94.10 | 69.41 | 39.56 | 39.56 | 39.56 | 94.71 | 68.39 |
| | A + D + E | 38.95 | 31.47 | 51.10 | 92.37 | 72.76 | 40.71 | 33.87 | 51.00 | 93.49 | 73.22 |
| | E | 17.35 | 9.73 | 79.91 | 63.72 | 71.41 | 15.70 | 8.72 | 78.51 | 63.07 | 70.44 |
| Baseline | word n-gram | 58.17 | 67.55 | 51.07 | 96.54 | 74.93 | 59.63 | 69.57 | 52.17 | 96.57 | 75.50 |
| | profanity | 17.17 | 9.61 | 80.14 | 63.73 | 71.53 | 17.61 | 9.90 | 78.51 | 63.79 | 71.34 |
| **Dutch** | | | | | | | | | | | |
| Best three | A + B + C + E | 61.20 | 56.76 | 66.40 | 94.47 | 81.42 | 58.13 | 54.03 | 62.90 | 94.58 | 79.75 |
| | A + B + C + D + E | 61.03 | 71.55 | 53.20 | 95.53 | 75.86 | 58.72 | 67.40 | 52.03 | 95.62 | 75.21 |
| | A + C + E | 60.82 | 71.66 | 52.84 | 95.53 | 75.68 | 58.15 | 67.71 | 50.96 | 95.61 | 74.71 |
| Worst three | D + B | 32.90 | 29.23 | 37.63 | 89.91 | 65.61 | 30.16 | 34.72 | 26.65 | 92.61 | 61.73 |
| | D | 28.65 | 19.36 | 55.10 | 81.97 | 69.48 | 25.13 | 16.73 | 50.53 | 81.99 | 67.26 |
| | B | 24.74 | 21.24 | 29.61 | 88.16 | 60.94 | 17.99 | 23.15 | 14.71 | 91.98 | 55.80 |
| Baseline | word n-gram | 50.39 | 67.80 | 40.09 | 94.81 | 69.38 | 49.54 | 64.29 | 40.30 | 95.09 | 69.44 |
| | profanity | 28.46 | 19.24 | 54.66 | 81.99 | 69.28 | 25.13 | 16.73 | 50.53 | 81.99 | 67.26 |
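Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the F1 column in each row can be checked against the P and R columns. The following is a minimal sketch of that check. The `ROWS` dictionary and its row labels are our own illustrative shorthand (EN = English, NL = Dutch), and the (P, R, F1) triples are copied from the table above.

```python
# Recompute F1 as the harmonic mean of precision (P) and recall (R)
# for a few (P, R, reported F1) triples copied from Table 5.
# The dictionary keys are illustrative labels, not names from the paper.
ROWS = {
    "EN B + C + D + E (CV)": (73.32, 57.19, 64.26),
    "EN E (CV)": (9.73, 79.91, 17.35),
    "NL A + B + C + E (CV)": (56.76, 66.40, 61.20),
    "NL word n-gram (hold-out)": (64.29, 40.30, 49.54),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in %)."""
    if precision + recall == 0:
        return 0.0  # F1 is conventionally 0 when both P and R are 0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in ROWS.items():
    recomputed = f1(p, r)
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

Each recomputed value should agree with the reported F1 to within the rounding of the two-decimal P and R values. This also shows why the English E and profanity rows have low F1 despite recall near 80%: with precision below 10%, the harmonic mean is dominated by the smaller of the two values.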