Table 7.
Results of the adversarial attack mitigation technique on three datasets, before and after mitigation.
| Attack type | Attack | Movie reviews (Before) | Movie reviews (After) | Hate speech (Before) | Hate speech (After) | Clickbait (Before) | Clickbait (After) |
|---|---|---|---|---|---|---|---|
| Word-level | TextFooler | 0.27 | 0.79 | 0.32 | 0.68 | 0.65 | 0.85 |
| Word-level | PWWS | 0.21 | 0.58 | 0.35 | 0.60 | 0.66 | 0.80 |
| Word-level | GENETIC | 0.15 | 0.63 | 0.23 | 0.58 | 0.58 | 0.76 |
| Word-level | SememePSO | 0.40 | 0.76 | 0.67 | 0.79 | 0.79 | 0.90 |
| Word-level | BAE | 0.85 | 0.99 | 0.94 | 0.96 | 0.98 | 0.97 |
| Word-level | BERT-ATTACK | 0.80 | 0.71 | 0.24 | 0.75 | 0.52 | 0.78 |
| Word-level | HotFlip | 0.47 | 0.84 | 0.71 | 0.88 | 0.81 | 0.85 |
| Sentence-level | SEA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Sentence-level | GAN | 0.63 | 0.89 | 0.60 | 0.66 | 0.54 | 0.56 |
| Sentence-level | SCPN | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Char-level | DeepWordBug | 0.58 | 0.92 | 0.74 | 0.83 | 0.62 | 0.85 |
| Char-level | VIPER | 0.94 | 0.93 | 0.99 | 0.97 | 0.98 | 0.97 |
| Char-level | UAT | 0.11 | 0.74 | 0.18 | 0.35 | 0.39 | 0.36 |
| Char-level | TextBugger | 0.50 | 0.64 | 0.25 | 0.65 | 0.61 | 0.74 |
| Average | | 0.565 | 0.815 | 0.587 | 0.764 | 0.723 | 0.813 |
| Relative ↑% | | ↑25% | | ↑17% | | ↑9% | |
Significant values are in bold.