Table 1. Adversarial text attack methods and their main ideas across different levels.
| Attack level | Attack | Main idea |
|---|---|---|
| Character-level | Generating Adversarial Text Against Real-world Applications (TextBugger) [23] | Greedy word substitution and character manipulation |
| | Universal Adversarial Triggers for Attacking and Analyzing NLP (UAT) [1] | Gradient-based word or character manipulation |
| | Visually Attacking and Shielding NLP Systems (VIPER) [24] | Visually similar character substitution |
| | Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers (DeepWordBug) [25] | Greedy character manipulation |
| | TextFooler (TF) [26] | Greedy word substitution |
| Word-level | White-Box Adversarial Examples for Text Classification (HotFlip) [27] | Gradient-based word or character substitution |
| | Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency (PWWS) [28] | Greedy word substitution |
| | Generating Natural Language Adversarial Examples (Genetic) [29] | Genetic algorithm-based word substitution |
| | Word-level Textual Adversarial Attacking as Combinatorial Optimization (SememePSO) [30] | Particle swarm optimization-based word substitution |
| | Adversarial Attack Against BERT Using BERT (BERT-ATTACK) [31] | Greedy contextualized word substitution |
| | BERT-based Adversarial Examples for Text Classification (BAE) [32] | Greedy contextualized word substitution and insertion |
| | Semantically Equivalent Adversarial Rules for Debugging NLP Models (SEA) [33] | Rule-based paraphrasing |
| Sentence-level | Adversarial Example Generation with Syntactically Controlled Paraphrase Networks (SCPN) [34] | Paraphrasing |
| | Generating Natural Adversarial Examples (GAN) [35] | Text generation by encoder–decoder |
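
Several entries in Table 1 share the "greedy word substitution" idea. The following is a minimal sketch of that general pattern, not the algorithm of any specific method in the table. The scorer `toy_score`, the weight table `POSITIVE`, the synonym dictionary, and the 0.5 decision threshold are all hypothetical stand-ins for a real victim classifier and a real candidate generator.

```python
from typing import Callable, Dict, List

# Hypothetical toy "victim": returns a positive-class score in [0, 1].
POSITIVE = {"great": 0.4, "good": 0.2, "fine": 0.05}


def toy_score(words: List[str]) -> float:
    return min(1.0, 0.3 + sum(POSITIVE.get(w, 0.0) for w in words))


def greedy_word_substitution(
    words: List[str],
    synonyms: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    threshold: float = 0.5,
) -> List[str]:
    """Greedily swap words for candidates that lower score(words).

    Positions are visited in order of a simple deletion-based saliency,
    and the search stops once the score falls below `threshold`.
    """
    current = list(words)
    base = score(current)
    # Saliency proxy: how much the score drops when a word is deleted.
    saliency = [
        base - score(current[:i] + current[i + 1:])
        for i in range(len(current))
    ]
    order = sorted(range(len(current)), key=lambda i: saliency[i], reverse=True)

    for i in order:
        best_word, best_score = current[i], score(current)
        for candidate in synonyms.get(current[i], []):
            trial = current[:i] + [candidate] + current[i + 1:]
            trial_score = score(trial)
            if trial_score < best_score:
                best_word, best_score = candidate, trial_score
        current[i] = best_word
        if best_score < threshold:
            break  # the toy classifier's decision has flipped
    return current


# Hypothetical usage with a hand-written synonym table.
adversarial = greedy_word_substitution(
    ["a", "great", "movie"],
    {"great": ["good", "fine"], "movie": ["film"]},
    toy_score,
)
print(adversarial)
```

In this sketch the word with the highest saliency is replaced first, which is why a single substitution is enough to push the toy score below the threshold. Practical attacks typically add constraints such as semantic similarity or part-of-speech checks when selecting candidates; those are omitted here.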