Table 2.
Description of preprocessing steps and options used in traditional classifiers.
| Preprocessing steps | Descriptions | Options^a |
| placeholder_remove | Remove textual placeholders such as _mention_, _hashtag_, _unicode_, and _url_ | True, false |
| emoji_remove | Remove textual descriptions that denote emojis | True, false |
| negation_expand | Expand negative contractions, for example, “don’t” is expanded to “do not” and “can’t” is expanded to “cannot” | True, false |
| punctuation_remove | Remove all punctuation symbols | True, false |
| digits_remove | Remove all numeric digits (0-9) | True, false |
| negation_mark | Mark words that occur between a negation trigger and a punctuation mark with the NEG prefix [28] | True, false |
| normalize | Collapse any run of 3 or more identical consecutive characters to 2 characters, for example, “happppy” is reduced to “happy” | True, false |
| stemming | Reduce inflection in words (eg, troubled, troubles) to their root form (eg, trouble) using the Porter Stemmer [29] | True, false |
| stopwords_remove | Remove common words such as “the,” “a,” “on,” “is,” and “all” that are listed in the Natural Language Toolkit English stop words list [30] | True, false |
| lowercase | Change the case of all characters to lowercase | True, false |
^a If the option for a step is set to true, the corresponding preprocessing step is applied in the preprocessing pipeline; if the option is set to false, the step is skipped.
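
As an illustration of how the true/false options in Table 2 could drive a pipeline, the following is a minimal Python sketch using only the standard library. It is not the authors' implementation. Several details are assumptions: the order in which steps run (the table does not state one), the `NEG_` prefix format, the small `NEGATION_TRIGGERS` set, and the 5-word `STOPWORDS` set that stands in for the Natural Language Toolkit list. The emoji_remove and stemming steps are omitted because they depend on the emoji description format and on an external stemmer.

```python
import re

# Assumption: tiny stand-in for the NLTK English stop words list.
STOPWORDS = {"the", "a", "on", "is", "all"}
# Assumption: illustrative negation triggers; the source does not list them.
NEGATION_TRIGGERS = {"not", "no", "never", "cannot"}

PLACEHOLDER = re.compile(r"_(?:mention|hashtag|unicode|url)_")
# A token is a run of letters/digits/apostrophes, or one other non-space char.
TOKEN = re.compile(r"(?:[^\W_]|')+|\S")

DEFAULT_OPTIONS = {
    "placeholder_remove": True,
    "negation_expand": True,
    "punctuation_remove": True,
    "digits_remove": True,
    "negation_mark": True,
    "normalize": True,
    "stopwords_remove": True,
    "lowercase": True,
}


def preprocess(text, options=None):
    """Apply each step whose option is True; skip steps set to False."""
    opts = {**DEFAULT_OPTIONS, **(options or {})}

    if opts["placeholder_remove"]:
        text = PLACEHOLDER.sub(" ", text)
    if opts["lowercase"]:
        text = text.lower()
    if opts["negation_expand"]:
        text = text.replace("\u2019", "'")  # curly to straight apostrophe
        text = re.sub(r"\bcan't\b", "cannot", text, flags=re.I)
        text = re.sub(r"\bwon't\b", "will not", text, flags=re.I)
        text = re.sub(r"n't\b", " not", text, flags=re.I)
    if opts["normalize"]:
        # Collapse runs of 3+ identical characters to exactly 2.
        text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    if opts["digits_remove"]:
        text = re.sub(r"\d", "", text)

    tokens = []
    negated = False  # True while inside a negation scope
    for tok in TOKEN.findall(text):
        is_word = tok[0].isalnum() or tok[0] == "'"
        if not is_word:
            negated = False  # punctuation closes the negation scope
            if not opts["punctuation_remove"]:
                tokens.append(tok)
            continue
        if opts["punctuation_remove"]:
            tok = tok.replace("'", "")
            if not tok:
                continue
        if tok.lower() in NEGATION_TRIGGERS:
            tokens.append(tok)
            negated = opts["negation_mark"]
            continue
        if opts["stopwords_remove"] and tok.lower() in STOPWORDS:
            continue
        tokens.append("NEG_" + tok if negated else tok)
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("_mention_ I can't stand thisss film, it is sooo bad!!!"))
    print(preprocess("Don't go", {"negation_mark": False, "lowercase": False}))
```

Negation marking runs before punctuation is dropped in this sketch, because the scope of a negation ends at a punctuation mark; if punctuation were stripped first, that boundary would be lost.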