2020 Aug 12;22(8):e17478. doi: 10.2196/17478

Table 2.

Description of preprocessing steps and options used in traditional classifiers.

| Preprocessing step | Description | Options^a |
|---|---|---|
| placeholder_remove | Remove textual placeholders such as _mention_, _hashtag_, _unicode_, and _url_ | true, false |
| emoji_remove | Remove textual descriptions that denote emojis | true, false |
| negation_expand | Expand negative contractions; for example, “don’t” is expanded to “do not” and “can’t” to “cannot” | true, false |
| punctuation_remove | Remove all punctuation symbols | true, false |
| digits_remove | Remove all numeric digits (0-9) | true, false |
| negation_mark | Add the NEG prefix to every word that occurs between a negation trigger and the next punctuation mark [28] | true, false |
| normalize | Reduce any run of more than 2 identical consecutive characters to 2 characters; for example, “happppy” is reduced to “happy” | true, false |
| stemming | Reduce inflected words (eg, “troubled,” “troubles”) to a common stem (eg, “troubl”) using the Porter Stemmer [29] | true, false |
| stopwords_remove | Remove common words such as “the,” “a,” “on,” “is,” and “all” that appear in the Natural Language Toolkit (NLTK) English stop words list [30] | true, false |
| lowercase | Convert all characters to lowercase | true, false |

^a If the option for a step is set to true, that step is applied in the preprocessing pipeline; if it is set to false, the step is skipped.
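The Python sketch below shows one way the options in Table 2 could be wired into a configurable pipeline in which each true/false flag switches a single step on or off. It is not the authors' implementation. The step order, the NEG_ prefix format, the assumption that emoji descriptions look like `:thumbs_up:`, the contraction rules, and the abbreviated `STOP_WORDS` set (a hypothetical stand-in for the full NLTK English list) are all assumptions. The stemming step additionally requires the `nltk` package.

```python
import re
from dataclasses import dataclass


@dataclass
class PreprocessConfig:
    # One boolean per step in Table 2; True applies the step, False skips it.
    placeholder_remove: bool = False
    emoji_remove: bool = False
    negation_expand: bool = False
    punctuation_remove: bool = False
    digits_remove: bool = False
    negation_mark: bool = False
    normalize: bool = False
    stemming: bool = False
    stopwords_remove: bool = False
    lowercase: bool = False


PLACEHOLDERS = ("_mention_", "_hashtag_", "_unicode_", "_url_")
# Hypothetical abbreviated stop word set, NOT the full NLTK English list.
STOP_WORDS = {"the", "a", "an", "on", "is", "all", "and", "of", "to", "in"}
# Assumed negation triggers: a few common words plus any "...n't" contraction.
NEGATION_TRIGGER = re.compile(r"(not|no|never|cannot|\w+n['\u2019]t)", re.IGNORECASE)
# Assumed punctuation marks that close a negation scope.
SCOPE_END = re.compile(r"[.,:;!?]")


def mark_negation(text: str) -> str:
    """Prefix words between a negation trigger and the next punctuation mark."""
    tokens = re.findall(r"[\w'\u2019]+|[^\w\s]", text)
    out, in_scope = [], False
    for tok in tokens:
        if SCOPE_END.fullmatch(tok):
            in_scope = False  # punctuation closes the negation scope
        elif in_scope and re.search(r"\w", tok):
            tok = "NEG_" + tok
        elif NEGATION_TRIGGER.fullmatch(tok):
            in_scope = True  # words after the trigger get marked
        out.append(tok)
    return " ".join(out)


def preprocess(text: str, cfg: PreprocessConfig) -> str:
    if cfg.placeholder_remove:
        for placeholder in PLACEHOLDERS:
            text = text.replace(placeholder, " ")
    if cfg.emoji_remove:
        # Assumes emojis were already converted to ":name:" descriptions.
        text = re.sub(r":[A-Za-z_]+:", " ", text)
    if cfg.negation_expand:
        text = text.replace("\u2019", "'")  # treat curly apostrophes as straight
        text = re.sub(r"\bcan't\b", "cannot", text, flags=re.IGNORECASE)
        text = re.sub(r"\bwon't\b", "will not", text, flags=re.IGNORECASE)
        text = re.sub(r"n't\b", " not", text, flags=re.IGNORECASE)
    if cfg.negation_mark:
        # Runs before punctuation removal because it needs punctuation
        # to find the end of each negation scope.
        text = mark_negation(text)
    if cfg.punctuation_remove:
        text = re.sub(r"[^\w\s]", " ", text)  # \w keeps "_", so NEG_ survives
    if cfg.digits_remove:
        text = re.sub(r"[0-9]", "", text)
    if cfg.normalize:
        text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # 3+ repeats -> 2
    tokens = text.split()  # whitespace is always collapsed here
    if cfg.stemming:
        from nltk.stem import PorterStemmer  # requires the nltk package
        stem = PorterStemmer().stem
        tokens = ["NEG_" + stem(t[4:]) if t.startswith("NEG_") else stem(t)
                  for t in tokens]
    if cfg.stopwords_remove:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if cfg.lowercase:
        tokens = [t.lower() for t in tokens]
    return " ".join(tokens)
```

For example, `preprocess("I can't stand it, really", PreprocessConfig(negation_expand=True, negation_mark=True, punctuation_remove=True))` returns `"I cannot NEG_stand NEG_it really"`: the comma ends the negation scope before it is itself removed. Keeping each flag as a separate field makes it straightforward to enumerate option combinations when tuning a classifier.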