Table 1. Pre-processing techniques, with a short description of each and examples from the literature.
Technique | Details | Examples in the literature |
---|---|---|
Tokenisation | Splitting text on sentence and word level | 56, 83, 88 |
Normalisation | Replacing integers, units and dates; lower-casing | 65, 89, 90 |
Lemmatisation and stemming | Reducing words to shorter or more common forms | 53, 91, 92 |
Stop-word removal | Removing common words, such as ‘the’, from the text | 44, 48, 80 |
Part-of-speech tagging and dependency parsing | Tagging words with their respective grammatical roles | 41, 78, 88 |
Chunking | Defining sentence parts, such as noun-phrases | 65, 76, 93 |
Concept tagging | Processing and tagging words with semantic classes or concepts, e.g. using word lists or MetaMap | 75, 79, 94 |
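
As an illustration of how the first four techniques in Table 1 can be chained, the following is a minimal sketch using only the Python standard library. It is not taken from any of the cited studies. The function names, the `<NUM>` placeholder, the stop-word list and the suffix rules are all assumptions chosen for illustration, and the stemmer is a toy stand-in for an established algorithm such as Porter's. Part-of-speech tagging, dependency parsing, chunking and concept tagging are left out because they usually rely on trained models or external resources such as MetaMap.

```python
import re

# Illustrative stop-word list (an assumption); real systems use larger lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "was", "with", "in"}


def split_sentences(text):
    """Naive sentence tokenisation: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Naive word tokenisation: runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalise(token):
    """Replace integers with a placeholder (hypothetical '<NUM>') and lower-case the rest."""
    if token.isdigit():
        return "<NUM>"
    return token.lower()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripping; a stand-in for a real stemmer, not a replacement."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of processed tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [normalise(t) for t in tokenise(sentence)]
        # Drop punctuation, keeping the number placeholder.
        tokens = [t for t in tokens if t == "<NUM>" or t.isalnum()]
        tokens = remove_stop_words(tokens)
        result.append([crude_stem(t) for t in tokens])
    return result


if __name__ == "__main__":
    example = "The patient was treated with 20 mg of aspirin. Symptoms improved."
    for sentence_tokens in preprocess(example):
        print(sentence_tokens)
```

The order of the steps matters in this sketch: normalisation runs before stop-word removal so that capitalised forms such as "The" are matched against the lower-case list, and stemming runs last so that the stop-word lookup sees unmodified words.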