2023 Oct 9;10:401. Originally published 2021 May 19. [Version 2] doi: 10.12688/f1000research.51117.2

Table 1. Pre-processing techniques, a short description and examples from the literature.

| Technique | Details | Example in literature |
| --- | --- | --- |
| Tokenisation | Splitting text on sentence and word level | 56, 83, 88 |
| Normalisation | Replacing integers, units, dates, lower-casing | 65, 89, 90 |
| Lemmatisation and stemming | Reducing words to shorter or more common forms | 53, 91, 92 |
| Stop-word removal | Removing common words, such as ‘the’, from the text | 44, 48, 80 |
| Part-of-speech tagging and dependency parsing | Tagging words with their respective grammatical roles | 41, 78, 88 |
| Chunking | Defining sentence parts, such as noun-phrases | 65, 76, 93 |
| Concept tagging | Processing and tagging words with semantic classes or concepts, e.g. using word lists or MetaMap | 75, 79, 94 |
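
To make some of the techniques in Table 1 concrete, the following is a minimal Python sketch of tokenisation, normalisation, stop-word removal and word-list concept tagging. It is an illustration only and is not taken from any of the cited studies: the function names, the `<num>` placeholder, the stop-word list and the two-entry concept dictionary are all hypothetical, and real pipelines would typically rely on curated resources or dedicated tools such as MetaMap.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "was", "with", "and", "for"}
CONCEPTS = {"aspirin": "DRUG", "headache": "SYMPTOM"}


def split_sentences(text):
    """Naive sentence tokenisation: split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Word tokenisation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalise(token):
    """Normalisation: lower-case the token and replace integers with a placeholder."""
    token = token.lower()
    return "<num>" if token.isdigit() else token


def remove_stop_words(tokens):
    """Stop-word removal against the small hypothetical list above."""
    return [t for t in tokens if t not in STOP_WORDS]


def tag_concepts(tokens):
    """Concept tagging by word-list lookup; 'O' marks tokens without a concept."""
    return [(t, CONCEPTS.get(t, "O")) for t in tokens]


def preprocess(text):
    """Run the steps in order and return one list of (token, tag) pairs per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [normalise(t) for t in tokenise(sentence)]
        # Drop tokens that consist only of punctuation.
        tokens = [t for t in tokens if re.search(r"\w", t)]
        result.append(tag_concepts(remove_stop_words(tokens)))
    return result


if __name__ == "__main__":
    for tagged in preprocess("The patient was given 300 mg of Aspirin. Headache resolved!"):
        print(tagged)
```

The order of the steps is a design choice in this sketch: normalising before stop-word removal and concept lookup lets both word lists be stored in lower case only. Lemmatisation, part-of-speech tagging, dependency parsing and chunking are omitted because they generally require trained models or larger lexical resources.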