2014 Apr 28;6:17. doi: 10.1186/1758-2946-6-17

Table 3.

Description of common categories of textual features with some examples, summarized from [23]

| Feature category | Objectives and examples |
|---|---|
| Linguistic | To find the prefix common to all variants of a term (stemming), to find the root form of a variant word (lemmatization), to assign each token to a grammatical category (POS tagging), or to divide the text into syntactically correlated groups of words (chunking). Examples: chunking, lemmatization, stemming and part-of-speech (POS) tagging. |
| Orthographic | To capture knowledge about word formation from the presence of specific character features. Examples: capitalization and symbols. |
| Morphological | To reflect common structures and/or character subsequences shared among entities. Examples: suffixes and prefixes, character n-grams and word-shape patterns. |
| Context | To establish a higher-level relationship between the tokens and the extracted features. Examples: windows and conjunctions. |
| Lexicons | To add domain knowledge to the feature set in order to optimize the NER system. Dictionaries of domain terms are matched against entity names in the text, and the resulting tags are used as features. Examples of dictionary types: target entity names and trigger names. |
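As a rough illustration of the stemming objective in the linguistic row of Table 3 (mapping term variants to a shared prefix), the sketch below uses a deliberately naive suffix stripper. It is not the Porter stemmer or any algorithm from [23]; the suffix list, `min_stem`, and the function names are hypothetical choices made for this example. Real systems would use an established stemmer, lemmatizer, POS tagger and chunker.

```python
# Hypothetical suffix list for illustration only. It is ordered so that longer
# suffixes (e.g. "izes") are tried before shorter ones ("es", "s").
SUFFIXES = ("ations", "ation", "izing", "ized", "izes", "ize", "ing", "ed", "es", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Lowercase the word and strip the first matching suffix,
    keeping at least `min_stem` characters of the stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def stem_feature(token: str) -> dict:
    # Variants that reduce to the same stem produce the same feature value,
    # so a classifier can generalize across them.
    return {"stem": naive_stem(token)}
```

With this toy rule set, "oxidation", "oxidized" and "oxidizes" all map to the stem "oxid", so they share one feature value.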
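The orthographic features in Table 3 (capitalization and symbols) are typically boolean tests on a token's characters. The following is a minimal sketch, assuming a dictionary-of-features representation; the feature names and the Greek-letter check (useful for names such as "α-tocopherol") are illustrative choices, not taken from [23].

```python
def orthographic_features(token: str) -> dict:
    """Boolean orthographic features for a single token (illustrative set)."""
    return {
        # First character is an uppercase letter.
        "init_cap": token[:1].isupper(),
        # All cased characters are uppercase (e.g. "H2O", "DNA").
        "all_caps": token.isupper(),
        # An uppercase letter after the first position plus some lowercase letter (e.g. "NaCl").
        "mixed_case": any(c.isupper() for c in token[1:]) and any(c.islower() for c in token),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
        # Characters in the Greek and Coptic Unicode block (U+0370 to U+03FF).
        "has_greek": any("\u0370" <= c <= "\u03ff" for c in token),
        # Any character that is neither a letter nor a digit.
        "has_symbol": any(not c.isalnum() for c in token),
    }
```

For example, "NaCl" is flagged as mixed case, while "H2O" is all caps and contains a digit.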
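The morphological row of Table 3 lists prefixes and suffixes, character n-grams and word-shape patterns. A minimal sketch of these three feature types is shown below. The shape alphabet (`X` for uppercase, `x` for lowercase, `d` for digits, other characters kept), the `<`/`>` boundary markers and the default lengths are assumptions for illustration.

```python
import re


def word_shape(token: str) -> str:
    """Map uppercase -> 'X', lowercase -> 'x', digit -> 'd'; keep other characters."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def short_shape(token: str) -> str:
    # Collapse runs of the same shape character, e.g. "Xxxxx" -> "Xx".
    return re.sub(r"(.)\1+", r"\1", word_shape(token))


def char_ngrams(token: str, n: int = 3) -> list[str]:
    # Pad with boundary markers so n-grams at the word edges are distinguishable.
    padded = f"<{token}>"
    return [padded[i : i + n] for i in range(len(padded) - n + 1)]


def morphological_features(token: str, max_affix: int = 3, n: int = 3) -> dict:
    feats = {"shape": word_shape(token), "short_shape": short_shape(token)}
    for k in range(1, max_affix + 1):
        feats[f"prefix{k}"] = token[:k]
        feats[f"suffix{k}"] = token[-k:]
    for gram in char_ngrams(token, n):
        feats[f"{n}gram={gram}"] = True
    return feats
```

Collapsed shapes let tokens of different lengths share a pattern: "2-Methylbutane" has the short shape `d-Xx`, and suffixes such as "-ol" or "-ane" can signal chemical class names.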
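The context features in Table 3 (windows and conjunctions) relate a token to its neighbours. The sketch below, assuming a simple list of tokens, builds lowercased window features at offsets `-window..+window` and two conjunction features that join adjacent words. The `<BOS>`/`<EOS>` padding symbols and the feature-name format are hypothetical choices.

```python
def context_features(tokens: list[str], i: int, window: int = 2) -> dict:
    """Window and conjunction features for the token at position i."""
    if window < 1:
        raise ValueError("window must be >= 1")
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        if j < 0:
            feats[f"w[{off}]"] = "<BOS>"  # before the start of the sentence
        elif j >= len(tokens):
            feats[f"w[{off}]"] = "<EOS>"  # past the end of the sentence
        else:
            feats[f"w[{off}]"] = tokens[j].lower()
    # Conjunctions: combine the current word with its immediate neighbours.
    feats["w[-1]|w[0]"] = feats["w[-1]"] + "|" + feats["w[0]"]
    feats["w[0]|w[1]"] = feats["w[0]"] + "|" + feats["w[1]"]
    return feats
```

In the sentence "Treatment with aspirin reduced pain", the conjunction feature for "aspirin" is `with|aspirin`, which can capture cue phrases that precede entity mentions.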
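For the lexicon row of Table 3, dictionary entries are matched against the text and the resulting tags become features. One common way to do this is greedy longest-match over token sequences with BIO-style tags, sketched below. The matching strategy, the tag format and the example dictionaries (a target-entity list and a trigger list) are assumptions for illustration, not the dictionaries used in [23].

```python
def lexicon_tags(tokens: list[str], lexicon: set[str], label: str = "LEX") -> list[str]:
    """Tag tokens covered by a dictionary entry using greedy longest match.

    Matching is case-insensitive; multi-word entries are split on whitespace.
    """
    entries = {tuple(e.lower().split()) for e in lexicon}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest possible span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i : i + n]) in entries:
                match = n
                break
        if match:
            tags[i] = f"B-{label}"
            for k in range(i + 1, i + match):
                tags[k] = f"I-{label}"
            i += match
        else:
            i += 1
    return tags
```

Running it once per dictionary yields one tag feature per dictionary type; for instance, a hypothetical target dictionary {"sodium chloride", "acetic acid"} and a trigger dictionary {"acid"} can tag the same sentence independently.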