Linguistic | to find the prefix that is common to all variations of a term, to find the root term of a variant word, to assign each token to a grammatical category, or to divide the text into syntactically correlated groups of words (e.g. chunking, lemmatization, stemming and part-of-speech (POS) tagging) |
Orthographic | to capture knowledge about word formation from the presence of surface features (e.g. capitalization and symbols) |
Morphological | to reflect common structures and/or sub-sequences of characters among entities (e.g. suffixes and prefixes, character n-grams and word shape patterns) |
Context | to establish a higher-level relationship between the tokens and the extracted features (e.g. windows and conjunctions) |
Lexicons | to add domain knowledge to the set of features in order to optimize the NER system. Dictionaries of domain terms are matched against the entity names in the text, and the resulting tags are used as features (e.g. dictionaries of target entity names and trigger names). |
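
As a rough illustration of how the orthographic, morphological, context and lexicon categories in the table might translate into per-token features, the following is a minimal sketch using only the Python standard library. The function names (`word_shape`, `char_ngrams`, `token_features`), the feature keys, the `<PAD>` marker and the toy lexicon are all hypothetical and are not taken from any particular NER system. Linguistic features (stems, lemmas, POS tags, chunks) are left out because they would normally come from an external tagger.

```python
import re


def word_shape(token):
    """Word shape pattern: uppercase -> 'A', lowercase -> 'a', digit -> '0'.
    Any other character (e.g. '-') is kept unchanged."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def char_ngrams(token, n=3):
    """All contiguous character n-grams of the token (empty if too short)."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def token_features(tokens, i, lexicon, window=1):
    """Build a feature dict for tokens[i].

    `lexicon` is an assumed set of lowercase domain terms; `window` is the
    number of neighbouring tokens taken on each side as context.
    """
    tok = tokens[i]
    lower = tok.lower()
    feats = {
        # Orthographic: capitalization and symbols
        "is_capitalized": tok[:1].isupper(),
        "has_symbol": any(not c.isalnum() for c in tok),
        # Morphological: affixes, character n-grams, word shape
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "char_3grams": char_ngrams(lower, 3),
        "shape": word_shape(tok),
        # Lexicon: dictionary match used as a feature
        "in_lexicon": lower in lexicon,
    }
    # Context: neighbouring tokens inside the window, padded at the edges
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"tok[{off:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    # Context: conjunction of the previous token with the current one
    prev = tokens[i - 1].lower() if i > 0 else "<PAD>"
    feats["conj[-1,0]"] = prev + "|" + lower
    return feats


# Toy usage with a one-entry, made-up lexicon
tokens = ["IL-2", "activates", "T", "cells"]
features = token_features(tokens, 0, lexicon={"il-2"})
```

In a setup like this, each token's dictionary would typically be passed to a sequence classifier; the choice of window size and n-gram length is a tuning decision rather than something fixed by the table.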