2022 Nov 8;65(2):463–516. doi: 10.1007/s10115-022-01779-1

Table 1.

Some methods that address the issues related to the nature of named entities. The abbreviation "NI" in this table means Not Included

Publication / Ambiguity / Boundary / Name variation / Composed entities ("+" marks a strength, "-" a limitation)

Lei et al. [76]
  Ambiguity: + Word segmentation and section information.
  Boundary: + Medical dictionary to segment words. - Most errors occur in long entities.
  Name variation: NI
  Composed entities: NI

Quimbaya et al. [110]
  Ambiguity: - Ignores the context and surrounding words.
  Boundary: NI
  Name variation: + Edit distance, exact and lemmatized matching against a knowledge base.
  Composed entities: NI

Xu et al. [150]
  Ambiguity: + Category Word2vec, PoS and dependency relations, and semantic correlation knowledge. - Filtering may miss some medical entities.
  Boundary: + Medical native noun phrases, based on a knowledge base. - May obtain some inexact entities.
  Name variation: NI
  Composed entities: + All medical native noun phrases.

Ghiasvand and Kate [41]
  Ambiguity: + Exact matching of unambiguous words from UMLS.
  Boundary: + Boundary expansion model trained on UMLS words. + Classifies all possible noun phrases. - Noun phrase extraction is not always perfect. - Some entities are not noun phrases.
  Name variation: + Lemma and stem forms as features.
  Composed entities: + Complete parsing to extract all noun phrases. - Automatic noun phrase extraction is not always perfect. - Some entities do not belong to noun phrases.

Zhou et al. [163]
  Ambiguity: + Word and character embeddings.
  Boundary: - Captures contextual relations only at the word level. - Cannot treat complex entities at the phrase level.
  Name variation: + Character representation can capture out-of-vocabulary words.
  Composed entities: NI

Deng et al. [33]
  Ambiguity: + Learns contextual semantic information without feature engineering. + BiLSTM can learn contextual dependencies. + CRF can improve the annotation at the phrase level.
  Boundary: + Ensures the integrity and accuracy of the entity by bidirectional storage of textual information. + IOB annotation format. + Avoids segmentation errors through character embeddings. - Nested entities result in unclear boundaries.
  Name variation: + Character embedding.
  Composed entities: - Limited by the entity annotation granularity.

Zhao et al. [161]
  Ambiguity: + Extracts lexical, contextual and syntactic clues. + Fine-tunes BERT with BiLSTM-CRF. + Rules use contextual embeddings from the ELMo model.
  Boundary: + Extracts noun phrases in the sentence by PoS patterns.
  Name variation: + Clue-based rules. - Rules are not appropriate for other domains.
  Composed entities: NI

Li et al. [78]
  Ambiguity: + Word2vec is improved by BiLSTM to capture contextual information. + BERT is better and can capture the context without BiLSTM.
  Boundary: + Relation classification between pairs of spans can recognize discontinuous entities.
  Name variation: + ELMo character-level embedding. - Word-level embedding is needed to capture the whole meaning of words.
  Composed entities: + Enumerates and represents all text spans and applies relation classification.

Sui et al. [126]
  Ambiguity: + Interactions between the words, entity triggers and the whole sentence semantics.
  Boundary: NI
  Name variation: + Entity triggers recognize entities via cue words. - Manual effort is required to prepare entity triggers.
  Composed entities: + Casts the problem as a graph node classification task.
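The cascade of exact, lemmatized, and edit-distance matching attributed to Quimbaya et al. [110] can be sketched as follows; the tiny knowledge base, the naive plural-stripping "lemmatizer", and the distance threshold are illustrative assumptions, not the authors' actual resources.

```python
# Sketch of dictionary-based matching with exact, lemmatized, and
# edit-distance lookups, in the spirit of Quimbaya et al. [110].
# KNOWLEDGE_BASE and lemmatize() are toy stand-ins for illustration.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Toy knowledge base of medical terms (illustrative only).
KNOWLEDGE_BASE = {"diabetes", "hypertension", "aspirin"}

def lemmatize(token: str) -> str:
    """Naive lemmatizer stand-in: strips a plural 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def match(token: str, max_dist: int = 1):
    """Try exact, then lemmatized, then approximate matching."""
    token = token.lower()
    if token in KNOWLEDGE_BASE:                  # exact match
        return token
    lemma = lemmatize(token)
    if lemma in KNOWLEDGE_BASE:                  # lemmatized match
        return lemma
    for term in KNOWLEDGE_BASE:                  # edit-distance match
        if edit_distance(token, term) <= max_dist:
            return term
    return None
```

As the table notes, such matching ignores the surrounding context, so an ambiguous surface form is always resolved the same way regardless of the sentence it occurs in.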
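The IOB annotation format listed for Deng et al. [33] can be illustrated with a short helper: each token receives B-<type> at the start of an entity, I-<type> inside it, and O elsewhere. The sentence, span, and DISEASE label below are invented for the example.

```python
# Minimal illustration of the IOB (Inside-Outside-Begin) tagging scheme.

def to_iob(tokens, entities):
    """entities: list of (start, end, type) token spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens inside it
    return tags

tokens = ["Patient", "has", "type", "2", "diabetes", "mellitus", "."]
# One composed entity spanning tokens 2..5: "type 2 diabetes mellitus"
tags = to_iob(tokens, [(2, 6, "DISEASE")])
```

This also makes the table's "- Limited by the entity annotation granularity" concrete: a flat IOB sequence cannot represent a second entity nested inside the tagged span.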
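The span-enumeration step behind Li et al. [78] can be sketched as below: every contiguous span up to a maximum width becomes a candidate entity, and a classifier (omitted here) would then score spans and the relations between span pairs to recover discontinuous and composed entities. The `max_width` cap is an assumption for the sketch, not a value from the paper.

```python
# Sketch of exhaustive span enumeration for span-based NER.

def enumerate_spans(tokens, max_width=4):
    """All (start, end) spans with end exclusive and width <= max_width."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            spans.append((start, end))
    return spans
```

Because every span is represented explicitly, entities need not align with noun-phrase boundaries, which is what lets this family of models sidestep the noun-phrase extraction errors noted for the rule-based rows above.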