Table 1.
Some methods that address the issues related to the nature of named entities. The abbreviation "NI" in this table means "Not Included".
Publication | Ambiguity | Boundary | Name variation | Composed entities |
---|---|---|---|---|
Lei et al. [76] | + Word segmentation and section information. | + Medical dictionary to segment words. - Most errors occur in long entities. | NI | NI |
Quimbaya et al. [110] | - Ignores the context and surrounding words. | NI | + Edit distance, exact, and lemmatized matching against a knowledge base (sketched below). | NI |
Xu et al. [150] | + Category Word2vec, PoS and dependency relations, and semantic correlation knowledge. - Filtering may miss some medical entities. | + Medical native noun phrases. + Based on a knowledge base. - May obtain some inexact entities. | NI | + All medical native noun phrases. |
Ghiasvand and Kate [41] | + Exact matching of unambiguous words from the UMLS. | + Boundary expansion model trained on UMLS words. + Classifies all possible noun phrases. - Noun-phrase extraction is not always perfect. - Some entities are not noun phrases. | + Lemma and stem forms as features. | + Complete parsing to extract all noun phrases. - Automatic noun-phrase extraction is not always perfect. - Some entities do not belong to noun phrases. |
Zhou et al. [163] | + Word and character embeddings. - Captures contextual relations only at the word level. | - Cannot handle complex entities at the phrase level. | + Character representations can capture out-of-vocabulary words (sketched below). | NI |
Deng et al. [33] | + Learns contextual semantic information without feature engineering. + BiLSTM can learn contextual dependencies. + CRF improves the annotation at the phrase level. | + Ensures the integrity and accuracy of entities through bidirectional encoding of textual information. + IOB annotation format (sketched below). + Avoids segmentation errors through character embeddings. - Nested entities result in unclear boundaries. | + Character embeddings. | - Limited by the entity annotation granularity. |
Zhao et al. [161] | + Extracts lexical, contextual, and syntactic clues. + Fine-tunes BERT with a BiLSTM-CRF. + Rule-based contextual embeddings using the ELMo model. | + Extracts noun phrases from sentences by PoS patterns (sketched below). | + Clue-based rules. - Rules are not appropriate for other domains. | NI |
Li et al. [78] | + Word2vec is improved by a BiLSTM to capture contextual information. + BERT performs better and captures context without a BiLSTM. | + Relation classification between pairs of spans can recognize discontinuous entities. | + ELMo character-level embeddings. - Word-level embeddings are needed to capture the full meaning of words. | + Enumerates and represents all text spans and applies relation classification (sketched below). |
Sui et al. [126] | + Models interactions between words, entity triggers, and whole-sentence semantics. | NI | + Entity triggers recognize entities through cue words. - Manual effort is required to prepare entity triggers. | + Casts the problem as a graph node classification task. |
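
The sketches below illustrate, in simplified form, several of the techniques summarized in Table 1; all identifiers, lexicons, and thresholds in them are illustrative assumptions rather than the cited authors' actual resources. The first mirrors the matching cascade of Quimbaya et al. [110]: try an exact dictionary hit, fall back to a lemmatized form, and finally tolerate small spelling differences via edit distance. The toy `LEXICON`, the naive suffix-stripping `naive_lemma`, and the `max_dist` threshold are hypothetical.

```python
# A minimal sketch of exact / lemmatized / edit-distance dictionary
# matching in the spirit of Quimbaya et al. [110]. The lexicon,
# "lemmatizer", and threshold are illustrative assumptions.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def naive_lemma(word):
    # Stand-in for a real lemmatizer: strip a plural "s".
    return word[:-1] if word.endswith("s") else word

LEXICON = {"diabetes", "hypertension", "asthma"}  # toy knowledge base

def match(term, max_dist=1):
    term = term.lower()
    if term in LEXICON:
        return term, "exact"
    if naive_lemma(term) in LEXICON:
        return naive_lemma(term), "lemmatized"
    for entry in LEXICON:                         # fuzzy fallback
        if levenshtein(term, entry) <= max_dist:
            return entry, "edit-distance"
    return None, "no match"

print(match("hypertenson"))  # ('hypertension', 'edit-distance')
```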
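The next sketch shows why character-level representations handle out-of-vocabulary words, the property credited to Zhou et al. [163] and also exploited by Deng et al. [33]: an unseen word still decomposes into seen characters. The embedding dimension and the averaging composition are assumptions; real models learn the composition with a CNN or LSTM rather than averaging random vectors.

```python
# A minimal sketch of composing a word vector from character vectors, so
# that out-of-vocabulary words still receive a representation. The
# dimension, random initialization, and averaging are all assumptions.
import random

random.seed(0)
DIM = 8
char_emb = {c: [random.uniform(-1, 1) for _ in range(DIM)]
            for c in "abcdefghijklmnopqrstuvwxyz"}

def word_vector(word):
    """Average the vectors of a word's known characters."""
    vecs = [char_emb[c] for c in word.lower() if c in char_emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# "pneumonitis" may be absent from a word-level vocabulary, yet it still
# gets a vector built from its characters.
print(len(word_vector("pneumonitis")))  # 8
```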
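Deng et al. [33] rely on the IOB annotation format to mark entity boundaries. A minimal sketch of decoding a predicted IOB tag sequence back into entity spans follows; the tag set and example sentence are hypothetical, and a real BiLSTM-CRF tagger would supply the tags.

```python
# A minimal sketch of decoding IOB tags into entity spans, as produced by
# BiLSTM-CRF taggers such as Deng et al. [33]. Tags and text are made up.
def iob_to_spans(tokens, tags):
    """Convert parallel token/IOB-tag lists into (label, start, end) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):                  # a new entity begins
            if start is not None:
                spans.append((label, start, i))   # close the previous one
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue                              # entity continues
        else:                                     # "O" or an ill-formed "I-"
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans

tokens = ["Chest", "pain", "and", "shortness", "of", "breath"]
tags   = ["B-Problem", "I-Problem", "O", "B-Problem", "I-Problem", "I-Problem"]
print(iob_to_spans(tokens, tags))
# [('Problem', 0, 2), ('Problem', 3, 6)]
```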
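Zhao et al. [161] extract candidate noun phrases with PoS patterns, and Ghiasvand and Kate [41] similarly classify parsed noun phrases. A minimal sketch of one such pattern over a hand-tagged sentence is shown below, assuming Penn-Treebank-style tags; real systems run a full tagger or parser first.

```python
# A minimal sketch of PoS-pattern noun-phrase extraction, loosely
# mirroring the candidate-generation step of Zhao et al. [161]. The
# pattern (adjectives/nouns ending in a noun) and tags are assumptions.
def noun_phrases(tagged):
    """Greedily collect maximal (JJ|NN|NNS)* runs that contain a noun."""
    phrases, buf = [], []
    for word, tag in tagged:
        if tag in ("JJ", "NN", "NNS"):
            buf.append((word, tag))
        else:
            if any(t.startswith("NN") for _, t in buf):
                phrases.append(" ".join(w for w, _ in buf))
            buf = []
    if any(t.startswith("NN") for _, t in buf):
        phrases.append(" ".join(w for w, _ in buf))
    return phrases

tagged = [("acute", "JJ"), ("renal", "JJ"), ("failure", "NN"),
          ("was", "VBD"), ("noted", "VBN"), ("on", "IN"),
          ("admission", "NN")]
print(noun_phrases(tagged))  # ['acute renal failure', 'admission']
```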
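Finally, Li et al. [78] enumerate all text spans and classify the relation between span pairs, which lets them recover discontinuous entities that tag-per-token schemes miss. The sketch below only enumerates candidate spans and marks where a span-pair classifier would plug in; `max_width` and the example tokens are assumptions.

```python
# A minimal sketch of the span-enumeration step behind the span-pair
# relation classification of Li et al. [78]. In the real model a neural
# classifier scores every span and every span pair.
def enumerate_spans(tokens, max_width=3):
    """All contiguous spans of up to max_width tokens, as (start, end)."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_width, len(tokens)) + 1)]

tokens = ["left", "and", "right", "atrial", "dilatation"]
spans = enumerate_spans(tokens)
print(len(spans))  # 12 candidate spans

# A discontinuous entity such as "left ... atrial dilatation" is recovered
# by predicting a "same-entity" relation between two spans, e.g. the pair
# ((0, 1), (3, 5)) covering ['left'] and ['atrial', 'dilatation'].
print(tokens[0:1], tokens[3:5])
```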