Table 5. Machine learning-based NER methods
Publication | Dataset | Dataset size | Method | Features | Precision | Recall | F1
---|---|---|---|---|---|---|---
Tang et al. [132] | i2b2 2010 | 349 train, 477 test | SSVM | Word + context + sentence + section + cTAKES + MetaMap + ConText + Brown clustering | 87.38% | 84.31% | 85.82% |
Wu et al. [148] | i2b2 2010 | 349 train, 477 test | LSTM | Word embeddings | 85.33% | 86.56% | 85.94%
Lee et al. [74] | i2b2 2010 | 170 train, 256 test | BERT model | Pre-trained and fine-tuned BioBERT | - | - | 86.46% |
Zhou et al. [163] | i2b2 2010 | 349 train, 477 test | LSTM-CRF | Pre-trained contextualized embeddings + GloVe embeddings | - | - | 87.45%
Zhou et al. [163] | NCBI-disease 2014 | 593 train, 100 valid, 100 test | LSTM-CRF | Pre-trained contextualized embeddings + GloVe embeddings | - | - | 87.88%
Lee et al. [74] | NCBI-disease 2014 | 593 train, 100 valid, 100 test | BERT model | Pre-trained and fine-tuned BioBERT | - | - | 89.36% |
Yang et al. [153] | n2c2 2019 family history extraction task | 99 train, 117 test | Majority voting of LSTM-CRF models with fine-tuned BERT | fastText embeddings + pre-trained BERT | 79.69% | 79.20% | 79.44%
Uzuner et al. [139] | SemEval 2014 Task 7A | 199 train, 99 valid, 133 test | CRF models | Textual features enhanced with a rule-based system | 91.1% | 85.6% | 88.3% |
Vunikili et al. [140] | CANTEMIST-NER subtask (tumor morphology mentions) | 501 train, 500 valid, 5232 test | Transfer learning to fine-tune the BERT model | BERT contextual embeddings pre-trained on general-domain Spanish text | 72.7% | 74.1% | 73.4%
Deng et al. [33] | Crawled TCM patent abstracts annotated with herb names, disease names, symptoms, and therapeutic effects | 1600 documents: 60% train, 20% valid, 20% test | BiLSTM-CRF | Pre-trained and fine-tuned character embeddings | 94.63% | 94.47% | 94.48%
Zhou et al. [163] | MACCROBAT 2018 case reports | 160 train, 20 valid, 20 test | LSTM-CRF | Pre-trained contextualized embeddings + GloVe embeddings | - | - | 65.75%
Lee et al. [74] | MACCROBAT 2018 case reports | 160 train, 20 valid, 20 test | BERT model | Pre-trained and fine-tuned BioBERT | - | - | 64.38% |
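Several of the strongest entries in Table 5 (e.g., Lee et al. [74], Vunikili et al. [140]) follow the same recipe: load a domain pre-trained transformer encoder, attach a token-classification head, and fine-tune it on the annotated corpus. The sketch below illustrates that pattern using the HuggingFace transformers library; the checkpoint name, BIO label set, and example sentence are illustrative assumptions, not details taken from the cited papers.

```python
# Minimal sketch of BERT-style NER via token classification (the recipe behind
# the BioBERT rows above). Checkpoint, labels, and sentence are assumptions.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# i2b2-2010-style BIO label set (problem / test / treatment) -- illustrative.
LABELS = ["O", "B-problem", "I-problem", "B-test", "I-test",
          "B-treatment", "I-treatment"]

CHECKPOINT = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABELS)
)
# NOTE: the classification head is randomly initialized here; in the surveyed
# papers it is fine-tuned on the annotated training split before evaluation.

words = "The patient was started on metformin for type 2 diabetes".split()
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits          # shape: (1, seq_len, num_labels)
pred = logits.argmax(dim=-1)[0]

# Collapse sub-word predictions back to words (first sub-token wins).
seen = set()
for idx, word_id in enumerate(enc.word_ids(0)):
    if word_id is None or word_id in seen:
        continue
    seen.add(word_id)
    print(f"{words[word_id]:>10s}  {LABELS[int(pred[idx])]}")
```

After fine-tuning, the precision, recall, and F1 values reported in the table are typically computed by exact-match comparison of predicted entity spans against the gold annotations.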