2009 Jan 21;10:28. doi: 10.1186/1471-2105-10-28

Table 1.

Algorithms for Word Sense Disambiguation.

| Category | Publication | Data | Background knowledge | Approach | Experiment | Accuracy |
|---|---|---|---|---|---|---|
| Established Knowledge | [12] | gene definition & abstract vector | 5 human gen. dbs & MeSH | cosine similarity | 52,529 Medline abstracts, 690 human gene symbols | 92.7% |
| Established Knowledge | [13] | free text | UMLS, Journal Descriptors | Journal Descriptor Indexing (JDI) | 45 ambiguous UMLS terms (NLM WSD Collection) | 78.7% |
| Established Knowledge | [14] | Medline abstracts | BioCreative-2 GN lexicon & text, EntrezGene, UniProt, GOA | motifs from multiple sequence alignments | BioCreative-2 GN challenge | 81% |
| Established Knowledge | [15] | Medline abstracts | list of gene senses, EntrezGene | inverse co-author graph | BioCreative GN challenge | 97%P |
| Supervised | [8] | XML tagged abstracts, positional info, PoS | - | naive Bayes, decision trees, inductive rule training | protein/gene/mRNA assignment: 9 million words (mol. biol. journals) | 85% |
| Supervised | [49] | text | - | word count, word cooc | - | 86.5% |
| Supervised | [9,50] | Medline abstracts | UMLS terms | UMLS term cooc | 35 biomedical abbreviations | 93%P |
| Supervised | [10] | abbreviations in Medline abstracts | - | SVM | build dictionary, use for abbreviations occurring with their long forms | 98.5% |
| Supervised | [11] | gene symbol context (n words +/-) | - | SVM | - | 85% |
| Unsupervised | [19,20] | document | - | LSA/LSI, 2nd order cooc | 170,000 documents, 1013 terms (TREC-1) (Wall Street Journal) | ↑ 7–14% |
| Unsupervised | [51] | word cooc, PoS tags | WordNet | average link clustering | 13 words, ACL/DCI | 73.4% |
| Unsupervised | [21] | | | | Wall Street Journal Corpus | |
| Unsupervised | [22] | - | - | 1st, 2nd order context vectors (coocs within 5 positions) | 24 Senseval-2 words, Line, Hard, Serve corpora | 44% |
| Unsupervised | [23] | text | few tagged data, WordNet | co-training, collocations | 12 common Engl. words × 4000 instances | 96.5% |
| Unsupervised | [25] | - | - | co-training & majority voting | Senseval-2 generic English | ↑ 9.8% |
| Unsupervised | [24] | - | WordNet | noun coocs, Markov clustering | - | - |
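
As a rough illustration of the cosine-similarity approach listed in the table for [12], the sketch below compares a bag-of-words vector of the context around an ambiguous symbol with a vector built from each candidate sense's definition, and picks the closest sense. It is not the implementation from [12]: the whitespace tokenizer, the function names (`bag_of_words`, `cosine`, `disambiguate`), and the toy sense definitions are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_definitions):
    """Return the sense label whose definition text is most similar to the context."""
    ctx = bag_of_words(context)
    return max(
        sense_definitions,
        key=lambda sense: cosine(ctx, bag_of_words(sense_definitions[sense])),
    )


# Hypothetical toy senses for the ambiguous symbol "CAT"; invented for illustration only.
SENSES = {
    "catalase (gene)": "enzyme gene that decomposes hydrogen peroxide in cells",
    "cat (animal)": "small domesticated feline animal kept as a pet",
}

print(disambiguate("expression of the CAT gene protects cells from hydrogen peroxide", SENSES))
```

A real system would presumably use weighted vectors (for example tf-idf) built from curated database entries rather than raw counts over one-line definitions; raw counts are used here only to keep the sketch self-contained.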