Established Knowledge |
[12] | gene definition & abstract vector | 5 human gen. dbs & MeSH | cosine similarity | 52,529 Medline abstracts, 690 human gene symbols | 92.7% |
[13] | free text | UMLS, Journal Descriptors | Journal Descriptor Indexing (JDI) | 45 ambiguous UMLS terms (NLM WSD Collection) | 78.7% |
[14] | Medline abstracts | BioCreative-2 GN lexicon & text, EntrezGene, UniProt, GOA | motifs from multiple sequence alignments | BioCreative-2 GN challenge | 81% |
[15] | Medline abstracts | list of gene senses, EntrezGene | inverse co-author graph | BioCreative GN challenge | 97%P |
Supervised |
[8] | XML tagged abstracts, positional info, PoS | - | naive Bayes, decision trees, inductive rule training | protein/gene/mRNA assignment: 9 million words (mol. biol. journals) | 85% |
[49] | text | - | word count, word cooc | - | 86.5% |
[9,50] | Medline abstracts | UMLS terms | UMLS term cooc | 35 biomedical abbreviations | 93%P |
[10] | abbreviations in Medline abstracts | - | SVM | build dictionary, use for abbreviations occurring with their long forms | 98.5% |
[11] | gene symbol context (n words +/-) | - | SVM | - | 85% |
Unsupervised |
[19,20] | document | - | LSA/LSI, 2nd order cooc | 170,000 documents, 1013 terms (TREC-1) (Wall Street Journal) | ↑ 7–14% |
[51] | word cooc, PoS tags | WordNet | average link clustering | 13 words, ACL/DCI | 73.4% |
[21] | | | | Wall Street Journal Corpus | |
[22] | - | - | 1st, 2nd order context vectors (coocs within 5 positions) | 24 Senseval-2 words, Line, Hard, Serve corpora | 44% |
[23] | text | few tagged data, WordNet | co-training, collocations | 12 common Engl. words × 4000 instances | 96.5% |
[25] | - | - | co-training & majority voting | Senseval-2 generic English | ↑ 9.8% |
[24] | - | WordNet | noun coocs, Markov clustering | - | - |