Table 7. Overall accuracy on the data sets
| Data set | NB | AEC | JDI | MRD | 2-MRD |
|---|---|---|---|---|---|
| Abbreviation Set | 0.9716 | 0.9090 | n/a | 0.8759 | 0.8501 |
| Abbreviation Subset | 0.9760 | 0.9218 | 0.6725 | 0.8838 | 0.8725 |
| Term Set | 0.8980 | 0.7462 | n/a | 0.7148 | 0.6773 |
| Term Subset | 0.8991 | 0.7448 | 0.6209 | 0.7132 | 0.6609 |
| Term/Abbreviation Set | 0.9384 | 0.8879 | n/a | 0.8801 | 0.9356 |
| Term/Abbreviation Subset | 0.9360 | 0.9026 | 0.6899 | 0.8715 | 0.9350 |
| Overall MSH WSD Set | 0.9386 | 0.8383 | n/a | 0.8070 | 0.7799 |
| Overall MSH WSD Subset | 0.9413 | 0.8448 | 0.6551 | 0.8118 | 0.7837 |
| NLM WSD Set | 0.8830 | 0.6836 | n/a | 0.6389 | 0.5500 |
| NLM WSD Subset | 0.9063 | 0.6932 | 0.7475 | 0.6526 | 0.5800 |
NB stands for Naïve Bayes, AEC for Automatic Extracted Corpus, JDI for Journal Descriptor Indexing, MRD for Machine Readable Dictionary, and 2-MRD for second-order co-occurrence MRD. Set denotes all the ambiguous words in the category, while Subset indicates that only the words to which the JDI method can be applied are considered. Results on the NLM WSD set are also included.
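As an illustrative sketch (not part of the original evaluation), the Subset rows of Table 7, where all five methods are evaluated on the same words, can be encoded directly to compare methods per data set. The accuracies are copied from the table; the names `SUBSET_ACCURACY` and `best_method` are hypothetical and introduced only for this example.

```python
# Overall accuracy on the Subset rows of Table 7 (all five methods reported).
# Names here are hypothetical; values are copied from the table.
METHODS = ["NB", "AEC", "JDI", "MRD", "2-MRD"]

SUBSET_ACCURACY = {
    "Abbreviation Subset": [0.9760, 0.9218, 0.6725, 0.8838, 0.8725],
    "Term Subset": [0.8991, 0.7448, 0.6209, 0.7132, 0.6609],
    "Term/Abbreviation Subset": [0.9360, 0.9026, 0.6899, 0.8715, 0.9350],
    "Overall MSH WSD Subset": [0.9413, 0.8448, 0.6551, 0.8118, 0.7837],
    "NLM WSD Subset": [0.9063, 0.6932, 0.7475, 0.6526, 0.5800],
}


def best_method(scores):
    """Return (best method, runner-up method, accuracy margin between them)."""
    # Pair each method with its accuracy and rank from highest to lowest.
    ranked = sorted(zip(METHODS, scores), key=lambda pair: pair[1], reverse=True)
    (top, top_acc), (runner_up, second_acc) = ranked[0], ranked[1]
    # Round to the table's four-decimal precision to hide float noise.
    return top, runner_up, round(top_acc - second_acc, 4)


if __name__ == "__main__":
    for name, scores in SUBSET_ACCURACY.items():
        top, runner_up, margin = best_method(scores)
        print(f"{name}: {top} leads {runner_up} by {margin:.4f}")
```

On these rows NB ranks first in every data set; the smallest gap is on the Term/Abbreviation Subset, where 2-MRD (0.9350) trails NB (0.9360) by 0.0010.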