Proceedings of the AMIA Symposium. 2002:489–493.

Free-text medical document retrieval via phrase-based vector space model.

Wenlei Mao, Wesley W. Chu
PMCID: PMC2244442  PMID: 12463872

Abstract

Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as index terms to improve retrieval accuracy. However, past research revealed that such concept-based systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources has the potential to improve retrieval accuracy, yet the incompleteness of the knowledge source precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems, and document similarity can in turn be derived from phrase similarities. Using OHSUMED as the test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared with the stem-based model.
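The abstract does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions. It contrasts a stem-based VSM (cosine similarity over raw stem counts; practical systems usually apply tf-idf weights) with a phrase-based score in which each phrase carries both concept IDs and word stems. The linear mix with weight `alpha`, the Jaccard stem overlap, the best-match averaging used for document similarity, the `CUI_*` identifiers, and the hand-written concept table standing in for UMLS are all hypothetical choices for illustration, not the authors' method.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


# --- Stem-based VSM baseline -------------------------------------------------

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse stem-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# --- Phrase-based similarity (hypothetical formulation) ----------------------

@dataclass(frozen=True)
class Phrase:
    concepts: frozenset[str]  # hypothetical concept IDs (stand-ins for UMLS concepts)
    stems: frozenset[str]     # word stems occurring in the phrase


def concept_sim(c1: str, c2: str, table: dict) -> float:
    """Identical concepts score 1; pairs missing from the (incomplete) table score 0."""
    if c1 == c2:
        return 1.0
    return table.get(frozenset((c1, c2)), 0.0)


def _directed_concept_score(src, dst, table: dict) -> float:
    """Average, over concepts in src, of the best-matching concept score in dst."""
    return sum(max(concept_sim(c, d, table) for d in dst) for c in src) / len(src)


def phrase_similarity(p: Phrase, q: Phrase, table: dict, alpha: float = 0.5) -> float:
    """Assumed linear mix of conceptual similarity and shared-stem (Jaccard) overlap."""
    if p.concepts and q.concepts:
        # Average both directions so the conceptual score is symmetric.
        conceptual = 0.5 * (_directed_concept_score(p.concepts, q.concepts, table)
                            + _directed_concept_score(q.concepts, p.concepts, table))
    else:
        conceptual = 0.0
    union = p.stems | q.stems
    stem_overlap = len(p.stems & q.stems) / len(union) if union else 0.0
    return alpha * conceptual + (1 - alpha) * stem_overlap


def document_similarity(query: list[Phrase], doc: list[Phrase],
                        table: dict, alpha: float = 0.5) -> float:
    """Assumed aggregation: mean over query phrases of the best-matching doc phrase."""
    if not query or not doc:
        return 0.0
    return sum(max(phrase_similarity(p, d, table, alpha) for d in doc)
               for p in query) / len(query)


if __name__ == "__main__":
    # Hypothetical knowledge: the two concepts below are related with score 0.6.
    table = {frozenset(("CUI_MI", "CUI_ISCH")): 0.6}
    q = Phrase(frozenset({"CUI_MI"}), frozenset({"myocardi", "infarct"}))
    d = Phrase(frozenset({"CUI_ISCH"}), frozenset({"cardiac", "ischemia"}))
    print(cosine(Counter(q.stems), Counter(d.stems)))  # no shared stems
    print(phrase_similarity(q, d, table))              # related concepts still count
    print(document_similarity([q], [d], table))
```

In the demo the two phrases share no stems, so the stem-based cosine is zero, while the related-concept entry gives them a positive phrase score. Conversely, a phrase whose concept is missing from the table can still match through shared stems, which is the kind of complementarity the abstract attributes to phrases. The weighting and aggregation actually used in the paper may differ.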


Articles from Proceedings of the AMIA Symposium are provided here courtesy of American Medical Informatics Association
