Proceedings of the AMIA Symposium. 2001:229–233.

Subword segmentation--leveling out morphological variations for medical document retrieval.

U Hahn 1, M Honeck 1, M Piotrowski 1, S Schulz 1
PMCID: PMC2243631  PMID: 11825186

Abstract

Many lexical items from medical sublanguages exhibit a complex morphological structure that is hard to account for by simple string matching (e.g., truncation). While inflection is usually easy to deal with, productive morphological processes such as derivation and (single-word) composition pose a real challenge. We propose an approach in which morphologically complex word forms are segmented into medically significant subwords. After segmentation, both query terms and document terms are submitted to the matching procedure. In this way, problems arising from morphologically motivated word form alterations can be eliminated from the retrieval procedure. We provide empirical data showing that subword-based indexing and retrieval perform significantly better than conventional string matching approaches.
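The general idea of segmenting both query and document terms into subwords before matching can be illustrated with a minimal sketch. The abstract does not specify the authors' segmentation algorithm or subword lexicon, so everything below is an assumption made for illustration: the toy lexicon `SUBWORDS`, the longest-prefix-first backtracking strategy in `segment`, and the helper names `index_terms` and `matches` are all hypothetical.

```python
# Hypothetical toy lexicon; the paper's actual subword dictionary is not given here.
SUBWORDS = {"gastr", "enter", "hepat", "nephr", "itis", "ectomy", "o"}


def segment(word, lexicon=SUBWORDS):
    """Split a word into lexicon subwords, trying longer prefixes first.

    Returns a list of subwords covering the whole word, or None if the
    word cannot be fully covered by lexicon entries.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


def index_terms(text, lexicon=SUBWORDS):
    """Map whitespace-separated tokens to subwords; keep unsegmentable tokens whole."""
    terms = set()
    for token in text.lower().split():
        parts = segment(token, lexicon)
        terms.update(parts if parts is not None else [token])
    return terms


def matches(query, document, lexicon=SUBWORDS):
    """Assumed matching rule: every query subword must occur in the document."""
    return index_terms(query, lexicon) <= index_terms(document, lexicon)


if __name__ == "__main__":
    print(segment("gastroenteritis"))
    print(matches("enteritis", "gastroenteritis"))
    print(matches("hepatitis", "gastroenteritis"))
```

In this sketch, a query for "enteritis" matches a document containing "gastroenteritis" because both share the subwords "enter" and "itis", a match that prefix truncation of the query term would not find. Punctuation handling, linking elements such as "o", and ranking are deliberately left out.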



