Abstract
This paper describes a new approach to clinical data classification, "Least Square Fit Mapping." We use large collections of human-assigned text-to-category matches as training sets to compute the correlations between physicians' terms and canonical concepts. A Linear Least Squares Fit (LLSF) technique yields a mapping function that optimally fits the known matches in the training set and probabilistically captures the unknown matches for arbitrary texts. We tested the method on 16,032 texts from the Mayo Clinic and judged the results against human-assigned answers. In a comparative test, the LLSF mapping achieved 89% precision at 100% recall, outperforming string matching (36% precision), string matching enhanced by morphological parsing (51% precision), and statistical weighting (61% precision).
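The abstract does not spell out the formulation, so the following is only a minimal sketch of one common way to pose a linear least squares fit for text-to-category mapping. It assumes texts are represented as term-count vectors and categories as one-hot vectors, and that the mapping is a matrix W minimizing ||WA - B|| over the training pairs, computed here with a pseudoinverse. The vocabulary, categories, training pairs, and function names are hypothetical illustrations, not data or code from the paper.

```python
import numpy as np

# Hypothetical toy vocabulary, categories, and training pairs (not from the paper).
TERMS = ["heart", "attack", "cardiac", "infarction", "kidney", "failure", "renal"]
CATEGORIES = ["myocardial infarction", "renal failure"]
TRAINING = [
    ("heart attack", "myocardial infarction"),
    ("cardiac infarction", "myocardial infarction"),
    ("kidney failure", "renal failure"),
    ("renal failure", "renal failure"),
]


def text_vector(text):
    """Term-count vector for a text over the fixed vocabulary."""
    words = text.lower().split()
    return np.array([float(words.count(term)) for term in TERMS])


def category_vector(category):
    """One-hot vector marking the human-assigned category."""
    vec = np.zeros(len(CATEGORIES))
    vec[CATEGORIES.index(category)] = 1.0
    return vec


def fit_llsf(pairs):
    """Return W minimizing ||W A - B||_F, where columns of A are text
    vectors and columns of B are the matching category vectors."""
    A = np.column_stack([text_vector(text) for text, _ in pairs])
    B = np.column_stack([category_vector(cat) for _, cat in pairs])
    # The Moore-Penrose pseudoinverse gives the minimum-norm least squares solution.
    return B @ np.linalg.pinv(A)


def rank_categories(W, text):
    """Score every category for a text and return them best-first."""
    scores = W @ text_vector(text)
    order = np.argsort(-scores)
    return [(CATEGORIES[i], float(scores[i])) for i in order]


W = fit_llsf(TRAINING)
# "cardiac attack" never appears as a training text, but its terms do.
print(rank_categories(W, "cardiac attack"))
```

Because every category receives a score, a ranked list like this can be cut off at any depth, which is one way a precision figure at a fixed recall level could be measured; the evaluation procedure actually used in the paper is not described in the abstract.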

