Proceedings of the AMIA Symposium. 2000:923–927.

Medical text representations for inductive learning.

A Wilcox, G Hripcsak
PMCID: PMC2243822  PMID: 11080019

Abstract

Inductive learning algorithms have been proposed as methods for classifying medical text reports. Many of these proposed techniques differ in the way the text is represented for use by the learning algorithms. Slight differences can occur between representations that may be chosen arbitrarily, but such differences can significantly affect classification algorithm performance. We examined 8 different data representation techniques used for medical text, and evaluated their use with standard machine learning algorithms. We measured the loss of classification-relevant information due to each representation. Representations that captured status information explicitly resulted in significantly better performance. Algorithm performance was dependent on subtle differences in data representation.
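
To illustrate why explicit status information can matter, the following minimal sketch contrasts two ways of turning a report into features. It is not one of the eight representations evaluated in the paper; the negation cue list, the function names, and the two example sentences are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical negation cues for illustration; not taken from the paper.
NEGATION_CUES = {"no", "without", "negative"}


def bag_of_words(report):
    """Binary word-presence features; whether a finding is asserted or denied is lost."""
    return {word: 1 for word in re.findall(r"[a-z]+", report.lower())}


def findings_with_status(report, findings):
    """Map each known finding to 'present' or 'absent'.

    A finding is marked 'absent' when a negation cue precedes it
    in the same sentence (a deliberately crude assumption).
    """
    features = {}
    for sentence in re.split(r"[.;]", report.lower()):
        negated = False
        for word in re.findall(r"[a-z]+", sentence):
            if word in NEGATION_CUES:
                negated = True
            elif word in findings:
                features[word] = "absent" if negated else "present"
    return features


positive = "Right lower lobe pneumonia."
negative = "No evidence of pneumonia."

# Both reports contain the word "pneumonia", so word presence alone
# cannot separate them; the status-aware features can.
same_word_feature = bag_of_words(positive)["pneumonia"] == bag_of_words(negative)["pneumonia"]
status_positive = findings_with_status(positive, {"pneumonia"})
status_negative = findings_with_status(negative, {"pneumonia"})
```

In this toy case a learner given only word presence sees the same `pneumonia` feature for both reports, while the status-aware features differ, which is the kind of subtle representational difference the abstract refers to.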



