Proceedings of the AMIA Symposium. 1998:835–839.

Knowledge discovery and data mining to assist natural language understanding.

A Wilcox, G Hripcsak
PMCID: PMC2232072  PMID: 9929336

Abstract

As natural language processing systems come into wider clinical use, methods for interpreting their output become increasingly important. These methods require the effort of a domain expert, who must build specific queries and rules for interpreting the processor output. Knowledge discovery and data mining tools can instead generate these queries and rules automatically. C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system. A general-purpose natural language processor using this rule base was tested on a set of 200 chest radiograph reports. When the training set was a small set of reports classified by physicians, the generated rule base performed as well as lay persons but worse than physicians. When the training set was a larger set of reports classified by ICD-9 coding, the rule base performed worse than both physicians and lay persons. It appears that a training set that is both larger and more accurate is needed to improve the method's performance.
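To make the idea of inducing a rule base from processor output concrete, the following is a minimal sketch, not the authors' implementation. It assumes each report has already been reduced by a natural language processor to a set of findings, and it substitutes a plain ID3-style information-gain tree for C5.0, which is a separate commercial tool. The finding names, labels, and toy reports are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_feature(rows, labels, features):
    """Return the finding whose presence/absence split gives the most information gain."""
    base = entropy(labels)

    def gain(feature):
        remainder = 0.0
        for value in (True, False):
            subset = [lab for row, lab in zip(rows, labels)
                      if (feature in row) == value]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(features, key=gain)

def build_tree(rows, labels, features):
    """Build a tree of (finding, subtree_if_present, subtree_if_absent) tuples.

    Leaves are class labels. This is ID3-style induction, not C5.0.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    present = [(r, l) for r, l in zip(rows, labels) if feature in r]
    absent = [(r, l) for r, l in zip(rows, labels) if feature not in r]
    if not present or not absent:
        return majority  # the split separates nothing, so stop here
    rest = [f for f in features if f != feature]
    return (feature,
            build_tree([r for r, _ in present], [l for _, l in present], rest),
            build_tree([r for r, _ in absent], [l for _, l in absent], rest))

def classify(tree, findings):
    """Walk the tree using the set of findings extracted from one report."""
    while isinstance(tree, tuple):
        feature, if_present, if_absent = tree
        tree = if_present if feature in findings else if_absent
    return tree

def rules(tree, conditions=()):
    """Flatten the tree into (conditions, label) rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    feature, if_present, if_absent = tree
    return (rules(if_present, conditions + ((feature, True),)) +
            rules(if_absent, conditions + ((feature, False),)))

# Hypothetical training data: findings per report and an expert-assigned label.
reports = [
    {"infiltrate", "effusion"},
    {"infiltrate"},
    {"consolidation"},
    {"effusion"},
    {"cardiomegaly"},
    set(),
]
labels = ["pneumonia", "pneumonia", "pneumonia", "none", "none", "none"]
features = ["infiltrate", "consolidation", "effusion", "cardiomegaly"]

tree = build_tree(reports, labels, features)
rule_base = rules(tree)
for conditions, label in rule_base:
    print(conditions, "->", label)
```

Each path from the root to a leaf becomes one rule, so the rule base can be read and checked by a clinician. The quality of those rules depends directly on how many labeled reports are available and how accurate the labels are.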



Articles from Proceedings of the AMIA Symposium are provided here courtesy of American Medical Informatics Association
