Proceedings of the AMIA Symposium. 1999:671–675.

Extracting noun phrases for all of MEDLINE.

N A Bennett, Q He, K Powell, B R Schatz
PMCID: PMC2232564  PMID: 10566444

Abstract

A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization, part-of-speech tagging, and noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature.
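
The three-module structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' parser: the toy lexicon `LEXICON`, the default-to-noun rule for unknown words, the adjective/noun chunking rule, and all function names are assumptions chosen only to show how tokenization, part-of-speech tagging, and noun phrase identification might be chained, and how total and unique phrase counts differ.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration). A real tagger
# would be trained or rule-based; here unknown words default to NOUN.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "of": "PREP", "in": "PREP", "with": "PREP", "for": "PREP",
    "is": "VERB", "was": "VERB", "were": "VERB", "reduces": "VERB",
    "chronic": "ADJ", "renal": "ADJ", "severe": "ADJ", "acute": "ADJ",
}

def tokenize(text):
    """Module 1: split text into word tokens and punctuation tokens."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)

def tag(tokens):
    """Module 2: assign a part-of-speech tag to each token."""
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, LEXICON.get(tok.lower(), "NOUN")))
    return tagged

def noun_phrases(tagged):
    """Module 3: collect maximal ADJ/NOUN runs that end in a noun."""
    phrases, run = [], []
    # A sentinel entry flushes any run still open at the end of the input.
    for tok, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "NOUN"):
            run.append((tok, pos))
            continue
        # Trim trailing adjectives so every phrase ends on a head noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(t.lower() for t, _ in run))
        run = []
    return phrases

def extract(text):
    """Chain the three modules: tokenize, tag, identify noun phrases."""
    return noun_phrases(tag(tokenize(text)))

text = ("Dialysis reduces mortality in chronic renal failure. "
        "Chronic renal failure is severe.")
phrases = extract(text)
unique = Counter(phrases)
print(phrases)
print(len(phrases), "phrases,", len(unique), "unique")
```

On this two-sentence input the sketch yields four phrases, three of them unique, because "chronic renal failure" occurs twice. The same distinction underlies the counts of generated and unique phrases reported in the abstract.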

Articles from Proceedings of the AMIA Symposium are provided here courtesy of American Medical Informatics Association
