Abstract
We present research toward a tool that uses natural language processing to identify and extract biomedical information from text, with the aim of helping researchers in molecular biology manage large amounts of information. A pilot project on the molecular genetics of diabetes demonstrates our ability to explore the interaction of genomic phenomena and clinical findings. We suggest combining this extracted information with systems for clustering text and constructing labeled networks of data.