Abstract
Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.
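The abstract does not describe how Termino's finite state recognizer works internally. As a rough illustration of the general idea only, the following sketch builds a token-level trie (an acyclic deterministic finite-state automaton over tokens) from a term list, scans text left to right for the longest matching term, and wraps each match in mark-up. Several details are assumptions, not Termino's design: the case folding, the tokenizer, the `<term>` tag format, and the in-memory dictionaries that stand in for the term database. The class and method names and the `T1`-style identifiers are also hypothetical.

```python
import re
from dataclasses import dataclass, field
from html import escape
from typing import Optional

# Hypothetical tokenizer: word characters or single punctuation marks,
# so that "IL-2" becomes the tokens "il", "-", "2".
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (lower-cased token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


@dataclass
class _State:
    # Transitions of the automaton, keyed by normalised token.
    edges: dict = field(default_factory=dict)
    # Term record if a complete term ends here (an accepting state).
    term: Optional[dict] = None


class TermRecognizer:
    """Sketch of a longest-match term tagger over a token trie (hypothetical)."""

    def __init__(self):
        self.root = _State()

    def add_term(self, term, record):
        # Follow or create one transition per token of the term.
        state = self.root
        for tok, _, _ in tokenize(term):
            state = state.edges.setdefault(tok, _State())
        state.term = record  # mark the final state as accepting

    def find(self, text):
        """Return (start, end, record) for each longest, non-overlapping match."""
        toks = tokenize(text)
        matches, i = [], 0
        while i < len(toks):
            state, best, j = self.root, None, i
            # Run the automaton as far as the text allows, remembering
            # the last accepting state seen (longest match wins).
            while j < len(toks) and toks[j][0] in state.edges:
                state = state.edges[toks[j][0]]
                j += 1
                if state.term is not None:
                    best = (j, state.term)
            if best is not None:
                end_idx, record = best
                matches.append((toks[i][1], toks[end_idx - 1][2], record))
                i = end_idx  # resume after the matched term
            else:
                i += 1
        return matches

    def markup(self, text):
        """Wrap each recognised term in a <term> element carrying its record."""
        out, pos = [], 0
        for start, end, rec in self.find(text):
            out.append(escape(text[pos:start]))
            out.append(
                f'<term id="{escape(rec["id"])}" type="{escape(rec["type"])}">'
                f"{escape(text[start:end])}</term>"
            )
            pos = end
        out.append(escape(text[pos:]))
        return "".join(out)


if __name__ == "__main__":
    rec = TermRecognizer()
    rec.add_term("tumor necrosis factor", {"id": "T1", "type": "protein"})
    rec.add_term("tumor", {"id": "T2", "type": "disease"})
    rec.add_term("IL-2", {"id": "T3", "type": "gene"})
    print(rec.markup("Tumor necrosis factor and IL-2 levels rose."))
```

With the sample terms above, "Tumor necrosis factor" is tagged as a single protein term rather than as the shorter term "tumor", because the scan keeps extending through the automaton and reports the last accepting state it reached. Each token is examined a bounded number of times per match attempt, which is what makes this style of recognizer fast on large term lists. A real system would also need the term variants and normalisation rules that a resource such as UMLS supplies.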
