2017 Sep 22;8:44. doi: 10.1186/s13326-017-0153-x

Table 3.

General purpose biomedical semantic annotation tools (Part I)

Tools compared: cTAKES [4]; NOBLE Coder [20]; MetaMap [31, 32]; NCBO annotator [14]

Modularity/configuration options
- cTAKES: Modular text processing pipeline
- NOBLE Coder: Vocabulary (terminology); term matching options and strategies
- MetaMap: Text processing pipeline; vocabulary (terminology); term matching options and strategies
- NCBO annotator: Vocabulary (terminology); term matching options

Disambiguation of terms
- cTAKES: Enabled through integration of the YTEX component [8]
- NOBLE Coder: Does not use WSD; instead, it uses heuristics to choose one concept among the candidate concepts for the same piece of input text
- MetaMap: Supported; based on:
  - removal of word senses based on a manual study of UMLS ambiguity
  - a WSD algorithm that chooses the concept with the most likely semantic type for a given context
- NCBO annotator: Not supported

Vocabulary (terminology)
- cTAKES: Subset of UMLS, namely SNOMED CT and RxNORM
- NOBLE Coder: Several pre-built vocabularies, based on subsets of UMLS (e.g. SNOMED CT, MeSH, RxNORM)
- MetaMap: UMLS Metathesaurus
- NCBO annotator: UMLS Metathesaurus and BioPortal ontologies (over 330 ontologies)

Speed*
- cTAKES: Suitable for real-time processing
- NOBLE Coder: Suitable for real-time processing
- MetaMap: Better for off-line batch processing
- NCBO annotator: Suitable for real-time processing

Implementation form
- cTAKES: Software (Java) library; stand-alone application
- NOBLE Coder: Software (Java) library; stand-alone application
- MetaMap: Software library; original version in Prolog; a Java implementation, known as MMTX, is also available
- NCBO annotator: RESTful Web service

Availability
- cTAKES: Open source; available under Apache License, v.2.0
- NOBLE Coder: Open source; available under GNU Lesser General Public License v3
- MetaMap: Open source; terms and conditions at: https://metamap.nlm.nih.gov/MMTnCs.shtml
- NCBO annotator: Closed source, but freely available

Specific features
- cTAKES: Better performance on clinical texts than on biomedical scientific literature (its NLP components are trained on clinical texts)
- NOBLE Coder: Offers a user interface for creating custom terminologies (to be used for annotation) by selecting and merging elements from several different thesauri/ontologies
- MetaMap: Primarily developed for annotation of biomedical literature (MEDLINE/PubMed citations); performs better on this kind of text than on clinical notes
- NCBO annotator: Uses the MGrep term-to-concept matching tool to get a primary set of annotations; these are then extended using different forms of ontology-based semantic matching

URL
- cTAKES: http://ctakes.apache.org/
- NOBLE Coder: http://noble-tools.dbmi.pitt.edu/
- MetaMap: https://metamap.nlm.nih.gov/
- NCBO annotator: https://bioportal.bioontology.org/annotator

*Note that speed estimates are based on the experimental results reported in the literature; those experiments were done with corpora of up to 200 documents (paper abstracts or clinical notes); the given estimates might not hold for significantly larger corpora
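
To make the "term matching" and "disambiguation of terms" rows more concrete, the following is a minimal Python sketch of dictionary-based term-to-concept matching followed by a simple heuristic choice among candidate concepts. It is an illustration only and does not reproduce the algorithm of cTAKES, NOBLE Coder, MetaMap, or the NCBO annotator. The vocabulary, the concept identifiers (`X001` and so on), the semantic type labels, and the function names `annotate` and `choose_concept` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str      # made-up identifier, not a real UMLS CUI
    semantic_type: str   # made-up semantic type label


# Toy vocabulary (assumption): lower-cased term -> candidate concepts.
VOCABULARY = {
    "cold": [Concept("X001", "disease"), Concept("X002", "temperature")],
    "common cold": [Concept("X001", "disease")],
    "aspirin": [Concept("X003", "drug")],
}


def choose_concept(candidates, preferred_types):
    """Heuristic disambiguation: return the first candidate whose semantic
    type appears earliest in preferred_types, else the first candidate."""
    for semantic_type in preferred_types:
        for concept in candidates:
            if concept.semantic_type == semantic_type:
                return concept
    return candidates[0]


def annotate(text, vocabulary=VOCABULARY,
             preferred_types=("disease", "drug"), max_len=3):
    """Greedy longest-match lookup of token n-grams in the vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            candidates = vocabulary.get(term)
            if candidates:
                concept = choose_concept(candidates, preferred_types)
                annotations.append((term, concept))
                i += n
                break
        else:
            # No n-gram starting at i matched; move to the next token.
            i += 1
    return annotations


if __name__ == "__main__":
    for term, concept in annotate("Patient took aspirin for a common cold."):
        print(term, concept.concept_id, concept.semantic_type)
```

The preference order over semantic types stands in for the kind of heuristic the table describes for choosing one concept among several candidates; a context-sensitive WSD component would replace `choose_concept` with a model that takes the surrounding text into account.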