Table 3.
Feature | cTAKES [4] | NOBLE Coder [20] | MetaMap [31, 32] | NCBO annotator [14]
---|---|---|---|---
Modularity/configuration options | Modular text processing pipeline | Vocabulary (terminology); term matching options and strategies | Text processing pipeline; vocabulary (terminology); term matching options and strategies | Vocabulary (terminology); term matching options
Disambiguation of terms | Enabled through integration of the YTEX component [8] | No word sense disambiguation (WSD); instead, heuristics choose one concept among the candidate concepts for the same piece of input text | Supported, based on (1) removal of word senses following a manual study of UMLS ambiguity and (2) a WSD algorithm that chooses the concept with the most likely semantic type for a given context | Not supported
Vocabulary (terminology) | Subset of UMLS, namely SNOMED CT and RxNorm | Several pre-built vocabularies based on subsets of UMLS (e.g. SNOMED CT, MeSH, RxNorm) | UMLS Metathesaurus | UMLS Metathesaurus and BioPortal ontologies (over 330 ontologies)
Speed* | Suitable for real-time processing | Suitable for real-time processing | Better suited to off-line batch processing | Suitable for real-time processing
Implementation form | Software (Java) library; stand-alone application | Software (Java) library; stand-alone application | Software library; originally implemented in Prolog; a Java implementation, known as MMTx, is also available | RESTful Web service
Availability | Open source; available under the Apache License, v2.0 | Open source; available under the GNU Lesser General Public License v3 | Open source; terms and conditions at: https://metamap.nlm.nih.gov/MMTnCs.shtml | Closed source, but freely available
Specific features | Performs better on clinical texts than on biomedical scientific literature (its NLP components are trained on clinical texts) | Offers a user interface for creating custom terminologies (to be used for annotation) by selecting and merging elements from several different thesauri/ontologies | Primarily developed for annotating biomedical literature (MEDLINE/PubMed citations); performs better on this kind of text than on clinical notes | Uses the MGrep term-to-concept matching tool to obtain an initial set of annotations, which are then extended using different forms of ontology-based semantic matching
URL | http://ctakes.apache.org/ | http://noble-tools.dbmi.pitt.edu/ | https://metamap.nlm.nih.gov/ | https://bioportal.bioontology.org/annotator
*Speed estimates are based on experimental results reported in the literature. Those experiments used corpora of up to 200 documents (paper abstracts or clinical notes), so the estimates might not hold for significantly larger corpora.