Author manuscript; available in PMC: 2019 Jan 1.
Published in final edited form as: J Biomed Inform. 2017 Nov 21;77:34–49. doi: 10.1016/j.jbi.2017.11.011

Table 2.

IE frameworks, tools and toolkits used in the included publications.

| Name | Description | License | Website | No. of papers |
|---|---|---|---|---|
| Frameworks | | | | |
| UIMA | Software framework for the analysis of unstructured content such as text, video, and audio data | Apache | https://uima.apache.org/ | 31 |
| GATE | Java-based open-source software for various NLP tasks such as information extraction and semantic annotation | GNU Lesser General Public License | https://gate.ac.uk/ | 5 |
| Protégé | Open-source ontology editor and framework for building intelligent systems | MIT License | http://protege.stanford.edu/ | 1 |
| Tools | | | | |
| cTAKES | Open-source NLP system based on the UIMA framework for extracting information from unstructured clinical text in electronic health records | Apache | http://ctakes.apache.org/ | 26 |
| MetaMap | National Institutes of Health (NIH)-developed NLP tool that maps biomedical text to UMLS concepts | UMLS Metathesaurus | https://metamap.nlm.nih.gov/ | 12 |
| MedLEE | NLP system that extracts, structures, and encodes clinical information from narrative clinical notes | NLP International for commercial use | http://zellig.cpmc.columbia.edu/medlee/ | 10 |
| KnowledgeMap Concept Indexer (KMCI) | NLP system that identifies biomedical concepts and maps them to UMLS concepts | Vanderbilt License | https://medschool.vanderbilt.edu/cpm/center-precision-medicine-blog/kmci-knowledgemap-concept-indexer | 4 |
| HITEx | Open-source NLP tool built on top of the GATE framework for various tasks such as principal diagnoses extraction and smoking status extraction | i2b2 Software License Agreement | https://www.i2b2.org/software/projects/hitex/hitex_manual.html | 4 |
| MedEx | NLP tool used to recognize drug names, dose, route, and frequency from free-text clinical records | Apache | https://medschool.vanderbilt.edu/cpm/center-precision-medicine-blog/medex-tool-finding-medication-information | 4 |
| MedTagger | Open-source NLP pipeline based on the UIMA framework for indexing based on dictionaries, information extraction, and machine learning–based named entity recognition from clinical text | Apache | http://ohnlp.org/index.php/MedTagger | 3 |
| ARC | Automated Retrieval Console (ARC) is an open-source NLP pipeline that converts unstructured text to structured data such as Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) or UMLS codes | Apache | http://blulab.chpc.utah.edu/content/arc-automated-retrieval-console | 2 |
| Medtex | Clinical NLP software that extracts meaningful information from narrative text to support clinical staff in the decision-making process | No license information available | https://aehrc.com/research/projects/medical-free-text-retrieval-and-analytics/#medtex | 2 |
| CLAMP | NLP software system based on the UIMA framework for clinical language annotation, modeling, processing and machine learning | Software Research License | https://sbmi.uth.edu/ccb/resources/clamp.htm | 1 |
| MedXN | A tool to extract comprehensive medication information from clinical narratives and normalize it to RxNorm | Apache | http://ohnlp.org/index.php/MedXN | 1 |
| MedTime | A tool to extract temporal information from clinical narratives and normalize it to the TIMEX3 standard | GNU General Public License | http://ohnlp.org/index.php/MedTime | 1 |
| PredMED | NLP application developed by IBM to extract full prescriptions from narrative clinical notes | Commercial | | 1 |
| SAS Text Miner | A plug-in for the SAS Enterprise Miner environment that provides tools for extracting information from a collection of text documents and uncovering the themes and concepts concealed in them | Commercial | | 1 |
| Toolkits | | | | |
| WEKA | Open-source toolkit that contains various machine learning algorithms for data-mining tasks | GNU General Public License | http://www.cs.waikato.ac.nz/ml/weka/ | 5 |
| MALLET | Java-based package for various NLP tasks such as document classification, information extraction, and topic modeling | Common Public License | http://mallet.cs.umass.edu/ | 4 |
| OpenNLP | Open-source machine learning toolkit for processing of natural language text | Apache | https://opennlp.apache.org/ | 1 |
| NLTK | Python-based NLP toolkit for natural language text | Apache | http://www.nltk.org/ | 1 |
| SPLAT | Statistical Parsing and Linguistic Analysis Toolkit (SPLAT) is a linguistic analysis toolkit for natural language developed by Microsoft Research | Commercial | https://www.microsoft.com/en-us/research/project/msr-splat/ | 1 |