Author manuscript; available in PMC: 2024 Dec 1.
Published in final edited form as: Artif Intell Med. 2023 Nov 1;146:102701. doi: 10.1016/j.artmed.2023.102701

Table 3: The reported NLP systems or toolkits*

Systems/toolkits | Full names | Purposes** | Techniques***
Generic toolkits
ConText | N/A | Feature extraction and representation: sentence classification | Rule-based NLP
FMA | Freetext Matching Algorithm | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP
GATE | General Architecture for Text Engineering | Preprocessing: tokenization, sentence splitting, POS tagging, annotation; Feature extraction and representation: named entity recognition, information extraction; Non-neural ML: sentence classification | Hybrid NLP: rule-based, non-neural ML (support vector machine [SVM], WEKA ML)
GENSIM (R) | GENSIM | Feature extraction and representation: topic modeling and word embedding | Non-neural ML
HeidelTime | High quality rule-based extraction and normalization of temporal expressions | Preprocessing: tagging, normalization; Feature extraction and representation: information extraction | Rule-based NLP
MALLET | MAchine Learning for LanguagE Toolkit | Feature extraction and representation: document classification, clustering, topic modeling, information extraction | Non-neural ML
NegEx | N/A | Feature extraction and representation: negation, affirmation | Rule-based NLP
Punkt Sentence Tokenizer | nltk.tokenize.punkt module | Preprocessing: tokenization | Non-neural ML
Protégé | | Feature extraction and representation: ontology editor or framework | Rule-based NLP
NLTK (Python) | Natural Language Toolkit | Preprocessing: text tokenization, stemming, stop word removal, classification, clustering, POS tagging, parsing, and semantic reasoning | Non-neural ML
SUTime | Stanford NLP annotator | Preprocessing: annotation, recognizing and normalizing time expressions (TIMEx) | Rule-based NLP
Tagger_Date Normalizer plugin | Not available | Preprocessing: tagging, normalization | Rule-based NLP
TextHunter | N/A | Preprocessing: tokenization, stemming, POS tagging; Feature extraction and representation: information extraction; Non-neural ML: automated concept identification | Hybrid NLP: rule-based, non-neural ML (SVM based “batch learning”)
WordNet | N/A | Lexical database for NLP (ontology) | Rule-based NLP
Clinical NLP toolkits
CLAMP | Clinical Language Annotation, Modeling, and Processing Toolkit | Preprocessing: tokenization, POS tagging, annotation, sentence boundary detection, section header identification; Feature extraction and representation: assertion, negation, named entity recognition, UMLS encoder; Non-neural ML: sentence boundary detection, section header identification, classification | Hybrid NLP: rule-based NLP, non-neural ML (conditional random fields, CRF)
cTAKES | clinical Text Analysis and Knowledge Extraction System | Preprocessing: sentence boundary detection, tokenization, parsing, dictionary lookup annotation, normalization, POS tagging; Feature extraction and representation: affirmation, negation, named section identification, named entity recognition, information extraction; Non-neural ML: classification of medical information | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM)
CliX NLP | Clinical NLP tools for SNOMED CT | Feature extraction and representation: processing system based on SNOMED CT | Rule-based NLP
ClinREAD | Rapid Clinical Note Mining for New Languages | Preprocessing: tokenization, POS tagging, vocabulary mapping; Feature extraction and representation: named entity recognition | Rule-based NLP
MedLEE | Medical Language Extraction and Encoding System | Processing system; Preprocessing: parsing; Feature extraction: clinical entities extraction, assertions; Non-neural ML: word disambiguation, classification of medical information, generate rules for classifying medical conditions | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM)
MedTagger | N/A | Preprocessing: tokenization, POS tagging; Feature extraction and representation: information extraction, assertion, negation; Non-neural ML: sentence detection, concept identification, patient level risk factor classification | Hybrid NLP: rule-based NLP, non-neural ML (WEKA ML)
MetaMap | N/A | Preprocessing: vocabulary mapping, parsing, tokenization, POS tagging, sentence boundary determination; Feature extraction and representation: lexical lookup of input words in the SPECIALIST lexicon (an information extraction system based on UMLS) | Rule-based NLP
MTERMS | Medical Text Extraction, Reasoning and Mapping System | Preprocessing: parsing, tokenization, POS tagging, vocabulary mapping, information extraction; Feature extraction and representation: affirmation, negation | Rule-based NLP
MedEx | Medication Information Extraction System for Clinical Narratives | Preprocessing: tokenizer, tagging, semantic tagger, parsing, encoding; Feature extraction and representation: information extraction | Rule-based NLP
NLP-PAC | NLP algorithms for Predetermined Asthma Criteria | Feature extraction and representation: information extraction, affirmation, negation | Rule-based NLP
V3NLP | Not available | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP
* See Supplementary Table S4 for a list of references.

** The purpose of NLP systems/toolkits used to process unstructured PROs.

*** Hybrid NLP approach uses both rule-based and ML-based methods.
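
Several entries in Table 3 (for example NegEx and ConText) are categorized as rule-based NLP used for negation or affirmation detection. As a rough illustration of what a rule of this kind can look like, the Python sketch below marks a concept as negated when it follows a trigger word within a short token window. It is not the NegEx algorithm or any toolkit's API: the trigger list, the terminator list, the window size, and the function name `is_negated` are all assumptions made for this example.

```python
import re

# Assumed, heavily abbreviated lexicons for illustration only;
# real rule-based systems ship much larger curated lists.
NEGATION_TRIGGERS = ("no", "not", "denies", "without")
TERMINATORS = ("but", "however")  # assumed words that end a negation scope
WINDOW = 5  # assumed scope: number of tokens after a trigger that may be negated


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if `concept` occurs inside the scope of a negation trigger."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    if n == 0:
        return False
    for i, tok in enumerate(tokens):
        if tok not in NEGATION_TRIGGERS:
            continue
        # Collect the tokens following the trigger, stopping at a terminator
        # or at the end of the window, whichever comes first.
        scope = []
        for t in tokens[i + 1 : i + 1 + WINDOW]:
            if t in TERMINATORS:
                break
            scope.append(t)
        # Look for the concept as a contiguous token sequence inside the scope.
        for j in range(len(scope) - n + 1):
            if scope[j : j + n] == concept_tokens:
                return True
    return False
```

For instance, `is_negated("Patient denies chest pain.", "chest pain")` evaluates to `True`, while the same call on "Patient reports chest pain." evaluates to `False`. A hybrid approach in the sense of the third footnote would combine rules like this with a trained classifier; that part is not shown here.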