Author manuscript; available in PMC: 2024 Dec 1.

Published in final edited form as: Artif Intell Med. 2023 Nov 1;146:102701. doi: 10.1016/j.artmed.2023.102701

Table 3:

The reported NLP systems or toolkits^*

Systems/toolkits	Full Names	Purposes^**	Techniques^***
Generic toolkits
ConText	N/A	Feature extraction and representation: sentence classification	Rule-based NLP
FMA	Freetext Matching Algorithm (FMA)	Preprocessing: annotation; Feature extraction and representation: information extraction	Rule-based NLP
GATE	General Architecture for Text Engineering	Preprocessing: tokenization, sentence splitting, POS tagging, annotation; Feature extraction and representation: named entity recognition, information extraction; Non-neural ML: sentence classification	Hybrid NLP: rule-based, non-neural ML (support vector machine, SVM; WEKA ML)
GENSIM (R)	GENSIM	Feature extraction and representation: topic modeling and word embedding	Non-neural ML
HeidelTime	High quality rule-based extraction and normalization of temporal expressions	Preprocessing: tagging, normalization; Feature extraction and representation: information extraction	Rule-based NLP
MALLET	MAchine Learning for LanguagE Toolkit	Feature extraction and representation: document classification, clustering, topic modeling, information extraction	Non-neural ML
NegEx	N/A	Feature extraction and representation: negation, affirmation	Rule-based NLP
Punkt Sentence Tokenizer	nltk.tokenize.punkt module	Preprocessing: tokenization	Non-neural ML
Protégé		Feature extraction and representation: ontology editor or framework	Rule-based NLP
NLTK (Python)	Natural Language Toolkit	Preprocessing: text tokenization, stemming, stop word removal, classification, clustering, POS tagging, parsing, and semantic reasoning	Non-neural ML
SUTime	Stanford NLP annotator	Preprocessing: annotation, recognizing and normalizing time expressions (TIMEx)	Rule-based NLP
Tagger_Date Normalizer plugin	Not available	Preprocessing: tagging, normalization	Rule-based NLP
TextHunter	N/A	Preprocessing: tokenization, stemming, POS tagging; Feature extraction and representation: information extraction; Non-neural ML: automated concept identification	Hybrid NLP: rule-based, non-neural ML (SVM based “batch learning”)
WordNet	N/A	Lexical database for NLP (ontology)	Rule-based NLP
Clinical NLP toolkits
CLAMP	Clinical Language Annotation, Modeling, and Processing Toolkit	Preprocessing: tokenization, POS tagging, annotation, sentence boundary detection, section header identification; Feature extraction and representation: assertion, negation, named entity recognition, UMLS encoder; Non-neural ML: sentence boundary detection, section header identification, classification	Hybrid NLP: rule-based NLP, non-neural ML (conditional random fields, CRF)
cTAKES	clinical Text Analysis and Knowledge Extraction System	Preprocessing: sentence boundary detection, tokenization, parsing, dictionary lookup annotation, normalization, POS tagging; Feature extraction and representation: affirmation, negation, named section identification, named entity recognition, information extraction; Non-neural ML: classification of medical information	Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM)
CliX NLP	Clinical NLP tools for SNOMED CT	Feature extraction and representation: processing system based on SNOMED CT	Rule-based NLP
ClinREAD	Rapid Clinical Note Mining for New Languages	Preprocessing: tokenization, POS tagging, vocabulary mapping; Feature extraction and representation: named entity recognition	Rule-based NLP
MedLEE	Medical Language Extraction and Encoding System	Preprocessing: parsing; Feature extraction and representation: clinical entity extraction, assertions; Non-neural ML: word disambiguation, classification of medical information, generation of rules for classifying medical conditions	Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM)
MedTagger	N/A	Preprocessing: tokenization, POS tagging; Feature extraction and representation: information extraction, assertion, negation; Non-neural ML: sentence detection, concept identification, patient level risk factor classification	Hybrid NLP: rule-based NLP, non-neural ML (WEKA ML)
MetaMap	N/A	Preprocessing: vocabulary mapping, parsing, tokenization, POS tagging, sentence boundary determination; Feature extraction and representation: lexical lookup of input words in the SPECIALIST lexicon – an information extraction system based on UMLS	Rule-based NLP
MTERMS	Medical Text Extraction, Reasoning and Mapping System	Preprocessing: parsing, tokenization, POS tagging, vocabulary mapping, information extraction; Feature extraction and representation: affirmation, negation	Rule-based NLP
MedEx	Medication Information Extraction System for Clinical Narratives	Preprocessing: tokenization, tagging, semantic tagging, parsing, encoding; Feature extraction and representation: information extraction	Rule-based NLP
NLP-PAC	NLP algorithms for Predetermined Asthma Criteria	Feature extraction and representation: information extraction, affirmation, negation	Rule-based NLP
V3NLP	Not available	Preprocessing: annotation; Feature extraction and representation: information extraction	Rule-based NLP

^* See Supplementary Table S4 for a list of references.

^** The purpose of NLP systems/toolkits used to process unstructured PROs.

^*** Hybrid NLP approach uses both rule-based and ML-based methods.
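
Several entries in Table 3 (for example ConText and NegEx) are listed as rule-based NLP used for negation and affirmation. As a rough illustration of what "rule-based" can mean in this setting, the following Python sketch flags a concept as negated when it appears within a fixed token window after a trigger word. The trigger list, the window size, and the function names are all hypothetical choices made for this example; this is not the NegEx algorithm or its lexicon, and it is not taken from any of the reviewed studies.

```python
import re

# Hypothetical trigger words, for illustration only (not the NegEx lexicon).
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
# Assumed scope: number of tokens after a trigger that are treated as negated.
WINDOW = 5


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_negated(sentence, concept):
    """Return True if `concept` occurs within WINDOW tokens after a trigger."""
    tokens = tokenize(sentence)
    concept_tokens = tokenize(concept)
    if not concept_tokens:
        return False
    n = len(concept_tokens)
    for i, token in enumerate(tokens):
        if token in NEGATION_TRIGGERS:
            # Tokens immediately following the trigger form its scope.
            scope = tokens[i + 1 : i + 1 + WINDOW]
            for j in range(len(scope) - n + 1):
                if scope[j : j + n] == concept_tokens:
                    return True
    return False


if __name__ == "__main__":
    print(is_negated("Patient denies chest pain.", "chest pain"))
    print(is_negated("Patient reports chest pain.", "chest pain"))
```

A hybrid approach in the sense of the third footnote would combine rules like these with a trained classifier, for instance by passing the rule output to the model as a feature; the sketch covers only the rule-based half.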