Table 3:
Systems/toolkits | Full Names | Purposes** | Techniques*** |
---|---|---|---|
Generic toolkits | |||
ConText | N/A | Feature extraction and representation: sentence classification | Rule-based NLP |
FMA | Freetext Matching Algorithm | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP |
GATE | General Architecture for Text Engineering | Preprocessing: tokenization, sentence splitting, POS tagging, annotation; Feature extraction and representation: named entity recognition, information extraction; Non-neural ML: sentence classification | Hybrid NLP: rule-based, non-neural ML (support vector machine [SVM], WEKA ML) |
GENSIM (Python) | GENSIM | Feature extraction and representation: topic modeling and word embedding | Non-neural ML |
HeidelTime | High-quality rule-based extraction and normalization of temporal expressions | Preprocessing: tagging, normalization; Feature extraction and representation: information extraction | Rule-based NLP |
MALLET | MAchine Learning for LanguagE Toolkit | Feature extraction and representation: document classification, clustering, topic modeling, information extraction | Non-neural ML |
NegEx | N/A | Feature extraction and representation: negation, affirmation | Rule-based NLP |
Punkt Sentence Tokenizer | nltk.tokenize.punkt module | Preprocessing: tokenization | Non-neural ML |
Protégé | N/A | Feature extraction and representation: ontology editor or framework | Rule-based NLP |
NLTK (Python) | Natural Language Toolkit | Preprocessing: text tokenization, stemming, stop word removal, classification, clustering, POS tagging, parsing, and semantic reasoning | Non-neural ML |
SUTime | Stanford NLP annotator | Preprocessing: annotation, recognizing and normalizing time expressions (TIMEx) | Rule-based NLP |
Tagger_Date Normalizer plugin | Not available | Preprocessing: tagging, normalization | Rule-based NLP |
TextHunter | N/A | Preprocessing: tokenization, stemming, POS tagging; Feature extraction and representation: information extraction; Non-neural ML: automated concept identification | Hybrid NLP: rule-based, non-neural ML (SVM-based “batch learning”) |
WordNet | N/A | Lexical database for NLP (ontology) | Rule-based NLP |
Clinical NLP toolkits | |||
CLAMP | Clinical Language Annotation, Modeling, and Processing Toolkit | Preprocessing: tokenization, POS tagging, annotation, sentence boundary detection, section header identification; Feature extraction and representation: assertion, negation, named entity recognition, UMLS encoder; Non-neural ML: sentence boundary detection, section header identification, classification | Hybrid NLP: rule-based NLP, non-neural ML (conditional random fields [CRF]) |
cTAKES | clinical Text Analysis and Knowledge Extraction System | Preprocessing: sentence boundary detection, tokenization, parsing, dictionary lookup annotation, normalization, POS tagging; Feature extraction and representation: affirmation, negation, named section identification, named entity recognition, information extraction; Non-neural ML: classification of medical information | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM) |
CliX NLP | Clinical NLP tools for SNOMED CT | Feature extraction and representation: processing system based on SNOMED CT | Rule-based NLP |
ClinREAD | Rapid Clinical Note Mining for New Languages | Preprocessing: tokenization, POS tagging, vocabulary mapping; Feature extraction and representation: named entity recognition | Rule-based NLP |
MedLEE | Medical Language Extraction and Encoding System | Preprocessing: parsing; Feature extraction and representation: clinical entity extraction, assertions; Non-neural ML: word disambiguation, classification of medical information, rule generation for classifying medical conditions | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM) |
MedTagger | N/A | Preprocessing: tokenization, POS tagging; Feature extraction and representation: information extraction, assertion, negation; Non-neural ML: sentence detection, concept identification, patient-level risk factor classification | Hybrid NLP: rule-based NLP, non-neural ML (WEKA ML) |
MetaMap | N/A | Preprocessing: vocabulary mapping, parsing, tokenization, POS tagging, sentence boundary determination; Feature extraction and representation: lexical lookup of input words in the SPECIALIST lexicon – an information extraction system based on UMLS | Rule-based NLP |
MTERMS | Medical Text Extraction, Reasoning and Mapping System | Preprocessing: parsing, tokenization, POS tagging, vocabulary mapping, information extraction; Feature extraction and representation: affirmation, negation | Rule-based NLP |
MedEx | Medication Information Extraction System for Clinical Narratives | Preprocessing: tokenization, tagging, semantic tagging, parsing, encoding; Feature extraction and representation: information extraction | Rule-based NLP |
NLP-PAC | NLP algorithms for Predetermined Asthma Criteria | Feature extraction and representation: information extraction, affirmation, negation | Rule-based NLP |
V3NLP | Not available | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP |
* See Supplementary Table S4 for a list of references.
** The purposes of the NLP systems/toolkits used to process unstructured PROs.
*** A hybrid NLP approach uses both rule-based and ML-based methods.
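
To make the "Rule-based NLP" entries in the table more concrete, the following is a minimal Python sketch of trigger-based negation detection, loosely in the spirit of tools such as NegEx. It is not the NegEx algorithm or any listed toolkit's implementation: the trigger list, the five-token window, and the function names `tokenize` and `is_negated` are all assumptions made for illustration.

```python
import re

# Hypothetical, abbreviated trigger list for illustration only;
# real rule-based tools use much larger curated lexicons.
NEGATION_TRIGGERS = ("no", "not", "denies", "without")

# Assumed scope: how many tokens before the concept are searched for a trigger.
WINDOW = 5


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, concept):
    """Return True if a trigger occurs within WINDOW tokens before the concept."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    n = len(target)
    if n == 0:
        return False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = tokens[max(0, i - WINDOW):i]
            if any(tok in NEGATION_TRIGGERS for tok in preceding):
                return True
    return False


if __name__ == "__main__":
    print(is_negated("Patient denies chest pain.", "chest pain"))
    print(is_negated("Patient reports chest pain.", "chest pain"))
```

A hybrid approach, as described in the table footnote, would typically pass a rule-derived flag like this one to a non-neural ML classifier as one feature among several, rather than using it as the final decision.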