Table 3: The reported NLP systems or toolkits*

| Systems/toolkits | Full Names | Purposes** | Techniques*** |
|---|---|---|---|
| Generic toolkits | |||
| ConText | N/A | Feature extraction and representation: sentence classification | Rule-based NLP |
| FMA | Freetext Matching Algorithm (FMA) | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP |
| GATE | General Architecture for Text Engineering | Preprocessing: tokenization, sentence splitting, POS tagging, annotation; Feature extraction and representation: named entity recognition, information extraction; Non-neural ML: sentence classification | Hybrid NLP: rule-based, non-neural ML (support vector machine [SVM], WEKA ML) |
| Gensim (Python) | Gensim | Feature extraction and representation: topic modeling and word embedding | Non-neural ML |
| HeidelTime | High quality rule-based extraction and normalization of temporal expressions | Preprocessing: tagging, normalization; Feature extraction and representation: information extraction | Rule-based NLP |
| MALLET | MAchine Learning for LanguagE Toolkit | Feature extraction and representation: document classification, clustering, topic modeling, information extraction | Non-neural ML |
| NegEx | N/A | Feature extraction and representation: negation, affirmation | Rule-based NLP |
| Punkt Sentence Tokenizer | nltk.tokenize.punkt module | Preprocessing: tokenization | Non-neural ML |
| Protégé | N/A | Feature extraction and representation: ontology editor or framework | Rule-based NLP |
| NLTK (Python) | Natural Language Toolkit | Preprocessing: text tokenization, stemming, stop word removal, classification, clustering, POS tagging, parsing, and semantic reasoning | Non-neural ML |
| SUTime | Stanford NLP annotator | Preprocessing: annotation, recognizing and normalizing time expressions (TIMEX) | Rule-based NLP |
| Tagger_Date Normalizer plugin | N/A | Preprocessing: tagging, normalization | Rule-based NLP |
| TextHunter | N/A | Preprocessing: tokenization, stemming, POS tagging; Feature extraction and representation: information extraction; Non-neural ML: automated concept identification | Hybrid NLP: rule-based, non-neural ML (SVM-based “batch learning”) |
| WordNet | N/A | Lexical database for NLP (ontology) | Rule-based NLP |
| Clinical NLP toolkits | |||
| CLAMP | Clinical Language Annotation, Modeling, and Processing Toolkit | Preprocessing: tokenization, POS tagging, annotation, sentence boundary detection, section header identification; Feature extraction and representation: assertion, negation, named entity recognition, UMLS encoder; Non-neural ML: sentence boundary detection, section header identification, classification | Hybrid NLP: rule-based NLP, non-neural ML (conditional random fields, CRF) |
| cTAKES | clinical Text Analysis and Knowledge Extraction System | Preprocessing: sentence boundary detection, tokenization, parsing, dictionary lookup annotation, normalization, POS tagging; Feature extraction and representation: affirmation, negation, section identification, named entity recognition, information extraction; Non-neural ML: classification of medical information | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM) |
| CliX NLP | Clinical NLP tools for SNOMED CT | Feature extraction and representation: processing system based on SNOMED CT | Rule-based NLP |
| ClinREAD | Rapid Clinical Note Mining for New Languages | Preprocessing: tokenization, POS tagging, vocabulary mapping; Feature extraction and representation: named entity recognition | Rule-based NLP |
| MedLEE | Medical Language Extraction and Encoding System | Preprocessing: parsing; Feature extraction and representation: clinical entity extraction, assertions; Non-neural ML: word sense disambiguation, classification of medical information, generation of rules for classifying medical conditions | Hybrid NLP: rule-based NLP, non-neural ML (CRF, SVM) |
| MedTagger | N/A | Preprocessing: tokenization, POS tagging; Feature extraction and representation: information extraction, assertion, negation; Non-neural ML: sentence detection, concept identification, patient-level risk factor classification | Hybrid NLP: rule-based NLP, non-neural ML (WEKA ML) |
| MetaMap | N/A | Preprocessing: vocabulary mapping, parsing, tokenization, POS tagging, sentence boundary determination; Feature extraction and representation: UMLS-based information extraction via lexical lookup of input words in the SPECIALIST lexicon | Rule-based NLP |
| MTERMS | Medical Text Extraction, Reasoning and Mapping System | Preprocessing: parsing, tokenization, POS tagging, vocabulary mapping, information extraction; Feature extraction and representation: affirmation, negation | Rule-based NLP |
| MedEx | Medication Information Extraction System for Clinical Narratives | Preprocessing: tokenization, tagging, semantic tagging, parsing, encoding; Feature extraction and representation: information extraction | Rule-based NLP |
| NLP-PAC | NLP algorithms for Predetermined Asthma Criteria | Feature extraction and representation: information extraction, affirmation, negation | Rule-based NLP |
| V3NLP | N/A | Preprocessing: annotation; Feature extraction and representation: information extraction | Rule-based NLP |

\* See Supplementary Table S4 for a list of references.

\*\* The purpose of NLP systems/toolkits used to process unstructured PROs.

\*\*\* The hybrid NLP approach uses both rule-based and ML-based methods.
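To make the "Rule-based NLP" label in Table 3 more concrete, the following is a minimal Python sketch of the trigger-and-scope idea behind NegEx-style negation detection. It is not the NegEx or ConText implementation. The trigger list, the single termination term, the five-token scope window, and the helper names (`tokenize`, `is_negated`) are all illustrative assumptions.

```python
import re

# Hypothetical, heavily reduced lexicons; real NegEx-style systems use much
# larger trigger lists and also handle post-negation and pseudo-negation.
PRE_NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
TERMINATION_TERMS = {"but"}  # assumed scope-closing term
SCOPE_WINDOW = 5  # assumed number of tokens a trigger can reach forward


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a stand-in for a clinical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if `concept` starts inside the scope of a pre-negation trigger."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    n = len(target)
    for trigger in PRE_NEGATION_TRIGGERS:
        trig = trigger.split()
        m = len(trig)
        for j in range(len(tokens) - m + 1):
            if tokens[j:j + m] != trig:
                continue
            # Scan the window of tokens that follows the trigger.
            for k in range(j + m, min(j + m + SCOPE_WINDOW, len(tokens))):
                if tokens[k] in TERMINATION_TERMS:
                    break  # a termination term closes the negation scope
                if tokens[k:k + n] == target:
                    return True
    return False
```

For example, `is_negated("No fever but reports cough.", "fever")` is true. `is_negated("No fever but reports cough.", "cough")` is false, because "but" ends the scope of "no". A hybrid system (\*\*\*) would typically pass rule outputs like these to a statistical classifier, or fall back to one when no rule fires.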