Table 4.
Author | NLP/Text Mining | Primary Evaluation Metric | Comparative Evaluation
---|---|---|---
Brennan & Aronson, 2003 [16] | NLP | Number of terms matched to vocabularies; matched frequencies were higher for nursing-complemented vocabularies | Compared several models on nursing, MeSH, and SNOMED terms
Portier et al, 2013 [33] | Text mining | Descriptive results, including sentiment analysis | None reported
Freifeld et al, 2014 [27] | NLP | Automated, dictionary-based symptom classification had 72% recall and 86% precision | Annotation results were compared with FDA Adverse Event Reporting System data
Gupta et al, 2014 [19] | NLP | Extracted symptoms and conditions with an F-measure of 66–76% | Compared performance against two other programs, the OBA and the MetaMap annotator, with baseline and default parameters
Park & Ryu, 2014 [25] | Text mining | Descriptive results, including symptoms and clinical distinctions | None reported
Janies et al, 2015 [28] | NLP | No algorithm evaluation metrics reported | None reported
Jimeno-Yepes et al, 2015 [20] | NLP | Highest-performing model (Micromed+Meta) had precision, recall, and F-measure of 72%, 60%, and 66%, respectively | Compared exact and partial matches across five models
Karmen et al, 2015 [21] | NLP | Average precision of 84% and average F-measure of 79% | Compared algorithm results with independent expert ratings
Liu & Chen, 2015 [23] | NLP | Average F-measure of 90% for drug entity extraction and 80% for medical event extraction | Compared several methods across patient-authored forums
Nikfarjam et al, 2015 [24] | NLP | Precision, recall, and F-measure of 86%, 78%, and 82%, respectively | Compared several methods, including SVM, ADRMine, MetaMap, and a lexicon-based method
Tighe et al, 2015 [32] | Text mining | Descriptive results, including an average degree centrality of 60.7 for the reduced pain tweet corpus graph | Compared sentiment of relevant terms with that of objective terms
Eshleman & Singh, 2016 [18] | NLP & text mining | Precision exceeding 85% and F-measure over 81% | Compared sentiment analysis with graph topology of co-occurring symptoms
Lee & Donovan, 2016 [35] | Text mining | Descriptive results for symptom findings | None reported
Marshall et al, 2016 [29] | Text mining | Descriptive results, including co-occurrence and clustering of symptom findings | None reported
Topaz et al, 2016 [30] | NLP | Descriptive results, including symptom extraction | None reported
Sunkureddi et al, 2016 [34] | NLP | Descriptive results, including frequency ranking of reactions and patients’ concerns | None reported
Cocos et al, 2017 [17] | NLP | Approximate-match F-measure of 75% for RNN-based ADR identification | Compared the BLSTM-RNN ADR classifier, a baseline lexicon system, and a conditional random field model
Cronin et al, 2017 [36] | NLP | Logistic regression achieved an AUC of 0.899 for medical communications | Compared naive Bayes, logistic regression, and random forest classifiers across different types of patient portal messages
Lamy et al, 2017 [22] | NLP | No algorithm evaluation metrics reported | None reported
Lu et al, 2017 [31] | Text mining | Descriptive results, including sentiment scores, clustering of groups, and Jaccard similarities | None reported
Patel et al, 2017 [26] | NLP | No algorithm evaluation metrics reported | Compared the method across two datasets
Note.
Studies have been arranged in chronological order to assess trends over time; ADR=adverse drug reaction; ADRMine=a machine learning-based concept extraction system that uses conditional random fields; AUC=area under the curve; BLSTM=bidirectional long short-term memory network; FDA=Food and Drug Administration; F-measure=also reported as F1 score or F-score in the published literature; MeSH=Medical Subject Headings; MetaMap=a tool for recognizing Unified Medical Language System (UMLS) concepts in text; NLP=natural language processing; OBA=open biomedical annotator; RNN=recurrent neural network; SVM=support vector machine
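
Most of the algorithm-level results in Table 4 are reported as precision, recall, and F-measure. The following minimal sketch shows how these metrics relate under their standard definitions; the function name and example counts are illustrative and are not drawn from any of the cited studies.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions of the metrics reported in Table 4 (illustrative sketch).

    tp, fp, fn = true positive, false positive, and false negative counts from
    comparing automatically extracted terms against a gold-standard annotation.
    """
    precision = tp / (tp + fp)   # share of extracted terms that are correct
    recall = tp / (tp + fn)      # share of gold-standard terms that were extracted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean; the F-measure/F1 score
    return precision, recall, f1


# Worked example: Freifeld et al [27] report 86% precision and 72% recall;
# the corresponding F-measure is the harmonic mean of the two, roughly 78%.
p, r = 0.86, 0.72
print(round(2 * p * r / (p + r), 3))  # 0.784
```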