NLP methods

| Method | Description |
| --- | --- |
| Key term search | Identify and extract terms from a pre-specified list of terms of interest. |
| Named entity recognition | Locate and classify terms, or named entities, into predefined categories of concepts, often using controlled medical vocabularies. |
| Rule-based methods | Detect concepts of interest based on an established set of rules or logic, often using regular expressions, which are sequences of characters that define a search pattern (a short regex sketch follows this table). |
| Convolutional neural network | A deep learning neural network approach that identifies, weights, and connects “nodes” across multiple “layers” of nodes, including at least one convolutional layer that applies filters between layers (a small network sketch follows this table). |
| Conditional random fields | A classification approach that accounts for context in order to recognize patterns and make predictions. |
| Decision tree | A hierarchical tree of decision rules used to classify concepts of interest. |
| Logistic regression | A classification approach used to discover links between concepts of interest; also a basic building block of neural networks. |
| Random forest | An “ensemble” of decision trees whose combined predictions are more accurate and stable than those of any single tree (see the classifier sketch after this table). |
| Recurrent neural network | A deep learning neural network approach designed to interpret temporal or sequential information and to make predictions. |
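To make the key term search and rule-based rows concrete, the sketch below applies Python regular expressions to a clinical-style sentence. The term list, dose pattern, and example note are illustrative assumptions for this sketch, not taken from the source.

```python
import re

# Illustrative term list and rule pattern (assumptions for this sketch).
KEY_TERMS = ["hypertension", "diabetes", "metformin"]
DOSE_PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)  # "<number> mg"

def key_term_search(text: str, terms=KEY_TERMS) -> list[str]:
    """Return the pre-specified terms that appear in the text."""
    found = []
    for term in terms:
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            found.append(term)
    return found

def extract_doses(text: str) -> list[str]:
    """Apply the rule-based dose pattern and return the matched doses."""
    return DOSE_PATTERN.findall(text)

note = "Patient with Type 2 diabetes and hypertension, started on metformin 500 mg daily."
print(key_term_search(note))   # ['hypertension', 'diabetes', 'metformin']
print(extract_doses(note))     # ['500']
```

A full named entity recognition pipeline would typically replace the hand-built term list with a controlled vocabulary and a trained tagger, but the underlying matching idea is the same.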
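The logistic regression, decision tree, and random forest entries can likewise be illustrated with scikit-learn. The tiny labelled sentences below are made up for the sketch; real document classification would use a much larger annotated corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy labelled sentences (assumed for illustration): 1 = mentions a symptom, 0 = does not.
texts = [
    "patient reports chest pain and shortness of breath",
    "no acute distress, vitals stable",
    "complains of severe headache and nausea",
    "follow-up visit, labs within normal limits",
]
labels = [1, 0, 1, 0]

# Bag-of-words features shared by all three classifiers.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=3),
    "random forest": RandomForestClassifier(n_estimators=100),
}
for name, model in models.items():
    model.fit(X, labels)
    pred = model.predict(vectorizer.transform(["patient complains of chest pain"]))
    print(name, "->", pred[0])
```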
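For the neural network entries, the PyTorch sketch below outlines a small convolutional text classifier: token embeddings, one convolutional layer that applies filters across the token sequence, pooling, and a final prediction layer. The vocabulary size, dimensions, and random data are placeholder assumptions; a recurrent network would swap the convolution for a recurrent layer such as nn.LSTM to model sequence order directly.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Minimal convolutional text classifier with illustrative dimensions."""
    def __init__(self, vocab_size=1000, embed_dim=32, num_filters=16, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolutional layer: filters slide over 3-token windows of the sequence.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3)
        self.classifier = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))              # (batch, num_filters, seq_len - 2)
        x = x.amax(dim=2)                         # max-pool over the sequence
        return self.classifier(x)                 # (batch, num_classes)

# Placeholder batch: 4 "documents" of 20 random token ids with random labels.
tokens = torch.randint(0, 1000, (4, 20))
labels = torch.randint(0, 2, (4,))

model = TextCNN()
loss = nn.CrossEntropyLoss()(model(tokens), labels)
loss.backward()                                   # compute gradients for one illustrative step
print(float(loss))
```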
Evaluation methods

| Method | Description |
| --- | --- |
| Manual annotation | The task of reading pre-selected texts and marking (i.e., annotating) the linguistic components (paragraphs, sentences, phrases, or words) that represent concepts of interest. |
| Cross-validation; also called held-out testing set | A technique to evaluate predictive models by partitioning the original sample into a training set used to fit the model and a test set used to evaluate it; in k-fold cross-validation the partitioning is repeated so that each subset serves once as the test set (see the sketch after this table). |
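As a sketch of the held-out test set and k-fold cross-validation described above, the snippet below uses scikit-learn on a synthetic feature matrix; the generated data stands in for any featurized, manually annotated corpus (for example, the bag-of-words matrix from the earlier sketch).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for a featurized, manually annotated corpus.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Held-out testing set: fit on the training split, evaluate on the unseen split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation: every sample is used for testing exactly once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cross-validated accuracy:", np.mean(scores))
```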
Performance metrics

| Metric | Description |
| --- | --- |
| Positive predictive value (PPV); also called precision | The percentage of results that were actually relevant among all results that the system obtained. |
| Negative predictive value (NPV) | The percentage of results that were actually irrelevant among all results that the system did not obtain. |
| Sensitivity; also called recall | The percentage of results that the system actually obtained among all results that should have been obtained. |
| Specificity | The percentage of results that the system correctly did not obtain among all results that should not have been obtained. |
| F-score | A combination (harmonic mean) of PPV/precision and sensitivity/recall; can be weighted to give more significance to one measure. |
| Accuracy | The percentage of results that the system handled correctly, i.e., relevant results obtained plus irrelevant results not obtained, among all results. |
| Area under the curve (AUC) | Reflects the degree to which a model is capable of classifying or distinguishing between classes or events of interest; typically the area under the receiver operating characteristic (ROC) curve (formulas for these metrics follow this table). |
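These metrics follow directly from the confusion matrix of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN); the formulas below restate the table's definitions in that notation, with the F-score shown in its balanced F1 form.

```latex
\begin{align*}
\text{PPV (precision)} &= \frac{TP}{TP + FP} &
\text{NPV} &= \frac{TN}{TN + FN} \\
\text{Sensitivity (recall)} &= \frac{TP}{TP + FN} &
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F}_1 &= \frac{2 \cdot \text{PPV} \cdot \text{Sensitivity}}{\text{PPV} + \text{Sensitivity}} &
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}
\end{align*}
```

AUC is not a single ratio of these counts; it is the area under the ROC curve traced by sensitivity against (1 - specificity) as the model's decision threshold varies.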