2024 May 3;69(10):10TR01. doi: 10.1088/1361-6560/ad387d

Table 1. Common glossary terms in natural language processing (NLP).

| Term | Definition |
|---|---|
| Machine learning (ML) | A branch of artificial intelligence (AI) in which computers learn patterns from data rather than following explicitly programmed rules |
| Deep learning (DL) | A subset of ML that uses multilayered (deep) neural networks to learn increasingly abstract representations of data |
| Convolutional neural networks (CNNs) | A type of deep neural network primarily used to analyze visual imagery |
| Natural language processing (NLP) | A branch of AI focused on enabling computers to understand, interpret, and manipulate human language |
| Transfer learning | An ML method in which a model developed for one task is reused as the starting point for a model on a second task |
| Reinforcement learning | An area of ML concerned with how agents should take actions in an environment to maximize cumulative reward |
| Vision-language models | Models that jointly process visual and textual data to understand or generate content that combines both |
| Bag of words (BoW) | A simple text representation in NLP that treats text as a collection of words, disregarding word order and grammar |
| Corpus | A large collection of text documents or spoken language data used to train and test NLP models |
| Embedding | Vector representations of words in a continuous space that capture semantic relationships and serve as inputs to NLP models |
| Feature engineering | Selecting and transforming linguistic features of raw text to prepare inputs for NLP models |
| Hidden Markov model (HMM) | A statistical model of sequence data with unobserved (hidden) states, used in NLP tasks such as speech recognition |
| Information retrieval | The process of retrieving relevant information from large text collections in response to user queries |
| Jaccard similarity | A measure of the similarity of two sets of words or documents: the number of shared elements divided by the number of distinct elements in either set |
| Keyword extraction | Automatically identifying and extracting key words or phrases from a document, for example for summarization |
| Lemmatization | Reducing inflected word forms (e.g., plural nouns or conjugated verbs) to their base dictionary form, the lemma |
| Machine translation | The NLP task of automatically translating text from one language to another |
| Named entity recognition (NER) | Identifying and classifying named entities (e.g., people, places, organizations) in text |
| Ontology | A formal representation of knowledge, used in NLP, that defines concepts, entities, and the relationships among them |
| Question answering | The NLP task of generating accurate answers to questions posed in natural language |
| Recurrence | The use of recurrent neural networks (RNNs) in NLP to process sequential data, for example in language modeling |
| Sentiment analysis | Determining the emotional tone or sentiment of text, typically as positive, negative, or neutral |
| Tokenization | Breaking text into individual units (tokens), such as words or n-grams, for analysis in NLP tasks |
| Unsupervised learning | Training NLP models on unlabeled data so that they discover patterns without explicit supervision |
| Vector space model (VSM) | A mathematical model that represents text as numerical vectors so that similarity between texts can be computed |
| Word sense disambiguation (WSD) | Identifying the correct meaning of a word in context, especially for words with multiple meanings |
| Zero-shot learning | Enabling models to perform tasks they were not explicitly trained on, used in various NLP applications |
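The Jaccard similarity entry corresponds to the formula J(A, B) = |A ∩ B| / |A ∪ B|. The following is a minimal Python sketch, assuming each document is reduced to the set of its whitespace-separated words. The function name `jaccard_similarity`, the example sentences, and the convention that two empty sets score 1.0 are illustrative assumptions, not taken from the source.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of tokens (duplicates ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty documents are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical example documents, split on whitespace.
doc1 = "the tumor is in the left lung".split()
doc2 = "the nodule is in the right lung".split()

# Shared words: {the, is, in, lung} (4); distinct words in either set: 8.
print(jaccard_similarity(doc1, doc2))  # 0.5
```

Because only set membership matters, repeated words and word order have no effect on the score.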
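The tokenization, bag-of-words, and vector space model entries fit together in a common pipeline: text is tokenized, each document becomes a vector of word counts over a shared vocabulary, and documents are compared numerically. The sketch below is a minimal Python illustration under stated assumptions. The regex tokenizer, the helper names (`tokenize`, `bag_of_words`, `cosine_similarity`), the example sentences, and the choice of cosine similarity as the similarity measure are hypothetical and not taken from the source.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (simple assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Count each vocabulary word in the tokens; word order and grammar are discarded."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors in the vector space model."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector has no direction; report zero similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example documents.
docs = ["Dose to the left lung was reduced.", "The lung dose was reduced further."]
token_lists = [tokenize(d) for d in docs]
vocabulary = sorted(set().union(*token_lists))  # shared vocabulary across documents
vectors = [bag_of_words(t, vocabulary) for t in token_lists]

print(vocabulary)  # ['dose', 'further', 'left', 'lung', 'reduced', 'the', 'to', 'was']
print(vectors)     # [[1, 0, 1, 1, 1, 1, 1, 1], [1, 1, 0, 1, 1, 1, 0, 1]]
print(round(cosine_similarity(*vectors), 3))  # 0.772
```

Raw counts keep the example short; weighting schemes such as TF-IDF, or dense embeddings, are common alternatives for building the document vectors.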