2024 May 3;69(10):10TR01. doi: 10.1088/1361-6560/ad387d

Table 1.

Glossary of common terms in natural language processing (NLP).

Term	Definition	Term	Definition	Term	Definition
Machine learning (ML)	A branch of artificial intelligence (AI) in which algorithms learn patterns from data instead of following explicitly programmed rules	Deep learning (DL)	A subset of ML that uses multi-layer neural networks to learn hierarchical representations of data	Convolutional neural networks (CNNs)	A type of deep neural network, primarily used to analyze visual imagery
Natural language processing (NLP)	A branch of AI focused on enabling computers to understand, interpret, and manipulate human language	Transfer learning	A method in ML where a model developed for a task is reused as the starting point for a model on a second task	Reinforcement learning	An area of ML concerned with how agents ought to take actions in an environment to maximize cumulative reward
Vision-language models	Models that understand and generate content combining visual and textual data	Bag of words (BoW)	A simple text representation model in NLP. It treats text as a collection of words regardless of their order or grammar	Corpus	A large collection of text documents or spoken language data used for training and testing NLP models
Embedding	Vector representations of words in a continuous space that capture semantic relationships and serve as input to NLP models	Feature engineering	Selecting and transforming linguistic features from raw text to prepare input for NLP models	Hidden Markov model (HMM)	A statistical model of sequence data in which observations are generated by unobserved (hidden) states, useful in tasks like speech recognition
Information retrieval	The process in NLP of retrieving relevant information from large text collections based on user queries	Jaccard similarity	A measure of similarity between two sets of words or documents, computed as the number of shared elements divided by the number of distinct elements in both sets combined	Keyword extraction	Automatically identifying and extracting key words or phrases from a document for summarization
Lemmatization	Reducing inflected word forms (e.g. singular/plural nouns or verb tenses) to their base dictionary form (lemma)	Machine translation	An NLP task of automatically translating text from one language to another	Named entity recognition (NER)	Identifying and classifying named entities (people, places, organizations) in text
Ontology	Formally representing knowledge by defining concepts and entities and their relationships in NLP	Question answering	An NLP task of generating accurate answers to questions posed in natural language	Recurrence	Using recurrent neural networks (RNNs) in NLP for processing data sequences in language modeling
Sentiment analysis	Determining the emotional tone or sentiment of text, typically as positive, negative, or neutral	Tokenization	Breaking text data into individual units (words, n-grams) for analysis in NLP tasks	Unsupervised learning	Training NLP models on data without explicit labels, allowing independent pattern learning
Vector space model (VSM)	A mathematical model transforming text into numerical vectors for similarity calculations in NLP	Word sense disambiguation (WSD)	Identifying the correct meaning of a word in context, especially for words with multiple meanings	Zero-shot learning	Enabling models to perform tasks for which they have seen no labeled training examples, used in various NLP applications
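
Several of the entries in Table 1 (tokenization, bag of words, Jaccard similarity, and the vector space model) can be illustrated with a few lines of code. The following is a minimal sketch in plain Python, not a method taken from the article: the function names, the regular-expression tokenizer, and the two sample sentences are all assumptions chosen for illustration, and production systems typically rely on dedicated NLP libraries instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens):
    """Count token occurrences, ignoring word order and grammar."""
    return Counter(tokens)


def jaccard_similarity(tokens_a, tokens_b):
    """Shared distinct tokens divided by all distinct tokens in either input."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention assumed here for two empty inputs
    return len(set_a & set_b) / len(union)


def cosine_similarity(bow_a, bow_b):
    """Cosine of the angle between two count vectors (vector space model)."""
    dot = sum(count * bow_b[term] for term, count in bow_a.items())
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, invented for this example.
doc_a = "The dose plan was reviewed and the plan was approved"
doc_b = "The plan was approved after review"

tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
bow_a, bow_b = bag_of_words(tokens_a), bag_of_words(tokens_b)

print(bow_a["plan"])                           # "plan" occurs twice in doc_a
print(jaccard_similarity(tokens_a, tokens_b))  # 4 shared of 9 distinct tokens
print(cosine_similarity(bow_a, bow_b))
```

Note that "reviewed" and "review" count as different tokens in this sketch; applying lemmatization before counting would merge such forms and raise both similarity scores.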