Chem Rev. 2020 Jun 10;120(16):8066–8129. doi: 10.1021/acs.chemrev.0c00004

Table 1. Common Technical Terms Used in ML and Their Meanings.

technical term: explanation

bagging: acronym for bootstrap aggregating; an ensemble technique in which models are fitted on bootstrapped samples of the data and their predictions are averaged
bias: error that remains even for an infinite number of training examples, e.g., due to limited expressivity of the model
boosting: ensemble technique in which weak learners are iteratively combined to build a stronger learner
bootstrapping: calculating statistics by repeatedly drawing random samples with replacement
classification: process of assigning examples to one of a set of discrete classes
confidence interval: interval of confidence around the predicted mean response
feature: vector with a numeric encoding of the description of a material that the ML model uses for learning
fidelity: measure of how closely a model represents the real case
fitting: estimating the parameters of a model with high accuracy
gradient descent: optimization by following the negative gradient; stochastic gradient descent approximates the gradient using a mini-batch of the available data
hyperparameters: tuning parameters of the learner (such as learning rate or regularization strength) which, in contrast to model parameters, are not learned during training and have to be specified before training
instance-based learning: learning "by heart"; query data are compared to the training examples to make a prediction
irreducible error: error that cannot be reduced (e.g., due to noise in the data), i.e., that remains even for a perfect model; also known as the Bayes error rate
label (target): the property one wants to predict
objective function (cost function): the function that an ML algorithm tries to minimize
one-hot encoding: method to represent categorical variables by creating a feature column for each category, using a value of one to encode the presence of that category and zero to encode its absence
overfitting: the gap between training and test error is large, i.e., the model merely "remembers" the training data but fails to predict on unseen examples
predicting: making predictions for future samples with high accuracy
prediction interval: interval of confidence around the predicted sample response; always wider than the confidence interval
regression: process of estimating the continuous relationship between a dependent variable and one or more independent variables
regularization: techniques that add terms or information to the model to avoid overfitting
stratification: data are divided into homogeneous subgroups (strata) such that sampling does not disturb the class distributions
structured data: data that is organized in tables with rows and columns, i.e., data that resides in relational databases
test set: collection of labels and feature vectors used for model evaluation, which must not overlap with the training set
training set: collection of labels and feature vectors used for training
transfer: using knowledge gained on one distribution to perform inference on another distribution
unstructured data: data that is not organized in tabular form, e.g., images, video, audio, or text
validation set (development set): collection of labels and feature vectors used for hyperparameter tuning, which must not overlap with the training and test sets
variance: part of the error that is due to finite-size effects (e.g., fluctuations due to the random split into training and test sets)
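Bootstrapping, the resampling step that underlies bagging, can be illustrated with a minimal Python sketch (the function name and data here are hypothetical examples, not from the review):

```python
import random

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Estimate the sampling distribution of the mean by repeatedly
    drawing samples of the same size with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # draw len(data) points with replacement, i.e., one bootstrap sample
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return means

means = bootstrap_means([1.0, 2.0, 3.0, 4.0, 5.0])
```

In bagging, each such bootstrap sample would instead be used to fit one model of the ensemble, and the models' predictions would be averaged.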
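Gradient descent fits in a few lines; the sketch below (with hypothetical names) minimizes a one-dimensional quadratic by repeatedly stepping along the negative gradient, with the learning rate as a typical hyperparameter:

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Stochastic gradient descent would replace `grad(x)` with an estimate computed on a random mini-batch of the training data.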
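One-hot encoding is equally compact; a minimal sketch in plain Python (the category labels are made up for illustration):

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector with a 1 in the
    column of its category and 0 elsewhere."""
    categories = sorted(set(values))  # one feature column per category
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = [
        [1 if index[v] == i else 0 for i in range(len(categories))]
        for v in values
    ]
    return categories, encoded

columns, encoded = one_hot_encode(["Cu", "Zn", "Cu"])
# columns == ["Cu", "Zn"]; encoded == [[1, 0], [0, 1], [1, 0]]
```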
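Stratification can be sketched as splitting the indices of each class separately, so the class distribution carries over into the test set. This is a deterministic toy version under assumed simplifications; practical implementations also shuffle within each stratum:

```python
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25):
    """Split example indices so that each class contributes roughly the
    same fraction of its members to the test set."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        n_test = int(len(indices) * test_fraction)  # per-class test share
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

# With 8 examples of class "A" and 4 of class "B", the 2:1 ratio is preserved.
train, test = stratified_split(["A"] * 8 + ["B"] * 4)
```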