2019 Dec 2;14(12):e0225577. doi: 10.1371/journal.pone.0225577

Table 2. Common data operations expressed as MLMs.

| Operation | Input Space $X$ | Output Space $Y$ | Parameters | Morphism | Empirical Risk Function |
|---|---|---|---|---|---|
| Data Encoding | abstract space $X$ | $\mathbb{R}^m$ | embedding parameters | injective map $F: X \to \mathbb{R}^m$ | trivial (one-hot encoding) or a cost function, e.g. from [20] |
| PCA | $\mathbb{R}^m$ | $\mathbb{R}^c$, $c \in \mathbb{N}$ | $A \in \mathbb{R}^{m \times c}$ | $x \mapsto xA$ | $\lVert X - XAA^T \rVert_F^2$ such that $A^T A = I$ |
| Linear Regression | $\mathbb{R}^m$ | $\mathbb{R}$ | $p \in \mathbb{R}^m$ | $x \mapsto x \cdot p$ | $\lVert Y - Xp \rVert_2^2$ |
| Logistic Regression | $\mathbb{R}^m$ | $[0, 1]$ | $p \in \mathbb{R}^m$ | $F(x; p) = \dfrac{1}{1 + \exp(-x \cdot p)}$ | maximum likelihood: $\prod_{i=1}^N F(x_i; p)^{y_i} \left(1 - F(x_i; p)\right)^{1 - y_i}$ |
| SVM | $\mathbb{R}^m$ | $\{-1, 1\}$ | $(w, b) \in \mathbb{R}^m \times \mathbb{R}$; slack variables $s \in \mathbb{R}^N$, $c \in \mathbb{R}$ | $x \mapsto w \cdot x - b$ | $\lVert w \rVert^2 + c \lVert s \rVert^2$ |
| Decision Tree | set $X$ | $\{y_1, y_2, \ldots, y_k\}$ for finite $k$ | splitting criterion | tree | Gini impurity: $\sum_{j=1}^N \left(1 - \sum_{i=0}^{1} P(Y_j = i \mid X_j = x_j)^2\right)$, or information gain, see [21] |
| Standardization | $\mathbb{R}^m$ | $\mathbb{R}^m$ | $(c, s) \in \mathbb{R}^m \times \mathbb{R}^m$ | $x \mapsto (x - c)\,\mathrm{diag}(s)^{-1}$ | KL divergence, Eq 13 |
| Adaboost | $\mathbb{R}^m$ | $\mathbb{R}$ | parameters associated with the weak learners | $x \mapsto \sum_{i=1}^k F_i(x)$ | exponential loss [22]: $\sum_{i=1}^N \exp\!\left(-y_i \sum_{j=1}^k F_j(x_i)\right)$ |
| Neural Networks | $\mathbb{R}^m$ | $\mathbb{R}$ | weights in $\mathbb{R}^w$ | $F = F_k(F_{k-1}(F_{k-2}(\cdots F_1(x))))$ | loss functions; examples include mean squared error, $\lVert Y - F(X) \rVert^2$, and cross entropy, $-\frac{1}{N}\sum_{i=1}^N \left[y_i \log(F(x_i)) + (1 - y_i)\log(1 - F(x_i))\right]$ [23] |
| Model Evaluation | collection $V \subseteq Y^k$ | $\mathbb{R}$ or $\mathbb{R}^k$ | evaluation parameters; test/validation set | performance metric, e.g. accuracy, sensitivity, etc. | complexity criterion or other objective, e.g. Akaike information criterion |
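To make the table's (morphism, empirical risk) pairing concrete, here is a minimal NumPy sketch of two rows, PCA and logistic regression. The function and variable names (`pca_components`, `logistic_morphism`, `neg_log_likelihood`, etc.) are illustrative choices, not from the paper; the logistic risk is written as the negative log of the table's likelihood product, which has the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # N = 100 samples in R^m, m = 5

# --- PCA row: morphism x -> xA, risk ||X - X A A^T||_F^2, A^T A = I ---
def pca_components(X, c):
    """Top-c principal directions A in R^{m x c} via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:c].T                    # columns are orthonormal

def pca_risk(X, A):
    """Frobenius reconstruction risk from the table's PCA row."""
    Xc = X - X.mean(axis=0)
    return np.linalg.norm(Xc - Xc @ A @ A.T, "fro") ** 2

A2 = pca_components(X, 2)
A4 = pca_components(X, 4)
assert np.allclose(A2.T @ A2, np.eye(2))   # constraint A^T A = I holds
assert pca_risk(X, A4) <= pca_risk(X, A2)  # more components, lower risk

# --- Logistic regression row: F(x; p) = 1 / (1 + exp(-x.p)) ---
def logistic_morphism(X, p):
    return 1.0 / (1.0 + np.exp(-X @ p))

def neg_log_likelihood(X, y, p):
    """Negative log of the table's maximum-likelihood product."""
    F = logistic_morphism(X, p)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

p_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (logistic_morphism(X, p_true) > 0.5).astype(float)
# The generating parameter should beat the uninformative zero parameter.
assert neg_log_likelihood(X, y, p_true) < neg_log_likelihood(X, y, np.zeros(5))
```

The assertions check the two properties the table encodes: the PCA morphism's parameter satisfies its orthonormality constraint, and each risk function orders parameters by fit.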
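The decision-tree row's Gini impurity can likewise be sketched for the binary case the table's inner sum ($i \in \{0, 1\}$) assumes. The function name `gini_impurity` is illustrative, not from the paper.

```python
def gini_impurity(p1):
    """Binary Gini impurity 1 - sum_i p_i^2, with p0 = 1 - p1."""
    p0 = 1.0 - p1
    return 1.0 - (p0 ** 2 + p1 ** 2)

# A pure node (all one class) has zero impurity; a perfectly mixed
# node attains the binary maximum of 0.5.
assert gini_impurity(0.0) == 0.0
assert gini_impurity(1.0) == 0.0
assert abs(gini_impurity(0.5) - 0.5) < 1e-12
```

A tree learner evaluates this quantity at each candidate split and prefers the split whose child nodes have the lowest weighted impurity.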