2017 Mar 9;18:160. doi: 10.1186/s12859-017-1565-4

Table 1.

Prominent options for choosing loss function and regularizer in feature extraction algorithms

Name                                     Loss function (L)           Regularizer (R)
AIC/BIC                                  ∥y − ⟨ω,x⟩∥²                ∥ω∥₀
Lasso                                    ∥y − ⟨ω,x⟩∥²                ∥ω∥₁
Elastic Net                              ∥y − ⟨ω,x⟩∥²                ∥ω∥₂² + ∥ω∥₁
Regularized Least Absolute
  Deviations Regression                  ∥y − ⟨ω,x⟩∥₁                ∥ω∥₁
Classic SVM                              max(0, 1 − y⟨ω,x⟩)ᵃ         ½∥ω∥₂²
ℓ1-SVM                                   max(0, 1 − y⟨ω,x⟩)ᵃ         ½∥ω∥₁
Logistic Regression                      log(1 + exp(−y⟨ω,x⟩))       ½∥ω∥₁

ᵃThis is the so-called hinge loss
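To make the table concrete, here is a minimal sketch (not from the paper) of a few of these loss functions and one full regularized objective. All rows of Table 1 share the form L + c·R for some trade-off parameter c > 0; the function names and the parameter name `c` are our own choices for illustration.

```python
import numpy as np

def squared_loss(y, score):
    """Squared loss, used by the AIC/BIC, Lasso, and Elastic Net rows."""
    return (y - score) ** 2

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y*score), used by the classic and l1 SVM rows."""
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic loss log(1 + exp(-y*score)) from the last row."""
    return np.log1p(np.exp(-y * score))

def lasso_objective(w, X, y, c=1.0):
    """The Lasso row: squared loss of the linear fit plus c times the l1 norm of w."""
    residual = y - X @ w
    return residual @ residual + c * np.sum(np.abs(w))
```

For example, with a perfect fit the Lasso objective reduces to c times the ℓ1 norm of the weight vector, which is what drives small coefficients exactly to zero.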

The ℓ1- and ℓ2-norm of a vector z = (z₁, …, z_d) ∈ ℝᵈ are defined by ∥z∥₁ = Σⱼ₌₁ᵈ |zⱼ| and ∥z∥₂ = (Σⱼ₌₁ᵈ |zⱼ|²)^½, respectively. The “ℓ0-norm” ∥z∥₀ simply counts the number of non-zero entries of z
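These three quantities are straightforward to compute directly from the definitions; a short sketch (the example vector is our own):

```python
import numpy as np

z = np.array([3.0, 0.0, -4.0])

l1 = np.sum(np.abs(z))        # ||z||_1 = 3 + 0 + 4 = 7
l2 = np.sqrt(np.sum(z ** 2))  # ||z||_2 = sqrt(9 + 0 + 16) = 5
l0 = np.count_nonzero(z)      # "l0-norm": number of non-zero entries = 2
```

Note that, unlike the ℓ1- and ℓ2-norms, the ℓ0 count is not a true norm (it is not homogeneous), which is why the paper sets it in quotes.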