Algorithm 2. Stochastic Gradient Descent (SGD) with Endogenous Learning Rate.
    t := 1
    α̂i := 0 and Vi := −1 for i = 1, …, NL
    do {
        Choose one index k ∈ {1, …, NL}.
        αnew := min{C, αnew} and αnew := max{−C, αnew}
        Initialize the KKT distance: KKT := 0
        loop over all i = 1, …, NL
            Vi := Vi + yi(αnew − α̂k)Gik
            KKT := KKT + KKT_distance(Vi, yiα̂i)
        end loop
        KKT := KKT/(NL)
        α̂k := αnew
        t := t + 1
    } while (KKT > θ)
Note: N is the number of data points, L is the number of classes, ηeff is the learning rate, and θ is the stopping threshold on the mean KKT distance. Note that this algorithm needs to maintain the Vi values; from the initialization Vi := −1 (at α̂ = 0) and the incremental update in the inner loop, the invariant Vi = yi Σj α̂j Gij − 1 holds throughout.
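Below is a minimal Python sketch of this loop. Two pieces are assumptions, since the listing does not spell them out: the step computing αnew from ηeff (taken here as αnew := α̂k − ηeff yk Vk, i.e. a gradient step on the quadratic dual, whose partial derivative with respect to αk equals yk Vk given the invariant above and a symmetric G) and the function kkt_distance, written here as one plausible per-point measure of KKT violation. Both are illustrative stand-ins rather than the paper's exact definitions.

    import numpy as np

    def kkt_distance(v, y_alpha, C):
        # Hypothetical per-point KKT violation: V_i should be >= 0 when
        # y_i * alpha_i sits at its lower end, <= 0 at the upper bound C,
        # and exactly 0 for interior values.
        if y_alpha <= 0.0:
            return max(0.0, -v)
        if y_alpha >= C:
            return max(0.0, v)
        return abs(v)

    def sgd_endogenous(G, y, C, eta_eff, theta, max_iter=1_000_000, seed=0):
        rng = np.random.default_rng(seed)
        NL = len(y)                  # NL = N * L flattened point-class pairs
        alpha = np.zeros(NL)         # dual variables, initialized to 0
        V = -np.ones(NL)             # invariant: V_i = y_i * sum_j alpha_j * G[i, j] - 1
        for t in range(1, max_iter + 1):   # safety cap, not in the listing
            k = int(rng.integers(NL))      # choose one index k uniformly
            # Assumed SGD step; the listing does not show how alpha_new
            # is computed from eta_eff.
            alpha_new = alpha[k] - eta_eff * y[k] * V[k]
            # Clip alpha_new to the box [-C, C].
            alpha_new = min(C, max(-C, alpha_new))
            # Update every V_i for the change in alpha_k and accumulate
            # the KKT distance over all NL points, as in the inner loop.
            kkt = 0.0
            for i in range(NL):
                V[i] += y[i] * (alpha_new - alpha[k]) * G[i, k]
                kkt += kkt_distance(V[i], y[i] * alpha[i], C)
            kkt /= NL
            alpha[k] = alpha_new
            if kkt <= theta:         # stop once the mean KKT distance is small
                break
        return alpha, V

With a precomputed NL × NL matrix G and labels y ∈ {−1, +1}, a call such as sgd_endogenous(G, y, C=1.0, eta_eff=0.1, theta=1e-3) iterates until the mean KKT distance falls below θ; the inner Python loop mirrors the listing line by line rather than being vectorized.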