
Algorithm 1.

Training LassoNet

1: Input: training dataset X ∈ ℝ^{n×d}, training labels Y, feed-forward neural network f_W(·), number of epochs B, hierarchy multiplier M, path multiplier ϵ, learning rate α
2: Initialize and train the feed-forward network on the loss L(X, Y; θ, W)
3: Initialize the penalty, λ = ϵ, and the number of active features, k = d
4: while k > 0 do
5:  Update λ ← (1 + ϵ)λ
6:  for b ∈ {1, …, B} do
7:   Compute the gradient of the loss with respect to (θ, W) using back-propagation
8:   Update θ ← θ − α∇θL and W ← W − α∇WL
9:   Update (θ, W^(1)) ← Hier-Prox(θ, W^(1), λ, M)
10: end for
11:  Update k to be the number of non-zero coordinates of θ
12: end while
where Hier-Prox is defined in Alg. 2
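Below is a minimal PyTorch sketch of this training loop, written directly from the steps above. The network architecture, the regression loss, and the hyperparameter defaults are illustrative assumptions, not the authors' reference implementation; and since Alg. 2 is only referenced (not reproduced) in this excerpt, the hier_prox helper is a hedged reconstruction of that operator and should be checked against Alg. 2 before use.

```python
import torch
import torch.nn as nn


class LassoNet(nn.Module):
    """Feed-forward network f_W plus a linear skip layer theta.
    Width and depth here are illustrative choices, not the paper's."""

    def __init__(self, d, hidden=32):
        super().__init__()
        self.theta = nn.Linear(d, 1, bias=False)  # skip-layer weights theta
        self.net = nn.Sequential(                 # f_W; net[0].weight is W^(1)
            nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.theta(x) + self.net(x)


def hier_prox(theta, W1, lam, M):
    """Hedged sketch of Hier-Prox (Alg. 2, not shown in this excerpt):
    per feature j, soft-threshold theta_j and clip |W1[:, j]| so the
    first-layer weights stay dominated by the skip connection.
    theta: (d,); W1: (K, d), column j corresponds to feature j."""
    K, d = W1.shape
    theta_new, W1_new = theta.clone(), W1.clone()
    for j in range(d):
        v, _ = torch.sort(W1[:, j].abs(), descending=True)          # v_1 >= ... >= v_K
        cums = torch.cat([torch.zeros(1), torch.cumsum(v, dim=0)])  # top-m sums, m = 0..K
        m_idx = torch.arange(K + 1, dtype=v.dtype)
        # candidate thresholds: w_m = M/(1+m*M^2) * soft_threshold(|theta_j| + M*sum_m, lam)
        w = M / (1 + m_idx * M ** 2) * torch.clamp(theta[j].abs() + M * cums - lam, min=0.0)
        # keep the m with v_m >= w_m >= v_{m+1}  (conventions v_0 = +inf, v_{K+1} = 0);
        # the paper shows such an m exists and is unique
        upper = torch.cat([torch.tensor([float("inf")]), v])
        lower = torch.cat([v, torch.zeros(1)])
        m = int(((upper >= w) & (w >= lower)).nonzero()[0])
        theta_new[j] = torch.sign(theta[j]) * w[m] / M
        W1_new[:, j] = torch.sign(W1[:, j]) * torch.minimum(W1[:, j].abs(), w[m])
    return theta_new, W1_new


def train_lassonet(X, Y, B=10, M=10.0, eps=0.02, alpha=1e-3):
    """Dense-to-sparse regularization path of Algorithm 1 (regression loss assumed)."""
    n, d = X.shape
    model = LassoNet(d)
    opt = torch.optim.SGD(model.parameters(), lr=alpha)
    loss_fn = nn.MSELoss()
    for _ in range(B):                      # line 2: train the dense network first
        opt.zero_grad()
        loss_fn(model(X), Y).backward()
        opt.step()
    lam, k = eps, d                         # line 3
    path = []
    while k > 0:                            # lines 4-12
        lam *= 1 + eps                      # line 5
        for _ in range(B):                  # lines 6-10: gradient step, then prox
            opt.zero_grad()
            loss_fn(model(X), Y).backward()
            opt.step()
            with torch.no_grad():
                th, W1 = hier_prox(model.theta.weight.view(-1),
                                   model.net[0].weight, lam, M)
                model.theta.weight.copy_(th.view(1, -1))
                model.net[0].weight.copy_(W1)
        k = int((model.theta.weight != 0).sum())  # line 11: count active features
        path.append((lam, k))
    return model, path
```

Calling train_lassonet with X of shape (n, d) and Y of shape (n, 1) returns the final model and the (λ, k) pairs traced along the path; warm-starting each λ from the previous solution is what keeps the dense-to-sparse sweep cheap relative to retraining from scratch at every penalty level.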