Author manuscript; available in PMC: 2022 Sep 8.
Published in final edited form as: Proc Mach Learn Res. 2021 Apr;130:10–18.

Algorithm 2.

Hierarchical Proximal Operator

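1: procedure Hier-Prox($\theta$, $W^{(1)}$; $\lambda$, $M$)
2: for $j \in \{1, \ldots, d\}$ do
3:   Sort the entries of $W_j^{(1)}$ into $|W_{(j,1)}^{(1)}| \ge \cdots \ge |W_{(j,K)}^{(1)}|$
4:   for $m \in \{0, \ldots, K\}$ do
5:    Compute $w_m \equiv \frac{M}{1 + mM^2} \cdot S_\lambda\big(|\theta_j| + M \cdot \sum_{i=1}^{m} |W_{(j,i)}^{(1)}|\big)$
6:    Find the first $m$ such that $|W_{(j,m+1)}^{(1)}| \le w_m \le |W_{(j,m)}^{(1)}|$
7:   end for
8:   $\tilde{\theta}_j \leftarrow \frac{1}{M} \cdot \operatorname{sign}(\theta_j) \cdot w_m$
9:   $\tilde{W}_j^{(1)} \leftarrow \operatorname{sign}(W_j^{(1)}) \cdot \min\big(w_m, |W_j^{(1)}|\big)$
10: end for
11: return $(\tilde{\theta}, \tilde{W}^{(1)})$
12: end procedure
13: Notation: d denotes the number of features; K denotes the size of the first hidden layer.
14: Conventions: Ln. 6, $W_{(j,K+1)}^{(1)} = 0$, $W_{(j,0)}^{(1)} = +\infty$; Ln. 9, minimum is applied coordinate-wise.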
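
The steps of Algorithm 2 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that S_λ is the usual soft-thresholding operator, S_λ(x) = sign(x)·max(|x| − λ, 0), that M > 0, and that W is stored as a list of d rows, one per feature, each holding the K first-layer weights of that feature. The names `sign`, `soft_threshold`, `hier_prox_feature`, and `hier_prox` are hypothetical helpers introduced only for this example.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def soft_threshold(lam, x):
    """Assumed definition: S_lam(x) = sign(x) * max(|x| - lam, 0)."""
    return sign(x) * max(abs(x) - lam, 0.0)


def hier_prox_feature(theta_j, w_j, lam, M):
    """Apply Ln. 3-9 of Hier-Prox to a single feature j.

    theta_j: skip-connection weight of feature j (float).
    w_j:     the K first-layer weights attached to feature j (list of floats).
    """
    # Ln. 3: sort the absolute values in decreasing order.
    abs_sorted = sorted((abs(w) for w in w_j), reverse=True)
    K = len(abs_sorted)
    # Conventions: |W_(j,0)| = +inf and |W_(j,K+1)| = 0.
    bounds = [math.inf] + abs_sorted + [0.0]

    partial_sum = 0.0  # sum of the m largest absolute values
    for m in range(K + 1):
        if m > 0:
            partial_sum += abs_sorted[m - 1]
        # Ln. 5: candidate threshold w_m.
        w_m = M / (1.0 + m * M * M) * soft_threshold(
            lam, abs(theta_j) + M * partial_sum
        )
        # Ln. 6: stop at the first m whose candidate lies in its interval.
        if bounds[m + 1] <= w_m <= bounds[m]:
            break
    else:
        # Guard added for this sketch only (floating-point edge cases).
        raise ArithmeticError("no valid m found")

    theta_new = sign(theta_j) * w_m / M                    # Ln. 8
    w_new = [sign(w) * min(w_m, abs(w)) for w in w_j]      # Ln. 9
    return theta_new, w_new


def hier_prox(theta, W, lam, M):
    """Run the per-feature update for every feature (Ln. 2-11)."""
    results = [hier_prox_feature(t, row, lam, M) for t, row in zip(theta, W)]
    return [r[0] for r in results], [r[1] for r in results]


if __name__ == "__main__":
    theta_new, W_new = hier_prox(
        [1.0, 0.5], [[0.5, -2.0], [1.0, 2.0]], lam=0.1, M=1.0
    )
    print(theta_new, W_new)
```

In this sketch every returned first-layer weight is clipped to magnitude at most w_m, and the returned θ̃_j has magnitude w_m / M, so each output satisfies |W̃_{j,k}| ≤ M·|θ̃_j|. A sufficiently large λ drives both θ̃_j and all of W̃_j to zero.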