$\alpha$ | Learning rate |
$\beta_1, \beta_2$ | Exponential decay rates of the first-order and second-order moment estimates, respectively |
$T, t$ | Maximum number of iterations and current time step, respectively |
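$\beta_1^t, \beta_2^t$ | Exponential decay rates of the first-order and second-order moment estimates raised to the power $t$ (the product of the decay rate over $t$ time steps), respectively; they satisfy $(1-\beta_1)\sum_{i=1}^{t}\beta_1^{t-i} = 1-\beta_1^t$ and $(1-\beta_2)\sum_{i=1}^{t}\beta_2^{t-i} = 1-\beta_2^t$ |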
$m_t$ | First-order moment vector at time step $t$ |
$v_t$ | Second-order moment vector at time step $t$ |
$g_t$ | Gradient at time step $t$ |
$\beta_{1,t}$ | Adaptive coefficient |
$u_t$ | Prediction gradient |
$D_t$ | Random diagonal matrix at time step $t$ |
$d_i^t$ | The $i$-th diagonal element of $D_t$; the elements are independent and identically distributed Bernoulli random variables |
$\theta_t$ | Parameter to be optimized |
$f_t$ | Sequence of smooth convex loss functions |
$P^*$ | Global optimal position |
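The geometric-sum identity listed for $\beta_1^t$ and $\beta_2^t$ can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the moving-average update $m_t = \beta m_{t-1} + (1-\beta) g_t$ with $m_0 = 0$ is the standard Adam form, assumed here rather than taken from the table. Under that assumption, a constant gradient shows why dividing by $1-\beta^t$ removes the initialization bias.

```python
import math


def geometric_weight_sum(beta: float, t: int) -> float:
    """Left-hand side of the identity: (1 - beta) * sum_{i=1}^{t} beta^(t - i)."""
    return (1.0 - beta) * sum(beta ** (t - i) for i in range(1, t + 1))


def closed_form(beta: float, t: int) -> float:
    """Right-hand side of the identity: 1 - beta^t."""
    return 1.0 - beta ** t


def ema_of_constant(beta: float, g: float, t: int) -> float:
    """Run the assumed update m = beta * m + (1 - beta) * g for t steps from m = 0."""
    m = 0.0
    for _ in range(t):
        m = beta * m + (1.0 - beta) * g
    return m


if __name__ == "__main__":
    beta, g, t = 0.9, 2.5, 10  # illustrative values, not from the source
    print(math.isclose(geometric_weight_sum(beta, t), closed_form(beta, t)))
    # With a constant gradient, the bias-corrected average recovers g.
    print(ema_of_constant(beta, g, t) / closed_form(beta, t))
```

With a constant gradient, the uncorrected average equals $g(1-\beta^t)$, so the division by $1-\beta^t$ returns $g$ up to floating-point error.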