Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: Transact Mach Learn Res. 2023 Jun;2023:https://openreview.net/forum?id=K0CAGgjYS1.

Table 1:

Effects of per-sample gradient clipping on gradient flow. Here “Yes/No” indicates whether a property is guaranteed, and the loss refers to the training-set loss. “Loss convergence” is conditioned on H(t) ≻ 0.

Clipping type | NTK matrix | Symmetric NTK | Positive in quadratic form | Positive in eigenvalues | Loss convergence | Monotone loss decay | To zero loss
No clipping | H = Σ_r H_r | | | | | |
Batch clipping | cH = c Σ_r H_r | | | | | |
Large R clipping (Flat & layerwise) | H = Σ_r H_r | | | | | |
Small R clipping (Flat) | HC | | | | | |
Small R clipping (Layerwise) | Σ_r H_r C_r | | | | | |
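
The distinction the table draws between flat and layerwise clipping, and between small and large R, can be illustrated with a minimal sketch. It assumes the common clipping factor min(1, R / ||g||), which this table does not itself specify, and the function names and toy gradients are hypothetical. Flat clipping computes one factor per sample from the norm over all layers, whereas layerwise clipping computes one factor per sample and per layer. With a large enough R every factor equals 1, which is consistent with the "Large R clipping" row sharing its NTK matrix with the "No clipping" row.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_factor(norm, R):
    """Assumed clipping factor min(1, R / norm); a zero gradient is left unscaled."""
    if norm == 0.0:
        return 1.0
    return min(1.0, R / norm)


def flat_clip_factors(per_sample_grads, R):
    """One factor per sample, computed from the norm over all layers together."""
    factors = []
    for layers in per_sample_grads:
        flat = [x for layer in layers for x in layer]
        factors.append(clip_factor(l2_norm(flat), R))
    return factors


def layerwise_clip_factors(per_sample_grads, R_per_layer):
    """One factor per sample and per layer, computed from each layer's own norm."""
    return [
        [clip_factor(l2_norm(layer), R_r) for layer, R_r in zip(layers, R_per_layer)]
        for layers in per_sample_grads
    ]


# Toy per-sample gradients: two samples, two layers each (hypothetical numbers).
grads = [
    [[3.0, 4.0], [0.0, 12.0]],  # layer norms 5 and 12, flat norm 13
    [[0.3, 0.4], [0.0, 1.2]],   # layer norms 0.5 and 1.2, flat norm 1.3
]

# Small R: factors differ across samples (flat) and across layers (layerwise).
flat_small = flat_clip_factors(grads, R=1.0)
layer_small = layerwise_clip_factors(grads, [1.0, 1.0])

# Large R: no gradient exceeds the threshold, so every factor is 1.
flat_large = flat_clip_factors(grads, R=100.0)
layer_large = layerwise_clip_factors(grads, [100.0, 100.0])

print("flat, small R:     ", flat_small)
print("layerwise, small R:", layer_small)
print("flat, large R:     ", flat_large)
print("layerwise, large R:", layer_large)
```

In this sketch the per-sample factors play the role of the diagonal entries of C (flat) or C_r (layerwise) in the table's NTK column; that reading is an interpretation of the table's notation, not something the table states.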