Table 1:
Effects of per-sample gradient clipping on gradient flow. Here ✓/✗ (“Yes/No”) indicates whether a property is guaranteed, and the loss refers to the training set. “Loss convergence” is conditioned on .
| Clipping type | NTK matrix | Symmetric NTK | Positive in quadratic form | Positive in eigenvalues | Loss convergence | Monotone loss decay | To zero loss |
|---|---|---|---|---|---|---|---|
| No clipping | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Batch clipping | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Large R clipping (Flat & layerwise) | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Small R clipping (Flat) | | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ |
| Small R clipping (Layerwise) | | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |