Skip to main content
. Author manuscript; available in PMC: 2023 Aug 29.
Published in final edited form as: Proc Mach Learn Res. 2022 Jul;162:26559–26574.

Table 2:

Set norm can improve performance of 50-layer Deep Sets, while layer norm does not (Point Cloud, MNIST Var). In some cases, normalization alone is not enough to overcome vanishing gradients (Hematocrit, Normal Var). Table reports test loss (CE for Point Cloud, MSE otherwise). Lower is better. Uunderlined results are notable failures.

no norm layer norm set norm
Hematocrit 25.879 ± 0.001 25.875 ± 0.002 25.875 ± 0.002
Point Cloud 3.609 ± 0.000 3.619 ± 0.000 1.542 ± 0.086
MNIST Var 5.555 ± 0.001 5.565 ± 0.001 0.259 ± 0.003
Normal Var 8.4501 ± 0.0031 8.4498 ± 0.0054 8.4433 ± 0.0011