Author manuscript; available in PMC: 2023 Aug 29.

Published in final edited form as: Proc Mach Learn Res. 2022 Jul;162:26559–26574.

Table 2:

Set norm can improve the performance of 50-layer Deep Sets, while layer norm does not (Point Cloud, MNIST Var). In some cases, normalization alone is not enough to overcome vanishing gradients (Hematocrit, Normal Var). The table reports test loss (CE for Point Cloud, MSE otherwise). Lower is better. Underlined results are notable failures.

Dataset	no norm	layer norm	set norm
Hematocrit	25.879 ± 0.001	25.875 ± 0.002	25.875 ± 0.002
Point Cloud	3.609 ± 0.000	3.619 ± 0.000	1.542 ± 0.086
MNIST Var	5.555 ± 0.001	5.565 ± 0.001	0.259 ± 0.003
Normal Var	8.4501 ± 0.0031	8.4498 ± 0.0054	8.4433 ± 0.0011
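
To make the size of these differences easier to read, the mean losses in Table 2 can be expressed as a percent change relative to the no-norm baseline. The sketch below is illustrative only: the numbers are copied from the table with standard deviations omitted, and the names `table`, `relative_change`, and `changes` are introduced here rather than taken from the paper.

```python
# Mean test losses copied from Table 2 (standard deviations omitted).
table = {
    "Hematocrit":  {"no norm": 25.879, "layer norm": 25.875, "set norm": 25.875},
    "Point Cloud": {"no norm": 3.609,  "layer norm": 3.619,  "set norm": 1.542},
    "MNIST Var":   {"no norm": 5.555,  "layer norm": 5.565,  "set norm": 0.259},
    "Normal Var":  {"no norm": 8.4501, "layer norm": 8.4498, "set norm": 8.4433},
}


def relative_change(baseline, value):
    """Percent change in loss relative to the baseline (negative means lower loss)."""
    return 100.0 * (value - baseline) / baseline


# Percent change of each normalization scheme against the no-norm column.
changes = {
    task: {
        norm: relative_change(row["no norm"], row[norm])
        for norm in ("layer norm", "set norm")
    }
    for task, row in table.items()
}

for task, row in changes.items():
    print(
        f"{task:12s} layer norm {row['layer norm']:+7.2f}%   "
        f"set norm {row['set norm']:+7.2f}%"
    )
```

Read this way, set norm lowers the mean loss by roughly 57% on Point Cloud and roughly 95% on MNIST Var, while on Hematocrit and Normal Var every column stays within 0.1% of the baseline, which matches the caption's point that normalization alone does not help there.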