Author manuscript; available in PMC: 2023 Aug 29.

Published in final edited form as: Proc Mach Learn Res. 2022 Jul;162:26559–26574.

Table 1:

Set Transformer can perform worse (underlined) with layer norm than with no normalization, particularly when inputs are real-valued. Results are test loss over three seeds: cross-entropy (CE) for Point Cloud, mean squared error (MSE) for the rest. Lower is better.

Task	No norm	Layer norm
Hematocrit	18.7436 ± 0.0148	19.0904 ± 0.1003
Point Cloud	0.9217 ± 0.0119	0.9219 ± 0.0052
Normal Var	0.0023 ± 0.0006	0.0801 ± 0.0076
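
To make the size of each gap easier to read, the following minimal sketch recomputes the absolute and relative difference between the two columns from the values in Table 1. It also applies a crude check of whether the gap exceeds the sum of the two "±" values. That check is an assumption made here for illustration, not a test used by the authors, and the caption does not say what the "±" denotes (standard deviation, standard error, or otherwise). The names `TABLE_1`, `compare`, and `summary` are hypothetical.

```python
# Values copied from Table 1: (mean, spread) of test loss over three seeds.
# The caption does not state what the "±" spread measures.
TABLE_1 = {
    "Hematocrit": {"no_norm": (18.7436, 0.0148), "layer_norm": (19.0904, 0.1003)},
    "Point Cloud": {"no_norm": (0.9217, 0.0119), "layer_norm": (0.9219, 0.0052)},
    "Normal Var": {"no_norm": (0.0023, 0.0006), "layer_norm": (0.0801, 0.0076)},
}


def compare(row):
    """Return (absolute gap, relative gap, gap exceeds combined spread).

    A positive gap means layer norm has the higher (worse) loss.
    The third value is a rough heuristic, not a significance test.
    """
    (base, base_pm), (ln, ln_pm) = row["no_norm"], row["layer_norm"]
    gap = ln - base
    return gap, gap / base, gap > base_pm + ln_pm


summary = {task: compare(row) for task, row in TABLE_1.items()}

for task, (gap, rel, clear) in summary.items():
    print(f"{task:12s} gap={gap:+.4f} rel={rel:+.2%} exceeds_spread={clear}")
```

Under this heuristic, the Hematocrit and Normal Var gaps are larger than the combined spread, while the Point Cloud gap is not.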