Author manuscript; available in PMC: 2023 Aug 29.
Published in final edited form as: Proc Mach Learn Res. 2022 Jul;162:26559–26574.

Table 1:

Set Transformer can perform worse (underlined) with layer norm than with no normalization, particularly when inputs are real-valued. Results are test loss over three seeds (cross-entropy (CE) for Point Cloud, mean squared error (MSE) for the rest). Lower is better.

Task          No norm             Layer norm
Hematocrit    18.7436 ± 0.0148    19.0904 ± 0.1003
Point Cloud   0.9217 ± 0.0119     0.9219 ± 0.0052
Normal Var    0.0023 ± 0.0006     0.0801 ± 0.0076
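
The caption does not say why layer norm can hurt on real-valued inputs. One property of layer norm that may be relevant is that it standardizes each set element over its own feature axis, so per-element shift and scale are largely discarded. The sketch below illustrates only that property with NumPy. The `layer_norm` helper, the set shape, and the scale factors are hypothetical choices for illustration, not the paper's model or data, and learned affine parameters are omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Standardize each element over its feature axis (no learned scale or shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Hypothetical set: 5 elements with 4 real-valued features each.
narrow = rng.normal(size=(5, 4))
# The same set after an affine change of scale (assumed factors 10 and 3).
wide = 10.0 * narrow + 3.0

# The raw sets differ in variance by a factor of about 100.
var_ratio = wide.var() / narrow.var()

# After layer norm the two sets are almost indistinguishable.
max_gap = np.abs(layer_norm(narrow) - layer_norm(wide)).max()

print(f"variance ratio before layer norm: {var_ratio:.1f}")
print(f"largest difference after layer norm: {max_gap:.2e}")
```

If a target depends on the scale of the inputs, a model that applies this normalization to every element has less of that information to work with. Whether this accounts for the gaps in Table 1 is not established by the table itself.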