Author manuscript; available in PMC: 2023 Aug 29.
Published in final edited form as: Proc Mach Learn Res. 2022 Jul;162:26559–26574.

Table 3:

Clean path residual connections outperform non-clean path residual connections in both Deep Sets and Set Transformer. Clean path residuals with set norm perform best overall. Results are test losses for deep architectures (50-layer Deep Sets, 16-layer Set Transformer); lower is better.

| Architecture | Residual type | Norm | Hematocrit (MSE) | Point Cloud (CE) | MNIST Var (MSE) | Normal Var (MSE) |
|---|---|---|---|---|---|---|
| Deep Sets | non-clean path | layer norm | 19.6649 ± 0.0394 | 0.5974 ± 0.0022 | 0.3528 ± 0.0063 | 1.4658 ± 0.7259 |
| Deep Sets | non-clean path | feature norm | 19.9801 ± 0.0862 | 0.6541 ± 0.0022 | 0.3371 ± 0.0059 | 0.8352 ± 0.3886 |
| Deep Sets | non-clean path | set norm | 19.3146 ± 0.0409 | 0.6055 ± 0.0007 | 0.3421 ± 0.0022 | 0.2094 ± 0.1115 |
| Deep Sets | clean path | layer norm | 19.4192 ± 0.0173 | 0.63682 ± 0.0067 | 0.3997 ± 0.0302 | 0.0384 ± 0.0105 |
| Deep Sets | clean path | feature norm | 19.3917 ± 0.0685 | 0.7148 ± 0.0164 | 0.3368 ± 0.0049 | 0.1195 ± 0.0000 |
| Deep Sets | clean path | set norm | 19.2118 ± 0.0762 | 0.7096 ± 0.0049 | 0.3441 ± 0.0036 | 0.0198 ± 0.0041 |
| Set Transformer | non-clean path | layer norm | 19.1975 ± 0.1395 | 0.9219 ± 0.0052 | 2.0663 ± 1.0039 | 0.0801 ± 0.0076 |
| Set Transformer | non-clean path | feature norm | 19.4968 ± 0.1442 | 0.8251 ± 0.0025 | 0.4043 ± 0.0078 | 0.0691 ± 0.0146 |
| Set Transformer | non-clean path | set norm | 19.0521 ± 0.0288 | 1.9167 ± 0.4880 | 0.4064 ± 0.0147 | 0.0249 ± 0.0112 |
| Set Transformer | clean path | layer norm | 18.5747 ± 0.0263 | 0.6656 ± 0.0148 | 0.6383 ± 0.0020 | 0.0104 ± 0.0000 |
| Set Transformer | clean path | feature norm | 19.1967 ± 0.0330 | 0.6188 ± 0.0141 | 0.7946 ± 0.0065 | 0.0074 ± 0.0010 |
| Set Transformer | clean path | set norm | 18.7008 ± 0.0183 | 0.6280 ± 0.0098 | 0.8023 ± 0.0038 | 0.0030 ± 0.0000 |
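
To make the distinction in Table 3 concrete, the following is a minimal NumPy sketch of the two residual arrangements, not the authors' implementation. It rests on several assumptions that are not stated on this page: that set norm standardizes each set with a single mean and variance taken over all elements and features (learnable scale and shift are omitted), that a clean path block has the form `x + f(norm(x))`, and that `norm(x + f(x))` is one representative non-clean arrangement. The names `set_norm`, `clean_path_block`, and `non_clean_path_block` are hypothetical.

```python
import numpy as np


def set_norm(x, eps=1e-5):
    """Assumed form of set norm for one set x of shape (set_size, features).

    A single mean and variance are computed over every element and feature
    of the set. Learnable per-feature scale and shift are left out.
    """
    mean = x.mean()
    var = x.var()
    return (x - mean) / np.sqrt(var + eps)


def clean_path_block(x, f, norm=set_norm):
    """Clean path: the skip connection carries x through unchanged.

    Normalization is applied only inside the residual branch.
    """
    return x + f(norm(x))


def non_clean_path_block(x, f, norm=set_norm):
    """One possible non-clean path: normalization follows the addition,

    so it also transforms the signal carried by the skip connection.
    """
    return norm(x + f(x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(5, 4))  # one set: 5 elements, 4 features

    def zero_branch(h):
        # Stand-in for a residual branch whose output is zero.
        return np.zeros_like(h)

    # With a zero branch, the clean path block is exactly the identity.
    print(np.array_equal(clean_path_block(x, zero_branch), x))
    # The non-clean block still rescales and recenters its input.
    print(np.allclose(non_clean_path_block(x, zero_branch), x))
```

Under these assumptions, stacking clean path blocks leaves an unmodified route from input to output, whereas each non-clean block alters the signal even when its branch contributes nothing.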