Author manuscript; available in PMC: 2024 Mar 8.
Published in final edited form as: Proc Mach Learn Res. 2023 Aug;219:285–307.

Table C.4: Hyperparameter settings for Set Transformer across different data splits.

Set Transformer
Hyperparameter           split1    split2    split3
Learning rate            0.0010    0.0008    0.0008
Weight decay             0.00003   0.0001    0.00001
Learning rate schedule   cosine    cosine    cosine
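
For readers who want to reuse these settings, the following is a minimal sketch of how the values in Table C.4 might be stored per split and how a cosine learning rate schedule could be evaluated from them. The dictionary layout, the function name `cosine_lr`, the `total_steps` argument, and the floor `min_lr=0.0` are assumptions made for illustration; the table does not state the optimizer, the training length, a warmup phase, or a minimum learning rate.

```python
import math

# Values copied from Table C.4. The dictionary structure and key names are
# illustrative assumptions, not the authors' configuration format.
SET_TRANSFORMER_HPARAMS = {
    "split1": {"learning_rate": 0.0010, "weight_decay": 0.00003, "lr_schedule": "cosine"},
    "split2": {"learning_rate": 0.0008, "weight_decay": 0.0001, "lr_schedule": "cosine"},
    "split3": {"learning_rate": 0.0008, "weight_decay": 0.00001, "lr_schedule": "cosine"},
}


def cosine_lr(base_lr, step, total_steps, min_lr=0.0):
    """Standard cosine annealing from base_lr down to min_lr.

    Assumes no warmup and no restarts; total_steps and min_lr are not
    given in the table and must be chosen by the user.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so the schedule stays at min_lr after training ends.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical training length
    for split, hp in SET_TRANSFORMER_HPARAMS.items():
        lrs = [cosine_lr(hp["learning_rate"], s, total_steps) for s in (0, 500, 1000)]
        print(split, hp["weight_decay"], lrs)
```

Under these assumptions the schedule starts at the tabulated learning rate, passes through half of it at the midpoint of training, and decays to zero at the final step. In a deep learning framework the same shape is usually obtained from the framework's built-in cosine annealing scheduler, with the weight decay passed to the optimizer.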