Author manuscript; available in PMC: 2019 Dec 23.
Published in final edited form as: Adv Neural Inf Process Syst. 2019 Dec;32:9392–9402.

Figure 4: Scaling with hidden feature representation dimensions.

We plot model quality versus hidden dimension size. At a fixed hidden dimension size, the slice-aware model (OURS) improves over hard parameter sharing (HPS) on both slices while remaining close to mixture of experts (MoE).

Note: MoE has substantially more parameters overall because it copies the entire model.