Table 2.
Detailed settings of WheatFormer variants.
| Models | C1, C2, C3, C4 | N1, N2, N3, N4 | #Head | #Expansion | #Params (MB) |
|---|---|---|---|---|---|
| WheatFormer-S | [96, 192, 384, 768] | [2, 2, 2, 2] | 32 | α=4 | 42.4 |
| WheatFormer-B | [96, 192, 384, 768] | [2, 2, 6, 2] | 32 | α=4 | 60.1 |
| WheatFormer-L | [96, 192, 384, 768] | [2, 2, 18, 2] | 32 | α=4 | 100.6 |
Ci, number of channels in the hidden layers of stage i; Ni, number of layers in stage i; #Head, query dimension of each head; #Expansion, expansion ratio (α) of the expansion layer in each multilayer perceptron; #Params, number of model parameters.
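
To make the settings in Table 2 easier to reuse, the sketch below encodes the three variants as plain Python configuration objects. It is only an illustration: the class name `WheatFormerConfig`, the dictionary `CONFIGS`, and both helper methods are hypothetical and do not come from the authors' code. In particular, deriving the number of heads per stage as channels divided by the per-head query dimension, and the MLP hidden width as channels times α, are assumptions based on common hierarchical-transformer practice, not something the table states.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WheatFormerConfig:
    """Hypothetical container for one row of Table 2 (not the authors' code)."""

    channels: Tuple[int, int, int, int]  # C1..C4: hidden channels per stage
    depths: Tuple[int, int, int, int]    # N1..N4: number of layers per stage
    head_dim: int = 32                   # #Head: query dimension of each head
    mlp_expansion: int = 4               # #Expansion: alpha

    def heads_per_stage(self) -> Tuple[int, ...]:
        # Assumption: heads = channels // head_dim (not stated in the table).
        return tuple(c // self.head_dim for c in self.channels)

    def mlp_hidden_dims(self) -> Tuple[int, ...]:
        # Assumption: MLP hidden width = channels * alpha.
        return tuple(c * self.mlp_expansion for c in self.channels)


# Values copied from Table 2; only the depth of stage 3 differs between variants.
CONFIGS: Dict[str, WheatFormerConfig] = {
    "WheatFormer-S": WheatFormerConfig((96, 192, 384, 768), (2, 2, 2, 2)),
    "WheatFormer-B": WheatFormerConfig((96, 192, 384, 768), (2, 2, 6, 2)),
    "WheatFormer-L": WheatFormerConfig((96, 192, 384, 768), (2, 2, 18, 2)),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, "total layers:", sum(cfg.depths),
              "heads per stage:", cfg.heads_per_stage())
```

Under these assumptions, all three variants share the same per-stage widths and differ only in the depth of the third stage, which is where the growth in parameter count reported in the table would come from.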