Table 3. Implementation details of the ViT model.
| Layer No. | Layer Name | Input Shape | Connected to | Output Shape | Parameters |
|---|---|---|---|---|---|
| 1 | Input Layer | | – | | |
| 2 | Patches | | Input Layer | | |
| 3 | Patch Encoder | | Patches | | |
| 4 | Layer Normalization #1 | | Patch Encoder | | |
| 5 | Multi-Head Attention #1 | | Layer Normalization #1 | | |
| 6 | Add #1 | | Multi-Head Attention #1, Patch Encoder | | 0 |
| 7 | Layer Normalization #2 | | Add #1 | | |
| 8 | **MLP Module #1: Dense #1** | | Layer Normalization #2 | | |
| | **MLP Module #1: Dense #2** | | Dense #1 | | |
| 9 | Add #2 | | Dense #2, Add #1 | | 0 |
| 10 | Add #3 | | Add #2, Patch Encoder | | 0 |
| 11 | Layer Normalization #3 | | Add #3 | | |
| 12 | Multi-Head Attention #2 | | Layer Normalization #3 | | |
| 13 | Add #4 | | Multi-Head Attention #2, Add #3 | | 0 |
| 14 | Layer Normalization #4 | | Add #4 | | |
| 15 | **MLP Module #2: Dense #3** | | Layer Normalization #4 | | |
| | **MLP Module #2: Dense #4** | | Dense #3 | | |
| 16 | Add #5 | | Dense #4, Add #4 | | 0 |
| 17 | GlobalAveragePooling1D #1 | | Add #5 | | 0 |
| 18 | Dense #5 | | GlobalAveragePooling1D #1 | | |
| 19 | Dense #6 (Output Layer) | | Dense #1 | 4 (for the Maize dataset), 38 (for the PlantVillage dataset) | |
| Total Weight Parameters | | | | | 7117382 (for the Maize dataset), 7119590 (for the PlantVillage dataset) |
Bold font marks the layers of the MLP module in this table. These layers are the only difference between the existing ViT model and our proposed TrIncNet model.
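The wiring in the "Connected to" column can be easier to follow when written out as a small data structure. The following is a minimal sketch in plain Python, not the authors' implementation: the dictionary `VIT_WIRING` and the helper functions `is_topologically_ordered` and `merge_layers` are hypothetical names introduced here for illustration, and the connections are transcribed from the table exactly as listed. Shapes and parameter counts are left out because they are not recoverable from the table.

```python
# Layer wiring transcribed from the "Connected to" column of Table 3.
# Keys are layer names; values list the layers whose outputs feed them.
# The two Dense layers of each MLP module are listed as separate entries.
VIT_WIRING = {
    "Input Layer": [],
    "Patches": ["Input Layer"],
    "Patch Encoder": ["Patches"],
    "Layer Normalization #1": ["Patch Encoder"],
    "Multi-Head Attention #1": ["Layer Normalization #1"],
    "Add #1": ["Multi-Head Attention #1", "Patch Encoder"],
    "Layer Normalization #2": ["Add #1"],
    "Dense #1": ["Layer Normalization #2"],   # MLP Module #1
    "Dense #2": ["Dense #1"],                 # MLP Module #1
    "Add #2": ["Dense #2", "Add #1"],
    "Add #3": ["Add #2", "Patch Encoder"],
    "Layer Normalization #3": ["Add #3"],
    "Multi-Head Attention #2": ["Layer Normalization #3"],
    "Add #4": ["Multi-Head Attention #2", "Add #3"],
    "Layer Normalization #4": ["Add #4"],
    "Dense #3": ["Layer Normalization #4"],   # MLP Module #2
    "Dense #4": ["Dense #3"],                 # MLP Module #2
    "Add #5": ["Dense #4", "Add #4"],
    "GlobalAveragePooling1D #1": ["Add #5"],
    "Dense #5": ["GlobalAveragePooling1D #1"],
    "Dense #6 (Output Layer)": ["Dense #1"],  # as listed in the table
}


def is_topologically_ordered(wiring):
    """Return True if every layer only consumes layers defined before it."""
    seen = set()
    for name, inputs in wiring.items():
        if any(src not in seen for src in inputs):
            return False
        seen.add(name)
    return True


def merge_layers(wiring):
    """Return the layers that combine two inputs (the Add layers)."""
    return [name for name, inputs in wiring.items() if len(inputs) == 2]


if __name__ == "__main__":
    print("rows are in a valid build order:", is_topologically_ordered(VIT_WIRING))
    for name in merge_layers(VIT_WIRING):
        print(name, "<-", " + ".join(VIT_WIRING[name]))
```

Running the script prints each Add layer next to its two inputs, which shows at a glance where the skip connections around the attention and MLP blocks sit. Under this representation, moving from the ViT baseline to a different MLP module would only touch the `Dense #1` to `Dense #4` entries, in line with the note under the table.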