2023 Jul 27;14:1221557. doi: 10.3389/fpls.2023.1221557

Table 3.

ViT model’s implementation details.

| Layer No. | Layer Name | Input Shape | Connected to | Output Shape | Parameters |
|---|---|---|---|---|---|
| 1 | Input Layer | 256×256×3 | | 256×256×3 | 0 |
| 2 | Patches | 256×256×3 | Input Layer | 256×768 | 0 |
| 3 | Patch Encoder | 256×768 | Patches | 256×256 | 262400 |
| 4 | Layer Normalization #1 | 256×256 | Patch Encoder | 256×256 | 512 |
| 5 | Multi-Head Attention #1 | 256×256 | Layer Normalization #1 | 256×256 | 3155200 |
| 6 | Add #1 | 256×256, 256×256 | Multi-Head Attention #1, Patch Encoder | 256×256 | 0 |
| 7 | Layer Normalization #2 | 256×256 | Add #1 | 256×256 | 512 |
| 8 | MLP Module #1: Dense #1 | 256×256 | Layer Normalization #2 | 256×512 | 131584 |
| | MLP Module #1: Dense #2 | 256×512 | Dense #1 | 256×256 | 131328 |
| 9 | Add #2 | 256×256, 256×256 | Dense #2, Add #1 | 256×256 | 0 |
| 10 | Add #3 | 256×256, 256×256 | Add #2, Patch Encoder | 256×256 | 0 |
| 11 | Layer Normalization #3 | 256×256 | Add #3 | 256×256 | 512 |
| 12 | Multi-Head Attention #2 | 256×256 | Layer Normalization #3 | 256×256 | 3155200 |
| 13 | Add #4 | 256×256, 256×256 | Multi-Head Attention #2, Add #3 | 256×256 | 0 |
| 14 | Layer Normalization #4 | 256×256 | Add #4 | 256×256 | 512 |
| 15 | MLP Module #2: Dense #3 | 256×256 | Layer Normalization #4 | 256×512 | 131584 |
| | MLP Module #2: Dense #4 | 256×512 | Dense #3 | 256×256 | 131328 |
| 16 | Add #5 | 256×256, 256×256 | Dense #4, Add #4 | 256×256 | 0 |
| 17 | GlobalAveragePooling1D #1 | 256×256 | Add #5 | 256 | 0 |
| 18 | Dense #5 | 256 | GlobalAveragePooling1D #1 | 64 | 16448 |
| 19 | Dense #6 (Output Layer) | 64 | Dense #5 | 4 (for the Maize dataset), 38 (for the PlantVillage dataset) | 262 (for the Maize dataset), 2470 (for the PlantVillage dataset) |
| Total Weight Parameters | | | | | 7117382 (for the Maize dataset), 7119590 (for the PlantVillage dataset) |

The layers of the MLP module (layer nos. 8 and 15), set in bold in the original table, are the only difference between the existing ViT model and our proposed TrIncNet model.
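The parameter counts in Table 3 can be cross-checked from the listed shapes. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes a patch size of 16×16 (which yields 256 patches of 16·16·3 = 768 values), a learned position embedding in the patch encoder, and an attention inner width of 3072 (for example 12 heads with a key dimension of 256). None of these three values is stated in the table; they are inferred because they reproduce the listed counts. The output-layer counts are taken from the table as given: 2470 equals 64·38 + 38, whereas a plain 64→4 dense layer with bias would give 260 rather than the listed 262.

```python
# Values read from Table 3
IMAGE_SIZE, CHANNELS = 256, 3
EMBED_DIM = 256            # width of the patch encoder output
MLP_HIDDEN = 512           # output width of the first dense layer in each MLP module
HEAD_HIDDEN = 64           # output width of Dense #5

# Assumptions (not stated in the table, chosen to match its counts)
PATCH_SIZE = 16            # assumed patch side length
ATTN_INNER = 3072          # assumed num_heads * key_dim, e.g. 12 * 256


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm_params(width):
    """One scale and one offset per feature."""
    return 2 * width


def attention_params(width, inner):
    """Query, key and value projections plus the output projection, all with bias."""
    return 3 * dense_params(width, inner) + dense_params(inner, width)


def patch_encoder_params(num_patches, patch_dim, width):
    """Linear projection of each patch plus a learned position embedding."""
    return dense_params(patch_dim, width) + num_patches * width


num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2
patch_dim = PATCH_SIZE * PATCH_SIZE * CHANNELS

# One transformer block: LayerNorm, attention, LayerNorm, two dense layers
block = (
    layer_norm_params(EMBED_DIM)
    + attention_params(EMBED_DIM, ATTN_INNER)
    + layer_norm_params(EMBED_DIM)
    + dense_params(EMBED_DIM, MLP_HIDDEN)
    + dense_params(MLP_HIDDEN, EMBED_DIM)
)

# Everything except the output layer (two blocks, as in the table)
backbone = (
    patch_encoder_params(num_patches, patch_dim, EMBED_DIM)
    + 2 * block
    + dense_params(EMBED_DIM, HEAD_HIDDEN)
)

# Output-layer counts copied from the table rather than recomputed
OUTPUT_PARAMS = {"Maize": 262, "PlantVillage": 2470}
totals = {name: backbone + count for name, count in OUTPUT_PARAMS.items()}

for name, total in totals.items():
    print(f"{name}: {total}")
```

Under these assumptions the per-layer values (262400, 512, 3155200, 131584, 131328, 16448) and both totals in the table are reproduced.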