Front Plant Sci. 2023 Jul 27;14:1221557. doi: 10.3389/fpls.2023.1221557

Table 4. TrIncNet model's implementation details.

| Layer No. | Layer Name | Input Shape | Connected to | Output Shape | Parameters |
|---|---|---|---|---|---|
| 1 | Input Layer | 256×256×3 | — | 256×256×3 | 0 |
| 2 | Patches | 256×256×3 | Input Layer | 256×768 | 0 |
| 3 | Patch Encoder | 256×768 | Patches | 256×256 | 262400 |
| 4 | Layer Normalization #1 | 256×256 | Patch Encoder | 256×256 | 512 |
| 5 | Multi-Head Attention #1 | 256×256 | Layer Normalization #1 | 256×256 | 3155200 |
| 6 | Add #1 | 256×256, 256×256 | Multi-Head Attention #1, Patch Encoder | 256×256 | 0 |
| 7 | Layer Normalization #2 | 256×256 | Add #1 | 256×256 | 512 |
| 8 | Inception Module #1 | | | | |
| | **Reshape #1** | 256×256 | Layer Normalization #2 | 16×16×256 | 0 |
| | **Conv2D #1** | 16×16×256 | Reshape #1 | 16×16×96 | 24672 |
| | **Conv2D #2** | 16×16×256 | Reshape #1 | 16×16×16 | 4112 |
| | **Conv2D #3** | 16×16×256 | Reshape #1 | 16×16×64 | 16448 |
| | **MaxPooling2D #1** | 16×16×256 | Reshape #1 | 16×16×256 | 0 |
| | **Conv2D #4** | 16×16×96 | Conv2D #1 | 16×16×128 | 110720 |
| | **Conv2D #5** | 16×16×16 | Conv2D #2 | 16×16×32 | 12832 |
| | **Conv2D #6** | 16×16×256 | MaxPooling2D #1 | 16×16×32 | 8224 |
| | **Concatenate #1** | 16×16×64, 16×16×128, 16×16×32, 16×16×32 | Conv2D #3, Conv2D #4, Conv2D #5, Conv2D #6 | 16×16×256 | 0 |
| | **Reshape #2** | 16×16×256 | Concatenate #1 | 256×256 | 0 |
| 9 | Add #2 | 256×256, 256×256 | Reshape #2, Add #1 | 256×256 | 0 |
| 10 | Add #3 | 256×256, 256×256 | Add #2, Patch Encoder | 256×256 | 0 |
| 11 | Layer Normalization #3 | 256×256 | Add #3 | 256×256 | 512 |
| 12 | Multi-Head Attention #2 | 256×256 | Layer Normalization #3 | 256×256 | 3155200 |
| 13 | Add #4 | 256×256, 256×256 | Multi-Head Attention #2, Add #3 | 256×256 | 0 |
| 14 | Layer Normalization #4 | 256×256 | Add #4 | 256×256 | 512 |
| 15 | Inception Module #2 | | | | |
| | **Reshape #3** | 256×256 | Layer Normalization #4 | 16×16×256 | 0 |
| | **Conv2D #7** | 16×16×256 | Reshape #3 | 16×16×96 | 24672 |
| | **Conv2D #8** | 16×16×256 | Reshape #3 | 16×16×16 | 4112 |
| | **Conv2D #9** | 16×16×256 | Reshape #3 | 16×16×64 | 16448 |
| | **MaxPooling2D #2** | 16×16×256 | Reshape #3 | 16×16×256 | 0 |
| | **Conv2D #10** | 16×16×96 | Conv2D #7 | 16×16×128 | 110720 |
| | **Conv2D #11** | 16×16×16 | Conv2D #8 | 16×16×32 | 12832 |
| | **Conv2D #12** | 16×16×256 | MaxPooling2D #2 | 16×16×32 | 8224 |
| | **Concatenate #2** | 16×16×64, 16×16×128, 16×16×32, 16×16×32 | Conv2D #9, Conv2D #10, Conv2D #11, Conv2D #12 | 16×16×256 | 0 |
| | **Reshape #4** | 16×16×256 | Concatenate #2 | 256×256 | 0 |
| 16 | Add #5 | 256×256, 256×256 | Reshape #4, Add #4 | 256×256 | 0 |
| 17 | GlobalAveragePooling1D #1 | 256×256 | Add #5 | 256 | 0 |
| 18 | Dense #1 | 256 | GlobalAveragePooling1D #1 | 64 | 16448 |
| 19 | Dense #2 (Output Layer) | 64 | Dense #1 | 4 (Maize), 38 (PlantVillage) | 262 (Maize), 2470 (PlantVillage) |
| | Total Weight Parameters | | | | 6945574 (Maize), 6947782 (PlantVillage) |

Bold font highlights the layers of the Inception modules in this table. These layers are the only difference between the existing ViT model and our proposed TrIncNet model.
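Most of the per-layer parameter counts in the table follow from standard formulas for dense, convolutional, and multi-head-attention layers. The sketch below is a minimal re-derivation under a few assumptions inferred from the counts rather than stated in the table: the Inception branches use 1×1, 3×3 (after 1×1), and 5×5 (after 1×1) kernels as in the classic GoogLeNet block; the Patch Encoder is a 768→256 linear projection plus a 256×256 learned positional embedding; and the attention layers use a combined projection width of 3072 (e.g. 12 heads of dimension 256). All function names here are illustrative, not the authors' code.

```python
# Hypothetical re-derivation of the parameter counts in Table 4.
# Kernel sizes, the Patch Encoder decomposition, and the attention projection
# width are assumptions inferred from the listed counts.

def dense(n_in, n_out):
    """Dense-layer parameters, with bias."""
    return (n_in + 1) * n_out

def conv2d(k, c_in, c_out):
    """Conv2D parameters for a k x k kernel, with bias."""
    return (k * k * c_in + 1) * c_out

def mha(d_model, proj_width):
    """Keras-style MultiHeadAttention: Q, K, V and output projections."""
    qkv = 3 * (d_model * proj_width + proj_width)  # three input projections
    out = proj_width * d_model + d_model           # output projection
    return qkv + out

def inception(d=256):
    """One Inception module on 16x16xd feature maps (layers 8 and 15)."""
    return (conv2d(1, d, 96) + conv2d(1, d, 16) + conv2d(1, d, 64)  # 1x1 stems
            + conv2d(3, 96, 128)    # 3x3 branch
            + conv2d(5, 16, 32)     # 5x5 branch
            + conv2d(1, d, 32))     # pool-projection branch

patch_encoder = dense(768, 256) + 256 * 256  # projection + positional embedding
layer_norm = 2 * 256                         # gamma + beta

backbone = (patch_encoder
            + 4 * layer_norm
            + 2 * mha(256, 3072)
            + 2 * inception()
            + dense(256, 64))                # Dense #1

print(backbone)                    # parameters shared by both heads -> 6945312
print(backbone + dense(64, 38))    # with the 38-class head -> 6947782
```

Summing the shared backbone with the 38-class output head reproduces the table's PlantVillage total of 6947782 exactly, which supports the assumed kernel sizes and projection widths.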