2023 Jul 27;14:1221557. doi: 10.3389/fpls.2023.1221557

Table 3.

ViT model’s implementation details.

Layer No.			Layer Name	Input Shape	Connected to	Output Shape	Parameters
1			Input Layer	$256 \times 256 \times 3$	–	$256 \times 256 \times 3$	$0$
2			Patches	$256 \times 256 \times 3$	Input Layer	$256 \times 768$	$0$
3			Patch Encoder	$256 \times 768$	Patches	$256 \times 256$	$262400$
4			Layer Normalization #1	$256 \times 256$	Patch Encoder	$256 \times 256$	$512$
5			Multi-Head Attention #1	$256 \times 256$	Layer Normalization #1	$256 \times 256$	$3155200$
6			Add #1	$256 \times 256$, $256 \times 256$	Multi-Head Attention #1, Patch Encoder	$256 \times 256$	$0$
7			Layer Normalization #2	$256 \times 256$	Add #1	$256 \times 256$	$512$
8	MLP Module #1		Dense #1	$256 \times 256$	Layer Normalization #2	$256 \times 512$	$131584$
8	MLP Module #1		Dense #2	$256 \times 512$	Dense #1	$256 \times 256$	$131328$
9			Add #2	$256 \times 256$, $256 \times 256$	Dense #2, Add #1	$256 \times 256$	$0$
10			Add #3	$256 \times 256$, $256 \times 256$	Add #2, Patch Encoder	$256 \times 256$	$0$
11			Layer Normalization #3	$256 \times 256$	Add #3	$256 \times 256$	$512$
12			Multi-Head Attention #2	$256 \times 256$	Layer Normalization #3	$256 \times 256$	$3155200$
13			Add #4	$256 \times 256$, $256 \times 256$	Multi-Head Attention #2, Add #3	$256 \times 256$	$0$
14			Layer Normalization #4	$256 \times 256$	Add #4	$256 \times 256$	$512$
15	MLP Module #2		Dense #3	$256 \times 256$	Layer Normalization #4	$256 \times 512$	$131584$
15	MLP Module #2		Dense #4	$256 \times 512$	Dense #3	$256 \times 256$	$131328$
16			Add #5	$256 \times 256$, $256 \times 256$	Dense #4, Add #4	$256 \times 256$	$0$
17			GlobalAveragePooling1D #1	$256 \times 256$	Add #5	$256$	$0$
18			Dense #5	$256$	GlobalAveragePooling1D #1	$64$	$16448$
19			Dense #6 (Output Layer)	$64$	Dense #5	$4$ (Maize dataset), $38$ (PlantVillage dataset)	$262$ (Maize dataset), $2470$ (PlantVillage dataset)
Total Weight Parameters						$7117382$ (Maize dataset), $7119590$ (PlantVillage dataset)

The values in bold highlight the layers of the MLP module (MLP Module #1 and MLP Module #2) in this table. These layers are the only difference between the existing ViT model and our proposed TrIncNet model.
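The shapes and parameter counts in Table 3 can be checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions the table does not state:

- a patch size of $16 \times 16$, inferred from the $256 \times 768$ patch shape;
- Dense layers with bias;
- a learned position embedding of one 256-dimensional vector per patch;
- Keras-style multi-head attention whose number of heads times key dimension equals 3072.

The helper names are hypothetical. Under the same Dense formula a 4-class head would have 260 parameters, so the sketch takes the listed value of 262 as given when checking the Maize total.

```python
# Reproduce the shapes and parameter counts listed in Table 3.
# Assumed, not stated in the table: 16x16 patches, Dense layers with bias,
# a learned per-patch position embedding, and num_heads * key_dim = 3072.

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def layer_norm_params(dim):
    """One scale and one offset per feature."""
    return 2 * dim

def mha_params(dim, heads_times_key_dim):
    """Query, key, value and output projections, each with bias."""
    qkv = 3 * dense_params(dim, heads_times_key_dim)
    out = dense_params(heads_times_key_dim, dim)
    return qkv + out

image_size, channels, patch_size = 256, 3, 16       # patch_size is an assumption
num_patches = (image_size // patch_size) ** 2       # rows of the Patches output
patch_dim = patch_size * patch_size * channels      # columns of the Patches output
embed_dim, mlp_dim, head_dim = 256, 512, 64

# Patch Encoder: linear projection plus position embedding.
patch_encoder = dense_params(patch_dim, embed_dim) + num_patches * embed_dim

# One transformer block: two layer norms, attention, and the two-layer MLP module.
block = (
    2 * layer_norm_params(embed_dim)
    + mha_params(embed_dim, 3072)                   # 3072 is an assumption
    + dense_params(embed_dim, mlp_dim)
    + dense_params(mlp_dim, embed_dim)
)

# Everything except the final classification layer (Dense #6).
shared = patch_encoder + 2 * block + dense_params(embed_dim, head_dim)

total_plantvillage = shared + dense_params(head_dim, 38)
total_maize = shared + 262                          # value as listed in Table 3

print(num_patches, patch_dim, patch_encoder, block, shared)
print(total_maize, total_plantvillage)
```

Layers without trainable weights (Input, Patches, Add, GlobalAveragePooling1D) contribute nothing, which is why they do not appear in the sum.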