Table 4.
TrIncNet model’s implementation details.
| Layer No. | Layer Name | Sub-layer | Input Shape | Connected to | Output Shape | Parameters |
|---|---|---|---|---|---|---|
| 1 | Input Layer | | | – | | |
| 2 | Patches | | | Input Layer | | |
| 3 | Patch Encoder | | | Patches | | |
| 4 | Layer Normalization #1 | | | Patch Encoder | | |
| 5 | Multi-Head Attention #1 | | | Layer Normalization #1 | | |
| 6 | Add #1 | | | Multi-Head Attention #1, Patch Encoder | | 0 |
| 7 | Layer Normalization #2 | | | Add #1 | | |
| 8 | **Inception Module #1** | **Reshape #1** | | Layer Normalization #2 | | |
| | | **Conv2D #1** | | Reshape #1 | | |
| | | **Conv2D #2** | | Reshape #1 | | |
| | | **Conv2D #3** | | Reshape #1 | | 16448 |
| | | **MaxPooling2D #1** | | Reshape #1 | | 0 |
| | | **Conv2D #4** | | Conv2D #1 | | |
| | | **Conv2D #5** | | Conv2D #2 | | |
| | | **Conv2D #6** | | MaxPooling2D #1 | | |
| | | **Concatenate #1** | | Conv2D #3, Conv2D #4, Conv2D #5, Conv2D #6 | | 0 |
| | | **Reshape #2** | | Concatenate #1 | | 0 |
| 9 | Add #2 | | | Reshape #2, Add #1 | | 0 |
| 10 | Add #3 | | | Add #2, Patch Encoder | | 0 |
| 11 | Layer Normalization #3 | | | Add #3 | | |
| 12 | Multi-Head Attention #2 | | | Layer Normalization #3 | | |
| 13 | Add #4 | | | Multi-Head Attention #2, Add #3 | | 0 |
| 14 | Layer Normalization #4 | | | Add #4 | | |
| 15 | **Inception Module #2** | **Reshape #3** | | Layer Normalization #4 | | |
| | | **Conv2D #7** | | Reshape #3 | | |
| | | **Conv2D #8** | | Reshape #3 | | |
| | | **Conv2D #9** | | Reshape #3 | | 16448 |
| | | **MaxPooling2D #2** | | Reshape #3 | | 0 |
| | | **Conv2D #10** | | Conv2D #7 | | |
| | | **Conv2D #11** | | Conv2D #8 | | |
| | | **Conv2D #12** | | MaxPooling2D #2 | | |
| | | **Concatenate #2** | | Conv2D #9, Conv2D #10, Conv2D #11, Conv2D #12 | | 0 |
| | | **Reshape #4** | | Concatenate #2 | | 0 |
| 16 | Add #5 | | | Reshape #4, Add #4 | | 0 |
| 17 | GlobalAveragePooling1D #1 | | | Add #5 | | 0 |
| 18 | Dense #1 | | | GlobalAveragePooling1D #1 | | |
| 19 | Dense #2 (Output Layer) | | | Dense #1 | 4 (for the Maize dataset), 38 (for the PlantVillage dataset) | |
| | Total Weight Parameters | | | | | 6945574 (for the Maize dataset), 6947782 (for the PlantVillage dataset) |
Bold font marks the layers of the Inception modules in this table. These layers are the only difference between the existing ViT model and our proposed TrIncNet model.
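
The Parameters column can be sanity-checked with the standard formula for a 2D convolution: kernel height × kernel width × input channels × filters, plus one bias per filter. The sketch below is only an illustration. The helper name `conv2d_params` is hypothetical, and the 1×1 kernel, 256 input channels, and 64 filters are assumed values chosen because they reproduce the 16448 listed for Conv2D #3 and Conv2D #9; other configurations can give the same count, and the table itself does not state these settings.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameter count of a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# ASSUMPTION (not taken from the table): a 1x1 kernel, 256 input channels,
# and 64 filters with bias. This is one configuration that matches the
# 16448 parameters listed for Conv2D #3 and Conv2D #9.
assumed = conv2d_params(kernel_h=1, kernel_w=1, in_channels=256, filters=64)
print(assumed)  # 16448
```

Layers such as Add, MaxPooling2D, Concatenate, Reshape, and GlobalAveragePooling1D have no trainable weights, which is why the table lists 0 parameters for them.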