2023 Apr 28;13:6986. doi: 10.1038/s41598-023-34190-z

Table 3.

Structure of the backbone feature extraction network.

| Input | Layer | Filter size | Exp size | Out | ECA | NL | Stride |
|---|---|---|---|---|---|---|---|
| 320² × 3 | Convolution | 3 × 3 | - | 16 | False | h-swish | 1 |
| 320² × 16 | Bottleneck1 | 3 × 3 | 16 | 16 | False | ReLU | 1 |
| 320² × 16 | Bottleneck2 | 3 × 3 | 64 | 24 | False | ReLU | 2 |
| 160² × 24 | Bottleneck3 | 3 × 3 | 72 | 24 | False | ReLU | 1 |
| 160² × 24 | TF-Bottleneck1 | 5 × 5 | 72 | 40 | True | ReLU | 2 |
| 80² × 40 | TF-Bottleneck2 | 5 × 5 | 120 | 40 | True | ReLU | 1 |
| 80² × 40 | TF-Bottleneck3 | 5 × 5 | 120 | 40 | True | ReLU | 1 |
| 80² × 40 | Bottleneck4 | 3 × 3 | 240 | 80 | False | h-swish | 2 |
| 40² × 80 | Bottleneck5 | 3 × 3 | 200 | 80 | False | h-swish | 1 |
| 40² × 80 | Bottleneck6 | 3 × 3 | 184 | 80 | False | h-swish | 1 |
| 40² × 80 | Bottleneck7 | 3 × 3 | 184 | 80 | False | h-swish | 1 |
| 40² × 80 | TF-Bottleneck4 | 3 × 3 | 480 | 112 | True | h-swish | 1 |
| 40² × 112 | TF-Bottleneck5 | 3 × 3 | 672 | 112 | True | h-swish | 1 |
| 40² × 112 | TF-Bottleneck6 | 5 × 5 | 672 | 160 | True | h-swish | 2 |
| 20² × 160 | TF-Bottleneck7 | 5 × 5 | 960 | 160 | True | h-swish | 1 |
| 20² × 160 | TF-Bottleneck8 | 5 × 5 | 960 | 160 | True | h-swish | 1 |
| 20² × 160 | Flatten | - | - | - | False | - | - |
| 1² × 64,000 | Dense | - | - | 128 | False | - | - |
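
The input sizes listed in Table 3 can be cross-checked with a little arithmetic: each stride-2 layer halves the spatial side, and the flattened length entering the Dense layer is side × side × channels. The following is a minimal sketch of that check, not the authors' code. The names `LAYERS` and `trace_shapes` are hypothetical, and the sketch assumes "same"-style padding, so that a stride-s layer maps a side of n to ceil(n / s); the table does not state the padding scheme.

```python
# (layer name, output channels, stride), transcribed from Table 3.
LAYERS = [
    ("Convolution", 16, 1),
    ("Bottleneck1", 16, 1),
    ("Bottleneck2", 24, 2),
    ("Bottleneck3", 24, 1),
    ("TF-Bottleneck1", 40, 2),
    ("TF-Bottleneck2", 40, 1),
    ("TF-Bottleneck3", 40, 1),
    ("Bottleneck4", 80, 2),
    ("Bottleneck5", 80, 1),
    ("Bottleneck6", 80, 1),
    ("Bottleneck7", 80, 1),
    ("TF-Bottleneck4", 112, 1),
    ("TF-Bottleneck5", 112, 1),
    ("TF-Bottleneck6", 160, 2),
    ("TF-Bottleneck7", 160, 1),
    ("TF-Bottleneck8", 160, 1),
]


def trace_shapes(side=320, channels=3, layers=LAYERS):
    """Return each layer's (name, input side, input channels) and the final shape."""
    rows = []
    for name, out_channels, stride in layers:
        rows.append((name, side, channels))
        # Assumption: "same"-style padding, so the output side is ceil(side / stride).
        side = -(-side // stride)
        channels = out_channels
    return rows, (side, channels)


rows, (final_side, final_channels) = trace_shapes()

# Length of the vector produced by the Flatten layer.
flattened = final_side * final_side * final_channels

for name, in_side, in_channels in rows:
    print(f"{name:15s} input {in_side} x {in_side} x {in_channels}")
print("flattened length:", flattened)
```

Under these assumptions the trace ends at a 20 × 20 × 160 feature map, which flattens to the 64,000 values listed as the input to the Dense layer.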