Table 3.
Structure of the backbone feature extraction network.
| Input | Layer | Filter Size | Exp. Size | Out Channels | ECA | Nonlinearity (NL) | Stride |
|---|---|---|---|---|---|---|---|
| 320² × 3 | Convolution | 3 × 3 | – | 16 | False | h-swish | 1 |
| 320² × 16 | Bottleneck1 | 3 × 3 | 16 | 16 | False | ReLU | 1 |
| 320² × 16 | Bottleneck2 | 3 × 3 | 64 | 24 | False | ReLU | 2 |
| 160² × 24 | Bottleneck3 | 3 × 3 | 72 | 24 | False | ReLU | 1 |
| 160² × 24 | TF-Bottleneck1 | 5 × 5 | 72 | 40 | True | ReLU | 2 |
| 80² × 40 | TF-Bottleneck2 | 5 × 5 | 120 | 40 | True | ReLU | 1 |
| 80² × 40 | TF-Bottleneck3 | 5 × 5 | 120 | 40 | True | ReLU | 1 |
| 80² × 40 | Bottleneck4 | 3 × 3 | 240 | 80 | False | h-swish | 2 |
| 40² × 80 | Bottleneck5 | 3 × 3 | 200 | 80 | False | h-swish | 1 |
| 40² × 80 | Bottleneck6 | 3 × 3 | 184 | 80 | False | h-swish | 1 |
| 40² × 80 | Bottleneck7 | 3 × 3 | 184 | 80 | False | h-swish | 1 |
| 40² × 80 | TF-Bottleneck4 | 3 × 3 | 480 | 112 | True | h-swish | 1 |
| 40² × 112 | TF-Bottleneck5 | 3 × 3 | 672 | 112 | True | h-swish | 1 |
| 40² × 112 | TF-Bottleneck6 | 5 × 5 | 672 | 160 | True | h-swish | 2 |
| 20² × 160 | TF-Bottleneck7 | 5 × 5 | 960 | 160 | True | h-swish | 1 |
| 20² × 160 | TF-Bottleneck8 | 5 × 5 | 960 | 160 | True | h-swish | 1 |
| 20² × 160 | Flatten | – | – | – | False | – | – |
| 1² × 64,000 | Dense | – | – | 128 | False | – | – |
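
The spatial sizes and the flattened dimension listed in Table 3 can be checked with a short shape trace. The sketch below is illustrative only and is not the authors' implementation: the names `LAYERS` and `trace_shapes` are hypothetical, and it assumes "same"-style padding, so that a layer with stride s maps a side length n to ceil(n / s).

```python
# (layer name, output channels, stride), copied from Table 3 in order.
LAYERS = [
    ("Convolution", 16, 1),
    ("Bottleneck1", 16, 1),
    ("Bottleneck2", 24, 2),
    ("Bottleneck3", 24, 1),
    ("TF-Bottleneck1", 40, 2),
    ("TF-Bottleneck2", 40, 1),
    ("TF-Bottleneck3", 40, 1),
    ("Bottleneck4", 80, 2),
    ("Bottleneck5", 80, 1),
    ("Bottleneck6", 80, 1),
    ("Bottleneck7", 80, 1),
    ("TF-Bottleneck4", 112, 1),
    ("TF-Bottleneck5", 112, 1),
    ("TF-Bottleneck6", 160, 2),
    ("TF-Bottleneck7", 160, 1),
    ("TF-Bottleneck8", 160, 1),
]


def trace_shapes(size=320, channels=3):
    """Return each layer's input shape plus the final feature-map shape.

    Assumption (not stated in the table): padding keeps the side length
    at ceil(size / stride) for every layer.
    """
    inputs = []
    for name, out_channels, stride in LAYERS:
        inputs.append((name, size, channels))
        size = -(-size // stride)  # ceiling division
        channels = out_channels
    return inputs, size, channels


inputs, final_size, final_channels = trace_shapes()
for name, size, channels in inputs:
    print(f"{name:<15} input: {size} x {size} x {channels}")

# Flatten turns the final feature map into a single vector for the Dense layer.
flat_dim = final_size * final_size * final_channels
print("Flatten output:", flat_dim)  # 20 * 20 * 160 = 64000
```

Under this assumption the trace reproduces the Input column of the table: three stride-2 stages after the first reduce 320 to 160, 80, 40, and the last stride-2 block yields a 20 × 20 × 160 map, which flattens to the 64,000 inputs of the Dense layer.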