Table 2. The architecture of the proposed method.
Layer Name | Output Size | Layer Configuration |
---|---|---|
STN | 224 × 224 | Localization network, grid generator, sampler |
Conv1 | 112 × 112 | 7 × 7, 64, stride 2 |
Max pooling | 56 × 56 | 3 × 3, stride 2 |
Stage 2 | 56 × 56 | , Mish |
Stage 3 | 28 × 28 | , Mish |
Stage 4 | 14 × 14 | , Mish |
Non-local attention module | 14 × 14 | Attention × 1 |
Stage 5 | 7 × 7 | , Mish |
Average pooling | 1 × 1 | 7 × 7, stride 1 |
FC, softmax | 1000-d | |
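
The output sizes in Table 2 can be checked with the standard convolution/pooling size formula, out = floor((in + 2p - k) / s) + 1. The following is a minimal sketch, not the authors' implementation. The padding values (3 for Conv1, 1 for max pooling) and the assumption that Stages 3 to 5 each downsample once with a 3 × 3, stride-2, padding-1 operation follow common ResNet-style practice and are not stated in the table; the helper names `conv_out` and `trace_sizes` are hypothetical.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size=224):
    """Trace the spatial resolution through the layers listed in Table 2.

    Padding values and the per-stage downsampling operation are assumptions
    (ResNet-style defaults); the table gives only kernel sizes and strides.
    """
    sizes = {}
    s = input_size

    # The STN resamples the input onto a grid of the same resolution.
    sizes["STN"] = s

    # Conv1: 7 x 7, stride 2 (padding 3 assumed).
    s = conv_out(s, kernel=7, stride=2, padding=3)
    sizes["Conv1"] = s

    # Max pooling: 3 x 3, stride 2 (padding 1 assumed).
    s = conv_out(s, kernel=3, stride=2, padding=1)
    sizes["Max pooling"] = s

    # Stage 2 keeps the resolution produced by max pooling.
    sizes["Stage 2"] = s

    # Stages 3 and 4 each halve the resolution (assumed 3 x 3, stride 2, padding 1).
    for name in ("Stage 3", "Stage 4"):
        s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes[name] = s

    # The non-local attention module preserves the spatial size.
    sizes["Non-local attention module"] = s

    # Stage 5 halves the resolution once more (same assumption as above).
    s = conv_out(s, kernel=3, stride=2, padding=1)
    sizes["Stage 5"] = s

    # Average pooling: 7 x 7, stride 1, no padding.
    s = conv_out(s, kernel=7, stride=1, padding=0)
    sizes["Average pooling"] = s

    return sizes


if __name__ == "__main__":
    for layer, size in trace_sizes().items():
        print(f"{layer}: {size} x {size}")
```

Under these assumptions, the traced resolutions match the Output Size column for every row from STN through average pooling.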