Table 3.
Module | Layers | Number of filters | Filter size | Output size |
---|---|---|---|---|
Input | - | - | - | 3 × 128 × 128 |
VGG11bn | Conv + BN + ReLU | 64 | 3 × 3 | 64 × 128 × 128 |
VGG11bn | Max Pool | 64 | 2 × 2 | 64 × 64 × 64 |
VGG11bn | Conv + BN + ReLU | 128 | 3 × 3 | 128 × 64 × 64 |
VGG11bn | Max Pool | 128 | 2 × 2 | 128 × 32 × 32 |
VGG11bn | Conv + BN + ReLU | 256 | 3 × 3 | 256 × 32 × 32 |
VGG11bn | Conv + BN + ReLU | 256 | 3 × 3 | 256 × 32 × 32 |
VGG11bn | Max Pool | 256 | 2 × 2 | 256 × 16 × 16 |
VGG11bn | Conv + BN + ReLU | 512 | 3 × 3 | 512 × 16 × 16 |
VGG11bn | Conv + BN + ReLU | 512 | 3 × 3 | 512 × 16 × 16 |
VGG11bn | Max Pool | 512 | 2 × 2 | 512 × 8 × 8 |
VGG11bn | Conv + BN + ReLU | 512 | 3 × 3 | 512 × 8 × 8 |
VGG11bn | Conv + BN + ReLU | 512 | 3 × 3 | 512 × 8 × 8 |
VGG11bn | Max Pool | 512 | 2 × 2 | 512 × 4 × 4 |
Instance feature embedding | Conv | 256 | 1 × 1 | 256 × 4 × 4 |
Instance feature embedding | FC + ReLU + Dropout | - | - | 256 |
Attention module | FC + Tanh + Dropout | - | - | 512 |
Attention module | FC | - | - | 1 |
Classifier | FC | - | - | 1 |
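
The output sizes of the convolutional layers in Table 3 can be checked with the standard output-size formula. The sketch below is illustrative only and rests on assumptions the table does not state: the 3 × 3 convolutions are taken to use stride 1 and padding 1, the 1 × 1 convolution stride 1 and no padding, and the 2 × 2 max pooling stride 2. The names `conv_out`, `VGG11_LAYOUT` and `trace_shapes` are hypothetical helpers, not part of the original work.

```python
# Shape walk-through for the convolutional layers listed in Table 3.
# Assumptions (not stated in the table): 3x3 convs use stride 1 and padding 1,
# the 1x1 conv uses stride 1 and no padding, and max pooling uses stride 2.


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# VGG11bn feature extractor: an int is a Conv + BN + ReLU block with that many
# filters, "M" is a 2x2 max pool.
VGG11_LAYOUT = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def trace_shapes(channels=3, size=128):
    """Return the (channels, height, width) shape after every listed layer."""
    shapes = []
    for layer in VGG11_LAYOUT:
        if layer == "M":
            # Pooling keeps the channel count and halves the spatial size.
            size = conv_out(size, kernel=2, stride=2)
        else:
            # Convolution sets the channel count; padding 1 keeps the size.
            channels = layer
            size = conv_out(size, kernel=3, padding=1)
        shapes.append((channels, size, size))
    # Instance feature embedding: 1x1 conv down to 256 channels.
    channels = 256
    size = conv_out(size, kernel=1)
    shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(" x ".join(str(d) for d in shape))
```

Under these assumptions the traced shapes agree with the "Output size" column row by row, from 64 × 128 × 128 after the first convolution to 256 × 4 × 4 after the 1 × 1 convolution. How the 256 × 4 × 4 map is reduced to the 256-dimensional FC output (flattening or pooling) is not specified in the table, so that step is left out of the sketch.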