Table 2.
Output size, number and size of filters, stride, and padding in our deep residual CNN structure. (3* indicates that 3 pixels of padding are added to the left, right, top, and bottom of the 224 × 224 × 3 input image, whereas 1* indicates that 1 pixel of padding is added to the left, right, top, and bottom of the feature map.) (2/1** indicates a stride of 2 in the first iteration and 1 from the second iteration onward.)
| Layer Name | Sub-layer | Size of Feature Map | Number of Filters | Size of Filters | Stride | Padding | Number of Iterations |
|---|---|---|---|---|---|---|---|
| Image input layer | | 224 (height) × 224 (width) × 3 (channel) | | | | | |
| Conv1 | | 112 × 112 × 64 | 64 | 7 × 7 × 3 | 2 | 3* | 1 |
| Max pool | | 56 × 56 × 64 | 1 | 3 × 3 | 2 | 0 | 1 |
| Conv2 | Conv2-1 | 56 × 56 × 64 | 64 | 1 × 1 × 64 | 1 | 0 | 3 |
| | Conv2-2 | 56 × 56 × 64 | 64 | 3 × 3 × 64 | 1 | 1* | |
| | Conv2-3 | 56 × 56 × 256 | 256 | 1 × 1 × 64 | 1 | 0 | |
| | Conv2-4 (Shortcut) | 56 × 56 × 256 | 256 | 1 × 1 × 64 | 1 | 0 | |
| Conv3 | Conv3-1 | 28 × 28 × 128 | 128 | 1 × 1 × 256 | 2/1** | 0 | 4 |
| | Conv3-2 (Bottleneck) | 28 × 28 × 128 | 128 | 3 × 3 × 128 | 1 | 1* | |
| | Conv3-3 | 28 × 28 × 512 | 512 | 1 × 1 × 128 | 1 | 0 | |
| | Conv3-4 (Shortcut) | 28 × 28 × 512 | 512 | 1 × 1 × 256 | 2 | 0 | |
| Conv4 | Conv4-1 | 14 × 14 × 256 | 256 | 1 × 1 × 512 | 2/1** | 0 | 6 |
| | Conv4-2 (Bottleneck) | 14 × 14 × 256 | 256 | 3 × 3 × 256 | 1 | 1* | |
| | Conv4-3 | 14 × 14 × 1024 | 1024 | 1 × 1 × 256 | 1 | 0 | |
| | Conv4-4 (Shortcut) | 14 × 14 × 1024 | 1024 | 1 × 1 × 512 | 2 | 0 | |
| Conv5 | Conv5-1 | 7 × 7 × 512 | 512 | 1 × 1 × 1024 | 2/1** | 0 | 3 |
| | Conv5-2 (Bottleneck) | 7 × 7 × 512 | 512 | 3 × 3 × 512 | 1 | 1* | |
| | Conv5-3 | 7 × 7 × 2048 | 2048 | 1 × 1 × 512 | 1 | 0 | |
| | Conv5-4 (Shortcut) | 7 × 7 × 2048 | 2048 | 1 × 1 × 1024 | 2 | 0 | |
| AVG pool | | 1 × 1 × 2048 | 1 | 7 × 7 | 1 | 0 | 1 |
| FC layer | | 2 | 1 | | | | |
| Softmax | | 2 | 1 | | | | |
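The feature-map sizes in Table 2 follow from the standard output-size rule for a square window with symmetric padding, out = ⌊(n + 2p − k) / s⌋ + 1. The following is a minimal sketch, not the authors' code, that re-derives the spatial sizes from the kernel, stride, and padding columns; the names `out_size`, `STAGES`, and `trace_shapes` are hypothetical. One assumption is needed: with floor rounding (the default in frameworks such as PyTorch), a 3 × 3, stride-2 max pool with no padding maps 112 to 55, so the sketch assumes ceil rounding for that pooling layer (the Caffe convention) to reproduce the 56 listed in the table.

```python
import math


def out_size(n, kernel, stride, pad, ceil_mode=False):
    """Output height/width of a square conv or pooling window with symmetric padding."""
    span = n + 2 * pad - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1


# (stage name, output channels, iterations, first-iteration stride),
# read from the Conv2-Conv5 rows of Table 2
STAGES = [
    ("Conv2", 256, 3, 1),
    ("Conv3", 512, 4, 2),
    ("Conv4", 1024, 6, 2),
    ("Conv5", 2048, 3, 2),
]


def trace_shapes(size=224):
    """Return the (height, width, channels) output of each table row group."""
    shapes = {}
    size = out_size(size, kernel=7, stride=2, pad=3)  # Conv1: 7x7, stride 2, 3* padding
    shapes["Conv1"] = (size, size, 64)
    # Max pool: 3x3, stride 2, no padding; ceil rounding is an assumption (see text)
    size = out_size(size, kernel=3, stride=2, pad=0, ceil_mode=True)
    shapes["Max pool"] = (size, size, 64)
    for name, out_ch, iters, first_stride in STAGES:
        for i in range(iters):
            stride = first_stride if i == 0 else 1  # the "2/1**" rule
            block_in = size
            size = out_size(size, kernel=1, stride=stride, pad=0)  # ConvX-1
            size = out_size(size, kernel=3, stride=1, pad=1)  # ConvX-2 (1* padding)
            size = out_size(size, kernel=1, stride=1, pad=0)  # ConvX-3
            if i == 0:
                # ConvX-4 (Shortcut): 1x1 with the same stride must match the
                # main path so the residual addition is shape-compatible
                assert out_size(block_in, kernel=1, stride=stride, pad=0) == size
        shapes[name] = (size, size, out_ch)
    size = out_size(size, kernel=7, stride=1, pad=0)  # 7x7 average pool
    shapes["AVG pool"] = (size, size, 2048)
    return shapes


for layer, (h, w, c) in trace_shapes().items():
    print(f"{layer:9s} {h} x {w} x {c}")
```

Running the sketch prints the same sizes as the "Size of Feature Map" column (112, 56, 56, 28, 14, 7, and 1 pixels per side). Only the first iteration of each stage changes the spatial size, because the later iterations use stride 1 on ConvX-1 and 1* padding on the 3 × 3 ConvX-2.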