Table 2.
“Conv” denotes a standard convolution. “t1” and “t2” denote convolution strides of 1 × 1 and 2 × 2, respectively. “dw” stands for depthwise separable convolution, which is made up of two layers: (i) a depthwise convolution, which applies a single filter to each input channel, and (ii) a 1 × 1 pointwise convolution, which creates a linear combination of the outputs of the depthwise layer.
Type | Shape | Input size |
---|---|---|
Conv/t2 | 3 × 3 × 3 × 32 | 224 × 224 × 3 |
Conv/t1 | 3 × 3 × 32 dw | 112 × 112 × 32 |
Conv/t1 | 1 × 1 × 32 × 64 | 112 × 112 × 32 |
Conv/t2 | 3 × 3 × 64 dw | 112 × 112 × 64 |
Conv/t1 | 1 × 1 × 64 × 128 | 56 × 56 × 64 |
Conv/t1 | 3 × 3 × 128 dw | 56 × 56 × 128 |
Conv/t1 | 1 × 1 × 128 × 128 | 56 × 56 × 128 |
Conv/t2 | 3 × 3 × 128 dw | 56 × 56 × 128 |
Conv/t1 | 1 × 1 × 128 × 256 | 28 × 28 × 128 |
Conv/t1 | 3 × 3 × 256 dw | 28 × 28 × 256 |
Conv/t1 | 1 × 1 × 256 × 256 | 28 × 28 × 256 |
Conv/t2 | 3 × 3 × 256 dw | 28 × 28 × 256 |
Conv/t1 | 1 × 1 × 256 × 512 | 14 × 14 × 256 |
5 × Conv/t1 | 3 × 3 × 512 dw | 14 × 14 × 512 |
5 × Conv/t1 | 1 × 1 × 512 × 512 | 14 × 14 × 512 |
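
The shapes and input sizes in the table can be checked with a little arithmetic. The sketch below is illustrative only: the helper names (`out_size`, `standard_conv_weights`, `separable_conv_weights`, `trace_input_sizes`) are hypothetical, it assumes "same"-style padding so that a stride of 2 halves the spatial size, and it counts convolution weights only, ignoring biases and any normalization parameters.

```python
import math


def out_size(in_size: int, stride: int) -> int:
    """Spatial size after a convolution, assuming 'same'-style padding."""
    return math.ceil(in_size / stride)


def standard_conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # (i) one k x k filter per input channel
    pointwise = c_in * c_out   # (ii) 1 x 1 linear combination across channels
    return depthwise + pointwise


def trace_input_sizes(size: int, strides: list[int]) -> list[int]:
    """Input spatial size seen by each layer, given the per-layer strides."""
    sizes = []
    for stride in strides:
        sizes.append(size)
        size = out_size(size, stride)
    return sizes


# Strides of the first five rows of the table: t2, t1, t1, t2, t1.
first_rows = [2, 1, 1, 2, 1]
print(trace_input_sizes(224, first_rows))   # [224, 112, 112, 112, 56]

# 3 x 3 convolution mapping 32 channels to 64 channels:
print(standard_conv_weights(3, 32, 64))     # 18432
print(separable_conv_weights(3, 32, 64))    # 2336
```

Under these assumptions, the traced sizes match the "Input size" column for the first five rows, and the separable form of a 3 × 3, 32-to-64-channel layer needs roughly one eighth of the weights of the standard form.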