2020 Aug 1;20:100405. doi: 10.1016/j.imu.2020.100405

Table 2.

“Conv” denotes a standard convolution; “t1” and “t2” denote convolution strides of 1 × 1 and 2 × 2, respectively. “dw” stands for depthwise separable convolution, which is made up of two layers: (i) a depthwise convolution, which applies a single filter to each input channel, and (ii) a 1 × 1 pointwise convolution, which creates a linear combination of the outputs of the depthwise layer.
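The two layers described in the caption can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `depthwise_conv` and `pointwise_conv`, the nested-list tensor layout `[channel][row][column]`, and the zero padding of `k // 2` pixels on each side are all assumptions made for illustration, and biases, batch normalization, and activations are omitted.

```python
def depthwise_conv(x, kernels, stride=1):
    """Apply one k x k filter per input channel (hypothetical helper).

    x:       nested list [C][H][W]
    kernels: nested list [C][k][k], one filter for each channel
    Assumes zero padding of k // 2, so the output has ceil(H / stride) rows.
    """
    out_channels = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        pad = k // 2
        h, w = len(channel), len(channel[0])
        out = []
        for i in range(0, h, stride):
            row = []
            for j in range(0, w, stride):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, z = i + di - pad, j + dj - pad
                        # Positions outside the image contribute zero.
                        if 0 <= y < h and 0 <= z < w:
                            acc += channel[y][z] * kernel[di][dj]
                row.append(acc)
            out.append(row)
        out_channels.append(out)
    return out_channels


def pointwise_conv(x, weights):
    """1 x 1 convolution: each output channel is a per-pixel linear
    combination of the input channels (hypothetical helper).

    x:       nested list [C_in][H][W]
    weights: nested list [C_out][C_in]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(wt[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for wt in weights
    ]


# Toy input: 2 channels of 4 x 4 ones, 3 x 3 all-ones filters, stride 2.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
kernels = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]

dw = depthwise_conv(x, kernels, stride=2)            # 2 channels, each 2 x 2
pw = pointwise_conv(dw, [[1.0, 1.0], [1.0, -1.0]])   # 2 output channels
print(dw[0])  # [[4.0, 6.0], [6.0, 9.0]]
print(pw[0])  # [[8.0, 12.0], [12.0, 18.0]]
```

Under the same no-bias assumption, the saving in weights can be read off the Shape column: a 3 × 3 × 32 depthwise layer followed by a 1 × 1 × 32 × 64 pointwise layer holds 288 + 2,048 = 2,336 weights, whereas a single standard 3 × 3 × 32 × 64 convolution would hold 18,432.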

Type/stride	Filter shape	Input size
Conv/t₂	3 × 3 × 3 × 32	224 × 224 × 3
Conv/t₁	3 × 3 × 32 dw	112 × 112 × 32
Conv/t₁	1 × 1 × 32 × 64	112 × 112 × 32
Conv/t₂	3 × 3 × 64 dw	112 × 112 × 64
Conv/t₁	1 × 1 × 64 × 128	56 × 56 × 64
Conv/t₁	3 × 3 × 128 dw	56 × 56 × 128
Conv/t₁	1 × 1 × 128 × 128	56 × 56 × 128
Conv/t₂	3 × 3 × 128 dw	56 × 56 × 128
Conv/t₁	1 × 1 × 128 × 256	28 × 28 × 128
Conv/t₁	3 × 3 × 256 dw	28 × 28 × 256
Conv/t₁	1 × 1 × 256 × 256	28 × 28 × 256
Conv/t₂	3 × 3 × 256 dw	28 × 28 × 256
Conv/t₁	1 × 1 × 256 × 512	14 × 14 × 256
5 × Conv/t₁	3 × 3 × 512 dw	14 × 14 × 512
5 × Conv/t₁	1 × 1 × 512 × 512	14 × 14 × 512