2021 Mar 2:1–30. Online ahead of print. doi: 10.1007/s12559-020-09779-5

Table 6.

Architectural details of the CNNs employed for performance benchmarking in COVID-19 detection

Layer Patch size/stride Depth Output size
(a) ResNet-50
Convolution 7×7×64/2 1 112×112
Max pool 3×3/2 1 56×56
Convolution 1×1×64 3 56×56
3×3×64
1×1×256
Convolution 1×1×128 4 28×28
3×3×128
1×1×512
Convolution 1×1×256 6 14×14
3×3×256
1×1×1024
Convolution 1×1×512 3 7×7
3×3×512
1×1×2048
Average pool 1 1×1000
Fully connected
Softmax
(b) GoogleNet
Convolution 7×7/2 1 112×112×64
Max pool 3×3/2 0 56×56×64
Convolution 3×3/1 2 56×56×192
Max pool 3×3/2 0 28×28×192
Inception(3a) - 2 28×28×256
Inception(3b) - 2 28×28×480
Max pool 3×3/2 0 14×14×480
Inception(4a) - 2 14×14×512
Inception(4b) - 2 14×14×512
Inception(4c) - 2 14×14×512
Inception(4d) - 2 14×14×528
Inception(4e) - 2 14×14×832
Max pool 3×3/2 0 7×7×832
Inception(5a) - 2 7×7×832
Inception(5b) - 2 7×7×1024
Average pool 7×7/1 0 1×1×1024
Dropout(40%) - 0 1×1×1024
Linear - 1 1×1×1000
Softmax - 0 1×1×1000
(c) VGG-16
Convolution 3×3×64/1 2 224×224×64
Max pool 3×3/2 1 112×112×64
Convolution 3×3×128/1 2 112×112×128
Max pool 3×3/2 1 56×56×128
Convolution 3×3×256/1 2 56×56×256
1×1×256/1 1
Max pool 3×3/2 1 28×28×256
Convolution 3×3×512/1 2 28×28×512
1×1×512/1 1
Max pool 3×3/2 1 14×14×512
Convolution 3×3×512/1 2 14×14×512
1×1×512/1 1
Max pool 3×3/2 1 7×7×512
Fully connected - 2 1×4096
Softmax - 1 1×1000
(d) AlexNet
Convolution 11×11/4 1 55×55×96
Max pool 3×3/2 1 27×27×96
Convolution 5×5/1 1 27×27×256
Max pool 3×3/2 1 13×13×256
Convolution 3×3/1 1 13×13×384
Convolution 3×3 1 13×13×384
Convolution 3×3 1 13×13×256
Max pool 3×3/2 1 6×6×256
Fully connected - 2 1×4096
Softmax - 1 1×1000
(e) DarkNet-53
Convolution 3×3×32/1 1 256×256×32
Convolution 3×3×64/2 1 128×128×64
Convolution 1×1×32/1 1 128×128
Convolution 3×3×64/1
Residual -
Convolution 3×3×128/2 1 64×64
Convolution 1×1×64/1 2 64×64
Convolution 3×3×128/1
Residual -
Convolution 3×3×256/2 1 32×32
Convolution 1×1×128/1 8 32×32
Convolution 3×3×256/1
Residual -
Convolution 3×3×512/2 1 16×16
Convolution 1×1×256/1 8 16×16
Convolution 3×3×512/1
Residual -
Convolution 3×3×1024/2 1 8×8
Convolution 1×1×512/1 4 8×8
Convolution 3×3×1024/1
Residual -
Average pool - 1 1×1000
FC
Softmax
(f) ShuffleNet
Convolution 3×3/2 2 112×112
Max pool 3×3/2 2 56×56
Stage 2 -/2 1 28×28
-/1 3 28×28
Stage 3 -/2 1 14×14
-/1 7 14×14
Stage 4 -/2 1 7×7
-/1 3 7×7
Global pool 7×7 - 1×1
FC - - 1×1000
(g) MobileNetV2
Convolution 3×3/2 1 112×112×32
Bottleneck -/1 1 112×112×16
Bottleneck -/2 2 56×56×24
Bottleneck -/2 3 28×28×32
Bottleneck -/2 4 14×14×64
Bottleneck -/1 3 14×14×96
Bottleneck -/2 3 7×7×160
Bottleneck -/1 1 7×7×320
Convolution 1×1/1 1 7×7×1280
Average pool 7×7/- 1 1×1×1280
Convolution 1×1/1 - k
(h) DenseNet-201
Convolution 7×7/2 1 112×112
Max pool 3×3/2 1 56×56
Dense block (1) 1×1 6 56×56
3×3
Transition layer (1) 1×1 1 56×56
3×3/2, avg pool 28×28
Dense block (2) 1×1 12 28×28
3×3
Transition layer (2) 1×1 1 28×28
3×3/2, avg pool 14×14
Dense block (3) 1×1 48 14×14
3×3
Transition layer (3) 1×1 1 14×14
3×3/2, avg pool 7×7
Dense block (4) 1×1 32 7×7
3×3
Average pool 7×7 1 1×1
FC 1000
Softmax -
(i) Xception
Convolution 3×3×32/2 1 149×149×32
3×3×64 147×147×64
Separable convolution 3×3×128/1 2 147×147×128
Max pool 3×3/2 1 74×74×128
Separable convolution 3×3×256/1 2 74×74×256
Max pool 3×3/2 1 37×37×256
Separable convolution 3×3×728/1 2 37×37×728
Max pool 3×3/2 1 19×19×728
Separable convolution 3×3×728/1 24 19×19×728
Separable convolution 3×3×728/1 1 19×19×728
3×3×1024/1 19×19×1024
Max pool 3×3/2 1 10×10×1024
Separable convolution 3×3×1536/1 1 10×10×1536
3×3×2048/1 10×10×2048
Average pool - - 1×1×2048
Optional FC - - 1×1×1000
Logistic regression - - 1×1×1000

(a) In ResNet-18, the convolution layers 2 to 5 contain 2 successive convolutions instead of 3 and each is repeated only 2 times
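
As a quick arithmetic check of panel (a) and the note above, the weighted layers can be tallied from the Depth column. The sketch below is illustrative only: the function name `resnet_depth` and the way the stages are encoded are assumptions made for this example, not part of the benchmarked implementation.

```python
def resnet_depth(convs_per_block, repeats):
    """Count weighted layers: stem conv + stage convs + fully connected layer."""
    stem = 1  # the initial 7x7 convolution
    stages = sum(convs_per_block * r for r in repeats)  # convs in layers 2 to 5
    fc = 1  # the final fully connected classifier
    return stem + stages + fc


# ResNet-50: 3 convolutions per block, repeated 3, 4, 6 and 3 times (panel a)
resnet50 = resnet_depth(3, [3, 4, 6, 3])
# ResNet-18: 2 convolutions per block, each repeated 2 times (note a)
resnet18 = resnet_depth(2, [2, 2, 2, 2])

print(resnet50, resnet18)  # 50 18
```

Pooling and softmax carry no weights, so they are left out of the count.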

(b) InceptionV3 incorporates factorized 7×7 convolutions and takes in different sized inputs. InceptionResNetV2 incorporates residual networks with factorized convolutions, as well as several reduction blocks
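
To illustrate why factorizing a 7×7 convolution is attractive, the sketch below compares weight counts, assuming the common factorization into a 1×7 convolution followed by a 7×1 convolution. The helper `conv_weights` and the channel width of 128 are hypothetical values chosen for this example (biases are ignored); they are not taken from the benchmarked networks.

```python
def conv_weights(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in a 2-D convolution, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out


channels = 128  # hypothetical width, for illustration only

# One full 7x7 convolution
full = conv_weights(7, 7, channels, channels)
# Assumed factorization: a 1x7 convolution followed by a 7x1 convolution
factorized = (conv_weights(1, 7, channels, channels)
              + conv_weights(7, 1, channels, channels))

print(full, factorized, full / factorized)  # 802816 229376 3.5
```

Under these assumptions the factorized pair needs 14 weights per channel pair instead of 49, a 3.5-fold reduction regardless of the channel width.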

(c) In VGG-19, the 3rd, 4th, and 5th convolution layers each consist of four 3×3 convolutions instead of 3 successive convolutions
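
The names VGG-16 and VGG-19 follow from the same kind of tally. The sketch below is a minimal check, assuming the three final weighted layers are the two 4096-unit fully connected layers plus the 1000-way output layer listed in panel (c); the function name `vgg_depth` is introduced here for illustration only.

```python
def vgg_depth(convs_per_stage, fc_layers=3):
    """Count weighted layers: all convolutions plus the fully connected layers."""
    return sum(convs_per_stage) + fc_layers


# VGG-16: 2, 2, 3, 3, 3 convolutions in the five convolution layers (panel c)
vgg16 = vgg_depth([2, 2, 3, 3, 3])
# VGG-19: the 3rd, 4th and 5th layers hold 4 convolutions each (note c)
vgg19 = vgg_depth([2, 2, 4, 4, 4])

print(vgg16, vgg19)  # 16 19
```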

(e) DarkNet-19 has no residual connections; only max pooling layers reduce the spatial size of the feature maps
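
For contrast with DarkNet-19, the sketch below traces panel (e): DarkNet-53 halves the feature map with five stride-2 convolutions rather than pooling, and its name follows from the layer tally. The variable names are illustrative assumptions; the repeat counts and the 256×256 input size are read from the Depth and Output size columns.

```python
# Repeats of each residual block (a 1x1 conv followed by a 3x3 conv), panel (e)
residual_repeats = [1, 2, 8, 8, 4]

size = 256  # spatial size after the first 3x3 stride-1 convolution
sizes = []
for _ in residual_repeats:
    size //= 2  # each stage starts with a 3x3 stride-2 convolution
    sizes.append(size)

depth = (
    1                            # initial 3x3x32 convolution
    + len(residual_repeats)      # five stride-2 downsampling convolutions
    + 2 * sum(residual_repeats)  # two convolutions per residual block
    + 1                          # final fully connected layer
)

print(sizes, depth)  # [128, 64, 32, 16, 8] 53
```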