Table 6.
Layer | Patch size/stride | Depth | Output size |
---|---|---|---|
(a) ResNet-50 | |||
Convolution | 7×7×64/2 | 1 | 112×112 |
Max pool | 3×3/2 | 1 | 56×56 |
Convolution | 1×1×64 | 3 | 56×56 |
 | 3×3×64 | | |
 | 1×1×256 | | |
Convolution | 1×1×128 | 4 | 28×28 |
 | 3×3×128 | | |
 | 1×1×512 | | |
Convolution | 1×1×256 | 6 | 14×14 |
 | 3×3×256 | | |
 | 1×1×1024 | | |
Convolution | 1×1×512 | 3 | 7×7 |
 | 3×3×512 | | |
 | 1×1×2048 | | |
Average pool | - | 1 | 1×1000 |
Fully connected | |||
Softmax | |||
(b) GoogleNet | |||
Convolution | 7×7/2 | 1 | 112×112×64 |
Max pool | 3×3/2 | 0 | 56×56×64 |
Convolution | 3×3/1 | 2 | 56×56×192 |
Max pool | 3×3/2 | 0 | 28×28×192 |
Inception(3a) | - | 2 | 28×28×256 |
Inception(3b) | - | 2 | 28×28×480 |
Max pool | 3×3/2 | 0 | 14×14×480 |
Inception(4a) | - | 2 | 14×14×512 |
Inception(4b) | - | 2 | 14×14×512 |
Inception(4c) | - | 2 | 14×14×512 |
Inception(4d) | - | 2 | 14×14×528 |
Inception(4e) | - | 2 | 14×14×832 |
Max pool | 3×3/2 | 0 | 7×7×832 |
Inception(5a) | - | 2 | 7×7×832 |
Inception(5b) | - | 2 | 7×7×1024 |
Average pool | 7×7/1 | 0 | 1×1×1024 |
Dropout(40%) | - | 0 | 1×1×1024 |
Linear | - | 1 | 1×1×1000 |
Softmax | - | 0 | 1×1×1000 |
(c) VGG-16 | |||
Convolution | 3×3×64/1 | 2 | 224×224×64 |
Max pool | 3×3/2 | 1 | 112×112×64 |
Convolution | 3×3×128/1 | 2 | 112×112×128 |
Max pool | 3×3/2 | 1 | 56×56×128 |
Convolution | 3×3×256/1 | 2 | 56×56×256 |
 | 1×1×256/1 | 1 | |
Max pool | 3×3/2 | 1 | 28×28×256 |
Convolution | 3×3×512/1 | 2 | 28×28×512 |
 | 1×1×512/1 | 1 | |
Max pool | 3×3/2 | 1 | 14×14×512 |
Convolution | 3×3×512/1 | 2 | 14×14×512 |
 | 1×1×512/1 | 1 | |
Max pool | 3×3/2 | 1 | 7×7×512 |
Fully connected | - | 2 | 1×4096 |
Softmax | - | 1 | 1×1000 |
(d) AlexNet | |||
Convolution | 11×11/4 | 1 | 55×55×96 |
Max pool | 3×3/2 | 1 | 27×27×96 |
Convolution | 5×5/1 | 1 | 27×27×256 |
Max pool | 3×3/2 | 1 | 13×13×256 |
Convolution | 3×3/1 | 1 | 13×13×384 |
Convolution | 3×3 | 1 | 13×13×384 |
Convolution | 3×3 | 1 | 13×13×256 |
Max pool | 3×3/2 | 1 | 6×6×256 |
Fully connected | - | 2 | 1×4096 |
Softmax | - | 1 | 1×1000 |
(e) DarkNet-53 | |||
Convolution | 3×3×32/1 | 1 | 256×256×32 |
Convolution | 3×3×64/2 | 1 | 128×128×64 |
Convolution | 1×1×32/1 | 1 | 128×128 |
Convolution | 3×3×64/1 | ||
Residual | - | ||
Convolution | 3×3×128/2 | 1 | 64×64 |
Convolution | 1×1×64/1 | 2 | 64×64 |
Convolution | 3×3×128/1 | ||
Residual | - | ||
Convolution | 3×3×256/2 | 1 | 32×32 |
Convolution | 1×1×128/1 | 8 | 32×32 |
Convolution | 3×3×256/1 | ||
Residual | - | ||
Convolution | 3×3×512/2 | 1 | 16×16 |
Convolution | 1×1×256/1 | 8 | 16×16 |
Convolution | 3×3×512/1 | ||
Residual | - | ||
Convolution | 3×3×1024/2 | 1 | 8×8 |
Convolution | 1×1×512/1 | 4 | 8×8 |
Convolution | 3×3×1024/1 | ||
Residual | - | ||
Average pool | - | 1 | 1×1000 |
FC | |||
Softmax | |||
(f) ShuffleNet | |||
Convolution | 3×3/2 | 2 | 112×112 |
Max pool | 3×3/2 | 2 | 56×56 |
Stage 2 | -/2 | 1 | 28×28 |
 | -/1 | 3 | 28×28 |
Stage 3 | -/2 | 1 | 14×14 |
 | -/1 | 7 | 14×14 |
Stage 4 | -/2 | 1 | 7×7 |
 | -/1 | 3 | 7×7 |
Global pool | 7×7 | - | 1×1 |
FC | - | - | 1×1000 |
(g) MobileNetV2 | |||
Convolution | 3×3/2 | 1 | 112×112×32 |
Bottleneck | -/1 | 1 | 112×112×16 |
Bottleneck | -/2 | 2 | 56×56×24 |
Bottleneck | -/2 | 3 | 28×28×32 |
Bottleneck | -/2 | 4 | 14×14×64 |
Bottleneck | -/1 | 3 | 14×14×96 |
Bottleneck | -/2 | 3 | 7×7×160 |
Bottleneck | -/1 | 1 | 7×7×320 |
Convolution | 1×1/1 | 1 | 7×7×1280 |
Average pool | 7×7/- | 1 | 1×1×1280 |
Convolution | 1×1/1 | - | k |
(h) DenseNet-201 | |||
Convolution | 7×7/2 | 1 | 112×112 |
Max pool | 3×3/2 | 1 | 56×56 |
Dense block (1) | 1×1 | 6 | 56×56 |
 | 3×3 | | |
Transition layer (1) | 1×1 | 1 | 56×56 |
 | 3×3/2, avg pool | | 28×28 |
Dense block (2) | 1×1 | 12 | 28×28 |
 | 3×3 | | |
Transition layer (2) | 1×1 | 1 | 28×28 |
 | 3×3/2, avg pool | | 14×14 |
Dense block (3) | 1×1 | 48 | 14×14 |
 | 3×3 | | |
Transition layer (3) | 1×1 | 1 | 14×14 |
 | 3×3/2, avg pool | | 7×7 |
Dense block (4) | 1×1 | 32 | 7×7 |
 | 3×3 | | |
Average pool | 7×7 | 1 | 1×1 |
FC | 1000 | ||
Softmax | - | ||
(i) Xception | |||
Convolution | 3×3×32/2 | 1 | 149×149×32 |
 | 3×3×64 | | 147×147×64 |
Separable convolution | 3×3×128/1 | 2 | 147×147×128 |
Max pool | 3×3/2 | 1 | 74×74×128 |
Separable convolution | 3×3×256/1 | 2 | 74×74×256 |
Max pool | 3×3/2 | 1 | 37×37×256 |
Separable convolution | 3×3×728/1 | 2 | 37×37×728 |
Max pool | 3×3/2 | 1 | 19×19×728 |
Separable convolution | 3×3×728/1 | 24 | 19×19×728 |
Separable convolution | 3×3×728/1 | 1 | 19×19×728 |
 | 3×3×1024/1 | | 19×19×1024 |
Max pool | 3×3/2 | 1 | 10×10×1024 |
Separable convolution | 3×3×1536/1 | 1 | 10×10×1536 |
 | 3×3×2048/1 | | 10×10×2048 |
Average pool | - | - | 1×1×2048 |
Optional FC | - | - | 1×1×1000 |
Logistic regression | - | - | 1×1×1000 |
(a) In ResNet-18, convolution layers 2 to 5 each contain 2 successive convolutions instead of 3, and each is repeated only 2 times.
(b) InceptionV3 incorporates factorized 7×7 convolutions and accepts inputs of different sizes. InceptionResNetV2 combines residual networks with factorized convolutions and adds several reduction blocks.
(c) In VGG-19, the 3rd, 4th, and 5th convolution layers use only 3×3 convolutions, each repeated 4 times, instead of 3 successive convolutions.
(e) DarkNet-19 has no residual networks; only max pooling layers reduce the image size.
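
The output sizes and depths listed in Table 6 can be checked with simple arithmetic. The sketch below is illustrative only: it applies the standard floor-convention formula for the spatial output size of a convolution or pooling layer, out = floor((in + 2p − k) / s) + 1, to the AlexNet rows in part (d), and counts weighted layers for ResNet-50 and for ResNet-18 as described in note (a). The 227×227 input size and the padding values are assumptions (the table does not state them), and the helper names `conv_out`, `trace_sizes`, and `resnet_depth` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer, floor convention."""
    return (size + 2 * padding - kernel) // stride + 1


# AlexNet rows from part (d): (name, kernel, stride, padding).
# Kernel sizes and strides come from the table; paddings are ASSUMED.
ALEXNET_LAYERS = [
    ("conv 11x11/4", 11, 4, 0),
    ("max pool 3x3/2", 3, 2, 0),
    ("conv 5x5/1", 5, 1, 2),
    ("max pool 3x3/2", 3, 2, 0),
    ("conv 3x3/1", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("max pool 3x3/2", 3, 2, 0),
]


def trace_sizes(input_size: int, layers) -> list:
    """Return the spatial size after each layer, in order."""
    sizes = []
    size = input_size
    for _name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def resnet_depth(blocks_per_stage, convs_per_block: int) -> int:
    """Weighted layers: stem conv + all block convs + final fully connected."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


if __name__ == "__main__":
    # ASSUMED input size of 227x227 for AlexNet.
    print(trace_sizes(227, ALEXNET_LAYERS))
    # ResNet-50: blocks repeated 3, 4, 6, 3 times with 3 convolutions each.
    print(resnet_depth([3, 4, 6, 3], 3))
    # ResNet-18 per note (a): 2 convolutions per block, 2 repeats per stage.
    print(resnet_depth([2, 2, 2, 2], 2))
```

Under these assumptions the traced AlexNet sizes are 55, 27, 27, 13, 13, 13, 13, 6, matching the spatial dimensions in part (d), and the layer counts come out to 50 and 18, matching the network names.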