Table 1.
The architecture of the Inception V3 model. Each row represents a layer of the network, and the input to each layer is the output of the previous layer. For the Inception layers, each row represents one parallel sub-layer; the outputs of the parallel sub-layers are concatenated before being passed to the subsequent layer. The list of parameters in each row represents the serial operations within that sub-layer. For the Inception 6 layer, the parameters in parentheses represent additional parallel sub-layers within the serial operations. The number of times each Inception layer is repeated is prepended to the layer name (e.g., 3× Inception 1).
| Type | Patch size/stride | Input size |
|---|---|---|
| Conv 1 | 3 × 3/2 | 300 × 300 × 1 |
| Conv 2 | 3 × 3/1 | 149 × 149 × 32 |
| Conv 3 | 3 × 3/1 | 147 × 147 × 32 |
| Max pool 1 | 3 × 3/2 | 147 × 147 × 64 |
| Conv 4 | 1 × 1/1 | 73 × 73 × 64 |
| Conv 5 | 3 × 3/1 | 73 × 73 × 80 |
| Max pool 2 | 3 × 3/2 | 71 × 71 × 192 |
| 3× Inception 1 | 1 × 1/1 | 35 × 35 × 192 |
| | 1 × 1/1, 3 × 3/1 | |
| | 1 × 1/1, 3 × 3/1, 3 × 3/1 | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Inception 2 | 3 × 3/2 | 35 × 35 × 288 |
| | Max pool 3 × 3/2 | |
| | 1 × 1/1, 3 × 3/1, 3 × 3/2 | |
| 4× Inception 3 | 1 × 1/1 | 17 × 17 × 768 |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1 | |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1, 1 × 7/1, 7 × 1/1 | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Auxiliary | Avg pool 5 × 5/3, 1 × 1/1, linear, softmax | 17 × 17 × 768 |
| Inception 5 | 1 × 1/1, 3 × 3/2 | 17 × 17 × 768 |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1, 3 × 3/2 | |
| | Max pool 3 × 3/2 | |
| 2× Inception 6 | 1 × 1/1 | 8 × 8 × 1280 |
| | 1 × 1/1, (1 × 3/1, 3 × 1/1) | |
| | 1 × 1/1, 3 × 3/1, (1 × 3/1, 3 × 1/1) | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Avg pool | 8 × 8/1 | 8 × 8 × 2048 |
| Output | Dropout, linear, softmax | 1 × 1 × 2048 |
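
The spatial sizes in the Input size column can be checked with the usual output-size arithmetic for convolution and pooling layers. The sketch below is illustrative only: the table does not state padding modes, so the `"valid"`/`"same"` choice for each layer is an assumption inferred from the listed sizes, and the helper names `conv_output_size`, `STEM`, and `stem_sizes` are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding="valid"):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == "valid":
        # No padding: only window positions that fit entirely inside the input.
        return (size - kernel) // stride + 1
    if padding == "same":
        # Input padded so that the output length is ceil(size / stride).
        return -(-size // stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


# (layer name, kernel size, stride, padding)
# Padding modes are ASSUMED, chosen to reproduce the sizes in the table.
STEM = [
    ("Conv 1", 3, 2, "valid"),
    ("Conv 2", 3, 1, "valid"),
    ("Conv 3", 3, 1, "same"),
    ("Max pool 1", 3, 2, "valid"),
    ("Conv 4", 1, 1, "valid"),
    ("Conv 5", 3, 1, "valid"),
    ("Max pool 2", 3, 2, "valid"),
]


def stem_sizes(input_size=300):
    """Return the spatial size before each stem layer and after the last one."""
    sizes = [input_size]
    for _name, kernel, stride, padding in STEM:
        sizes.append(conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes


if __name__ == "__main__":
    names = [name for name, *_ in STEM] + ["3x Inception 1"]
    for name, size in zip(names, stem_sizes(300)):
        print(f"{name:>15}: input {size} x {size}")
```

Under the same assumptions, the two stride-2 reduction layers follow the same rule: `conv_output_size(35, 3, 2)` gives 17 for Inception 2 and `conv_output_size(17, 3, 2)` gives 8 for Inception 5, and the final 8 × 8 average pool reduces the map to 1 × 1.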