Table 2. CNN architecture used for feature extraction and detection.
Layer | Type | Filters | Kernel size |
---|---|---|---|
Feature extraction | | | |
1 | Convolutional | 32 | 3 × 3 |
Max-pooling | | | |
2 | Convolutional | 64 | 3 × 3 |
Max-pooling | | | |
3 | Convolutional | 128 | 3 × 3 |
4 | Convolutional | 64 | 1 × 1 |
5 | Convolutional | 128 | 3 × 3 |
Max-pooling | | | |
6 | Convolutional | 256 | 3 × 3 |
7 | Convolutional | 128 | 1 × 1 |
8 | Convolutional | 256 | 3 × 3 |
Max-pooling | | | |
9 | Convolutional | 512 | 3 × 3 |
10 | Convolutional | 256 | 1 × 1 |
11 | Convolutional | 512 | 3 × 3 |
12 | Convolutional | 256 | 1 × 1 |
13 | Convolutional | 512 | 3 × 3 |
Max-pooling | | | |
14 | Convolutional | 1024 | 3 × 3 |
15 | Convolutional | 512 | 1 × 1 |
16 | Convolutional | 1024 | 3 × 3 |
17 | Convolutional | 512 | 1 × 1 |
18 | Convolutional | 1024 | 3 × 3 |
19 | Convolutional | 1024 | 3 × 3 |
20 | Convolutional | 1024 | 3 × 3 |
21 | Convolutional | 1024 | 3 × 3 |
Detection | | | |
Dropout (0.2) | | | |
1 | Convolutional | 6 | 1 × 1 |
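To make the layer stack in Table 2 concrete, the sketch below encodes it as a plain Python list and traces the feature-map shape and parameter count through it. The table does not give the input resolution, padding, strides or pooling window, so the code assumes a 416 × 416 × 3 input, "same" padding with stride 1 for every convolution, 2 × 2 max-pooling with stride 2, and plain convolutions with biases (no batch normalization). The names `Conv`, `MaxPool`, `FEATURES`, `DETECTION` and `trace` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    filters: int
    kernel: int  # square kernel size, e.g. 3 for a 3 x 3 kernel


@dataclass(frozen=True)
class MaxPool:
    pass


# Layer sequence transcribed from Table 2 (feature extraction part).
FEATURES = [
    Conv(32, 3), MaxPool(),
    Conv(64, 3), MaxPool(),
    Conv(128, 3), Conv(64, 1), Conv(128, 3), MaxPool(),
    Conv(256, 3), Conv(128, 1), Conv(256, 3), MaxPool(),
    Conv(512, 3), Conv(256, 1), Conv(512, 3), Conv(256, 1), Conv(512, 3), MaxPool(),
    Conv(1024, 3), Conv(512, 1), Conv(1024, 3), Conv(512, 1), Conv(1024, 3),
    Conv(1024, 3), Conv(1024, 3), Conv(1024, 3),
]

# Detection head; dropout is omitted because it has no weights and keeps the shape.
DETECTION = [Conv(6, 1)]


def trace(layers, height, width, channels):
    """Return the output shape (H, W, C) and the number of weights plus biases."""
    params = 0
    for layer in layers:
        if isinstance(layer, Conv):
            # Assumed "same" padding and stride 1: spatial size is unchanged.
            params += layer.kernel * layer.kernel * channels * layer.filters + layer.filters
            channels = layer.filters
        else:
            # Assumed 2 x 2 pooling with stride 2: each spatial dimension halves.
            height, width = height // 2, width // 2
    return (height, width, channels), params


shape, n_params = trace(FEATURES + DETECTION, 416, 416, 3)
print(shape, n_params)
```

Under these assumptions, `trace(FEATURES + DETECTION, 416, 416, 3)` returns an output shape of `(13, 13, 6)`: the five pooling layers reduce each spatial dimension by a factor of 32, and the final 1 × 1 convolution maps the 1024 feature channels to 6 output channels per grid cell.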
The CNN consists of 21 convolutional layers and 5 max-pooling layers for feature extraction. For detection, a single 1 × 1 convolutional layer with 6 filters is used, preceded by a dropout layer (0.2) to reduce overfitting. The dropout layer is active only during training and is disabled at prediction time.
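The train/predict distinction for the dropout layer can be illustrated with a minimal inverted-dropout sketch in pure Python. It assumes that 0.2 in Table 2 is the probability of dropping an activation (the text gives only the number) and uses inverted scaling, a common convention that the source does not confirm; the function name `dropout` and its parameters are hypothetical.

```python
import random


def dropout(values, rate=0.2, training=True, rng=None):
    """Inverted dropout over a flat list of activations (hypothetical helper).

    During training, each value is zeroed with probability `rate` and the
    surviving values are scaled by 1 / (1 - rate), so the expected activation
    is unchanged. At prediction time (training=False) the input is returned
    unchanged, mirroring a dropout layer that is active only during training.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # rng.random() is uniform on [0, 1), so a value is dropped with probability `rate`.
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 10
print(dropout(activations, training=False))  # prediction: identical to the input
print(dropout(activations, training=True, rng=random.Random(0)))  # training: each value is 0.0 or 1 / 0.8
```

Because surviving activations are rescaled by 1 / (1 - rate) during training, the layer can simply be bypassed at prediction time without changing the expected magnitude of the features passed to the detection layer.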