2019 Jun 19;2:218. doi: 10.1038/s42003-019-0437-z

Table 2.

YOLO network architecture

Layer  Type            Filters  Size

Feature extraction
1      Convolutional   32       3 × 3
       Max-pooling
2      Convolutional   64       3 × 3
       Max-pooling
3      Convolutional   128      3 × 3
4      Convolutional   64       1 × 1
5      Convolutional   128      3 × 3
       Max-pooling
6      Convolutional   256      3 × 3
7      Convolutional   128      1 × 1
8      Convolutional   256      3 × 3
       Max-pooling
9      Convolutional   512      3 × 3
10     Convolutional   256      1 × 1
11     Convolutional   512      3 × 3
12     Convolutional   256      1 × 1
13     Convolutional   512      3 × 3
       Max-pooling
14     Convolutional   1024     3 × 3
15     Convolutional   512      1 × 1
16     Convolutional   1024     3 × 3
17     Convolutional   512      1 × 1
18     Convolutional   1024     3 × 3
19     Convolutional   1024     3 × 3
20     Convolutional   1024     3 × 3
21     Convolutional   1024     3 × 3

Detection
       Dropout (0.2)
1      Convolutional   6        1 × 1

The CNN consists of 21 convolutional and 5 max-pooling layers for feature extraction. For detection, a single convolutional layer is used, preceded by a dropout layer to reduce overfitting. The dropout layer is active only during training, not during prediction.
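As a rough illustration, the layer sequence in Table 2 can be transcribed into a plain Python list and checked against the counts given in the caption. This is only a sketch, not the authors' implementation. The names `FEATURE_LAYERS`, `DETECTION_LAYERS`, `count_layers`, `output_grid`, and `dropout` are hypothetical. The table does not state the pooling window or stride, so the halving of the feature map at each max-pooling layer is an assumption, as is the use of inverted dropout (rescaling the kept activations during training).

```python
import random

# Transcribed from Table 2: ("conv", filters, kernel_size) or ("pool",).
FEATURE_LAYERS = [
    ("conv", 32, 3), ("pool",),
    ("conv", 64, 3), ("pool",),
    ("conv", 128, 3), ("conv", 64, 1), ("conv", 128, 3), ("pool",),
    ("conv", 256, 3), ("conv", 128, 1), ("conv", 256, 3), ("pool",),
    ("conv", 512, 3), ("conv", 256, 1), ("conv", 512, 3),
    ("conv", 256, 1), ("conv", 512, 3), ("pool",),
    ("conv", 1024, 3), ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 1024, 3), ("conv", 1024, 3), ("conv", 1024, 3),
]

# Detection head from Table 2: dropout with rate 0.2, then a 1 x 1 convolution
# with 6 filters.
DETECTION_LAYERS = [("dropout", 0.2), ("conv", 6, 1)]


def count_layers(layers):
    """Return (number of convolutional layers, number of max-pooling layers)."""
    convs = sum(1 for layer in layers if layer[0] == "conv")
    pools = sum(1 for layer in layers if layer[0] == "pool")
    return convs, pools


def output_grid(input_size, layers=FEATURE_LAYERS, pool_stride=2):
    """Side length of the final feature map for a square input.

    Assumption (not stated in the table): every max-pooling layer divides the
    spatial size by `pool_stride`, and the convolutions preserve it.
    """
    size = input_size
    for layer in layers:
        if layer[0] == "pool":
            size //= pool_stride
    return size


def dropout(values, rate=0.2, training=False, rng=random):
    """Apply dropout to a list of activations.

    During prediction (training=False) the values pass through unchanged.
    During training each value is zeroed with probability `rate`; the kept
    values are divided by (1 - rate). That rescaling is the common "inverted
    dropout" convention and is an assumption here.
    """
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    print(count_layers(FEATURE_LAYERS))  # (21, 5), matching the caption
    print(output_grid(416))              # 13 for a hypothetical 416-pixel input
```

Under the stride-2 assumption, the five max-pooling layers reduce the spatial resolution by a factor of 32 in total, so a hypothetical 416 × 416 input would yield a 13 × 13 grid at the detection layer, with 6 output channels per grid cell.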