YOLO-LITE |
Original YOLO-LITE network structure [19], as shown in Table 1 |
YOLOv3 |
Original YOLOv3 network structure [14], as shown in Figure 1 |
MobileNetV1-YOLOv3 |
MobileNetV1 backbone combined with the YOLOv3 detector part |
MobileNetV2-YOLOv3 |
MobileNetV2 backbone combined with the YOLOv3 detector part |
Trial 1 |
All convolution layers in YOLOv3 were replaced by depthwise separable convolutions, and the number of ResBlocks in Darknet53 was changed from 1-2-8-8-4 to 1-2-4-6-4. |
Trial 2 |
One convolution layer was removed from the detector part of Trial 1. |
Trial 3 |
The number of ResBlocks in the backbone network of Trial 2 was reduced from 1-2-4-6-4 to 1-1-1-1-1. |
Trial 4 |
A parallel structure was added to Trial 2; the resolution was reconstructed using a 1 × 1 convolutional kernel, and the channels were fused using a 3 × 3 convolutional kernel after the connection. |
Trial 5 |
Based on Trial 4, the number of ResBlocks in the backbone network was changed to 1-1-2-4-2, and the resolution was reconstructed using a 3 × 3 convolutional kernel. |
Trial 6 |
A parallel structure using a 1 × 1 ordinary convolution was added to YOLOv3. |
Trial 7 |
All convolutions in Trial 6 were replaced by depthwise separable convolutions. |
Trial 8 |
The region was exactly the same as that of YOLOv3, and the last feature-extraction layer of the backbone was made wider. |
Trial 9 |
The backbone was exactly the same as that in Trial 8, and each of the three region levels was reduced by two layers. |
Trial 10 |
The backbone was exactly the same as that of YOLO-LITE; each of the three region levels was reduced by two layers, and the region was narrowed at the same time. |
Trial 11 |
The backbone was exactly the same as that in Trial 8, and each of the three region levels was reduced by four layers. |
Trial 12 |
The backbone was exactly the same as that of YOLO-LITE; each of the three region levels was reduced by four layers, and the region was narrowed at the same time (that is, each region level was reduced by two further layers relative to Trial 10). |
Trial 13 |
Three ResBlocks were added based on Trial 12. |
Trial 14 |
Three HR structures were added based on Trial 12. |
Trial 15 |
Based on Trial 14, the downsampling method was changed from strided convolution to max pooling, and one convolution layer was added after the downsampling. |
Trial 16 |
The convolution kernel of the last layer of HR was changed from 1 × 1 to 3 × 3 based on Trial 15. |
Trial 17 |
Three ResBlocks were added to Trial 15. |
Trial 18 |
Nine layers of inverted-bottleneck ResBlocks were added to Trial 15. |
Trial 19 |
Based on Trial 18, one 3 × 3 convolution layer was added to each output layer of the HR structure, for a total of three layers. |
Trial 20 |
The number of ResBlocks per part was adjusted to three, based on Trial 17. |
Trial 21 |
Based on Trial 20, the last ResBlock was moved forward to reduce the number of channels. |
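
Two of the changes listed above lend themselves to a quick numerical illustration: replacing standard convolutions with depthwise separable ones (Trials 1 and 7) and changing the per-stage ResBlock counts (Trials 1, 3, and 5). The following is a minimal sketch rather than the code used in the trials. The function names, the dictionary, and the 256-to-512 channel widths are hypothetical and chosen only for illustration, and bias and batch-normalization parameters are ignored.

```python
# Illustrative sketch only: helper names and channel widths are assumptions,
# not values taken from the trials.

def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# ResBlock counts per backbone stage, as stated in the trial descriptions.
RESBLOCK_CONFIGS = {
    "YOLOv3 (Darknet53)": (1, 2, 8, 8, 4),
    "Trial 1": (1, 2, 4, 6, 4),
    "Trial 3": (1, 1, 1, 1, 1),
    "Trial 5": (1, 1, 2, 4, 2),
}


def total_resblocks(config):
    """Total number of ResBlocks across all backbone stages."""
    return sum(config)


if __name__ == "__main__":
    c_in, c_out = 256, 512  # hypothetical layer widths
    std = standard_conv_params(c_in, c_out)
    sep = depthwise_separable_params(c_in, c_out)
    print(f"standard 3x3: {std} weights")
    print(f"depthwise separable 3x3: {sep} weights")
    print(f"reduction factor: {std / sep:.2f}")
    for name, config in RESBLOCK_CONFIGS.items():
        print(f"{name}: {total_resblocks(config)} ResBlocks")
```

For a 3 × 3 kernel the saving approaches a factor of nine as the number of output channels grows, which is why the substitution is attractive for lightweight detectors; the effect on accuracy has to be measured separately.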