2020 Mar 27;20(7):1861. doi: 10.3390/s20071861

Table 5.

Network structure description of different trials.

Model	Structure Description
YOLO-LITE	YOLO-LITE raw network structure [19], as shown in Table 1
YOLOv3	YOLOv3 raw network structure [14], as shown in Figure 1
MobileNetV1-YOLOv3	Backbone uses MobileNetV1 while using YOLOv3 detector part
MobileNetV2-YOLOv3	Backbone uses MobileNetV2 while using YOLOv3 detector part
Trial 1	All convolution layers in YOLOv3 were replaced by depthwise separable convolutions, and the number of ResBlocks in Darknet53 was changed from 1-2-8-8-4 to 1-2-4-6-4.
Trial 2	The detector part of Trial 1 was reduced by one convolution layer.
Trial 3	The number of ResBlocks in the backbone network of Trial 2 was reduced from 1-2-4-6-4 to 1-1-1-1-1.
Trial 4	A parallel structure was added based on Trial 2, the resolution was reconstructed using a 1 × 1 convolutional kernel, and the channel was fused using a 3 × 3 convolutional kernel after the connection.
Trial 5	Based on Trial 4, the number of ResBlocks in the backbone network was changed to 1-1-2-4-2, and the resolution was reconstructed using a 3 × 3 convolutional kernel.
Trial 6	A parallel structure was added based on YOLOv3, which used a 1 × 1 ordinary convolution.
Trial 7	All convolutions in Trial 6 were replaced by depthwise separable convolutions.
Trial 8	The region was exactly the same as that of YOLOv3, and the last feature-extraction layer of the backbone was made wider.
Trial 9	The backbone was exactly the same as that in Trial 8, and each of the three region levels was reduced by two layers.
Trial 10	Each of the three region levels was reduced by two layers, the region was narrowed simultaneously, and the backbone was exactly the same as that of YOLO-LITE.
Trial 11	The backbone was exactly the same as that in Trial 8, and each of the three region levels was reduced by four layers.
Trial 12	The backbone was exactly the same as that of YOLO-LITE, each of the three region levels was reduced by four layers, and the region was narrowed simultaneously (that is, each region level was reduced by a further two layers relative to Trial 10).
Trial 13	Three ResBlocks were added based on Trial 12.
Trial 14	Three HR structures were added based on Trial 12.
Trial 15	Based on Trial 14, the downsampling method was changed from strided convolution to max pooling, and a convolution layer was added after the downsampling.
Trial 16	The convolution kernel of the last layer of HR was changed from 1 × 1 to 3 × 3 based on Trial 15.
Trial 17	Three ResBlocks were added to Trial 15.
Trial 18	Nine layers of inverted-bottleneck ResBlocks were added to Trial 15.
Trial 19	Based on Trial 18, one 3 × 3 convolution layer was added to each output layer of the HR structure, for a total of three added layers.
Trial 20	The number of ResBlocks per part was adjusted to three, based on Trial 17.
Trial 21	The last ResBlock was moved forward to reduce the number of channels, based on Trial 20.
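
Two of the changes in Table 5 are easy to quantify: the per-stage ResBlock counts and the replacement of ordinary convolutions with depthwise separable ones. The following is a minimal sketch, not the authors' code. The ResBlock tuples are copied from the table; the helper names and the 256-to-512-channel layer size are hypothetical and chosen only for illustration, and bias terms are ignored.

```python
# ResBlock counts per backbone stage, copied from Table 5.
RESBLOCKS = {
    "YOLOv3 (Darknet53)": (1, 2, 8, 8, 4),
    "Trial 1": (1, 2, 4, 6, 4),
    "Trial 3": (1, 1, 1, 1, 1),
    "Trial 5": (1, 1, 2, 4, 2),
}


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    for name, stages in RESBLOCKS.items():
        print(f"{name}: {sum(stages)} ResBlocks in total")

    # Hypothetical layer size, not taken from the paper.
    c_in, c_out, k = 256, 512, 3
    ordinary = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    print(f"ordinary: {ordinary}, separable: {separable}, "
          f"ratio: {ordinary / separable:.2f}")
```

For the assumed 3 × 3 layer, the ordinary convolution has 1,179,648 weights and the depthwise separable version has 133,376, a reduction of roughly 8.8 times. The ResBlock totals drop from 23 in Darknet53 to 17 in Trial 1, 10 in Trial 5, and 5 in Trial 3.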