Table 2.
Tests of different tricks (and their combinations) for performance enhancement on the optimal basic network (YOLOv5-x).
| Step | Objectives | Tricks (combinations) | mAP@0.5 | Inclusion in the BALFilter Reader |
|---|---|---|---|---|
| 1 | Improving the intersection between the prediction and annotation bounding boxes | GIoU loss | 92.1% | No |
| | | DIoU loss | 91.3% | No |
| | | CIoU loss | 93.0% | Yes |
| 2 | Data augmentation to improve the generalization of the basic network | CIoU loss & image flip | 93.7% | No |
| | | CIoU loss & image flip & mosaic | 95.0% | No |
| | | CIoU loss & image flip & mixup | 92.0% | No |
| | | CIoU loss & image flip & mosaic & HSV augmentation | 95.7% | Yes |
| 3 | Improved recognition of poorly distinguishable objects | CIoU loss & image flip & mosaic & HSV augmentation & focal loss | 94.7% | No |
| 4 | Deduplication of multiple bounding boxes | CIoU loss & image flip & mosaic & HSV augmentation & default-NMS | 95.7% | Yes |
| | | CIoU loss & image flip & mosaic & HSV augmentation & soft-NMS | 95.7% | No |
| | | CIoU loss & image flip & mosaic & HSV augmentation & weighted-NMS | 95.7% | No |
| 5 | Overall performance improvement via multiple rounds of inference | **CIoU loss & image flip & mosaic & HSV augmentation & default-NMS & TTA (NMS)** | **96.2%** | Yes |
Bold marks the selected combination of tricks that gives the best performance (highest mAP@0.5), which is also the focus of this work.
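
Step 1 of the table compares three bounding-box regression losses. As a rough illustration of how they differ, the sketch below computes IoU, GIoU, DIoU, and CIoU for a pair of axis-aligned boxes using the commonly published definitions; the corresponding loss is `1 - value` in each case. The function name `iou_family`, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions made for illustration and are not taken from this work or from the YOLOv5 code base.

```python
import math


def iou_family(box_a, box_b):
    """Return IoU, GIoU, DIoU and CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper: assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: overlap area divided by union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / union

    # GIoU: penalize the empty part of the smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area

    # DIoU: penalize the squared center distance, normalized by the
    # squared diagonal of the enclosing box.
    rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2 + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4
    c2 = cw ** 2 + ch ** 2
    diou = iou - rho2 / c2

    # CIoU: DIoU plus an aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan(bw / bh) - math.atan(aw / ah)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    ciou = diou - alpha * v

    return {"iou": iou, "giou": giou, "diou": diou, "ciou": ciou}


if __name__ == "__main__":
    prediction = (1.0, 1.0, 3.0, 3.0)   # illustrative predicted box
    annotation = (0.0, 0.0, 2.0, 2.0)   # illustrative ground-truth box
    for name, value in iou_family(prediction, annotation).items():
        print(f"{name}: {value:.4f}  (loss = {1 - value:.4f})")
```

For two non-overlapping boxes the plain IoU is zero regardless of how far apart they are, whereas GIoU, DIoU, and CIoU still vary with the distance between the boxes, so their losses remain informative in that case.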