Abstract
Marine litter poses a serious threat to marine ecosystems, and the timely removal of floating waste from inland waters is an effective way to prevent floating debris from entering the sea. An accurate object detection system is a prerequisite for efficiently clearing floaters. However, complex lighting conditions on the water, the small size of objects and other factors pose a substantial challenge for floating object detection. To facilitate the solution of the floating object pollution problem and promote the application of AI technology in the water industry, we present IWHR_AI_Lable_Floater_V1, the first floater dataset collected from real inland water scenarios using shore-based filming equipment. The dataset consists of 3000 images with accurate annotations to support vision-based water surface floater detection tasks. We conducted a number of baseline experiments to evaluate the performance of mainstream object detection algorithms on this dataset. The detection accuracy of every evaluated model, including the state-of-the-art YOLOv9, remains low, indicating that floating object detection is a challenging task.
Subject terms: Hydrology, Environmental impact, Computer science
Background & Summary
In recent years, the problem of marine litter, which includes plastics, metals, textiles, wood and a variety of organic and inorganic substances, has attracted increasingly widespread attention because of its negative impacts on marine ecosystems, human health and economic development1,2. Rivers serve as an important link between land and sea, and many of them discharge waste, chemicals and pollutants into the sea, adding to marine litter. Moreover, river debris tends to be carried into the oceans by currents and winds, making rivers one of the main sources of marine litter. Controlling and reducing the generation and discharge of river litter is therefore of great significance for mitigating marine litter pollution and protecting the marine ecosystem. As a common water pollutant, river floating debris, including discarded domestic rubbish, dead plants and animals, and industrial discharges, seriously harms the water environment and ecosystems: it degrades water quality, endangers aquatic organisms, obstructs rivers and threatens human health3. Timely detection of floating debris, together with effective measures for its treatment and prevention, is essential for improving water quality, protecting aquatic ecosystems and safeguarding human health.
As shown in Fig. 1(a), the cleaning of river floating debris currently relies mainly on manual labour. Because floating debris comes in a wide variety of types, is widely distributed and large in number, and is affected by water flow, seasonal changes and other factors, manual cleaning is laborious, inefficient and costly. In addition, fast-flowing water and complex terrain in some watersheds pose potential safety risks to cleaning workers. As public awareness of river environmental protection grows, the demand for river waste disposal is becoming increasingly urgent. In recent years, unmanned surface vehicles (USVs) have become popular because of their low maintenance costs, high mobility, long endurance, and ability to operate in dangerous areas that are difficult for humans to reach4–6. USVs equipped with floater capture devices can improve cleaning efficiency and reduce labour costs, while also reducing the impact of human activity on the environment. As shown in Fig. 1(b), an accurate floater detection system is indispensable for USVs to capture floaters efficiently and reliably. With the development of computer vision, it has gradually become feasible for USVs to identify floating objects reliably, accurately and in real time.
Fig. 1.
Methods of cleaning up floaters. (a) Manual cleaning, (b) USV cleaning.
When machine vision is used to detect floating objects in rivers, objects such as plastic bottles and beverage cans are often very small, so their features are easily lost and they are difficult to locate and identify. Small objects occupy few pixels, provide few usable features, and demand higher localization accuracy, all of which increase the difficulty of accurate detection7. In addition, the complex water environment brings many challenges to machine vision-based detection methods: under dark or strongly reflective lighting conditions, the outline and colour of a floater may not be accurately identifiable, and reflections of shore objects on the water surface further complicate floating object detection. Examples of these problems are illustrated in Fig. 2.
Fig. 2.
Challenges in floater detection. (a) Small objects, (b) Backlight, (c) Water reflections, (d) Reflection of objects on the shore.
Public datasets provide researchers and developers with large amounts of annotated data and experimental benchmarks, which can be used for algorithm validation, model training and performance evaluation, thereby advancing the development of related fields. To promote the application of AI in the water industry, we release IWHR_AI_Lable_Floater_V1 (10.6084/m9.figshare.27376851.v1)8, a floater dataset captured from the viewpoints of video surveillance cameras and mobile devices. The dataset was collected in 2022 from the section of the Grand Canal between Beiguan Barrage and Tongji Road Bridge, and the nearby Reduce Transportation Ditch, in Tongzhou District, Beijing. It contains 3000 annotated images of floating debris with a total of 23692 annotated objects, covering common household waste, including plastic bottles and foam boards, as well as floating debris such as water plants and algae. The dataset emphasizes authenticity and diversity, and also provides the basis for an integrated linkage technology between shore-based cameras and USVs, in which USVs automatically travel to clean up floating objects based on detections from shore-based cameras. Beyond object detection, the dataset can be used for floater tracking studies to characterize water flow and floater migration patterns, providing insight into the dynamics of floater distribution in inland waterways, which is an important reference for assessing water pollution and developing appropriate treatment measures.
Finally, we evaluate commonly used algorithms on our dataset. The contributions of this paper can be summarized as follows:
To the best of our knowledge, IWHR_AI_Lable_Floater_V1 is the first floater dataset collected in real inland water scenarios from the viewpoints of video surveillance cameras and mobile devices, providing a valuable resource for researchers and developers.
IWHR_AI_Lable_Floater_V1 contains a large number of images of small-sized floaters with diverse features and scenarios, which effectively supports research on small object detection in waters.
We conducted baseline experiments with several mainstream detectors.
Related work
Vision-based object detection
For water surface object detection, traditional algorithms rely on hand-crafted feature extraction methods, such as image processing9 and Gaussian mixture models10. Although these methods can achieve reasonable results under specific conditions, they are often sensitive to environmental changes and lighting conditions, and their detection accuracy and robustness are low.
With the rise of deep learning, especially the application of Convolutional Neural Networks (CNNs), the performance of surface object detection has improved dramatically11–13. Trained deep learning models can learn salient features in images, improving the accuracy and robustness of object detection. One-stage detectors such as the YOLO (You Only Look Once) series14–19, CenterNet20 and RetinaNet21 perform category prediction and bounding box regression directly on the input image, while two-stage detectors such as Faster R-CNN22 first generate candidate regions; both families are widely used in water surface object detection. Among them, YOLO achieves true real-time object detection, and after continuous iteration, YOLOv923 and YOLOv1024 were released in 2024, supporting object detection, instance/semantic segmentation and panoptic segmentation tasks.
In addition, image preprocessing25–27, such as image enhancement and dehazing, can effectively eliminate the impact of hazy weather on image quality, allowing detection algorithms to work stably under various weather conditions. For complex backgrounds such as reflections and ripples on the water surface, image enhancement techniques can improve image quality, highlight object features and improve detection accuracy.
Related datasets
In the area of waste detection, the computer vision research community has spared no effort in advancing research and has released a series of vision-based waste detection datasets. TrashNet28 is an image dataset for waste classification; it contains images of different types of waste such as paper, glass, metal and plastic, and can be used to train and evaluate waste classification models. TACO29, released in 2020, is an instance segmentation dataset for litter identification in natural environments, containing photographs of litter taken in settings ranging from tropical beaches to the streets of London, with new annotations continuously collected through its website (http://tacodataset.org/). Liu et al.30 created the Marine Image Dataset (MID), which contains both large and small obstacles and also provides ground-truth data for the horizon. The recently released FloW-Img31 is a dataset of floating objects on the water surface captured from the viewpoint of USVs, containing 2000 images and 5271 annotated objects. Based on these datasets, some vision-based floater detection work has been carried out3,32.
As shown in Fig. 3, although these datasets are useful for litter detection, they may not cover the full range of characteristics of floating debris in inland waterways, and their application scenarios differ considerably from those of inland waterways. The images in our released IWHR_AI_Lable_Floater_V1 were captured by shore-based filming equipment, providing researchers and developers with a valuable data resource for real-world scenarios.
Fig. 3.
Comparison of the IWHR_AI_Lable_Floater_V1 dataset with other datasets.
Methods
This section describes the IWHR_AI_Lable_Floater_V1 dataset in detail.
Acquisition platform
We collected videos containing floating objects at real river and lake sites using shore-based devices such as video surveillance cameras and mobile phones. These devices use CMOS sensors: video surveillance cameras installed on river and lake shores capture images at a resolution of 1920 × 1080, mobile phones capture images at 1920 × 1080 and 880 × 1920, and digital cameras capture images at 3840 × 2160.
Data collection and annotation
To further enrich the diversity of the samples, videos and pictures with different angles, lighting conditions and backgrounds were collected by varying the shooting angle, time and environment. One frame was extracted from each video approximately every 20 frames, and 3000 good-quality pictures were then selected from the extracted frames and raw images. The well-known bounding box annotation tool LabelImg was used to annotate the location and category of the floaters in the images. Note that, in addition to single objects such as bottles and plastic bags, floating objects formed by accumulations of water plants, algae or other litter were also annotated. The annotation team consists of experienced computer vision researchers, ensuring the quality of the annotations. Typical annotated images are shown in Fig. 4, in which the floating objects include common household waste such as plastic bottles and foam boards, as well as accumulated floating objects such as water plants and algae.
Fig. 4.
Typical annotated images in the IWHR_AI_Lable_Floater_V1 dataset.
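The annotations follow the PASCAL VOC XML format produced by LabelImg. As a minimal sketch (assuming LabelImg's default VOC layout; the file name below is hypothetical), the bounding boxes can be read with the Python standard library:

```python
# Minimal sketch: read bounding boxes from a LabelImg-style PASCAL VOC XML file.
# Assumes LabelImg's default VOC layout; "000001.xml" is a hypothetical file name.
import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    """Return image size and a list of (label, xmin, ymin, xmax, ymax) boxes."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        label = obj.find("name").text
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.find("xmin").text)),
            int(float(bb.find("ymin").text)),
            int(float(bb.find("xmax").text)),
            int(float(bb.find("ymax").text)),
        ))
    return (width, height), boxes

if __name__ == "__main__":
    (w, h), boxes = read_voc_annotation("000001.xml")
    print(f"image {w}x{h}, {len(boxes)} floater(s)")
```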
Dataset statistics
Currently, the IWHR_AI_Lable_Floater_V1 dataset contains 3000 images of floating objects on the water surface with corresponding annotations; 672 images were captured by video surveillance cameras and the rest by mobile phones. The number of objects in each image is shown in Fig. 5; in total, the dataset contains 23692 annotated floater objects.
Fig. 5.
Number of floaters included in each image.
When the size of each object is measured as the ratio of its pixels to the total image pixels, as many as 94.5% of the annotated objects occupy less than 10% of the image, as shown in Fig. 6.
Fig. 6.
The visualization of the IWHR_AI_Lable_Floater_V1. (a) Distribution of object center positions, (b) Distribution of object sizes.
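Statistics such as the one above can be reproduced from the released annotations. The following sketch computes, for each annotated object, the ratio of its bounding box area to the image area as a proxy for the object pixel ratio; it reuses the read_voc_annotation helper sketched above, and the Annotations/ directory name is hypothetical:

```python
# Sketch: fraction of annotated objects whose bounding box covers < 10% of the image.
# "Annotations/" is a hypothetical directory; reuses read_voc_annotation from above.
import glob

ratios = []
for xml_path in glob.glob("Annotations/*.xml"):
    (w, h), boxes = read_voc_annotation(xml_path)
    for _, xmin, ymin, xmax, ymax in boxes:
        ratios.append((xmax - xmin) * (ymax - ymin) / (w * h))

small = sum(r < 0.10 for r in ratios)
print(f"{small / len(ratios):.1%} of objects cover less than 10% of the image")
```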
Data Records
The dataset is available at figshare8 (10.6084/m9.figshare.27376851.v1). It contains image files in JPG format and corresponding annotation files in XML format; the folder structure is shown in Fig. 7.
Fig. 7.
Folder structure of the dataset.
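Some of the baseline detectors evaluated below consume YOLO-format text labels rather than VOC XML, so a conversion step is typically required. The following is a minimal sketch of that conversion, assuming a single floater class (index 0) and hypothetical paths; it reuses the read_voc_annotation helper from the earlier sketch:

```python
# Sketch: convert one VOC XML annotation to a YOLO-format .txt label file.
# Assumes a single "floater" class (index 0); paths are hypothetical, and
# read_voc_annotation is the helper from the earlier parsing sketch.
import os

def voc_box_to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert (label, xmin, ymin, xmax, ymax) to 'cls cx cy w h', normalized."""
    _, xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    bw = (xmax - xmin) / img_w
    bh = (ymax - ymin) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"

os.makedirs("labels", exist_ok=True)
(w, h), boxes = read_voc_annotation("Annotations/000001.xml")
with open("labels/000001.txt", "w") as f:
    f.write("\n".join(voc_box_to_yolo_line(b, w, h) for b in boxes))
```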
Technical Validation
We evaluated the performance of nine commonly used deep learning-based object detection algorithms on the IWHR_AI_Lable_Floater_V1 dataset: the two-stage detectors Faster R-CNN and CenterNet2, the one-stage detectors SSD, RetinaNet, YOLOv5s, YOLOv6s, YOLOv7 and YOLOv8s, and the newly released state-of-the-art model YOLOv9. The dataset was first partitioned into training, validation and test sets in the ratio 8:1:1 (a sketch of this seeded split is given after Figs. 8, 9), and the models were trained on a server equipped with an NVIDIA RTX 3090 GPU, with 300 epochs for each training run. To show the effect of different random seeds on the test results when splitting the dataset, we used three random seeds: 100, 101 and 102. During training, the training loss and validation mAP were recorded after each epoch; Figs. 8, 9 show their trends with the number of epochs for each detector. Each detector converges to a low loss level by the end of training. All models except YOLOv5, YOLOv8 and YOLOv9 ran for the full 300 epochs; the three exceptions use early stopping, which avoids overfitting while saving computational resources and time.
Fig. 8.
Training loss for different detectors. (a) Faster R-CNN, (b) RetinaNet, (c) SSD, (d) CenterNet2, (e) YOLOv5s, (f) YOLOv6s, (g) YOLOv7, (h) YOLOv8s, (i) YOLOv9.
Fig. 9.
Validation mAP for different detectors. (a) Faster R-CNN, (b) RetinaNet, (c) SSD, (d) CenterNet2, (e) YOLOv5s, (f) YOLOv6s, (g) YOLOv7, (h) YOLOv8s, (i) YOLOv9.
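To make the split protocol concrete, the sketch below partitions the image list 8:1:1 under a fixed random seed. The seed values 100, 101 and 102 match our experiments, while the JPEGImages/ folder name is hypothetical:

```python
# Sketch: reproducible 8:1:1 train/val/test split with a fixed random seed.
# Seeds 100/101/102 match the experiments; "JPEGImages/" is a hypothetical folder.
import glob
import random

def split_dataset(image_paths, seed):
    rng = random.Random(seed)
    paths = sorted(image_paths)  # sort first so the shuffle is reproducible
    rng.shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

images = glob.glob("JPEGImages/*.jpg")
for seed in (100, 101, 102):
    train, val, test = split_dataset(images, seed)
    print(seed, len(train), len(val), len(test))
```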
The trained detectors were evaluated on the test sets corresponding to random seeds 100, 101 and 102, and the mean over the three seeds was used as the final evaluation result for each detector, as shown in Table 1.
Table 1.
Results of different detectors on the test set.
| Model | Input size (pixels) | mAP_test (%) | Speed on RTX 3090 (ms/frame) | Params (M) |
|---|---|---|---|---|
| Faster R-CNN | 600 × 600 | 54.6 | 110.0 | 514.8 |
| RetinaNet | 640 × 640 | 60.7 | 87.8 | 482.8 |
| SSD | 512 × 512 | 60.2 | 26.0 | 246.2 |
| CenterNet2 | 640 × 640 | 60.1 | 111.1 | 127.1 |
| YOLOv5s | 640 × 640 | 65.9 | 2.0 | 14.4 |
| YOLOv6s | 640 × 640 | 63.6 | 2.7 | 40.6 |
| YOLOv7 | 640 × 640 | 65.7 | 8.6 | 74.8 |
| YOLOv8s | 640 × 640 | 66.2 | 3.7 | 22.5 |
| YOLOv9 | 640 × 640 | 67.7 | 5.8 | 252.4 |
The results on the test set are shown in Table 1. It can be seen that the mAP of the different detectors on our dataset sits at around the 65.0% level. We believe that the detection of small objects on the water surface still presents considerable challenges.
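For reference, a baseline such as YOLOv8s can be reproduced along the following lines with the Ultralytics package listed under Code availability. This is a sketch rather than our exact training script; the dataset configuration file floater.yaml and the early stopping patience value are hypothetical:

```python
# Sketch: training and evaluating a YOLOv8s baseline with the Ultralytics API.
# Not our exact script; "floater.yaml" (pointing at one seeded split) and the
# patience value are hypothetical assumptions.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")   # COCO-pretrained weights
model.train(
    data="floater.yaml",     # dataset config for one train/val/test split
    epochs=300,              # training budget used in our experiments
    imgsz=640,               # input size matching Table 1
    patience=50,             # early stopping, as used by the YOLOv5/v8/v9 runs
)
metrics = model.val(split="test")  # evaluate on the held-out test set
print(metrics.box.map50)           # mAP at IoU 0.5
```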
Acknowledgements
This work is supported by the Beijing Science and Technology Planning Project (Z201100001820022) and by the project "Multi-scale prediction technology and platform for upstream water inflow of plateau lakes: a case study of the Siling Co watershed" (WR110145B0112024).
Author contributions
Conceptualization, G.Q. and M.Y.; methodology, G.Q.; software, G.Q.; validation, G.Q.; formal analysis, G.Q.; investigation, G.Q. and M.Y.; data curation, G.Q. and M.Y.; writing—original draft preparation, G.Q.; writing—review and editing, G.Q.; visualization, G.Q.; supervision, M.Y. and H.W.; project administration, M.Y. and H.W.; funding acquisition, M.Y. All authors have read and agreed to the published version of the manuscript.
Code availability
Python 3.9 was used for data processing and model training, and the reference code used for training and testing in this work can be found in the following GitHub repositories: https://github.com/jwyang/faster-rcnn.pytorch, https://github.com/yhenon/pytorch-retinanet, https://github.com/amdegroot/ssd.pytorch, https://github.com/xingyizhou/CenterNet2, https://github.com/ultralytics/yolov5, https://github.com/meituan/YOLOv6, https://github.com/WongKinYiu/yolov7, https://github.com/ultralytics/ultralytics, https://github.com/WongKinYiu/yolov9, https://github.com/open-mmlab/mmdetection. We have also hosted the algorithms and data processing code in the figshare8 repository (10.6084/m9.figshare.27376851.v1).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Chiu, C., Liao, C., Kuo, T. & Huang, H. Using citizen science to investigate the spatial-temporal distribution of floating marine litter in the waters around Taiwan. Mar Pollut Bull 157 (2020).
- 2. Li, W. C., Tse, H. F. & Fok, L. Plastic waste in the marine environment: A review of sources, occurrence and effects. Sci Total Environ 566, 333 (2016).
- 3. Qiao, G., Yang, M. & Wang, H. A Detection Approach for Floating Debris Using Ground Images Based on Deep Learning. Remote Sens-Basel 14 (2022).
- 4. Ryu, J. D., Min, H. S., Cho, H. J. & Ha, K. N. Flow Analysis of Thrust for Cleaning USV. International Conference on Control Automation and Systems, pp. 1657 (2021).
- 5. Helinski, O. K., Poor, C. J. & Wolfand, J. M. Ridding our rivers of plastic: A framework for plastic pollution capture device selection. Mar Pollut Bull 165, 112095 (2021).
- 6. Pan, L., Li, M. & Ma, Y. Application of new intelligent unmanned ships in improving water resources monitoring and river and lake supervision. Water Resources Development Research 22, 19 (2022).
- 7. Tong, K., Wu, Y. & Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vision Comput 97 (2020).
- 8. Qiao, G., Yang, M. & Wang, H. IWHR_AI_Lable_Floater_V1: An annotated Dataset and Benchmark for Detecting Floating Debris in Inland Waters. figshare 10.6084/m9.figshare.27376851.v1 (2024).
- 9. Stricelj, A. & Kacic, Z. Detection of objects on waters' surfaces using CEIEMV method. Computers & Electrical Engineering 46, 511 (2015).
- 10. Jin, X., Niu, P. & Liu, L. A GMM-Based Segmentation Method for the Detection of Water Surface Floats. IEEE Access 7, 119018 (2019).
- 11. Liu, T., Pang, B., Zhang, L., Yang, W. & Sun, X. Sea Surface Object Detection Algorithm Based on YOLO v4 Fused with Reverse Depthwise Separable Convolution (RDSC) for USV. J Mar Sci Eng 9 (2021).
- 12. Li, N. et al. DENS-YOLOv6: a small object detection model for garbage detection on water surface. Multimed Tools Appl (2023).
- 13. Zhou, Z. et al. An Image-Based Benchmark Dataset and a Novel Object Detector for Water Surface Object Detection. Front Neurorobotics 15 (2021).
- 14. Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. IEEE Conference on Computer Vision and Pattern Recognition, pp. 779 (2016).
- 15. Redmon, J. & Farhadi, A. YOLO9000: Better, Faster, Stronger. IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517 (2017).
- 16. Redmon, J. & Farhadi, A. YOLOv3: An Incremental Improvement. arXiv (2018).
- 17. Bochkovskiy, A., Wang, C. & Liao, H. M. YOLOv4: Optimal Speed and Accuracy of Object Detection (2020).
- 18. Li, C. et al. YOLOv6: A single-stage object detection framework for industrial applications (2022).
- 19. Wang, C., Bochkovskiy, A. & Liao, H. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. IEEE Conference on Computer Vision and Pattern Recognition, pp. 7464 (2023).
- 20. Duan, K. et al. CenterNet: Keypoint Triplets for Object Detection. IEEE International Conference on Computer Vision, pp. 6568 (2019).
- 21. Lin, T., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal Loss for Dense Object Detection. IEEE T Pattern Anal 42, 318 (2020).
- 22. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems (2015).
- 23. Wang, C. & Liao, H. M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. (Springer Nature Switzerland, Cham, 2025).
- 24. Wang, A. et al. YOLOv10: Real-Time End-to-End Object Detection. arXiv preprint arXiv:2405.14458 (2024).
- 25. Biliskov, I. & Papic, V. Development of a Seafloor Litter Database and Application of Image Preprocessing Techniques for UAV-Based Detection of Seafloor Objects. Electronics 13 (2024).
- 26. Islam, M. J., Xia, Y. Y. & Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot Autom Lett 5, 3227 (2020).
- 27. Li, C. Y., Guo, J. C., Cong, R. M., Pang, Y. W. & Wang, B. Underwater Image Enhancement by Dehazing With Minimum Information Loss and Histogram Distribution Prior. IEEE T Image Process 26, 5664 (2016).
- 28. Yang, M. & Thung, G. Classification of Trash for Recyclability Status.
- 29. Proença, P. F. & Simões, P. TACO: Trash Annotations in Context for Litter Detection. arXiv (2020).
- 30. Liu, J. Y., Li, H. Y., Luo, J., Xie, S. R. & Sun, Y. Efficient obstacle detection based on prior estimation network and spatially constrained mixture model for unmanned surface vehicles. J Field Robot 38, 212 (2021).
- 31. Cheng, Y. W. et al. FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10933 (2021).
- 32. Renfei, C., Jian, W., Yong, P., Zhongwen, L. & Hua, S. Detection and tracking of floating objects based on spatial-temporal information fusion. Expert Syst Appl 225 (2023).