Abstract
Fish is a vital food source, providing essential nutrients and playing a crucial role in global food security. In Tamil Nadu, fish is particularly important, contributing significantly to the local diet, economy, and livelihoods of numerous fishing communities along its extensive coastline. Our objective is to develop an efficient fish detection system for pond environments that supports small-scale industries by facilitating fish classification, growth monitoring, and other essential aquaculture practices through a non-invasive approach. This dataset comprises images of the Orange Chromide fish species (Etroplus maculatus) captured under several computer vision challenges, including occlusion, turbid water conditions, high fish density per frame, and varying lighting conditions. We present annotated images derived from underwater video recordings in Retteri Pond, Kolathur, Chennai, Tamil Nadu (GPS coordinates: Lat 13.132725, Long 80.212555). The footage was captured using an underwater camera without artificial lighting, at depths of less than 4 m, to preserve the natural appearance of the underwater scenes. The recorded videos were converted to 2D images, which were manually annotated using the Roboflow tool. This carefully annotated dataset offers a valuable resource for aquaculture engineers, marine biologists, and experts in computer vision and deep learning, aiding in the creation of automated detection tools for underwater imagery.
Keywords: Non-invasive analysis, Fish, Growth level monitor, Detection, Annotated images
Specifications Table
Subject | Agricultural Science: Aquaculture, Computer Vision and Underwater Images. |
Specific subject area | This project focuses on detecting fish in practical pond settings using underwater imagery captured in challenging environments. Advanced image processing techniques are employed to detect fish within individual frames. |
Type of data | Raw images, annotations |
Data collection | The Crosstour underwater camera was used to capture footage of fish, positioned at a 90-degree angle with a 135-degree wide-angle lens at a depth of less than 4 m. Videos were recorded at 60 fps, with one-minute intervals between captures. The data collection procedure encompassed pond selection and managing challenges such as low resolution and difficult imaging conditions. Keyframes were extracted manually to ensure the selection of unique frames, resulting in a curated set of 586 images. These images were carefully annotated using the Roboflow tool. The dataset was then divided into training, validation, and test sets comprising 409, 118, and 59 images respectively, each with corresponding labels in PyTorch TXT format. |
Data source location | Location: Retteri Pond, Kolathur, Chennai, Tamil Nadu City/Country: Chennai, India GPS Location: Lat 13.132725, Long 80.212555 |
Data accessibility | Repository name: Mendeley Data Data identification number: 10.17632/7w45jx35hd.1 Direct URL to data: https://data.mendeley.com/datasets/7w45jx35hd/1 |
1. Value of the Data
• The dataset includes 10,607 annotated fish instances across 586 images, providing comprehensive coverage for training and evaluating object detection models. It is directly relevant to aquaculture management, environmental monitoring, fish population assessment, and ecological studies.
• The dataset features several computer vision challenges, including occlusion, turbid water conditions with turbidity levels ranging from 40 to 80 NTU, high fish density per frame, and varying lighting conditions with illumination values between 100 and 500 lux. These factors significantly impact image quality and detection performance, making the dataset ideal for developing robust computer vision models capable of handling complex real-world scenarios.
• This dataset will be particularly beneficial for small-scale aquaculture farmers, enabling them to implement non-invasive fish detection and classification techniques, thereby improving fish population management and enhancing productivity.
• The Fish4Knowledge [1] dataset contains fish videos recorded in open ocean environments, which differ from pond environments that present specific challenges such as turbid water and high fish density per frame, commonly found in aquaculture settings. Similarly, the WildFish [2] dataset offers images of fish in clear water conditions with minimal occlusion, unlike our dataset, which includes images with significant occlusion, providing a more rigorous testbed for developing robust detection algorithms. An advantage of our dataset is that it better simulates real-world conditions found in small-scale aquaculture, making it highly relevant for practical applications and research. Table 1 compares the characteristics of existing datasets.
• There is a notable lack of annotated data for South Indian pond underwater environments, making this dataset a valuable contribution to the field by filling this gap and providing unique insights and opportunities for specialized research and applications.
Table 1.
Comparison of existing fish datasets.
Dataset Name | Environment | Source | Limitation |
---|---|---|---|
WildFish [8] | Multiple water bodies | Internet search engines | Few occluded fish. |
Labeled Fishes in the Wild [10] | Oceanic waters | National Oceanic and Atmospheric Administration (NOAA) | Captured only in the ocean; low turbidity. |
Fish4Knowledge [1] | Ocean | Coast of Taiwan | Low turbidity, illumination effects, and few fish per frame. |
FishNet [7] | Freshwater and marine environments | Ocean (not specifically stated) | No turbidity. |
DeepFish [9] | Ocean | Australia | Low turbidity challenge. |
2. Background
The motivation behind compiling this dataset originated from the need to address the challenges faced in monitoring and managing fish populations in small-scale aquaculture settings, particularly in South Indian ponds. Traditional methods of fish population assessment are often invasive, labor-intensive, and prone to inaccuracies, which prompted the exploration of automated solutions using computer vision and deep learning techniques. This dataset, comprising annotated images of Orange Chromide (Etroplus maculatus) fish species captured from underwater videos, was developed to provide a robust foundation for such automated systems. The system we developed is designed for real-time processing, allowing for the detection of fish with minimal latency. The model operates at speeds sufficient to analyze incoming video frames near the frame rate, making it suitable for live monitoring applications. This real-time capability is crucial for small-scale industries and farmers, as it enables immediate, actionable insights without the need for post-processing, further enhancing the system's practical value. The dataset includes various complexities, such as occlusion, turbid water conditions, high fish density per frame, and natural lighting conditions. These factors are common in pond environments but are often underrepresented in existing datasets. This dataset also contributes to the broader field of image enhancement research, offering a practical testbed for developing and testing new computer vision algorithms.
3. Data Description
The dataset comprises 586 images, divided into training (409 images), validation (118 images), and test (59 images) sets, as detailed in Table 2. This custom split ratio was chosen due to the relatively small size of the dataset, with the aim of providing a sufficiently large training set while maintaining adequate validation and test sets for reliable performance evaluation. These images, taken from ponds in Kolathur, Chennai, are formatted at 640 × 640 pixels in .jpg, with corresponding annotations [3] in .txt format. The annotation type implemented here is the bounding box style. The repository is organized into Train, Valid, and Test directories, as illustrated in Fig. 1. The videos, captured at 60 fps with one-minute intervals between recordings, were taken inside the pond at a depth of less than 4 m, covering a wide angle of approximately 135°. From each video, frames were extracted at regular intervals of 60 s, providing sufficient temporal resolution to capture the variability in fish movement and environmental conditions. Although the dataset comprises 586 original images, which may initially seem small for deep learning models, it can be effectively expanded using augmentation techniques such as rotation, flipping, scaling, and color adjustments, as sketched below. These augmentations introduce variations that simulate real-world conditions, helping to mitigate overfitting by enhancing the model's ability to generalize to unseen data. Additionally, we selected this dataset size after preliminary experiments, which demonstrated that, with proper augmentation and regularization techniques, the models did not exhibit significant overfitting. Further, we ensured that the train-validation split was carefully chosen to provide robust performance evaluation. However, as part of future work, expanding the dataset through additional collection may further improve model performance and generalization.
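As an illustration of how such augmentations can be applied while keeping bounding boxes consistent, the following is a minimal sketch using the Albumentations library; the library choice, file path, and sample box are assumptions for illustration, not part of the published pipeline.

```python
import cv2
import albumentations as A

# Augmentation pipeline covering the four operations mentioned above.
# Boxes use the "albumentations" format: normalized (xmin, ymin, xmax, ymax),
# matching the label layout described in this section.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                 # flipping
        A.Rotate(limit=15, p=0.5),               # small rotations
        A.RandomScale(scale_limit=0.2, p=0.5),   # scaling
        A.ColorJitter(p=0.5),                    # color adjustments
    ],
    bbox_params=A.BboxParams(format="albumentations", label_fields=["class_labels"]),
)

image = cv2.imread("train/images/frame_0001.jpg")  # hypothetical path
boxes = [(0.10, 0.20, 0.35, 0.45)]                 # one illustrative fish box
augmented = transform(image=image, bboxes=boxes, class_labels=[0])
aug_image, aug_boxes = augmented["image"], augmented["bboxes"]
```

Because Albumentations transforms the boxes together with the pixels, flipped or rotated images keep valid labels without manual re-annotation.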
Table 2.
Tabulation for the image count in each folder.
Category | Number of Images with corresponding labels |
---|---|
Train | 409 |
Valid | 118 |
Test | 59 |
Total images | 586 |
Fig. 1.
Organization of folder structure.
The data has been collected and curated in a systematic manner to ensure accuracy and reliability. There is only one class, fish, so the class index is 0. For an image of width W and height H, the pixel values xmin, ymin, xmax, and ymax represent the original bounding box coordinates in the image before normalization. The normalized bounding box coordinates are computed as follows:
$$x_{min}^{norm} = \frac{x_{min}}{W} \tag{1}$$
$$y_{min}^{norm} = \frac{y_{min}}{H} \tag{2}$$
$$x_{max}^{norm} = \frac{x_{max}}{W} \tag{3}$$
$$y_{max}^{norm} = \frac{y_{max}}{H} \tag{4}$$
The sample coordinates are calculated for the bounding box annotation and are shown below:
x-coordinate of top-left corner (xmin): xmin_normalized = 121 / 640 = 0.1890625
y-coordinate of top-left corner (ymin): ymin_normalized = 180 / 640 = 0.28125
x-coordinate of bottom-right corner (xmax): xmax_normalized = 125 / 640 = 0.19531
y-coordinate of bottom-right corner (ymax): ymax_normalized = 144 / 640 = 0.225
(class, xmin_normalized, ymin_normalized, xmax_normalized, ymax_normalized) = (0, 0.1890625, 0.28125, 0.19531, 0.225)
A legend is provided in Fig. 2 to show the original (non-normalized) coordinates for xmin, ymin, xmax, and ymax, along with their corresponding normalized values. This visual guide clarifies the source of the bounding box annotations and the normalization process.
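For readers who want to reproduce the sample calculation above, a short sketch follows; this is illustrative only and does not ship with the dataset.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Normalize pixel bounding-box corners to [0, 1] per Eqs. (1)-(4)."""
    return (xmin / width, ymin / height, xmax / width, ymax / height)

# Reproducing the sample calculation for a 640 x 640 image:
print(normalize_box(121, 180, 125, 144, 640, 640))
# (0.1890625, 0.28125, 0.1953125, 0.225)
```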
Fig. 2.
Pixel coordinates values for the image.
4. Experimental Design, Materials and Methods
4.1. Underwater video capture
Underwater images [4] play a critical role in computer vision for fish detection, facilitating the accurate monitoring and management of aquatic ecosystems. Underwater videos were captured from ponds in Kolathur, Chennai, using cameras positioned at a depth of less than 4 m. The entire flow process is shown in Fig. 3. The videos were taken at one-minute intervals, ensuring diverse sampling of the underwater environment, including varying turbidity levels, low resolution, and direct sunlight conditions. A 64 GB memory card was used for extended underwater video recordings, essential for monitoring diverse pond environments with high fish density and varying conditions. The dataset consists of images with a resolution of 640 × 480 pixels, captured using low-resolution cameras to make the dataset more accessible for small-scale industries and farmers. All 586 images were standardized to this resolution to ensure consistency during training and testing. This standardization helps prevent issues that could arise from varying resolutions, ensuring effective model training while demonstrating the feasibility of cost-effective fish detection and classification.
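The frame-extraction step described above (one frame per 60 s of 60 fps footage) could be implemented along the following lines. This is a minimal OpenCV sketch; the file names and directory layout are assumptions, since the authors' extraction script is not published, and manual keyframe selection (as noted in the Specifications Table) would follow this automated pass.

```python
import cv2

def extract_frames(video_path, out_dir, every_n_seconds=60):
    """Save one frame per `every_n_seconds` of footage as a .jpg image."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 60.0   # recordings were made at 60 fps
    step = int(fps * every_n_seconds)          # frames to skip between saves
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:04d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

extract_frames("pond_recording.mp4", "frames")  # hypothetical file names
```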
Fig. 3.
Flow process for Dataset Generation.
4.2. Data compilation
The captured videos were processed to extract 586 images, which were systematically divided into training (409 images), validation (118 images), and testing (59 images) sets. Each image has a resolution of 640 × 640 pixels and is saved in .jpg format; sample images are shown in Fig. 4. Corresponding annotations are provided in .txt format, with each annotation detailing the bounding box coordinates for detected fish [5]. This dataset effectively captures underwater challenges such as foggy appearance, varying light conditions, and occlusions, providing a rigorous testbed for developing robust computer vision models even with low-resolution underwater imagery. Table 3 shows the underwater camera specification.
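A label file can be parsed with a few lines of Python. The sketch below assumes one annotation per line in the order shown in Section 3, i.e. class followed by normalized xmin, ymin, xmax, ymax; the file path is hypothetical.

```python
def load_labels(label_path):
    """Read one PyTorch TXT label file into a list of (class, xmin, ymin, xmax, ymax)."""
    boxes = []
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            cls = int(parts[0])
            xmin, ymin, xmax, ymax = map(float, parts[1:])
            boxes.append((cls, xmin, ymin, xmax, ymax))
    return boxes

for cls, *box in load_labels("train/labels/frame_0001.txt"):  # hypothetical path
    print(cls, box)
```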
Fig. 4.
Sample raw images from the dataset.
Table 3.
Underwater camera specification.
Attribute | Details |
---|---|
Camera Model | Crosstour CT9000 |
Sensor Resolution | High-resolution image sensor |
Maximum Pixels | 16 million pixels |
Recording Resolutions | 4K at 25 fps, 2.7K at 30 fps, 1080p at 60 fps |
Megapixel Settings | 4 MP to 16 MP |
Storage | Class 10 microSD cards up to 32 GB |
Battery | Rechargeable 3.7 V 1050 mAh battery |
Weight | 64 g |
Field of View | 170-degree wide-angle lens |
4.3. Annotation process
The annotation format varies across different models, influenced by the parent models used. The annotator ensures clear and distinguishable annotations by leaving adequate space around each fish object. This prevents confusion with neighboring fish and aids in accurate identification during training and analysis. When fish are partially obscured by other objects or fish, the annotator focuses on accurately delineating the visible portions. This approach provides a precise representation of the fish's shape despite occlusions [6]. In cases of overlapping fish, each fish is distinctly outlined, maintaining individual identity within the overlapping area. Even though fish fins are transparent, they are annotated to capture their presence and shape accurately. The annotation process involves outlining the fin regions, accounting for their delicate structure, and aiding in the comprehensive understanding of fish morphology. Small fish that are not easily visible to the naked eye, specifically those smaller than 2 pixels in length, are excluded from the annotation process. This threshold was set to ensure that only fish of a detectable size are included in the dataset, reducing noise and focusing on meaningful detections for the model.
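Although the 2-pixel threshold was applied manually during annotation, the same rule can be checked programmatically when post-processing label files. The following is a small sketch under that assumption; the 640-pixel defaults reflect the published image size.

```python
MIN_PIXELS = 2  # minimum box side length, per the annotation rule above

def keep_box(xmin, ymin, xmax, ymax, width=640, height=640):
    """Return True if a normalized box is at least MIN_PIXELS on both sides."""
    return ((xmax - xmin) * width >= MIN_PIXELS
            and (ymax - ymin) * height >= MIN_PIXELS)

# Example with the ~4 x 36 px sample box from Section 3 (corners ordered so
# that min < max):
print(keep_box(0.1890625, 0.225, 0.19531, 0.28125))  # True
```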
Limitations
As images are captured in a natural pond environment, hazy and foggy conditions and fish occlusions are unavoidable. Researchers may need to apply additional techniques to enhance visibility and clarity.
Ethics Statement
The authors confirm that they have followed the ethical standards required for publication in Data in Brief. They also assert that this study does not include human subjects, animal testing, or data sourced from social media platforms.
CRediT Author Statement
Sasithradevi A: Conceptualization, Methodology, Data Curation, Writing - Review & Editing; Vijayalakshmi M: Data Curation, Writing - Original Draft.
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
The dataset is publicly available on Mendeley Data: https://data.mendeley.com/datasets/7w45jx35hd/1 (doi: 10.17632/7w45jx35hd.1).
References
1. Fisher R.B., et al. Fish4Knowledge: collecting and analyzing massive coral reef fish video data. Mach. Vis. Appl. 2014;25(1):17–30. doi: 10.1007/978-3-319-30208-9.
2. Zhuang P., Wang Y., Qiao Y. WildFish: a large benchmark for fish recognition in the wild. In: Proceedings of the 26th ACM International Conference on Multimedia. 2018. pp. 1301–1309.
3. Šiaulys A., et al. A fully-annotated imagery dataset of sublittoral benthic species in Svalbard, Arctic. Data Br. 2021;35. doi: 10.1016/j.dib.2021.106823.
4. Saleh A., Sheaves M., Azghadi M.R. Computer vision and deep learning for fish classification in underwater habitats: a survey. Fish Fisher. 2022;23(4):977–999. doi: 10.1111/faf.12666.
5. Bartunek D., Cisar P. Data for non-invasive (photo) individual fish identification of multiple species. Data Br. 2023;48. doi: 10.1016/j.dib.2023.109221.
6. Qian Z.-M., et al. Automatically detect and track multiple fish swimming in shallow water with frequent occlusion. PLoS One. 2014;9(9). doi: 10.1371/journal.pone.0106506.
7. Khan F.F., Li X., Temple A.J., Elhoseiny M. FishNet: a large-scale dataset and benchmark for fish recognition, detection, and functional trait prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. pp. 20496–20506.
8. Zhuang P., Wang Y., Qiao Y. WildFish: a large benchmark for fish recognition in the wild. In: Proceedings of the 26th ACM International Conference on Multimedia. 2018. pp. 1301–1309.
9. Saleh A., Laradji I.H., Konovalov D.A., Bradley M., Vazquez D., Sheaves M. A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis. Sci. Rep. 2020;10(1):14671. doi: 10.1038/s41598-020-71639-x.
10. Cutter G., Stierhoff K., Zeng J. Automated detection of rockfish in unconstrained underwater videos using Haar cascades and a new image dataset: Labeled Fishes in the Wild. In: 2015 IEEE Winter Applications and Computer Vision Workshops. IEEE; 2015. pp. 57–62.