Abstract
Machine learning is a powerful tool to improve efficiency of industrial processes, but it has not yet been well utilized in aquacultural and hatchery applications. The goal of the present study was to evaluate the feasibility of using a broad array of machine learning approaches (testing of > 200 vectorization and model combinations, reporting on 20) to classify ultrasound images spanning annual ovarian development (i.e., from undeveloped to mature) of channel catfish (Ictalurus punctatus). The specific objectives were to: 1) establish dataset preprocessing to standardize image features; 2) develop and train image classification models with deep learning methods; 3) develop and train models with traditional machine learning methods; 4) compare performance of deep learning and traditional methods on two classification problems (2-class and 5-class), and 5) propose insights to deploy models in practical aquaculture applications for research and hatchery use. A total of 931 ultrasound images of catfish ovaries were used to train and evaluate models for a 2-class problem (as a ‘yes’ or ‘no’ answer) to support hormone-injection decisions for spawning management in hatcheries, and a 5-class problem for classifying gonadal development stages for research. By using feature extraction, cropping, dimension reduction, and histogram normalization, a preprocessing method was created to standardize images for developing traditional (i.e., vector input) and deep learning convolutional neural network (CNN) (i.e., image input) approaches. Traditional machine learning models achieved 100% median accuracy on the 2-class problem (with the models RN-50 and RN-152), and 96% median accuracy on the 5-class problem (with VGG-19 image vectorization). The deep learning approach for the 2-class problem had a median accuracy of > 98% for 15 models.
The 5-class deep learning models produced a steady increase in median accuracy with training net size, achievable through expansion of the dataset. These models can be developed further, but traditional models (using CNN architectures simply to calculate image vectors) outperformed the deep learning approach. These models are directly applicable to aquaculture in hatcheries and to reproductive biology research, in addition to a wide variety of other image-based applications.
Keywords: Hatchery, Machine Learning, Channel catfish, Ovarian development, Ultrasound, Image Classification
1. Introduction
Corporate and industrial application of machine learning in all its forms is revolutionizing our world and daily lives. The relationships extracted from data frameworks can be practical for tasks such as image classification, a process too complex to be solved by a list of rules or by classical computer algorithms. Machine learning is at the core of artificial intelligence and is used for applications such as image processing, especially for images of different resolutions and formats (Lemley, et al., 2017; Lézoray, et al., 2008). This technology has become increasingly available and is finding application in biological research, in particular where multidisciplinary or interdisciplinary teams can be assembled. Machine learning has just begun to be utilized in aquaculture applications and research (Yue and Shen, 2021; Zhao, et al., 2021), such as feeding intensity (Zhou, et al., 2019), biomass detection (Yang, et al., 2021), size estimation (Monkman, et al., 2019), fish counting (Hernández-Ontiveros, et al., 2018), sex identification (Barulin, 2019), and water quality prediction (Cao, et al., 2020).
There are numerous approaches in machine learning, each comprising algorithms or sets of computational methods where datasets are used to train models to predict desired information (Mohri, et al., 2018). Deep learning and traditional machine learning are two major approaches. In the context of machine vision, traditional machine learning uses vectorized images, focuses on relevant features, and interprets the data through analytical models based on patterns identified by the algorithms. As problems became more complex, a scalable approach (e.g., deep learning) became necessary. The “Perceptron” was the first deep learning framework, developed by neuroscientists to model information organization in the brain (Rosenblatt, 1958). This framework allowed scalability by stacking layers (increasing depth) or expanding the number of outputs within each layer (increasing width). This enables higher-complexity models to be created automatically from raw data without specific knowledge of the image context (Stanik, et al., 2019). Deep learning transforms the inputs repeatedly before producing an output, whereas traditional methods do not rely on multiple transformations (Wang, et al., 2020). Deep learning has been further developed beyond the original Perceptron model as it has expanded into different fields. For example, convolutional neural network (CNN) architectures can process time series, natural language (such as speech or writing), and other data with an underlying grid geometry, including images. The simplest CNN architecture employs multiple layers (e.g., convolutional and pooling) to increase model depth. Additional convolutions can be applied at each layer to increase the model width.
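The depth idea above can be illustrated with a minimal, self-contained sketch (plain NumPy, not taken from the study's code): a single 3 × 3 convolution acts only locally, but passing its output through a second 3 × 3 convolution lets each output pixel draw on a 5 × 5 region of the original image, enlarging the receptive field.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D convolution (cross-correlation):
    each output pixel depends only on a kernel-sized patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(7, 7)
k = np.ones((3, 3)) / 9.0        # a 3 x 3 averaging filter

once = conv2d_valid(image, k)    # each pixel sees a 3 x 3 patch -> (5, 5)
twice = conv2d_valid(once, k)    # stacked: each pixel sees 5 x 5 -> (3, 3)
```

Stacking two 3 × 3 filters gives each output a 5 × 5 receptive field with fewer weights than one 5 × 5 filter (18 vs. 25), the design principle VGG popularized.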
Ultrasonography is an imaging technology that allows viewing and evaluation of internal structures (e.g., tissues and organs) without the use of invasive procedures such as surgery, biopsy, or killing of the organism. This imaging technology is gaining use in fish reproduction across a wide range of species to identify sex where it would otherwise be problematic or impossible based on external appearance, and to assess gonadal development (Novelo and Tiersch, 2012). The information obtained can be used to make decisions in hatcheries to increase reproduction efficiency, providing benefits for research and management of cultured fishes, wild fisheries, and conservation of imperiled species. It is safe, and thus appropriate for use with valuable populations (such as endangered fishes or improved hatchery broodstocks), causing minimal stress and providing rapid assessments. However, a limitation of ultrasound imaging is the requirement of trained personnel to interpret the images. The time required for training depends on previous knowledge and skill levels. It can take three to four months (preferably starting a few weeks prior to the channel catfish spawning period) for users with ultrasonography experience and knowledge of fish anatomy and the reproductive cycle. For new users without this background, it can take up to one year, spanning the duration of the annual reproductive cycle (Novelo and Tiersch, 2016). Machine learning provides opportunities to analyze and classify ultrasound images rapidly and automatically.
The goal of this study was to evaluate the practical feasibility of using a diverse array of machine learning approaches to classify ultrasound images spanning ovarian development (i.e., from undeveloped to mature) of channel catfish (Ictalurus punctatus). The specific objectives were to: 1) establish dataset preprocessing to standardize image features; 2) develop and train image classification models with deep learning methods; 3) develop and train models with traditional machine learning methods; 4) compare performance of deep learning and traditional methods on two classification problems (2-class and 5-class), and 5) propose insights to deploy models in practical applications for research and hatchery use. This was a comprehensive assessment of the major approaches used for image classification by machine learning. Overall, machine learning offers great opportunities for decision making and research applications in aquaculture, and this study outlines the basic considerations in identifying appropriate models, their training methods, and deployment for use.
2. Materials and methods
2.1. Ultrasound image acquisition and identification
Ultrasound images of ovarian reproductive condition in channel catfish were used as the dataset. These images were acquired for research aimed to improve efficiency of artificial reproduction for commercial-scale hatcheries in production of hybrid catfish (by crossing of female channel catfish with male blue catfish I. furcatus) (Novelo and Tiersch, 2016). Ultrasound images were obtained using a portable TelaVet 1000 (Classic Medical, Tequesta, FL) ultrasound system and a multiple frequency (5 to 8 MHz range) waterproof probe (model LV7.5/60/96Z). Ultrasound settings were manipulated using the TelaVet software user interface: frequency (ultrasound wave probe emission) was set at 5 or 8 MHz; depth of ultrasound penetration was set at 80 or 110 mm, and gain (amplification of the returning ultrasound echo signal) was set at 58% or 70% (Guitreau, et al., 2012; Novelo and Tiersch, 2016).
During image capture, fish were completely submersed in water in nets (in recirculating systems or flow-through raceways), or in a portable container (Sportsman™ 52 Quart, Igloo Products Corp., Katy, TX) adjacent to broodstock ponds. The ultrasound probe was aligned alongside the dorsal fin and placed adjacent to the ventral left side of the catfish (Guitreau, et al., 2012). Ultrasound images were collected at the Aquaculture Research Station of the Louisiana State University Agricultural Center to assist commercial-scale production of hybrids at the Baxter Lands Company Hatchery in Desha County, Arkansas.
The dataset used in this study consisted of 931 ultrasound images (Novelo and Tiersch, 2016). Originally, a larger dataset including these images was classified into seven stages based on observable biological features of ovarian development by a team of experts with > 5 years of training on interpretation of ultrasound images of catfish ovaries. The stages were classified (Novelo, 2014) as follows: 1 (undeveloped), 2 (under-developed), 3 (developing), 4 (advanced), 5 (mature), 6 (spawned), and 7 (atretic) with specific features describing each. Stages 1 through 5 represented the developmental progression of the ovary from its incipient state (i.e., Stage 1, outside the spawning season) through gradual maturation until the point of spawning. At Stage 5 spawning is generally imminent, but in a hatchery setting gonadotropic hormones are typically injected to synchronize and intensify egg release when fish are in late phases of ovarian maturity (i.e., ultrasound Stages 4 and 5). For the purpose of evaluating machine learning in this study, we chose to include only images from Stages 1–5 to focus on ovarian maturation, with specific emphasis on support of decision making for hormone injection. Stages 6 (spawned) and 7 (atretic, or regressing) are typically easily identified by external examination of the fish or occur outside of the spawning season. Each image in the dataset had thus been identified with a Stage from 1–5.
A broad range of machine learning models were selected and trained to address two problems: 2-class or 5-class classification. In the 2-class problem, a binary classification (i.e., pooling of stages 1–3 as being not ready to spawn, and pooling of stages 4–5 as being ready to spawn after injection) was used, which would be useful to determine whether to administer gonadotropic hormone. The 5-class problem was established based on the corresponding developmental sequence (i.e., segregating the ultrasound Stages 1–5), which would be useful for researchers to study reproductive biology, for example through the annual cycle. Details on the machine learning models selected for this study are discussed below. The overall evaluation strategy used in the present study is shown in Fig. 1.
Fig. 1.

An overview of the strategy used in the present study for testing, evaluation, and deployment of a broad range of machine learning models for ultrasound image classification.
2.2. Data preprocessing
Image preprocessing was performed using the Wolfram Language (Wolfram Research, 2021) before training of machine learning models to reduce undesired artifacts and thus increase prediction accuracy. The first task in preprocessing removed extraneous variation in images caused by differences in device operation and software usage. The raw images comprised two groups (Fig. 2) based on two batches of image collection that used different settings, including frequencies, focus settings, storage file types, and probe positions. Both groups were included to represent the diversity in image types and complications that would be normally encountered in machine learning applications.
Fig. 2.

Ultrasound images of channel catfish ovaries were collected with two different groups of instrument settings (top) yielding differences in contrast and positioning that required preprocessing prior to machine learning analyses. A progression of ovarian developmental stages was included in the dataset, with no attempt to balance the numbers (bottom) of images across the stages.
Images in Group 1 were collected at a frequency of 8 MHz (and focus setting of 10 to 15 mm) as “.bmp” files, whereas images in Group 2 were collected at 5 MHz (and focus setting of 35 mm) as “.tiff” files. Images collected with higher frequencies (e.g., Group 1) tend to provide higher resolution and contrast with less penetration into adjacent tissues. The lower contrast of images from Group 2 was corrected by normalizing the image histograms to fit a uniform distribution of grayscale values. In addition, ovaries in Group 2 images were positioned more centrally, whereas ovaries in Group 1 images were positioned closer to the upper edge (Fig. 2). To address these positional differences, each image was scanned from top to bottom to find the first row whose summed pixel intensity (the sum of the 407 pixel values in the row, each ranging from 0, black, to 1, white) surpassed a fixed threshold of 50; the image was then cropped to the 200 rows of pixels below that row. This automatic cropping step standardized the location of the ovary in each image without affecting its shape.
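The two preprocessing steps can be expressed as a simplified NumPy sketch (not the Wolfram Language code used in the study; the threshold of 50 and the 200 retained rows follow the description above, and a rank transform approximates histogram normalization to a uniform distribution):

```python
import numpy as np

def equalize(img):
    """Histogram normalization: remap grayscale values (0-1) so their
    distribution over the image is approximately uniform (rank transform)."""
    flat = img.ravel()
    ranks = flat.argsort().argsort()            # rank of each pixel value
    return (ranks / (flat.size - 1)).reshape(img.shape)

def autocrop(img, threshold=50.0, keep_rows=200):
    """Keep `keep_rows` rows starting at the first row (top to bottom)
    whose summed pixel intensity exceeds `threshold`."""
    row_sums = img.sum(axis=1)
    start = int(np.argmax(row_sums > threshold))  # index of first True
    return img[start:start + keep_rows, :]
```

Note that `np.argmax` falls back to row 0 if no row exceeds the threshold; in this sketch that simply leaves the crop anchored at the top of the image.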
2.3. Deep learning
For readers without previous background, general deep learning approaches are reviewed by Razzak, et al. (2018). Models using CNN optimize convolutional filters that are activated by specific patterns as they pass over an image. These models often require large datasets to build effective perceptual filters. When this is not feasible, “transfer learning” can be used to take advantage of existing networks that were previously trained on a large dataset for a specified task and shared among user communities (Torrey and Shavlik, 2010). These pre-trained networks can be repurposed to perform a comparable task on a new dataset. When utilized in image classification, the convolutional filters are left unchanged and only the final prediction layers are retrained on the new dataset. This study refers to “standard transfer learning” as the process of retraining pre-trained networks using an optimization method called “mini-batch stochastic gradient descent” (SGD) (Qian, et al., 2015).
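A minimal sketch of this standard transfer learning procedure, under stated assumptions (scikit-learn in place of the study's Wolfram Language/TensorFlow tooling, a fixed random linear map standing in for the frozen pre-trained convolutional base, and synthetic bright/dark toy images):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Placeholder for a frozen, pre-trained convolutional base: a fixed
# random linear map from pixels to 64 features, never updated.
# (The study used ImageNet pre-trained CNNs in this role.)
W_frozen = rng.normal(size=(32 * 32, 64))

def extract_features(images):
    feats = images.reshape(len(images), -1) @ W_frozen
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)

# Toy grayscale "images": dark (not ready) vs. bright (ready) classes.
X = np.concatenate([rng.uniform(0.0, 0.1, size=(30, 32, 32)),
                    rng.uniform(0.9, 1.0, size=(30, 32, 32))])
y = np.array([0] * 30 + [1] * 30)
feats = extract_features(X)

# Only the final prediction layer is retrained, via mini-batch SGD.
head = SGDClassifier(random_state=0)
for epoch in range(5):
    order = rng.permutation(len(feats))
    for i in range(0, len(order), 10):           # mini-batches of 10
        batch = order[i:i + 10]
        head.partial_fit(feats[batch], y[batch], classes=[0, 1])
```

The frozen extractor plus retrained linear head is the essential structure; only the head's weights change during training.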
The CNN architectures (Fig. 1) were selected from iterations of existing models designed for the ImageNet Large Scale Visual Recognition Challenge (https://image-net.org/challenges/LSVRC/): SqueezeNet (SN), MobileNet (MN), EfficientNet (EN), ResNet (RN), Inception Net (IN), Squeeze-and-Excitation Net (SEN), and networks by the Visual Geometry Group (VGG). ImageNet is a free-of-charge image database, which comprises more than 15 million labeled high-resolution images in over 22,000 categories (Deng, et al., 2009; Krizhevsky, et al., 2012). The models presented here were trained with transfer learning (with convolutional layers pre-trained on the ImageNet dataset) and also without transfer learning (training using only the ultrasound data from this study) for comparison. The Wolfram Language and open-source machine learning platform TensorFlow (Abadi, et al., 2016) were used for CNN development and training. ImageNet pre-trained models were obtained from the Wolfram Neural Net Repository (https://resources.wolframcloud.com/NeuralNetRepository/). Models that were not trained using transfer learning were trained using random weight initialization (Cao, et al., 2018).
Over the past decade, models developed for the ImageNet competition have introduced many advances in the design of CNN architectures. While a single convolution only acts locally on an image, the engineers who designed VGG started the trend of stacking convolutions such that each successive layer would receive inputs from a larger portion of the original image, increasing that layer’s receptive field (Simonyan and Zisserman, 2014). Deep models have been demonstrated to be more efficient at modeling higher complexity functions than wide neural networks (Eldan and Shamir, 2016). A common issue that affects deep networks is that the optimizer has difficulties recognizing how early layers affect the final prediction, known as the “vanishing gradient problem”.
The RN and IN architectures each employ mechanisms to account for this effect. The IN architecture uses intermediate classifiers at early layers with weighted effects on the final loss function to account for the vanishing gradient. This model also employs 1 × 1 convolutions as dimension reduction by projecting all features back down into a single channel (Szegedy, et al., 2015). The RN architecture uses connections between non-sequential layers (skip connections) so that the gradient can backpropagate around layers that may otherwise cause it to decrease (He, et al., 2016). Increasing the depth of the network yields higher-order features, but if the number of feature maps grows too large the final layers may have trouble discerning which are most relevant. The architecture of SEN trains fully connected layers to adaptively weight the feature maps. This allows the network to discriminate by feature importance (Hu, et al., 2018).
The complex architectures above are computationally expensive to train and run. Internal 1 × 1 convolutions are used by SN as bottlenecks to reduce the computational load on larger filters. This model also reduces its number of parameters by not using fully connected layers for final prediction (Iandola, et al., 2016). For mobile devices, MN was designed for minimal computational cost and complexity by employing low cost convolutions (Howard, et al., 2017). Finally, the goal of EN was to balance architectural factors such as network width and depth to minimize complexity while still achieving high performance on ImageNet (Tan and Le, 2019).
2.4. Traditional models
Traditional machine learning does not make use of the spatial geometry in images and uses numerical vector data as inputs instead. Two approaches were used in this study for image vectorization. The first approach used the ImageNet pre-trained CNNs described above as static feature extractors. The second approach used statistical methods of dimension reduction to convert the images into vectors of 2,000 components. The linear methods (Fig. 1) used included principal component analysis (PCA), latent semantic analysis (LSA), and Hadamard projection (HP). The non-linear manifold learning techniques used were t-distributed stochastic neighbor embedding (TSNE) (Van der Maaten and Hinton, 2008), local linear embedding (LLE) (Roweis and Saul, 2000), isometric mapping (IM) (Tenenbaum, et al., 2000), and multi-dimensional scaling (MS) (Torgerson, 1952).
These numeric representations of the images were subsequently used to train traditional models, including k-nearest neighbors (KNN), multi-layer perceptron (MLP), logistic regression (LR), gaussian naïve bayes (gNB), linear support vector machine (lSVC), as well as ensemble methods such as random forest classifier (RFC), gradient boosting (GB), adaboost (AB), and finally a “hard vote classifier” among lSVC, GB and RFC predictions (VC). Many of these models without inherent feature selection will generalize best when their inputs are carefully selected for relevance to the classification target (Bellman, 1966). This was achieved by sorting each extracted component by its correlation to the ovarian stages of development and iteratively training on higher dimensional data. For each model, the best set of input features was selected based on validation accuracy. The optimal model configurations, as described by hyperparameters, were found using a randomized grid search. Traditional methods were evaluated using the Scikit-learn machine learning library for the Python programming language (Pedregosa, et al., 2011). The Wolfram Language was used for dimension reduction tasks and vectorization via CNNs.
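The pipeline above (vectorization, correlation-based ranking of components, and iterative training on increasing input dimension) can be sketched with scikit-learn on synthetic data; the sample counts, component numbers, and planted stage-correlated signal are hypothetical, and the randomized grid search over hyperparameters is omitted for brevity:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened ultrasound images with 5 stage labels.
y = rng.integers(0, 5, size=200)
X = rng.normal(size=(200, 400))
X[:, 0] += y                          # plant a stage-correlated signal

# 1) Vectorization by linear dimension reduction (PCA here).
Z = PCA(n_components=50).fit_transform(X)

# 2) Rank extracted components by |correlation| with the stage label.
corr = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])])
order = np.argsort(corr)[::-1]

# 3) Train on increasingly many top-ranked components; keep the best.
best_k, best_score = None, -1.0
for k in (5, 10, 25, 50):
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            Z[:, order[:k]], y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
```

In a full pipeline, the hyperparameters of each candidate model would additionally be tuned (e.g., with scikit-learn's `RandomizedSearchCV`) before comparing validation accuracies.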
Thus, a broad variety of traditional models were chosen for comparison to deep learning including KNN, MLP, SVC, RFC, AB, and GB. Of these, KNN was the simplest of the traditional models used, which assigns a label based on the average class of the k nearest vectorized images. The MLP is the classic (fully connected) deep learning model comprised of feedforward layers. These are the same as the final layers used for classification in CNNs but in this case they may have different hyperparameters than those trained in transfer learning. Support vector classifiers use a function to map their inputs into a higher dimensional space where the data can be separated by a hyperplane. In the case of lSVC, this function is linear which allowed us to efficiently train the model on a large dataset. These models were designed for binary classification but can be used for multi-class problems by taking a “one vs. many approach” (Pedregosa, et al., 2011).
Decision trees use a metric to categorize the homogeneity of the target variable in subsets of their training set. The learning algorithm recursively partitions the input domain until further splits no longer add value according to the homogeneity metric. Boosting methods such as AB and GB train a collection of weak classifiers (decision trees were used in this study) iteratively by prioritizing the previously misclassified examples with each new classifier added to the ensemble. For example, AB will prioritize misclassified examples by assigning them a higher weight when training the next decision tree. This makes each classifier focus on what the previous model missed, and the final classification is made by an accuracy-weighted “majority vote” of the ensemble. On the other hand, GB trains more like gradient descent, adding new decision tree outputs to the final prediction in an attempt to correct prediction error, instead of updating the original decision tree to reduce loss.
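The AdaBoost reweighting step described above can be written out explicitly; the following is a NumPy sketch of a single boosting round for binary ±1 labels (the example labels are hypothetical, and the scikit-learn implementation handles the full multi-round, multi-class procedure):

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: compute the weak classifier's weighted error
    and its vote weight (alpha), then up-weight misclassified examples."""
    miss = (y_true != y_pred)
    err = np.sum(weights[miss]) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / err)        # classifier's vote weight
    new_w = weights * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return new_w / new_w.sum(), alpha            # renormalize to sum to 1

w = np.full(4, 0.25)                             # equal initial weights
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, 1, 1, -1])                 # one mistake (index 2)
w2, alpha = adaboost_reweight(w, y_true, y_pred)
```

With one of four equally weighted examples misclassified, err = 0.25 and the misclassified example ends up carrying half of the total weight, so the next decision tree concentrates on it.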
It is useful to understand what features or regions of images computer vision models use to make predictions. To explore this, tools that generate Class Activation Maps (CAM) have been developed (Zhou, et al., 2016). This approach works for CNN models in which prediction layers (i.e., softmax activation) are directly preceded by convolutional layers. A gradient-weighted CAM (Grad-CAM) approach has been developed to improve the CAM technique (Selvaraju, et al., 2017). Grad-CAM can be used with a wider array of CNN architectures than the original CAM, such as those with fully connected layers (e.g., VGG). An existing Grad-CAM algorithm (sourced from https://keras.io/examples/vision/grad_cam/) was applied to visualize active areas used for classifying images in the 2-class scheme, with heatmaps generated from the RN-50 architecture.
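The core Grad-CAM computation can be sketched in NumPy, assuming the last-layer feature maps and the class-score gradients with respect to them have already been extracted from a trained CNN (the study used the Keras implementation linked above; this sketch only shows the heatmap arithmetic):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap: global-average-pool the gradients to get one
    importance weight per feature map, then form a ReLU-clipped weighted
    sum of the maps.  Inputs have shape (n_maps, height, width)."""
    weights = gradients.mean(axis=(1, 2))          # one weight per map
    cam = np.tensordot(weights, feature_maps, axes=1)
    cam = np.maximum(cam, 0.0)                     # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                      # normalize to [0, 1]
    return cam
```

The resulting heatmap is typically upsampled to the input image size and overlaid, as in the Fig. 7 visualizations.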
2.5. Model evaluation and comparison
2.5.1. Classification accuracy
Classification accuracy can be defined as the total number of correct predictions divided by the total number of predictions made. To minimize skewing caused by uneven class distributions, averaged accuracy was used for model comparison in this study. Averaged accuracy was calculated by using a one-vs.-many approach, adding the accuracies of each class and dividing by the number of classes (Sokolova and Lapalme, 2009). This ensured that all classes had equal weight on the model accuracy score even if a class was over-represented in the data.
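Averaged accuracy as defined above can be computed directly from a confusion matrix (a NumPy sketch; the row/column convention is an assumption stated in the docstring):

```python
import numpy as np

def averaged_accuracy(cm):
    """Average of the per-class one-vs.-rest binary accuracies, computed
    from a confusion matrix (rows = true class, columns = predicted)."""
    n = cm.sum()
    accs = []
    for i in range(cm.shape[0]):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp        # true class i, predicted other
        fp = cm[:, i].sum() - tp        # other class, predicted i
        tn = n - tp - fn - fp
        accs.append((tp + tn) / n)
    return float(np.mean(accs))
```

Each class contributes equally to the score regardless of how many images it has, unlike plain accuracy (`cm.trace() / cm.sum()`), which over-represented classes can dominate.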
To increase the sample size for classification accuracy estimation, K-fold cross validation was used as a training strategy, which involved partitioning the dataset into K equal-sized groups (“folds”) such that no samples were repeated between the folds. A value of K = 5 was chosen for the models used in this study. One unique fold was held out for validation, and the remaining four partitions were used as training data. Each model was trained five times, using a different validation partition each time. The performance of the model on each distinct validation fold of the dataset was considered when reporting the overall performance of the model (Raschka, 2018). Because each model was trained five separate times, resulting in five sets of results, median values (referred to as ‘median accuracy’) among the five values for averaged accuracy were used for model comparisons (Harmon, et al., 2021). Score calculation was done from the confusion (error) matrices reported by the free machine learning library scikit-learn (Pedregosa, et al., 2011) and the Wolfram Language (Wolfram Research, 2021).
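The five-fold strategy can be sketched with scikit-learn on synthetic data (stratified folds, a KNN model, and plain per-fold accuracy are simplifying assumptions; the study reports the median of averaged accuracies):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

scores = []
for train_idx, val_idx in StratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Train on four folds, score on the held-out fifth fold.
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

median_accuracy = float(np.median(scores))   # the reported statistic
```

Every sample is used for validation exactly once, yielding five accuracy values per model whose median is compared across models.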
2.5.2. Computational resource needs
The floating-point operations (FLOPs) metric was used to estimate the requirement of computational resources independent of hardware. This method measures computational requirements by counting the FLOPs performed during the training process (Bianco, et al., 2018; Srivastava, et al., 2021). For convolutional neural networks, 2-D convolutional layers typically contributed the majority (> 90%) of FLOPs (Srivastava, et al., 2021). The time complexity of all convolutional layers in a CNN can be expressed (He and Sun, 2014) as:

O( ∑_{l=1}^{d} n_{l−1} · s_l² · n_l · m_l² )

where d is the total number of convolutional layers in the model, n_l is the number of filters, n_{l−1} is the number of input channels to layer l, s_l is the spatial size (kernel size) of the filter, and m_l² is the spatial size of the output feature map.
In complexity analysis, this notation describes the limiting behavior of the function describing the total number of convolution operations, which is dependent on the parameters in each convolutional layer. The sum of the product of these parameters for each convolutional layer is equal to the total number of convolution operations the model performs in a single forward pass during training. Each deep learning model used in this study has a different architecture, so these parameters varied depending on the specific objectives of the model.
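The sum inside this complexity expression can be implemented directly (a plain-Python sketch; the two-layer configuration below is hypothetical):

```python
def conv_flops(layers):
    """Total convolution operations per He and Sun (2014):
    sum over layers of n_(l-1) * s_l^2 * n_l * m_l^2, with each layer
    given as (input_channels, kernel_size, filters, output_map_size)."""
    return sum(n_in * s * s * n_out * m * m
               for (n_in, s, n_out, m) in layers)

# A small hypothetical two-layer example:
layers = [(3, 3, 16, 112),    # 3 input channels, 3x3 kernels, 16 filters
          (16, 3, 32, 56)]    # second layer halves the feature-map size
total = conv_flops(layers)
```

Summing this product per layer over a model's actual configuration reproduces the per-architecture FLOPs comparisons in Table 1 (up to the small non-convolutional remainder).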
This metric was used to compare the deep learning models (traditional models are less computationally intensive), and the number of FLOPs was used to determine which models would require more operations, and thus take longer to train. For training, the network feeds the image forward through each layer, and updates its parameters from the last layer to the input layer. For testing, the model does not update its parameters, and only uses a single forward pass through the network to make a prediction, which is less computationally demanding than training (He and Sun, 2014).
3. Results
3.1. Preprocessing
Preprocessing increased uniformity (Fig. 3) in contrast and the positioning of the ovary in the images collected by use of different settings (Groups 1 and 2). Histogram normalization ensured that there were no superfluous statistical differences in the dataset other than those due to the ultrasound image content. Empty areas above and below ovaries were cropped to enable uniform positioning.
Fig. 3.

Preprocessing of ultrasound images of channel catfish ovaries. Images were classified into five ovarian development stages (top) by experienced researchers prior to preprocessing. After preprocessing (center) the images showed uniform contrast and positioning. Example images (bottom) of Stages 1 and 5 show the developmental range and observable changes in internal morphology that occur as the ovaries mature: enlargement of oocytes, expansion of the ovary, thinning of the body wall.
3.2. Deep learning
The median accuracy of transfer learning using ImageNet pre-trained models was compared for the 2-class and 5-class problems (Fig. 4). The two-class problem was solved with a median accuracy of above 98% for 11 of 15 ImageNet pre-trained models. The models which achieved superior (100%) median accuracy were RN-50 and RN-152. The 5-class problem was more challenging and led to a wider variation of accuracy scores. The best performing model was EN followed by MN V2, RN-50, and SE-RN-101. The worst performing model for transfer learning in both the 5- and 2-class problems was IN V1. This model was improved by methods described in the Discussion through creation of another architecture (SE-IN-BN).
Fig. 4.

Classification accuracy of machine learning prediction by use of deep learning. Accuracies were obtained from models trained by use of transfer learning with ImageNet dataset (left), and trained by use of random weight initializations (right). The floating-point operations (FLOPs) count assumes two output classes. Predictions for the 2-class problem tended to result in higher accuracy and lower variation compared with those for the 5-class problem. Shapes represent median accuracy, and bars represent the minimum and maximum accuracies (five accuracy scores were obtained from cross validation but three are reported herein to enhance visual clarity).
Models trained from random weights largely outperformed the transfer learning results, with the exception of the minimal versions such as EN and SN (Fig. 4, Table 1). The 2-class problem was solved with > 98% median accuracy by all models, and with 100% median accuracy by 8 of 15 architectures. In the 5-class problem, all non-minimal architectures achieved above 88% median accuracy. The median cross-validation scores for these models were nearly indistinguishable (i.e., all contained within an interval of size 0.01).
Table 1.
Comparison of median accuracy and computational resource needs by evaluation of floating-point operations (FLOPs) for convolutional neural network (CNN) models on the 5-class problem. Highest accuracies among the different approaches are indicated by bold font.
| CNN Model | Median accuracy (transfer learning) | Median accuracy (random weights) | Median accuracy (traditional learning) | FLOPs (× 10⁹) | Parameters (× 10⁶) |
|---|---|---|---|---|---|
| VGG-19 | 0.869 | 0.886 | **0.982** | | 139.6 |
| SEN-154 | 0.883 | 0.881 | 0.980 | 66.2 | 113.3 |
| VGG-16 | 0.864 | 0.894 | 0.978 | 15.5 | 134.3 |
| SE-RN-101 | **0.886** | 0.893 | 0.976 | 7.6 | 47.4 |
| RN-50 | 0.884 | **0.898** | 0.974 | 3.9 | 23.6 |
| IN V1 | 0.850 | 0.895 | 0.971 | 1.6 | 6.0 |
| SE-IN-BN | 0.879 | 0.891 | 0.971 | 2.0 | 10.9 |
| SE-RNX-101 | 0.882 | 0.891 | 0.964 | 22.8 | 47.0 |
| RN-101 | 0.867 | 0.895 | 0.917 | 7.6 | 42.6 |
| wRN-50–2 | 0.879 | **0.898** | 0.852 | 11.4 | 66.9 |
| IN V3 | 0.876 | 0.896 | 0.853 | 5.7 | 23.9 |
| RN-152 | 0.876 | 0.891 | 0.889 | 11.3 | 58.3 |
| EN | 0.885 | 0.872 | 0.850 | 15.3 | 4.1 |
| MN V2 | 0.883 | 0.874 | 0.852 | 11.0 | 6.2 |
| SN V1.1 | 0.874 | 0.864 | 0.861 | 0.4 | 1.2 |
3.3. Traditional models
Traditional models learn from data with various approaches. Most of these models performed similarly when the vectorization method was fixed, so the results in this section focus on comparing the vectorization methods against each other. This portion of the study sought to compare general dimension reduction methods to vectorization by CNNs pre-trained on the large dataset used in the ImageNet competition. Once again, the 2-class problem was solved with a median accuracy of > 98% using most approaches (17 of 20) (Fig. 5A). For the 5-class problem, 10 of 15 pre-trained CNNs outperformed their scores derived from transfer learning, with a mean increase of 5% in accuracy. The highest median accuracy was 98%, obtained when VGG-19 was used for image vectorization (Fig. 5B). The problem of dimensionality that hindered the high-parameter models when trained by transfer learning via SGD was less significant when these models were used for dimension reduction in a traditional machine learning pipeline, especially for models such as RFC with inherent feature selection (Fig. 4, Fig. 5B, Table 1).
Fig. 5.

Classification accuracy by use of traditional machine learning. Accuracies reported from cross-validation for traditional models on vectorized images for the 2-class problem (A) and 5-class problem (B). A five-fold cross-validation method was used to increase the sample size for estimating the average. The results from the five validations are reported as bars (lowest and highest accuracies) and three lines (second-highest, median, and second-lowest accuracies) within the boxes. Numbers by the boxes indicate the number of inputs used / the total number of encoded components. Boundaries of boxes indicate the 25% and 75% quartiles. Blue boxes represent statistical methods and orange boxes represent convolutional methods.
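The traditional pipeline evaluated here (CNN vectorization of images followed by a conventional estimator under five-fold cross-validation) can be sketched as follows. This is an illustrative skeleton, not the study's code: random vectors with an injected class signal stand in for the CNN-extracted image features, and the estimator settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for CNN-extracted image vectors (e.g., penultimate-layer
# features): 200 "images", 512-dimensional vectors, 5 ovarian stages.
X = rng.normal(size=(200, 512))
y = rng.integers(0, 5, size=200)
# Inject a weak class signal so the classifier has something to learn.
X[:, 0] = X[:, 0] + y

# Traditional estimator with built-in feature selection (RFC),
# evaluated by five-fold cross-validation as in the study.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.shape)
```

With real data, `X` would be produced by running each preprocessed ultrasound image through a fixed pre-trained CNN and flattening the resulting features.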
The models that produced the highest median accuracies for the two problems were chosen for comparison of their performance among the various vectorization techniques (Fig. 6). The number of input vectors useful to a traditional model depends heavily on the vectorization method and on whether the model has built-in feature selection. The impact of the vectorization method is best seen by comparing SEN convolutional encoders to the simpler linear dimension-reduction methods on the 5-class problem (Fig. 6B).
Fig. 6.

The effect of input dimension on top-performing traditional models. The median accuracies for lSVC on the 2-class problem (A) and RFC on the 5-class problem (B) are shown as a function of vectorized image dimension. The vector components for each encoder were selected in order of their correlation with the classification target. Models relevant to the Discussion and deployment comparisons are labeled by their abbreviations. Solid lines indicate convolutional models and dashed lines indicate traditional models.
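The component-ordering rule described in the caption (selecting vector components in order of their correlation with the classification target) can be sketched with numpy; the data here are synthetic, with one deliberately informative component.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))          # encoded image vectors
y = rng.integers(0, 2, size=100).astype(float)
X[:, 3] = X[:, 3] + 2.0 * y             # make component 3 informative

# Rank components by absolute Pearson correlation with the target;
# components would then be added to the model in this order.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                 for j in range(X.shape[1])])
order = np.argsort(corr)[::-1]
print(order[0])
```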
The class activation map (Grad-CAM) showed little activation within the ovary regions when the model made an incorrect prediction (Fig. 7). When correct predictions were made, pixels were active within the ovary for the Negative and Positive classes, indicating that the model used these regions for differentiation. This tool can assist in selecting machine learning models and in evaluating the effectiveness of image preprocessing.
Fig. 7.

Heatmap of image regions by their influence on the output classes of a trained RN-50 model. Color heatmap showing pixel-level activations computed by Grad-CAM for two representative images from the 2-class problem (i.e., “Injection” or “No injection”). The intensity of red coloration indicates the strength of influence on decision making by the model.
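Grad-CAM (Selvaraju, et al., 2017) weights each convolutional feature map by the global average of its gradient with respect to the class score and keeps only positive evidence. A minimal numpy sketch of that computation follows; the activations and gradients are made-up arrays standing in for values extracted from a trained CNN such as RN-50.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature maps A_k and gradients dY/dA_k from a conv layer: (K, H, W).
activations = rng.random(size=(8, 7, 7))
gradients = rng.normal(size=(8, 7, 7))

# 1) Channel weights: global-average-pool the gradients.
weights = gradients.mean(axis=(1, 2))                    # shape (8,)
# 2) Weighted sum of feature maps, then ReLU to keep positive influence.
cam = np.maximum(0.0, np.tensordot(weights, activations, axes=1))
# 3) Normalize to [0, 1] for display as a red-intensity heatmap.
if cam.max() > 0:
    cam = cam / cam.max()
print(cam.shape)
```

The resulting map would be upsampled to the input image size and overlaid on the ultrasound image, as in Fig. 7.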
3.4. Deployment and computational resources
For model deployment, the complexity and computational cost of top-scoring models were key considerations. For the 2-class problem faced by hatcheries (to inject hormone or not), accurate models are available at low computational cost. An example of a low-cost approach would be PCA dimension reduction followed by a feed-forward network with one hidden layer. This pipeline would require only three matrix multiplications to proceed from the input image to a prediction with 98% accuracy (Strang, 2019). Near-100% median accuracy was achievable by most deep convolutional approaches, which can process images quickly in batches on a graphics processing unit. The best models for the 5-class problem included a deep convolutional vectorization step followed by a traditional machine learning estimator. Each of these steps can be parallelized individually by use of open-source frameworks.
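The three-matrix-multiplication pipeline described above can be sketched in numpy. The dimensions and weights here are illustrative placeholders; in deployment the PCA projection and layer weights would come from the trained pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a flattened 64x64 image, 50 PCA components,
# 32 hidden units, 2 output classes ("inject" / "do not inject").
x = rng.random(4096)
W_pca = rng.normal(size=(4096, 50))    # learned PCA projection
W1, b1 = rng.normal(size=(50, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 2)), np.zeros(2)

z = x @ W_pca                          # matmul 1: dimension reduction
h = np.maximum(0.0, z @ W1 + b1)       # matmul 2: hidden layer (ReLU)
logits = h @ W2 + b2                   # matmul 3: class scores
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()            # softmax over the two classes
print(probs.shape)
```

Because inference is only three matrix multiplications, this pipeline runs comfortably on a CPU with no dedicated GPU.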
The effect of dataset size on model accuracy was an important factor for implementation and computational cost. The RN-50 network was chosen for comparison between complete training of the full network and use of transfer learning (Fig. 8). Subsets of the dataset were constructed by taking incrementally larger portions of the training set for each fold while the validation set was held constant.
Fig. 8.

Effect of dataset size on model performance. Median accuracy of the RN-50 architecture as a function of the number of images used for training. Transfer learning of the final layers outperformed training the entire model from random weights on small datasets. Bands enclose the 0.25 and 0.75 quartiles of the cross-validation scores.
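The subset construction behind Fig. 8 (incrementally larger training portions scored against a fixed validation set) can be sketched as follows; the data and the logistic-regression stand-in for RN-50 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 2-class data standing in for the ultrasound features.
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

# Fixed validation set; incrementally larger training subsets.
X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]
sizes = [25, 50, 100, 200, 400]
accs = []
for n in sizes:
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    accs.append(clf.score(X_val, y_val))   # validation accuracy per size
print(len(accs))
```

In the study, this loop was repeated within each cross-validation fold so that the validation fold was never part of any training subset.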
Transfer learning takes advantage of features (convolutional filters in this study) learned from a large dataset and applies them to a related task with a smaller dataset. Because the deep learning models trained by transfer learning did not re-learn any convolutional filters, the computational resources required for training were far less than for the models with random weight initialization (Fig. 8). Random-weight models have more trainable parameters than transfer-learning models and can thus be more finely tuned to an application. However, the initial random state of these models requires them to first learn the relevant features de novo, which can negatively affect their performance on smaller datasets.
The pre-trained model yielded 95% median accuracy on the 2-class problem when trained with < 10 labeled images. When restricted to such sample sizes, the full RN-50 network with random weights was more prone to becoming too specialized to generalize to novel data (a phenomenon called “overfitting”). For the 2-class problem, only 100 samples were needed for both approaches to produce above 98% median accuracy. For the 5-class problem, both models showed a steady increase in accuracy with training set size, suggesting that higher accuracy may be achievable by deep learning models on the 5-class problem with a larger dataset.
4. Discussion
Ultrasound imaging technology continues to play an increasingly important role in management of wild and cultured fish for improving reproductive control and efficiency, as evidenced by the increasing number of studies and species addressed with this technology (Novelo and Tiersch, 2012). These studies addressed sex identification and reproductive assessment in species including European Bass Dicentrarchus labrax (Macrì, et al., 2013), Sockeye Salmon Oncorhynchus nerka (Frost, et al., 2014), European Eels Anguilla anguilla (Bureau du Colombier, et al., 2015), Channel Catfish Ictalurus punctatus (Novelo and Tiersch, 2016), sturgeons (Golpour, et al., 2021; Masoudifard, et al., 2011; Memiş, et al., 2016; Munhofen, et al., 2014), and Burbot Lota lota (McGarvey, et al., 2021).
Machine learning has great potential in solving problems for commercial aquaculture and research. The recent application of machine learning in aquatic species has led to a variety of approaches for estimation of biomass metrics and water quality, as well as classification of species, sex, maturity, and behavioral patterns of fish (Zhao, et al., 2021). Computer vision techniques based on CNN feature extraction have been used for automatic counting and species prediction (Li, et al., 2021). The present report provides an initial evaluation of how computer vision models can be identified and selected for accurate and actionable decision making in hatcheries and research settings. Classification of ovarian stages is key to timing of hormone injection and is currently identified visually in hatcheries by operators. The approach outlined here identified several models with high accuracy on this binary prediction and other models suitable for more precise identification of discrete stages.
4.1. Preprocessing
Preprocessing of the images was used to make the dataset more uniform, which enabled the models to recognize important image features with reduced training and computational resources. This has been demonstrated to be beneficial in other image classification tasks (Bayramoglu, et al., 2015). Standardization of image positioning was necessary because deep convolutional models with large receptive fields (e.g., VGG) are sensitive to the position of objects within images (Kayhan and Gemert, 2020). Histogram normalization also reduced statistical variation among samples due to intensity and contrast.
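The histogram normalization step can be sketched with numpy; this is generic histogram equalization on an 8-bit grayscale image, a common form of the intensity/contrast standardization described here rather than necessarily the exact routine used in the study.

```python
import numpy as np

def equalize(img: np.ndarray) -> np.ndarray:
    """Map 8-bit grayscale intensities through the image's own CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # scale to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

rng = np.random.default_rng(0)
# A low-contrast "ultrasound" patch: intensities squeezed into 100-140.
img = rng.integers(100, 141, size=(64, 64)).astype(np.uint8)
out = equalize(img)
print(out.min(), out.max())
```

After equalization, the intensity range of the patch spreads over nearly the full 0-255 scale, reducing contrast variation among samples.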
4.2. Deep learning
The top-performing transfer learning models were RN, SENet, and the minimal architectures. Transfer learning, which used SGD to train the final classification layer, was less accurate for architectures such as VGG with more than 100 × 10⁶ internal parameters in their final layers. Because SGD was trained on randomly selected subsets of data, the input distribution of each layer varied with the samples used in each batch. Incorporating batch normalization (BN) made normalization of these inputs a part of the network architecture, which allowed internal layers to be optimized faster (Ioffe and Szegedy, 2015). Models that trained for longer times were less likely to generalize well to unseen data (Strang, 2019). Every transfer learning model that achieved above 88% median accuracy for the 5-class problem used BN in its design (the models not including BN layers were IN V1, SN V1.1, VGG-16, and VGG-19). Another class of networks with useful transfer learning results were SENets, which included a mechanism in their skip connections to weight extracted image features. Networks with SENet connections outperformed their counterparts without them when used for transfer learning, as seen by comparing RN-101 with the same network with SEN connections (SE-RN-101). A combination of BN layers and SENet connections was added to the IN V1 model to create SE-IN-BN. These architecture changes each resulted in a 2%–3% increase in accuracy for the 5-class transfer learning problem.
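The batch normalization step credited above (Ioffe and Szegedy, 2015) standardizes each feature over the mini-batch before applying a learned scale and shift; a minimal forward-pass sketch (inference-time running statistics omitted for brevity):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # one mini-batch
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.shape)
```

Because each layer sees inputs with stable statistics regardless of which samples land in the batch, internal layers can be optimized with larger learning rates.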
Complete models trained from random initializations required more computational resources, but they produced overall better results. The top-performing model was wRN-50-2, followed by RN-50. Deep neural networks can recognize high-level features but suffer from vanishing gradients; this is critical when the complete model is trained, because the gradient must propagate to early layers for fine-tuning. Early intermittent classifiers were used by IN, and residual skip connections were used by RN, to increase model depth and facilitate learning. The latter design technique was effective in decreasing the gradient confusion that affects deep or over-parameterized networks trained by SGD (Sankararaman, et al., 2020).
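The residual skip connection that RN relies on adds the block's input back to its transformed output, so signal (and gradient) can bypass the transformation entirely; a minimal sketch with a hypothetical one-ReLU inner transform:

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x): the identity path lets signal skip F entirely."""
    h = np.maximum(0.0, x @ W1)   # inner transform F (one ReLU layer)
    return x + h @ W2             # skip connection: add the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
W1 = np.zeros((16, 16))           # even with a "dead" inner path...
W2 = np.zeros((16, 16))
y = residual_block(x, W1, W2)
print(np.allclose(y, x))          # ...the signal still passes through
```

The same identity path carries gradients past poorly conditioned layers during backpropagation, which is what mitigates vanishing gradients in deep RN models.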
The opposite trend was observed for SENet performance when the complete models were trained from random initializations: the differences between IN V1 and RN-101 and their SENet counterparts were less than 1%. This may be because a larger dataset is required to determine the feature weights effectively. On the original ImageNet dataset, the median accuracy of IN V1 improved from 69% to 76% when modified to SE-IN-BN, and RN-101 improved from 76% to 78% median accuracy with the addition of SENet connections in SE-RN-101.
These results highlight the need for large, open repositories of images to support ultrasound applications. The transfer learning in the present study came from the ImageNet dataset, which contains 1000 classes of images with three color channels (i.e., RGB). The ultrasound dataset used in this study contained 5 classes of images with a single color channel (grayscale), which means that some features learned by the pre-trained models were less useful when applied to the ultrasound classification task. Models pre-trained on a large ultrasound dataset may be a focus of future studies and could offer greater accuracies than models pre-trained on datasets such as ImageNet. The size of the target dataset and the available computing resources would dictate whether transfer learning is an appropriate method for this type of classification task.
4.3. Traditional models
The model that provided the highest median accuracy for the 2-class problem was the linear support vector classifier (lSVC), a linearization of the support vector machine designed for binary classification. This model performed best with LLE vectorization, which acts locally like linear PCA on the neighborhood of each point and combines these local maps into a global non-linear dimension reduction (Pedregosa, et al., 2011). In the 5-class problem, the convolutional vectorization methods largely outperformed the statistical methods; this was also true for the 2-class problem, with some exceptions for LLE, PCA, and LSA. An observable trend for the more complex 5-class problem was that increasing the number of convolutional vectorized inputs generally increased accuracy. In the detailed results with RFC, encoders such as SENets and VGG models showed gradual increases in usable information with each additional input, whereas many IN and RN architectures did not produce useful results until all of their features were used. Feature importance in SENets was weighted, whereas in other networks all features were treated uniformly.
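The LLE-then-lSVC pipeline described above can be mirrored with scikit-learn's `LocallyLinearEmbedding` and `LinearSVC`; the synthetic two-class blobs below are an illustrative stand-in for the preprocessed image vectors, and the neighbor/component counts are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.svm import LinearSVC

# Two-class synthetic data standing in for preprocessed image vectors.
X, y = make_blobs(n_samples=120, centers=[[0, 0, 0, 0], [3, 3, 3, 3]],
                  cluster_std=1.5, random_state=0)

# LLE: locally linear (PCA-like) maps combined into a global
# non-linear dimension reduction, used here as the vectorization step.
emb = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                             random_state=0).fit_transform(X)

# Linear SVC on the embedded vectors, as in the 2-class pipeline.
clf = LinearSVC(max_iter=10000).fit(emb, y)
print(emb.shape)
```

In practice the embedding and classifier would both be fit on training folds only and then applied to held-out images.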
The correlation between input dimension and accuracy did not hold for many of the statistical encoders, especially the linear projections encoded by H, PCA, and LDA. This is because statistical encoders tend to capture the majority of the dataset variance in the first few components, so additional components beyond that tend to capture noise (Strang, 2019). This effect was most apparent for the lSVC classifier because this model always used all input features for prediction. Models that incorporated feature selection, such as RFC, exhibited less negative correlation to the number of input features. These models did not utilize all the input features, so after proper class distinction the extraneous input variables were ignored. An important conclusion to be drawn from the 5-class results is that vectorization techniques designed for images, such as the convolutional encoders, can offer more useful low-dimensional descriptions than general dimension-reduction techniques when the problem is sufficiently complex.
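The point that statistical encoders concentrate most of the dataset variance in the first few components can be illustrated with a quick SVD-based PCA on synthetic correlated data (two latent factors driving ten observed features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated data: 10 observed features driven by 2 latent factors
# plus a small amount of noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 10))

# PCA via SVD of the centered data; squared singular values give the
# variance captured by each principal component.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / (s**2).sum()
print(explained[:2].sum())
```

The first two components capture nearly all of the variance; the remaining components are mostly noise, which is why adding them can hurt a classifier such as lSVC that uses every input feature.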
4.4. Performance comparison for different scenarios
We evaluated two scenarios of image classification. The binary (2-class) classification was intended to address applied hatchery problems such as determining whether the reproductive condition of fish would justify injection with spawning hormone. Injections made outside of the window of suitable ovarian condition will result in waste of hormone and female broodstock, increasing costs for hatchery management. This binary classification was contingent on analysis of images in real time to provide a rapid yes or no answer. The 5-class classification problem represented a more detailed approach such as for research or broodstock maintenance programs in which assessments are needed throughout the year to elucidate reproductive biology or to prepare for spawning season.
Major differences in predictive accuracy were observed between the 2-class and 5-class problems. Both classification approaches had models that produced 100% median accuracy for the 2-class problem. The 5-class problem, however, was more complex and required more sophisticated methods to achieve high accuracy. The convolutional and transfer learning approaches were limited to SGD-trained SoftMax regression on the CNN-vectorized images. The traditional models that achieved the highest scores on the 5-class problem were often different from the linear layer fine-tuned in transfer learning. For instance, the random forest classifier used a collection of uncorrelated decision trees that collectively converged or “voted” on the correct class; this produced 98% median accuracy with VGG-19 vectorization.
4.5. Considerations for model deployment
One potential method of deploying binary classifiers would be real-time classification to assist hatchery workers in deciding whether a given fish is ready for hormone injection. A theoretical workflow would involve use of a trained machine-learning model to classify images while the fish are handled for routine ultrasound imaging (Guitreau, et al., 2012). Because images were acquired from short video segments, the acquisition method would need to output single-frame images to the model. Software such as this could give hatchery workers a tool for improving operations, and it could provide a powerful research tool for evaluation of ultrasound images in fish species. These approaches could play a future role in standardization across studies and species, allowing direct comparisons of results, which is currently problematic.
Thus, the machine learning models trained during this study can be useful to aquaculture and hatchery research, and they were chosen to provide practical test scenarios for evaluating a broad array of machine learning models. An important distinction in this work was the difference between the time and resources necessary for training the models and those required for actual use in classification. Of all models and vectorization methods, 17 out of 20 achieved the binary classification of ultrasound images with > 95% classification accuracy (based on Novelo and Tiersch, 2016). Although the larger deep learning models such as VGG and RN required additional dedicated video memory and took longer to train for larger datasets and larger values of K (in K-fold cross-validation), their actual use after training was less computationally demanding.
A reasonably powerful existing computer system (NVIDIA GeForce RTX 2070 SUPER with 8 GB of vRAM) was used to train the deep learning models; however, single images can be classified using a pre-trained model without the need for a computer with a dedicated graphics processing unit (GPU). Using pre-trained models offered several advantages over de novo training of new models. The process required less experience with programming and machine learning libraries, less time and effort to produce useful information, and fewer computational resources, allowing use of older hardware, especially laptops that lack a dedicated GPU. The main disadvantage of using pre-trained models was that the original training dataset could have been sufficiently different from the target dataset to diminish classification accuracy.
4.6. Team development, and data and code availability
This study used an interdisciplinary approach that combined expertise from reproductive biology, computer science, biological engineering, and aquaculture. Collaboration among multiple contributors was necessary to comprehend the structure of the dataset, evaluate machine learning models, and integrate these aspects with the intended applications. Work of this type requires multidisciplinary skill sets and would advance more rapidly if the training of humans were linked with the training of the machines. Aquaculture naturally attracts talents from multidisciplinary (and in many cases, interdisciplinary) fields, and could thus provide a fertile interaction space for future work in machine learning.
This project brought together faculty members and undergraduate students to bridge the technical and applied sides of the work, and it served as an efficient training ground for all involved. Another way to advance this work would be to encourage sharing of datasets and models through established internet repositories. Therefore, the image dataset and models trained during this study will be made publicly available on two commonly used websites: UC-Irvine Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and Kaggle (Kaggle.com) for use by others in classifying similar images and for training of new models. The code used in this study can be found online at https://github.com/clintg105/Catfish-Ultrasound-Classification.
5. Conclusions
Machine learning is an emerging technology that is being applied in many fields of research and in industry; however, its application in the aquatic sciences is only beginning. The present study demonstrated the feasibility of applying machine learning in aquaculture by developing models to classify ultrasound images of catfish ovarian development. Data preprocessing, such as cropping and contrast normalization, was found to be necessary to facilitate analysis. Traditional machine learning models achieved 100% median accuracy on the 2-class problem (with the models RN-50 and RN-152) and 96% median accuracy for the 5-class problem (with VGG-19 image vectorization). The deep learning approach for the 2-class problem had a median accuracy of > 98% for 15 models. In addition to classification of images, the opportunities to apply machine learning to other areas of aquaculture are essentially unlimited.
Finally, the present study highlights the importance of interdisciplinary teams to bridge the divide between biologists and computer programmers with the technical aspects of machine learning. Programmers can provide useful advice regarding dataset creation such as target dataset size and preferable image qualities. Biologists can provide applied challenges such as formulation of the problem, characterization of the datasets, and identification of potential applications. Public sharing of datasets used for machine learning work would greatly aid transfer learning techniques, especially for images with similar qualities and should be given high priority.
Highlights.
Machine learning can classify ultrasound images of fish ovarian development.
931 ultrasound images of catfish ovaries were used to train and evaluate models.
> 200 machine learning vectorization and model combinations were evaluated.
Classification accuracy > 99% can be achieved by machine learning.
Interdisciplinary teams facilitate machine learning development for aquaculture.
Acknowledgements
This work was supported in part by funding from the National Institutes of Health, Office of Research Infrastructure Programs (R24-OD010441 and R24-OD028443), with additional support provided by the National Institute of Food and Agriculture, United States Department of Agriculture (Hatch projects LAB94420 and NC1194), a USDA NAGP-AGGRC Cooperative Agreement (Awards 58-3012-8-006 and 58-6066-8-045), the Louisiana State University Research & Technology Foundation (AG-2019-LIFT-005), the Louisiana Sea Grant Undergraduate Research Opportunities Program, and the LSU-ACRES (Audubon Center for Research of Endangered Species) Collaborative Program. This manuscript was approved for publication by the Louisiana State University Agricultural Center as number 2021-241-36630.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
- Barulin NV, 2019. Using machine learning algorithms to analyse the scute structure and sex identification of sterlet Acipenser ruthenus (Acipenseridae). Aquacult. Res. 50, 2810–2825.
- Bayramoglu N, Kannala J, Heikkilä J, 2015. Human epithelial Type 2 cell classification with convolutional neural networks. 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1–6.
- Bellman R, 1966. Dynamic programming. Science 153, 34–37.
- Bianco S, Cadene R, Celona L, Napoletano P, 2018. Benchmark analysis of representative deep neural network architectures. IEEE Access 6, 64270–64277.
- Bureau du Colombier S, Jacobs L, Gesset C, Elie P, Lambert P, 2015. Ultrasonography as a non-invasive tool for sex determination and maturation monitoring in silver eels. Fish. Res. 164, 50–58.
- Cao W, Wang X, Ming Z, Gao J, 2018. A review on neural networks with random weights. Neurocomputing 275, 278–287.
- Cao X, Liu Y, Wang J, Liu C, Duan Q, 2020. Prediction of dissolved oxygen in pond culture water based on K-means clustering and gated recurrent unit neural network. Aquacult. Eng. 91, 102122.
- Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L, 2009. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 248–255.
- Eldan R, Shamir O, 2016. The power of depth for feedforward neural networks. arXiv preprint arXiv:1512.03965.
- Frost DA, McAuley WC, Kluver B, Wastel M, Maynard D, Flagg TA, 2014. Methods and accuracy of sexing sockeye salmon using ultrasound for captive broodstock management. N. Am. J. Aquacult. 76, 153–158.
- Golpour A, Broquard C, Milla S, Dadras H, Baloch AR, Saito T, Pšenička M, 2021. Determination of annual reproductive cycle in male sterlet, Acipenser ruthenus using histology and ultrasound imaging. Fish Physiol. Biochem. 47, 703–711.
- Guitreau AM, Eilts BE, Novelo ND, Tiersch TR, 2012. Fish handling and ultrasound procedures for viewing the ovary of submersed, nonanesthetized, unrestrained Channel catfish. N. Am. J. Aquacult. 74, 182–187.
- Harmon SA, Patel PG, Sanford TH, Caven I, Iseman R, Vidotto T, Picanco C, Squire JA, Masoudi S, Mehralivand S, 2021. High throughput assessment of biomarkers in tissue microarrays using artificial intelligence: PTEN loss as a proof-of-principle in multi-center prostate cancer cohorts. Mod. Pathol. 34, 478–489.
- He K, Sun J, 2014. Convolutional neural networks at constrained time cost.
- He K, Zhang X, Ren S, Sun J, 2016. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- Hernández-Ontiveros JM, Inzunza-González E, García-Guerrero EE, López-Bonilla OR, Infante-Prieto SO, Cárdenas-Valdez JR, Tlelo-Cuautle E, 2018. Development and implementation of a fish counter by using an embedded system. Comput. Electron. Agric. 145, 53–62.
- Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H, 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
- Hu J, Shen L, Sun G, 2018. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.
- Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K, 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint arXiv:1602.07360.
- Ioffe S, Szegedy C, 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 448–456.
- Kayhan OS, van Gemert JC, 2020. On translation invariance in CNNs: Convolutional layers can exploit absolute spatial location. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14274–14285.
- Krizhevsky A, Sutskever I, Hinton GE, 2012. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105.
- Lemley J, Bazrafkan S, Corcoran P, 2017. Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine 6, 48–56.
- Lézoray O, Charrier C, Cardot H, Lefèvre S, 2008. Machine Learning in Image Processing. Springer.
- Li D, Miao Z, Peng F, Wang L, Hao Y, Wang Z, Chen T, Li H, Zheng Y, 2021. Automatic counting methods in aquaculture: A review. J. World Aquacult. Soc. 52, 269–283.
- Macrì F, Liotta L, Bonfiglio R, De Stefano C, Ruscica D, Aiudi G, 2013. Ultrasound measurement of reproductive organs in juvenile European sea bass Dicentrarchus labrax. J. Fish Biol. 83, 1439–1443.
- Masoudifard M, Vajhi AR, Moghim M, Nazari RM, Naghavi AR, Sohrabnejad M, 2011. High validity sex determination of three years old cultured beluga sturgeon (Huso huso) using ultrasonography. J. Appl. Ichthyol. 27, 643–647.
- McGarvey LM, Ilgen JE, Guy CS, McLellan JG, Webb MAH, 2021. Gonad size measured by ultrasound to assign stage of maturity in burbot. J. Fish. Wildl. Manag. 12, 241–249.
- Memiş D, Yamaner G, Tosun DD, Eryalçin KM, Chebanov M, Galich E, 2016. Determination of sex and gonad maturity in sturgeon (Acipenser gueldenstaedtii) using ultrasound technique. J. Appl. Aquacult. 28, 252–259.
- Mohri M, Rostamizadeh A, Talwalkar A, 2018. Foundations of Machine Learning. MIT Press.
- Monkman GG, Hyder K, Kaiser MJ, Vidal FP, 2019. Using machine vision to estimate fish length from images using regional convolutional neural networks. Methods Ecol. Evol. 10, 2045–2056.
- Munhofen JL, Jiménez DA, Peterson DL, Camus AC, Divers SJ, 2014. Comparing ultrasonography and endoscopy for early gender identification of juvenile Siberian sturgeon. N. Am. J. Aquacult. 76, 14–23.
- Novelo ND, 2014. A standardized ultrasonography classification for channel catfish ovarian development. Louisiana State University, Baton Rouge, Louisiana.
- Novelo ND, Tiersch TR, 2012. A review of the use of ultrasonography in fish reproduction. N. Am. J. Aquacult. 74, 169–181.
- Novelo ND, Tiersch TR, 2016. Development and evaluation of an ultrasound imaging reproductive index based on the ovarian cycle of channel catfish, Ictalurus punctatus. J. World Aquacult. Soc. 47, 526–537.
- Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.
- Qian Q, Jin R, Yi J, Zhang L, Zhu S, 2015. Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD). Mach. Learn. 99, 353–372.
- Raschka S, 2018. Model evaluation, model selection, and algorithm selection in machine learning. University of Wisconsin-Madison, pp. 49.
- Razzak MI, Naz S, Zaib A, 2018. Deep learning for medical image processing: Overview, challenges and the future. Classification in BioApps, 323–350.
- Rosenblatt F, 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408.
- Roweis ST, Saul LK, 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326.
- Sankararaman KA, De S, Xu Z, Huang WR, Goldstein T, 2020. The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. Proceedings of Machine Learning Research, pp. 8469–8479.
- Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D, 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
- Simonyan K, Zisserman A, 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- Sokolova M, Lapalme G, 2009. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45, 427–437.
- Srivastava S, Divekar AV, Anilkumar C, Naik I, Kulkarni V, Pattabiraman V, 2021. Comparative analysis of deep learning image detection algorithms. Journal of Big Data 8, 66.
- Stanik C, Haering M, Maalej W, 2019. Classifying multilingual user feedback using traditional machine learning and deep learning. 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, pp. 220–226.
- Strang G, 2019. Linear Algebra and Learning from Data. Wellesley-Cambridge Press.
- Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A, 2015. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
- Tan M, Le Q, 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In: Chaudhuri K, Salakhutdinov R (Eds.), Proceedings of the 36th International Conference on Machine Learning. PMLR, Proceedings of Machine Learning Research, pp. 6105–6114.
- Tenenbaum JB, De Silva V, Langford JC, 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323.
- Torgerson WS, 1952. Multidimensional scaling: I. Theory and method. Psychometrika 17, 401–419.
- Torrey L, Shavlik J, 2010. Transfer learning. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global, pp. 242–264.
- Van der Maaten L, Hinton G, 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9.
- Wang Z, Hong T, Piette MA, 2020. Building thermal load prediction through shallow machine learning and deep learning. Applied Energy 263, 114683.
- Wolfram Research, Inc., 2021. Mathematica. Champaign, Illinois.
- Yang X, Zhang S, Liu J, Gao Q, Dong S, Zhou C, 2021. Deep learning for smart fish farming: applications, opportunities and challenges. Rev. Aquac 13, 66–90. [Google Scholar]
- Yue K, Shen Y, 2021. An overview of disruptive technologies for aquaculture. Aquac. Fish (in press) 10.1016/j.aaf.2021.04.009. [DOI]
- Zhao S, Zhang S, Liu J, Wang H, Zhu J, Li D, Zhao R, 2021. Application of machine learning in intelligent fish aquaculture: A review. Aquac 540, 736724. [Google Scholar]
- Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A, 2016. Learning deep features for discriminative localization, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929. [Google Scholar]
- Zhou C, Xu D, Chen L, Zhang S, Sun C, Yang X, Wang Y, 2019. Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision. Aquac 507, 457–465. [Google Scholar]
Data Availability Statement
This study used an interdisciplinary approach that combined expertise from reproductive biology, computer science, biological engineering, and aquaculture. Collaboration among multiple contributors was necessary to understand the structure of the dataset, to evaluate machine learning models, and to integrate these aspects with the intended applications. Work of this type requires multidisciplinary skill sets and would advance more rapidly if the training of humans were linked with the training of the machines. Aquaculture naturally attracts talent from multidisciplinary (and in many cases, interdisciplinary) fields, and as such it could provide a fertile interaction space for future work in machine learning.
This project brought together faculty members and undergraduate students to bridge the technical and applied sides of the work, and it served as an efficient training ground for all involved. Another way to advance this work would be to encourage sharing of datasets and models through established internet repositories. Accordingly, the image dataset and the models trained during this study will be made publicly available on two commonly used websites, the UC-Irvine Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and Kaggle (Kaggle.com), for use by others in classifying similar images and in training new models. The code used in this study is available online at https://github.com/clintg105/Catfish-Ultrasound-Classification.
