A new artificial intelligent approach to buoy detection for mussel farming

Ying Bi; Bing Xue; Dana Briscoe; Ross Vennell; Mengjie Zhang

doi:10.1080/03036758.2022.2090966

2022 Jun 26;53(1):27–51. doi: 10.1080/03036758.2022.2090966


PMCID: PMC11459752 PMID: 39439995

ABSTRACT

Aquaculture is an important industry in New Zealand (NZ). Mussel farmers often manually check the state of the buoys that are required to support the crop, which is labour-intensive. Artificial intelligence (AI) can provide automatic and intelligent solutions to many problems but has seldom been applied to mussel farming. In this paper, a new AI-based approach is developed to automatically detect buoys from mussel farm images taken from a farm in the South Island of NZ. The overall approach consists of four steps, i.e. data collection and preprocessing, image segmentation, keypoint detection and feature extraction, and classification. A convolutional neural network (CNN) method is applied to perform image segmentation. A new genetic programming (GP) method with a new representation, a new function set and a new terminal set is developed to automatically evolve descriptors for extracting features from keypoints. The new approach is applied to seven subsets and one full dataset containing images of buoys over different backgrounds and compared to three baseline methods. The new approach achieves better performance than the compared methods. Further analysis of the parameters and the evolved solutions provides more insights into the performance of the new approach to buoy detection.

KEYWORDS: Artificial intelligence, computer vision, aquaculture, evolutionary learning, object detection, deep learning

Introduction

In New Zealand (NZ), aquaculture is an important industry that generated over $650 million NZD in annual revenue in 2020 (Aquaculture New Zealand 2020). The revenue is expected to grow rapidly over the next 14 years, reaching $3 billion by 2035 (New Zealand Government 2020). The NZ government will strategically scale up finfish and mussel farms from inshore sites to thousands of hectares in nearshore and open-ocean environments (New Zealand Government 2020). With this expansion, it becomes increasingly important to develop automated ways to manage the farms and to reduce operational costs, risks, and environmental footprint. This will require more precise, real-time farm intelligence to support production and manage risks, e.g. disease management, water quality, climate change, and biofouling.

The green-lipped, or Greenshell™, mussel (Perna canaliculus) is the most iconic and economically important aquaculture species for NZ, producing more than 33,300 tonnes and over $200 million NZD in exports per year (Fisheries New Zealand 2009). Cultivation and harvest of these mussels is a specialised process and one that could benefit from artificial intelligence (AI) solutions. Mussels are grown by suspension from longlines that in turn support thousands of metres of crop line, to which the mussels attach themselves (Science for Technological Innovation 2019). A series of large plastic buoys are connected by ropes, forming a backbone structure that is anchored several metres below the surface, with each longline extending for several kilometres in length (Department of Conservation 2003). A single farm can be more than 20 hectares in size, and the integrity of these buoys is critical to growing the mussels at sufficient depth and in sufficient quantities.

As the mussels grow, they weigh down the buoys, submerging them. As this happens, more buoys must be added to the lines. The process for assessing when the buoys need to be replaced/added is manually conducted. Further, during storms, buoys can become dislodged from the mussel lines altogether. It is a significant but time-intensive operational task to manually check and maintain the lines of buoys on a farm. The loss of too many buoys may be detrimental to the crop and production yield. To address the need of determining when to add new buoys as the crop grows or replace lost buoys, it would be advantageous to automatically track the locations of all buoys on a mussel farm over time and determine when more buoys need to be added to the lines.

More specifically, if a camera is attached to the end of a line of buoys to take photos, a computer vision model could be run on a low-cost chip and used to detect when a line of buoys is too low in the water or when a buoy has become dislodged. A trigger can then be sent to the farmer alerting them of the event. Because mussel farmers often have to drive their boats significant distances to manually assess their lines, this method has the potential to substantially reduce costs for farmers. To achieve this, it is necessary to build an intelligent system that automates the monitoring of the structural integrity of mussel farms. Such a system may consist of multiple steps or tasks, e.g. detecting buoys in a single image, tracking buoys in videos, and detecting missing buoys from images or videos.

This paper focuses on the first step of building an intelligent system for mussel farming, i.e. developing a new AI approach to automatically detecting buoys in images. AI covers a wide range of algorithms that mimic human thinking and acting (Al-Sahaf et al. 2019). Typical algorithms include support vector machines (SVM), k-nearest neighbour (KNN), neural networks (NNs), convolutional neural networks (CNNs), reinforcement learning, and evolutionary learning methods such as genetic programming (GP), genetic algorithms (GAs) and particle swarm optimisation (PSO) (Al-Sahaf et al. 2019; Lu 2019; Zhou et al. 2020). These methods have been widely applied to many problems in the fields of agriculture, aquaculture, education, robotics, finance, manufacturing, healthcare, and security (Lu 2019).

The buoy detection task investigated in this paper is a real-world aquaculture application. However, the task is challenging for the following three reasons. First, the images sampled from the mussel farms have a size of $1920 \times 1080$, containing a large number of pixels. Analysing the content of such relatively large images is necessary since many buoys are small, but it is computationally expensive. Second, an image often contains many buoys of various sizes, depending on the distance from the buoy to the camera. This requires the method to be able to detect objects of variable sizes, particularly small objects. Third, real-world images often have high variations and image distortions due to different camera angles and positions, lighting, weather, and wave conditions, and different levels of contrast and noise. This further increases the difficulty of the object detection task.

This paper develops a new AI approach to buoy detection on mussel farm images from the Marlborough Sounds of NZ. The new approach includes steps of data collection and processing, image segmentation, keypoint detection, feature extraction, and classification. The data collection and processing step prepares the data by sampling images from the mussel farms and labelling these images for the tasks. The image segmentation step segments the water part from the whole mussel farm images by using a CNN-based method, i.e. U-Net. This step can remove sub-images without buoys to reduce the computational cost of the later steps. The keypoint detection step aims to find regions of interest in the image, and the feature extraction step extracts meaningful features from the keypoints for detection. In this step, a new GP-based approach with a new representation, a new function set and a new terminal set is developed to automatically evolve image descriptors from keypoints of the large images for feature extraction. In the classification step, promising keypoints are detected as the buoy objects. The performance of the new approach is evaluated on the full dataset and several small subsets of the mussel farming images. The proposed approach achieves better performance than the compared methods.

Background and related work

Background

Convolutional neural networks (CNNs) and U-Net

CNNs are a variant of artificial neural networks (ANNs). A standard CNN typically contains convolutional layers, pooling layers, and fully connected layers. A typical example is illustrated in Figure 1. CNNs have been widely applied to many tasks such as image classification, object detection, image segmentation, and object tracking (Alzubaidi et al. 2021). U-Net is a well-known variant of CNNs for image segmentation (Ronneberger et al. 2015). It was developed for biomedical image segmentation with a small number of training images. The architecture of U-Net resembles the letter ‘U’, consisting of a left part and a right part. The left part is known as the contracting path, which follows a common CNN architecture. The right part is known as the expansive path, in which the feature maps from different stages of the contracting path are appended to the corresponding stages of the expansive path. This enables the model to retain the structural detail of the feature maps in the contracting layers, resulting in much more accurate and detailed segmentation masks.

Figure 1. — Illustration of a simple CNN.

Genetic programming (GP)

GP is an evolutionary computation (EC) technique under the big umbrella of AI. It aims to automatically evolve computer programmes in the form of trees to solve different problems (Koza 1992). Unlike most other EC methods, GP typically uses a tree-based encoding to represent solutions to a problem. An example GP tree is shown in Figure 2, where internal nodes are functions/operators and leaf nodes are features/variables or random constants. This tree is expressed as $(x_{2} / x_{3}) \times (x_{1} - 0.2)$, which is a model for regression or for constructing a feature. The GP representation can be easily modified under the tree structure to adapt to the problems being solved. GP methods with different representations have been developed to extract image features for classification (Bi et al. 2021b).
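
As a minimal sketch of how such a tree can be stored and evaluated, the Python snippet below encodes the example expression as nested tuples. The tuple encoding, the `FUNCTIONS` table and the helper name `evaluate` are illustrative assumptions, not the representation used in this paper, and the division is left unprotected for brevity.

```python
import operator

# Function set of this toy example: name -> binary operator (assumed encoding).
FUNCTIONS = {"mul": operator.mul, "div": operator.truediv, "sub": operator.sub}

# Internal nodes are (function, left, right); leaves are variable names or constants.
TREE = ("mul", ("div", "x2", "x3"), ("sub", "x1", 0.2))


def evaluate(node, variables):
    """Recursively evaluate a tree for one assignment of the variables."""
    if isinstance(node, tuple):
        name, left, right = node
        return FUNCTIONS[name](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]  # leaf: feature/variable
    return node  # leaf: constant


# (6.0 / 3.0) * (1.2 - 0.2), i.e. approximately 2.0 up to floating-point rounding.
value = evaluate(TREE, {"x1": 1.2, "x2": 6.0, "x3": 3.0})
```

Because the programme is plain data, changing the representation amounts to changing which node types and leaves are allowed, which is the property the text refers to.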

The main steps of using GP to solve a problem are demonstrated in Figure 3. It starts by randomly initialising a population of solutions/trees. The goodness of solutions/trees, i.e. fitness, is evaluated using a fitness function. At each iteration, selection is applied to select parents and a new population is generated from the parents using genetic operators. The overall process continues until a predefined termination criterion is satisfied. The best solution/tree is returned at the end of the evolutionary process.
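
The loop described above can be sketched on a toy symbolic regression problem. Everything in this snippet is an assumption made for illustration: the function and terminal sets, tournament selection with three candidates, the 0.8 crossover rate, keeping the best tree each generation, the depth limit, and all names. It is not the GP method proposed in this paper.

```python
import operator
import random

FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMS = ["x", 1.0, 2.0]
MAX_DEPTH = 5  # depth limit keeps trees (and their outputs) bounded


def random_tree(rng, depth):
    """Grow a random tree; leaves are the variable x or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(sorted(FUNCS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def depth_of(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree


def fitness(tree, samples):
    """Negative sum of squared errors, so higher is better."""
    errors = [evaluate(tree, x) - y for x, y in samples]
    return -sum(e * e for e in errors)


def random_subtree(tree, rng):
    """Walk down from the root and stop at a random node."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = rng.choice(tree[1:])
    return tree


def replace_random_subtree(tree, new, rng):
    """Return a copy of tree with one randomly chosen subtree replaced by new."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return new
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, replace_random_subtree(left, new, rng), right)
    return (name, left, replace_random_subtree(right, new, rng))


def evolve(samples, pop_size=30, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]  # initialisation
    history = []
    for _ in range(generations):  # termination: fixed number of generations
        ranked = sorted(population, key=lambda t: fitness(t, samples), reverse=True)
        history.append(fitness(ranked[0], samples))
        new_population = [ranked[0]]  # keep the best tree unchanged
        while len(new_population) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=lambda t: fitness(t, samples))
            p2 = max(rng.sample(population, 3), key=lambda t: fitness(t, samples))
            if rng.random() < 0.8:  # subtree crossover
                child = replace_random_subtree(p1, random_subtree(p2, rng), rng)
            else:  # subtree mutation
                child = replace_random_subtree(p1, random_tree(rng, 2), rng)
            new_population.append(child if depth_of(child) <= MAX_DEPTH else p1)
        population = new_population
    best = max(population, key=lambda t: fitness(t, samples))
    history.append(fitness(best, samples))
    return best, history


# Toy target: y = x^2 + 1 sampled at five points.
samples = [(float(x), float(x * x + 1)) for x in range(-2, 3)]
best, history = evolve(samples)
```

Since the best tree is always copied into the next population, the values in `history` never decrease; whether the exact target expression is found depends on the seed and the settings.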

Related work

Image segmentation

Image segmentation aims to partition images into different segments according to the meaning of the pixels. Traditional methods include threshold-based methods, pixel classification-based methods, edge detection-based methods, etc. (Pal and Pal 1993). In recent decades, CNNs, particularly deep CNNs, have become popular for image segmentation. Typical methods include fully convolutional network (FCN) (Shelhamer et al. 2016), U-Net (Ronneberger et al. 2015), and R-CNN (He et al. 2017). In Sultana et al. (2020), a detailed review of CNNs for image segmentation is presented.

Since U-Net is used to perform image segmentation on the mussel farm images in this paper, we provide a brief review of U-Net below. Ronneberger et al. (2015) proposed a U-Net method for biomedical image segmentation. The method achieved promising results when the number of training images is small. Steccanella et al. (2018) proposed a U-Net method for waterline detection on the images sampled by autonomous surface vehicles. In Steccanella et al. (2020), two U-Nets with different loss functions and transfer learning were investigated. The results showed that pre-trained U-Net with fine-tuning on the new dataset can improve the performance. In McLeay et al. (2021), preliminary work on using U-Net for waterline detection on mussel farm images was conducted and the U-Net method achieved promising results.

Feature detection and description in object detection

Feature detection and description are important for solving object recognition or detection tasks. Feature detection aims to detect representative information by finding or locating keypoints, i.e. corners, blobs, lines/edges, and morphological regions (Ma et al. 2021). Typical keypoint detection and description methods include scale-invariant feature transform (SIFT), speeded up robust features (SURF), and oriented FAST and rotated BRIEF (ORB) (Tareen and Saleem 2018).

Ma et al. (2021) provided a detailed review of feature detection and description methods for image matching. The methods include hand-crafted and deep-learning-based feature detectors and descriptors. In Lepetit and Fua (2006), a keypoint recognition method based on randomised trees was proposed for real-time detection. The task is to build models that can classify the keypoints into the object and non-object classes. The keypoint-based approach is also important in deep-learning-based object detection systems. In Dong et al. (2020), CentripetalNet was proposed to match corners using centripetal shift. The method achieved better performance than many existing detectors. Wu et al. (2020) developed PolarNet to detect high-quality keypoints and learn features of corner points for localising objects. This method was anchor-free and achieved promising results in a large object detection task.

However, the task of buoy detection has rarely been explored using existing object detection methods, including both traditional methods and deep-learning-based methods. This task is difficult due to the high variations across the real-world images and the different sizes of objects in images. The images are often sampled at different times under different environments. The buoys in the images often have different sizes, which makes the task even more difficult. Traditional and deep learning methods can be explored to solve this task. As a starting point, motivated by Lepetit and Fua (2006), we will develop an object detection method based on keypoint detection. Compared with deep learning methods, this method is more transparent and easier to understand.

GP for image feature extraction

Yan et al. (2021) proposed a GP method to automatically extract informative features for fish image classification. This method achieved better results than traditional methods on one well-known dataset. Bi et al. (2018) developed a GP method to automatically learn local features from the detected regions and global features from the whole image for classification. This method achieved better performance than traditional methods on eight datasets. Price et al. (2019) developed GOOFeD based on GP to automatically evolve a set of image operators to generate informative features for classification. In Bi et al. (2021a), a GP method was developed to automatically learn effective features from low-quality images by simultaneously performing region detection and feature extraction. This method achieved better performance than the compared methods. The results showed that the GP method was less sensitive to the distortion factors than the compared traditional methods. Bi et al. (2022a) developed a dual-tree GP method to learn features for few-shot image classification. In Bi et al. (2021c), a set of image-related operators were used in GP as internal nodes to evolve models that can extract variable numbers and types of features for image classification. This method achieved better performance than many competitive methods on 12 image datasets.

The existing methods show the promise of GP in learning informative features for classification. However, these methods have seldom been applied to images belonging to aquaculture applications, including real-world object detection tasks. This paper will further explore the potential of GP in real-world aquaculture applications.

The datasets

In this study, we use real-world datasets captured from a mussel farm in the Marlborough Sounds of NZ, as shown in Figure 4. A set of videos was taken from this farm at different viewpoints, lighting, weather conditions, and times. Images are extracted from these videos to create the datasets. We establish seven small datasets of images from these videos according to the viewpoints and the conditions so that we can analyse in which types of images buoys are easy to detect and in which they are difficult to detect. This study aims not only to detect buoys effectively but also to comprehensively understand and explore this real-world task. Note that the task of buoy detection has rarely been addressed with AI techniques, and our study aims to provide a detailed analysis.

All the images are manually labelled using the open-source computer vision annotation tool (CVAT n.d.). The labelling tasks include labelling the water and non-water parts and labelling all the visible buoys in images using rectangular regions, as shown in Figure 5. The labelling tasks are labour-intensive and time-consuming, i.e. labelling one image needs more than 10 minutes because an image contains many buoys and some of the buoys are very small. This is an important reason for using small datasets.

Figure 5. — Example images and the corresponding labels/masks. Note that the original images are compressed.

In the experiments, we randomly split each subset/dataset into the training set, the validation set, and the test set according to the ratios 60%, 20% and 20%. The detailed information is listed in Table 1. The last row shows the dataset with all the images. The size of all the images is $1080 \times 1920$. The training and validation sets are used in model training and evolutionary learning processes. The test set is used to measure the performance of different methods. Example images are illustrated in Figure 6. It is clear that there are many small buoys in the water and the variations of images involve viewpoint, weather condition, sunshine reflection, contrast, etc. These factors make the task very challenging.

Table 1. Summary of data subsets.

Dataset	Training set	Validation set	Test set
Subset 1	23	8	8
Subset 2	23	8	8
Subset 3	20	7	7
Subset 4	15	5	5
Subset 5	14	5	4
Subset 6	22	7	7
Subset 7	27	9	9
All	144	49	48
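
The random 60/20/20 split can be sketched as follows. The rounding rule, the seed and the function name `split_dataset` are assumptions; the paper does not state how fractional counts were handled. Rounding the training and validation counts and assigning the remainder to the test set happens to reproduce the subset rows of Table 1, and the "All" row equals the column sums of the seven subsets.

```python
import random


def split_dataset(items, seed=0, ratios=(0.6, 0.2)):
    """Shuffle items and split them into training, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(ratios[0] * len(items))
    n_val = round(ratios[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains after training and validation
    return train, val, test


# Subset sizes taken from Table 1 (training + validation + test).
SUBSET_SIZES = [39, 39, 34, 25, 23, 36, 45]
split_sizes = []
for size in SUBSET_SIZES:
    train, val, test = split_dataset(range(size))
    split_sizes.append((len(train), len(val), len(test)))
```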


Figure 6. — Example images with labels from subsets 1-7. Note that the original images are compressed.

The proposed approach to buoy detection

This section describes the proposed approach to buoy detection. The overall approach is comprised of four main steps, i.e. data collection and processing, image segmentation, keypoint detection and feature extraction, and classification. The overall approach/system is shown in Figure 7. The test process based on the learned models is shown in Figure 8.

Figure 7. — The overall buoy detection approach/system. It consists of steps of image collection and data processing, image segmentation, keypoint detection and feature extraction, and classification. In the image segmentation step, the pre-trained U-Net method is fine-tuned by using the training set and the validation set (for early stopping). In the third step, the keypoint detection method is applied and the training set of labelled keypoints is used to guide the search for GP to find the optimal tree/programme for feature extraction. In the classification step, a classifier is trained by using the transformed training set of keypoints to classify the keypoints in the testing images.

Figure 8. — The testing process of the proposed system.

(1) In the first step, image data are collected from the mussel farms of NZ and are labelled manually. The data are used to form several small data subsets and each subset is split into a training set, a validation set, and a test set.
(2) The second step is to segment the images into the water and non-water parts, which aims to remove sub-images that do not include buoys from the images and reduce the computational cost of the later steps. U-Net is used in this step for image segmentation. At this step, the full images are transformed into ‘smaller’ images containing only the water part.
(3) The third step is keypoint detection and feature extraction. Keypoint detection aims to find potential regions that may contain buoys in the real-world image, which is performed by using existing keypoint detection methods. Feature extraction aims to extract informative features from the keypoints in order to decide whether each keypoint is a buoy or not. At this step, a new GP method is developed to automatically evolve descriptors that can extract informative features from the keypoints for effective detection.
(4) The final step is classification, which aims to classify each keypoint in the testing images into the buoy or non-buoy class using the features extracted by the best GP tree. The classification is performed using SVM. Several measures are used to evaluate the performance of the new approach.

Image segmentation using U-Net

In this step, a pre-trained CNN method, i.e. U-Net (Steccanella et al. 2020), is used to perform segmentation. The task is to classify every pixel into the water and non-water classes to generate a binary segmentation mask. The whole method is shown in Figure 9. It starts by resizing the large input image from a size of $1080 \times 1920$ to a size of $160 \times 160$, following Steccanella et al. (2020). The resized image is fed into U-Net to generate a segmentation mask. The mask is resized to the original size of $1080 \times 1920$ and applied to the input image. The final image contains only the water part, as shown in Figure 9.
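
The resize, predict, resize back and mask sequence can be sketched without any deep learning library. In this snippet the network is replaced by a hypothetical stand-in (`dummy_predictor`), nearest-neighbour resizing is assumed, and non-water pixels are set to zero rather than cropped; none of these choices is confirmed by the paper. Plain Python lists keep the sketch dependency-free, whereas an array library would normally be used.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def keep_water(image, predict_mask, net_size=(160, 160)):
    """Downscale, predict a water mask, upscale the mask, zero the non-water pixels."""
    small = resize_nearest(image, *net_size)
    small_mask = predict_mask(small)  # 1 = water, 0 = non-water
    mask = resize_nearest(small_mask, len(image), len(image[0]))
    return [
        [pixel * keep for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def dummy_predictor(small):
    """Hypothetical stand-in for U-Net: labels the lower half of the frame as water."""
    height = len(small)
    return [[1 if r >= height // 2 else 0 for _ in row] for r, row in enumerate(small)]


# Tiny 4 x 6 "image" with constant intensity 5.
image = [[5] * 6 for _ in range(4)]
water_only = keep_water(image, dummy_predictor)
```

Running the segmentation at $160 \times 160$ means the network processes 25,600 pixels instead of the 2,073,600 pixels of a full frame, which is where the saving in this step comes from.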

Figure 9. — Overall process of image segmentation using U-Net.

The loss function in U-Net is the Dice similarity coefficient (DSC) (Milletari et al. 2016), defined as

$$L_{DSC} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \qquad (1)$$

where TP denotes the number of true positives, FP denotes the number of false positives, and FN denotes the number of false negatives. The true segmentation mask is used to obtain these values. Compared with the commonly used binary cross-entropy loss, DSC can better handle the class imbalance problem and typically achieves better performance (Ronneberger et al. 2015).
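
Equation (1) can be checked on small binary masks with the sketch below. It counts hard 0/1 pixels; a trainable loss would need a differentiable (soft) variant, commonly minimised as one minus the coefficient, which is not shown here. The function name and the handling of two empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """DSC = 2TP / (2TP + FP + FN) for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denominator = 2 * tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return 2 * tp / denominator if denominator else 1.0


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred, target)  # TP = 2, FP = 1, FN = 1
```

True negatives do not appear in the formula, so a large background region cannot inflate the score; this is one way to see why DSC is less affected by class imbalance than per-pixel measures.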

At this step, the images of the training set are used to fine-tune the pre-trained U-Net and the validation set is used for model selection. Once the model is trained, all the new (testing) images are transformed by the model by removing the non-water part.

Keypoint detection

This study focuses on the real-world mussel farm images with the buoy objects of different sizes. To efficiently locate these buoys, keypoint detection is employed to find small regions that may contain a buoy. There are several commonly used keypoint detection methods, such as Harris Corner Detection, SIFT, SURF, ORB, and BRISK (Tareen and Saleem 2018; Ma et al. 2021). We tested three different methods, as shown in Figure 10. Finally, we selected ORB because it is fast and it can detect keypoints that contain corners or shapes of the buoys.

Figure 10. — Comparisons of three different keypoint detection methods on the mussel farm images. The red point in the image denotes the detected keypoint from the images.

Evolving descriptor using GP

Although ORB can describe features from the keypoint, the features may not be effective for classifying whether the keypoint is a buoy or not. To address this, we develop a new GP method to automatically evolve descriptors that can extract informative features from the keypoint to achieve effective classification.

Solution representation

The solution/tree representation of the GP method is based on strongly typed GP (STGP) (Montana 1995), as shown in Figure 11. The overall structure consists of six different layers, i.e. input, region selection, image filtering/pooling, feature extraction, feature concatenation, and output. The input layer represents the input of the GP tree/programme. The region selection layer aims to select the promising regions around the keypoint. The image filtering/pooling layer aims to process the regions to generate informative features. The feature extraction layer extracts a set of features from the regions. The feature concatenation layer concatenates the extracted features to form a feature vector. The output layer represents the output features of the GP tree/programme. Overall, the input of the GP programme/tree is a keypoint, and the output of the GP programme/tree is a set of features, which are used for classification. The example programme/tree in Figure 11 shows that the new GP method can use different functions to build a tree according to the programme structure.

At the different layers, a set of functions are employed for different purposes. To make the GP method evolve trees/programmes with various depths or sizes, the image filtering/pooling layer and the feature concatenation layer have a flexible tree depth, indicating that there are multiple nodes for these two layers in a GP tree.

Terminal set and function set

The terminal set comprises the terminals that can be used to build leaf nodes of GP trees/programmes. All the terminals, i.e. Img, kp, h, p, t, v, σ, $o_{1}$, $o_{2}$, $k_{1}$, and $k_{2}$, are listed in Table 2. Img denotes the image from which the buoys are to be detected. It is a two-dimensional array and each value represents one pixel. kp denotes the coordinate of the keypoint, i.e. its position along the X axis and the Y axis. Note that the coordinate of the top-left pixel is (0, 0). h represents the height and p denotes the position of the region selected by the RS function. t and v are two parameters in the Gabor function related to θ and f. σ denotes the standard deviation value in the filters based on the Gaussian function. $o_{1}$ and $o_{2}$ represent the orders of the Gaussian derivatives. $k_{1}$ and $k_{2}$ represent the kernel size of the MaxP function, i.e. $k_{1} \times k_{2}$. Except for Img and kp, the values of the other terminals can be automatically selected during the evolutionary process.

Table 2. Terminal set of the GP approach.

Terminal	Type	Value range	Description
Img	array	$(0, 1)$	The normalised gray-scale image
kp	coordinate	$[0, 1080]$ or $[0, 1920]$	The coordinate of a keypoint, including two values
h	integer	$[5, 300]$ with a step of 5	The height of the selected region. The width of the region is 1.5h
p	integer	$[0, 4]$	The position of the selected region
t	integer	$[0, 7]$	The parameter for the orientation θ of the Gabor filter, i.e. $\theta = \pi t / 8$
v	integer	$[0, 4]$	The parameter for the frequency f of the Gabor filter, i.e. $f = \frac{\pi / 2}{(\sqrt{2})^{v}}$
σ	integer	$[1, 3]$	The standard deviation of the Gaussian function
$o_{1}$, $o_{2}$	integer	$[0, 2]$	The orders of the Gaussian derivatives along the X and Y axes
$k_{1}$, $k_{2}$	integer	$\{2, 4\}$	The kernel size in the MaxP function
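
The mapping from the integer terminals t and v to Gabor parameters can be made concrete with the sketch below. The orientation follows $\theta = \pi t / 8$ as listed in Table 2. The frequency entry is read here as $f = (\pi / 2) / (\sqrt{2})^{v}$, which is an interpretation of the flattened formula and should be treated as an assumption; the function names are illustrative.

```python
import math


def gabor_orientation(t):
    """theta = pi * t / 8 for t in 0..7 (Table 2)."""
    return math.pi * t / 8


def gabor_frequency(v):
    """f = (pi / 2) / sqrt(2)**v for v in 0..4 (assumed reading of Table 2)."""
    return (math.pi / 2) / math.sqrt(2) ** v


# All orientations (in degrees) and frequencies reachable by the terminals.
orientations_deg = [round(math.degrees(gabor_orientation(t)), 1) for t in range(8)]
frequencies = [gabor_frequency(v) for v in range(5)]
```

Under this reading the eight orientations are spaced 22.5 degrees apart over half a turn, and the frequencies fall from $\pi / 2$ to $\pi / 8$ in steps of a factor $\sqrt{2}$, giving 40 possible Gabor filters.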


Table 3 lists all the functions in the function set, including region selection functions, image filtering/pooling functions, feature extraction functions, and feature concatenation functions.

Table 3. Image filtering and pooling functions.

Function	Input	Output	Description
Feature concatenation functions
FeaCon2	2 Vectors	Vector	Concatenate two feature vectors into one feature vector
FeaCon3	3 Vectors	Vector	Concatenate three feature vectors into one feature vector
FeaCon4	4 Vectors	Vector	Concatenate four feature vectors into one feature vector
Feature extraction functions
Hist	Region	Vector	Extract 64 histogram features
DSIFT	Region	Vector	Extract 128 features from the region using dense SIFT
LBP	Region	Vector	Extract 59 uniform LBP features. The value of the radius is 1.5 and the number of neighbours is 8 in LBP
Image filtering/pooling functions
MaxP	Region, $k_{1}$, $k_{2}$	Region	Perform max-pooling with a kernel size of $k_{1} \times k_{2}$
Sobel	Region	Region	Perform $3 \times 3$ Sobel filtering
Sobel_X	Region	Region	Perform $3 \times 3$ Sobel filtering along the horizontal direction
Sobel_Y	Region	Region	Perform $3 \times 3$ Sobel filtering along the vertical direction
Gau	Region, σ	Region	Perform Gaussian filtering with standard deviation σ
GauD	Region, σ, $o_{1}$, $o_{2}$	Region	Calculate the derivatives of Gaussian filter with standard deviation σ and orders ($o_{1}$, $o_{2}$)
Gabor	Region, θ, f	Region	Perform Gabor filtering with θ orientation and f frequency
Lap	Region	Region	Perform $3 \times 3$ Laplacian filtering
LoG1	Region	Region	Perform Laplacian of Gaussian filtering with standard deviation 1
LoG2	Region	Region	Perform Laplacian of Gaussian filtering with standard deviation 2
Median	Region	Region	Perform $3 \times 3$ median filtering
Mean	Region	Region	Perform $3 \times 3$ mean filtering
Min	Region	Region	Perform $3 \times 3$ min filtering
Max	Region	Region	Perform $3 \times 3$ max filtering
LBP_F	Region	Region	Generate LBP image
HOG_F	Region	Region	Generate HOG image
Region selection functions
RS	Img, kp, h, p	Region	Select a region of size $h \times 1.5 h$ around the keypoint kp at position p from the input image Img


Region selection functions: The RS function is developed to select a region around the keypoint from the input image. It has four inputs, i.e. Img, kp, h, and p. Figure 12 illustrates how this function is used to select a region around the keypoint. The region is controlled by the parameters h and p, i.e. h denotes the height and p denotes the position of the region. The size is $h \times 1.5 h$. There are five possible regions around one keypoint, as shown in Figure 12. Each RS function will select one of these regions for feature extraction in the later process. The proposed method can automatically select the regions for effective feature extraction.
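
A region selection function of this kind might look like the sketch below. The text only states that p chooses one of five regions; the concrete meaning assigned to each value of p here (keypoint at the centre or at one of the four corners of the box), the truncation of the width to whole pixels, and the clipping at the image border are all assumptions for illustration and may differ from Figure 12.

```python
def select_region(image_size, kp, h, p):
    """Return (left, top, right, bottom) of an h x 1.5h box placed relative to kp."""
    img_w, img_h = image_size
    x, y = kp
    w = int(1.5 * h)  # width, truncated to whole pixels (assumption)
    # Assumed meaning of p: where the keypoint sits inside the box.
    offsets = {
        0: (w // 2, h // 2),  # keypoint at the centre
        1: (0, 0),            # keypoint at the top-left corner
        2: (w, 0),            # keypoint at the top-right corner
        3: (0, h),            # keypoint at the bottom-left corner
        4: (w, h),            # keypoint at the bottom-right corner
    }
    dx, dy = offsets[p]
    left, top = x - dx, y - dy
    right, bottom = left + w, top + h
    # Clip to the image so that regions near the border remain valid.
    return max(0, left), max(0, top), min(img_w, right), min(img_h, bottom)


# Five candidate regions for one keypoint in a 1920 x 1080 image.
regions = [select_region((1920, 1080), (100, 100), 20, p) for p in range(5)]
```

Because h and p are terminals, the evolutionary search can trade a tight box around a small, distant buoy against a larger box that captures more context.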

Figure 12. — Illustration of the region selection function RS.

Image filtering/pooling functions: These functions include commonly used filters, max-pooling, and other functions. They process the selected regions around the keypoint and generate better regions/images for feature extraction. These functions are MaxP, Sobel, Sobel_X, Sobel_Y, Gau, GauD, Gabor, Lap, LoG1, LoG2, Median, Mean, Min, Max, LBP_F, and HOG_F. More details of these functions are presented in Table 3. The GauD, Sobel, Sobel_X, Sobel_Y, Lap, LoG1, and LoG2 functions can detect edges or flat areas in the image. The MaxP function without padding can extract meaningful features and reduce the size of the image. The Gau, Median, Mean, Min, and Max functions can reduce the noise in the image. The Gabor, LBP_F, and HOG_F functions can generate different features.

Feature extraction functions: Three functions, i.e. Hist, DSIFT and LBP, are used for extracting informative features. These functions are commonly used image descriptors (Bi et al. 2021b), which generate statistical features, appearance features, and texture features of the objects in the images, respectively. The input of these functions is a region/image, and the output is a set of features. These methods generate different numbers of features. The GP method can automatically select those functions to find the most effective features for performing the task.

Feature concatenation functions: The feature concatenation functions aim to concatenate the features extracted from different internal nodes of GP trees to form a feature vector. The three functions are FeaCon2, FeaCon3, and FeaCon4, which take two, three and four feature vectors as inputs and return one feature vector. These functions can be internal nodes or the root node of GP trees.
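
Since the three extractors return 64, 128 and 59 features and the concatenation functions can be nested, the length of the final feature vector depends on the evolved tree. The sketch below computes that length for a simplified tree in which the region-level subtrees (RS and filtering/pooling nodes) are omitted; the tuple encoding and the helper name `feature_length` are assumptions.

```python
FEATURE_SIZES = {"Hist": 64, "DSIFT": 128, "LBP": 59}  # output sizes from Table 3
CONCAT_ARITY = {"FeaCon2": 2, "FeaCon3": 3, "FeaCon4": 4}


def feature_length(node):
    """Length of the feature vector produced by a simplified descriptor tree."""
    name, *children = node
    if name in FEATURE_SIZES:
        return FEATURE_SIZES[name]  # any region-level children are ignored here
    if name in CONCAT_ARITY:
        if len(children) != CONCAT_ARITY[name]:
            raise ValueError(f"{name} expects {CONCAT_ARITY[name]} feature vectors")
        return sum(feature_length(child) for child in children)
    raise ValueError(f"{name} does not return a feature vector")


# A nested example: FeaCon2(FeaCon3(Hist, LBP, DSIFT), LBP).
tree = ("FeaCon2", ("FeaCon3", ("Hist",), ("LBP",), ("DSIFT",)), ("LBP",))
n_features = feature_length(tree)  # 64 + 59 + 128 + 59
```

The arity check mirrors what strong typing enforces during tree construction: a concatenation node only accepts the number of feature vectors it is defined for.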

Fitness function

The fitness function is used to evaluate the goodness of GP trees/programmes in solving a problem. In the GP method, the fitness function is the balanced accuracy, defined as

$$Bal\_Accuracy = 0.5 \times \frac{TP}{TP + FN} + 0.5 \times \frac{TN}{TN + FP} \tag{2}$$

where the value range of the fitness function is $[0, 1]$ .
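Equation (2) can be computed directly from the true and predicted labels. The sketch below is a minimal version; returning 0 for a rate whose denominator is zero is an assumption, since the text does not discuss that case.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Mean of the true positive rate and the true negative rate, as in Equation (2)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # assumption: empty class contributes 0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return 0.5 * tpr + 0.5 * tnr
```

For example, with one buoy keypoint and nine non-buoy keypoints, a classifier that predicts "non-buoy" everywhere has a plain accuracy of 0.9 but a balanced accuracy of only 0.5, which is why this measure suits imbalanced keypoint sets.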

To calculate the fitness value, we use SVM to build classifiers for classifying the keypoints, since SVM is commonly used in GP for image classification (Bi et al. 2021d, 2022a, 2022b). In fitness evaluations, the training set is used to train SVM classifiers, and the validation set is used to calculate the balanced accuracy. In this process, the keypoints of each image in the two subsets are detected and the features of these keypoints are described using the GP tree. The features are normalised using the min-max normalisation method and then fed into SVM for training and testing. To balance the performance and the computational cost, 100 keypoints are detected from each image for fitness evaluation.
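The min-max normalisation step can be sketched as follows. The text does not say which data the minimum and maximum are taken from; this sketch assumes they are computed per feature on the training keypoints and then reused for the validation keypoints, and it maps constant features to 0.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fit_min_max(X: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum over the rows of X."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(X: Matrix, mins: Sequence[float], maxs: Sequence[float]) -> List[List[float]]:
    """Scale each feature to (x - min) / (max - min); constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in X
    ]
```

With this choice, validation features can fall slightly outside $[0, 1]$ when a value exceeds the range seen in training.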

Overall learning/training algorithm

The overall algorithm of GP is described in Algorithm 1. It starts by constructing the evolutionary training and test sets for GP learning. In this process, ORB is used to detect keypoints from each image in the training set and the validation set, and a class label is assigned to each keypoint. If the keypoint is located in a buoy region, it is labelled as positive. Otherwise, it is labelled as negative. Then a population of GP trees is initialised using the tree generation method. The population is evaluated using the fitness function, as shown in Lines 4–10 of Algorithm 1. At each generation, a new population is generated using the elitism, crossover and mutation operators and evaluated using the fitness function, as described in Lines 14–20 of Algorithm 1. The elitism operator copies a set of the best trees into the new population. The crossover operator swaps the branches/subtrees of two selected trees (parents) to generate two new trees (offspring). The mutation operator replaces a randomly selected branch/subtree of a tree with a new randomly generated branch/subtree to generate a new tree (offspring). These operations are illustrated in Figure 7. The evolutionary learning process terminates when the maximal number of generations is reached, and the tree with the highest fitness value is returned.
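The labelling rule for keypoints can be sketched as a lookup in a binary buoy mask. The mask representation (1 for buoy pixels, 0 otherwise) and the rounding of keypoint coordinates are assumptions; keypoints are taken as (x, y) pairs, which is how OpenCV reports keypoint locations.

```python
from typing import List, Sequence, Tuple


def label_keypoints(keypoints: Sequence[Tuple[float, float]],
                    buoy_mask: Sequence[Sequence[int]]) -> List[int]:
    """Label each (x, y) keypoint 1 (buoy) if it falls on a buoy pixel, else 0 (non-buoy).

    Keypoints are assumed to lie inside the image.
    """
    labels = []
    for x, y in keypoints:
        row, col = int(round(y)), int(round(x))  # x is the column, y is the row
        labels.append(1 if buoy_mask[row][col] else 0)
    return labels
```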

Algorithm 1: GP for descriptor evolving/learning

Input: $D_{train}$: the training set; $D_{val}$: the validation set.

Output: $Best\_tree$: the best GP tree.

1 $K_{etr} = \{X_{etr}, Y_{etr}\} \leftarrow$ Detect all the keypoints from the images in the training set using ORB and label them to form the evolutionary training set;

2 $K_{ete} = \{X_{ete}, Y_{ete}\} \leftarrow$ Detect all the keypoints from the images in the validation set using ORB and label them to form the evolutionary test set;

3 $P_{0} \leftarrow$ Initialise a population of GP trees according to the solution representation;

4 for each tree t in $P_{0}$ do

5 Use tree t to extract features for each keypoint in $K_{etr}$ and $K_{ete}$;

6 Normalise $K_{etr}$ and $K_{ete}$;

7 Train SVM classifiers using $K_{etr}$;

8 $Bal\_Accuracy \leftarrow$ test the SVM classifiers on $K_{ete}$ and calculate the balanced accuracy;

9 Set $Bal\_Accuracy$ as the fitness value of tree t;

10 end

11 Update $Best\_tree$;

12 $g \leftarrow 0$;

13 while $g < G$ do

14 $I \leftarrow$ select the best trees from $P_{g}$ using the elitism operator;

15 $S \leftarrow$ select a number of trees from $P_{g}$ using tournament selection;

16 $O_{g+1} \leftarrow$ generate new trees (offspring) from S using the subtree crossover and subtree mutation operators;

17 Evaluate the fitness of each tree t in $O_{g+1}$ according to Lines 5–9;

18 $P_{g+1} \leftarrow O_{g+1} \cup I$;

19 Update $Best\_tree$;

20 $g \leftarrow g + 1$;

21 end

22 Return $Best\_tree$.
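The generational loop of Algorithm 1 can be sketched independently of the tree representation. In the sketch below, individuals are opaque objects and `init`, `fitness`, `crossover` and `mutate` are supplied by the caller; in the paper the fitness would be the balanced accuracy of an SVM, whereas here a toy bit-string problem stands in so that the code runs on its own. Choosing between crossover and mutation in proportion to their rates is one possible reading of the parameter settings, not a confirmed detail, and all function names are hypothetical.

```python
import random


def evolve(init, fitness, crossover, mutate, pop_size=100, generations=50,
           elitism_rate=0.01, crossover_rate=0.8, mutation_rate=0.19,
           tournament_size=5, rng=None):
    """Generational loop in the spirit of Algorithm 1; returns (best_fitness, best_individual)."""
    if rng is None:
        rng = random.Random(0)
    n_elite = max(1, round(elitism_rate * pop_size))
    # Assumption: the operator is picked in proportion to the two rates.
    p_crossover = crossover_rate / (crossover_rate + mutation_rate)

    scored = [(fitness(t), t) for t in (init(rng) for _ in range(pop_size))]
    best = max(scored, key=lambda s: s[0])

    def tournament():
        # Best of a random sample of the current population.
        return max(rng.sample(scored, tournament_size), key=lambda s: s[0])[1]

    for _ in range(generations):
        elite = sorted(scored, key=lambda s: s[0], reverse=True)[:n_elite]
        offspring = []
        while len(offspring) < pop_size - n_elite:
            if rng.random() < p_crossover:
                offspring.extend(crossover(tournament(), tournament(), rng))
            else:
                offspring.append(mutate(tournament(), rng))
        offspring = offspring[: pop_size - n_elite]
        scored = elite + [(fitness(t), t) for t in offspring]
        best = max([best] + scored, key=lambda s: s[0])
    return best


# Toy stand-ins for GP trees: bit strings whose fitness is the number of ones.
def init_bits(rng, n=20):
    return [rng.randint(0, 1) for _ in range(n)]


def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def flip_one_bit(a, rng):
    i = rng.randrange(len(a))
    return a[:i] + [1 - a[i]] + a[i + 1:]
```

Calling `evolve(init_bits, sum, one_point_crossover, flip_one_bit, pop_size=30, generations=20)` runs the loop on the toy problem; replacing the four callables with tree-based counterparts gives the structure described in Algorithm 1.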

Classification and performance measure

After the evolutionary process, the best GP tree is used to classify the keypoints of images in the test set, using the training set and the test set in Table 1. From each image in both sets, 500 keypoints are detected. The features of the keypoints are extracted using the best GP tree and normalised using the min-max normalisation method. All the keypoints of the training set are used to train the SVM classifiers, which are then used to classify all the keypoints of the test set. To evaluate the performance, four well-known performance measures, i.e. balanced accuracy, precision, recall, and F1 score, are used.
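For reference, precision, recall and F1 score follow from the confusion counts using their standard definitions. The sketch below is a minimal version; returning 0 when a denominator is zero is an assumption.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 score from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

A classifier with many false positives therefore has low precision even when its recall is high, which is the pattern discussed for some baselines in the results.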

Experiments

Comparison methods

The new method is compared with three methods to show its effectiveness. The first method is ORB, which uses 32 features to describe each keypoint for classification (Rublee et al. 2011). The second method is ORB+FE, which uses the ORB, LBP, and SIFT features for classification. The LBP and SIFT features are extracted from the region around the keypoint with a window size of $20 \times 30$. It is difficult to manually set a suitable size because the buoy objects have various sizes. We tried several sizes and found that this size performed best. In classification, these features are normalised using min-max normalisation before being fed into the SVM classification algorithm. The third method is a CNN with five layers (CNN). It comprises two convolutional layers with 32 and 64 filters, respectively, one max-pooling layer, and two fully connected layers with 128 and 2 neurons, respectively. This CNN method is used for feature extraction and classification.

Parameter settings and implementations

In the experiments, the parameter settings for U-Net are based on Steccanella et al. (2020) and McLeay et al. (2021). The implementation of U-Net uses the PyTorch package. The optimiser is Adam with a learning rate of 0.001, an exponential decay rate of 0.9 for the first-moment estimates and 0.999 for the second-moment estimates, and a batch size of 32. Early stopping is employed according to the loss value on the validation set to avoid overfitting, i.e. the training terminates if the loss value does not change within 20 epochs.
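The early stopping rule can be sketched as a function of the recorded validation losses. This sketch interprets "not changed" as "not improved", which is the usual convention but an assumption here; the function name and the `min_delta` parameter are hypothetical.

```python
from typing import Optional, Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 20,
                         min_delta: float = 0.0) -> Optional[int]:
    """Return the index of the epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    stale = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

With a patience of 20, a run whose validation loss last improves at epoch 2 would stop at epoch 22.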

In ORB, the number of keypoints is manually set, i.e. ranging from 100 to 1000 in this study. The settings of other parameters are the default ones in the OpenCV-Python package for simplicity. The number of keypoints is an important parameter in this method. More experimental analysis of it will be provided in the following section.

For the GP method, we use the commonly used parameter settings (Bi et al. 2022a; Peng et al. 2021), as listed in Table 4. The GP method has been executed 10 times using different random seeds, following the convention of the EC community.

Table 4.

GP parameters.

Parameter	Value
Population size	100
Maximal number of generations	50
Selection	Tournament (size = 5)
Initialisation	Ramped half-and-half
Initial tree depth	3–6
Maximal tree depth	8
Elitism rate	0.01
Crossover rate	0.8
Mutation rate	0.19

For the CNN method, the number of epochs is set to 50, since the method converges and achieves high accuracy with this setting. The batch size is 128 and the optimiser is Adam. The implementation of the CNN is based on Keras. The training set size is the same as that of GP.

Results and discussions

Segmentation results

The image segmentation results obtained by U-Net on the seven data subsets and the full dataset are presented in Table 5. The results show that U-Net performs well on the seven subsets and the full dataset, with balanced accuracy above 95% and F1 scores above 97%. The results indicate that U-Net can segment the images into the water and non-water classes well. Example images in Figure 13 also confirm that U-Net is effective for image segmentation.

Table 5.

Segmentation results (%) obtained by the U-Net method on the seven subsets and the full dataset.

Subset	Balanced Acc.	Precision	Recall	F1 score
1	95.49	95.93	99.98	97.91
2	96.18	96.82	99.93	98.34
3	95.58	96.03	99.79	97.87
4	98.38	98.61	99.97	99.28
5	97.35	97.29	100.00	98.62
6	96.92	97.75	99.60	98.66
7	96.21	96.41	99.71	98.03
All	96.56	96.86	99.89	98.35

Figure 13. — Example images to show the segmentation results obtained by U-Net.

Object classification results

Classification results

The classification results of the four methods are presented in Table 6. The maximal, mean and standard deviation values of the results obtained by ORB+GP are reported. For better comparisons, the bar plots of the results are shown in Figure 14.

Table 6.

Buoy classification results (%) obtained by ORB, ORB+FE, ORB+CNN, and ORB+GP on the seven subsets and the full dataset.

Method	Balanced Acc.	Precision	Recall	F1 score
Subset 1
ORB	73.20	32.96	72.64	45.34
ORB+FE	78.28	32.17	90.38	47.45
ORB+CNN	76.03	44.47	66.67	53.35
ORB+GP(best)	93.82	91.14	89.56	89.24
ORB+GP(mean ± std.)	90.68 ± 0.02	78.43 ± 0.08	84.00 ± 0.03	80.88 ± 0.05
Subset 2
ORB	71.90	16.89	65.75	26.87
ORB+FE	90.19	34.06	92.52	49.79
ORB+CNN	84.70	65.73	72.03	68.74
ORB+GP(best)	93.01	90.41	86.62	87.25
ORB+GP(mean ± std.)	89.68 ± 3.05	75.89 ± 11.38	80.27 ± 5.89	77.61 ± 8.01
Subset 3
ORB	64.11	79.98	62.36	70.08
ORB+FE	66.54	78.14	85.26	81.55
ORB+CNN	66.77	82.17	63.45	71.61
ORB+GP(best)	86.80	86.81	97.05	89.39
ORB+GP(mean ± std.)	82.88 ± 2.84	76.96 ± 5.70	93.23 ± 2.62	84.18 ± 3.56
Subset 4
ORB	61.71	65.50	56.64	60.75
ORB+FE	68.84	65.61	90.43	76.05
ORB+CNN	77.50	74.15	90.81	81.64
ORB+GP(best)	94.22	97.06	94.57	93.82
ORB+GP(mean ± std.)	91.27 ± 0.02	91.12 ± 0.03	89.81 ± 0.03	90.42 ± 0.02
Subset 5
ORB	64.18	76.67	59.11	66.76
ORB+FE	68.84	74.65	89.86	81.55
ORB+CNN	78.55	80.66	94.04	86.84
ORB+GP(best)	89.92	89.93	91.48	89.50
ORB+GP(mean ± std.)	83.37 ± 2.70	74.42 ± 8.55	86.95 ± 4.60	79.75 ± 4.42
Subset 6
ORB	50.20	70.75	46.72	56.28
ORB+FE	59.11	74.92	92.39	82.74
ORB+CNN	65.72	77.85	94.79	85.49
ORB+GP(best)	86.46	87.91	93.50	90.02
ORB+GP(mean ± std.)	79.96 ± 3.90	77.81 ± 6.36	90.29 ± 2.04	83.47 ± 4.06
Subset 7
ORB	46.98	57.48	43.09	49.26
ORB+FE	64.39	69.01	93.37	79.36
ORB+CNN	61.71	67.08	95.82	78.91
ORB+GP(best)	91.73	93.13	93.70	92.16
ORB+GP(mean ± std.)	88.14 ± 2.11	85.05 ± 5.10	91.34 ± 1.20	87.99 ± 2.59
All
ORB	58.08	49.50	45.95	47.65
ORB+FE	78.87	23.70	66.87	34.99
ORB+CNN	76.71	68.29	87.96	76.88
ORB+GP(best)	93.48	91.04	93.07	91.61
ORB+GP(mean ± std.)	91.58 ± 1.66	88.67 ± 2.54	90.26 ± 1.95	89.45 ± 2.14

From Table 6 and Figure 14, it can be found that ORB+GP achieves higher balanced accuracy, precision, and F1 score than ORB, ORB+FE and ORB+CNN on all seven subsets and the full dataset. In terms of recall, the best ORB+GP run is the highest on Subsets 3 and 4 and the full dataset, whereas ORB+FE or ORB+CNN obtains higher recall on the other subsets at the cost of lower precision. The results show that the features learned by GP are more effective than the features described by ORB and the FE method in classifying the keypoints. Comparing ORB with ORB+FE, it can be found that the features extracted by FE are more effective than the features extracted by ORB. The results also show that GP is more effective than CNN for feature extraction and classification on the mussel farm images. ORB and ORB+FE produce a large number of false positives and/or false negatives, so their precision and/or recall values are very low. On several subsets, ORB+CNN has low balanced accuracy and precision but relatively high recall and F1 score, meaning that it produces a large number of false positives and a small number of false negatives. Compared with these three methods, ORB+GP achieves better results, which shows that ORB+GP can correctly classify the keypoints into the buoy and non-buoy classes. Overall, the proposed ORB+GP method achieves promising performance on this task.

Analysis of the number of keypoints

In the proposed method, the number of keypoints is an important parameter. Specifically, it determines how many instances are to be classified into the buoy and non-buoy classes. To analyse its effect, ORB is used to detect different numbers of keypoints from the test images, and the trained classifier is applied to classify those keypoints into the buoy and non-buoy classes. In the classifier training, 500 keypoints of each image are used. The experiments are conducted on Subset 4. For ORB+GP, the best tree from the 10 runs (shown in Figure 17) is used.

Figure 17. — Visualisation of an example tree evolved by the GP method on Subset 4.

The results are presented in Table 7. The results show that the performance of ORB+GP is better and more stable than that of ORB, ORB+FE and ORB+CNN when different numbers of keypoints are used. ORB+GP achieves over 90% balanced accuracy in each case, while ORB and ORB+FE achieve less than 70% balanced accuracy and ORB+CNN achieves less than 80% balanced accuracy. From Figure 15, it can be found that the F1 scores of the three baseline methods decrease substantially as the number of keypoints increases. Compared with these three methods, the F1 score obtained by ORB+GP is more stable and decreases only slightly when a large number of keypoints is used. The results show that ORB+GP can achieve high accuracy using different numbers of keypoints.

Table 7.

Results (%) of ORB, ORB+FE, ORB+CNN and ORB+GP using different numbers of keypoints from the test images of Subset 4.

Method	Balanced Acc.	Precision	Recall	F1 score
#Keypoints $= 100$
ORB	68.47	92.43	57.14	70.62
ORB+FE	53.60	82.32	99.75	90.20
ORB+CNN	69.10	88.22	93.63	90.84
ORB+GP	98.96	98.92	99.28	99.10
#Keypoints $= 200$
ORB	65.38	84.26	56.01	67.29
ORB+FE	57.00	73.92	94.20	82.84
ORB+CNN	68.53	81.05	92.21	86.27
ORB+GP	96.35	97.49	95.46	96.47
#Keypoints $= 400$
ORB	61.84	69.63	55.20	61.58
ORB+FE	63.50	64.96	91.89	76.11
ORB+CNN	74.57	73.87	93.18	82.41
ORB+GP	94.90	95.41	93.90	94.65
#Keypoints $= 500$
ORB	61.71	65.50	56.64	60.75
ORB+FE	68.84	65.61	90.43	76.05
ORB+CNN	74.42	67.90	89.67	77.28
ORB+GP	94.22	95.45	92.25	93.82
#Keypoints $= 600$
ORB	61.89	62.38	56.43	59.26
ORB+FE	67.56	61.48	88.02	72.40
ORB+CNN	72.06	68.57	79.01	73.42
ORB+GP	94.59	95.96	92.30	94.10
#Keypoints $= 800$
ORB	61.94	57.47	56.56	57.01
ORB+FE	67.21	56.33	87.23	68.46
ORB+CNN	74.23	65.28	82.70	72.97
ORB+GP	94.23	93.26	92.97	93.11
#Keypoints $= 1000$
ORB	62.62	53.90	56.37	55.11
ORB+FE	65.98	50.95	84.45	63.56
ORB+CNN	76.60	63.05	85.51	72.59
ORB+GP	93.37	92.64	91.35	92.00

Figure 15. — Comparisons of the three methods in terms of balanced accuracy and F1 score.

Figure 16 shows the images with keypoints that have been classified into the buoy and non-buoy classes using the different methods. The green rectangle regions show the main differences in the results obtained by these methods. In the results of ORB using 100 keypoints, the small green rectangle region (i.e. region 1) shows that a large number of buoy keypoints are classified into the non-buoy class. In contrast to ORB, the ORB+FE, ORB+CNN and ORB+GP methods classify most of the buoy keypoints in green region 1 correctly. In green region 2, ORB, ORB+FE, and ORB+CNN classify non-buoy keypoints into the buoy class, whereas ORB+GP classifies all the non-buoy keypoints correctly. Similarly, when the number of keypoints is 1000, ORB+GP correctly classifies non-buoy keypoints into the non-buoy class, while the three baseline methods classify many non-buoy keypoints into the wrong class. Figure 16 also shows that detecting a small number of keypoints may miss some very small buoys in the images, while detecting a large number of keypoints may cover many false positive regions and also increases the computational cost. Therefore, the number of keypoints should be set carefully.

Analysis of example GP trees

In this section, the best trees evolved by the GP method are analysed to show which features are extracted and how. The best tree from Subset 4 is visualised in Figure 17. The example tree has four branches, which extract a combination of SIFT and LBP features from keypoints. The left branch uses the RS function to select a $290 \times 435$ region ($435 = 1.5 \times 290$) centred on the keypoint. The region is processed by the Median, $LoG2$, and Max operators and then the DSIFT features are extracted. The second branch (from left to right) of the example tree uses the LBP function to extract features from the selected $295 \times 442$ region, which is the top-left region of the keypoint. The remaining two branches use a number of image filtering functions to process the selected region and extract DSIFT features. Overall, this example programme extracts 443 ($59 + 128 \times 3$) features from each keypoint. The analysis shows that the GP method can automatically evolve programmes/trees with variable sizes and shapes to extract a flexible number of features for effectively solving a task.
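The numbers quoted for this tree can be checked with a few lines of arithmetic. The sketch assumes that the region width is obtained by truncating $1.5h$ to an integer, which matches both reported sizes, and that the 59 features come from the LBP branch and 128 features from each of the three DSIFT branches, as the breakdown $59 + 128 \times 3$ suggests.

```python
from typing import Tuple


def region_size(h: int) -> Tuple[int, int]:
    """Height and width of an RS region, assuming the width is 1.5h truncated to an integer."""
    return h, int(1.5 * h)


# Assumed per-branch feature counts for the example tree (one LBP and three DSIFT branches).
branch_features = [128, 59, 128, 128]
total_features = sum(branch_features)
```

Here `region_size(290)` gives (290, 435), `region_size(295)` gives (295, 442), and `total_features` equals 443.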

Summary

By analysing the results of ORB+GP, the following observations are summarised.

(1) ORB+GP can achieve high balanced accuracy, precision, recall, and F1 score on the seven subsets and the full dataset. The main reason is that ORB+GP can automatically select regions with variable sizes around the keypoint and find the most effective features for classifying whether the keypoint is a buoy or not.
(2) The overall process of ORB+GP is transparent, and it is easy to analyse which steps affect the performance. In addition, ORB+GP evolves solutions that have potentially high interpretability. From the GP trees, it is clear which features are extracted and effective for the task.
(3) ORB+GP only requires a few training images to perform the task. This paper focuses on using small data because labelling the images is time-consuming and labour-intensive. The new approach can provide solutions in this scenario.
(4) ORB+GP can obtain promising results on most images. As shown in Figure 18, ORB+GP can find small and large buoys in the images and correctly classify them into the corresponding classes in most cases.
(5) ORB+GP fails when the images contain strong capillary waves and have low contrast, as shown in Figure 19. ORB+GP relies on the keypoint detection method. If the keypoint detection method cannot detect the buoys as keypoints, these buoys will be missed in the later classification step, as shown in Figure 19.

Figure 18. — The images where the ORB+GP method achieves promising results. The red point denotes the predicted positive/buoy class and the yellow point denotes the predicted negative/non-buoy class.

Figure 19. — The images where the proposed ORB+GP method fails.

Conclusions

This paper proposed a new AI approach to automatically detect buoys from mussel farm images of NZ. The overall approach included the steps of data collection and preprocessing, image segmentation, keypoint detection, feature extraction, and classification. Different methods were employed to perform corresponding tasks. Particularly, U-Net was applied to image segmentation and a GP method was developed to automatically evolve image descriptors for extracting features from keypoints for classification. The new approach was compared with three baseline methods on seven data subsets and the full dataset. The results showed that the new approach achieved promising results.

In addition to achieving high accuracy, the new approach has other merits, e.g. the whole process is transparent, and the solutions evolved by GP have potentially high interpretability. However, it has some limitations, since it may fail on images with strong capillary waves and low contrast, where the keypoint detection method fails. In the future, we will further improve this method by developing a more accurate keypoint detection method or an effective object detection method without keypoint detection, e.g. a deep learning method based on CNN. Since the dataset is small, effective CNN architectures and training strategies such as transfer learning will be necessary for solving this task. We will explore these directions in order to develop more effective methods for buoy detection.

Acknowledgments

We would like to thank Max Scheel at Cawthron Research Institute for providing datasets, and Chris Cornelisen for his guidance as the Leader of SfTI Spearhead Project on Precision Farming.

Funding Statement

This work was supported in part by the Science for Technological Innovation National Science Challenge (SfTI) fund under contract 2019-S7-CRS and MBIE Data Science SSIF Fund under the contract RTVU1914.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, Tran B, Xue B, Zhang M. 2019. A survey on evolutionary machine learning. Journal of the Royal Society of New Zealand. 49(2):205–228.
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. 2021. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data. 8(1):1–74.
Aquaculture New Zealand. 2020. Aquaculture for New Zealand: a sector overview with key facts and statistics for 2020. https://drive.google.com/file/d/1yAlnzMbUPyuvbJxtD8Onao LXydEl5Vic/view.
Bi Y, Xue B, Zhang M. 2021a. Genetic programming-based discriminative feature learning for low-quality image classification. IEEE Transactions on Cybernetics. 1–14. doi: 10.1109/TCYB.2021.3049778.
Bi Y, Xue B, Zhang M. 2021b. Genetic programming for image classification: an automated approach to feature learning. Springer Nature Switzerland.
Bi Y, Xue B, Zhang M. 2021c. Genetic programming with image-related operators and a flexible program structure for feature learning in image classification. IEEE Transactions on Evolutionary Computation. 25(1):87–101.
Bi Y, Xue B, Zhang M. 2021d. Multi-objective genetic programming for feature learning in face recognition. Applied Soft Computing. 103:107152.
Bi Y, Xue B, Zhang M. 2022a. Dual-tree genetic programming for few-shot image classification. IEEE Transactions on Evolutionary Computation. 26(3):555–569.
Bi Y, Xue B, Zhang M. 2022b. Learning and sharing: a multitask genetic programming approach to image feature learning. IEEE Transactions on Evolutionary Computation. 26(2):1–15.
Bi Y, Zhang M, Xue B. 2018. Genetic programming for automatic global and local feature extraction to image classification. In: Proceedings of IEEE Congress on Evolutionary Computation, Rio de Janeiro, Brazil. p. 1–8.
CVAT. n.d. Computer vision annotation tool. https://github.com/openvinotoolkit/cvat.
Department of Conservation. 2003. Potential effects of mussel farming on New Zealand's marine mammals and seabirds: a discussion paper. https://www.doc.govt.nz/Documents/science-and-technical/musselfarms01.pdf.
Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C. 2020. Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference. p. 10519–10528.
Fisheries New Zealand. 2009. June. Green-shell mussels. https://fs.fish.govt.nz/Page.aspx?pk=122.
He K, Gkioxari G, Dollár P, Girshick R. 2017. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, Venice. p. 2961–2969.
Koza JR. 1992. Genetic programming: on the programming of computers by means of natural selection. Cambridge: MIT press.
Lepetit V, Fua P. 2006. Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(9):1465–1479.
Lu Y. 2019. Artificial intelligence: a survey on evolution, models, applications and future trends. Journal of Management Analytics. 6(1):1–29.
Ma J, Jiang X, Fan A, Jiang J, Yan J. 2021. Image matching from handcrafted to deep features: a survey. International Journal of Computer Vision. 129(1):23–79.
McLeay AJ, McGhie A, Briscoe D, Bi Y, Xue B, Vennell R, Zhang M. 2021. Deep convolutional neural networks with transfer learning for waterline detection in mussel farms. In: Proceedings of IEEE Symposium Series on Computational Intelligence, Orlando, FL, USA. p. 1–8.
Milletari F, Navab N, Ahmadi S-A. 2016. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA. p. 565–571.
Montana DJ. 1995. Strongly typed genetic programming. Evolutionary Computation. 3(2):199–230.
New Zealand Government. 2020. Nov. The New Zealand government acquaculture strategy. https://www.mpi.govt.nz/dmsdocument/15895-The-Governments-Aquaculture-Strategy-to-2025.
Pal NR, Pal SK. 1993. A review on image segmentation techniques. Pattern Recognition. 26(9):1277–1294.
Peng B, Wan S, Bi Y, Xue B, Zhang M. 2021. Automatic feature extraction and construction using genetic programming for rotating machine fault diagnosis. IEEE Transactions on Cybernetics. 51(10):4909–4923.
Price SR, Anderson DT, Price SR. 2019. GOOFeD: extracting advanced features for image classification via improved genetic programming. In: Proceedings of IEEE Congress on Evolutionary Computation, Wellington, New Zealand. p. 1596–1603.
Ronneberger O, Fischer P, Brox T. 2015. U-net: convolutional networks for biomedical image segmentation. In: Proceedings of the Medical Image Computing and Computer Assisted Intervention Society, Munich, Germany. p. 234–241.
Rublee E, Rabaud V, Konolige K, Bradski G. 2011. ORB: an efficient alternative to sift or surf. In: Proceedings of International Conference on Computer Vision, Barcelona, Spain. p. 2564–2571.
Science for Technological Innovation. 2019. Dec. Let's turn those buoys into real-time mussel sensors. https://www.sftichallenge.govt.nz/news/lets-turn-those-buoys-real-time-mussel-sensors.
Shelhamer E, Long J, Darrell T. 2016. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39(4):640–651.
Steccanella L, Bloisi D, Blum J, Farinelli A. 2018. Deep learning waterline detection for low-cost autonomous boats. In: Proceedings of International Conference on Intelligent Autonomous Systems, Singapore. p. 613–625.
Steccanella L, Bloisi DD, Castellini A, Farinelli A. 2020. Waterline and obstacle detection in images from low-cost autonomous boats for environmental monitoring. Robotics and Autonomous Systems. 124:103346.
Sultana F, Sufian A, Dutta P. 2020. Evolution of image segmentation using deep convolutional neural network: a survey. Knowledge-based Systems. 201:106062.
Tareen S, Saleem Z. 2018. A comparative analysis of sift, surf, kaze, akaze, orb, and brisk. In: Proceedings of International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan. p. 1–10.
Wu X, Doyen S, Steven H. 2020. Polarnet: learning to optimize polar keypoints for keypoint based object detection. In: Proceedings of International Conference on Learning Representations, Virtual Conference.
Yan Z, Bi Y, Xue B, Zhang M. 2021. Automatically extracting features using genetic programming for low-quality fish image classification. In: Proceedings of IEEE Congress on Evolutionary Computation, Kraków, Poland. p. 2015–2022.
Zhou X, Chai C, Li G, Sun J. 2020. Database meets artificial intelligence: a survey. IEEE Transactions on Knowledge and Data Engineering. 34(3):1096–1116.

[CIT0001] Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, Tran B, Xue B, Zhang M.. 2019. A survey on evolutionary machine learning. Journal of the Royal Society of New Zealand. 49(2):205–228. [Google Scholar]

[CIT0002] Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L.. 2021. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data. 8(1):1–74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0003] Aquaculture New Zealand . 2020. Aquaculture for New Zealand: a sector overview with key facts and statistics for 2020. https://drive.google.com/file/d/1yAlnzMbUPyuvbJxtD8Onao LXydEl5Vic/view.

[CIT0004] Bi Y, Xue B, Zhang M.. 2021a. Genetic programming-based discriminative feature learning for low-quality image classification. IEEE Transactions on Cybernetics. 1–14. doi: 10.1109/TCYB.2021.3049778. [DOI] [PubMed] [Google Scholar]

[CIT0005] Bi Y, Xue B, Zhang M.. 2021b. Genetic programming for image classification: an automated approach to feature learning. Springer Nature Switzerland. [Google Scholar]

[CIT0006] Bi Y, Xue B, Zhang M.. 2021c. Genetic programming with image-related operators and a flexible program structure for feature learning in image classification. IEEE Transactions on Evolutionary Computation. 25(1):87–101. [Google Scholar]

[CIT0007] Bi Y, Xue B, Zhang M.. 2021d. Multi-objective genetic programming for feature learning in face recognition. Applied Soft Computing. 103:107152. [Google Scholar]

[CIT0008] Bi Y, Xue B, Zhang M.. 2022a. Dual-tree genetic programming for few-shot image classification. IEEE Transactions on Evolutionary Computation. 26(3):555–569. [Google Scholar]

[CIT0009] Bi Y, Xue B, Zhang M.. 2022b. Learning and sharing: a multitask genetic programming approach to image feature learning. IEEE Transactions on Evolutionary Computation. 26(2):1–15. [Google Scholar]

[CIT0010] Bi Y, Zhang M, Xue B.. 2018. Genetic programming for automatic global and local feature extraction to image classification. In: Proceedings of IEEE Congress on Evolutionary Computation, Rio de Janeiro, Brazil. p. 1–8.

[CIT0011] CVAT . n.d.. Computer vision annotation tool. https://github.com/openvinotoolkit/cvat.

[CIT0012] Department of Conservation . 2003. Potential effects of mussel farming on New Zealand's marine mammals and seabirds: a discussion paper. https://www.doc.govt.nz/Documents/science-and-technical/musselfarms01.pdf.

[CIT0013] Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C.. 2020. Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference. p. 10519–10528.

[CIT0014] Fisheries New Zealand . 2009. June. Green-shell mussels. https://fs.fish.govt.nz/Page.aspx?pk=122.

[CIT0015] He K, Gkioxari G, Dollár P, Girshick R.. 2017. Mask R-CNN: In: Proceedings of the IEEE International Conference on Computer Vision, Venice. p. 2961–2969.

[CIT0016] Koza JR. 1992. Genetic programming: on the programming of computers by means of natural selection. Cambridge: MIT press. [Google Scholar]

[CIT0017] Lepetit V, Fua P.. 2006. Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(9):1465–1479. [DOI] [PubMed] [Google Scholar]

[CIT0018] Lu Y. 2019. Artificial intelligence: a survey on evolution, models, applications and future trends. Journal of Management Analytics. 6(1):1–29. [Google Scholar]

[CIT0019] Ma J, Jiang X, Fan A, Jiang J, Yan J.. 2021. Image matching from handcrafted to deep features: a survey. International Journal of Computer Vision. 129(1):23–79. [Google Scholar]

[CIT0020] McLeay AJ, McGhie A, Briscoe D, Bi Y, Xue B, Vennell R, Zhang M.. 2021. Deep convolutional neural networks with transfer learning for waterline detection in mussel farms. In: Proceedings of IEEE Symposium Series on Computational Intelligence, Orlando, FL, USA. p. 1–8.

[CIT0021] Milletari F, Navab N, Ahmadi S-A.. 2016. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA. p. 565–571.

[CIT0022] Montana DJ. 1995. Strongly typed genetic programming. Evolutionary Computation. 3(2):199–230.

[CIT0023] New Zealand Government. 2020. Nov. The New Zealand government aquaculture strategy. https://www.mpi.govt.nz/dmsdocument/15895-The-Governments-Aquaculture-Strategy-to-2025.

[CIT0024] Pal NR, Pal SK. 1993. A review on image segmentation techniques. Pattern Recognition. 26(9):1277–1294.

[CIT0025] Peng B, Wan S, Bi Y, Xue B, Zhang M. 2021. Automatic feature extraction and construction using genetic programming for rotating machine fault diagnosis. IEEE Transactions on Cybernetics. 51(10):4909–4923.

[CIT0026] Price SR, Anderson DT, Price SR. 2019. GOOFeD: extracting advanced features for image classification via improved genetic programming. In: Proceedings of IEEE Congress on Evolutionary Computation, Wellington, New Zealand. p. 1596–1603.

[CIT0027] Ronneberger O, Fischer P, Brox T. 2015. U-net: convolutional networks for biomedical image segmentation. In: Proceedings of the Medical Image Computing and Computer Assisted Intervention Society, Munich, Germany. p. 234–241.

[CIT0028] Rublee E, Rabaud V, Konolige K, Bradski G. 2011. ORB: an efficient alternative to SIFT or SURF. In: Proceedings of International Conference on Computer Vision, Barcelona, Spain. p. 2564–2571.

[CIT0029] Science for Technological Innovation. 2019. Dec. Let's turn those buoys into real-time mussel sensors. https://www.sftichallenge.govt.nz/news/lets-turn-those-buoys-real-time-mussel-sensors.

[CIT0030] Shelhamer E, Long J, Darrell T. 2016. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39(4):640–651.

[CIT0031] Steccanella L, Bloisi D, Blum J, Farinelli A. 2018. Deep learning waterline detection for low-cost autonomous boats. In: Proceedings of International Conference on Intelligent Autonomous Systems, Singapore. p. 613–625.

[CIT0032] Steccanella L, Bloisi DD, Castellini A, Farinelli A. 2020. Waterline and obstacle detection in images from low-cost autonomous boats for environmental monitoring. Robotics and Autonomous Systems. 124:103346.

[CIT0033] Sultana F, Sufian A, Dutta P. 2020. Evolution of image segmentation using deep convolutional neural network: a survey. Knowledge-based Systems. 201:106062.

[CIT0034] Tareen S, Saleem Z. 2018. A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In: Proceedings of International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan. p. 1–10.

[CIT0035] Wu X, Doyen S, Steven H. 2020. Polarnet: learning to optimize polar keypoints for keypoint based object detection. In: Proceedings of International Conference on Learning Representations, Virtual Conference.

[CIT0036] Yan Z, Bi Y, Xue B, Zhang M. 2021. Automatically extracting features using genetic programming for low-quality fish image classification. In: Proceedings of IEEE Congress on Evolutionary Computation, Kraków, Poland. p. 2015–2022.

[CIT0037] Zhou X, Chai C, Li G, Sun J. 2020. Database meets artificial intelligence: a survey. IEEE Transactions on Knowledge and Data Engineering. 34(3):1096–1116.
