Poultry Science. 2024 Apr 6;103(6):103711. doi: 10.1016/j.psj.2024.103711

Sex identification of ducklings based on acoustic signals

JJ Yin, WG Li, YF Liu, DQ Xiao
PMCID: PMC11636844  PMID: 38652956

Abstract

Sex identification of ducklings is a critical step in the poultry farming industry, and accurate sex identification benefits precise breeding and reduces costs. In this study, a method for identifying the sex of ducklings based on acoustic signals was proposed. First, duckling vocalizations were collected, and an improved spectral subtraction method and high-pass filtering were applied to reduce the influence of noise. Then, duckling vocalizations were automatically detected using a double-threshold endpoint detection method with 3 parameters: short-time energy (STE), short-time zero-crossing rate (ZCR), and duration (D). After Mel-spectrogram features were extracted from the duckling vocalizations, an improved Res2Net deep learning algorithm was used for sex classification. The algorithm introduces the Squeeze-and-Excitation (SE) attention mechanism and the Ghost module into the Res2Net bottleneck, improving model accuracy while reducing the number of parameters. Ablation experiments showed that the SE attention mechanism improved model accuracy by 2.01%, while the Ghost module reduced the number of parameters by 7.26M and the FLOPs by 0.85G. Moreover, the algorithm was compared with 5 state-of-the-art (SOTA) algorithms, and the results showed that it offers the best cost-effectiveness, with accuracy, recall, specificity, number of parameters, and FLOPs of 94.80%, 94.92%, 94.69%, 18.91M, and 3.46G, respectively. Finally, the vocalization detection scores and an average confidence strategy were used to predict the sex of individual ducklings, and the accuracy of the proposed model reached 96.67%. In conclusion, the method proposed in this study can effectively detect the sex of ducklings and serve as a reference for automated duckling sex identification.

Key words: duckling, sex identification, acoustic features, deep learning, animal welfare

INTRODUCTION

In the poultry farming industry, there are significant differences in growth and economic value between male and female individuals due to differences in physiology, behavior, and ecology (Trocino et al., 2015; Krunt et al., 2022; Huang et al., 2023; Lin et al., 2023). Failure to differentiate the sex of individuals early can result in mixing of male and female ducks, increasing the cost and time of later rearing stages. Therefore, sex identification is necessary for all ducklings after hatching, and commonly used techniques for duck sex identification include cloacal examination, feather-associated sex linkage, and genetic testing (Kaleta and Redmann, 2008; Alin et al., 2019; Morinha et al., 2012). Cloacal examination is the most widely used of these techniques, but it has certain limitations. The technique is challenging and requires experienced operators (Otsuka et al., 2016). For large hatcheries, where the number of hatched ducklings can reach tens of thousands, it requires considerable manpower, which undoubtedly adds to the farm's expenses. Additionally, even skilled operators cannot guarantee 100% accuracy, and traditional sexing techniques are invasive and may cause stress, bleeding, and infection, resulting in a mortality rate of up to 1% (Malagó-Jr. et al., 2005; England et al., 2021).

Vocalization is the main way for poultry to transmit information and can reflect various aspects of their state. Many scholars have analyzed poultry vocalizations to determine growth and health status (Fontana et al., 2015; Du et al., 2021). Fontana et al. (2017) established and validated a model describing the growth rate of broilers based on the peak frequency of broiler vocalizations; the body weight predicted by the model correlated significantly with the observed body weight (r = 96%, P ≤ 0.001). Huang et al. (2019) proposed a method for detecting chickens infected with avian influenza by sound recognition: the longer the time after virus inoculation, the higher the probability of identifying avian influenza with the trained support vector machine (SVM) model, which exceeded 80% at 22 h after inoculation. Cuan et al. (2022b) used a deep poultry vocalization network for early detection of Newcastle disease based on poultry vocalizations, with accuracies of 82.15, 90.00, 93.60, and 98.50% on d 1, 2, 3, and 4 postinfection, respectively. Aydin and Berckmans (2016) utilized sound technology to automatically detect short-term feeding behaviors of broiler chickens, and the proposed sound system was capable of efficiently and automatically estimating broiler meal size, meal duration, meal frequency, and feeding rate. Carpentier et al. (2018) developed an algorithm designed to identify sneezing behavior in broilers between 15 and 45 d of age, which accurately monitored sneezing events in dimly lit conditions where chickens rarely vocalize.

Because the structure of the vocal organs differs between birds of different sexes, the calls they make also differ (Cate, 1997). Therefore, the sex of various birds can be identified using acoustic techniques, a noninvasive approach that avoids potential trauma and maximizes the protection of animal welfare (Pereira et al., 2015; Volodin et al., 2015; Sadeghi and Banakar, 2017). Many studies have shown that sex detection methods based on acoustic signals are effective for a wide range of bird species. Eda-Fujiwara et al. (2004) analyzed the frequency and duration of vocalizations in male and female storks during the breeding season: male storks called at a lower frequency and for a longer duration, while the opposite was true for females. Cuan et al. (2022a) compared 3 deep learning models for identifying the sex of chicks. The gated recurrent units (GRU) model had the highest accuracy, 76.15%, in the sex detection task for individual vocalizations, while the convolutional neural network (CNN) model had the highest accuracy, 91.25%, in predicting the sex of chicks from their vocalizations. Venuto et al. (2001) found differences between the calls of males and females of the same parrot species: female distress calls were generally longer in duration, and females used them more frequently than males and modulated the overall vocalization to a greater extent.

However, there have been few studies on sex detection in poultry using deep learning methods, and relatively few studies on sex detection in ducks. In this study, we established a method for automatically recognizing the sex of ducklings based on sound. The method improves recognition accuracy through an improved spectral subtraction method and an improved Res2Net deep learning algorithm, and it reduces the number of parameters and FLOPs of the model, making deployment on edge devices more feasible.

MATERIALS AND METHODS

Experimental Setup

This experiment was carried out on February 24, 2023, at the Silao Hatchery of Wen's Group in Yunfu City, Guangdong Province, China, where the room temperature was 25°C. The study subjects were 1-day-old white pansy ducklings, and audio data were collected from 160 ducklings, including 80 males and 80 females. The sex of the ducklings was determined by 2 professionals using cloacal examination (average accuracy 99.5%). During the experiment, a duckling was randomly taken from the 32°C temperature-controlled chamber and placed individually in the experimental box to collect audio data. The experimental box was made of PVC, with dimensions of 40 cm in length, 30 cm in width, and 30 cm in height. The box was equipped with a heat preservation lamp to maintain a suitable temperature and prevent the ducklings from feeling cold; the lamp also served as a light source, creating a bright environment similar to the outside world. Inside the box, an AOOCAN MU900 microphone was connected to a computer outside the box for data recording and processing. In addition, a piece of transparent glass was installed at the top of the box to make it easier to observe the behavior of the ducklings. The architecture of the experimental setup is shown in Figure 1. The microphone was set to a bit depth of 32 bits, a sampling frequency of 48 kHz, and single-channel recording. Two minutes of audio data were recorded for each duckling, yielding a total of 160 audio files.

Figure 1. Schematic diagram of the experimental setup.

Figure 2 shows the waveforms and frequency amplitude curves of the calls of 2 male and 2 female ducklings. The vocalization frequencies of both males and females are centered between 3,000 Hz and 5,000 Hz, with no obvious difference between the sexes. Therefore, in this paper, we classify the sex of ducklings by extracting audio features and applying deep learning methods, which have powerful data mining abilities.

Figure 2. The waveforms and frequency amplitude curves of the calls of 2 male and 2 female ducklings. (A) Male, (B) female.

Preprocessing

Improved spectral subtraction

In recordings of audio data, various noises such as duckling footsteps, fan noise, and human handling sounds may be present, similar to the environmental conditions in a real hatchery. To reduce the effect of these noises on the data, audio noise reduction is required. Spectral subtraction is one of the most commonly used noise reduction techniques and is widely applied in speech and sound signal processing tasks (Vaseghi, 1996; Yektaeian and Amirfattahi, 2007; Karam et al., 2014). However, spectral subtraction assumes that the noise is stationary, which makes it difficult to accurately estimate the noise segments under different audio conditions. Moreover, audio processed by spectral subtraction may exhibit rhythmic fluctuations known as "musical noise" (Esch and Vary, 2009). Therefore, an improved spectral subtraction method was proposed in this study; the specific steps are as follows:

1. Assume that the time domain signal of the original audio signal is:

$$y(n) = x(n) + e(n) \tag{1}$$

where y(n) denotes the mixed signal, x(n) denotes the clean acoustic signal, and e(n) denotes the noise signal.

2. Windowing and framing: a Hamming window with a length of 23.2 ms (1024 samples) was used to divide the sound signal into frames, and the step size was set to 256 sampling points.

3. The FFT (Fast Fourier Transform) was performed on each frame to obtain the frequency domain signal, and the frequency domain expression for frame i is:

$$Y_i(k) = X_i(k) + E_i(k) \tag{2}$$

where k is the FFT frequency bin index, k = 1, 2, ..., 1024.

4. Calculate the amplitude $|Y_i(k)|$ and the phase angle:

$$\theta_i(k) = \arctan\left[\frac{\operatorname{Im}(Y_i(k))}{\operatorname{Re}(Y_i(k))}\right] \tag{3}$$

5. The average energy spectrum of the first 30 frames, which contain no sound events, was calculated as the noise spectrum:

$$|E(k)|^2 = \frac{1}{30}\sum_{i=1}^{30}\left|Y_i(k)\right|^2 \tag{4}$$

6. Spectral subtraction:

$$|X_i(k)|^2 = \begin{cases} |Y_i(k)|^2 - \alpha|E(k)|^2, & |Y_i(k)|^2 - \alpha|E(k)|^2 \ge \beta|E(k)|^2 \\ \beta|E(k)|^2, & \text{otherwise} \end{cases} \tag{5}$$

where the reduction parameter α and compensation parameter β were set to 4 and 0.001, respectively.

7. Introduction of a smoothing mechanism: calculate the maximum residual noise.

$$\mathrm{Max}_{\mathrm{noise}} = \max_{i\in[0,30]}\left(|E_i(k)| - |E(k)|\right) \tag{6}$$

If the amplitude spectrum of the current frame is smaller than the maximum residual noise, the minimum amplitude spectrum of a number of neighboring frames is used instead:

$$|X_i(k)| = \begin{cases} \min\limits_{j\in[i-l,\,i+l]} |X_j(k)|, & |X_i(k)| < \mathrm{Max}_{\mathrm{noise}} \\ |X_i(k)|, & \text{otherwise} \end{cases} \tag{7}$$

Finally, the denoised sound signal x(n) is obtained by the IFFT (Inverse Fast Fourier Transform) using the smoothed amplitude spectrum and the phase angle from Equation (3).
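As an illustration of steps 1 to 7, the following NumPy sketch implements the improved spectral subtraction pipeline under the parameters stated above. The neighborhood half-width l in Equation (7) and the overlap-add resynthesis details are assumptions, since the paper does not specify them.

```python
import numpy as np

def improved_spectral_subtraction(y, frame_len=1024, hop=256,
                                  alpha=4.0, beta=0.001, l=2, noise_frames=30):
    """Sketch of the improved spectral subtraction; `l` is an assumed value."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i*hop:i*hop+frame_len] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)               # Eq. (3)

    noise_pow = np.mean(mag[:noise_frames] ** 2, axis=0)    # Eq. (4): |E(k)|^2

    # Eq. (5): over-subtraction with a spectral floor
    sub = mag ** 2 - alpha * noise_pow
    clean_pow = np.where(sub >= beta * noise_pow, sub, beta * noise_pow)
    clean_mag = np.sqrt(clean_pow)

    # Eqs. (6)-(7): in the leading noise-only frames Y_i ~ E_i, so the
    # residual is estimated from them; low bins take the neighborhood minimum
    max_resid = (mag[:noise_frames] - np.sqrt(noise_pow)).max(axis=0)
    smoothed = clean_mag.copy()
    for i in range(n_frames):
        lo, hi = max(0, i - l), min(n_frames, i + l + 1)
        nbr_min = clean_mag[lo:hi].min(axis=0)
        smoothed[i] = np.where(clean_mag[i] < max_resid, nbr_min, clean_mag[i])

    # IFFT with the original phase, then overlap-add resynthesis
    frames_out = np.fft.irfft(smoothed * np.exp(1j * phase), n=frame_len, axis=1)
    x = np.zeros(len(y))
    for i in range(n_frames):
        x[i*hop:i*hop+frame_len] += frames_out[i]
    return x
```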

The spectrograms of the original audio signal after traditional spectral subtraction and after the improved spectral subtraction are shown in Figure 3. The improved spectral subtraction method effectively removes most of the noise, and manual inspection found no obvious distortion in the noise-reduced audio. The signal-to-noise ratios of the original audio, the audio processed by the traditional method, and the audio processed by the improved method were measured as 31.61 dB, 36.68 dB, and 58.65 dB, respectively. It is worth noting that the improved method increases the signal-to-noise ratio by 27.04 dB, while the traditional method improves it by only 5.07 dB. These results show that the improved spectral subtraction method performs markedly better in noise reduction than the traditional spectral subtraction method.

Figure 3. Spectrograms of 2 different spectral subtraction methods. (A) The original signal, (B) the signal after traditional spectral subtraction, (C) the signal after improved spectral subtraction.

Pre-emphasis

Because sound signals undergo distortion and attenuation during transmission, their high-frequency components are weakened or lost. Pre-emphasis counteracts this by boosting the high-frequency components of the signal, thereby improving its quality and audibility. Pre-emphasis is implemented by filtering the signal, usually with a first-order high-pass filter. The expression of the signal after pre-emphasis is shown in Equation (8):

$$\hat{x}(n) = x(n) - \alpha x(n-1) \tag{8}$$

where the filter's transfer function is $H(z) = 1 - \alpha z^{-1}$ and the pre-emphasis coefficient α is 0.97.
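For reference, pre-emphasis per Equation (8) reduces to a one-line filter; a minimal NumPy sketch:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # x_hat(n) = x(n) - alpha * x(n-1); the first sample is kept unchanged
    return np.append(x[0], x[1:] - alpha * x[:-1])
```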

Methods for detection of vocal endpoints in poultry

The audio data contain not only the vocalizations of the ducklings but also other sounds and a large number of silent clips. To analyze the vocalizations more accurately, endpoint detection is required for the audio data. In this paper, endpoint detection was performed using a double-threshold, 2-stage judgment method with 3 parameters: short-time energy (STE), short-time zero-crossing rate (ZCR), and duration (D).

The STE reflects the intensity of the sound. In a quiet environment, the STE of sound events is significantly higher than that of noise. The signal of the i-th frame is defined as $x_i(m)$, and its short-time energy $STE_i$ is:

$$STE_i = \sum_{m=0}^{N-1} x_i^2(m) \tag{9}$$

where N is the frame length and m denotes the m-th sampling point within the frame.

The ZCR describes how often the audio signal crosses zero on the time axis, which expresses the frequency content of the sound to some extent. The short-time zero-crossing rate of the i-th frame, $ZCR_i$, is defined as:

$$ZCR_i = \frac{1}{2}\sum_{m=1}^{N-1}\left|\operatorname{sgn}[x_i(m)] - \operatorname{sgn}[x_i(m-1)]\right| \tag{10}$$

where $\operatorname{sgn}[\cdot]$ is the sign function: $\operatorname{sgn}[x_i(m)] = \begin{cases} 1, & x_i(m) \ge 0 \\ -1, & x_i(m) < 0 \end{cases}$

In the double-threshold judgment method, low and high thresholds are set for both STE and ZCR, and a single threshold is set for D. The low and high thresholds for STE are S1 and S2, those for ZCR are Z1 and Z2, and the threshold for D is D1. The endpoint detection flow is shown in Figure 4.
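A two-stage sketch of this detector follows: frames exceeding the high thresholds seed a segment, the boundaries are extended while the low thresholds hold, and the duration criterion filters the result. The threshold values s1, s2, z1, z2, and d1 below are illustrative placeholders; the paper does not report the values it used.

```python
import numpy as np

def detect_vocal_segments(x, sr=48000, frame_len=1024, hop=256,
                          s1=0.01, s2=0.05, z1=20, z2=60, d1=0.05):
    """Double-threshold, 2-stage endpoint detection (thresholds assumed)."""
    n = 1 + (len(x) - frame_len) // hop
    ste, zcr = np.empty(n), np.empty(n)
    for i in range(n):
        frame = x[i * hop:i * hop + frame_len]
        ste[i] = np.sum(frame ** 2)                      # Eq. (9)
        sgn = np.where(frame >= 0, 1, -1)                # sign function of Eq. (10)
        zcr[i] = 0.5 * np.sum(np.abs(np.diff(sgn)))      # Eq. (10)

    segments, i = [], 0
    while i < n:
        if ste[i] > s2 and zcr[i] > z2:                  # stage 1: high thresholds
            start, end = i, i
            while start > 0 and (ste[start - 1] > s1 or zcr[start - 1] > z1):
                start -= 1                               # stage 2: low thresholds
            while end < n - 1 and (ste[end + 1] > s1 or zcr[end + 1] > z1):
                end += 1
            if (end - start + 1) * hop / sr >= d1:       # duration criterion D >= D1
                segments.append((start * hop / sr, end * hop / sr))
            i = end + 1
        else:
            i += 1
    return segments                                      # (start, end) times in s
```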

Figure 4. Flow chart of endpoint detection by the double-threshold method with 3 parameters.

The result of endpoint detection is shown in Figure 5. The solid lines indicate the starting points of sound events and the dashed lines indicate the end points. The sound events automatically selected by this algorithm were manually verified and found to be correct.

Figure 5. Endpoint detection results.

Sex Identification of Ducklings

Duckling acoustic signal feature extraction

Analysis showed that the duration of duckling calls varies from 0.1 to 0.3 s. Converting audio segments of different lengths into feature maps and then normalizing them would deform the vocal lines in the images and affect the sound recognition results. In this study, each audio segment was therefore padded to a length of 1 s by copying its tail to the end. This ensures that all audio data have the same length, keeps the vocal lines consistent across images, and improves recognition accuracy.

Audio signals contain many acoustic features, among which the Mel-spectrogram is widely used in animal vocalization recognition and has good robustness (Cuan et al., 2020; Liao et al., 2023). The Mel-spectrogram is extracted as follows: after the preprocessed duckling sound signal is framed and windowed, the FFT converts the time-domain signal into the frequency domain. Next, the energy spectrum of each frame is calculated and passed through the Mel filter bank, finally yielding the Mel-spectrogram.
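A possible implementation of the tail-copy padding and Mel-spectrogram extraction, sketched with librosa; the number of Mel bands and the STFT parameters here are assumed values, not reported in the paper:

```python
import numpy as np
import librosa

def mel_feature(call, sr=48000, target_len=1.0, n_mels=128):
    """Pad a call to 1 s by repeating its tail, then extract a log-Mel-spectrogram."""
    target = int(target_len * sr)
    while len(call) < target:                  # copy the tail until 1 s is reached
        call = np.concatenate([call, call[-(target - len(call)):]])
    mel = librosa.feature.melspectrogram(y=call, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```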

SpecAugment data augmentation methodology

Data augmentation is a widely used technique in the field of speech recognition. It improves model performance by generating more training samples, reducing the risk of overfitting, and improving model generalization. In this study, the extracted features are augmented using the SpecAugment data augmentation method. SpecAugment is a simple and effective data augmentation method that increases the diversity of the data by time warping, frequency masking and time masking (Park et al., 2019). The method first applies time warping by randomly selecting a center frame and stretching or compressing the frames before and after it. Then, certain frequencies and time periods are randomly selected for masking. An example of a Mel-spectrogram of an audio segment after undergoing SpecAugment is shown in Figure 6.
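A masking-only approximation of SpecAugment can be written with torchaudio's built-in transforms; time warping is omitted here for brevity, and the mask widths are illustrative rather than the paper's settings:

```python
import torch
import torchaudio.transforms as T

# Frequency and time masking applied to a Mel-spectrogram tensor
augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=15),   # mask up to 15 consecutive Mel bins
    T.TimeMasking(time_mask_param=20),        # mask up to 20 consecutive frames
)

mel = torch.randn(1, 128, 188)                # (channel, n_mels, frames) placeholder
augmented = augment(mel)
```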

Figure 6. (A) Original Mel-spectrogram, (B) Mel-spectrogram processed by the SpecAugment method.

Overall technical route

To reliably recognize the sex of ducklings in a real hatchery environment, the technical route proposed in this study is shown in Figure 7. The study centers on the improved spectral subtraction method and the improved Res2Net deep learning algorithm to enhance duckling sex recognition. Initially, the improved spectral subtraction method is used to denoise the audio and minimize the influence of noise on model performance. Furthermore, to efficiently utilize features across channels according to their importance, the squeeze-and-excitation (SE) visual attention module is integrated into Res2Net. Finally, the Ghost module is introduced into the model to reduce the number of parameters and the computational burden.

Figure 7. Overall technical route.

Squeeze-and-excitation visual attention module

In a convolutional neural network, each channel extracts different features, but not all features are equally important for the final classification task; as a result, many important feature channels can be underestimated, limiting model performance. To address this issue, the squeeze-and-excitation (SE) attention module introduces a feature fusion mechanism in the channel dimension, which allows different weights to be assigned to channels so that the model can focus more on important information (Hu et al., 2018).

The structure of the SE module is shown in Figure 8, consisting primarily of squeeze, excitation, and feature map recalibration operations. First, each of the c channels of the input feature map is compressed through global average pooling, producing a c-dimensional vector. The calculation is shown in Equation (11):

$$z_c = F_{sq}(u_c) = \frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j) \tag{11}$$

where $F_{sq}$ denotes the squeeze operation, $u_c$ denotes the feature map of the c-th channel, and c is the channel index.

Figure 8. The structure of the SE visual attention module.

Then, the feature vector z obtained from the squeezing operation is input to the excitation operation as shown in Equation 12:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma\left(W_2\,\delta(W_1 z)\right) \tag{12}$$

where $F_{ex}$ denotes the excitation operation; $W_1$ and $W_2$ are the dimensionality-reduction and dimensionality-expansion weights of the fully connected layers, respectively; σ denotes the sigmoid function; and δ denotes the ReLU activation function.

Finally, the channel weight vector s is multiplied channel by channel with the original feature map, recalibrating the original feature information in the channel dimension. This produces a new, weighted feature map, as shown in Equation (13).

$$o_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \tag{13}$$
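A PyTorch sketch of the SE block following Equations (11) to (13); the reduction ratio r = 16 is the common default and an assumption here:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block, Eqs. (11)-(13)."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # Eq. (11): global average pool
        self.excite = nn.Sequential(                      # Eq. (12): FC -> ReLU -> FC -> sigmoid
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )

    def forward(self, u):
        b, c, _, _ = u.shape
        z = self.squeeze(u).view(b, c)
        s = self.excite(z).view(b, c, 1, 1)
        return u * s                                      # Eq. (13): channel recalibration
```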

Ghost module

For convolutional neural networks, feature extraction often produces redundant feature maps. Deeper networks with many stacked convolutional layers can generate a significant amount of redundancy, increasing both the number of parameters and the computational cost. GhostNet is a lightweight network architecture proposed by Huawei Noah's Ark Lab (Han et al., 2020). It achieves lower computational and parameter requirements than commonly used models while maintaining comparable accuracy. The network introduces a novel Ghost module on top of existing convolutional architectures to generate additional feature maps through low-cost operations. The schematic diagram of the Ghost module is shown in Figure 9 and consists of 2 steps. First, an ordinary convolution is applied to generate intrinsic feature maps Y′ with a smaller number of channels, which consumes fewer computational resources. Then, cheap linear transformations $\Phi_i(\cdot)$ are applied to obtain Ghost feature maps. Finally, the intrinsic feature maps and Ghost feature maps are concatenated to form Y.

Figure 9. The structure of the Ghost module.

In conventional convolution, suppose the input feature map is $X \in \mathbb{R}^{h\times w\times c}$, where c, h, and w represent the number of input channels, the height, and the width, respectively; the convolution kernel is $f \in \mathbb{R}^{c\times k\times k\times n}$, where k is the kernel size and n is the number of output channels. The output feature map of conventional convolution is $Y \in \mathbb{R}^{h\times w\times n}$, with a total computation of $h\times w\times k\times k\times n\times c$. Since n and c are generally very large, the computational cost is also large.

The input to the Ghost module is likewise $X \in \mathbb{R}^{h\times w\times c}$. It first applies a conventional convolution with m output channels, where m is much smaller than n; the intrinsic feature maps have size $Y' \in \mathbb{R}^{h\times w\times m}$, the convolution kernel is $f \in \mathbb{R}^{c\times k\times k\times m}$, and the computation required is $h\times w\times k\times k\times m\times c$. To restore the original channel number n, the parameter s is introduced with $n = m\times s$, and cheap linear convolutions are performed on the m intrinsic maps. Finally, the intrinsic feature maps are concatenated with the maps obtained by linear convolution. Assuming the Ghost module contains 1 identity mapping per intrinsic map and $m\times(s-1) = \frac{n}{s}\times(s-1)$ linear convolution operations, each with a kernel of size $d\times d$, where d and k are of similar magnitude and $s \ll c$, the theoretical speed-up ratio and parameter compression ratio of the Ghost module are given by:

$$r_s = \frac{h\cdot w\cdot k\cdot k\cdot n\cdot c}{\frac{n}{s}\cdot h\cdot w\cdot c\cdot k\cdot k + (s-1)\cdot\frac{n}{s}\cdot h\cdot w\cdot d\cdot d} = \frac{c\cdot k\cdot k}{\frac{1}{s}\cdot c\cdot k\cdot k + \frac{s-1}{s}\cdot d\cdot d} \approx \frac{s\cdot c}{s+c-1} \approx s \tag{14}$$

$$r_c = \frac{k\cdot k\cdot n\cdot c}{\frac{n}{s}\cdot c\cdot k\cdot k + (s-1)\cdot\frac{n}{s}\cdot d\cdot d} \approx \frac{s\cdot c}{s+c-1} \approx s \tag{15}$$

Equations (14) and (15) show that, compared with ordinary convolution, the Ghost module offers approximately an s-fold speed-up in computation and an s-fold compression in parameter count.
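A PyTorch sketch of the Ghost module; s = 2 and the depthwise kernel size d = 3 follow common GhostNet defaults and are assumptions here:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module sketch: m intrinsic maps from an ordinary convolution,
    then m*(s-1) cheap depthwise maps, concatenated to n = m*s channels."""
    def __init__(self, in_ch, out_ch, k=1, d=3, s=2):
        super().__init__()
        m = out_ch // s                                   # n = m * s
        self.primary = nn.Sequential(                     # intrinsic feature maps Y'
            nn.Conv2d(in_ch, m, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(                       # linear ops Phi_i (depthwise)
            nn.Conv2d(m, m * (s - 1), d, padding=d // 2, groups=m, bias=False),
            nn.BatchNorm2d(m * (s - 1)), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y1 = self.primary(x)
        y2 = self.cheap(y1)
        return torch.cat([y1, y2], dim=1)                 # Y = [Y', ghost maps]
```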

Res2Net deep learning network

Res2Net is a novel multi-scale network architecture proposed by Gao et al. (2021); its structure is shown in Figure 10. Within the Res2Net block, the input feature map is divided into multiple branches, which are processed in a hierarchical, residual-like manner so that each branch has a different effective receptive field. The outputs of these branches are concatenated and passed through a 1×1 convolution to fully fuse the information. This connectivity gives Res2Net a larger receptive field and more combinations of receptive fields than ResNet.

Figure 10. Res2Net module structure diagram.

Specifically, the input to Res2Net is a feature map with C channels and dimensions H×W. After a 1×1 convolution, the input feature map is evenly divided along the channel dimension into s feature subsets $x_i\ (i\in\{1,2,\dots,s\})$. Except for $x_1$, each subset $x_i$ passes through a 3×3 convolution $K_i(\cdot)$; after each branch's convolution is computed, its output is added to the next feature subset and serves as that branch's input. The output $y_i$ of Res2Net is shown in Equation (16).

$$y_i = \begin{cases} x_i, & i = 1 \\ K_i(x_i), & i = 2 \\ K_i(x_i + y_{i-1}), & 2 < i \le s \end{cases} \tag{16}$$
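The hierarchical connectivity of Equation (16) can be sketched as a forward pass over channel chunks; `convs` holds the s − 1 convolutions $K_i$:

```python
import torch

def res2net_split_forward(x, convs):
    """Hierarchical forward pass of Eq. (16) over s channel chunks."""
    xs = torch.chunk(x, chunks=len(convs) + 1, dim=1)
    ys = [xs[0]]                                  # y_1 = x_1
    y = convs[0](xs[1])                           # y_2 = K_2(x_2)
    ys.append(y)
    for xi, Ki in zip(xs[2:], convs[1:]):
        y = Ki(xi + y)                            # y_i = K_i(x_i + y_{i-1})
        ys.append(y)
    return torch.cat(ys, dim=1)                   # concatenated; a 1x1 conv follows
```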

Sex recognition of ducklings based on Ghost-SE-Res2Net

The Res2Net network extracts multi-scale features by performing convolutional operations on multiple branches, which imposes a large computational cost. To reduce the number of network parameters and obtain a lighter model, this study proposes a duckling sex recognition model based on the Ghost-SE-Res2Net block (GSR BLK).

The network structure of the improved Res2Net for duckling sex recognition is shown in Figure 11. First, the SE attention module is added to the Res2Net module; it adaptively recalibrates features according to the importance of each channel and fuses them with the original features, effectively improving the utilization of important features. In addition, replacing the 3×3 convolution kernels $K_i(\cdot)$ in the Res2Net module with the Ghost module effectively reduces the number of parameters and the computation of the model. By using the Ghost-SE-Res2Net block in place of the bottleneck module, computation is accelerated while accuracy is improved.
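Putting the pieces together, a hypothetical GSR bottleneck might look like the following sketch, reusing the SEBlock, GhostModule, and res2net_split_forward sketches above; the channel bookkeeping and the exact placement of the SE block are illustrative, not the authors' verified design:

```python
import torch.nn as nn

class GSRBlock(nn.Module):
    """Ghost-SE-Res2Net bottleneck sketch: 1x1 conv -> Res2Net split with
    Ghost branches in place of the 3x3 convs K_i -> 1x1 conv -> SE -> residual."""
    def __init__(self, channels, scale=4):
        super().__init__()
        width = channels // scale
        self.reduce = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                    nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branches = nn.ModuleList(
            [GhostModule(width, width, k=1, d=3) for _ in range(scale - 1)])
        self.expand = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                    nn.BatchNorm2d(channels))
        self.se = SEBlock(channels)

    def forward(self, x):
        out = self.reduce(x)
        out = res2net_split_forward(out, self.branches)
        out = self.se(self.expand(out))
        return nn.functional.relu(out + x)        # residual connection
```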

Figure 11. Schematic diagram of the duckling sex identification algorithm based on improved Res2Net.

Evaluation Indicators

In this study, the sex corresponding to each vocalization was detected, and the performance of the model was evaluated by comparing the detection results with the actual labels, as shown in Table 1, where TM is a male duckling correctly detected as male, FM is a male duckling incorrectly detected as female, FF is a female duckling incorrectly detected as male, and TF is a female duckling correctly detected as female.

Table 1.

Confusion matrix.

                 Predicted male        Predicted female
Actual male      True Male (TM)        False Male (FM)
Actual female    False Female (FF)     True Female (TF)

The Accuracy, Recall, and Specificity indicators were used to evaluate the model. Accuracy is a basic evaluation indicator that describes whether the overall prediction is correct, as shown in Equation (17). Recall reflects the sensitivity of the algorithm to male ducklings, i.e., the probability of correctly predicting male ducklings, as shown in Equation (18). Specificity reflects the sensitivity of the algorithm to female ducklings, i.e., the probability of correctly predicting female ducklings, as shown in Equation (19).

$$\mathrm{Accuracy} = \frac{TM + TF}{TM + TF + FM + FF} \tag{17}$$

$$\mathrm{Recall} = \frac{TM}{TM + FM} \tag{18}$$

$$\mathrm{Specificity} = \frac{TF}{TF + FF} \tag{19}$$
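For concreteness, the three indicators can be computed directly from the confusion-matrix counts:

```python
def evaluate(tm, tf, fm, ff):
    """Compute Eqs. (17)-(19) from the confusion-matrix counts."""
    accuracy = (tm + tf) / (tm + tf + fm + ff)
    recall = tm / (tm + fm)              # sensitivity to male ducklings
    specificity = tf / (tf + ff)         # sensitivity to female ducklings
    return accuracy, recall, specificity
```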

RESULTS

Data Set Construction

After preprocessing, the audio data of 160 ducklings were divided into 22,347 clear and nonoverlapping audio segments. The number of vocalizations differed greatly among ducklings, with the fewest being 91 and the most being 257. If all vocalizations were used for training, ducklings with more calls would have a greater influence on the model, making it difficult for the model to learn sex differences; therefore, 80 vocalizations were randomly selected from each duckling to form the dataset. Next, the 160 ducklings were divided for model training and testing: 10,240 vocalizations from 128 ducklings were used to train the model, and these vocalizations were randomly divided into a training set and a validation set at a 3:1 ratio. The remaining 2,560 vocalizations from 32 ducklings were used as the test set to assess the generalization ability of the model. This partitioning strategy ensures stable model performance on different data and reduces the risk of overfitting. Furthermore, the dataset was doubled by applying the SpecAugment data augmentation method to the acoustic features of the duckling calls, making the training data richer and more diverse. The specific dataset division is shown in Table 2.

Table 2.

The detail number of ducklings calls dataset.

Dataset          Ducklings   Vocalizations   Data augmentation   Total
Training set     120         7680            7680                15360
Validation set   –           2560            2560                5120
Test set         40          2560            2560                5120

Experimental Environment and Parameter Settings

In this study, the hardware configuration was an Intel(R) Core(TM) i5-12400F CPU and a 16G NVIDIA GeForce RTX 3060 GPU, and the operating system was 64-bit Windows 10 with CUDA 11.6 and cuDNN 8.3.0. All model training and testing were based on the PyTorch framework. The model-related parameters are shown in Table 3.

Table 3.

Relevant parameters of the experiment.

Parameters Value/Type
Optimizer Adam
Batch size 64
Initial learning rate 0.001
Weight decay 0.00001
Epoch 100
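Under the settings of Table 3, a minimal training-loop sketch might look as follows; `model` (the improved Res2Net) and `train_loader` (batches of Mel-spectrograms with sex labels) are assumed to exist, and cross-entropy is an assumed loss, since the paper does not state the loss function:

```python
import torch
import torch.nn.functional as F

# Adam with the learning rate and weight decay from Table 3
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.00001)

for epoch in range(100):                      # 100 epochs per Table 3
    for mel, label in train_loader:           # batch size 64 per Table 3
        optimizer.zero_grad()
        loss = F.cross_entropy(model(mel), label)
        loss.backward()
        optimizer.step()
```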

The Effect of Improved Res2Net on Sex Recognition in Ducklings

Figure 12 shows the Accuracy and Loss of training and validation of the proposed improved Res2Net during training. As the number of epochs increases, the loss of the model continues to decrease and the accuracy of the model gradually approaches 1. This trend indicates that the model has learned the data from the training set well.

Figure 12. Training and validation accuracies (A) and losses (B) of the proposed Res2Net model.

After completing the model training, the effectiveness of the proposed duckling sex recognition algorithm was evaluated using the test set. The test results show that the accuracy, recall, and specificity of the algorithm are 94.80, 94.92, and 94.69%, respectively, which indicates that the research method is able to reliably and accurately recognize the sex of the ducklings and provide potential technical support for the automatic assessment of duckling sex.

To further investigate the effectiveness of the Res2Net network improvement in this study, 4 ablation experiments were designed to analyze the impact of the SE module and Ghost module on the accuracy and the number of parameters of the model, and the results of the related experiments are shown in Table 4.

Table 4.

Contribution of different modules to the Res2Net network.

Module     SE block   Ghost block   #Params (M)   FLOPs (G)   Accuracy (%)
Exp No.1   ×          ×             23.65         4.30        93.01
Exp No.2   ✓          ×             26.17         4.31        95.02
Exp No.3   ×          ✓             16.39         3.45        92.42
Exp No.4   ✓          ✓             18.91         3.46        94.80

To validate the effectiveness of the SE attention module, the bottleneck modules in the original Res2Net model were replaced with bottlenecks improved solely with the SE module. As shown in Table 4, with the SE attention module alone, model accuracy increased by 2.01%, but the number of parameters increased by 2.52M. This indicates that introducing the SE attention module to differentiate the importance of channel features helps construct better channel features and improves accuracy, but it inevitably increases the number of parameters.

To explore the effectiveness of the Ghost module, all the 3×3 convolution blocks in the Res2Net module were replaced with the Ghost module. The results show that the number of parameters is reduced by 7.26M and the FLOPs by 0.85G, while the recognition accuracy of the model is almost unchanged.

In the model where both the SE attention module and the Ghost module are introduced, the accuracy increases by 1.79%, the number of parameters is reduced by 4.74M, and the FLOPs are reduced by 0.84G. These results show that combining the SE attention module and the Ghost module to improve Res2Net increases accuracy while reducing the number of parameters.

Comparison of Different Algorithms

To validate the superiority of the proposed algorithm, it was compared with advanced algorithms: Vgg16, Inceptionv3, AlexNet, ResNet50, and Densenet161. The models of these 5 algorithms were first trained on the training set, and their performance was then evaluated on the test set. The corresponding experimental results are shown in Table 5. The algorithm proposed in this paper has the highest accuracy, recall, and specificity, and the lowest number of parameters. Compared with the 5 other algorithms, it offers the best cost-effectiveness, which shows that the proposed algorithm can reliably identify the sex of ducklings in real scenarios.

Table 5.

Ducklings sex identification results of different algorithms.

Algorithm #Params (M) FLOPs (G) Accuracy (%) Recall (%) Specificity (%)
Vgg-16 138.36 15.5 92.73 91.88 93.59
Inceptionv3 27.16 2.85 93.16 94.38 91.95
AlexNet 61.10 0.72 90.86 90.31 91.41
ResNet50 25.56 4.12 90.43 88.44 92.42
Densenet-161 28.68 7.82 92.62 90.86 94.38
Ghost-SE-Res2Net (Ours) 18.91 3.46 94.80 94.92 94.69

Sex Identification of Individual Ducklings

The results above concern the detection of individual vocalizations only. To verify the ability of the proposed model to determine the sex of individual ducklings from their calls, the calls of 60 ducklings (30 female and 30 male) were collected following the same method as above; none of these calls were involved in model training. In the experiment, 40 calls were randomly selected from each duckling for prediction, and the prediction score for each sex represented the confidence of that prediction. Finally, an average confidence strategy was used to determine the sex of each duckling; that is, the sex with the highest average confidence was selected as the prediction result. The experimental results are shown in Figure 13. The first 30 samples are truly labeled male and the last 30 are truly labeled female; the model misclassified only 2, for an accuracy of 96.67%.
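The average confidence strategy can be sketched as follows, assuming the model outputs 2 logits per call and a class order of [male, female] (the actual label mapping is not stated in the paper):

```python
import torch

@torch.no_grad()
def predict_duckling_sex(model, calls):
    """Average-confidence strategy: average the softmax scores over all of a
    duckling's calls and pick the sex with the highest mean confidence.
    `calls` is a batch of 40 Mel-spectrogram tensors from one duckling."""
    probs = torch.softmax(model(calls), dim=1)    # per-call confidence for each sex
    mean_conf = probs.mean(dim=0)                 # average confidence per class
    return ["male", "female"][mean_conf.argmax().item()]
```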

Figure 13. Individual sex detection results for ducklings.

DISCUSSION

In the poultry farming industry, it is a major challenge to identify the sex of ducklings while maximizing their welfare. Currently, as with chicks, cloacal examination of 1-day-old ducklings is the most widely used sexing method throughout the poultry industry. Although cloacal examination is fast and accurate, it requires the operator to be well trained and highly practiced (Otsuka et al., 2016); moreover, the method is invasive and may injure the ducklings, resulting in a mortality rate of up to 1.0% (England et al., 2021). In addition, microbial cross-contamination between ducklings may occur if the feces in the cloaca contain pathogenic bacteria or viruses (Kaleta and Redmann, 2008). Molecular sexing techniques are also often used to identify the sex of birds (Cerit and Avanus, 2007); although very accurate, they are unlikely to be applied in practice to large numbers of ducklings because of time and cost. A method of sexing 1-day-old chicks using an endoscope system has been reported, in which a probe is inserted into the intestine through the cloaca and the sex is determined by observing the presence or absence of testes or ovaries through the intestinal wall on a monitor, with an average accuracy of 90.2% (Otsuka et al., 2016). However, this method is still invasive, and its average accuracy does not meet the requirements of poultry farms.

To safeguard the welfare of birds, more and more researchers have attempted to use bird vocalizations to determine sex (Eda-Fujiwara et al., 2004; Venuto et al., 2001; Li et al., 2022). Automated sex determination from vocalizations has been shown to be an efficient, low-cost, and noninvasive method that can serve as a viable alternative to traditional invasive sexing techniques or to validate their results (Volodin et al., 2015). Pereira et al. (2015), analyzing the fundamental frequency, sound intensity, and first and second resonance peaks of broiler vocalizations, found that the frequency of the second resonance peak and the distance between resonance peaks were highly correlated with the sex of the chick. However, that study did not establish a specific detection model or run detection experiments. Sadeghi and Banakar (2017) analyzed time-domain, frequency-domain, and time-frequency-domain features of 1-day-old chick calls using data mining methods, and classified the chick sound signals using an improved distance assessment method and SVM. The accuracy of sex detection was 68.51, 70.37, and 90.74% in the time, frequency, and time-frequency domains, respectively. However, it was not clear which chick produced each detected vocalization, and the sex of individual chicks was not determined. Since a model may produce different detection results for different vocalizations of the same chick, the sex of a chick must be determined from its aggregate vocalization detection results. Cuan et al. (2022a) used a deep learning model to detect the sex of chick vocalizations and predicted the sex of individual chicks using the vocalization detection results and majority voting, achieving a high accuracy of 91.25%. However, majority voting considers only the labels output by the model, ignoring the confidence levels, and thus does not take full advantage of the model's underlying information. The average confidence approach used in this paper makes better use of the model's output information.

In this study, the vocalizations of 160 ducklings were first captured using an isolation method. These recordings were then denoised using the improved spectral subtraction method proposed in this paper to minimize the effect of noise. Finally, the improved Res2Net deep learning algorithm was used to recognize and classify the vocalizations, and the model accuracy reached 94.80%. In addition, the average confidence strategy was used to predict the sex of individual ducklings; the average time to predict the sex of one duckling was 35 s, with an accuracy of 96.67%. This indicates that the method proposed in this paper can effectively detect the sex of ducklings. Its only drawback is the need to isolate each duckling for a short time, but compared with cloacal examination, this isolation has little effect on the ducklings. In future research, we will optimize the algorithmic model to reduce the required duration of duckling calls. On the basis of protecting duckling welfare, we aim to eventually realize a complete system for automatic determination of duckling sex, providing an effective reference scheme for large-scale application in actual breeding environments.

CONCLUSION

Sex identification of ducklings is a critical step in the poultry farming industry. In this study, a noninvasive identification method based on acoustic signals was proposed to address the time-consuming and labor-intensive nature of duckling sex identification, which can also harm the ducklings. An improved spectral subtraction method was used to denoise duckling calls, a double-threshold endpoint detection method with 3 parameters was adopted to detect vocal segments, and the SpecAugment data augmentation method was used to augment the Mel-spectrograms of duckling calls. Finally, an improved Res2Net deep learning algorithm was designed to identify and classify the ducklings. The main conclusions are as follows:

  • 1.

    The improved spectral subtraction method proposed in this study is more effective in removing noise compared to the traditional spectral subtraction method. The signal-to-noise ratio is improved by 27.04 dB by processing the audio through the improved spectral subtraction method, while the conventional spectral subtraction method only improves it by 5.07 dB.

  • 2.

    The 3-parameter double-threshold endpoint detection method proposed in this study effectively detects duckling vocal segments.

  • 3.

    In this study, the SE attention mechanism and the Ghost module were introduced to improve the Res2Net deep learning algorithm, achieving Accuracy, Recall, and Specificity of 94.80%, 94.92%, and 94.69%, respectively, with 18.91M parameters. Compared with state-of-the-art algorithms, the proposed algorithm offers high accuracy with a low parameter count.

  • 4.

    The average confidence strategy used in this study achieved an accuracy of 96.67% in determining the sex of individual ducklings, a result that is very close to the level of manual identification.

ACKNOWLEDGMENTS

This work was supported by the National Key Research and Development Program of China (grant number 2021YD2000802), Key Research and Development Project of Guangdong (grant number 2023B0202140001), and Guangdong Provincial Science and Technology Program Project: Demonstration and Promotion of Multi-dimensional Life Information Sensing Equipment for Livestock and Poultry Breeding (grant number 2022B0202160010).

DISCLOSURES

The authors certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.

REFERENCES

  1. Alin K., Fujitani S., Kashimori A., Suzuki T., Ogawa Y., Kondo N. Non-invasive broiler chick embryo sexing based on opacity value of incubated eggs. Comput. Electron. Agric. 2019;158:30–35.
  2. Aydin A., Berckmans D. Using sound technology to automatically detect the short-term feeding behaviours of broiler chickens. Comput. Electron. Agric. 2016;121:25–31.
  3. Carpentier L., Berckmans D., Youssef A., Berckmans D., van Waterschoot T., Johnston D., Ferguson N., Earley B., Fontana I., Tullo E., Guarino M., Vranken E., Norton T. Automatic cough detection for bovine respiratory disease in a calf house. Biosyst. Eng. 2018;173:45–56.
  4. Cate C.T. Sex differences in the vocalizations and syrinx of the collared dove (Streptopelia decaocto). Auk. 1997;114:22–39.
  5. Cerit H., Avanus K. Sex identification in avian species using DNA typing methods. World's Poult. Sci. J. 2007;63:91–100.
  6. Cuan K., Li Z., Zhang T., Qu H. Gender determination of domestic chicks based on vocalization signals. Comput. Electron. Agric. 2022;199.
  7. Cuan K., Zhang T., Huang J., Fang C., Guan Y. Detection of avian influenza-infected chickens based on a chicken sound convolutional neural network. Comput. Electron. Agric. 2020;178.
  8. Cuan K., Zhang T., Li Z., Huang J., Ding Y., Fang C. Automatic Newcastle disease detection using sound technology and deep learning method. Comput. Electron. Agric. 2022;199.
  9. Du X., Teng G., Wang C., Carpentier L., Norton T. A tristimulus-formant model for automatic recognition of call types of laying hens. Comput. Electron. Agric. 2021;187.
  10. Eda-Fujiwara H., Yamamoto A., Sugita H., Takahashi Y., Kojima Y., Sakashita R., Ogawa H., Miyamoto T., Kimura T. Sexual dimorphism of acoustic signals in the oriental white stork: non-invasive identification of sex in birds. Zool. Sci. 2004;21:817–821. doi: 10.2108/zsj.21.817.
  11. England A.D., Kheravii S.K., Musigwa S., Kumar A., Daneshmand A., Sharma N.K., Gharib-Naseri K., Wu S.B. Sexing chickens (Gallus gallus domesticus) with high-resolution melting analysis using feather crude DNA. Poult. Sci. 2021;100. doi: 10.1016/j.psj.2020.12.022.
  12. Esch T., Vary P. Efficient musical noise suppression for speech enhancement systems. Pages 4409–4412 in: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. 2009.
  13. Fontana I., Tullo E., Butterworth A., Guarino M. An innovative approach to predict the growth in intensive poultry farming. Comput. Electron. Agric. 2015;119:178–183.
  14. Fontana I., Tullo E., Carpentier L., Berckmans D., Butterworth A., Vranken E., Norton T., Berckmans D., Guarino M. Sound analysis to model weight of broiler chickens. Poult. Sci. 2017;96:3938–3943. doi: 10.3382/ps/pex215.
  15. Gao S.-H., Cheng M.-M., Zhao K., Zhang X.-Y., Yang M.-H., Torr P. Res2Net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021;43:652–662. doi: 10.1109/TPAMI.2019.2938758.
  16. Han K., Wang Y., Tian Q., Guo J., Xu C., Xu C. GhostNet: more features from cheap operations. Pages 1580–1589 in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
  17. Hu J., Shen L., Sun G. Squeeze-and-excitation networks. Pages 7132–7141 in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. doi: 10.1109/CVPR.2018.00745.
  18. Huang J., Rao L., Zhang W., Chen X., Li H., Zhang F., Xie J., Wei Q. Effect of crossbreeding and sex on slaughter performance and meat quality in Xingguo gray goose based on multiomics data analysis. Poult. Sci. 2023;102. doi: 10.1016/j.psj.2023.102753.
  19. Huang J., Wang W., Zhang T. Method for detecting avian influenza disease of chickens based on sound analysis. Biosyst. Eng. 2019;180:16–24.
  20. Kaleta E.F., Redmann T. Approaches to determine the sex prior to and after incubation of chicken eggs and of day-old chicks. World's Poult. Sci. J. 2008;64:391–399.
  21. Karam M., Khazaal H.F., Aglan H., Cole C. Noise removal in speech processing using spectral subtraction. J. Sign. Informat. Proc. 2014;5:32–41.
  22. Krunt O., Kraus A., Zita L., Machova K., Chmelikova E., Petrasek S., Novak P. The effect of housing system and gender on relative brain weight, body temperature, hematological traits, and bone quality in Muscovy ducks. Animals. 2022;12:370. doi: 10.3390/ani12030370.
  23. Li Z., Zhang T., Cuan K., Fang C., Zhao H., Guan C., Yang Q., Qu H. Sex detection of chicks based on audio technology and deep learning methods. Animals (Basel). 2022;12:3106. doi: 10.3390/ani12223106.
  24. Liao Z., Hu S., Hou R., Liu M., Xu P., Zhang Z., Chen P. Automatic recognition of giant panda vocalizations using wide spectrum features and deep neural network. Math. Biosci. Eng. 2023;20:15456–15475. doi: 10.3934/mbe.2023690.
  25. Lin M.J., Chang S.C., Lin L.J., Peng S.Y., Lee T.T. Effect of the age and sex on growth performance and feather quality of 13 to 25-weeks-old White Roman geese. Poult. Sci. 2023;102. doi: 10.1016/j.psj.2023.102941.
  26. Malagó-Jr W., Medaglia A., Matheucci-Jr E., Henrique-Silva F. New PCR multiplexes for sex typing of ostriches. Braz. J. Biol. 2005;65:743–745. doi: 10.1590/s1519-69842005000400023.
  27. Morinha F., Cabral J.A., Bastos E. Molecular sexing of birds: a comparative review of polymerase chain reaction (PCR)-based methods. Theriogenology. 2012;78:703–714. doi: 10.1016/j.theriogenology.2012.04.015.
  28. Otsuka M., Miyashita O., Shibata M., Sato F., Naito M. A novel method for sexing day-old chicks using endoscope system. Poult. Sci. 2016;95:2685–2689. doi: 10.3382/ps/pew211.
  29. Park D.S., Chan W., Zhang Y., Chiu C.-C., Zoph B., Cubuk E.D., Le Q.V. SpecAugment: a simple data augmentation method for automatic speech recognition. Pages 2613–2617 in: Interspeech 2019. 2019.
  30. Pereira E.M., Nääs I.D.A., Garcia R.G. Vocalization of broilers can be used to identify their sex and genetic strain. Eng. Agríc. 2015;35:192–196.
  31. Sadeghi M., Banakar A. Gender determination of fowls by using bio acoustical data mining methods and support vector machine. J. Agric. Sci. Technol. 2017;19:1041–1055.
  32. Trocino A., Piccirillo A., Birolo M., Radaelli G., Bertotto D., Filiou E., Petracci M., Xiccato G. Effect of genotype, gender and feed restriction on growth, meat quality and the occurrence of white striping and wooden breast in broiler chickens. Poult. Sci. 2015;94:2996–3004. doi: 10.3382/ps/pev296.
  33. Vaseghi S.V. Spectral subtraction. Pages 242–260 in: Advanced Signal Processing and Digital Noise Reduction. Vieweg+Teubner Verlag; 1996.
  34. Venuto V., Ferraiuolo V., Bottoni L., Massa R. Distress call in six species of African Poicephalus parrots. Ethol. Ecol. Evol. 2001;13:49–68.
  35. Volodin I.A., Volodina E.V., Klenova A.V., Matrosova V.A. Gender identification using acoustic analysis in birds without external sexual dimorphism. Avian Res. 2015;6:20.
  36. Yektaeian M., Amirfattahi R. Comparison of spectral subtraction methods used in noise suppression algorithms. Pages 1–4 in: 2007 6th International Conference on Information, Communications & Signal Processing. 2007.
