Edge-AI Enabled Acoustic Monitoring and Spatial Localisation for Sow Oestrus Detection

Hao Liu; Haopu Li; Yue Cao; Riliang Cao; Guangying Hu; Zhenyu Liu

doi:10.3390/ani16050804

. 2026 Mar 4;16(5):804. doi: 10.3390/ani16050804

Edge-AI Enabled Acoustic Monitoring and Spatial Localisation for Sow Oestrus Detection

Hao Liu ¹, Haopu Li ², Yue Cao ³, Riliang Cao ⁴, Guangying Hu ⁴, Zhenyu Liu ^2,^5,^*

Editor: Danilo Florentino Pereira

PMCID: PMC12984331 PMID: 41829012

Simple Summary

Timely detection of sow oestrus is essential for enhancing reproductive efficiency and reducing non-productive days in large-scale pig farms. Traditional methods rely heavily on manual observation, which is labour-intensive and subjective. This study developed an intelligent edge monitoring system that uses non-contact acoustic sensing to capture sow vocalisations and artificial intelligence algorithms to automatically identify oestrus status and locate specific animals. The system was strictly validated against reproductive hormone levels to ensure scientific accuracy. By integrating “sound-location” technology, the system enables round-the-clock monitoring and individualised management without interfering with the sows’ natural behaviour. This approach enhances animal welfare while providing a low-cost, real-time, and efficient solution for precision management in modern smart livestock farming.

Keywords: sow oestrus detection, TinyML, Edge computing, LSTM, sound source localisation, precision livestock farming

Abstract

Timely and accurate detection of sow oestrus is crucial for enhancing reproductive efficiency and reducing non-productive days (NPDs) in large-scale pig farms. However, traditional manual observation is labour-intensive and subjective, while cloud-based deep learning solutions face challenges such as high latency and privacy risks when applied in intensive housing environments. This study developed an edge-intelligent monitoring system that integrates deep temporal modelling with sound source localisation technology. A three-stage hierarchical screening strategy was utilised to select and deploy a lightweight Stacked-LSTM model on the resource-constrained ESP32-S3 hardware platform. This model was trained and calibrated using a high-quality acoustic dataset validated against serum reproductive hormones, specifically follicle-stimulating hormone (FSH), luteinising hormone (LH), and progesterone ( $P_{4}$ ). Experimental results demonstrate that the optimised model achieved a classification accuracy of 96.17%, with an inference latency of only 41 ms, thereby fully satisfying the stringent real-time monitoring requirements while maintaining a minimal memory footprint. Furthermore, the system integrates a localisation algorithm based on Generalised Cross-Correlation with Phase Transform (GCC-PHAT). Through spatial geometric modelling, the system successfully implements the functional mapping of vocalisation events to individual gestation stalls (Stall IDs). Laboratory pressure tests validated the robustness and low-cost deployment advantages of the “edge recognition–cloud synchronization” architecture, providing a reliable technical framework for the precision management of smart livestock farming.

1. Introduction

The global swine industry serves as a vital cornerstone of the meat supply chain. Its rapid transition toward large-scale and intensive production has imposed increasingly stringent requirements on production efficiency. Consequently, Precision Livestock Farming (PLF) has emerged as an essential strategy for enhancing productivity, ensuring animal welfare, and achieving sustainable development in modern agriculture [1]. In large-scale swine production, the reproductive performance of sows—with pigs per sow per year (PSY) as the primary metric—serves as a critical determinant of a farm’s economic efficiency, directly dictating its profitability and capacity for sustainable development [2]. The accurate and timely detection of sow oestrus status, coupled with the precise identification of the optimal window for artificial insemination (AI)—typically 0–24 h prior to ovulation—is of paramount importance. Missing the ideal insemination timing not only significantly diminishes conception rates and litter sizes but also leads to an accumulation of costly non-productive days (NPDs), thereby imposing a substantial economic burden on farm operating costs [3,4].

Conventionally, oestrus detection has relied predominantly on manual empirical observation and boar exposure techniques, such as the back-pressure test (BPT) [5]. These conventional approaches possess significant limitations. First, manual observation is heavily reliant on the professional experience of the personnel, resulting in subjective outcomes that lack consistency. Second, given the relatively brief duration of sow oestrus (averaging 40–60 h), conducting round-the-clock, high-frequency inspections is both labour-intensive and prohibitively expensive. Finally, some sows may exhibit “silent oestrus” (also referred to as recessive oestrus) or complex reproductive cycle disorders, which lack discernible external physiological or behavioural manifestations, leading to low detection rates with traditional methods [6,7]. To overcome the limitations of traditional methods, various automated monitoring technologies have been introduced within the framework of Precision Livestock Farming (PLF). Computer vision-based systems predict oestrus by analysing temporal changes in sow posture and behavioural patterns—such as increased locomotion, ear pricking, and boar-seeking frequency. However, these methods are often susceptible to fluctuating light conditions, physical occlusions (e.g., stalls or crates), and significant individual behavioural variations [6,8]. Infrared thermography (IRT) has also been demonstrated to reflect oestrogen level fluctuations by monitoring vulvar skin temperature. Although proven effective in certain studies, its diagnostic accuracy is frequently compromised by fluctuations in ambient temperature and surface fouling, such as the presence of manure [9,10].

In recent years, non-contact acoustic monitoring has emerged as a promising non-invasive approach characterised by high temporal continuity. It has demonstrated significant potential in the automated monitoring of animal behaviour and health status [11,12]. Compared with conventional identification methods, this technology offers several advantages, including operational simplicity, cost-effectiveness, and the capability for round-the-clock real-time monitoring. Furthermore, as a non-invasive data acquisition method, it is particularly suitable for continuous and large-scale livestock monitoring scenarios. By integrating non-contact acoustic sensing, the system ensures continuous monitoring with minimal interference to the animals’ natural behaviour, addressing key concerns in modern animal welfare. Acoustic feature extraction represents the cornerstone of classification tasks. Among various methods, Mel-frequency cepstral coefficients (MFCC) have emerged as one of the most widely utilised features in both speech and animal vocalisation recognition, owing to their ability to mimic the perceptual characteristics of the human auditory system [13]. Cordeiro et al. (2018) [14] utilised acoustic features, such as pitch and the second formant (F2), in conjunction with a Decision Tree algorithm to classify the sex, age, and stress status of pigs, achieving a classification accuracy of 81.92%. Regarding early warning systems for swine diseases, initial research efforts primarily concentrated on the identification of cough sounds. For instance, Exadaktylos et al. (2008) employed Power Spectral Density (PSD) features combined with a Fuzzy C-means clustering approach, achieving an accuracy rate of 82% in cough recognition [15]. Chung et al. (2013) utilised MFCC for acoustic feature extraction and integrated Support Vector Data Description (SVDD) with Sparse Representation-based Classification (SRC) algorithms to construct a hierarchical detection system [16]. This framework demonstrated superior performance in disease identification tasks, achieving an accuracy of 91%. Nevertheless, these conventional machine learning approaches often exhibit limited robustness and generalization capabilities when confronted with the complex low signal-to-noise ratios (SNR) and multi-source interference inherent in real-world pig farm environments. The profound integration of deep learning architectures, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNNs), within the acoustic domain has substantially bolstered both the accuracy and efficiency of complex classification tasks. By transforming acoustic signals into two-dimensional time-frequency representations, such as spectrograms, CNN can effectively capture local time-frequency features in a manner analogous to image processing. For instance, Yin et al. (2021) utilised transfer learning with a fine-tuned AlexNet model and spectrogram features as input, achieving a remarkable pig cough recognition accuracy of 96.8% [17]. Shen et al. (2021) proposed a fused MFCC-CNN feature approach, wherein multi-frame MFCC matrices were fed into a single-layer CNN, further enhancing the recognition accuracy for pig cough sounds to 97.7% [18]. Liao et al. (2022) introduced the TransformerCNN model, which integrates the local feature extraction capabilities of CNN with the global sequence encoding strengths of Transformers. This hybrid architecture achieved a classification accuracy of 96.05% for pig emotional vocalisations, encompassing states such as calm, feeding, panic, and anxiety [19]. Acoustic monitoring technology for sow oestrus identification holds immense potential. By analysing vocalisation patterns, real-time monitoring and the precise determination of oestrus status can be achieved. Wang et al. (2022) proposed a CNN model based on Log-Mel spectrograms and an improved MobileNetV3_esnet, specifically designed to provide early warnings for silent oestrus vocalisations. This approach achieved a classification accuracy of 97.52% with a model size of 5.94 MB [20]. In another approach, Chen et al. (2022) achieved an accuracy of 96% in oestrus vocalisation recognition by fusing Chirplet and MFCC features and utilizing CNNs for classification [21]. Cao et al. (2025) further refined the granularity of monitoring objectives by subdividing the oestrous cycle into four distinct stages: pro-oestrus, oestrus, post-oestrus, and non-oestrus. By employing the UM-ASPP-MobileViT model for stage-wise identification, they achieved a classification accuracy of 96.52% [22]. Furthermore, the latest multimodal studies—such as APO-CVIT and Conformer networks integrated with deep normalization and multispectral attention enhancement—have further explored the potential of multi-dimensional feature fusion in addressing the challenges of complex piggery environments [23,24]. However, these state-of-the-art deep learning models typically impose substantial computational demands, making it challenging to achieve low-latency, real-time inference on resource-constrained edge devices [25].

To migrate Artificial Intelligence (AI) from cloud-based infrastructures to on-site farming environments for real-time and autonomous decision-making, the paradigms of Tiny Machine Learning (TinyML) and Edge AI have emerged [26,27]. TinyML aims to optimize model footprints, lower power consumption, and minimize inference latency, thereby enabling high-efficiency execution on low-cost, resource-constrained embedded devices such as Microcontroller Units (MCUs) [28,29]. Such a deployment paradigm is integral to the realization of round-the-clock, stall-level sow oestrus monitoring systems. Moreover, in intensive housing systems utilizing gestation stalls, accurately localizing the specific stall of a vocalisation source is paramount for individualised management; however, this remains a significant bottleneck in conventional acoustic monitoring.

Therefore, the aim of this study is to develop a high-performance acoustic monitoring system for sow oestrus based on TinyML and Edge AI paradigms. The proposed system integrates deep temporal modelling with efficient edge deployment and spatial localisation technology to address the challenges of individualised management in intensive pig production. The primary contributions of this work are twofold: (1) Technically, we propose a lightweight Stacked-LSTM architecture and a GCC-PHAT-based spatial localisation framework on a low-power ESP32-S3 platform to ensure real-time recognition and individual identification. (2) Biologically, this study establishes an objective validation framework by cross-referencing acoustic data with clinical behavioural signs and key reproductive endocrine profiles. By subdividing the oestrous cycle into distinct physiological stages, we provide a more granular and scientifically robust approach to oestrus detection than traditional subjective observations. By bridging engineering innovation with animal science validation, this study provides a non-invasive and automated solution for precision livestock farming.

2. Materials and Methods

2.1. Data Collection

2.1.1. Experimental Design

The study was conducted at a commercial pig farm in Deshenggou Village, Guxian Town, Qinxian County, Changzhi City, Shanxi Province, China (Longitude: 112.620067° E, Latitude: 36.5967939° N). Figure 1a provides a schematic representation of the logical layout of the experimental stalls within the monitoring area. To mitigate the potential impact of individual variability on the experimental results, purebred Yorkshire gilts (age: 6–8 months; weight: 110–130 kg) were selected to ensure high consistency in physiological maturity and oestrus behavioural responses. None of the animals had farrowed previously, and all were undergoing their natural oestrous cycles without any hormone synchronization. To maintain the integrity of the natural acoustic and hormonal data throughout the experimental period, the gilts were not inseminated. This allowed for the observation of undisturbed oestrous cycles and avoided any physiological interference that might arise from pregnancy or insemination-related stress. The experiment was carried out from 1 May to 30 May 2025, covering a complete natural oestrous cycle. The lighting conditions utilized a combination of natural daylight and artificial illumination, providing a consistent effective light period of approximately 14 h per day (roughly from 06:00 to 20:00) to maintain normal reproductive physiological rhythms. To systematically record changes in physiological indicators and acoustic characteristics throughout this period, continuous data acquisition was performed daily between 08:00 and 22:00 to capture the key dynamics of all oestrus stages. The recording window was strategically offset from the lighting period (06:00–20:00) for two primary reasons. First, the 06:00–08:00 interval was excluded to avoid high-pressure cleaning and manual management, which generate significant broadband impulsive noise and would severely compromise the Signal-to-Noise Ratio (SNR) of the acoustic dataset. Second, the recording was extended to 22:00 to monitor transitional vocalization patterns as sows transitioned from the active light period to the nocturnal rest phase (22:00–08:00). This specific time window ensured optimal visibility for synchronised video recording during the majority of the collection period, which is essential for accurate behavioural ground-truth annotation. Throughout the experimental period, the piggery was maintained under standardised management protocols. The gilts were fed twice daily (at 08:30 and 16:30) with a daily allowance of 2.5 kg of a standard commercial gilt diet (approximately 14% crude protein and 3.1 Mcal/kg digestible energy), and water was provided freely available via nipple drinkers. The ambient temperature was strictly maintained between 18 °C and 22 °C, with relative humidity kept at 50–70%. Regular ventilation and cleaning were performed to ensure the gilts remained in a natural physiological state and to minimise environmental stress. All animal procedures were performed in accordance with the ethical standards of the institution and minimized animal stress through non-contact acoustic sensing and standardized management. The resulting monitored data were archived in real-time and subjected to periodic quality audits to confirm data validity.

Schematic of the experimental environment and acquisition setup. (a) Logical layout of the experimental stalls. (b) Standardized placement of the acquisition equipment relative to the subject.

2.1.2. Audio-Visual Acquisition

To construct a high-quality sow oestrus acoustic database, a synchronised acquisition system integrating acoustic signals and behavioural imagery was developed. The standardized positioning of this system relative to the subject is illustrated in Figure 1b. For audio acquisition, a Takstar SGC-578 (Guangdong Takstar Electronic Co., Ltd., Huizhou, China) high-sensitivity condenser microphone was employed as the primary pickup device and was suspended vertically 0.4 m above the sow’s dorsal region; this height was strategically defined to optimise the signal-to-noise ratio (SNR) by effectively capturing target vocalisations while suppressing ambient background noise. To ensure sensor consistency and eliminate potential inter-device bias across the entire dataset, this single set of high-precision acquisition equipment was sequentially deployed above each subject’s pen during the collection period. To compensate for the limitations of stationary recording in capturing low-amplitude signals, supplemental close-range recordings were performed using handheld devices to enhance dataset completeness. All analogue audio signals were digitised via a host-integrated Realtek ALC257 (Realtek Semiconductor Corp., Hsinchu, Taiwan, China) high-fidelity sound card, with the sampling rate uniformly set to 44.1 kHz and a sampling depth of 16-bit. To preserve the raw integrity of acoustic features, all built-in digital signal processing (DSP) plugins, such as noise reduction and automatic gain control (AGC), were disabled, and the data were stored in lossless WAV format to retain the maximum frequency-domain details and time-domain characteristics of the sow vocalisations. To provide an objective basis for subsequent data labelling, a rigorous audio-video synchronisation protocol was implemented, wherein high-definition cameras were deployed on adjustable telescopic brackets above the gestation stalls to monitor the vocalisation and behavioural dynamics from a top-down perspective. Video streams were recorded at a frame rate of 30 frames per second (fps) and encoded in MP4 format. During the data cleaning phase, these synchronised video records served as a clear behavioural reference, assisting researchers in eliminating interfering audio signals unrelated to oestrus and ensuring that the acoustic samples ultimately fed into the model corresponded precisely to specific oestrus behavioural features.

2.1.3. Physiological Data Collection

Due to the hormonal fluctuations in sows around the oestrus period, this study employed timed hormone sampling to establish a physiological baseline. Blood samples (5–8 mL) were collected via ear vein puncture to minimize animal stress. To ensure synchronization with the acoustic monitoring window (08:00–22:00), sampling was performed three times daily at 08:00, 15:00, and 20:00 throughout the experimental period. Following collection, samples were rapidly refrigerated and transported to the laboratory within one hour. Serum concentrations of Follicle-Stimulating Hormone (FSH), Luteinising Hormone (LH), and Progesterone ( $P_{4}$ ) were quantified using Radioimmunoassay (RIA) via the standard curve methodology [30].

2.2. Data Preprocessing

2.2.1. Data Labelling and Dataset Construction

To ensure the scientific integrity and objectivity of the acoustic data labels, this study moved away from a solitary behavioural assessment criterion and instead established a dual-validation mechanism integrating “endocrine physiological indicators and clinical behavioural verification”, which served as the foundation for the dataset construction process (as illustrated in Figure 2). All animal procedures were performed in accordance with the ethical standards of the institution. At the physiological level, the concentrations of key reproductive hormones (FSH, LH, and $P_{4}$ ) were quantified via RIA, following the procedures detailed in Section 2.1.3. Based on endocrine principles, the classification criteria for the various oestrus stages were defined as follows: Pre-oestrus was identified by a significant increase in FSH concentration as a biochemical marker to stimulate follicle development; Mid-oestrus was determined based on the sharp LH surge, which serves as the critical physiological signal triggering ovulation; and Late-oestrus was characterised by the gradual recovery of $P_{4}$ concentration, marking the conclusion of ovulation and the maintenance of luteal function [31].

Flowchart of sow oestrus acoustic dataset construction and quality validation.

To ensure the accuracy and purity of the dataset, a joint validation strategy of “behavioural observation and physiological detection” was adopted for labelling. By manually reviewing the synchronised audio-visual data, time windows of typical oestrus behaviours, such as the standing reflex, were identified via video footage to precisely extract target acoustic segments while effectively avoiding environmental noise interference. Subsequently, the timestamps of these clinical behaviours were cross-referenced with the hormonal profiles obtained via RIA. Only when the behavioural characteristics were fully consistent with the endocrine physiological indicators was the audio from that period confirmed as a valid sample and assigned the corresponding oestrus stage label; data points with mismatches between indicators were discarded to ensure the biological purity of the dataset. To adapt to the computational constraints of the ESP32-S3 edge terminal and optimise model inference overhead, all screened valid audio clips were down-sampled to 16 kHz/16-bit specifications using Audacity software (version 3.7.3, Audacity Team, Pittsburgh, PA, USA) and cropped into standard 2.0 s time windows. Following this, amplitude normalisation was applied to eliminate energy fluctuations caused by varying acoustic distances arising from the sows’ dynamic postures and the supplemental handheld recordings. To enhance data diversity, the pre-processed audio underwent four-fold data augmentation [24], resulting in a dataset of 7846 audio samples. This dataset was then randomly partitioned into a training set (80%, 6279 samples) and an independent test set (20%, 1567 samples) to evaluate model performance. The distribution of these samples across different categories is presented in Table 1. This dataset covers five categories, including the three stages of oestrus, normal vocalisations, and ambient noise, establishing a high-quality data foundation for the subsequent extraction of robust acoustic features by the Stacked-LSTM model.

Table 1.

Distribution of the augmented sow oestrus acoustic dataset (Unit: samples).

Category	Raw Samples	Training Set	Test Set	Total
Pre_oestrus	406	1306	318	1624
Mid_oestrus	386	1244	298	1542
Late_oestrus	415	1083	290	1373
Normal	401	1284	319	1603
Noise	426	1362	342	1704
Total	2034	6279	1567	7846

Open in a new tab

Following the completion of audio sample segmentation and labelling, this study generated spectrograms of the acoustic signals from each experimental stage to intuitively characterise the energy distribution features and dynamic evolution patterns in the time-frequency domain (as illustrated in Figure 3). Through comparative analysis, vocalisations at different physiological stages demonstrated significant acoustic disparities: during the Pre-oestrus stage, acoustic energy was primarily concentrated in the low-frequency range, with a relatively low PSD and characteristics dominated by low-frequency grunts, indicating that vocal activity had not yet reached its peak. The acoustic frequency characteristics changed drastically during the Mid-oestrus stage, where the spectral centroid shifted significantly upwards, and the spectrograms displayed high-intensity broadband power density peaks within the 2000–6000 Hz frequency band. This shift toward higher-frequency energy appears consistent with the increased physiological arousal typical of sows during mid-oestrus. According to bioacoustics principles [11], states of high physiological arousal are often associated with changes in laryngeal tension and respiratory intensity, which may result in vocalisations with enhanced high-frequency components. In this context, the pre-ovulatory surge in LH could potentially serve as a physiological driver for these changes, providing a plausible biological context for the observed high-intensity acoustic profile and restless behaviour during the peak period. As ovulation concluded and hormone levels receded during the Late-oestrus stage, high-frequency energy attenuated rapidly, and the spectrograms reverted to low-frequency, low-energy envelope characteristics. In contrast, while Normal vocalisations possessed a certain harmonic structure, they were characterised by extremely short durations and narrow energy distribution bands; meanwhile, the frequency distribution of Environmental Noise was extremely discrete and disordered, with power density uniformly distributed across the entire spectrum and lacking periodic harmonic structures. These significant time-frequency domain differences establish a solid physical foundation for the subsequent extraction of highly discriminative acoustic features by the Stacked-LSTM model.

Spectrograms of oestrus stages and background noise.

2.2.2. Optimization of MFCC Feature Extraction for Edge Deployment

To achieve low-latency and high-precision oestrus vocalisation recognition on resource-constrained Microcontroller Unit (MCU) terminals, this study selected MFCC as the core acoustic feature. MFCC integrate the non-linear perceptual characteristics of the human auditory system with the de-correlation capabilities of the Discrete Cosine Transform (DCT), effectively representing the formant structure of sow oestrus vocalisations using extremely low-dimensional data; this significantly reduces the computational load and storage requirements for the subsequent deep learning model [32]. In this research, the embedded DSP module of the Edge Impulse platform was utilised to perform feature extraction. This module converts the raw audio stream into a feature matrix suitable for neural network input through pipelined processing. The specific processing workflow is illustrated in Figure 4, and the key parameter configurations and mathematical principles are as follows:

DSP pipeline for MFCC feature extraction based on Edge Impulse.

Firstly, signal preprocessing and framing were performed. To compensate for high-frequency energy decay, a first-order high-pass filter was applied to the raw signal using a pre-emphasis coefficient of 0.98. Subsequently, based on the $16 kHz$ sampling rate, the frame length was set to $25 ms$ with a stride of $20 ms$ . A Hamming window was applied to suppress spectral leakage and ensure the short-term stationarity of the acoustic signals. Next, a 512-point Fast Fourier Transform (FFT) was executed on each windowed frame to transform the time-domain signals into the frequency domain, followed by the calculation of the power spectrum $P (k)$ .To simulate the non-linear frequency perception of biological auditory systems, a Mel filter bank consisting of 32 triangular filters was utilised to filter the power spectrum. The lower cut-off frequency was set to $80 Hz$ to physically filter out extremely low-frequency interference inherent in farm environments. The mapping relationship between the frequency $f$ and the Mel scale $Mel (f)$ satisfies Equation (1):

Mel (f) = 2595 \cdot \log_{10} (1 + \frac{f}{700})

(1)

This study adopted a triangular filter bank to weight the spectral energy, where filters are equally spaced on the Mel scale with bandwidths proportional to their centre frequencies (see Equation (S1) in Supplementary Materials for mathematical details). To account for variations in recording distances and equipment gains, a local normalization process was applied to the MFCC features. This process utilised a sliding window of 151 frames (with a half-window length $L = 75$ ) to mitigate convolutional channel effects. The mathematical details of this normalisation are provided in Supplementary Materials (Equation (S2)). Following this DSP pipeline, each 2 s audio segment was transformed into a standardized $99 \times 13$ feature matrix for model input.

2.3. Edge Lightweight Recognition Model Design

To investigate the most suitable recognition algorithm for sow oestrus acoustic characteristics, this study first compared traditional machine learning (e.g., Random Forest, RF; Support Vector Machine, SVM) with deep learning algorithms through offline baseline testing (refer to Section 3.1.1). After establishing the significant advantages of deep learning in processing complex time-frequency features, and to ensure the recognition model could adapt to the rigorous hardware resource constraints of the ESP32-S3 microcontroller (SRAM < 512 KB), this study adopted the TinyML and Edge AI deployment paradigms [26,27]. Based on a comparative investigation of various TinyML platforms and the proven performance of the Edge Impulse platform in adaptive prediction and embedded deployment [28], three lightweight candidate network architectures based on different feature extraction mechanisms were specifically designed: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU). All candidate models were subjected to low-level operator reconstruction via the Edge Impulse EON™ compiler and maintained at a similar parameter magnitude to ensure the fairness of comparative experiments and the scientific validity of the model evaluation.

2.3.1. Stacked Long Short-Term Memory Architecture Details

Given that sow oestrus vocalisations exhibit significant time-varying characteristics, this study constructed a customised Stacked-LSTM model designed to capture long-term temporal dependencies within the MFCC features [29]. This model emerged as the top-performing architecture in subsequent experiments, and its specific configuration is illustrated in Figure 5.

Schematic diagram of the proposed lightweight Stacked-LSTM model.

The model input is a flattened vector of size $1 \times 1287$ , which is first reconstructed into a $99 \times 13$ temporal matrix via a Reshape layer, where 99 denotes the number of time steps and 13 represents the MFCC feature dimensionality of each frame. Prior to the recurrent layers, a Batch Normalisation (BN) layer is introduced to standardise the input features, effectively mitigating the vanishing gradient problem and accelerating the convergence rate during the training process. A dual-layer Stacked-LSTM architecture is then employed to extract time-frequency evolution features: the first layer comprises 8 units with full sequence output enabled, while the second layer contains 16 units for high-level feature abstraction. To enhance generalisation capability and prevent overfitting, a Dropout layer with a rate of 0.1 is integrated after each LSTM layer. In this study, a Global Average Pooling (GAP 1D) layer is utilised to replace the conventional Flatten layer, directly compressing the dimensionality from $99 \times 16$ to $1 \times 16$ . This design maintains translation invariance while substantially reducing the parameter scale of the subsequent fully connected layer. Finally, a Softmax activation function is applied to output the probability distribution across five categories: Pre-oestrus, Mid-oestrus, Late-oestrus, Normal, and Noise.

2.3.2. Benchmark Models

To validate the superiority of the proposed Stacked-LSTM, two benchmark models with comparable parameter scales were constructed: a 1D-CNN and a GRU. The 1D-CNN model utilises a dual-layer one-dimensional convolutional structure with a kernel size of 3, specifically designed to capture local frequency-domain textures within the spectrograms rather than long-term temporal dependencies. Its architectural sequence consists of the following layers: Conv1D(8) $\to$ MaxPooling $\to$ Conv1D(16) $\to$ MaxPooling $\to$ Flatten $\to$ Dense. Conversely, the GRU model serves as a streamlined variant of the LSTM, incorporating only update and reset gates to reduce computational complexity. A dual-layer network with a GRU(8) $\to$ GRU(16) configuration was implemented to evaluate the model’s accuracy retention and engineering feasibility under a further reduced gating count.

2.4. Edge Intelligent Monitoring System

To achieve edge-side real-time monitoring and precise early warning of sow oestrus behaviour, this study developed an embedded prototype system integrating acoustic sensing, edge recognition, spatial localisation, and remote communication. Through the collaborative operation of four core functional units, a complete closed-loop from low-level sensing to cloud-based decision-making was established (as illustrated in Figure 6a).

Integrated system implementation. (a) System hardware block diagram. (b) ESP32-S3-based prototype.

2.4.1. System Hardware Architecture

The physical prototype developed in this study (illustrated in Figure 6b) precisely corresponds to the four core functional units defined in the system’s logical architecture. Firstly, the processing unit is centred around an ESP32-S3-DevKitC-1 development board, (Espressif Systems Co., Ltd., Shanghai, China) which features a dual-core processor with a clock frequency of up to $240 MHz$ and integrated AI vector acceleration instructions. To address the significant memory requirements of the Stacked-LSTM model, the system utilises the N32R16V module, providing sufficient heap memory via a high-speed OPI bus to support simultaneous dual-channel 48 kHz audio acquisition and real-time processing of the 1287-dimensional feature tensors. Regarding the sensing unit, two INMP441 digital microphones (InvenSense, San Jose, CA, USA) are arranged in a linear array with a spacing of 0.20 m, capturing audio via the I2S interface. A dual-rate sampling strategy is implemented: raw 48 kHz data are utilised for sound source localisation based on the GCC-PHAT algorithm, which is then down-sampled to 16 kHz via three-fold decimation for feature extraction by the Stacked-LSTM model. Simultaneously, a DHT11 sensor (Guangzhou Aosong Electronics Co., Ltd., Guangzhou, China) is integrated for real-time monitoring of the piggery temperature and humidity; this provides environmental context for assessing the sow’s physiological state and serves as a basis for speed-of-sound compensation ( $c \approx 331.3 + 0.606 T$ ) to eliminate localisation biases induced by environmental variations. Finally, the communication unit employs the built-in WiFi module for JSON data transmission via the MQTT protocol, complemented by an OLED screen acting as the feedback unit for on-site early warning displays, thereby achieving a comprehensive closed-loop from perception to feedback at the physical implementation level.

2.4.2. Spatial Localisation Implementation Based on the GCC-PHAT Algorithm

To accurately determine the physical location of oestrous sows, the system synchronously executes a sound source localisation algorithm based on GCC-PHAT while performing audio classification tasks [33]. This approach couples acoustic features with spatial coordinate depth. The system leverages the Time Difference of Arrival (TDOA) captured by the dual INMP441 microphones to infer the sound source angle. Given a microphone spacing $d$ and a time delay $τ$ for the sound wave reaching the two microphones, the incident angle $θ$ relative to the array normal is defined by Equation (2):

θ = \arcsin (\frac{c \cdot τ}{d})

(2)

where $c$ represents the speed of sound. The system utilises the DHT11 to obtain the real-time ambient temperature for speed-of-sound compensation to enhance localisation precision.

To address the challenges of echoes and background reverberation in the semi-enclosed piggery environment, this study employed the Phase Transform (PHAT) weighting function [34]. This algorithm calculates the generalised cross-power spectrum to perform a ‘whitening’ treatment on the signal spectrum, which retains only the phase information while discarding amplitude data [35] (see Equation (S3) in Supplementary Materials for mathematical details). This operation results in an extremely sharp peak in the time-domain cross-correlation function, significantly improving the robustness of TDOA estimation in complex noise environments.

To map the calculated azimuth angle $θ$ to the actual physical pig pens, a logical mapping model based on spatial geometric relationships was constructed. Specifically, the horizontal displacement $x$ of the sound source from the array normal is determined by the relationship $x = h \cdot \tan (θ)$ . Given a standard pig pen width of $w = 0.65 m$ and a sensor array deployment height of $h = 3.0 m$ , the system utilises these geometric relationships to divide the continuous angle signal $θ$ into 7 discrete logical regions, designated as Stall ID 1–7.

2.5. Performance Evaluation Criteria

To evaluate the effectiveness of the proposed oestrus monitoring system on the ESP32-S3 terminal, performance was assessed from two perspectives: classification efficacy and edge-side computational efficiency.

For classification performance, four standard metrics were employed: Accuracy, Precision, Recall, and the F1-score. Accuracy provides an overall measure of correct predictions across all five acoustic categories. However, considering the inherent sample imbalance in natural farm environments—where oestrus vocalisations are significantly rarer than background noise—Precision and Recall were used to evaluate the model’s ability to minimize false alarms and its sensitivity to critical events, respectively. The F1-score, as the harmonic mean of these two metrics, served as a balanced indicator of recognition stability. The mathematical definitions for these metrics (Equations (S4)–(S7)) are provided in the Supplementary Materials.

In addition to classification accuracy, edge-side performance metrics were introduced to ensure the system’s practical feasibility. Inference Latency was measured as the end-to-end execution time for processing a single feature frame, which is critical for real-time responsiveness. Resource Utilisation was evaluated through static Flash usage and peak runtime RAM consumption. These metrics ensure that the model operates within the hardware constraints of the ESP32-S3, maintaining long-term system stability in intensive production environments.

3. Results and Discussion

3.1. Model Selection

To bridge the gap between high-precision recognition algorithms and rigorous edge hardware constraints, this study implemented a systematic three-stage hierarchical screening strategy (as illustrated in Figure 7). This structured approach facilitates a multi-dimensional evaluation of candidate models: ranging from initial offline performance verification to on-device feasibility testing and final architectural trade-offs. Through progressive filtering based on accuracy thresholds, hardware compatibility, and real-time efficiency, this strategy ensures that the final deployed model (LSTM-float32) achieves an optimal balance between biological monitoring reliability and embedded system stability.

Workflow of the hierarchical screening strategy for optimal model selection.

3.1.1. Comparative Analysis and Preliminary Model Screening

To investigate the non-linear characterisation capabilities of various architectures for sow oestrus acoustic features, this study conducted a comprehensive evaluation of five representative classifiers. The quantitative performance across four key dimensions—Accuracy, Precision, Recall, and F1-Score—is summarized in Table 2.

Table 2.

Performance comparison of different classification models.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
CNN	93.39	93.65	93.39	93.37
LSTM	96.66	96.66	96.66	96.65
GRU	94.35	94.77	94.35	94.36
RF	71.82	74.42	71.82	71.93
SVM	71.34	73.45	71.34	71.08

Open in a new tab

As illustrated in Table 2, a distinct performance gradient exists between models of varying complexity. Experimental results reveal a significant performance gap between deep learning architectures and traditional machine learning models. The global accuracies for RF and SVM were only 71.82% and 71.34%, respectively, representing a clear recognition deficiency compared to deep learning frameworks.

In contrast, deep neural networks demonstrated superior classification precision across all five categories. Among them, the CNN (93.39%), GRU (94.35%), and LSTM (96.66%) achieved high performance. To further elucidate these differences, Figure 8 presents the confusion matrices and overall classification accuracies for these five algorithms on the validation set. Notably, the LSTM model exhibited exceptional robustness, particularly for the critical ‘Pre-oestrus’ stage where the recall reached 96.9%. This superior performance is likely associated with its internal gating mechanism, which enables the model to effectively retain and transmit long-range temporal dependencies. The high offline accuracy of the Stacked-LSTM (96.66%) is close to the findings of Cao et al. (2025) [22], who emphasized that subdividing the oestrous cycle into distinct stages improves detection granularity. By effectively capturing sequential evolutionary patterns and subtle acoustic shifts, this temporal modelling structure proves better suited for the non-stationary characteristics of physiological signals than traditional models.

Comparison of confusion matrices across five classification models.

To further analyse the precision and reliability of each model in determining different oestrus stages, this study introduced the Precision metric. This aimed to reveal the credibility of the predicted results across all target categories, as illustrated in Figure 9. Experimental data indicates that traditional machine learning models exhibit a significant imbalance in classification performance. Specifically, the Precision of the RF model in the “Normal” category was only $57.3 %$ , while the SVM achieved a precision of just $61.2 %$ for the “Pre-oestrus” stage. This performance decay suggests that shallow models, relying solely on linear or simple non-linear combinations of handcrafted features, struggle to achieve deep modelling of the subtle frequency-domain dynamic textures inherent in sow oestrus vocalisations. In sharp contrast, CNN, LSTM, and GRU demonstrated superior Precision across all categories. Among them, the LSTM model proved to be the most robust, with its Precision consistently maintained above $95 %$ for all categories. Notably, its Precision for the “Pre-oestrus” stage reached $96.0 %$ . This consistent performance across multiple metrics further corroborates the structural advantages of the LSTM architecture in processing physiological cycle signals, as previously justified.

Comparison of Precision for five classification models.

The t-distributed Stochastic Neighbour Embedding (t-SNE) algorithm was employed to visualise the feature manifolds, thereby elucidating the intrinsic mechanism by which deep networks outperform traditional algorithms. This technique was applied to perform dimensionality reduction and visualisation of the high-dimensional features extracted from the penultimate layer (the feature vector prior to the Softmax activation) of the CNN, LSTM, and GRU models, as illustrated in Figure 10. Experimental observations reveal that all three neural network architectures effectively transform the raw audio signals into highly discriminative feature representations. In the resulting visualisation, samples from different oestrus stages exhibit significant clustering effects, where intra-class samples are tightly aggregated, and distinct topological boundaries exist between inter-class samples. This superior intra-class cohesiveness and inter-class separability provide empirical evidence consistent with the high recognition accuracies (exceeding 93%) achieved by the deep learning models.

Visualization of feature distributions using t-SNE.

Comparative analysis reveals that the feature distribution generated by the LSTM model is the most compact and regular among the candidates. Particularly in distinguishing between the “Pre-oestrus” and “Mid-oestrus” stages, the LSTM demonstrates a significantly clearer inter-class gap compared to the CNN. This visualization provides empirical evidence consistent with the model’s superior temporal feature modelling capability, effectively mapping complex acoustic patterns into a discriminable feature subspace.

Comprehensive multi-dimensional analysis—based on global recognition rates (Figure 8), category-specific precision stability (Figure 9), and feature space clustering quality (Figure 10)—indicates that deep learning architectures exhibit significant performance advantages over traditional machine learning schemes. Experimental evidence suggests that traditional models fail to meet the stringent requirements for extremely low false alarm rates and high sensitivity in practical monitoring scenarios; consequently, they were eliminated at this stage.

3.1.2. Edge-Side Deployment Feasibility Verification

Following the completion of the offline algorithm screening, this study further evaluated the engineering implementation performance of the candidate models on embedded hardware. In this stage, the Edge Impulse EON™ compiler was utilised to reconstruct model operators and optimise memory, followed by deployment to the ESP32-S3 hardware platform. The evaluation metrics encompassed theoretical resource estimation and practical operational stability within the hardware environment. Since the development platform had not yet fully integrated a resource monitoring model specifically for the ESP32-S3, the ESP32-EYE ( $240 MHz$ ), which shares a similar architecture, was selected as a benchmark reference. Table 3 summarises the theoretical estimation data regarding static resource utilisation for each model.

Table 3.

Performance and resource evaluation of candidate models on ESP32-S3.

Model Architecture	CNN		LSTM		GRU
Precision Type	int8	float32	int8	float32	int8	float32
MFCC Latency (ms)	1179	1179	722	722	722	722
Inference Latency (ms)	7	94	1151	534	4423	2433
Total Latency (ms)	1186	1273	1873	1256	5145	3155
RAM Usage (KB)	25.4	25.4	22.3	22.3	22.3	22.3
Flash Usage (KB)	32.0	32.5	46.2	48.3	32.0	33.4
Accuracy (%)	90.68	90.43	96.36	96.17	96.30	96.55

Open in a new tab

Despite the GRU model demonstrating excellent classification accuracy during the offline validation phase ( $96.55 %$ ), it encountered severe engineering failures during actual deployment to the ESP32-S3 platform. Due to the lack of targeted low-level operator optimisation for its complex internal hidden-state update logic within the current TensorFlow Lite Micro interpreter, the system frequently triggered heap corruption during inference tasks. In practical testing, the serial monitor repeatedly returned “Invoke failed” error codes, further revealing significant computational resource scheduling conflicts for this architecture under constrained hardware environments. These failures reflect a critical challenge in the TinyML paradigm: while the GRU is mathematically more streamlined than the LSTM, its implementation in embedded interpreters often lacks the same level of low-level operator optimization for specific MCU architectures. As noted by Gookyi et al. (2024) [26], the compatibility between recurrent architectures and underlying hardware kernels is a primary determinant of system stability. Furthermore, even under ideal operating conditions, its estimated inference latency ( $3155 ms$ in float32 mode) significantly exceeded the $2.0 s$ real-time monitoring cycle threshold, posing a high risk of overflow for the underlying audio sampling buffer. Consequently, based on the dual factors of engineering unreliability and insufficient real-time performance, the GRU model was excluded from further consideration for hardware deployment.

In contrast to the GRU model, the CNN and LSTM architectures exhibited exceptional hardware adaptability and operational stability on the ESP32-S3 platform. Both the CNN (int8-quantised) and the LSTM (float32) successfully passed the rigorous on-device stability tests. The measured execution times for these models are summarised in Table 4. Experimental results indicate that the total on-device processing time for these models ranges approximately between $140 ms$ and $180 ms$ . This consumption is well within the system’s total 2 s monitoring cycle, confirming their fundamental feasibility for real-time edge deployment.

Table 4.

Actual inference latency of CNN and LSTM models on ESP32-S3 hardware.

Model Architecture	CNN		LSTM
Precision Type	int8	float32	int8	float32
MFCC Latency (ms)	135	135	135	135
Inference Latency (ms)	2	30	45	41
Total Latency (ms)	137	165	180	176

Open in a new tab

3.1.3. Performance Trade-Off and Final Decision

After excluding traditional algorithms and the GRU model due to low engineering feasibility, this study conducted a deep performance trade-off analysis between the CNN and LSTM. Although the CNN in int8-quantised mode demonstrated exceptional inference efficiency—with a measured classifier execution time of only $2 ms$ —precision oestrus monitoring in livestock farming is highly sensitive to recognition accuracy and operational risks. Consequently, this study adhered to an “accuracy-first” selection principle to mitigate breeding delays and increased labour costs resulting from false alarms or missed detections. Experimental results indicate that the LSTM (float32) achieved a classification accuracy of $96.17 %$ (with a global accuracy peak of $96.7 %$ in offline tests), and its characterisation capability for critical temporal stages, such as Pre-oestrus, was significantly superior to that of the CNN. While its total measured on-device latency ( $176 ms$ ) was slightly higher than that of the CNN ( $137 ms$ ), the $41 ms$ dedicated to classifier inference accounts for only $2.05 %$ of the $2 s$ sampling cycle. This efficiency provides substantial computational headroom, facilitating the parallel execution of subsequent operations such as sound source localisation without compromising real-time performance. Consequently, the results suggest that the system satisfies the requirements for near-real-time monitoring; thus, the millisecond-level ‘over-acceleration’ offered by the CNN did not translate into significant systemic gains within the practical monitoring workflow. In conclusion, the LSTM (float32) model was selected as the final deployment architecture. By maintaining an extremely low dynamic memory footprint of $22.3 KB$ , the system fully leverages the temporal modelling potential and floating-point computational performance of the ESP32-S3, achieving an optimal closed-loop of precision, stability, and real-time responsiveness. Notably, while Wang et al. (2022) [20] achieved high accuracy with a model size of 5.94 MB, our optimized Stacked-LSTM maintains high precision with a peak RAM footprint of only 22.3 KB, demonstrating its superior potential for low-cost edge deployment in intensive pig farms.

3.2. System Integration and Functional Verification

To validate the feasibility of the “edge recognition–cloud synchronisation” architecture, this study constructed an integrated testing platform in a controlled laboratory environment. Experimental results demonstrate that the system successfully achieved bi-directional interaction between the edge terminal (ESP32-S3) and the cloud monitoring platform using the MQTT protocol. This IoT architectural concept aligns with the framework proposed by Chen et al. (2021) [36], which also utilizes edge–cloud interaction for intelligent swine monitoring. As illustrated in Figure 11, when the edge terminal detects an oestrus signal (e.g., the Pre-oestrus stage), both the PC-based dashboard and the mobile application synchronise the recognition results, confidence scores, and environmental parameters within seconds. This confirms the consistency and real-time responsiveness of the multi-terminal visualisation interface when handling asynchronous telemetry data. During a continuous 72 h hardware and software stress test, the system executed the complete pipeline—comprising sampling, feature extraction, model inference, and data transmission—with high stability. The experiments confirmed that the LSTM model operates reliably on the ESP32-S3 without encountering stack overflows or connection anomalies, such as the memory issues and “Invoke failed” errors previously observed during the testing of the GRU model. Furthermore, the testing verified that the system accurately performs angle estimation based on sub-sample delays and successfully maps acoustic events in physical space to the corresponding logical stall units (Stall ID 1–7), completing the ‘Sense-Locate-Report’ functional loop at the system level.

Multi-terminal visualization interface of the Sow Oestrus Detection System. (a) PC-based management dashboard. (b) Mobile application interface.

4. Conclusions

This research successfully developed and validated a real-time sow oestrus monitoring and localisation system based on TinyML and edge computing architectures. Leveraging these innovations, an efficient, objective, and cost-effective non-contact oestrus monitoring solution is provided for modern intensive pig farms, significantly facilitating the practical implementation and field deployment of Precision Livestock Farming (PLF) technologies. By deeply optimising LSTM neural networks on the ESP32-S3 platform and implementing the GCC-PHAT algorithm, the system achieves a complete closed-loop from audio acquisition to individual-level early warning, effectively balancing high recognition precision with low inference latency.

Furthermore, the system’s performance was cross-validated against biological markers (serum FSH, LH, and $P_{4}$ ) and clinical behavioural observations. While this confirms the robust synchronization between acoustic patterns and physiological states, future studies will focus on broader field evaluations to further demonstrate the system’s practical reliability and diagnostic efficiency in diverse production settings.

Despite the promising results, this work should be regarded as a preliminary trial, and several limitations must be considered for broader industrial application. First, the current model was developed and validated specifically using a limited cohort of Yorkshire gilts; while the physiological drivers of oestrus vocalizations are biologically conserved, the model’s performance across different breeds or parities may exhibit variations and requires further cross-population validation. Second, although the system demonstrates robustness against common piggery sounds, its stability in complex commercial environments—characterized by high-density housing with multiple sows and overlapping vocalizations—remains to be fully explored. Factors such as high-power ventilation, severe reverberation, and the variation in acoustic distance caused by dynamic sow postures necessitate even more robust signal processing. Furthermore, the current limited daily sampling window leaves the patterns of nocturnal oestrus as a subject for future study.

To address these challenges, future research will explore Blind Source Separation (BSS) and spatial beamforming to enhance recognition robustness during concurrent vocalisation events. To achieve long-term maintenance-free deployment, we plan to leverage the ESP32-S3’s Ultra-Low Power (ULP) co-processor and explore micro-energy harvesting solutions. Ultimately, we intend to incorporate visual pose recognition to construct a “sound–image–environment” multi-modal fusion model, further reducing false alarms and providing more precise decision support for predicting the optimal ovulation window.

Acknowledgments

We thank the staff of Deshenggou Village, Guxian Town, Qinxian County, Shanxi Province, and the local pig farms for their help with the data collection process. We also thank the experts who participated in data collection, labelling, and the model evaluation process.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani16050804/s1. Section S1: MFCC Implementation; Section S2: GCC-PHAT Calculation; Section S3: Evaluation Metrics.

animals-16-00804-s001.zip^{(120KB, zip)}

Author Contributions

H.L. (Hao Liu): Methodology, Software, Formal analysis, Visualization, Writing—original draft preparation. H.L. (Haopu Li): Investigation, Data curation. Y.C.: Investigation, Data curation. R.C.: Investigation. G.H.: Resources, Validation. Z.L.: Conceptualization, Writing—review and editing, Supervision, Project administration, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

This animal study protocol was approved by the Animal Ethics Committee of Shanxi Agricultural University (approval number: [SXAU-EAW-2022P.GT.0070080141]; approval date: [1 December 2022]).

Informed Consent Statement

Written informed consent was obtained from the owner of the animals involved in this study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

All authors declare no conflicts of interest.

Funding Statement

This work was supported by the Research Project Supported by the Shanxi Scholarship Council of China (grant number: 2023-092) and the key R&D program of Shanxi Province (grant number: 202302010101002).

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1.Xu J., Ying Y., Wu D., Hu Y., Cui D. Recent Advances in Pig Behavior Detection Based on Information Perception Technology. Comput. Electron. Agric. 2025;235:110327. doi: 10.1016/j.compag.2025.110327. [DOI] [Google Scholar]
2.Sharifuzzaman M., Mun H.-S., Ampode K.M.B., Lagua E.B., Park H.-R., Kim Y.-H., Hasan M.K., Yang C.-J. Technological Tools and Artificial Intelligence in Estrus Detection of Sows—A Comprehensive Review. Animals. 2024;14:471. doi: 10.3390/ani14030471. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Kemp B., Soede N.M. Relationship of Weaning-to-Estrus Interval to Timing of Ovulation and Fertilization in Sows. J. Anim. Sci. 1996;74:944. doi: 10.2527/1996.745944x. [DOI] [PubMed] [Google Scholar]
4.Fang J., Yang L., Tang X., Han S., Cheng G., Wang Y., Chen L., Zhao B., Wu J. Sow Estrus Detection Based on the Fusion of Vulvar Visual Features. Animals. 2025;15:2709. doi: 10.3390/ani15182709. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Glencorse D., Grupen C.G., Bathgate R. A Review of the Monitoring Techniques Used to Detect Oestrus in Sows. Animals. 2025;15:331. doi: 10.3390/ani15030331. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Lei K., Zong C., Du X., Teng G., Feng F. Oestrus Analysis of Sows Based on Bionic Boars and Machine Vision Technology. Animals. 2021;11:1485. doi: 10.3390/ani11061485. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Xue H., Chen J., Ding Q., Sun Y., Shen M., Liu L., Chen X., Zhou J. Automatic Detection of Sow Posture and Estrus Based on Convolutional Neural Network. Front. Phys. 2022;10:1037129. doi: 10.3389/fphy.2022.1037129. [DOI] [Google Scholar]
8.Xu Z., Sullivan R., Zhou J., Bromfield C., Lim T.T., Safranski T.J., Yan Z. Detecting Sow Vulva Size Change around Estrus Using Machine Vision Technology. Smart Agric. Technol. 2023;3:100090. doi: 10.1016/j.atech.2022.100090. [DOI] [Google Scholar]
9.Simões V.G., Lyazrhi F., Picard-Hagen N., Gayrard V., Martineau G.-P., Waret-Szkuta A. Variations in the Vulvar Temperature of Sows during Proestrus and Estrus as Determined by Infrared Thermography and Its Relation to Ovulation. Theriogenology. 2014;82:1080–1085. doi: 10.1016/j.theriogenology.2014.07.017. [DOI] [PubMed] [Google Scholar]
10.Lee J.H., Lee D.H., Yun W., Oh H.J., An J.S., Kim Y.G., Kim G.M., Cho J.H. Quantifiable and Feasible Estrus Detection Using the Ultrasonic Sensor Array and Digital Infrared Thermography. J. Anim. Sci. Technol. 2019;61:163–169. doi: 10.5187/jast.2019.61.3.163. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Manteuffel G., Puppe B., Schön P.C. Vocalization of Farm Animals as a Measure of Welfare. Appl. Anim. Behav. Sci. 2004;88:163–182. doi: 10.1016/j.applanim.2004.02.012. [DOI] [Google Scholar]
12.Murphy E., Nordquist R.E., Van Der Staay F.J. A Review of Behavioural Methods to Study Emotion and Mood in Pigs, Sus Scrofa. Appl. Anim. Behav. Sci. 2014;159:9–28. doi: 10.1016/j.applanim.2014.08.002. [DOI] [Google Scholar]
13.Sharma G., Umapathy K., Krishnan S. Trends in Audio Signal Feature Extraction Methods. Appl. Acoust. 2020;158:107020. doi: 10.1016/j.apacoust.2019.107020. [DOI] [Google Scholar]
14.Cordeiro A.F.d.S., Nääs I.D.A., Da Silva Leitão F., De Almeida A.C.M., De Moura D.J. Use of Vocalisation to Identify Sex, Age, and Distress in Pig Production. Biosyst. Eng. 2018;173:57–63. doi: 10.1016/j.biosystemseng.2018.03.007. [DOI] [Google Scholar]
15.Exadaktylos V., Silva M., Aerts J.-M., Taylor C.J., Berckmans D. Real-Time Recognition of Sick Pig Cough Sounds. Comput. Electron. Agric. 2008;63:207–214. doi: 10.1016/j.compag.2008.02.010. [DOI] [Google Scholar]
16.Chung Y., Oh S., Lee J., Park D., Chang H.-H., Kim S. Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance Systems. Sensors. 2013;13:12929–12942. doi: 10.3390/s131012929. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Yin Y., Tu D., Shen W., Bao J. Recognition of Sick Pig Cough Sounds Based on Convolutional Neural Network in Field Situations. Inf. Process. Agric. 2021;8:369–379. doi: 10.1016/j.inpa.2020.11.001. [DOI] [Google Scholar]
18.Shen W., Tu D., Yin Y., Bao J. A New Fusion Feature Based on Convolutional Neural Network for Pig Cough Recognition in Field Situations. Inf. Process. Agric. 2021;8:573–580. doi: 10.1016/j.inpa.2020.11.003. [DOI] [Google Scholar]
19.Liao J., Li H., Feng A., Wu X., Luo Y., Duan X., Ni M., Li J. Domestic Pig Sound Classification Based on TransformerCNN. Appl. Intell. 2022;53:4907–4923. doi: 10.1007/s10489-022-03581-6. [DOI] [Google Scholar]
20.Wang Y., Li S., Zhang H., Liu T. A Lightweight CNN-Based Model for Early Warning in Sow Oestrus Sound Monitoring. Ecol. Inform. 2022;72:101863. doi: 10.1016/j.ecoinf.2022.101863. [DOI] [Google Scholar]
21.Chen P., Yin D., Yang B., Tang W. Journal of Physics: Conference Series. Volume 2203. IOP Publishing; Bristol, UK: 2022. A Fusion Feature for the Oestrous Sow Sound Identification Based on Convolutional Neural Networks; p. 012049. [DOI] [Google Scholar]
22.Cao Y., Yin Z., Duan Y., Cao R., Hu G., Liu Z. Research on Improved Sound Recognition Model for Oestrus Detection in Sows. Comput. Electron. Agric. 2025;231:109975. doi: 10.1016/j.compag.2025.109975. [DOI] [Google Scholar]
23.Cai J., Liu W., Liu T., Wang F., Li Z., Wang X., Li H. APO-CViT: A Non-Destructive Estrus Detection Method for Breeding Pigs Based on Multimodal Feature Fusion. Animals. 2025;15:1067. doi: 10.3390/ani15071067. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Duan Y., Yang Y., Cao Y., Liu J., Li L., Cao R., Hu G., Liu Z. A Multimodal Deep Learning Network for Precise Detection of Estrus and Pseudo-Estrus in Sows. Smart Agric. Technol. 2025;12:101279. doi: 10.1016/j.atech.2025.101279. [DOI] [Google Scholar]
25.Attri I., Awasthi L., Sharma T.P. TinyML for Plant Disease Detection: Efficient Edge AI Solutions for Apple and Mango Leaves. Procedia Comput. Sci. 2025;258:2870–2877. doi: 10.1016/j.procs.2025.04.547. [DOI] [Google Scholar]
26.Gookyi D.A.N., Wulnye F.A., Arthur E.A.E., Ahiadormey R.K., Agyemang J.O., Agyekum K.O.-B.O., Gyaang R. TinyML for Smart Agriculture: Comparative Analysis of TinyML Platforms and Practical Deployment for Maize Leaf Disease Identification. Smart Agric. Technol. 2024;8:100490. doi: 10.1016/j.atech.2024.100490. [DOI] [Google Scholar]
27.Gragnaniello M., Borghese A., Marrazzo V.R., Maresca L., Breglio G., Irace A., Riccio M. Real-Time Myocardial Infarction Detection Approaches with a Microcontroller-Based Edge-AI Device. Sensors. 2024;24:828. doi: 10.3390/s24030828. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Tolani M., Saathwik G.E., Roy A., Ameeth L.A., Rao D.B., Bajpai A., Balodi A. Machine Learning Based Adaptive Traffic Prediction and Control Using Edge Impulse Platform. Sci. Rep. 2025;15:17161. doi: 10.1038/s41598-025-00762-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Mihigo I.N., Zennaro M., Uwitonze A., Rwigema J., Rovai M. On-Device IoT-Based Predictive Maintenance Analytics Model: Comparing TinyLSTM and TinyModel from Edge Impulse. Sensors. 2022;22:5174. doi: 10.3390/s22145174. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Tiberiu Constantin N., Petrișor Posastiuc F., Raluca Andrei C. Progesterone: An Essential Diagnostic Resource in Veterinary Medicine. In: Wang Z., editor. Progesterone-Basic Concepts and Emerging New Applications. IntechOpen; Rijeka, Croatia: 2024. [Google Scholar]
31.Duan Y., Yang Y., Cao Y., Wang X., Cao R., Hu G., Liu Z. Integrated Convolution and Attention Enhancement-You Only Look Once: A Lightweight Model for False Estrus and Estrus Detection in Sows Using Small-Target Vulva Detection. Animals. 2025;15:580. doi: 10.3390/ani15040580. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Jahangir R., Asif Nauman M., Alroobaea R., Almotiri J., Mohsin Malik M., Alzahrani S.M. Deep Learning-Based Environmental Sound Classification Using Feature Fusion and Data Enhancement. Comput. Mater. Contin. 2023;74:1069–1091. doi: 10.32604/cmc.2023.032719. [DOI] [Google Scholar]
33.Knapp C., Carter G. The Generalized Correlation Method for Estimation of Time Delay. IEEE Trans. Acoust. Speech Signal Process. 1976;24:320–327. doi: 10.1109/TASSP.1976.1162830. [DOI] [Google Scholar]
34.Omologo M., Svaizer P. Proceedings of ICASSP ’94. IEEE International Conference on Acoustics, Speech and Signal Processing. Volume 2. IEEE; New York, NY, USA: 1994. Acoustic Event Localization Using a Crosspower-Spectrum Phase Based Technique; pp. 273–276. [Google Scholar]
35.DiBiase J.H. A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays. Brown University; Providence, RI, USA: 2000. [Google Scholar]
36.Chen W.-E., Lin Y.-B., Chen L.-X. PigTalk: An AI-Based IoT Platform for Piglet Crushing Mitigation. IEEE Trans. Ind. Inform. 2021;17:4345–4355. doi: 10.1109/TII.2020.3012496. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

animals-16-00804-s001.zip^{(120KB, zip)}

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

[B1-animals-16-00804] 1.Xu J., Ying Y., Wu D., Hu Y., Cui D. Recent Advances in Pig Behavior Detection Based on Information Perception Technology. Comput. Electron. Agric. 2025;235:110327. doi: 10.1016/j.compag.2025.110327. [DOI] [Google Scholar]

[B2-animals-16-00804] 2.Sharifuzzaman M., Mun H.-S., Ampode K.M.B., Lagua E.B., Park H.-R., Kim Y.-H., Hasan M.K., Yang C.-J. Technological Tools and Artificial Intelligence in Estrus Detection of Sows—A Comprehensive Review. Animals. 2024;14:471. doi: 10.3390/ani14030471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3-animals-16-00804] 3.Kemp B., Soede N.M. Relationship of Weaning-to-Estrus Interval to Timing of Ovulation and Fertilization in Sows. J. Anim. Sci. 1996;74:944. doi: 10.2527/1996.745944x. [DOI] [PubMed] [Google Scholar]

[B4-animals-16-00804] 4.Fang J., Yang L., Tang X., Han S., Cheng G., Wang Y., Chen L., Zhao B., Wu J. Sow Estrus Detection Based on the Fusion of Vulvar Visual Features. Animals. 2025;15:2709. doi: 10.3390/ani15182709. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5-animals-16-00804] 5.Glencorse D., Grupen C.G., Bathgate R. A Review of the Monitoring Techniques Used to Detect Oestrus in Sows. Animals. 2025;15:331. doi: 10.3390/ani15030331. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6-animals-16-00804] 6.Lei K., Zong C., Du X., Teng G., Feng F. Oestrus Analysis of Sows Based on Bionic Boars and Machine Vision Technology. Animals. 2021;11:1485. doi: 10.3390/ani11061485. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7-animals-16-00804] 7.Xue H., Chen J., Ding Q., Sun Y., Shen M., Liu L., Chen X., Zhou J. Automatic Detection of Sow Posture and Estrus Based on Convolutional Neural Network. Front. Phys. 2022;10:1037129. doi: 10.3389/fphy.2022.1037129. [DOI] [Google Scholar]

[B8-animals-16-00804] 8.Xu Z., Sullivan R., Zhou J., Bromfield C., Lim T.T., Safranski T.J., Yan Z. Detecting Sow Vulva Size Change around Estrus Using Machine Vision Technology. Smart Agric. Technol. 2023;3:100090. doi: 10.1016/j.atech.2022.100090. [DOI] [Google Scholar]

[B9-animals-16-00804] 9.Simões V.G., Lyazrhi F., Picard-Hagen N., Gayrard V., Martineau G.-P., Waret-Szkuta A. Variations in the Vulvar Temperature of Sows during Proestrus and Estrus as Determined by Infrared Thermography and Its Relation to Ovulation. Theriogenology. 2014;82:1080–1085. doi: 10.1016/j.theriogenology.2014.07.017. [DOI] [PubMed] [Google Scholar]

[B10-animals-16-00804] 10.Lee J.H., Lee D.H., Yun W., Oh H.J., An J.S., Kim Y.G., Kim G.M., Cho J.H. Quantifiable and Feasible Estrus Detection Using the Ultrasonic Sensor Array and Digital Infrared Thermography. J. Anim. Sci. Technol. 2019;61:163–169. doi: 10.5187/jast.2019.61.3.163. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11-animals-16-00804] 11.Manteuffel G., Puppe B., Schön P.C. Vocalization of Farm Animals as a Measure of Welfare. Appl. Anim. Behav. Sci. 2004;88:163–182. doi: 10.1016/j.applanim.2004.02.012. [DOI] [Google Scholar]

[B12-animals-16-00804] 12.Murphy E., Nordquist R.E., Van Der Staay F.J. A Review of Behavioural Methods to Study Emotion and Mood in Pigs, Sus Scrofa. Appl. Anim. Behav. Sci. 2014;159:9–28. doi: 10.1016/j.applanim.2014.08.002. [DOI] [Google Scholar]

[B13-animals-16-00804] 13.Sharma G., Umapathy K., Krishnan S. Trends in Audio Signal Feature Extraction Methods. Appl. Acoust. 2020;158:107020. doi: 10.1016/j.apacoust.2019.107020. [DOI] [Google Scholar]

[B14-animals-16-00804] 14.Cordeiro A.F.d.S., Nääs I.D.A., Da Silva Leitão F., De Almeida A.C.M., De Moura D.J. Use of Vocalisation to Identify Sex, Age, and Distress in Pig Production. Biosyst. Eng. 2018;173:57–63. doi: 10.1016/j.biosystemseng.2018.03.007. [DOI] [Google Scholar]

[B15-animals-16-00804] 15.Exadaktylos V., Silva M., Aerts J.-M., Taylor C.J., Berckmans D. Real-Time Recognition of Sick Pig Cough Sounds. Comput. Electron. Agric. 2008;63:207–214. doi: 10.1016/j.compag.2008.02.010. [DOI] [Google Scholar]

[B16-animals-16-00804] 16.Chung Y., Oh S., Lee J., Park D., Chang H.-H., Kim S. Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance Systems. Sensors. 2013;13:12929–12942. doi: 10.3390/s131012929. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17-animals-16-00804] 17.Yin Y., Tu D., Shen W., Bao J. Recognition of Sick Pig Cough Sounds Based on Convolutional Neural Network in Field Situations. Inf. Process. Agric. 2021;8:369–379. doi: 10.1016/j.inpa.2020.11.001. [DOI] [Google Scholar]

[B18-animals-16-00804] 18.Shen W., Tu D., Yin Y., Bao J. A New Fusion Feature Based on Convolutional Neural Network for Pig Cough Recognition in Field Situations. Inf. Process. Agric. 2021;8:573–580. doi: 10.1016/j.inpa.2020.11.003. [DOI] [Google Scholar]

[B19-animals-16-00804] 19.Liao J., Li H., Feng A., Wu X., Luo Y., Duan X., Ni M., Li J. Domestic Pig Sound Classification Based on TransformerCNN. Appl. Intell. 2022;53:4907–4923. doi: 10.1007/s10489-022-03581-6. [DOI] [Google Scholar]

[B20-animals-16-00804] 20.Wang Y., Li S., Zhang H., Liu T. A Lightweight CNN-Based Model for Early Warning in Sow Oestrus Sound Monitoring. Ecol. Inform. 2022;72:101863. doi: 10.1016/j.ecoinf.2022.101863. [DOI] [Google Scholar]

[B21-animals-16-00804] 21.Chen P., Yin D., Yang B., Tang W. Journal of Physics: Conference Series. Volume 2203. IOP Publishing; Bristol, UK: 2022. A Fusion Feature for the Oestrous Sow Sound Identification Based on Convolutional Neural Networks; p. 012049. [DOI] [Google Scholar]

[B22-animals-16-00804] 22.Cao Y., Yin Z., Duan Y., Cao R., Hu G., Liu Z. Research on Improved Sound Recognition Model for Oestrus Detection in Sows. Comput. Electron. Agric. 2025;231:109975. doi: 10.1016/j.compag.2025.109975. [DOI] [Google Scholar]

[B23-animals-16-00804] 23.Cai J., Liu W., Liu T., Wang F., Li Z., Wang X., Li H. APO-CViT: A Non-Destructive Estrus Detection Method for Breeding Pigs Based on Multimodal Feature Fusion. Animals. 2025;15:1067. doi: 10.3390/ani15071067. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24-animals-16-00804] 24.Duan Y., Yang Y., Cao Y., Liu J., Li L., Cao R., Hu G., Liu Z. A Multimodal Deep Learning Network for Precise Detection of Estrus and Pseudo-Estrus in Sows. Smart Agric. Technol. 2025;12:101279. doi: 10.1016/j.atech.2025.101279. [DOI] [Google Scholar]

[B25-animals-16-00804] 25.Attri I., Awasthi L., Sharma T.P. TinyML for Plant Disease Detection: Efficient Edge AI Solutions for Apple and Mango Leaves. Procedia Comput. Sci. 2025;258:2870–2877. doi: 10.1016/j.procs.2025.04.547. [DOI] [Google Scholar]

[B26-animals-16-00804] 26.Gookyi D.A.N., Wulnye F.A., Arthur E.A.E., Ahiadormey R.K., Agyemang J.O., Agyekum K.O.-B.O., Gyaang R. TinyML for Smart Agriculture: Comparative Analysis of TinyML Platforms and Practical Deployment for Maize Leaf Disease Identification. Smart Agric. Technol. 2024;8:100490. doi: 10.1016/j.atech.2024.100490. [DOI] [Google Scholar]

[B27-animals-16-00804] 27.Gragnaniello M., Borghese A., Marrazzo V.R., Maresca L., Breglio G., Irace A., Riccio M. Real-Time Myocardial Infarction Detection Approaches with a Microcontroller-Based Edge-AI Device. Sensors. 2024;24:828. doi: 10.3390/s24030828. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28-animals-16-00804] 28.Tolani M., Saathwik G.E., Roy A., Ameeth L.A., Rao D.B., Bajpai A., Balodi A. Machine Learning Based Adaptive Traffic Prediction and Control Using Edge Impulse Platform. Sci. Rep. 2025;15:17161. doi: 10.1038/s41598-025-00762-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29-animals-16-00804] 29.Mihigo I.N., Zennaro M., Uwitonze A., Rwigema J., Rovai M. On-Device IoT-Based Predictive Maintenance Analytics Model: Comparing TinyLSTM and TinyModel from Edge Impulse. Sensors. 2022;22:5174. doi: 10.3390/s22145174. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30-animals-16-00804] 30.Tiberiu Constantin N., Petrișor Posastiuc F., Raluca Andrei C. Progesterone: An Essential Diagnostic Resource in Veterinary Medicine. In: Wang Z., editor. Progesterone-Basic Concepts and Emerging New Applications. IntechOpen; Rijeka, Croatia: 2024. [Google Scholar]

[B31-animals-16-00804] 31.Duan Y., Yang Y., Cao Y., Wang X., Cao R., Hu G., Liu Z. Integrated Convolution and Attention Enhancement-You Only Look Once: A Lightweight Model for False Estrus and Estrus Detection in Sows Using Small-Target Vulva Detection. Animals. 2025;15:580. doi: 10.3390/ani15040580. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32-animals-16-00804] 32.Jahangir R., Asif Nauman M., Alroobaea R., Almotiri J., Mohsin Malik M., Alzahrani S.M. Deep Learning-Based Environmental Sound Classification Using Feature Fusion and Data Enhancement. Comput. Mater. Contin. 2023;74:1069–1091. doi: 10.32604/cmc.2023.032719. [DOI] [Google Scholar]

[B33-animals-16-00804] 33.Knapp C., Carter G. The Generalized Correlation Method for Estimation of Time Delay. IEEE Trans. Acoust. Speech Signal Process. 1976;24:320–327. doi: 10.1109/TASSP.1976.1162830. [DOI] [Google Scholar]

[B34-animals-16-00804] 34.Omologo M., Svaizer P. Proceedings of ICASSP ’94. IEEE International Conference on Acoustics, Speech and Signal Processing. Volume 2. IEEE; New York, NY, USA: 1994. Acoustic Event Localization Using a Crosspower-Spectrum Phase Based Technique; pp. 273–276. [Google Scholar]

[B35-animals-16-00804] 35.DiBiase J.H. A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays. Brown University; Providence, RI, USA: 2000. [Google Scholar]

[B36-animals-16-00804] 36.Chen W.-E., Lin Y.-B., Chen L.-X. PigTalk: An AI-Based IoT Platform for Piglet Crushing Mitigation. IEEE Trans. Ind. Inform. 2021;17:4345–4355. doi: 10.1109/TII.2020.3012496. [DOI] [Google Scholar]

PERMALINK

Edge-AI Enabled Acoustic Monitoring and Spatial Localisation for Sow Oestrus Detection

Hao Liu

Haopu Li

Yue Cao

Riliang Cao

Guangying Hu

Zhenyu Liu

Roles

Simple Summary

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Collection

2.1.1. Experimental Design

Figure 1.

2.1.2. Audio-Visual Acquisition

2.1.3. Physiological Data Collection

2.2. Data Preprocessing

2.2.1. Data Labelling and Dataset Construction

Figure 2.

Table 1.

Figure 3.

2.2.2. Optimization of MFCC Feature Extraction for Edge Deployment

Figure 4.

2.3. Edge Lightweight Recognition Model Design

2.3.1. Stacked Long Short-Term Memory Architecture Details

Figure 5.

2.3.2. Benchmark Models

2.4. Edge Intelligent Monitoring System

Figure 6.

2.4.1. System Hardware Architecture

2.4.2. Spatial Localisation Implementation Based on the GCC-PHAT Algorithm

2.5. Performance Evaluation Criteria

3. Results and Discussion

3.1. Model Selection

Figure 7.

3.1.1. Comparative Analysis and Preliminary Model Screening

Table 2.

Figure 8.

Figure 9.

Figure 10.

3.1.2. Edge-Side Deployment Feasibility Verification

Table 3.

Table 4.

3.1.3. Performance Trade-Off and Final Decision

3.2. System Integration and Functional Verification

Figure 11.

4. Conclusions

Acknowledgments

Supplementary Materials

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Funding Statement

Footnotes

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases