Abstract
Recent advancements in machine learning and deep learning have revolutionized various computer vision applications, including object detection, tracking, and classification. This research investigates the application of deep learning to cattle lameness detection in dairy farming. Our study employs image processing techniques and deep learning methods for cattle detection, tracking, and lameness classification. We utilize two powerful object detection algorithms: Mask R-CNN from Detectron2 and the popular YOLOv8. Their performance is compared to identify the most effective approach for this application. Bounding boxes are drawn around detected cattle to assign unique local IDs, enabling individual tracking and isolation throughout the video sequence. Additionally, the mask regions generated by the chosen detection algorithm provide valuable data for feature extraction, which is crucial for subsequent lameness classification. The extracted cattle mask region values serve as the basis for feature extraction, capturing information indicative of lameness. These features, combined with the local IDs assigned during tracking, are used to compute a lameness score for each animal. We explore the efficacy of established machine learning algorithms, including Support Vector Machines (SVM) and AdaBoost, in analyzing the extracted lameness features. Evaluation of the proposed system was conducted across three key domains: detection, tracking, and lameness classification. Notably, the detection module employing Detectron2 achieved an accuracy of 98.98%, and the tracking module attained an accuracy of 99.50%. In lameness classification, AdaBoost emerged as the most effective algorithm, yielding the highest overall average accuracy (77.9%). Other established machine learning algorithms, including Decision Trees (DT), Support Vector Machines (SVM), and Random Forests, also demonstrated promising performance (DT: 75.32%, SVM: 75.20%, Random Forest: 74.9%). The presented approach demonstrates a successful implementation of automated cattle lameness detection. The proposed system has the potential to revolutionize dairy farm management by enabling early lameness detection and facilitating effective monitoring of cattle health. Our findings contribute valuable insights into the application of advanced computer vision methods for livestock health management.
Subject terms: Computer science, Information technology
Introduction
Video analysis and a Siamese attention model (Siam-AM) offer a potential solution for tracking all four legs of cattle and detecting lameness in dairy herds. The process involves feature extraction, applying attention weighting, and comparing similarities to achieve precise leg tracking1. Dairy cattle lameness significantly impacts their health and well-being, leading to decreased milk production, extended calving intervals, and increased costs for producers2–4. Research suggests associations between low body condition scores, hoof overgrowth, early lactation, larger herd sizes, and higher parity with increased odds of lameness in stall-housed cattle. However, data retrieval challenges and limited study comparability highlight the need for robust evidence to develop effective intervention strategies5. Early identification of lameness allows for more cost-effective treatment, making automatic detection models an optimal solution to reduce expenses6. Additionally, lameness reduction offers the potential for decreased recurrence of past events and improved cattle body condition scores7. Prompt detection and management of lameness are crucial for the sustainable growth of the dairy industry. However, manual detection becomes increasingly challenging as dairy farming expands8.
Studies investigating the intra- and inter-rater reliability of lameness assessment in cattle using locomotion scores, both live and from video, have shown that experienced raters demonstrate higher reliability when scoring from video. This suggests video observation is an acceptable method for lameness assessment, regardless of the observer's experience9. Lameness poses a significant welfare concern for dairy cattle, leading to various gait assessment methods. While subjective methods lack consistency, objective ones require advanced technology. This review evaluates the reliability, validity, and interplay of gait assessment with cattle factors, hoof pathologies, and environmental conditions10. This study employs a dynamic stochastic model to assess the welfare impact of various foot disorders in dairy cattle. Results highlight the significant negative welfare effects, with digital dermatitis having the highest impact, followed by subclinical disorders like sole hemorrhages and interdigital dermatitis. This study emphasizes the previously underappreciated impact of subclinical foot disorders and highlights the importance of considering pain intensity and clinical conditions in welfare assessments and management strategies11.
Clinical lameness negatively affects milk yield and reproductive performance in cattle12,13. Traditionally, farmers have relied on visual observation through locomotion scoring for diagnosing lameness14. However, this method is resource-intensive, time-consuming, and relies on qualitative assessments of gait and posture changes15. Additionally, individual cattle variations and dynamic movement pose challenges16. Claw lesions, both infectious and non-infectious, remain the primary cause of lameness in cattle17. Recent research has explored diverse characteristics of body motion to describe and detect lameness18–20. One expert survey assigned weights to various gait factors, finding that analyzing cattle leg swing holds potential for lameness determination, given that most indicators relate to walking performance21. Identifying lameness remains a challenge in the dairy sector due to its impact on reproductive efficiency, milk production, and culling rates22. It ranks as the third most economically impactful disease in cattle, following fertility and mastitis23. Early detection is crucial, leading to reduced antibiotic use and enhanced milk yield19,24. To address these challenges, one study proposed a method that identifies deviations from normal cattle gait patterns using sensors like accelerometers to capture walking speed data and integrate it into a prediction model25.
However, this contact-based approach may initially stress cattle unaccustomed to the equipment. Implementing it on a large scale would also increase labor and equipment costs. Overcoming these limitations, some researchers propose non-contact methods using monitoring cameras on farms. By modeling time and space for cattle tracking, they achieved relatively accurate results. However, this method lacked environmental robustness, as changes in conditions could negatively impact the algorithm's performance and lead to subpar results in long-term detection26. In recent years, researchers have shown significant interest in object detection using convolutional neural networks (CNNs)27,28 and feature classification based on recurrent neural networks (RNNs)29–32. One study highlighted that deep learning-based object detection and recognition methods can overcome the limitations of manual design features, which often lack diversity33. These approaches hold substantial promise for cattle lameness detection. By leveraging CNNs for cattle object detection and RNNs for feature extraction, valuable and continuous data can be effectively extracted, offering significant potential for cattle lameness detection. Studies have shown a correlation between the degree of curvature in a lame animal's back and the severity of lameness19,20.
Utilizing automatic back posture measurements in daily routines allows for individual lameness classifications. Another study proposed a method based on consecutive 3D-video recordings for automatic detection of lameness34. This study focuses on exploring the causes of lameness and early detection methods. Prior to classification, the system detects the cattle region using an instance segmentation algorithm to extract the region's mask value, which is crucial for feature extraction. This study combines image processing techniques and deep learning methods for detection and tracking. The extracted feature values are classified using popular machine learning algorithms like support vector machines and random forests. Our work utilizes the well-regarded Mask R-CNN instance segmentation algorithm for cattle region extraction. Subsequently, we employ Intersection-over-Union (IoU) in conjunction with frame-holding and entrance gating mechanisms to identify and track individual cattle. Finally, image processing techniques are leveraged to extract distinct features from these regions, enabling the calculation of cattle lameness. The primary contributions of our paper are as follows:
(i) Accurate detection and instance segmentation of cattle located behind the frame or within covered areas remain critical challenges. This work seeks to address these complexities.

(ii) To assess the efficacy of our custom-trained Mask R-CNN model, we performed a comparative analysis with the state-of-the-art YOLOv8 algorithm.

(iii) For cattle tracking, we developed a lightweight, customized algorithm that leverages IoU calculations combined with frame-holding and entrance gating mechanisms.

(iv) This work proposes a novel approach to assess cattle lameness by analyzing the variance in movement patterns. We achieve this by calculating three key points on the back curvature of the cattle, providing a new metric for lameness evaluation.

(v) Comparing individual cattle lameness scores directly lacks robustness. Instead, calculating the probabilities across all frames from each result folder (obtained from the cattle tracking phase) provides a more robust approach.

(vi) The integration of cattle detection, tracking, feature extraction, and lameness calculation enables real-time cattle lameness detection.

(vii) Early lameness detection in cattle remains a significant challenge, with most research focusing on differentiating between non-lame (level 1) and mildly lame (levels 2 and 3) animals.
Related work
Focusing on animal welfare and technological advancements, the Visual Information Lab at Miyazaki University is engaged in ongoing research and development in several key areas of cattle management. Specifically, our focus lies on three systems: a Cattle Lameness Level Classification System, a Cattle Body Condition Classification System, and a Cattle Mounting Detection System. For each of these systems, our research efforts encompass the development of robust detection, tracking, and identification methodologies35–40, specifically tailored to capture the relevant features and characteristics of the cattle under observation. Additionally, we are working on efficient and accurate classification algorithms that categorize the identified cattle attributes according to the criteria of each system41. Our goal is to contribute to advancements in cattle management practices by providing innovative and reliable technological solutions in this domain. Lameness in cattle is of utmost importance as it can have a significant impact on animal welfare, production efficiency, and economic losses in the livestock industry.
Early and accurate detection of lameness is essential to ensure timely intervention and appropriate treatment for affected animals. In recent years, cattle lameness classification has garnered considerable attention in the field of precision livestock farming. Researchers have explored various approaches to achieve accurate and timely identification of cattle lameness using advanced technologies and innovative methods. Several research endeavors have contributed significantly to the domain of cattle lameness detection and body movement variability. In the context of detecting lameness in dairy cattle, several research studies have explored intelligent methods that leveraged advanced computer vision techniques, specifically Mask-RCNN, to extract the region of interest encompassing the dairy cattle. By utilizing features derived from head bob patterns, the authors successfully identified potential signs of lameness in the cattle movement42. Several research studies have been conducted to address this critical issue, focusing on the development of accurate and efficient lameness detection systems.
To detect lameness in cattle, the authors of43 proposed a computer vision-based approach that implements an intelligent visual perception system using deep learning instance segmentation and identification, providing a cutting-edge solution that effectively extracts cattle regions from complex backgrounds. The study in44 introduces an in-parlor scoring (IPS) technique and compares its performance with locomotion scoring (LS) in pasture-based dairy cattle. IPS indicators, encompassing shifting weight, abnormal weight distribution, swollen heel or hock joints, and overgrown hooves, were observed, and every third cow was scored. The findings suggest that IPS holds promise as a viable alternative to LS on pasture-based dairy farms, presenting opportunities for more effective lameness detection and management in this setting. A lameness monitoring algorithm based on back posture values derived from a camera, tuned by adjusting deviation thresholds and the quantity of historical data used, is developed in45. The paper introduces a high-performing lameness detection system that makes meaningful use of historical data in its deviation detection algorithm.
The study in46 proposes a novel lameness detection method that combines machine vision technology with a deep learning algorithm, focusing on the curvature features of dairy cattle's backs. The approach involves constructing three models: Cattle's Back Position Extraction (CBPE), Cattle's Object Region Extraction (CORE), and Cattle's Back Curvature Extraction (CBCE). A Noise + Bidirectional Long Short-Term Memory (BiLSTM) model is utilized to predict the curvature data and match the lameness features. The work in47 develops a computer vision system using deep learning to recognize individual cattle in real time, track their positions, actions, and movements, and record their time-history outputs. The YOLO neural network, trained on cattle coat patterns, achieved a mean average precision ranging from 0.64 to 0.66, demonstrating the potential for accurate cattle identification based on morphological appearance, particularly the piebald spotting pattern. Data augmentation techniques were employed to enhance network performance and provide insights for efficient detection in challenging data acquisition scenarios involving animals. The authors of48 present an end-to-end Internet of Things (IoT) application that leverages advanced machine learning and data analytics techniques for real-time cattle monitoring and early lameness detection. Using long-range pedometers designed for dairy cattle, the system monitors each animal's activity and aggregates the accelerometric data at the fog node. The development of an automatic and continuous system for scoring cattle locomotion, detecting and predicting lameness with high accuracy and practicality, is presented in49. Using computer vision techniques, the research focuses on analyzing leg swing and quantifying cattle movement patterns to classify lameness. By extracting six features related to gait asymmetry, speed, tracking up, stance time, stride length, and tenderness, the motion curves were analyzed and found to be nearly linear and separable within the three lameness classes.
In50, the authors presented a pioneering method that harnessed depth imaging data to assess the variability of cattle body movements as an indicator of lameness. Their framework involved the development of an operational simulation model, integrating Monte Carlo simulation with prevalent probability distribution functions, including uniform, normal, Poisson, and Gamma distributions. By leveraging these techniques, the researchers were able to analyze the influence of key factors on cattle lameness status. A depth video camera-based system to detect cattle lameness from a top-view position is investigated in51. In their method, the authors extracted depth value sequences from the cattle body region and calculated the greatest value of the rear cattle area. By averaging the maximum height values in the cattle backbone area, they created a feature vector for lameness classification. The authors then employed a Support Vector Machine (SVM) to classify cattle lameness based on the computed average values. Their findings demonstrated the potential of using depth video cameras and SVM for efficient lameness detection.
The research communication52 explores the correlation between lameness occurrence and body condition score (BCS) by employing linear mixed-effects models to assess the relationship between BCS and lameness. The study revealed that the proportion of lame cattle increased both with decreasing and with increasing BCS. The likelihood of lameness was influenced by the number of lactations and decreased over time following the last claw trimming. This suggests the importance of adequate body condition in preventing lameness, while also raising questions about the impact of over-conditioning on lameness and the influence of claw trimming on lameness assessment. The system in53 uses computer vision and deep learning techniques to accurately analyze the posture and gait of each animal within the camera's field of view. The tracking of cattle as they move through the video sequence was performed using the SORT algorithm, and the features obtained from pose estimation and tracking were combined using the CatBoost gradient boosting algorithm. The system's accuracy was evaluated using threefold cross-validation, including recursive feature elimination. Agreement was assessed using Cohen's kappa coefficient, and precision and recall were also considered.
While the work of54 represents a pioneering use of deep learning-based gait reconstruction and anomaly detection for early lameness detection, leveraging the portability and real-time capabilities of wearable gait analysis to enhance animal welfare, our proposed system prioritizes a more natural approach: we aim to minimize stress on cattle and avoid altering their environment. This focus on natural interaction positions our system as a significant advancement for animal welfare and management practices in the dairy industry. This work utilizes Mask R-CNN, a popular instance segmentation algorithm implemented in Detectron2, and the state-of-the-art YOLOv8 for cattle detection. To optimize feature analysis for individual cattle across frames, we consider additional features and leverage cattle tracking with a simple and efficient IoU calculation and frame-holding logic. Our proposed cattle lameness detection methodology employs a three-point back curvature approach that integrates movement variances to calculate lameness levels. The “Methodology” section describes the specific technologies utilized in our proposed system.
Methodology
This study proposes a novel system for automatically calculating cattle lameness from farm video footage. As illustrated in Fig. 1, the system employs a five-stage pipeline: (1) Data Collection and Data Preprocessing, (2) Cattle Detection, (3) Cattle Tracking, (4) Feature Extraction, and (5) Cattle Lameness Classification. During the Data Collection and Data Preprocessing stage, videos are segmented into individual frames for subsequent analysis. Dedicated annotation tools are then employed to manually label frames, generating ground truth data for training the cattle detection model. The Detection stage leverages annotated data to train two object detection models for comparative analysis: a Mask R-CNN detector implemented within the Detectron2 framework27 and the state-of-the-art YOLOv8 model. Both models are tasked with localization and classification of cattle within each frame, generating bounding boxes, mask regions, and class labels for each detected animal.
Subsequently, the Tracking stage leverages the bounding box information to track individual cattle traveling within a defined area. Each tracked animal is assigned a unique Local ID for subsequent analysis. In the Feature Extraction stage, a diverse set of features (F1, F2, …, Fn) are calculated for each tracked individual. These features, carefully chosen to capture relevant movement characteristics, serve as input for the subsequent lameness classification stage. Finally, the Classification stage employs various machine learning algorithms to categorize cattle into three groups: early lameness levels (2 and 3) and no lameness (level 1). The classification results are stored daily, providing valuable information for monitoring herd health, and enabling early intervention for lameness management. Overall, this proposed system offers a novel approach for utilizing farm video data to gain valuable insights into cattle well-being by facilitating the early detection and management of lameness.
Data collection and data preprocessing
This study utilized a cattle dataset collected at the Hokuren Kunneppu Demonstration Farm in Hokkaido, Japan. The farm features two distinct cattle passing lanes leading from the Cattle Barn to the Milking Parlor, as shown in Fig. 2. Between these lanes lies a cattle waiting area where groups of seven or eight cattle await their turn to enter the milking parlor. To facilitate data collection, a single camera was strategically positioned at the starting point of each lane, enabling comprehensive monitoring and analysis of cattle behavior and movement along the pathways. Figure 3a,b depict Lane A and Lane B, respectively.
Data collection
This study utilized two AXIS P1448-LE 4K cameras (Fig. 4) strategically positioned at the starting points of the two cattle lanes connecting the cattle barn to the milking parlor (Fig. 5a,b). These cameras captured video recordings at a frame rate of 25 frames per second and an image resolution of 3840 × 2160. During preprocessing, a subset of images was extracted from the video at a reduced frame rate (13 fps) and resolution (1280 × 720) within the region of interest, optimizing data processing and resource allocation efficiency. The primary focus of the study was on Lane A, although Lane B was also present (Fig. 3). Camera recordings were conducted during specific time intervals (5 am–8 am and 2 pm–5 pm) to coincide with natural cattle movement patterns through the designated lanes. This approach ensured data collection occurred without additional handling or disruptions to the cattle's daily routine, minimizing stress and maintaining their well-being.
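A minimal sketch of this extraction step using OpenCV, assuming an illustrative input file name and output directory (the exact region-of-interest crop used in the study is not reproduced here):

```python
import os
import cv2

SRC_FPS, DST_FPS = 25, 13            # camera frame rate vs. reduced extraction rate
step = SRC_FPS / DST_FPS             # keep roughly every ~1.9th source frame

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("lane_a_morning.mp4")   # hypothetical recording file
idx, saved, next_pick = 0, 0, 0.0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx >= next_pick:             # subsample 25 fps down to ~13 fps
        frame = cv2.resize(frame, (1280, 720))  # downscale from 3840 x 2160
        cv2.imwrite(f"frames/frame_{saved:06d}.jpg", frame)
        saved += 1
        next_pick += step
    idx += 1
cap.release()
```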
Data preprocessing
For data preprocessing in cattle region detection, we employed VGG Image Annotator (VIA)55, a simple and standalone annotation tool for images. VIA facilitated manual annotation, allowing us to mark cattle regions using various shapes, such as box, polygon, and key points (skeleton), among others. In our study, we utilized the polygon shape to annotate cattle regions, where the boundaries of the cattle region were marked with pixel points. These grouped pixels were then labeled as "cattle," as depicted in Fig. 6a,b. Throughout the training process for cattle detection, we utilized a single class named "cattle." The use of VIA provided a lightweight and installation-free solution for efficient data annotation and preparation.
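VIA's polygon export can be converted into the dataset dictionaries that Detectron2 consumes; the sketch below follows the commonly used conversion pattern and assumes VIA's default `regions`/`shape_attributes` JSON layout (the file names and the fixed image size are illustrative):

```python
import json
from detectron2.structures import BoxMode

def load_via_dataset(json_file, img_dir):
    """Convert a VIA polygon-annotation export into Detectron2 dataset dicts."""
    with open(json_file) as f:
        via = json.load(f)
    records = []
    for idx, v in enumerate(via.values()):
        objs = []
        for region in v["regions"]:
            sa = region["shape_attributes"]
            px, py = sa["all_points_x"], sa["all_points_y"]
            poly = [c for xy in zip(px, py) for c in xy]  # flatten to [x1, y1, x2, y2, ...]
            objs.append({
                "bbox": [min(px), min(py), max(px), max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,            # single "cattle" class
            })
        records.append({
            "file_name": f"{img_dir}/{v['filename']}",
            "image_id": idx,
            "height": 720, "width": 1280,    # preprocessed frame size
            "annotations": objs,
        })
    return records
```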
To prepare the cattle detection training dataset, we manually annotated 5458 instances across three distinct dates: July 4th, September 30th, and November 19th. For the testing dataset, we collected 247,250 instances from January 3rd, 5th, 6th, 7th, 10th, 11th, 12th, 13th, 14th, 23rd, 24th, 25th, 26th, 27th, 28th, and 29th, as detailed in Table 1. The January dataset presented the most significant challenge for cattle detection due to the combined effects of the winter season and the camera setup environment. Specifically, the presence of smoke in the environment and the proximity of cattle seeking warmth during cooler temperatures hindered accurate detection. Recognizing the crucial role of data quality in model performance, we implemented a multi-step approach to optimize the dataset: (1) Duplicate removal: we identified and eliminated duplicate images, minimizing redundancy and alleviating unnecessary computational strain during training. (2) Blur mitigation: recognizing the detrimental impact of blurred images on detection accuracy, we filtered out any exhibiting noticeable blurring. (3) Noise reduction: images afflicted by excessive noise, such as pixelation or distortion, were excluded, ensuring the inclusion of noise-free images that bolster the model's robustness. (4) Relevant data extraction: as our focus was on cattle detection, images devoid of cattle passing lanes or lacking pertinent cattle-related information were excluded, guaranteeing the training dataset comprised solely of high-quality, diverse, and relevant imagery.
Table 1.
Type of training | Date | Duration | Number of instances
---|---|---|---
Training | 4th July 2022 | Morning, evening | 3000
Training | 30th September 2022 | Morning, evening | 480
Training | 19th November 2022 | Morning, evening | 1978
Testing | 3rd January 2023 | Morning, evening | 17,322
Testing | 5th January 2023 | Morning, evening | 22,194
Testing | 6th January 2023 | Morning | 5059
Testing | 7th January 2023 | Morning | 5106
Testing | 10th January 2023 | Morning, evening | 20,720
Testing | 11th January 2023 | Morning, evening | 11,257
Testing | 12th January 2023 | Morning, evening | 16,274
Testing | 13th January 2023 | Morning, evening | 24,538
Testing | 14th January 2023 | Morning | 6074
Testing | 23rd January 2023 | Morning, evening | 18,579
Testing | 24th January 2023 | Morning, evening | 23,684
Testing | 25th January 2023 | Morning, evening | 15,596
Testing | 26th January 2023 | Morning, evening | 14,998
Testing | 27th January 2023 | Morning, evening | 12,934
Testing | 28th January 2023 | Morning | 15,619
Testing | 29th January 2023 | Morning, evening | 17,296
Cattle detection
In recent years, object detection has gained significant attention in research, mainly due to advancements in machine learning and deep learning algorithms. Object detection involves precisely locating and categorizing objects of interest within images or videos; the positions and boundaries of the objects are identified and labeled accordingly. Present state-of-the-art object detection methods can be broadly classified into two main types: one-stage methods and two-stage methods. One-stage methods prioritize model speed and efficiency, making them well-suited for real-time applications; notable examples include the Single Shot MultiBox Detector (SSD)56, You Only Look Once (YOLO)57, and RetinaNet58. Two-stage methods, on the other hand, are more focused on achieving high accuracy and typically involve a preliminary region proposal step followed by a detailed classification step; prominent examples include Faster R-CNN59, Mask R-CNN60, and Cascade R-CNN61. The choice between one-stage and two-stage methods depends on the specific requirements of the application: one-stage methods are faster and more suitable for real-time scenarios, while two-stage methods offer improved accuracy at greater computational cost. As object detection continues to evolve, researchers and practitioners are exploring techniques to strike the right balance between speed and accuracy for various use cases.

In this study, we evaluated two state-of-the-art and widely used object detection algorithms within our research domain. The first is Mask R-CNN, a two-stage method implemented using the powerful Detectron2 framework. Mask R-CNN has gained popularity due to its robustness and flexibility in handling complex detection tasks, particularly where both bounding boxes and pixel-wise segmentation masks are required. The second algorithm is YOLOv8, currently one of the most widely adopted object detection models, known for its efficiency and real-time capabilities; it handles detection tasks with high accuracy while maintaining impressive speed. By comparing these two detection algorithms, we aimed to gain insights into their performance and applicability within our research domain and to determine which better suits our specific requirements.
Detectron2, developed by Facebook, is a state-of-the-art vision library designed to simplify the creation and utilization of object detection, instance segmentation, key point detection, and generalized segmentation models. Specifically, for object detection, Detectron2's library offers a variety of models, including RCNN, Mask R-CNN, and Faster R-CNN. RCNN generates region proposals, extracts fixed-length features from each candidate region, and performs object classification. However, this process can be slow due to the independent passing of CNN over each region of interest (ROI). Faster R-CNN architecture overcomes this limitation by incorporating the Region Proposal Network (RPN) and the Fast R-CNN detector stages. It obtains class labels and bounding boxes of objects effectively. Mask R-CNN, sharing the same two stages as Faster R-CNN, extends its capabilities by also generating class labels, bounding boxes, and masks for objects. Notably, Mask R-CNN demonstrates higher accuracy in cattle detection, as indicated by previous research. Therefore, the proposed system adopts Mask R-CNN to extract mask features specifically for cattle, as illustrated in Fig. 7. During the detection phase, the default predictor, COCO-Instance Segmentation, and Mask R-CNN with a 0.7 score threshold value (MODEL.ROI_HEADS.SCORE_THRESH_TEST) are utilized. The training process employs annotated images, while for testing, videos are fed into Detectron2 to obtain color mask, and binary mask images and detection results including bounding boxes, mask value of detected cattle region and cattle confidence score. The mask image is subsequently utilized in cattle detection calculations. Overall, by leveraging the capabilities of Mask R-CNN from Detectron2, the proposed system aims to achieve accurate and efficient cattle detection, offering valuable insights for cattle health monitoring and management.
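A minimal inference sketch with the Detectron2 API under these settings (the weights path is a placeholder for the custom-trained cattle model):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "output/model_final.pth"   # placeholder: custom cattle weights
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # single "cattle" class
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7    # score threshold used in this study
predictor = DefaultPredictor(cfg)

frame = cv2.imread("frames/frame_000000.jpg")
instances = predictor(frame)["instances"].to("cpu")
boxes = instances.pred_boxes.tensor.numpy()    # (N, 4) bounding boxes
masks = instances.pred_masks.numpy()           # (N, H, W) binary masks
scores = instances.scores.numpy()              # (N,) confidence scores
```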
Ultralytics YOLOv8 is a powerful and versatile object detection and instance segmentation model designed to be fast, accurate, and easy to use. It builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. It is applicable to a wide range of tasks, including object detection, tracking, instance segmentation, image classification, and pose estimation. Five pre-trained models of different sizes are available for instance segmentation: yolov8n (Nano), yolov8s (Small), yolov8m (Medium), yolov8l (Large), and yolov8x (Extra Large). Instance segmentation goes a step further than object detection by identifying individual objects in an image and segmenting them from the rest of the image; it is useful when you need to know not only where objects are, but also their exact shape. In this research, we employ the YOLOv8x-seg model for cattle detection and segmentation. Unlike traditional methods, we do not rely on manual annotation or custom training specific to cattle. Instead, YOLOv8x-seg is applied directly to the task of cattle detection, leveraging its advanced capabilities and pretrained features. This approach, as shown in Fig. 8, allows us to achieve accurate and efficient cattle detection without the need for labor-intensive annotation processes or specialized training. By utilizing the powerful segmentation capabilities of YOLOv8x-seg, we can effectively identify and delineate cattle regions in images or videos, contributing to the overall success of the study.
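A comparable sketch with the Ultralytics API; since no custom cattle training is used, detections can be restricted to COCO's "cow" category (index 19 in the 80-class list). The file name and confidence threshold here are illustrative:

```python
from ultralytics import YOLO

model = YOLO("yolov8x-seg.pt")       # pretrained COCO instance-segmentation weights

# Restrict predictions to COCO's "cow" class; threshold chosen for illustration.
results = model("frames/frame_000000.jpg", classes=[19], conf=0.7)

for r in results:
    boxes = r.boxes.xyxy.cpu().numpy()         # (N, 4) bounding boxes
    scores = r.boxes.conf.cpu().numpy()        # (N,) confidence scores
    if r.masks is not None:
        masks = r.masks.data.cpu().numpy()     # (N, h, w) instance masks
```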
Cattle tracking
In this study, the movement of cattle through individual lanes necessitates precise individual tracking to enable accurate lameness calculation, so ensuring accurate cattle identification and storage is of paramount importance. The dataset is relatively simple in that only one or two cattle traverse the lanes at a time. The system employs a designated region of interest, demarcated by the lines in Fig. 9, aligned with the camera settings. The tracking procedure features two vertical red lines within the frame: the left line represents the left threshold, and the corresponding right line serves as the right threshold. The amalgamation of computer vision, machine learning, and deep learning in recent years has yielded remarkable strides in tracking algorithms, leading to substantial improvements in accuracy.
In pursuit of cost-effectiveness and time efficiency, the research leveraged Intersection over Union (IoU) calculations for tracking, supplemented by individual ID lifetimes. This strategy, illustrated in Fig. 10, tackles challenges like missed detections and other detection intricacies; Table 2 explains the variables used in the tracking flowchart. The approach stands in contrast to conventional cattle tracking algorithms, offering simplicity, ease of manipulation, and minimal inference time. Tracking initiation occurs when a cattle's position precedes the left threshold and concludes when the right side of the cattle's bounding box surpasses the right threshold. Within this tracking phase, a cattle's Local ID is assigned before it traverses the left threshold, and the resulting Local ID is stored in the Temporal Database. This tracking method ensures accuracy and efficiency, thereby contributing to the study's comprehensive monitoring and analysis objectives.
Table 2.
Symbol | Description
---|---
f_t | Current frame
a_bbox | Aspect ratio of the bounding box
Th_a | Aspect ratio threshold
Th_left | Left boundary threshold
Th_right | Right boundary threshold
ID(f_t) | ID of the current frame
MAX_LIFE | Maximum holding lifetime of a frame
MIN_LIFE | Minimum holding lifetime of a frame
Lifetime_ID | Lifetime of an ID
The provided Algorithm 1 outlines a comprehensive approach to assigning Local IDs to detected cattle regions based on Bounding Boxes obtained from the Detectron2-Detection process. The algorithm focuses on accurate and reliable tracking of individual cattle as they pass through lanes. Beginning with calculating the aspect ratio of each bounding box, the algorithm applies specific conditions to determine the appropriate assignment of Local IDs. It considers factors such as the position of bounding boxes relative to defined thresholds and employs techniques like Intersection over Union (IoU) calculations to ensure consistency in the tracking process. Moreover, the algorithm introduces the concept of lifespan (LifetimeID) for each assigned ID, allowing for refined management of cattle tracking.
By combining these strategies, the algorithm contributes to enhancing the accuracy and effectiveness of cattle tracking, thus enabling the monitoring of cattle behavior and health in various contexts. The local ID associated with a cattle's tracking information is systematically managed to ensure accurate and up-to-date records. This process involves the removal of the local ID from the database under certain circumstances. Specifically, when the horizontal position (x2) of the bounding box aligns with the right threshold of the frames in Fig. 10, indicating that the cattle have completed their passage through the designated lane, the local ID is removed. Additionally, if cattle exit the lane or remain undetected for ten consecutive frames within the designated Region of Interest (ROI), the local ID is also removed from the database. Careful management of local IDs ensures that the tracking records remain aligned with the real-time movements and presence of cattle within the monitored area, contributing to accurate and reliable tracking outcomes.
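A condensed sketch of this IoU-plus-lifetime logic is given below. It is an illustrative reconstruction rather than the exact Algorithm 1: the threshold values are placeholders, and the entrance gate assigns a new Local ID only when a box first appears left of the left threshold.

```python
import itertools

def iou(a, b):
    """Intersection-over-Union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

TH_LEFT, TH_RIGHT = 200, 1100   # placeholder pixel positions of the ROI gate lines
MAX_LIFE = 10                   # frames a track survives without a matching detection
_next_id = itertools.count(1)
tracks = {}                     # local_id -> {"box": ..., "life": ...}

def update_tracks(detections, iou_thresh=0.5):
    """Match detections to existing tracks by IoU; gate new IDs at the left line."""
    for box in detections:
        matches = [(iou(t["box"], box), lid) for lid, t in tracks.items()]
        best_iou, best_id = max(matches, default=(0.0, None))
        if best_iou >= iou_thresh:                 # continue an existing track
            tracks[best_id].update(box=box, life=MAX_LIFE)
        elif box[0] < TH_LEFT:                     # entrance gate: new animal enters
            tracks[next(_next_id)] = {"box": box, "life": MAX_LIFE}
    for lid in list(tracks):                       # lifetime decay and removal
        t = tracks[lid]
        t["life"] -= 1
        if t["box"][2] > TH_RIGHT or t["life"] <= 0:
            del tracks[lid]                        # passage complete or 10 missed frames
```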
Upon the culmination of the tracking process, the cattle are systematically organized based on their unique cattle IDs, creating distinct folders as depicted in Fig. 11. Each of these folders contains a collection of essential images and binary masks. These include binary images that accentuate the delineation of the cattle, color mask images that highlight specific features, original binary masks that capture the raw attributes, and the unaltered original images of the cattle. This meticulous organization ensures that each cattle's data is readily accessible and preserved, contributing to the efficient analysis and retrieval of pertinent information for further research and examination.
Feature extraction
A sequence of frames [f_{t-n}, …, f_{t-1}, f_t, f_{t+1}, …, f_{t+n}] for each identified animal was captured by the camera. Using frame f_{t-n} might be suitable for extracting Feature 1 (F1), but for the other features it could present problems. The frame f_t shown in Fig. 12 was the optimal choice for extracting all features, although there may be instances where the cattle's head overlaps with its body. Utilizing features from all frames could result in some regions of the cattle not contributing useful information for lameness classification. Following the completion of cattle detection and tracking, individual cattle were sorted by their respective local IDs. From these sorted cattle, binary masks representing the cattle regions are obtained.
These binary masks played a pivotal role in extracting pertinent features for the subsequent lameness classification process. Figure 11 offers a visual representation of sample binary mask images, each associated with a different lameness score. The distinctive characteristic lies in the shape of the cattle's back: while normal cattle exhibit a flat back structure, lame cattle display varying degrees of an arched back, ranging from mild to prominently arched. To extract meaningful features, the binary mask video sequence was initially subjected to a labelling process to identify the cattle regions. From these binary images, two key features were extracted42. The first feature involves measuring the vertical distance from the top border of the image to the head of the cattle region. This feature holds significance in identifying lame cattle, as their head movement during walking tends to be either subtle or visibly pronounced. This process is depicted in Fig. 13a. As expressed in Eq. (3), the raw head-distance value varies with the height of each animal; to compensate for this, the distance is divided by the individual cattle's height.
According to the paper42, the second feature (F2) involves calculating the area of the cattle's back and the inclined head region. To compute this feature, the upper portion of the cattle region, spanning 10 percent of the frame height (H) from the top edge, is cropped. The area ratio between the black region and the cattle object region is then determined using Eq. (4); the visual depiction of this process is presented in Fig. 13b. For the third feature (F3), the study employs an innovative feature extraction technique that utilizes three distinct points along the curvature of the cattle's back. These key points are strategically located on the back, in the middle of the body, and on the neck. The objective of this technique is to construct a feature vector, as defined by Eq. (5), computed from a cropped binary image of the cattle's anatomical region, as shown in Fig. 13c.
$$F_1 = \frac{d_{head}}{h_{cattle}} \tag{3}$$

$$F_2 = \frac{A_{black}}{A_{cattle}} \tag{4}$$

$$F_3 = \left[\, y_{P_1},\ y_{P_2},\ y_{P_3} \,\right] \tag{5}$$

$$F_4 = \left[\, \frac{y_{P_2}-y_{P_1}}{x_{P_2}-x_{P_1}},\ \frac{y_{P_3}-y_{P_2}}{x_{P_3}-x_{P_2}} \,\right] \tag{6}$$

where $d_{head}$ denotes the vertical distance from the top border of the image to the head of the cattle region, $h_{cattle}$ the height of the individual animal, $A_{black}$ and $A_{cattle}$ the areas of the black region and the cattle object region within the cropped strip, and $(x_{P_i}, y_{P_i})$ the coordinates of the three back-curvature points $P_1$, $P_2$, and $P_3$ defined below.
In this study, we carefully identified three specific points on the curvature of the cattle's back within the cropped binary image. These points are designated as follows: Point 1 (P1), located at 15% of the cropped binary image width (W), corresponds to the back of the cattle; Point 2 (P2), located at the midpoint (50%) of the width (W), represents the midriff or midsection of the cattle; and Point 3 (P3), situated at 85% of the cattle's length within the width (W), signifies the head of the cattle. By employing this method, we aim to extract valuable information from the curvature of the cattle's back, which ultimately contributes to the feature vector and facilitates the analysis of the cattle features. A further enhancement in this study, Feature 4 (F4), illustrated in Fig. 13d, is the calculation of two different gradients associated with the three points along the curvature of the cattle's back, as described in Eq. (6). This enhancement was introduced to further strengthen the analysis of cattle curvature; incorporating information on specific points and gradients has the potential to increase the accuracy and comprehensiveness of the study results. The strategic placement of these three points serves an important purpose: each is placed to address a specific area of interest, facilitating targeted analysis and accurate feature extraction. This approach considers not only the spatial distribution of the points but also the gradients that characterize the transitions between them, and is expected to provide more nuanced insights into cattle curvature, thereby contributing to a more refined understanding of cattle anatomy.
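A sketch of how these three points and the two gradients of Eq. (6) might be extracted from a cropped binary mask (assuming cattle pixels are nonzero and the image y-axis points downward):

```python
import numpy as np

def back_points_and_gradients(mask, ratios=(0.15, 0.50, 0.85)):
    """Locate P1, P2, P3 along the top of the back line of a cropped binary
    cattle mask and return the two gradients between consecutive points."""
    cols = np.flatnonzero(mask.any(axis=0))        # columns containing cattle pixels
    points = []
    for r in ratios:
        x = cols[int(r * (len(cols) - 1))]         # 15%, 50%, 85% of the cattle width
        y = np.flatnonzero(mask[:, x])[0]          # topmost cattle pixel = back line
        points.append((x, y))
    (x1, y1), (x2, y2), (x3, y3) = points
    g1 = (y2 - y1) / (x2 - x1)                     # back-to-midriff gradient
    g2 = (y3 - y2) / (x3 - x2)                     # midriff-to-head gradient
    return points, (g1, g2)
```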
Cattle lameness classification
In the lameness classification phase, because the system requires a correct ground-truth dataset, we engaged a group of cattle experts to manually check all the cattle on the farm, recording each animal's global identification and lameness level, sorted by the time at which it passed through the specific lane. The distribution of lameness levels recorded by the experts is shown in Table 3.
Table 3.
Lameness level | Lame 1 | Lame 2 | Lame 3 | Lame 4 |
---|---|---|---|---|
No of cattle | 84 | 24 | 7 | 1 |
Performance evaluation methods
$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

where TP = true positive, FP = false positive, and FN = false negative; precision measures the percentage of correct positive predictions among all positive predictions made, and recall measures the percentage of correct positive predictions among all actual positive cases.
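For instance, these metrics can be computed directly from the instance counts reported in the results tables below:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw instance counts (Eqs. 7-8)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example with the 5th January 2023 (M) row of Table 6:
p, r = precision_recall(tp=13552, fp=56, fn=23)
print(f"precision={p:.4f}, recall={r:.4f}")   # ~0.9959 and ~0.9983, matching the table
```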
Ethics declarations
Ethical review and approval were waived for this study because no enforced or uncomfortable restrictions were imposed on the animals during the study period. The image data used for analysis in this study were collected by installed cameras without disturbing the natural behavior of the animals or the routine management of the farm.
Experimental results
In the areas of computer vision and machine learning, Python was the primary programming language; the PyTorch framework was employed to build the deep learning components, including the Detectron2 instance segmentation model; and scikit-learn was used to implement the various classifiers for lameness classification. Additionally, data preprocessing tasks were performed using OpenCV, while data analysis and visualization were performed using NumPy, Pandas, and Matplotlib. This study covers cattle detection, tracking, and lameness classification as separate processes, and their performance metrics and results are reported as follows.
Cattle detection
During the cattle detection stage, our system leveraged an annotation dataset generated as cattle passed through the lane following the milking process, which occurs twice daily for each animal. To facilitate the training of cattle detection, we specifically selected three different days from the dataset provided in Table 1. From the annotated images, we randomly extracted 3176 images (80%) for training purposes and reserved 794 images (20%) for validation; the corresponding instance counts are given in Table 4. For the cattle detection model, we opted for the Mask R-CNN architecture with the R-101-FPN-3x configuration from COCO-Instance Segmentation. This choice was made to ensure robust segmentation and accurate identification of cattle instances within the images. The application of this algorithm resulted in the extraction of both cattle mask regions and corresponding bounding boxes, as demonstrated in Fig. 7. This approach allowed us to harness the power of deep learning and convolutional neural networks to detect cattle instances efficiently and effectively, aiding the subsequent stages of our research aimed at cattle tracking and lameness classification.
Table 4.
#Instances (100%) | #Train instances (80%) | #Validation instances (20%)
---|---|---
5458 | 4366 | 1092
In our evaluation process, we assess the performance of the detection model using three key types of Average Precision (AP) values, namely AP, AP50, and AP75, as outlined in Table 5. These metrics offer insights into the accuracy of both box and mask predictions. To ensure a comprehensive evaluation, we calculate the average precision over all Intersection over Union (IoU) thresholds (AP). Additionally, we focus on the AP values at IoU thresholds of 0.5 (AP50) and 0.75 (AP75), adhering to the COCO standard. These metrics provide a robust understanding of the model's ability to accurately predict cattle instances, catering to different levels of precision and IoU thresholds. In Table 6, the accuracy of the testing detection results is presented for both morning and evening data spanning January. The aggregated testing results reveal a commendable overall accuracy of 98.98%.
Table 5.
BoxAP | BoxAP50 | BoxAP75 | MaskAP | MaskAP50 | MaskAP75 |
---|---|---|---|---|---|
94.31 | 99.53 | 99.534 | 87.75 | 99.53 | 99.53 |
Table 6.
Date | #Instances | #Correct instances (TP) | #Incorrect instances (FP) | #Miss instances (FN) | Recall (%) | Precision (%) | Accuracy (%) |
---|---|---|---|---|---|---|---|
3rd January 2023 (M) | 9247 | 8855 | 392 | 0 | 100.00 | 95.76 | 95.76 |
3rd January 2023 (E) | 8075 | 8000 | 75 | 0 | 100.00 | 99.07 | 99.07 |
5th January 2023 (M) | 13,631 | 13,552 | 56 | 23 | 99.83 | 99.59 | 99.59 |
5th January 2023 (E) | 8563 | 8511 | 35 | 17 | 99.80 | 99.59 | 99.59 |
6th January 2023 (M) | 5059 | 4982 | 77 | 0 | 100.00 | 98.48 | 98.48 |
7th January 2023 (M) | 5106 | 5106 | 0 | 0 | 100.00 | 100.00 | 100.00 |
10th January 2023 (M) | 10,418 | 10,097 | 321 | 0 | 100.00 | 96.92 | 96.92 |
10th January 2023 (E) | 10,302 | 10,228 | 74 | 0 | 100.00 | 99.28 | 99.28 |
11th January 2023 (M) | 4476 | 4408 | 68 | 0 | 100.00 | 98.48 | 98.48
11th January 2023 (E) | 6781 | 6770 | 11 | 0 | 100.00 | 99.84 | 99.84
12th January 2023 (M) | 5422 | 5405 | 17 | 0 | 100.00 | 99.69 | 99.69
12th January 2023 (E) | 10,852 | 10,814 | 38 | 0 | 100.00 | 99.65 | 99.65
13th January 2023 (M) | 9170 | 9001 | 169 | 0 | 100.00 | 98.16 | 98.16
13th January 2023 (E) | 15,368 | 14,797 | 571 | 0 | 100.00 | 96.28 | 96.28
14th January 2023 (M) | 6074 | 6012 | 62 | 0 | 100.00 | 98.98 | 98.98 |
23rd January 2023 (M) | 6668 | 6611 | 57 | 0 | 100.00 | 99.15 | 99.15 |
23rd January 2023 (E) | 11,911 | 11,867 | 44 | 0 | 100.00 | 99.63 | 99.63 |
24th January 2023 (M) | 12,546 | 12,450 | 96 | 0 | 100.00 | 99.23 | 99.23 |
24th January 2023 (E) | 11,138 | 10,996 | 142 | 0 | 100.00 | 98.73 | 98.73 |
25th January 2023 (M) | 7624 | 7587 | 37 | 0 | 100.00 | 99.51 | 99.51 |
25th January 2023 (E) | 7972 | 7940 | 32 | 0 | 100.00 | 99.60 | 99.60 |
26th January 2023 (M) | 6031 | 5963 | 68 | 0 | 100.00 | 98.87 | 98.87 |
26th January 2023 (E) | 8967 | 8942 | 25 | 0 | 100.00 | 99.72 | 99.72 |
27th January 2023 (M) | 11,064 | 11,032 | 32 | 0 | 100.00 | 99.71 | 99.71 |
27th January 2023 (E) | 1870 | 1864 | 6 | 0 | 100.00 | 99.68 | 99.68 |
28th January 2023 (M) | 15,619 | 15,422 | 197 | 0 | 100.00 | 98.74 | 98.74 |
29th January 2023 (M) | 8168 | 8168 | 0 | 0 | 100.00 | 100.00 | 100.00 |
29th January 2023 (E) | 9128 | 9065 | 63 | 0 | 100.00 | 99.31 | 99.31 |
Real-world cattle detection scenarios can present challenges, particularly in low-light environments. One such challenge arises when cattle are positioned close together, leading to overlapping instances. Standard segmentation algorithms may struggle to differentiate between individual cattle in these scenarios, producing a single merged region instead of two distinct ones, as illustrated in Fig. 14a,b. This merged region can lead to two potential errors: the system might misinterpret the merged region as an abnormally large animal, leading to an incorrect lameness detection; or, if the merged region's width falls below the minimum size threshold established for individual cattle, it could be excluded entirely, resulting in missed detections of individual cattle and potential lameness cases. To address the challenge of overlapping cattle and improve the accuracy of our system, we leverage minimum and maximum widths for cattle regions. We establish minimum and maximum width values based on prior analysis of cattle dimensions within our training dataset (described in Table 1); this analysis allows us to determine realistic cattle sizes within the specific breed and environment represented by the dataset. After the initial cattle region detection stage, the width of each detected region is analyzed, and cattle regions exceeding the maximum width threshold are discarded as potential errors caused by overlapping cattle. This eliminates unrealistically large, merged regions that likely represent multiple cattle.
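A sketch of this width-based plausibility filter (the threshold values are placeholders; in the study they were derived from the training dataset of Table 1):

```python
# Placeholder width bounds (pixels), derived in the study from training data.
MIN_CATTLE_WIDTH, MAX_CATTLE_WIDTH = 150, 600

def filter_by_width(boxes):
    """Discard detections whose width is implausible for a single animal,
    e.g. merged regions produced by overlapping cattle."""
    kept = []
    for (x1, y1, x2, y2) in boxes:
        width = x2 - x1
        if MIN_CATTLE_WIDTH <= width <= MAX_CATTLE_WIDTH:
            kept.append((x1, y1, x2, y2))
    return kept
```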
We evaluated the cattle detection component of our system using data from another farm, "Sumiyoshi." Testing across different farms demonstrates the system's ability to handle variations in cattle appearance and farm environment. Figure 15a showcases cattle detection results on our dataset, and Fig. 15b highlights results on the dataset from the "Sumiyoshi" farm, demonstrating the system's generalizability to unseen variations. This high accuracy level underscores the effectiveness and reliability of the system's detection capabilities across various instances and time frames. When comparing the performance of our main cattle detection algorithm with YOLOv8, currently the most popular real-time object detection and image segmentation model, on the 14 March 2023 evening dataset (Table 7), we found that YOLOv8 achieved a higher accuracy of 99.92% compared to our custom detection algorithm. However, there were instances of missed detection (false negatives) in some frames. On the other hand, the Detectron2 detector had no missed detections, but there were some instances of wrongly detecting humans as cattle.
Table 7.
Model | #Instances | #Correct (TP) | #Wrong (FP) | (TN) | (FN) | Recall (%) | Precision (%) | Accuracy (%) |
---|---|---|---|---|---|---|---|---|
YOLOv8x | 15,305 | 15,224 | 12 | 0 | 69 | 99.55 | 99.92 | 99.92
Detectron2 | 15,305 | 14,466 | 839 | 0 | 0 | 100.00 | 94.52 | 94.52
Therefore, we recommend using our main detection algorithm, Detectron2, as it allows wrongly detected human regions to be removed by calculating the area of the detected object and comparing it to a constant threshold. Some of the detection results are shown in Fig. 16: detection using Detectron2 is depicted in Fig. 16a, and detection using YOLOv8x in Fig. 16b.
Cattle tracking
The tracking process capitalizes on the bounding boxes created during the detection phase. This phase is of paramount importance as it pertains to the segregation of cattle based on their distinct Local IDs, a process illustrated in Fig. 10. Once this is accomplished, each frame is preserved and stored in a separate folder, a step that is crucial for the subsequent lameness computations, as depicted in Fig. 11. Table 8 provides an in-depth analysis of the cattle tracking outcomes for the comprehensive dataset compiled in January. Our Customized Tracking Algorithm (CTA) emerges as an efficient solution, providing a compelling alternative to other prevalent tracking algorithms in terms of both runtime and complexity; its detailed workings are outlined in Algorithm 1. We optimized the CTA to effectively address and overcome the various challenges associated with tracking. The results of these efforts are evident in its performance: it achieved an accuracy rate of 100.00% in most cases, and its average accuracy on the January dataset was 99.5%. On an average day, approximately 58 cattle traverse the lane that connects the milking parlor to the cattle barn; this daily movement is an important aspect of our tracking and data collection process.
Table 8.
Date | #Cattle | #Correct (TP) | #Wrong (FP) | Accuracy (%) |
---|---|---|---|---|
3rd January 2023 (M) | 62 | 62 | 0 | 100.00 |
3rd January 2023 (E) | 56 | 56 | 0 | 100.00 |
5th January 2023 (M) | 64 | 63 | 1 | 98.44 |
5th January 2023 (E) | 65 | 64 | 1 | 98.46 |
6th January 2023 (M) | 64 | 64 | 0 | 100.00 |
7th January 2023 (M) | 42 | 41 | 1 | 97.62 |
10th January 2023 (M) | 56 | 55 | 1 | 98.21 |
10th January 2023 (E) | 56 | 56 | 0 | 100.00 |
11th January 2023 (M) | 56 | 56 | 0 | 100.00
11th January 2023 (E) | 63 | 63 | 0 | 100.00
12th January 2023 (M) | 56 | 56 | 0 | 100.00
12th January 2023 (E) | 58 | 57 | 1 | 98.28
13th January 2023 (M) | 66 | 65 | 1 | 98.48
13th January 2023 (E) | 54 | 54 | 0 | 100.00
14th January 2023 (M) | 64 | 64 | 0 | 100.00 |
23rd January 2023 (M) | 63 | 63 | 0 | 100.00 |
23rd January 2023 (E) | 64 | 64 | 0 | 100.00 |
24th January 2023 (M) | 55 | 54 | 1 | 98.18 |
24th January 2023 (E) | 56 | 56 | 0 | 100.00 |
25th January 2023 (M) | 58 | 58 | 0 | 100.00 |
25th January 2023 (E) | 58 | 58 | 0 | 100.00 |
26th January 2023 (M) | 52 | 52 | 0 | 100.00 |
26th January 2023 (E) | 62 | 62 | 0 | 100.00 |
27th January 2023 (M) | 27 | 27 | 0 | 100.00 |
27th January 2023 (E) | 62 | 62 | 0 | 100.00 |
28th January 2023 (M) | 41 | 41 | 0 | 100.00 |
29th January 2023 (M) | 62 | 61 | 1 | 98.38 |
29th January 2023 (E) | 62 | 62 | 0 | 100.00 |
Feature extraction
Ground truth data, collected from farm evaluations by a lameness expert, links each cow's lameness level to its unique ear tag ID. This data serves as a benchmark for evaluating our proposed model's performance. Table 9 details the distribution of lameness levels within the cattle population. As shown, 84 cattle are classified as level 1 (no lameness), while levels 2 and 3 represent stages of early lameness. Notably, our system identified only one cow exhibiting level 4 lameness. We extracted four features [F1, F2, F3, F4] using the extraction methods of Eqs. (3)–(6). We created a cattle lameness classification dataset using over 1000 images, as shown in Table 10. Of these, 690 images are used for training and 341 images for testing, in order to assess the efficiency and effectiveness of the extracted features in lameness classification. In this dataset, 694 frames are labeled as "no lameness" and 337 frames as "lameness".
Table 9.
Lameness level | Lame 1 | Lame 2 | Lame 3 | Lame 4 |
---|---|---|---|---|
No of cattle | 84 | 24 | 7 | 1 |
Table 10.
#Total images | #Training images | #Testing images |
---|---|---|
1031 | 690 | 341 |
Selecting and evaluating informative features presents a significant challenge for our proposed system. Each feature extracted from the dataset (Table 10) is evaluated using a Support Vector Machine (SVM) classifier. The training set classification results are presented in Fig. 17a–d. As observed, each individual feature demonstrates some level of effectiveness for lameness classification. Notably, Fig. 17e demonstrates that combining all features into a feature vector yields superior performance compared to using individual features. This combined feature approach is further validated on the testing dataset using the SVM classifier, as shown in Fig. 18a–d for individual features and Fig. 18e for the combined feature vector. The results confirm that utilizing all features collectively leads to a more efficient classification process compared to relying on individual features. Building upon the feature analysis, the subsequent section on “Cattle Lameness Classification” will employ all four features (F1, F2, F3, and F4) combined into a feature vector.
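As a minimal sketch of this evaluation with scikit-learn, using randomly generated stand-in data (the real per-frame feature vectors come from Eqs. (3)-(6); the 7-column width assumes one value each for F1 and F2, three for F3, and two for F4):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)       # stand-in data for illustration only
X_train = rng.normal(size=(690, 7))  # 690 training frames (Table 10)
y_train = rng.integers(0, 2, 690)    # 0 = no lameness, 1 = lameness
X_test = rng.normal(size=(341, 7))   # 341 testing frames (Table 10)
y_test = rng.integers(0, 2, 341)

clf = SVC(kernel="rbf")              # SVM on the combined feature vector
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```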
Cattle lameness classification
Traditional lameness assessment often relies on subjective visual inspection by experts, a process susceptible to inconsistency and delay. Machine learning (ML) algorithms offer a compelling alternative by leveraging features extracted from video recordings to analyze cattle movement patterns objectively. This study evaluates five established ML algorithms for automated cattle lameness classification based on image-derived features: Support Vector Machines (SVM), Random Forests (RF), Decision Trees (DT), AdaBoost, and Stochastic Gradient Descent (SGD).
We prioritized optimizing the classification process for real-time application and reliability. The proposed system was evaluated on the January dataset (Table 11). AdaBoost achieved the highest overall average accuracy (77.9%), followed by DT (75.32%), SVM (75.20%), and Random Forest (74.9%). While AdaBoost, SVM, DT, and RF each exceeded 80% accuracy on specific dates within the January dataset, AdaBoost delivered the best average accuracy across the entire dataset. SGD performed markedly worse, never exceeding 59% accuracy on any session. A minimal training sketch follows Table 11.
Table 11. Lameness classification accuracy (%) of each classifier per session in the January dataset.
Date | SVM | RF | DT | AdaBoost | SGD |
---|---|---|---|---|---|
3rd January 2023 (M) | 75.81 | 67.74 | 70.97 | 80.65 | 45.16 |
3rd January 2023 (E) | 78.57 | 76.79 | 78.57 | 82.14 | 51.79 |
5th January 2023 (M) | 71.88 | 71.88 | 60.94 | 73.44 | 42.19 |
5th January 2023 (E) | 78.46 | 81.54 | 83.08 | 81.54 | 41.54 |
6th January 2023 (M) | 78.12 | 68.75 | 76.56 | 73.44 | 40.62 |
7th January 2023 (M) | 71.43 | 69.05 | 59.52 | 69.05 | 42.86 |
10th January 2023 (M) | 83.93 | 78.57 | 85.71 | 80.36 | 50.00 |
10th January 2023 (E) | 80.36 | 71.43 | 75.00 | 78.57 | 51.79 |
11th January 2023 (M) | 69.64 | 67.86 | 76.79 | 75.00 | 44.64 |
11th January 2023 (E) | 71.43 | 73.21 | 71.43 | 76.79 | 51.79 |
12th January 2023 (M) | 71.43 | 79.37 | 79.37 | 79.37 | 46.03 |
12th January 2023 (E) | 80.36 | 80.36 | 78.57 | 80.36 | 51.79 |
23rd January 2023 (M) | 71.43 | 76.19 | 80.95 | 80.95 | 50.79 |
23rd January 2023 (E) | 75.00 | 78.12 | 76.56 | 75.00 | 50.00 |
24th January 2023 (M) | 71.70 | 73.58 | 77.36 | 77.36 | 56.60 |
24th January 2023 (E) | 67.86 | 73.21 | 67.86 | 73.21 | 58.93 |
25th January 2023 (M) | 73.21 | 71.43 | 76.79 | 78.57 | 35.71 |
25th January 2023 (E) | 81.36 | 81.36 | 76.27 | 79.66 | 47.46 |
26th January 2023 (E) | 76.92 | 82.69 | 78.85 | 84.62 | 51.92 |
Conclusion
This study presents a novel cattle detection and tracking approach leveraging the capabilities of the Detectron2 model, evaluated against the state-of-the-art (SOTA) YOLOv8 model. Our system demonstrates strong performance in cattle region detection, further enhanced by Intersection over Union (IoU) calculations with frame holding for robust and accurate tracking; on the validation dataset, it achieves accuracies exceeding 90% for both detection and tracking. The core objective of this study is to explore lameness-inducing factors and methods for early detection. Prior to lameness classification, the system uses instance segmentation algorithms to detect cattle regions and extract mask values, which serve as the basis for feature extraction, integrating image processing techniques with deep learning methods. The extracted features (F1, F2, F3, F4) are combined into feature vectors and classified using established machine learning algorithms, including SVM, RF, DT, AdaBoost, and SGD; AdaBoost achieved the best average accuracy of 77.9% on the January testing dataset. Future work will leverage diverse deep learning methodologies to refine the system's accuracy and robustness, address challenges such as accurately identifying and integrating overlapping cattle instances across detection lanes, and integrate cutting-edge deep learning algorithms for lameness classification, followed by a comprehensive comparative analysis.
Acknowledgements
We would like to thank the staff of the Kunneppu Demonstration Farm for generously accommodating our on-farm study and for their valuable advice.
Author contributions
Conceptualization, B.B.M., T.T.Z. and P.T.; methodology, B.B.M., T.T.Z. and P.T.; software, B.B.M.; investigation, B.B.M., T.O., M.A., I.K., P.T. and T.T.Z.; resources, T.T.Z.; data curation, B.B.M., T.O. and M.A.; writing—original draft preparation, B.B.M.; writing—review and editing, T.T.Z. and P.T.; visualization, B.B.M., T.T.Z. and P.T.; supervision, T.T.Z.; project administration, T.T.Z. All authors reviewed the manuscript.
Funding
This publication was subsidized by JKA through its promotion funds from KEIRIN RACE. This work was supported in part by “The Development and demonstration for the realization of problem-solving local 5G” from the Ministry of Internal Affairs and Communications and the Project of “the On-farm Demonstration Trials of Smart Agriculture” from the Ministry of Agriculture, Forestry and Fisheries (funding agency: NARO).
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Zheng Z, Zhang X, Qin L, Yue S, Zeng P. Cows' legs tracking and lameness detection in dairy cattle using video analysis and Siamese neural networks. Comput. Electron. Agric. 2023;205:107618. doi: 10.1016/j.compag.2023.107618.
- 2. Beggs DS, et al. Lame cows on Australian dairy farms: A comparison of farmer-identified lameness and formal lameness scoring, and the position of lame cows within the milking order. J. Dairy Sci. 2019;102(2):1522–1529. doi: 10.3168/jds.2018-14847.
- 3. Thompson AJ, et al. Lameness and lying behavior in grazing dairy cows. J. Dairy Sci. 2019;102(7):6373–6382. doi: 10.3168/jds.2018-15717.
- 4. Grimm K, et al. New insights into the association between lameness, behavior, and performance in Simmental cows. J. Dairy Sci. 2019;102(3):2453–2468. doi: 10.3168/jds.2018-15035.
- 5. Oehm AW, et al. A systematic review and meta-analyses of risk factors associated with lameness in dairy cows. BMC Vet. Res. 2019;15(1):1–14. doi: 10.1186/s12917-019-2095-2.
- 6. Jiang B, Song H, He D. Lameness detection of dairy cows based on a double normal background statistical model. Comput. Electron. Agric. 2019;158:140–149. doi: 10.1016/j.compag.2019.01.025.
- 7. Randall LV, et al. The contribution of previous lameness events and body condition score to the occurrence of lameness in dairy herds: A study of 2 herds. J. Dairy Sci. 2018;101(2):1311–1324. doi: 10.3168/jds.2017-13439.
- 8. Horseman SV, et al. The use of in-depth interviews to understand the process of treating lame dairy cows from the farmers' perspective. Anim. Welf. 2014;23(2):157–165. doi: 10.7120/09627286.23.2.157.
- 9. Schlageter-Tello A, Bokkers E, Groot Koerkamp P, Van Hertem T, Viazzi S, Romanini C, Lokhorst K. Comparison of locomotion scoring for dairy cows by experienced and inexperienced raters using live or video observation methods. Anim. Welf. 2015;24(1):69–79. doi: 10.7120/09627286.24.1.069.
- 10. Chapinal N, de Passillé AM, Pastell M, Hänninen L, Munksgaard L, Rushen J. Measurement of acceleration while walking as an automated method for gait assessment in dairy cattle. J. Dairy Sci. 2011;94(6):2895–2901. doi: 10.3168/jds.2010-3882.
- 11. Bruijnis M, Beerda B, Hogeveen H, Stassen EN. Assessing the welfare impact of foot disorders in dairy cattle by a modeling approach. Animal. 2012;6:962–970. doi: 10.1017/S1751731111002606.
- 12. Ouared K, Zidane K, Aggad H, Niar A. Impact of clinical lameness on the milk yield of dairy cows. J. Anim. Vet. Adv. 2015;14:10–12.
- 13. Morris MJ, Kaneko K, Walker SL, Jones DN, Routly JE, Smith RF, Dobson H. Influence of lameness on follicular growth, ovulation, reproductive hormone concentrations and estrus behavior in dairy cows. Theriogenology. 2011;76:658–668. doi: 10.1016/j.theriogenology.2011.03.019.
- 14. Flower FC, Weary DM. Effect of hoof pathologies on subjective assessments of dairy cow gait. J. Dairy Sci. 2006;89:139–146. doi: 10.3168/jds.S0022-0302(06)72077-X.
- 15. Thomsen PT, Munksgaard L, Togersen FA. Evaluation of a lameness scoring system for dairy cows. J. Dairy Sci. 2008;91:119–126. doi: 10.3168/jds.2007-0496.
- 16. Berckmans D. Image based separation of dairy cows for automatic lameness detection with a real time vision system. In: American Society of Agricultural and Biological Engineers Annual International Meeting 2009, ASABE 2009, Vol. 7, pp. 4763–4773. doi: 10.13031/2013.27258 (2009).
- 17. Greenough PR. Bovine Laminitis and Lameness, a Hands-on Approach. In: Bergsten C, Brizzi A, Mülling C, editors. Saunders. Amsterdam, The Netherlands: Elsevier; 2007.
- 18. Pluk A, Bahr C, Leroy T, Poursaberi A, Song X, Vranken E, Maertens W, Van Nuffel A, Berckmans D. Evaluation of step overlap as an automatic measure in dairy cow locomotion. Trans. ASABE. 2010;53:1305–1312. doi: 10.13031/2013.32580.
- 19. Poursaberi A, Bahr C, Pluk A, Van Nuffel A, Berckmans D. Real-time automatic lameness detection based on back posture extraction in dairy cattle: Shape analysis of cow with image processing techniques. Comput. Electron. Agric. 2010;74:110–119. doi: 10.1016/j.compag.2010.07.004.
- 20. Viazzi S, Bahr C, Schlageter-Tello A, Van Hertem T, Romanini CEB, Pluk A, Halachmi I, Lokhorst C, Berckmans D. Analysis of individual classification of lameness using automatic measurement of back posture in dairy cattle. J. Dairy Sci. 2013;96:257–266. doi: 10.3168/jds.2012-5806.
- 21. Wadsworth BA, Mayo LM, Tsai IC, Stone AE, Ray DR, Clark JD, Bewley JM. Behavioral comparisons among lame versus sound cattle using precision technologies. In: Precision Dairy Farming Conference. Leeuwarden, The Netherlands; 2016.
- 22. Chapinal N, de Passillé A, Weary D, von Keyserlingk M, Rushen J. Using gait score, walking speed, and lying behavior to detect hoof lesions in dairy cows. J. Dairy Sci. 2009;92(9):4365–4374. doi: 10.3168/jds.2009-2115.
- 23. Van Nuffel A, Zwertvaegher I, Pluym L, Van Weyenberg S, Thorup VM, Pastell M, Sonck B, Saeys W. Lameness detection in dairy cows: Part 1. How to distinguish between non-lame and lame cows based on differences in locomotion or behavior. Animals. 2015;5(3):838–860. doi: 10.3390/ani5030387.
- 24. Alsaaod M, Römer C, Kleinmanns J, Hendriksen K, Rose-Meierhöfer S, Plümer L, Büscher W. Electronic detection of lameness in dairy cows through measuring pedometric activity and lying behavior. Appl. Anim. Behav. Sci. 2012;142:134–141. doi: 10.1016/j.applanim.2012.10.001.
- 25. Zillner JC, Tücking N, Plattes S, Heggemann T, Büscher W. Using walking speed for lameness detection in lactating dairy cows. Livestock Sci. 2018;218:119–123. doi: 10.1016/j.livsci.2018.10.005.
- 26. Song XY, Leroy T, Vranken E, Maertens W, Sonck B, Berckmans D. Automatic detection of lameness in dairy cattle—vision-based trackway analysis in cow's locomotion. Comput. Electron. Agric. 2008;64:39–44. doi: 10.1016/j.compag.2008.05.016.
- 27. Wang Z. Deep learning-based intrusion detection with adversaries. IEEE Access. 2018;6:38367–38384. doi: 10.1109/ACCESS.2018.2854599.
- 28. Zhang B, et al. Multispectral heterogeneity detection based on frame accumulation and deep learning. IEEE Access. 2019;7:29277–29284. doi: 10.1109/ACCESS.2019.2897737.
- 29. Yang Y, et al. Video captioning by adversarial LSTM. IEEE Trans. Image Process. 2018;27(11):5600–5611. doi: 10.1109/TIP.2018.2855422.
- 30. Yao L, Pan Z, Ning H. Unlabeled short text similarity with LSTM encoder. IEEE Access. 2019;7:3430–3437. doi: 10.1109/ACCESS.2018.2885698.
- 31. Hu Y, Sun X, Nie X, Li Y, Liu L. An enhanced LSTM for trend following of time series. IEEE Access. 2019;7:34020–34030. doi: 10.1109/ACCESS.2019.2896621.
- 32. Matthew C, Chukwudi M, Bassey E, Chibuzor A. Modeling and optimization of Terminalia catappa L. kernel oil extraction using response surface methodology and artificial neural network. Artif. Intell. Agric. 2020;4:1–11.
- 33. Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Venkatraman S. Robust intelligent malware detection using deep learning. IEEE Access. 2019;7:46717–46738. doi: 10.1109/ACCESS.2019.2906934.
- 34. Van Hertem T, Viazzi S, Steensels M, Maltz E, Antler A, Alchanatis V, Schlageter-Tello A, Lokhorst K, Romanini ECB, Bahr C, Berckmans D, Halachmi I. Automatic lameness detection based on consecutive 3D-video recordings. Biosyst. Eng. 2014. doi: 10.1016/j.biosystemseng.2014.01.009.
- 35. Zin TT, Maung SZM, Tin P. A deep learning method of edge-based cow region detection and multiple linear classification. ICIC Express Lett. Part B Appl. 2022;13:405–412.
- 36. Cho AC, Thi ZT, Ikuo K. Black cow tracking by using deep learning-based algorithms. Bull. Innov. Comput. Inf. Control-B: Appl. 2022;13(12):1313.
- 37. Zin TT, Pwint MZ, Seint PT, Thant S, Misawa S, Sumi K, Yoshida K. Automatic cow location tracking system using ear tag visual analysis. Sensors. 2020;20(12):3564. doi: 10.3390/s20123564.
- 38. Zin TT, Sakurai S, Sumi K, Kobayashi I, Hama H. The identification of dairy cows using image processing techniques. ICIC Express Lett. Part B, Appl.: Int. J. Res. Surv. 2016;7(8):1857–1862.
- 39. Mon SL, et al. Video-based automatic cattle identification system. In: 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE). IEEE (2022).
- 40. Eaindrar MW, Thi ZT. Cattle face detection with ear tags using YOLOv5 model. Innov. Comput. Inf. Control Bull.-B: Appl. 2023;14(01):65.
- 41. Khin MP, et al. Cattle pose classification system using DeepLabCut and SVM model. In: 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE). IEEE (2022).
- 42. Thi TZ, et al. An intelligent method for detecting lameness in modern dairy industry. In: 2022 IEEE 4th Global Conference on Life Sciences and Technologies (LifeTech). IEEE (2022).
- 43. Noe SM, et al. A deep learning-based solution to cattle region extraction for lameness detection. In: 2022 IEEE 4th Global Conference on Life Sciences and Technologies (LifeTech). IEEE (2022).
- 44. Werema CW, et al. Evaluating alternatives to locomotion scoring for detecting lameness in pasture-based dairy cattle in New Zealand: In-parlour scoring. Animals. 2022;12(6):703. doi: 10.3390/ani12060703.
- 45. Piette D, et al. Individualised automated lameness detection in dairy cows and the impact of historical window length on algorithm performance. Animal. 2020;14(2):409–417. doi: 10.1017/S1751731119001642.
- 46. Jiang B, et al. Dairy cow lameness detection using a back curvature feature. Comput. Electron. Agric. 2022;194:106729. doi: 10.1016/j.compag.2022.106729.
- 47. Tassinari P, Bovo M, Benni S, Franzoni S, Poggi M, Mammi LM, Mattoccia S, Di Stefano L, Bonora F, Barbaresi A, Santolini E. A computer vision approach based on deep learning for the detection of dairy cows in free stall barn. Comput. Electron. Agric. 2021;182:106030. doi: 10.1016/j.compag.2021.106030.
- 48. Taneja M, Byabazaire J, Jalodia N, Davy A, Olariu C, Malone P. Machine learning based fog computing assisted data-driven approach for early lameness detection in dairy cattle. Comput. Electron. Agric. 2020;171:105286. doi: 10.1016/j.compag.2020.105286.
- 49. Zhao K, et al. Automatic lameness detection in dairy cattle based on leg swing analysis with an image processing technique. Comput. Electron. Agric. 2018;148:226–236. doi: 10.1016/j.compag.2018.03.014.
- 50. Thi TZ, et al. Artificial intelligence topping on spectral analysis for lameness detection in dairy cattle. In: Proceedings of the Annual Conference of Biomedical Fuzzy Systems Association 35. Biomedical Fuzzy Systems Association (2022).
- 51. San CT, et al. Cow lameness detection using depth image analysis. In: 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE), Osaka, Japan, pp. 492–493. doi: 10.1109/GCCE56475.2022.10014268 (2022).
- 52. Kranepuhl M, et al. Association of body condition with lameness in dairy cattle: A single-farm longitudinal study. J. Dairy Res. 2021;88(2):162–165. doi: 10.1017/S0022029921000297.
- 53. Barney S, et al. Deep learning pose estimation for multi-cattle lameness detection. Sci. Rep. 2023;13(1):4499. doi: 10.1038/s41598-023-31297-1.
- 54. Zhang K, Han S, Jianzhai W, Cheng G, Wang Y, Saisai W, Liu J. Early lameness detection in dairy cattle based on wearable gait analysis using semi-supervised LSTM-autoencoder. Comput. Electron. Agric. 2023;213:108252. doi: 10.1016/j.compag.2023.108252.
- 55. Dutta A, Zisserman A. The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia (2019).
- 56. Liu W, et al. SSD: Single shot multibox detector. In: European Conference on Computer Vision (ECCV), pp. 21–37 (2016).
- 57. Redmon J, et al. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016).
- 58. Lin TY, et al. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017).
- 59. Ren S, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 91–99 (2015).
- 60. He K, et al. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017).
- 61. Cai Z, Vasconcelos N. Cascade R-CNN: Delving into high-quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6154–6162 (2018).