2025 Aug 12;11:e3079. doi: 10.7717/peerj-cs.3079

Table 8. Strengths and limitations of the reviewed studies.

Study	Strengths	Limitations
Li et al. (2015)	The authors propose an R-CNN-based detector enhanced with traditional techniques, including a GPU-accelerated CHT for efficient proposal generation. The detector is trained to recognise 8 different classes of sports balls.	Compared with two other algorithms, the proposed model shows clear weaknesses: its Recall is 14.6% lower than that of the best-performing method, and its Inference Speed is also lower. Although it achieves a better average overlap, its Precision is mediocre, especially for distant objects.
Ali Shah et al. (2018)	The authors implemented an embedded ball detection and tracking solution on a moving robot, using only conventional methods.	The authors mention real-time operation but provide no information on Inference Times. They also experienced synchronization problems between the robot and the camera, as well as motion blur caused by the movement of both, which significantly affected tracking. The only metric reported is detection and tracking Accuracy.
Reno et al. (2018)	The authors present a CNN-based approach to tennis ball recognition that achieves high Accuracy and Precision while remaining robust to variable lighting and visual noise.	The system suffers from false positives, particularly when court lines are mistaken for moving balls. The classifier also struggles with blurred ball images and with occlusion by athletes and racquets. Real-time operation and Inference Times are not mentioned.
Burić, Pobar & Ivašić-Kos (2018a)	The authors evaluated YOLOv2 and Mask R-CNN models, highlighting the trade-off between Accuracy and speed. As expected, the YOLO model achieves good speeds, making it well suited to real-time applications, and is more accurate than the Mask R-CNN model. They also compared different training approaches, from pretrained models to training on public datasets and on custom datasets, shedding light on the differences between them.	The YOLO model reached good speeds but lacked Accuracy for distant objects. Mask R-CNN, on the other hand, was much more reliable but slower, which makes it hard to use in online systems or wherever faster detection is needed. This leaves ample room for a lightweight yet reliable middle-ground detector.
Burić, Pobar & Ivašić-Kos (2018b)	The authors test and evaluate 3 object detection methods, covering one-stage deep learning, two-stage deep learning, and traditional computer vision. The latter proved unable to handle the complex task of handball detection, but the deep learning models yielded interesting results, contrasting the superior Accuracy of the two-stage detector with the superior speed of the one-stage detector.	The two-stage detector proved fairly accurate but computationally heavy, making it a poor fit for real-time applications. The one-stage model, on the other hand, easily reached real-time Inference Speeds but struggled with small or distant objects and was less reliable in detection. None of the models achieved the high detection reliability required, and both false positives and false negatives were an issue.
Teimouri, Delavaran & Rezaei (2019)	The authors introduce a novel two-step ball detection pipeline that combines an efficient region proposal and a lightweight CNN, achieving real-time performance on low-end hardware.	The system proved susceptible to variations in lighting conditions and motion blur. It also had difficulty detecting new ball patterns for which it wasn’t trained.
Renolfi de Oliveira et al. (2019)	This article extensively evaluates and compares different configurations of an SSD model for real-time object detection in constrained, CPU-only environments.	The proposed algorithm achieves acceptable Inference Speeds on its hardware but still cannot sustain smooth operation with consistent Accuracy, owing to significant trade-offs between algorithms. Detection Accuracy and Precision were markedly weaker in the lightweight configurations.
Barry et al. (2019)	The authors propose a new YOLO variant with high computational efficiency that outpaces even Tiny-YOLO models in speed, making it well suited to low-power devices and real-time applications.	The proposed model trades some Accuracy for superior efficiency, resulting in subpar reliability. This is a deal-breaker in use cases where high Precision is the priority.
Deepa et al. (2019)	This article presents an effective, if narrowly scoped, comparison of three popular object detection methods and achieves real-time Inference Speeds. A convenient graphical user interface was also designed to measure and evaluate the different approaches.	The authors evaluated the proposed system using unusual metrics. Although these are relevant to the article’s aims, they make it hard to compare the approach with alternative systems and use cases.
Calado et al. (2019a)	The authors create a stereo-vision-based detection model for boccia. They also developed a graphical user interface that displays the game’s progress and results, as calculated by the automated system.	The lack of information about the model’s training and evaluation metrics makes the reported results unreliable. The camera setup is also fixed, with specific lighting conditions, camera positions, and angles.
Calado et al. (2019b)	The authors improve the flexibility and adaptability of the boccia framework, bringing machine learning and deep learning ball detection to boccia, a sport rarely addressed in object detection research. They also compare machine learning and deep learning methods through HOG-SVM and Tiny-YOLO, respectively.	The authors claim real-time operation but barely evaluate it, reporting a maximum of 8 FPS for the fastest algorithm. The dataset was also limited in both size and variability, which likely contributed to overfitting.
Wang et al. (2019)	The authors propose a highly accurate and fast algorithm that rivals state-of-the-art golf swing tracking. Their approach outperformed a certified professional golf device in virtually every metric evaluated.	Although very effective, the system requires a specific setup mounted at the top of a well-lit room. It also does not track the ball after the hit, and occlusion by the golfer’s body may affect its performance.
Tian, Zhang & Zhang (2020)	The authors developed an anchor-free detection framework with data augmentation for tennis ball detection. Their approach addresses challenges specific to real-world sports environments and focuses on detecting high-speed tiny objects such as tennis balls. The proposed model outperforms the base YOLOv3 model in Precision, Recall, and F1-score.	The proposed algorithm was tested only offline; real-time applicability and Inference Times are not mentioned. The processing hardware is also not reported, and the algorithm may have a high computational cost.
Zhang et al. (2020)	Golf balls are very small and can travel at high speed, which makes them challenging to detect and track, especially in real time. The authors tackle this issue with the help of computer vision techniques, testing, evaluating, and comparing one-stage and two-stage deep learning methods that achieve fast Inference Times and high Accuracy, respectively.	Although their algorithms perform well, they require a high-end computing setup. Given the difficulty of the task, running the same methods on constrained hardware would not yield the same results, especially for the two-stage method. The authors also note that a larger training dataset could yield better results.
Sheng et al. (2020)	The authors develop a lightweight, real-time algorithm optimized for table tennis ball detection, based on feature fusion with fine-tuning and pruning. It achieves extremely fast and accurate detection, a notable feat given how small a table tennis ball is.	Processing was done on a mid-range GPU, and although the speed was good, no tests were run on more constrained hardware. Although the training dataset was large, all images showed the same table and white balls, so the model may overfit to this setup and generalise poorly to other tables and to balls of different colours and features. Lighting variations may also pose a problem.
Fatekha, Dewantara & Oktavianto (2021)	The authors proposed a traditional algorithm that averages 31 FPS on a low-end minicomputer.	This conventional computer vision system uses colour threshold filters with values specific to the scene in which it was tested, so its results would likely vary greatly across scenarios. False positives, in which similarly coloured objects were wrongly identified as the ball, were also an issue. The evaluation is limited: the only metric reported is the time needed to process each frame, which says nothing about the algorithm’s Accuracy or Precision.
Meneghetti et al. (2021)	This article compares and evaluates a wide array of algorithms under various conditions and on various hardware. It presents strong research into CNN performance in constrained-hardware scenarios.	Computational cost was an issue, as the Accuracy-Inference trade-off fluctuated considerably. The training dataset also contained a single football, and although the ball’s location and the lighting conditions varied, the court was the same in all images. This may limit generalisation to different scenarios.
Hiemann et al. (2021)	The article extensively evaluates, on various metrics, many versions of detection models that differ in training, augmentation, motion, real-time applicability, and tuning.	The authors emphasize real-time operation but reach a maximum average of only 12.9 FPS. Besides computational cost, motion blur and occlusion issues are also mentioned. The article also aims to address general ball detection across sports, but its focused dataset might prove problematic when applying the proposed algorithm to other sports.
Balaji, Karthikeyan & Manikandan (2021)	The study introduced a new application of metaheuristic algorithms to volleyball analysis, comparing 3 different algorithms. The best of them achieved high Accuracy, Precision, and Recall, and handled shadows and fast movements well.	Real-time operation was not considered, and occlusion was problematic.
Pawar et al. (2021)	The authors contribute a mobile-compatible object tracking system for holonomic robots in industrial settings. Although it does not reach high Inference Speeds, high FPS is not mandatory for most real-time applications of this kind, given its industrial aim. The model is minimal and reaches satisfactory Accuracy.	Training exclusively on rugby balls may limit the system’s generalisation, since the article aims to replicate industrial object detection, which covers a broad and generic range of target objects. The system’s Inference Speed is also modest.
Liu et al. (2021)	The article introduces a novel method that simultaneously detects and matches related objects using a single proposal box with multiple predictions, eliminating the need for post-processing matching.	The model’s performance depends heavily on the overlap between the player and the related object.
Hassan, Karungaru & Terada (2023)	The authors propose a real-time-capable hybrid system, combining an enhanced version of YOLOv7 with various computer vision techniques, for detecting handball events in football from a single camera. They apply instance segmentation and tracking techniques to overcome the limitations of bounding-box-based methods.	Occlusion is a clear problem given football’s dynamic, fast-paced nature, in which players and referees can easily block the line of sight. The results also reveal false negative detections.
Keča et al. (2023)	The article focuses on deep learning-based ball detection on low-power devices, one of the biggest recurring challenges in sports. The efficiency of many object detection and segmentation architectures is extensively analysed under these conditions, and the models are deployed on a Raspberry Pi-controlled robot in varied setups.	Although the algorithms were efficient, real-time operation remained difficult due to the extremely constrained hardware used.
Li et al. (2023)	The authors present a high-quality system using 36 cameras that effectively addresses small object detection and occlusion. It is also capable of real-time detection and 3D localization.	The system is very expensive and relies on camera calibration. Changes in the environment may also pose a problem.
Kulkarni et al. (2023)	The article introduces a novel method for detecting and classifying table tennis strokes using only 2D ball trajectory data, with high Accuracy in ball tracking.	Real-time operation is limited to high-performance hardware. Stroke classification Accuracy could be higher, and occlusion may pose a problem.
Zhao (2024)	The author proposes an upgraded YOLOv5-based method, enhanced with various modules that address some of the most common challenges in object detection, such as illumination, occlusion, and limited computational power. Through extensive research and evaluation, the method achieves better results than the base YOLOv5 model.	As is common in object detection applied to team sports, the system had trouble with occlusion. The algorithm’s performance depends on the quality and variety of its training data, as variations in player attire, pitch, or camera angle may negatively affect outcomes.
Modi et al. (2024)	The authors compare 3 different YOLO models and propose a real-time-capable hybrid object tracking system consisting of a deep learning algorithm enhanced with optical flow.	Limited datasets led to poor-quality training, resulting in a lack of robustness and, according to the authors, poor mAP results. The 3 models were also trained on different datasets, which makes comparing them harder.
Esfandiarpour, Mirshabani & Miandoab (2024)	The authors combine computer vision techniques with a deep learning-based method to create an efficient detection algorithm.	Because a colour detection filter is used, the system has difficulty distinguishing the ball from similarly coloured objects. The dataset was also small, resulting in low-quality training. Lighting variations posed an issue, probably due to the dataset used. The evaluation also includes some uncommon metrics, making it difficult to compare the results of this approach with those of other studies.
Li & Zhao (2024)	The authors proposed an improved version of YOLOv5 that addresses problems in tennis ball recognition and is capable of real-time operation. Compared with other state-of-the-art methods, it achieves better mAP and FPS with a smaller Model Size. The approach effectively handles common problems such as low-light environments, multi-coloured balls, balls on the far side of the net, and fast object movement.	Processing was done on a high-end computer, and although performance was good, the algorithm is not lightweight enough for constrained hardware such as low-power embedded devices.
Decorte et al. (2024)	The authors take an innovative approach to padel hit detection, with a multi-modal hit identifier that relies on audio. They also implement a framework for analysing inter-player and player-to-net distances, along with custom algorithms for hit assignment and player re-identification.	Sound-based hit detection introduces new challenges: similar noises can cause false positives, while other noises can mask the sound of the racquet hitting the ball, causing false negatives. This is especially problematic at venues where several courts sit next to one another and multiple games take place simultaneously. Player tracking is also affected by occlusion from other people, nets, fences, or posts.
Fujimoto et al. (2024)	This work on tennis ball detection applies fine-tuning and compares 3 deep learning models: two two-stage methods and a one-stage method that is somewhat underused in sports object detection. The evaluation of Precision metrics was rigorous.	The evaluated models showed clear weaknesses in their average Inference Times, highlighting a high computational cost. Accuracy, on the other hand, was good but not sufficient for a reliable offline system. The dataset, although varied, could also have been larger.
Luo, Quan & Liu (2024)	The authors aimed to improve the Accuracy of small object detection for fast-moving balls in sports using a YOLOv8-based algorithm. The proposed model outperformed comparable versions in speed and detection reliability. The improvements include a modified network structure, added small object detection capabilities, and attention mechanisms.	Although the model outperformed its peers, it lacks robustness for general sports applications and may be limited in very complex environments where the ball is obscured. Its Accuracy could also be improved further with more rigorous training and lightweight enhancements.
Li, Luo & Islam (2024)	The authors proposed a new YOLO-based hybrid algorithm, enhanced with multi-feature data fusion, to detect and track basketball players and actions. Many configurations were tested, and the final approach achieved high Accuracy.	The dataset lacked variety, reducing robustness. Occlusion, a recurrent problem in object detection, is also a challenge. Real-time application is mentioned but not evaluated.
Yang et al. (2024b)	The authors introduced a new YOLO-based hybrid algorithm combined with an hourglass network to enhance feature learning across multiple levels. Further improvements include the addition of various modules to improve Accuracy, efficiency, and recognition capabilities. The method achieved higher Precision, Recall, and F1-score than the other tested algorithms.	The algorithm occasionally misrecognised actions, producing both false negatives and false positives. Computational cost and Inference Times were also not adequately evaluated.
Hu et al. (2024)	The authors propose a model that effectively tackles occlusion in basketball, outperforming some other state-of-the-art trackers. It also demonstrates robustness across different motion speeds.	The tracker is limited to a fixed number of players, which reduces its adaptability.
Fu, Chen & Song (2024)	The authors proposed a YOLO-based hybrid algorithm augmented with key components for improved feature fusion, enhanced small object detection, improved feature extraction, and reduced computational load. The resulting model shows noticeable improvements over the base model in Precision, mAP, and Recall.	Although the proposed model shows noticeable improvements, its computational cost could be reduced further. According to the authors, the model may struggle on low-power devices due to its increased GFLOPs.
Solberg et al. (2024)	The authors propose a framework with a user-friendly interface that integrates object detection, tracking, OCR, and colour analysis to automate video highlight generation.	Team number recognition Accuracy is mediocre. The system also has high Inference Times, which may hinder real-time applications.
Yin et al. (2024)	The authors introduce a novel multi-object tracking method that, according to them, outperformed state-of-the-art methods, especially in occlusion-heavy scenarios.	Generalisability to other sports or to different complex environments is unclear. Omitting hand keypoints reduces bounding box precision.
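Several entries in Table 8 compare studies by Precision, Recall, F1-score, and average overlap. As a minimal illustration, and not the evaluation procedure of any reviewed study, the sketch below computes Intersection over Union (IoU) for boxes given as (x1, y1, x2, y2) and derives Precision, Recall, and F1-score from a greedy one-to-one matching at an assumed IoU threshold of 0.5. The function names and the toy boxes are hypothetical.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union (overlap) of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds: List[Box], gts: List[Box],
                        thr: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match each prediction to an unmatched ground-truth box with IoU >= thr."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_v = -1, thr
        for j, g in enumerate(gts):
            v = iou(p, g)
            if j not in matched and v >= best_v:
                best_j, best_v = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp  # spurious detections / missed balls
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Toy frame: two ground-truth balls, one good detection and one false positive.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(round(iou(preds[0], gts[0]), 3))  # 0.681
print(precision_recall_f1(preds, gts))  # (0.5, 0.5, 0.5)
```

Standard benchmarks such as PASCAL VOC and COCO additionally rank detections by confidence before matching and report mAP, so numbers from this simplified matching are not directly comparable with published mAP values.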
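Fatekha, Dewantara & Oktavianto (2021) and Esfandiarpour, Mirshabani & Miandoab (2024) both report false positives caused by colour-based filtering. The sketch below is a minimal, hypothetical illustration of why this happens, not either study’s implementation: it thresholds a tiny synthetic RGB frame in HSV space, using hue, saturation, and value bounds chosen only for this toy scene, and then groups the surviving pixels into 4-connected blobs. It uses only Python’s standard library (`colorsys`, `collections.deque`).

```python
import colorsys
from collections import deque
from typing import List, Tuple

RGB = Tuple[int, int, int]


def colour_mask(img: List[List[RGB]], h_lo: float, h_hi: float,
                s_min: float, v_min: float) -> List[List[bool]]:
    """Keep pixels whose HSV values (each in [0, 1]) fall inside fixed thresholds."""
    mask = []
    for row in img:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out.append(h_lo <= h <= h_hi and s >= s_min and v >= v_min)
        mask.append(out)
    return mask


def blobs(mask: List[List[bool]]) -> List[Tuple[float, float, int]]:
    """Group mask pixels into 4-connected blobs: (centroid row, centroid col, area)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            n = len(pixels)
            found.append((sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n, n))
    return found


# Toy 6x8 frame: green grass, a 2x2 orange ball and a 3x3 orange patch (e.g. a shirt).
GRASS, ORANGE = (0, 128, 0), (255, 140, 0)
img = [[GRASS] * 8 for _ in range(6)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    img[r][c] = ORANGE
for r in range(3, 6):
    for c in range(5, 8):
        img[r][c] = ORANGE

mask = colour_mask(img, h_lo=0.05, h_hi=0.15, s_min=0.5, v_min=0.5)
print(blobs(mask))  # [(1.5, 1.5, 4), (4.0, 6.0, 9)]
```

Both the ball and the similarly coloured patch pass the threshold and become candidate blobs, so a real system would need further cues, such as shape, size, or motion, to reject the second one. Because the bounds are tuned to this scene, different lighting would also require retuning them, which mirrors the scene-specificity limitation noted in the table.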
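Many limitations in Table 8 concern unreported Inference Times, or real-time claims backed by low frame rates, such as 8 FPS in Calado et al. (2019b) and 12.9 FPS in Hiemann et al. (2021). One hypothetical way to report throughput is to time a detector over a sequence of frames after a few warm-up calls. In the sketch below, `dummy_detect` stands in for any model’s per-frame inference function, and the 30 FPS target is an assumed example rather than a threshold defined in this review.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(detect: Callable[[Any], Any], frames: Sequence[Any], warmup: int = 5) -> float:
    """Average frames per second of `detect`, excluding warm-up calls (e.g. lazy initialisation)."""
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")
    for frame in frames[:warmup]:
        detect(frame)
    start = time.perf_counter()  # monotonic, high-resolution clock
    for frame in frames[warmup:]:
        detect(frame)
    elapsed = time.perf_counter() - start
    n = len(frames) - warmup
    return n / elapsed if elapsed > 0 else float("inf")


def dummy_detect(frame: Any) -> int:
    """Hypothetical stand-in for a detector: a fixed amount of CPU work per frame."""
    return sum(i * i for i in range(10_000))


fps = measure_fps(dummy_detect, frames=list(range(55)))
target_fps = 30.0  # assumed real-time target, for illustration only
print(f"{fps:.1f} FPS, meets {target_fps:.0f} FPS target: {fps >= target_fps}")
```

Reporting the hardware alongside such a figure matters as much as the number itself. On GPUs, where work is typically launched asynchronously, the timing loop would also need to wait for each result before reading the clock.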