Abstract
Cucumber disease detection under complex agricultural conditions faces significant challenges due to multi-scale variation, background clutter, and hardware limitations. This study proposes YOLO-Cucumber, an improved lightweight detection algorithm based on YOLOv11n, incorporating four key innovations: (1) Deformable Convolutional Networks (DCN) for enhanced feature extraction of irregular targets, (2) a P2 prediction layer for fine-grained detection of early-stage lesions, (3) a Target-aware Loss (TAL) function addressing class imbalance, and (4) Channel Pruning via Batch Normalization (CPBN) for model compression. Experiments on our cucumber disease dataset demonstrate that YOLO-Cucumber achieves a 6.5% improvement in mAP@50 (93.8%), while reducing model size by 3.87 MB and increasing inference speed to 218 FPS. The model effectively handles symptom variability and complex detection scenarios, outperforming mainstream detection algorithms in accuracy, speed, and compactness, making it ideal for embedded agricultural applications.
Keywords: Cucumber disease detection, YOLOv11n, Deformable Convolution Networks (DCN), Target-aware Loss, Channel pruning, Embedded deployment, Precision agriculture
Introduction
In recent years, as China's strategy for high-quality economic development has been deeply implemented, developing new quality productive forces has become a national strategic priority. New quality productive forces represent a modern form of productivity driven primarily by scientific and technological innovation, achieving simultaneous improvements in resource allocation efficiency and output quality through digital, intelligent, and green transformation. In agriculture, developing new quality productive forces means empowering traditional agriculture with advanced technologies such as artificial intelligence and big data to build efficient, precise, and sustainable modern agricultural production systems. The optimized intelligent cucumber disease detection algorithm proposed in this study serves as a technological engine that provides efficient and precise decision support for agricultural production by improving the accuracy, real-time performance, and adaptability of disease diagnosis. This contributes to reducing pest and disease losses, improving agricultural product quality, and driving agricultural development toward intensive, efficient, and green directions, thus providing important technical support for building new agricultural productivity.
Crop yield and quality losses caused by diseases have become a major challenge facing global agricultural production. Research has shown that under natural conditions, plant diseases can cause 20–80% of crop losses [1]. With the development of intelligent agriculture, deep learning technology has demonstrated enormous potential in the field of crop disease detection [2, 3].
However, the automatic detection of agricultural diseases in actual agricultural production environments faces numerous challenges including complex background interference, uneven illumination, and diverse lesion characteristics. Particularly in complex scenarios such as greenhouse cultivation, changes in lighting, occlusion phenomena, and background complexity significantly affect detection accuracy [4, 5]. Cucumber, as an important economic crop, has particularly prominent disease problems [6].
Existing research has explored solutions from different perspectives. One approach is to improve the feature extraction capability through enhanced deep learning model structures. For instance, Wang et al. [7] proposed a text-vision fusion framework called WCG-VMamba, which improved the accuracy of corn disease recognition through image-text cross-modal feature alignment. Li et al. [8] designed a multi-modal classification model. Another approach involves building more comprehensive datasets to enhance model generalization capabilities, such as the large-scale dataset containing 271 plant diseases constructed by Liu et al. [9]. Nevertheless, numerous challenges remain when facing complex backgrounds in actual agricultural production environments. In particular, the balance between detection accuracy, operational efficiency, and model lightweight design has not been well addressed.
Although YOLO series models have made significant progress in the field of object detection [10–12], existing research still has notable limitations. First, most models perform well in laboratory environments but degrade significantly in complex real-world environments. Second, existing methods often overlook the need to recognize disease features at different scales and perform poorly on early small lesions in particular [13]. YOLOv11n, a recently proposed object detection model, is efficient and lightweight in many scenarios but still faces performance bottlenecks in complex greenhouse environments. Specifically, under dramatic lighting changes and frequent leaf occlusion, YOLOv11n is prone to missed detections and false detections.
To address these limitations in cucumber disease detection under complex agricultural environments, we propose YOLO-Cucumber, an improved lightweight object detection framework built upon YOLOv11n. Our model integrates four key technical innovations that collectively enhance detection precision while optimizing computational efficiency for deployment on resource-constrained platforms.
While YOLO-Cucumber builds upon YOLOv11n, it incorporates several novel enhancements that set it apart from prior approaches in agricultural disease detection. First, we introduce Deformable Convolution Networks (DCN) to improve feature extraction for small targets under complex agricultural backgrounds. Second, the design of the P2 small-target detection layer, in place of the larger P5 layer, optimizes the model's ability to detect early-stage diseases in small lesions. Additionally, the Target-aware Loss (TAL) function addresses class imbalance, which is particularly problematic for early disease stages. Finally, the use of Channel Pruning via Batch Normalization (CPBN) reduces computational overhead, enabling the model to perform efficiently on embedded platforms.
Experimental results show that the improved YOLO-Cucumber model achieves a significant improvement in mAP@50 over the baseline while increasing inference speed and reducing the parameter count, demonstrating good practical value.
The structure of this study is as follows: The second section reviews relevant literature; the third section introduces materials and methods; the fourth section presents experimental results and discusses them; the final section summarizes the research findings and outlines future research directions.
Literature review
In recent years, with the continuous growth of global food demand and the intensification of climate change, crop disease management faces unprecedented challenges [14]. Plant disease detection technology is essential for achieving precision agricultural management. With the rapid development of computer vision and deep learning technologies, intelligent disease detection methods have made significant progress, gradually expanding research from controlled laboratory environments to complex natural environments.
Advances in disease recognition under laboratory conditions
In laboratory environments, researchers have conducted extensive disease recognition work on standard datasets such as PlantVillage. Barbedo [15] systematically analyzed the key factors affecting deep learning in plant disease recognition by constructing a large-scale dataset containing nearly 50,000 images. Zhang et al. [16, 17] achieved good results even with small sample datasets by introducing transfer learning strategies.
To improve recognition accuracy, various strategies have been proposed. Toda and Okura [18] revealed the feature extraction mechanism of Convolutional Neural Networks (CNNs) in disease diagnosis, showing that CNNs can capture lesion color and texture features similar to human expert judgment. Wang et al. [7, 19, 20] proposed a tomato disease recognition method based on the Data-efficient Image Transformer (DeiT) model, enhancing recognition capability for complex symptoms. Zhao et al. [21] proposed a data generation method based on DoubleGAN, effectively addressing the data imbalance problem.
Attri et al. [22] designed a quantum-inspired deep learning model, EQID, achieving recognition accuracies of 98.96% and 99.61% on potato and tomato disease datasets, respectively. Karantoumanis et al. [23] proposed a real-time detection method for legume crop diseases using data augmentation and deep learning, which addressed the insufficiency of training data. They found that in complex natural environments, target occlusion and overlap caused a detection accuracy reduction of over 25%.
However, these laboratory achievements face serious challenges in practical applications. Standard datasets mostly involve ideal lighting conditions, lacking the variation found in natural environments. Additionally, the single background in laboratory settings cannot simulate complex field conditions, and existing models struggle to detect small targets, such as early lesions, which are crucial for timely warnings. These limitations have led to a shift in focus toward disease detection in complex natural environments.
Disease detection methods under complex backgrounds
In actual agricultural environments, disease detection faces three main challenges: complex background interference, small target detection, and real-time requirements. Research has shifted toward precise detection under complex backgrounds.
To address complex background interference, various deep learning-based solutions have been proposed. Bonora et al. [24] applied the YOLO architecture to fruit disease detection, achieving an accuracy of 64.7% in real-world environments. Mo et al. [25] introduced a cross-domain dynamic attention mechanism, improving model robustness in complex backgrounds. Hu et al. [26] designed a dual feature enhancement network, significantly enhancing adaptability to situations such as occlusion and shadows. Yang et al. [27] developed FSM-YOLO, which improved the detection of apple leaf diseases by capturing adaptive features and spatial context.
For small target detection, Cheng et al. [28] proposed a wheat Fusarium head blight fungal spore detection method based on the YOLOSE network. Zhang et al. [29] designed the Yolov5-ECA-ASFF detection algorithm using attention mechanisms, achieving 98.57% accuracy for wheat Fusarium spore recognition. However, detection performance decreases significantly when the lesion area is less than 5%, particularly for early lesions, highlighting limitations in current methods [30].
Considering practical application needs, lightweight design and real-time performance have become important research directions. Liu et al. [31] proposed EFDet, reducing model complexity by optimizing the backbone network and feature fusion module. Xu et al. [32] designed a real-time detection system based on an improved YOLO v5s, reducing model size by 85% while maintaining high accuracy. Johri et al. [33] compared various deep transfer learning techniques in cotton disease detection applications, providing insights for lightweight model design. Nonetheless, the trade-off between lightweight design and detection performance remains an urgent challenge.
Focusing on deployment requirements, researchers have also turned to intelligent monitoring systems. Kumar et al. [34] proposed a rice disease detection method based on a bidirectional feature attention pyramid network. Wójcik Gront et al. [35] highlighted the role of artificial intelligence in agricultural genomics research, offering new directions for disease prevention and control. Ye et al. [36] proposed a data augmentation method based on generative adversarial networks to enhance model generalization by generating training samples under different lighting conditions. In real-world applications, intelligent monitoring systems combining IoT technology and deep learning algorithms can provide early disease warnings [37]. While edge computing-based lightweight solutions meet real-time detection needs [38], these systems still face challenges in reliability and stability in complex field environments.
Research challenges
Object detection techniques have seen transformative advancements across various domains, enhancing complex target recognition [39], smart IoT-based surveillance [40], and pedestrian group detection in dynamic environments [41]. Additionally, optimization in IoT application execution and security mechanisms has contributed to real-time computational efficiency and robust adversary behavior detection [16, 42]. While these advancements have transformed various fields, their adaptation to agriculture requires overcoming distinct obstacles such as leaf occlusion and microscopic pathogen features. Agricultural applications face unique biological variability challenges as disease symptoms evolve dynamically and plants exhibit diverse morphological characteristics at different growth stages.
Despite significant progress in plant disease detection using deep learning, challenges persist in practical applications. Ahmed et al. [43] noted that current technologies face problems including uneven data quality, system integration difficulties, and limited application scenarios. Data acquisition and annotation are severely affected by lighting variations, angle, and occlusion [7], while disease symptom features in complex field environments are often unclear [44]. Models frequently show poor performance when transferred to new environments [45], and symptom similarity complicates feature extraction [46]. Non-IID datasets pose challenges to federated learning systems [47], while complex networks struggle to meet real-time monitoring needs [48]. Leaf overlap and occlusion affect detection accuracy [49], and disease progression dynamics increase detection difficulty [50]. Transfer performance between different regions and varieties needs improvement [27], while traditional recognition methods lack precision and robustness [51]. Feature extraction and target localization under complex backgrounds require higher accuracy [52], and interference between disease features reduces diagnostic precision [53].
In summary, the main gaps in current plant disease detection research are: (1) insufficient detection accuracy under complex backgrounds, particularly for early small-area lesions; (2) difficulty balancing model real-time performance with detection accuracy; and (3) insufficient adaptability to environmental factors such as lighting changes and target occlusion. These challenges hinder the application of disease detection technology in agricultural production. To address these issues, this study proposes an improved YOLOv11-based method for cucumber disease detection, aiming to achieve adaptable, efficient, and stable detection under complex backgrounds. The innovative solutions proposed in this study are expected to provide new approaches to solving disease detection challenges in real-world agricultural production.
Materials and methods
Cucumber disease image dataset
Currently, there is a lack of high-quality publicly available datasets specifically for cucumber disease detection. Therefore, this research constructed a cucumber disease detection dataset in greenhouse scenarios to provide data support for subsequent research. The dataset was collected from greenhouse cultivation bases in Shouguang City, Weifang, Shandong Province, China (36.884689°N, 118.775547°E). It covers various disease types at different growth stages, under varying lighting conditions (e.g., sunny, cloudy) and environmental conditions (e.g., temperature, humidity). Images were captured using a Canon EOS 90D DSLR camera equipped with an 18–135 mm lens for standard shots and a 100 mm macro lens for detailed symptom capture. All images were acquired at a resolution of 3456 × 2304 pixels (8 megapixels) and stored in JPEG format to preserve the image quality needed for accurate feature extraction during model training and evaluation. These image samples reflect the complexity and diversity of cucumber diseases in actual production. Sample images are shown in Fig. 1.
Fig. 1.
Samples of cucumber disease images
The images were annotated by plant protection experts using LabelImg software, with bounding boxes drawn around disease lesions and their corresponding disease types. The dataset was divided into training, validation, and testing sets in an 8:1:1 ratio, respectively used for model training, parameter tuning, and performance evaluation. The detailed composition and statistical information of the dataset are shown in Table 1.
Table 1.
Sample counts of cucumber disease types
| Category | Quantity | Training Set | Validation Set | Test Set |
|---|---|---|---|---|
| Healthy | 1630 | 1304 | 163 | 163 |
| Anthracnose | 880 | 704 | 88 | 88 |
| Bacterial Wilt | 790 | 632 | 79 | 79 |
| Pythium Fruit Rot | 750 | 600 | 75 | 75 |
| Downy Mildew | 690 | 552 | 69 | 69 |
| Total | 4740 | 3792 | 474 | 474 |
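The 8:1:1 split reported in Table 1 can be checked with a short sketch. The per-class totals are copied from Table 1; the helper name `split_counts` and the rule used here (one tenth each for validation and testing, the remainder for training) are assumptions for illustration and not necessarily the procedure the authors followed.

```python
# Per-class image totals copied from Table 1.
CLASS_TOTALS = {
    "Healthy": 1630,
    "Anthracnose": 880,
    "Bacterial Wilt": 790,
    "Pythium Fruit Rot": 750,
    "Downy Mildew": 690,
}


def split_counts(total):
    """Return (train, val, test) counts for one class under an 8:1:1 split.

    Hypothetical helper: validation and test each receive one tenth of the
    images (integer division) and training receives the remainder.
    """
    tenth = total // 10
    return total - 2 * tenth, tenth, tenth


splits = {name: split_counts(n) for name, n in CLASS_TOTALS.items()}

# Column totals across all classes: (train, val, test).
totals = tuple(sum(s[i] for s in splits.values()) for i in range(3))

for name, (train, val, test) in splits.items():
    print(f"{name:18s} train={train:4d} val={val:3d} test={test:3d}")
print("Total", totals)
```

Under this rule the per-class counts and column totals match the values listed in Table 1.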
Cucumber disease detection model YOLO-Cucumber
YOLOv11 is a state-of-the-art algorithm in the field of object detection, providing faster detection speed, higher accuracy, and stronger efficiency compared to its previous generations. The YOLOv11 series includes five network models: YOLOv11n, YOLOv11s, YOLOv11m, YOLOv11l, and YOLOv11x. Among them, YOLOv11n is a lightweight version, specifically designed for real-time detection tasks on resource-constrained embedded platforms. YOLOv11n includes a backbone network, neck network, and head network. The backbone network extracts multi-scale features through convolutional layers and improved C3k2 modules. Subsequently, the Spatial Pyramid Pooling Fast (SPPF) module and Cross-layer Spatial Attention (C2PSA) layer further strengthen the transmission of multi-scale features and optimize the fusion of high-level semantic features with low-level details. The neck network adopts a Feature Pyramid Network plus Path Aggregation Network (FPN + PAN) structure, further enhancing the detection capability for multi-scale targets by combining upsampling and downsampling operations. The head network consists of three decoupled detection heads, responsible for predicting classification scores and regression coordinates. Through task alignment mechanisms, the head network effectively improves the accuracy of classification and localization while jointly optimizing classification scores and Intersection over Union (IoU), suppressing the generation of low-quality prediction boxes.
We selected YOLOv11n as the baseline model because it offers a good balance between detection accuracy and computational efficiency. As the lightweight version of YOLOv11, YOLOv11n has shown strong performance in real-time applications, making it suitable for embedded systems. Furthermore, its simplicity allows additional optimizations such as deformable convolutions and pruning, which are central to our model's enhancements.
Although YOLOv11n has demonstrated its efficiency and lightweight advantages in various scenarios, it still faces performance bottlenecks in complex greenhouse environments. Specifically, under dramatic lighting changes and frequent leaf occlusion, YOLOv11n is prone to missed detections and false detections. To address this, this study proposes a cucumber disease detection model called YOLO-Cucumber based on YOLOv11n, as shown in Fig. 2.
Fig. 2.
YOLO-Cucumber network architecture
According to Fig. 2, the improvements include the following four aspects:
Feature Extraction Enhancement Mechanism (C3k2-DCN): Deformable Convolution Networks (DCN) are introduced into the C3k2 module. By adaptively adjusting the receptive field, they strengthen feature extraction for multi-scale targets in complex backgrounds, particularly for strongly deformed small targets such as early lesions.
Small Target Detection Layer Optimization Strategy (SDL): Because small targets make up a high proportion of early cucumber disease images under complex backgrounds, a dedicated small-target detection layer P2 (Prediction Layer 2) is added and the large-target layer P5 (Prediction Layer 5) is removed to reduce computational redundancy, enabling efficient recognition of disease features at different scales.
Target-aware Loss Function (TAL): A Target-aware Loss (TAL) function is proposed that integrates the design concepts of Focaler IoU and PIoUv2. It mitigates the class imbalance problem in small target detection, improves bounding box localization precision, and accelerates model convergence.
Efficient Model Compression Strategy (CPBN): Channel Pruning via Batch Normalization (CPBN) is adopted. By evaluating channel importance in each layer and adjusting the network structure accordingly, it significantly reduces computational complexity while maintaining model performance, yielding a lightweight model with faster inference.
C3k2-DCN
Cucumber disease detection in complex backgrounds presents unique challenges due to scale variations, irregular morphology, and background clutter. Traditional convolutional neural networks with fixed receptive fields struggle to adapt to these varying conditions, particularly when dealing with small-scale lesions or deformed targets. To overcome these limitations, we introduce Deformable Convolution Networks (DCN) into the C3k2 module of YOLOv11n. The deformable convolution process is shown in Fig. 3.
Fig. 3.
Deformable convolution process
First, the input feature map generates offsets for each convolution kernel position through an additional convolutional layer. These offsets are then applied to the original convolution kernel, forming deformable convolution kernels. Finally, the output feature map is obtained by convolving the input feature map with these deformable convolution kernels. This method overcomes the limitations of fixed-grid convolutions, allowing adaptive feature extraction for multi-scale targets.
The C3k2-DCN structure is shown in Fig. 4. Specifically, the traditional convolution in the Bottleneck part of the C3k2 module is replaced with deformable convolution, forming a new Bottleneck-DCN structure. This module consists of two convolutional layers, where the first layer uses deformable convolution for feature extraction, and the second layer further fuses the extracted features. By introducing deformable convolution, the model can dynamically adjust the receptive field, thereby more precisely capturing the detailed features of targets, showing significant advantages particularly when processing small targets with large shape and scale variations. This method not only enhances sensitivity to complex local features of targets (such as edges and textures) but also maintains relatively low computational cost and efficient inference speed, adapting to the detection requirements of multi-scale targets under complex backgrounds. In cucumber disease target detection under complex backgrounds, the Bottleneck-DCN structure can also improve the model's localization accuracy of target boundaries, adapting to various transformations in image aspect ratio and rotation, thereby enhancing the detection effect of disease images with angle transformations. If the input feature map is X, the output after deformable convolution operation can be represented as:
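$$Y(p_0) = \sum_{k=1}^{K} W(p_k)\, X(p_0 + p_k + \Delta p_k)\, M_k \tag{1}$$
where $p_0$ is an output location and $p_k$ is the $k$-th of the $K$ positions of the regular sampling grid; $W$ is the convolution kernel; $\Delta p_k$ is the learned offset; and $M_k$ is the sampling mask, indicating the importance of each sampling point. The offsets are predicted by an additional convolutional layer. Because they are generally fractional, the pixel values at the shifted positions are typically computed by bilinear interpolation, which keeps feature sampling efficient.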
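The sampling step described above can be illustrated with a small, dependency-free sketch. It computes the response of a modulated deformable 3 × 3 convolution at a single output location of a single-channel feature map. The function names `bilinear` and `deform_conv_at` are hypothetical, and the offsets and mask values are passed in as plain inputs here, whereas in the network they are predicted by an additional convolutional layer. This is a sketch of the general technique, not the authors' implementation.

```python
import math


def bilinear(feat, y, x):
    """Sample the 2-D list `feat` at fractional (y, x); out-of-range reads as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def deform_conv_at(feat, kernel, offsets, mask, cy, cx):
    """Modulated deformable 3x3 convolution response at output location (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th sampling point of the regular grid;
    mask[k] scales its contribution. Both are assumed to be given.
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear(feat, cy + ky + dy, cx + kx + dx)
            out += kernel[ky + 1][kx + 1] * sample * mask[k]
            k += 1
    return out


# Toy input: a 5x5 horizontal ramp, feat[y][x] = x.
feat = [[float(x) for x in range(5)] for _ in range(5)]
ones = [[1.0] * 3 for _ in range(3)]

# Zero offsets and a unit mask reduce to an ordinary 3x3 convolution.
regular = deform_conv_at(feat, ones, [(0.0, 0.0)] * 9, [1.0] * 9, 2, 2)
# Shifting every sampling point half a pixel to the right changes the response.
shifted = deform_conv_at(feat, ones, [(0.0, 0.5)] * 9, [1.0] * 9, 2, 2)
print(regular, shifted)
```

With zero offsets and a unit mask the function behaves like a standard convolution, which makes it easy to see that the offsets alone move the receptive field.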
Fig. 4.
C3k2-DCN model structure diagram
Small-target detection layer (SDL)
In cucumber disease images under complex backgrounds, small targets usually occupy the majority of the dataset due to shooting angles and object occlusion factors in the images themselves. Although the traditional YOLOv11n multi-scale detection framework can handle targets of different scales, it still has certain limitations in small target detection. The P3, P4, and P5 output layers of YOLOv11n are used for detecting targets of different scales, with the P5 layer mainly targeting large objects, suitable for larger target backgrounds. However, due to the larger downsampling factor of the P5 layer, it cannot provide sufficient detail information when processing small targets and has a larger computational load, which performs poorly in cucumber disease detection tasks under complex backgrounds where small targets predominate.
To address this, this study proposes removing the P5 large target detection layer and adding a new detection layer P2 specifically for small targets. The new layer enhances spatial resolution, achieving a feature map resolution of 160 × 160, compared to the 20 × 20 feature map of the P5 layer. The P2 layer can retain more small target detail information, thereby improving small target detection accuracy. Through feature fusion with the P3 and P4 layers, the P2 layer further enhances the model's feature extraction capability at multiple scales, ensuring effective capture of small target details while avoiding redundant computation. As shown in Fig. 5, the red arrows indicate the newly added structures, and the red rectangular box indicates the removed structure.
Fig. 5.
Improved object detection layer with small-target detection layer
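The feature map resolutions quoted above follow from the stride of each detection layer. The sketch below assumes a 640 × 640 network input, which is not stated in the text but is consistent with the 160 × 160 and 20 × 20 feature maps, and the usual strides of 4, 8, 16, and 32 for P2 to P5. The names `grid_size`, `INPUT_SIZE`, and `STRIDES` are illustrative.

```python
def grid_size(input_size, stride):
    """Side length of the feature map produced at a given stride."""
    return input_size // stride


INPUT_SIZE = 640  # assumed input resolution
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}  # assumed layer strides

grids = {name: grid_size(INPUT_SIZE, s) for name, s in STRIDES.items()}

# Number of grid cells covered by the original and the modified head layouts.
original_cells = sum(grids[n] ** 2 for n in ("P3", "P4", "P5"))
modified_cells = sum(grids[n] ** 2 for n in ("P2", "P3", "P4"))

for name, g in grids.items():
    print(f"{name}: {g} x {g}")
print(original_cells, modified_cells)
```

Under these assumptions P2 has 64 times as many grid cells as P5 (25,600 versus 400), which is why it retains far more spatial detail for small lesions.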
Removing the P5 layer does not mean completely abandoning large target detection. Rather, it reallocates computational resources by reducing the effort spent on large targets, making the network more efficient on small targets. Moreover, since most lesions in the cucumber disease images are small or medium-sized, large targets are relatively infrequent, and the original P3 and P4 detection layers can already cover them. This modification reduces the computational burden, improves the accuracy of small target detection, and avoids an excessive increase in model parameters, ensuring an overall improvement in computational efficiency.
The introduced small target detection layer enhances the YOLOv11n model's ability to handle cucumber disease detection tasks dominated by small targets, reducing missed detections and false detections of small targets and improving detection accuracy while keeping computation balanced.
Target-aware Loss (TAL)
Images recorded for cucumber disease detection under complex backgrounds show a certain degree of blurring, especially in the diseased areas, and the large area of planting background further increases detection difficulty. Some penalty factors in the original loss function cause anchor boxes to expand during regression, which slows model convergence. Therefore, this study proposes a Target-aware Loss function that borrows design concepts from Focaler IoU and PIoUv2. PIoUv2 addresses the enlargement of anchor boxes during regression, which otherwise slows convergence; Focaler IoU reduces the adverse effects of sample imbalance between easy and difficult samples on bounding box regression. Through adaptive penalty factors and interval mapping, the Target-aware Loss function accelerates model convergence and improves cucumber disease detection.
The Focaler IoU function is shown in Eq. (2), which constructs IoU in the form of a piecewise function, thereby improving bounding box regression effect.
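$$IoU^{focaler} = \begin{cases} 0, & IoU < d \\ \dfrac{IoU - d}{u - d}, & d \le IoU \le u \\ 1, & IoU > u \end{cases} \tag{2}$$
In Eq. (2), $IoU$ refers to the original IoU value, and $d$ and $u$ both belong to the interval from 0 to 1, with different regression samples corresponding to different $d$ and $u$. Its loss is defined as shown in Eq. (3).
$$L_{Focaler\text{-}IoU} = 1 - IoU^{focaler} \tag{3}$$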
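The interval mapping of Focaler IoU can be sketched in a few lines. The code assumes the published piecewise form (zero below `d`, linear between `d` and `u`, one above `u`) and boxes given as `(x1, y1, x2, y2)`; the function names and the example thresholds are illustrative and not taken from the authors' code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou_value, d, u):
    """Linearly remap IoU from [d, u] to [0, 1], clamping outside the interval."""
    if iou_value < d:
        return 0.0
    if iou_value > u:
        return 1.0
    return (iou_value - d) / (u - d)


def focaler_iou_loss(iou_value, d, u):
    """Loss as one minus the remapped IoU."""
    return 1.0 - focaler_iou(iou_value, d, u)


# Example thresholds (hypothetical): samples with IoU below 0.2 all map to 0.
low = focaler_iou(iou((0, 0, 2, 2), (1, 1, 3, 3)), d=0.2, u=0.8)
mid = focaler_iou(0.5, d=0.2, u=0.8)
print(low, mid, focaler_iou_loss(0.5, d=0.2, u=0.8))
```

The first example has an IoU of 1/7, which falls below `d` and is therefore mapped to zero; this is the behavior that discards low-quality samples.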
The Focaler IoU formula shows that when $IoU$ is less than a specific value, $IoU^{focaler}$ is zero. However, the objects of this study are cucumber disease targets under complex backgrounds, many of which are relatively low-quality targets. Therefore, the improved version keeps the piecewise form but cancels the original two parameters. This uses the sample data more fully, accommodates low-quality disease images, and allows a more comprehensive analysis of sample positions, which promotes more stable convergence and suits this study's dataset better. Its loss is defined as shown in Eqs. (4) and (5).
![]() |
4 |
![]() |
5 |
Some common loss functions, when guiding anchor boxes, first gradually expand until completely covering the ground truth box, then perform shrinking operations to achieve linear regression, which leads to increased time cost and unreliable precision. The PIoU loss function can effectively solve the problem of anchor box expansion, as it can adaptively select penalty factors based on target size and adjust gradients based on anchor box quality. It directly minimizes the distance between the four edges of the anchor box and the ground truth box, guiding the anchor box to move to the ground truth box along a path approaching a straight line. Therefore, compared to other IoU functions, the PIoU loss function can perform linear regression more quickly. Its loss is defined as shown in Eqs. (6) and (7).
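$$P = \frac{1}{4}\left(\frac{d_{w1}}{w_{gt}} + \frac{d_{w2}}{w_{gt}} + \frac{d_{h1}}{h_{gt}} + \frac{d_{h2}}{h_{gt}}\right) \tag{6}$$
$$L_{PIoU} = 1 - IoU + \left(1 - e^{-P^{2}}\right) \tag{7}$$
where $P$ is the penalty factor; $d_{w1}$, $d_{w2}$, $d_{h1}$, and $d_{h2}$ are the absolute distances between the corresponding edges of the anchor box and the ground truth box; and $w_{gt}$ and $h_{gt}$ are the width and height of the ground truth box. The variables are defined as shown in Fig. 6.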
Fig. 6.
Defining variables for PIoU
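A minimal sketch of the PIoU loss follows. It assumes the published PIoU definition: the penalty factor is the mean of the four edge distances normalized by the ground truth width or height, and the loss adds `1 - exp(-P**2)` to the plain IoU loss. Boxes are `(x1, y1, x2, y2)` tuples and all function names are illustrative.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def penalty_factor(pred, gt):
    """Mean edge distance between the boxes, normalized by ground truth size."""
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    dw1, dw2 = abs(pred[0] - gt[0]), abs(pred[2] - gt[2])
    dh1, dh2 = abs(pred[1] - gt[1]), abs(pred[3] - gt[3])
    return (dw1 / w_gt + dw2 / w_gt + dh1 / h_gt + dh2 / h_gt) / 4


def piou_loss(pred, gt):
    """IoU loss plus the size-adaptive penalty 1 - exp(-P^2)."""
    p = penalty_factor(pred, gt)
    return (1.0 - iou(pred, gt)) + (1.0 - math.exp(-p * p))


gt_box = (1.0, 1.0, 3.0, 3.0)
shifted_box = (0.0, 0.0, 2.0, 2.0)
p_shifted = penalty_factor(shifted_box, gt_box)
loss_shifted = piou_loss(shifted_box, gt_box)
loss_perfect = piou_loss(gt_box, gt_box)
print(p_shifted, loss_shifted, loss_perfect)
```

Because the penalty depends only on edge distances normalized by the ground truth size, enlarging the anchor box does not reduce it, which is the property the text relies on to avoid anchor box expansion.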
Building upon PIoU, a non-monotonic attention layer m(x) was introduced to study the focusing mechanism, resulting in the PIoUv2 loss function by combining the attention layer with PIoU. Compared to its first version, it focuses more on medium-quality anchor boxes. The loss is defined as shown in Eqs. (8)–(10).
$$q = e^{-P}, \quad q \in (0, 1] \tag{8}$$
$$m(x) = 3x \cdot e^{-x^{2}} \tag{9}$$
$$L_{PIoUv2} = m(\lambda q) \cdot L_{PIoU} \tag{10}$$
According to Eq. (8), $q$ decreases as $P$ increases. $q$ represents the quality of an anchor box; when $P$ equals 0, $q$ equals 1, indicating that the ground truth box and the prediction box coincide. $\lambda$ is a hyperparameter controlling the attention function.
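The non-monotonic focusing behavior can be checked numerically. The sketch below assumes the published PIoUv2 form, in which the anchor quality is `q = exp(-P)` and the attention function is `3 * x * exp(-x**2)`; the value of `lam` is an arbitrary example and not a setting reported in this study, and the function names are illustrative.

```python
import math


def anchor_quality(p):
    """Quality q in (0, 1]; q = 1 when the penalty factor P is 0."""
    return math.exp(-p)


def attention(x):
    """Non-monotonic attention function, maximal at x = 1 / sqrt(2)."""
    return 3.0 * x * math.exp(-x * x)


def piou_v2_loss(piou_loss_value, p, lam):
    """Scale the PIoU loss by the attention weight of the anchor quality."""
    return attention(lam * anchor_quality(p)) * piou_loss_value


lam = 1.0  # example value only (assumption)
# Attention weights for low-, medium-, and high-quality anchor boxes.
weights = {q: attention(lam * q) for q in (0.2, 1 / math.sqrt(2), 1.0)}
for q, w in weights.items():
    print(f"q={q:.3f} weight={w:.3f}")
```

With `lam = 1.0`, the weight is largest for the medium-quality anchor and smaller for both the low- and high-quality ones, which matches the statement that PIoUv2 focuses on medium-quality anchor boxes.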
Combining the Focaler IoU and PIoUv2 loss functions, the Target-aware Loss function is proposed, with its loss defined as shown in Eq. (11).
![]() |
11 |
The Target-aware Loss (TAL) function addresses class imbalance by weighting smaller lesions more heavily, ensuring that the model focuses on detecting early-stage diseases. The loss function dynamically adjusts the penalty factor based on target size and class distribution, allowing for better handling of small targets in cluttered backgrounds.
Channel Pruning via Batch Normalization (CPBN)
For the deployment of cucumber disease detection models, if the model volume is large and computational load is high, it will be hindered by high computational power costs in practical applications. Therefore, model lightweight design is necessary to remove redundant parts in the model, prune unimportant channels, and ultimately obtain a high-precision lightweight model.
First, sparse training is conducted. Its purpose is to separate important channels from unimportant ones in preparation for pruning, which requires an indicator of channel importance. Batch Normalization (BN) layers apply channel-level scaling and shifting parameters and are widely used in neural network training to improve training efficiency and prevent gradient explosion, so the parameters of the BN layers are used as the indicator. The principle of BN is:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^{2} + \varepsilon}} \tag{12}$$
$$y_i = \gamma \hat{x}_i + \beta \tag{13}$$
where $\hat{x}_i$ is the normalization result, $y_i$ is the output result, $\varepsilon$ is a correction constant, γ and β are the scaling and shifting factors, respectively, and $\mu_B$ and $\sigma_B^{2}$ are the mean and variance of each batch, with their formulas being:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{14}$$
$$\sigma_B^{2} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^{2} \tag{15}$$
Here $x_i$ is the i-th input in a batch of size m. The BN layer thus first normalizes a batch of images to obtain $\hat{x}_i$, and then learns the parameters γ and β during training from the normalized input $\hat{x}_i$ and the output $y_i$. The γ parameter directly scales the output $y_i$; the smaller the γ value, the less important the channel information. Therefore, the γ parameter is used as the channel's scaling factor to measure channel importance. The scaling factor is added to the loss function for training, with the principle being:
$$L = \sum_{(x, y)} l\big(f(x, W), y\big) + \lambda \sum_{\gamma \in T} g(\gamma) \tag{16}$$
$$g(\gamma) = |\gamma| \tag{17}$$
where $\sum_{(x, y)} l\big(f(x, W), y\big)$ is the training loss of the network, x and y are the input and output of the training, respectively, and W is the training parameter in the network; $\lambda \sum_{\gamma \in T} g(\gamma)$ is the penalty for the sparsity of the scaling factors, λ is the penalty factor, and T is the set of all scaling factors in the BN layers; $g(\gamma) = |\gamma|$ is the L1 regularization applied to the scaling factors. Each scaling factor multiplies the output of its channel, and the network weights and scaling factors are trained jointly.
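To make the role of γ concrete, the sketch below implements batch normalization for a single channel and the L1 sparsity term added to the training loss. It is a plain-Python illustration under stated assumptions: the function names, the example batch, and the penalty factor 1e-4 are hypothetical and are not taken from the training code of this study.

```python
import math


def batch_norm(xs, gamma, beta, eps=1e-5):
    """Normalize one channel over a batch, then scale by gamma and shift by beta."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in xs]
    return [gamma * xh + beta for xh in x_hat]


def sparsity_penalty(gammas, lam):
    """L1 term lam * sum(|gamma|) over all BN scaling factors."""
    return lam * sum(abs(g) for g in gammas)


def l1_subgradient(gamma, lam):
    """Subgradient of lam * |gamma|; it pushes gamma toward zero during training."""
    if gamma > 0:
        return lam
    if gamma < 0:
        return -lam
    return 0.0


if __name__ == "__main__":
    batch = [1.0, 2.0, 3.0, 4.0]
    # A channel with a small gamma contributes almost nothing beyond its bias.
    print(batch_norm(batch, gamma=1.0, beta=0.0))
    print(batch_norm(batch, gamma=0.01, beta=0.0))
    task_loss = 0.8  # hypothetical detection loss value
    total = task_loss + sparsity_penalty([0.9, 0.01, -0.4], lam=1e-4)
    print(total)
```

With γ close to zero the channel output collapses to β regardless of the input, which is why small scaling factors mark channels that can be removed with little effect.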
Second, channel pruning is performed. The absolute values of all scaling factors γ in the BN layers are sorted in ascending order, and with a pruning ratio of 50% the threshold η is taken as the value at the 50% position of this ordering. Channels whose scaling factors are smaller than η are pruned. If all scaling factors in a layer are smaller than η, the two channels with the largest values are retained to preserve the integrity of the network structure and keep dimensions matched with the backbone network. The pruning process is shown in Fig. 7, with the threshold set to 0.2.
Fig. 7.
The process of channel pruning
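The threshold selection and the two-channel fallback can be sketched as follows. This is an illustrative example only: it assumes the threshold is computed globally over the scaling factors of all BN layers, and the layer names and γ values are made up.

```python
def pruning_threshold(gammas, ratio=0.5):
    """Return the |gamma| value at the given position of the ascending ordering."""
    ordered = sorted(abs(g) for g in gammas)
    index = min(int(len(ordered) * ratio), len(ordered) - 1)
    return ordered[index]


def channels_to_keep(layer_gammas, threshold, fallback=2):
    """Indices of channels kept in one BN layer.

    Channels with |gamma| below the threshold are pruned. If every channel
    falls below it, the `fallback` channels with the largest |gamma| are kept.
    """
    keep = [i for i, g in enumerate(layer_gammas) if abs(g) >= threshold]
    if not keep:
        ranked = sorted(range(len(layer_gammas)),
                        key=lambda i: abs(layer_gammas[i]), reverse=True)
        keep = sorted(ranked[:fallback])
    return keep


if __name__ == "__main__":
    layers = {  # hypothetical scaling factors for two BN layers
        "bn1": [0.9, 0.6, 0.4, 0.5, 0.7, 0.8],
        "bn2": [0.02, 0.03, 0.01, 0.015],
    }
    all_gammas = [g for gs in layers.values() for g in gs]
    eta = pruning_threshold(all_gammas, ratio=0.5)
    for name, gs in layers.items():
        print(name, channels_to_keep(gs, eta))
```

In this toy example every factor in `bn2` lies below the threshold, so the fallback keeps its two largest channels instead of removing the layer entirely.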
Finally, model fine-tuning is conducted. After the scaling factors γ are arranged in ascending order, a pruning rate must be specified. The higher the pruning rate, the smaller the resulting model. When the pruning rate is set too high, accuracy drops significantly after pruning, but fine-tuning allows the pruned network to recover accuracy. This is achieved by using transfer learning to guide the training of the small model with knowledge learned from the high-precision large model, accelerating the convergence of the smaller model while maintaining high detection accuracy.
Results and discussion
Experimental environment and parameter configuration
To ensure a fair evaluation of cucumber disease detection performance, all experiments in this study were conducted on a unified hardware platform with a consistent configuration, so that any observed differences in model performance can be attributed to the algorithmic enhancements rather than to hardware variability. To increase the diversity and robustness of the training dataset, we employed several data augmentation techniques, including random brightness adjustment, Gaussian noise addition, motion blur, random flipping, and random scaling. These methods make the model more resilient to varying environmental factors and enhance its generalization ability.
During model training, the AdamW optimizer was used to improve optimization efficiency, particularly in handling the sparse gradients often encountered in object detection tasks. This choice of optimizer contributes to faster convergence and more stable training. The specific hardware and software configurations used for the experiments are summarized in Table 2.
Table 2.
Hardware configuration and software environment
| Hardware | Configuration | Software | Configuration |
|---|---|---|---|
| CPU | Intel Xeon Gold 5220 CPU | System | Ubuntu 18.04 |
| GPU | NVIDIA GeForce RTX 3090 24G | PyTorch | 1.9.1 |
| Memory | 64G | CUDA | 11.4 |
| Hard Disk | 8 T | CUDNN | 8.2.4 |
The following hyperparameters were used throughout the experiments: a learning rate decay factor of 0.5 applied every 30 epochs, a dropout rate of 0.3, a weight decay of 0.0005, and a batch size of 16. Additionally, we fine-tuned the hyperparameters of the Target-aware Loss (TAL) function, including the focal parameter and penalty factor, to optimize class imbalance handling, which is crucial for detecting small and early-stage lesions effectively.
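The step decay described above can be written in a few lines. This is a minimal sketch: the initial learning rate of 0.01 is a hypothetical value chosen for illustration, since the text does not state it.

```python
def step_decay_lr(initial_lr: float, epoch: int,
                  factor: float = 0.5, step: int = 30) -> float:
    """Learning rate after multiplying by `factor` once every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


if __name__ == "__main__":
    initial_lr = 0.01  # hypothetical starting value, not reported in the text
    for epoch in (0, 29, 30, 60, 99):
        print(epoch, step_decay_lr(initial_lr, epoch))
```

Over the 100 training epochs this schedule halves the learning rate three times, at epochs 30, 60, and 90.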
Evaluation metrics
To comprehensively evaluate the performance of various models, commonly used accuracy, efficiency, and complexity evaluation metrics in the field of object detection were adopted.
Accuracy evaluation metrics: Precision (P), Recall (R), and mean Average Precision (mAP), calculated as follows:
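$$P = \frac{TP}{TP + FP} \tag{18}$$
$$R = \frac{TP}{TP + FN} \tag{19}$$
$$mAP = \frac{1}{M}\sum_{i=1}^{M} AP_i \tag{20}$$
where TP is the number of true positives, FN is the number of false negatives, FP is the number of false positives, $AP_i$ is the average precision of the i-th category, and M is the total number of categories.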
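As a small worked example of the accuracy metrics, the sketch below computes precision, recall, and mAP from counts and per-category AP values. The counts and AP values are hypothetical and serve only to show the arithmetic.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_per_category) -> float:
    """Mean of the per-category average precision values."""
    return sum(ap_per_category) / len(ap_per_category)


if __name__ == "__main__":
    tp, fp, fn = 90, 10, 8  # hypothetical detection counts
    print(f"P = {precision(tp, fp):.3f}")
    print(f"R = {recall(tp, fn):.3f}")
    print(f"mAP = {mean_average_precision([0.90, 0.80]):.3f}")
```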
Efficiency evaluation metric: Detection time (DT), which evaluates the average time required for a model to complete one detection, measuring its real-time detection capability, calculated as:
$$DT = \frac{T}{N} \tag{21}$$
where T is the total detection time over the test set and N is the total number of images in the test set.
Complexity evaluation metrics: Parameters, which is the number of trainable parameters in the model; Model Size, which is the space occupied by the model on storage devices, affecting its deployment capability in resource-constrained environments.
Inference speed evaluation metric: Frames Per Second (FPS).
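The two speed metrics are closely related. The sketch below assumes that images are processed one at a time, so that FPS is simply the reciprocal of the average detection time; the timing figures are hypothetical.

```python
def detection_time(total_seconds: float, num_images: int) -> float:
    """Average time per detection, in seconds."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return total_seconds / num_images


def frames_per_second(dt_seconds: float) -> float:
    """Frames per second, assuming one image per inference call."""
    return 1.0 / dt_seconds


if __name__ == "__main__":
    # Hypothetical run: 2180 test images processed in 10 seconds.
    dt = detection_time(total_seconds=10.0, num_images=2180)
    print(f"DT = {dt * 1000:.2f} ms per image, FPS = {frames_per_second(dt):.0f}")
```

Under this assumption, a speed of 218 FPS corresponds to roughly 4.59 ms per image.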
Experimental process
To significantly enhance the accuracy and efficiency of cucumber disease target detection, this study proposes YOLO-Cucumber, and the experimental process includes three key stages, as shown in Fig. 8.
Fig. 8.
The flow chart of vegetable disease detection
To validate the effectiveness of the proposed model, experiments were conducted on the self-built cucumber disease dataset. Figure 9 shows the convergence curves of various performance metrics of the proposed model on the training and validation sets, including bounding box regression loss (Box Loss), distribution focal loss (DFL Loss), and classification loss (Classification Loss), as well as evaluation metrics such as precision, recall, mAP@0.5, and mAP@0.5:0.95. The horizontal axis in the figure represents the number of training epochs.
Fig. 9.
Performance metrics of the proposed YOLO-Cucumber model
As can be seen from Fig. 9, during the 100 training epochs, all loss functions show a stable downward trend without drastic fluctuations or overfitting phenomena. Particularly in the early training stage (first 20 epochs), the descent rate of various loss functions is faster, indicating that the model can rapidly learn feature representations from the dataset. Meanwhile, detection performance metrics (precision, recall, mAP) exhibit a stable upward trend, gradually leveling off in the later stages, further proving that the proposed model has good convergence performance and training stability. Observing the convergence curves of mAP@0.5 and mAP@0.5:0.95, it can be found that the model's performance metric curves tend to stabilize after the 40th epoch. Ultimately, the model performs excellently on the test set: mAP@0.5 reaches 93.8%, and mAP@0.5:0.95 reaches 72%, fully demonstrating the robustness and superiority of the proposed model under different IoU thresholds. At the same time, the model exhibits good precision and recall performance, reaching 93% and 92% respectively, indicating that the model has strong generalization capability and stability in cucumber disease detection tasks. From the overall trend of training and validation metrics, the model achieves an ideal balance in cucumber disease detection tasks: ensuring high detection accuracy while possessing good recall capability.
Experimental results
To comprehensively assess the performance of the YOLO-Cucumber model in detecting various cucumber diseases, we conducted tests using a self-constructed dataset that included multiple disease categories. Table 3 presents the detection results of the YOLO-Cucumber model across five different disease types as well as healthy cucumber samples.
Table 3.
Detection results for different disease types
| Category | Precision(%) | Recall(%) | AP50(%) |
|---|---|---|---|
| Healthy | 98.5 | 98.6 | 98.2 |
| Anthracnose | 88.6 | 88.2 | 88.5 |
| Bacterial Wilt | 95.2 | 95.9 | 95.8 |
| Pythium Fruit Rot | 95.6 | 95.5 | 95.2 |
| Downy Mildew | 89.5 | 89.3 | 89.9 |
As shown in Table 3, the YOLO-Cucumber model demonstrates high detection accuracy, with precision, recall, and average precision (AP) values exceeding 88% for all five cucumber disease types, as well as for healthy samples. Specifically, the model achieves a mean Average Precision (mAP@0.5) of 93.8%, which confirms its robust performance in detecting various cucumber diseases. Additionally, the model's ability to accurately identify healthy cucumber samples helps mitigate the risks of misdiagnosis and unnecessary treatments, a crucial factor in precision agriculture.
In order to further evaluate the model’s performance, we compared the Precision-Recall (PR) curves of YOLO-Cucumber with those of the baseline YOLOv11n model. Figure 10 displays these PR curves, where the horizontal axis represents recall and the vertical axis represents precision. A curve that is closer to the upper-right corner indicates better performance. The PR curve for YOLO-Cucumber clearly covers a larger area compared to the baseline model, demonstrating a significant improvement in both precision and recall. Specifically, YOLO-Cucumber shows a 6.5% improvement in mAP50 over the YOLOv11n baseline, reaching 93.8%. This enhancement is particularly evident under experimental conditions that simulate typical greenhouse cultivation environments, where the model exhibited superior robustness in the presence of lighting variations and occlusion.
Fig. 10.
PR curve comparison
Ablation study
To further evaluate the contribution of each enhancement to cucumber disease detection performance, a series of ablation experiments were conducted using YOLOv11n as the baseline model. The results, summarized in Table 4, illustrate the impact of each improvement on detection accuracy, model size, parameters, and inference speed.
Table 4.
Ablation study results
| C3k2-DCN | SDL | TAL | CPBN | mAP (%) | Model Size (MB) | Parameters (M) | FPS |
|---|---|---|---|---|---|---|---|
|  |  |  |  | 87.3 | 5.35 | 2.58 | 186 |
| Yes |  |  |  | 88.2 | 5.43 | 2.69 | 174 |
|  | Yes |  |  | 91.4 | 4.59 | 1.68 | 181 |
|  |  | Yes |  | 87.7 | 5.35 | 2.58 | 184 |
| Yes | Yes | Yes |  | 92.7 | 4.68 | 1.80 | 167 |
| Yes | Yes | Yes | Yes | 93.8 | 1.48 | 0.56 | 218 |
From the results in Table 4, several insights can be drawn: First, replacing the original C3k2 module with C3k2-DCN slightly increased the model size and parameters, while the inference speed decreased by 12 FPS. However, mAP50 improved by 0.9%, indicating that the introduction of deformable convolution effectively enhances the model's ability to capture multi-scale features, especially for small and deformed targets in cucumber disease images.
Second, introducing the small target detection layer (SDL) and removing the large target detection layer reduced the model size by 0.76 MB and the parameter count by 0.90 M. This modification improved mAP50 by 4.1%, with a minimal decrease in inference speed (only 5 FPS), confirming the efficacy of the small target detection layer for detecting early lesions, which are typically small and challenging to detect.
Third, incorporating the Target-aware Loss (TAL) function resulted in a slight increase of 0.4% in mAP50, without affecting the model size or parameters. This improvement highlights TAL's ability to address class imbalance and handle low-quality samples more effectively, which is particularly crucial in early-stage disease detection.
Finally, applying Channel Pruning via Batch Normalization (CPBN) to the model with the three preceding improvements produced significant gains. mAP50 increased by 1.1%, the model size was reduced by 3.20 MB (3.87 MB relative to the baseline), and the parameter count fell by 1.24 M. Inference speed improved by 51 FPS, demonstrating that CPBN not only reduces the model's computational load but also accelerates inference, all while maintaining high detection accuracy. However, extreme pruning levels could lead to a decrease in performance, and future research will focus on exploring the trade-off between compression and accuracy.
The ablation study demonstrates the complementary contributions of each proposed component. C3k2-DCN enhances feature extraction for small and deformed targets (+0.9% mAP), SDL optimizes detection layer performance for early-stage lesions (+4.1% mAP), and TAL addresses class imbalance issues (+0.4% mAP). When integrated with CPBN for model compression, the complete framework improves on the baseline in both accuracy (+6.5% mAP) and computational efficiency (3.87 MB reduction, +32 FPS), making it well suited to real-time disease detection in complex agricultural environments.
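The differences quoted in this discussion can be recomputed directly from Table 4. The sketch below does so; the row labels and the helper name `delta` are illustrative, and the numbers are copied from the table.

```python
# (mAP %, model size MB, parameters M, FPS), copied from Table 4
ABLATION = {
    "baseline": (87.3, 5.35, 2.58, 186),
    "+C3k2-DCN": (88.2, 5.43, 2.69, 174),
    "+SDL": (91.4, 4.59, 1.68, 181),
    "+TAL": (87.7, 5.35, 2.58, 184),
    "+DCN+SDL+TAL": (92.7, 4.68, 1.80, 167),
    "+DCN+SDL+TAL+CPBN": (93.8, 1.48, 0.56, 218),
}


def delta(row: str, reference: str):
    """Change of `row` relative to `reference` for each metric."""
    a, b = ABLATION[row], ABLATION[reference]
    return (round(a[0] - b[0], 1), round(a[1] - b[1], 2),
            round(a[2] - b[2], 2), a[3] - b[3])


if __name__ == "__main__":
    for name in ABLATION:
        if name != "baseline":
            print(f"{name:>20} vs baseline: {delta(name, 'baseline')}")
    # Effect of pruning alone, relative to the unpruned three-module model
    print("CPBN vs unpruned:", delta("+DCN+SDL+TAL+CPBN", "+DCN+SDL+TAL"))
```

Comparing the last row against the baseline gives the overall figures (+6.5% mAP, 3.87 MB smaller, +32 FPS), while comparing it against the unpruned three-module model isolates the effect of pruning (+1.1% mAP, 1.24 M fewer parameters, +51 FPS).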
Comparative experiments
Figure 11 shows the comparative experimental results of the proposed YOLO-Cucumber model and the baseline model during the training process. The left figure (a) shows the Loss curves of YOLO-Cucumber and the baseline model. As can be seen from the figure, the baseline model has a higher Loss value and slower convergence speed, with the final Loss value still higher than that of the YOLO-Cucumber model. This indicates that the baseline model faces greater optimization challenges during the training process. The right figure (b) shows the mAP@0.5 curves of YOLO-Cucumber and the baseline model. The proposed YOLO-Cucumber model exhibits higher mAP during the training process and significantly faster convergence speed, with the final mAP value reaching 93.8%, significantly outperforming the baseline model's 87.3%. These results not only indicate that the YOLO-Cucumber model outperforms the baseline model in accuracy but also demonstrate its significant advantages in convergence and loss optimization during the training process, further validating the superiority of this model.
Fig. 11.
Comparison between the proposed YOLO-Cucumber and the baseline model
To verify the effectiveness of the proposed model, a comparative analysis was conducted with other common algorithms in this field, with results shown in Table 5. From the results in Table 5, it can be observed that: First, compared to single-stage models such as SSD, YOLOv3, YOLOv5s, YOLOv6, and YOLOv8n, our model has higher detection accuracy, a smaller model size and parameter count, and faster inference speed. Second, the same advantages hold over the two-stage Faster R-CNN. Third, compared to YOLOv3-tiny and YOLOv7-tiny, although our model is slower by 67 FPS and 73 FPS respectively, it outperforms them in detection accuracy and model size, demonstrating performance advantages for detecting multi-scale small targets in cucumber disease images. Finally, compared to the baseline model YOLOv11n and the same-level YOLOv11s model, the proposed model has mAP50 values that are 6.5% and 5.6% higher, model sizes that are 3.87 and 16.92 MB lower, parameter counts that are 2.02 and 8.84 M lower, and inference speeds that are 32 and 25 FPS higher, respectively. In summary, the proposed model outperforms the other algorithms in cucumber disease detection accuracy while ensuring real-time performance, and it has the smallest model size among the compared models.
Table 5.
Performance comparison with state-of-the-art models
| Model | mAP(%) | Model Size(MB) | Parameters (M) | FPS |
|---|---|---|---|---|
| SSD | 76.7 | 28.4 | 24.20 | 171 |
| Faster R-CNN | 87.2 | 89.6 | 43.80 | 141 |
| YOLOv3 | 92.0 | 123.6 | 61.50 | 158 |
| YOLOv3-tiny | 69.8 | 17.5 | 12.70 | 285 |
| YOLOv5s | 85.6 | 16.5 | 7.10 | 158 |
| YOLOv6 | 84.2 | 8.9 | 9.71 | 176 |
| YOLOv7-tiny | 85.9 | 12.3 | 6.01 | 291 |
| YOLOv8n | 86.4 | 6.3 | 3.10 | 177 |
| YOLOv11n | 87.3 | 5.35 | 2.58 | 186 |
| YOLOv11s | 88.2 | 18.4 | 9.40 | 193 |
| Ours | 93.8 | 1.48 | 0.56 | 218 |
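The comparison in Table 5 can be checked mechanically. The sketch below lists, for each metric, the models that beat the proposed one; the helper name `better_than_ours` is illustrative, and the numbers are copied from the table.

```python
# (mAP %, model size MB, parameters M, FPS), copied from Table 5
TABLE5 = {
    "SSD": (76.7, 28.4, 24.20, 171),
    "Faster R-CNN": (87.2, 89.6, 43.80, 141),
    "YOLOv3": (92.0, 123.6, 61.50, 158),
    "YOLOv3-tiny": (69.8, 17.5, 12.70, 285),
    "YOLOv5s": (85.6, 16.5, 7.10, 158),
    "YOLOv6": (84.2, 8.9, 9.71, 176),
    "YOLOv7-tiny": (85.9, 12.3, 6.01, 291),
    "YOLOv8n": (86.4, 6.3, 3.10, 177),
    "YOLOv11n": (87.3, 5.35, 2.58, 186),
    "YOLOv11s": (88.2, 18.4, 9.40, 193),
    "Ours": (93.8, 1.48, 0.56, 218),
}


def better_than_ours(metric_index: int, higher_is_better: bool):
    """Names of models that beat 'Ours' on the selected metric."""
    ours = TABLE5["Ours"][metric_index]
    return [name for name, row in TABLE5.items()
            if name != "Ours"
            and (row[metric_index] > ours if higher_is_better
                 else row[metric_index] < ours)]


if __name__ == "__main__":
    print("higher mAP:      ", better_than_ours(0, True))
    print("smaller size:    ", better_than_ours(1, False))
    print("fewer parameters:", better_than_ours(2, False))
    print("higher FPS:      ", better_than_ours(3, True))
```

Only the FPS check returns any models (YOLOv3-tiny and YOLOv7-tiny), which matches the observation that the proposed model trails those two in speed while leading on the other three metrics.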
It is important to note that real-world cucumber disease detection presents several challenges. Disease symptoms often manifest with significant variability, as observed particularly with downy mildew, which can present as chlorosis (yellowing), necrosis (tissue death), or mixed symptoms depending on disease progression and environmental conditions. Furthermore, multiple diseases may co-occur on a single leaf, creating complex symptom patterns that are difficult to distinguish.
Our YOLO-Cucumber model addresses these challenges through several mechanisms. The DCN component enables adaptive feature extraction that can accommodate varying symptom morphologies, while the small target detection layer enhances sensitivity to early-stage symptoms that might otherwise be overlooked. The TAL function improves the model's ability to handle class imbalance issues that arise when certain symptom types are underrepresented in the training data. During model training, we deliberately included images with multiple disease presentations and varying symptom severities to ensure robustness to these real-world complexities.
As illustrated in Fig. 12, YOLO-Cucumber demonstrates robust detection capabilities across challenging real-world scenarios. Figure 12(a) showcases the model's ability to accurately identify disease symptoms at varying severity levels. This graduated detection sensitivity enables timely intervention at any phase of disease progression. Figure 12(b) highlights the model's resilience to uneven lighting conditions, a common challenge in greenhouse environments where shadows and bright spots can significantly alter visual features. Despite these illumination variations, the model maintains high detection confidence. Figure 12(c) demonstrates effective detection under partial occlusion scenarios, where leaf overlap or structural elements obscure portions of the infected tissue. These results validate the effectiveness of our proposed architectural enhancements, particularly the DCN component which dynamically adjusts its receptive field to accommodate environmental variations and morphological challenges. The visualization confirms that YOLO-Cucumber can reliably detect cucumber diseases even under the complex and variable conditions typically encountered in real-world agricultural settings.
Fig. 12.
Detection visualization of YOLO-Cucumber model under complex scenarios
Conclusion and future work
Conclusion
This study introduces YOLO-Cucumber, an advanced object detection framework based on the YOLOv11n model, aimed at addressing key challenges in cucumber disease detection under complex agricultural conditions. The model incorporates four key innovations: first, the integration of Deformable Convolution Networks (DCN) into the C3k2 module, which significantly enhances feature extraction for multi-scale and irregularly shaped targets, particularly small and deformed early-stage lesions; second, the optimization of the network structure by introducing a P2 small target detection layer while removing the redundant P5 large target layer, which improves sensitivity to small lesions and reduces computational redundancy; third, the development of a Target-aware Loss (TAL) function, which effectively mitigates the class imbalance issue in small target detection and refines boundary box localization; and fourth, the application of Channel Pruning via Batch Normalization (CPBN), enabling efficient model pruning, compression, and faster inference.
Experimental results on a self-constructed cucumber disease dataset show that YOLO-Cucumber outperforms existing methods across multiple performance metrics. Specifically, it improves mAP@50 by 6.5%, reaching 93.8%, while reducing the model size and parameter count by 3.87 MB and 2.02 million, respectively. Additionally, inference speed increased by 32 FPS to 218 FPS. Ablation studies confirm the individual effectiveness of each proposed module, with CPBN pruning playing a crucial role in reducing computational complexity without sacrificing accuracy. When compared to other mainstream object detection algorithms, such as YOLOv3, YOLOv5, and Faster R-CNN, YOLO-Cucumber demonstrates superior detection accuracy, inference speed, and computational efficiency, making it highly suitable for deployment on resource-constrained embedded systems in precision agriculture applications.
Future work
Despite these advances, YOLO-Cucumber still faces challenges in two key areas: distinguishing between different symptom presentations of the same disease (e.g., chlorotic versus necrotic manifestations of downy mildew) and accurately detecting disease co-occurrence on single leaves. These limitations reflect the inherent complexity of plant disease manifestation in actual agricultural settings and highlight directions for future research.
Future work will focus on developing more sophisticated multi-label classification capabilities and incorporating temporal disease progression data to improve detection accuracy for these complex cases. We also aim to explore optimization strategies for real-time deployment on mobile and edge devices, such as drones and agricultural robots, to enable efficient disease detection in real-world environments. Additionally, multi-modal data fusion will be investigated to integrate diverse data sources, such as images, spectral information, and environmental factors like temperature and humidity, to improve the model's performance under varying field conditions. Lastly, efforts will be directed at enhancing the model's interpretability and transferability by examining its decision-making processes, which will improve adaptability across different crops, regions, and disease types. These advancements will contribute to the widespread adoption of intelligent disease detection technologies, supporting the sustainable development and efficiency of modern agricultural practices.
Acknowledgments
CRediT authorship contribution statement
The research design was conceptualized by JL and XW. Both JL and XW conducted the experiments, performed data analysis, and prepared the initial draft of the manuscript. XW undertook the manuscript revision. All authors have reviewed and approved the final version of the manuscript.
Declaration of generative AI and AI-assisted technologies
Throughout the preparation of this manuscript, the authors utilized various AI tools to enhance language clarity and readability. Subsequently, the authors meticulously reviewed and edited the content as necessary, assuming full responsibility for the final publication.
Authors’ contributions
The research was designed by JL and XW. JL and XW carried out experiments, analyzed data, and drafted the manuscript. The manuscript was revised by XW, XL, PY, QC and JL. All authors reviewed and approved the final version of the manuscript.
Funding
This research was funded by the Shandong Province Natural Science Foundation (Grant Nos. ZR2023MF048, ZR2023QC116 & ZR2021QC173), the Key R&D Program of Shandong Province, China (Grant No. 2024RZB0206), the Disciplinary Construction Funds of Weifang University of Science and Technology, the Supporting Construction Funds for Shandong Province Data Open Innovation Application Laboratory, the School-level Talent Project (Grant No. 2018RC002), the Weifang Soft Science Project (Grant No. 2023RKX184), and the Weifang City Science and Technology Development Plan Project (Grant No. 2023GX051, 2023JH14 & 2024GX033).
Data availability
Part of the data can be accessed at https://data.mendeley.com/datasets/tg3z7xxkdb/1. The full datasets collected in this study are available on request to the corresponding author.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors have consented to the publication of this manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jun Liu, Email: liu_jun860116@wfust.edu.cn.
Qian Chen, Email: chenqianwork2019@163.com.
Peng Yan, Email: nic@stbu.edu.cn.
References
- 1.Paul N, Sunil GC, Horvath D, Sun X. Deep learning for plant stress detection: A comprehensive review of technologies, challenges, and future directions. Comput Electron Agric. 2025;229:109734. [Google Scholar]
- 2.Badgujar CM, Poulose A, Gan H. Agricultural object detection with You Only Look Once (YOLO) Algorithm: A bibliometric and systematic literature review. Comput Electron Agric. 2024;223:109090. [Google Scholar]
- 3.Upadhyay A, Chandel NS, Singh KP, Chakraborty SK, Nandede BM, Kumar M, Elbeltagi A. Deep learning and computer vision in plant disease detection: a comprehensive review of techniques, models, and trends in precision agriculture. Artificial Intelligence Review. 2025;58(3):1–64. [Google Scholar]
- 4.Lin J, Hu W, Zhu J, Zhou G, He M, Lv M, Jiang Y. LVF: A language and vision fusion framework for tomato diseases segmentation. Computers and Electronics in Agriculture. 2024;227(P1).
- 5.Hu G, Yin C, Wan M, Zhang Y, Fang Y. Recognition of diseased Pinus trees in UAV images using deep learning and AdaBoost classifier. Biosyst Eng. 2020;194.
- 6.Ali AM, Słowik A, Hezam IM, Abdel-Basset M. Sustainable smart system for vegetables plant disease detection: Four vegetable case studies. Comput Electron Agric. 2024;227:109672. [Google Scholar]
- 7.Wang H, He M, Zhu M, Liu G. WCG-VMamba: A multi-modal classification model for corn disease. Comput Electron Agric. 2024;230.
- 8.Li T, Zhang L, Lin J. Precision agriculture with YOLO-Leaf: advanced methods for detecting apple leaf diseases. Front Plant Sci. 2024;15. [DOI] [PMC free article] [PubMed]
- 9.Liu C, Zhu H, Guo W, Han X, Chen C, Wu H. EFDet: An efficient detection method for cucumber disease under natural complex environments. Comput Electron Agric. 2021;189.
- 10.Khanam R, Hussain M. Yolov11: An overview of the key architectural enhancements. 2024. arXiv preprint arXiv:2410.17725.
- 11.Jegham N, Koh CY, Abdelatti M, Hendawi A. Evaluating the evolution of yolo (you only look once) models: A comprehensive benchmark study of yolo11 and its predecessors. 2024. arXiv preprint arXiv:2411.00201.
- 12.Hidayatullah P, Syakrani N, Sholahuddin MR, Gelar T, Tubagus R. YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review. 2025. arXiv preprint arXiv:2501.13400.
- 13.Kang R, Huang J, Zhou X, Ren N, Sun S. Toward Real Scenery: A Lightweight Tomato Growth Inspection Algorithm for Leaf Disease Detection and Fruit Counting. Plant Phenom. 2024;6. [DOI] [PMC free article] [PubMed]
- 14.Friha O, Ferrag MA, Shu L, Maglaras L, Wang X. Internet of Things for the Future of Smart Agriculture: A Comprehensive Survey of Emerging Technologies. IEEE/CAA Journal of Automatica Sinica. 2021;8(04):718–52. [Google Scholar]
- 15.Barbedo JGA. Factors influencing the use of deep learning for plant disease recognition. Biosys Eng. 2018;172:84–91. [Google Scholar]
- 16.Zhang X, Xun Y, Chen Y. Automated identification of citrus diseases in orchards using deep learning. Biosys Eng. 2021;223.
- 17.Zhang J, Bhuiyan MZA, Yang X, Wang T, Xu X, Hayajneh T, Khan F. AntiConcealer: Reliable detection of adversary concealed behaviors in EdgeAI-assisted IoT. IEEE Internet Things J. 2021;9(22):22184–93. [Google Scholar]
- 18.Toda Y, Okura F. How Convolutional Neural Networks Diagnose Plant Disease. Plant Phenomics. 2019;2019:9237136. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wang C, Li Q, Chu M, Kang X, Liu G. Research on tomato disease image recognition method based on DeiT. Eur J Agron. 2024;162.
- 20.Wang X, Yang J, Wang Y, Miao Q, Zhao A, Deng J, Li L. Discrimination of leaf diseases in Maize/Soybean intercropping system based on hyperspectral imaging. Front Plant Sci. 2024;15. [DOI] [PMC free article] [PubMed]
- 21.Zhao Y, Chen Z, Gao X, Song W, Xiong Q, Hu J, Zhang Z. Plant Disease Detection using Generated Leaves Based on DoubleGAN. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2021. [DOI] [PubMed]
- 22.Attri I, Awasthi LK, Sharma TP. EQID: Entangled quantum image descriptor an approach for early plant disease detection. Crop Protect. 2024;188.
- 23.Karantoumanis E, Balafas V, Louta M, Ploskas N. Real-time disease detection on bean leaves from a small image dataset using data augmentation and deep learning methods. Soft Comput. 2024;28(21).
- 24.Bonora A, Bortolotti G, Bresilla K, Grappadelli LC, Manfrini L. A convolutional neural network approach to detecting fruit physiological disorders and maturity in ‘Abbé Fétel’pears. Biosys Eng. 2021;212:264–72. [Google Scholar]
- 25.Mo H, Wei L. Lightweight citrus leaf disease detection model based on ARMS and cross-domain dynamic attention. J King Saud Univ Comput Inform Sci. 2024;36(7).
- 26.Hu G, Yin C, Wan M, Zhang Y, Fang Y. Detection of wheat Fusarium head blight utilizing the Yolov5-ECA-ASFF network structure. Comput Electron Agric. 2024.
- 27.Yang N, Chang K, Dong S, Tang J, Wang Q. FSM-YOLO: Apple leaf disease detection network based on adaptive feature capture and spatial context awareness. Digital Signal Process. 2024;155.
- 28.Cheng T, Zhang D, Gu C, Zhou X, Qiao H. YOLO-CG-HS: A lightweight spore detection method for wheat airborne fungal pathogens. Comput Electron Agric. 2024;227.
- 29.Zhang D, Zhang W, Cheng T, Zhou X, Yan Z. Detection of wheat scab fungus spores utilizing the Yolov5-ECA-ASFF network structure. Comput Electron Agric. 2023;210.
- 30.Kalezhi J, Shumba L. Cassava crop disease prediction and localization using object detection. Crop Protect. 2025;187.
- 31.Liu J, Chen Y, Wang H. Deep-Learning-Enhanced Multitarget Detection for End-Edge-Cloud Surveillance in Smart IoT. IEEE Internet Things J. 2021;8(11):9176–86. [Google Scholar]
- 32.Xu Y, Chen Q, Kong S, Xing L, Wang Q, Cong X, Zhou Y. Real-time object detection method of melon leaf diseases under complex background in greenhouse. J Real Time Image Process. 2022;19(5).
- 33.Johri P, Kim S, Dixit K, Sharma P, Kakkar B. Advanced deep transfer learning techniques for efficient detection of cotton plant diseases. Front Plant Sci. 2024;15. [DOI] [PMC free article] [PubMed]
- 34.Kumar VS, Jaganathan M, Viswanathan A, Umamaheswari M. Rice leaf disease detection based on bidirectional feature attention pyramid network with YOLO v5 model. Environ Res Commun. 2023;5(6).
- 35.Wójcik Gront E, Zieniuk B, Pawełkowicz M. Harnessing AI-Powered Genomic Research for Sustainable Crop Improvement. Agriculture. 2024;14(12).
- 36.Ye R, Shao G, Yang Z, Sun Y, Gao Q, Li T. Detection Model of Tea Disease Severity under Low Light Intensity Based on YOLOv8 and EnlightenGAN. Plants. 2024;13(10). [DOI] [PMC free article] [PubMed]
- 37.Ferrag MA, Shu L, Friha O, Yang X. Cyber security intrusion detection for Agriculture 40: Machine learning-based solutions, datasets, and future directions. IEEE/CAA Journal of Automatica Sinica. 2022;9(3):407–36. [Google Scholar]
- 38.Nagasubramanian G, Sakthivel RK, Patan R, Sankayya M, Daneshmand M, Gandomi AH. Ensemble classification and IoT-based pattern recognition for crop disease monitoring system. IEEE Internet Things J. 2021;8(16):12847–54. [Google Scholar]
- 39.Wang Y, Zhang J, Li X. Efficient complex ISAR object recognition using adaptive deep relation learning. IET Comput Vision. 2019;13(4):379–87. [Google Scholar]
- 40.Liu X, Min W, Mei S, Wang L, Jiang S. Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach. IEEE transactions on image processing. 2021. [DOI] [PubMed]
- 41.Zou Z, Chen H, Yang X. Social Pedestrian Group Detection Based on Spatiotemporal-oriented Energy for Crowd Video Understanding. KSII Trans Internet Inf Syst. 2018;12(8):3851–73.
- 42.Peng X, Ren J, She L, Zhang D, Li J, Zhang Y. BOAT: A block-streaming app execution scheme for lightweight IoT devices. IEEE Internet Things J. 2018;5(3):1816–29.
- 43.Ahmed I, Ahmad M, Ghazouani H, Barhoumi W, Jeon G. Intelligent Computing for Crop Monitoring in CIoT: Leveraging AI and Big Data Technologies. Expert Syst. 2025;42(2):e13786.
- 44.Vásconez JP, Vásconez IN, Moya V, Calderón-Díaz MJ, Valenzuela M, Besoain X, Cheein FA. Deep learning-based classification of visual symptoms of bacterial wilt disease caused by Ralstonia solanacearum in tomato plants. Comput Electron Agric. 2024;227:109617.
- 45.Jelali M. Deep learning networks-based tomato disease and pest detection: a first review of research studies using real field datasets. Front Plant Sci. 2024;15.
- 46.Jian T, Qi H, Chen R, Jiang J, Liang G. Identification of tomato leaf diseases based on DGP-SNNet. Crop Protect. 2025;187.
- 47.Hari P, Singh MP. Adaptive knowledge transfer using federated deep learning for plant disease detection. Comput Electron Agric. 2024;229.
- 48.Cheemaladinne V, Reddy KS. Tomato leaf disease detection and management using VARMAx-CNN-GAN integration. J King Saud Univ Sci. 2024;36(8).
- 49.Mathieu L, Reder M, Siah A, Ducasse A, Perry CL. SeptoSympto: a precise image analysis of Septoria tritici blotch disease symptoms using deep learning methods on scanned images. Plant Methods. 2024;20(1).
- 50.Chang S, Yang G, Cheng J, Feng Z, Fan Z. Recognition of wheat rusts in a field environment based on improved DenseNet. Biosyst Eng. 2024;238.
- 51.Bouni M, Hssina B, Douzi K, Douzi S. Synergistic use of handcrafted and deep learning features for tomato leaf disease classification. Sci Rep. 2024;14(1).
- 52.Zhang D, Luo H, Cheng T, Li W, Zhou X. Enhancing wheat Fusarium head blight detection using rotation Yolo wheat detection network and simple spatial attention network. Comput Electron Agric. 2024;211.
- 53.Li M, Cheng S, Cui J, Li C, Li Z. High-Performance Plant Pest and Disease Detection Based on Model Ensemble with Inception Module and Cluster Algorithm. Plants. 2024;12(1).
Data Availability Statement
Part of the data is available at https://data.mendeley.com/datasets/tg3z7xxkdb/1. The full datasets collected in this study are available from the corresponding author on request.