Scientific Reports. 2025 Jan 22;15:2759. doi: 10.1038/s41598-025-87128-y

MD-Unet for tobacco leaf disease spot segmentation based on multi-scale residual dilated convolutions

Zili Chen 1,2, Yilong Peng 1,2, Jiadong Jiao 2, Aiguo Wang 4, Laigang Wang 1,3, Wei Lin 2, Yan Guo 1,3
PMCID: PMC11754756  PMID: 39843759

Abstract

Identification and diagnosis of tobacco diseases are prerequisites for the scientific prevention and control of these ailments. To address the limitations of traditional methods, such as weak generalization and sensitivity to noise in segmenting tobacco leaf lesions, this study focused on four tobacco diseases: angular leaf spot, brown spot, wildfire disease, and frog eye disease. Building upon the Unet architecture, we developed the Multi-scale Residual Dilated Segmentation Model (MD-Unet) by enhancing the feature extraction module and integrating attention mechanisms. The results demonstrated that MD-Unet achieved 92.75%, 90.94%, 84.93%, and 91.81% for the lesion CPA, recall, IoU, and F1 metrics, respectively, with an overall Dice score of 94.67%. Furthermore, the model parameters, floating-point operations, and inference time per single image for MD-Unet were 4.65 × 10^7, 2.3392 × 10^11, and 65.096 ms, respectively. Compared to Unet, PSP, DeepLab v3+, FCN, SegNet, UNET++, and DoubleU-Net, MD-Unet significantly improved accuracy while effectively managing model complexity, achieving optimal overall performance. This work provides the theoretical foundations and technical support for precise segmentation of tobacco lesions, with potential applications in the segmentation of other plant diseases.

Keywords: Deep learning, Tobacco leaf diseases, Lesion segmentation, Convolutional neural networks

Subject terms: Computational models, Image processing


Tobacco is one of China’s most important economic crops, accounting for approximately 35% of global production and playing a crucial role in agricultural and economic development1. However, during its growth, tobacco is highly susceptible to diseases, which not only impair leaf photosynthesis and normal growth, leading to reduced yields, but also degrade the quality of tobacco leaves, further diminishing their market value and severely affecting farmers’ income2,3. Accurate lesion segmentation forms the basis of quantitative disease diagnosis, as the segmentation results provide essential information for disease identification and severity estimation4. Therefore, the development of timely and accurate segmentation and diagnostic techniques is of great significance for devising effective prevention and control measures and ensuring high yield and quality in tobacco production5. Traditional lesion segmentation methods mainly include threshold-based6, edge-based7, region-based8, and clustering-based segmentation techniques9. However, these methods often involve overly complex procedures, cannot achieve simple end-to-end outputs, and require operators to possess relevant expertise. Additionally, they are susceptible to factors such as lighting conditions, noise, and seed point selection, resulting in insufficient accuracy and robustness, prolonged processing times, and an inability to meet real-time detection demands, thus exhibiting significant limitations.

In recent years, deep learning technology has been increasingly applied to agricultural disease segmentation and identification10. Compared to traditional methods, deep learning can automatically extract complex feature information, reducing human error, and it demonstrates superior real-time prediction capability and better generalization, thereby enabling more accurate and efficient extraction of disease characteristics11,12. Consequently, an increasing number of researchers are combining machine vision with deep learning, successfully achieving the segmentation and diagnosis of various crop diseases, including those in apples, grapes, and potatoes13. In the field of tobacco disease diagnosis, although some researchers have explored deep learning-based approaches, most efforts have focused on image classification and object detection tasks. Models specifically designed for tobacco leaf lesion segmentation are scarce and often lack precision, making it difficult to achieve accurate localization of disease lesions14. Therefore, there is an urgent need to develop a high-precision model for tobacco leaf lesion segmentation, with three aims: (1) to solve the problems of traditional tobacco leaf lesion segmentation methods, such as complex procedures, low accuracy, and insufficient robustness; (2) to fill the research gap in existing tobacco leaf lesion segmentation models and address their low segmentation accuracy, providing the technical support needed for precise segmentation and identification of tobacco leaf lesions in practical production; (3) to enrich the application of deep learning in tobacco leaf lesion segmentation and provide insights for the segmentation of other plant diseases, thereby promoting the development of plant disease segmentation technology.

U-Net15 is an efficient encoder-decoder architecture that incorporates skip connections to preserve detailed features, enabling high-precision image segmentation even with small datasets. It has been widely applied across various fields and can be optimized for different tasks. DoubleU-Net16, by cascading two U-Net architectures, further enhances the model’s feature representation capabilities and segmentation performance. Inspired by the cascaded structure of DoubleU-Net, this study proposes a Multi-scale Residual Dilated Segmentation Model (MD-Unet) based on U-Net, designed for fine-grained segmentation of tobacco lesions. MD-Unet is a novel architecture that incorporates a series of key innovations, including multi-scale convolution modules and attention mechanisms, effectively achieving precise segmentation of tobacco leaf lesions.

Related work

Traditional plant lesion segmentation

Traditional plant disease segmentation methods, despite their significant limitations, have made important progress and demonstrated certain applicability and effectiveness. Arivazhagan et al.17 proposed a method for the segmentation and recognition of plant leaf diseases based on the extraction of texture features and the application of a support vector machine (SVM) classifier. The method initially converts leaf images from RGB to HSI color space, then applies thresholding to segment diseased areas and extracts their texture features, which are fed into the SVM for classification, achieving an accuracy of 94.74%. However, the feature selection is limited to texture features, which may result in the inability to accurately detect complex diseases due to the lack of incorporation of shape and color information. Furthermore, the restricted sample size constrains the model’s capacity for generalization. To address the low accuracy in recognizing pumpkin leaf diseases, Wang et al.18 employed K-means clustering for the segmentation of diseased areas, followed by morphological processing for the removal of noise and the marking of sample regions. The texture features of the diseased areas were extracted using the LBP operator, which was combined with grayscale images of the lesions. These features were then input into a dual-channel network for disease recognition via a Softmax classifier. The experimental results demonstrated an accuracy exceeding 96% in the recognition of leaf spot, powdery mildew, and downy mildew, although the method is computationally complex. To enhance the precision of tobacco brown spot identification, Teng19 utilized a methodology integrating thresholding, edge detection and K-means clustering for the segmentation of diseased regions and the extraction of texture features through a grey-level co-occurrence matrix. Subsequently, an enhanced binary-tree SVM was employed for classification purposes, resulting in an approximate 6% improvement in classification performance compared to the traditional one-vs-all SVM. However, the study was constrained by a relatively modest sample size of only 200 images. Furthermore, the robustness of the segmentation method with respect to complex backgrounds or varying lighting conditions requires further enhancement. Xu20 proposed a saliency detection method based on seed point selection, utilizing target pixels and their features as prior information to segment frog-eye and brown spot lesions on tobacco leaves. The features were optimized using a particle swarm optimization (PSO) algorithm, and an SVM-based classification model was constructed. This resulted in an early disease recognition rate of 92% for both diseases. However, in instances where the contrast between lesions and the background is diminished, the clarity and accuracy of the segmentation may be adversely affected, particularly in the case of smaller lesions or diseases that are distributed in a sparse manner. Liu et al.21 proposed a method combining morphology and wavelet transform with the Otsu algorithm for tobacco leaf lesion segmentation. The method initially employs morphological processing on the background region to obtain the leaf surface image. Subsequently, it performs wavelet coefficient decomposition and low-frequency reconstruction to eliminate noise, and then utilizes the Otsu algorithm for secondary segmentation to identify lesions, achieving an accuracy of over 90%. 
Nevertheless, its dependence on grayscale features alone renders it less effective for images where the grayscale values of lesions and leaf surfaces are similar.

Deep learning-based segmentation of plant lesions

Research has shown that deep learning-based plant lesion segmentation techniques have achieved more advanced performance and greater accuracy. To address the issues of low accuracy and poor generalization caused by sample imbalance and complex imaging in the segmentation of apple leaf diseases, He et al.22 introduced an improved scSE attention mechanism, employed an asymmetric shuffle convolution module, and proposed the Asymmetric Shuffle Convolutional Neural Network (ASNet) through channel compression and shuffle operations. The model demonstrated an average segmentation accuracy of 94.7% for apple black rot, rust, black star disease, and healthy leaves; however, the segmentation accuracy for rust disease was relatively low, likely due to the limited sample size and the complexity of disease features. Yuan et al.23 proposed an improved DeepLab v3 + network to tackle the challenge of accurately segmenting grape leaf black rot lesions against complex backgrounds. They innovatively incorporated a channel attention module (ECA) and a feature fusion branch based on a Feature Pyramid Network (FPN), while also optimizing the upsampling strategy to enhance pixel continuity and segmentation performance. The results indicated that this method achieved Dice scores of 91.8% and 86.1% on two test sets, although its robustness under conditions of leaf overlap and lighting variation still requires further enhancement. Nevertheless, the model’s capability in handling complex backgrounds remains to be improved. Fu et al.24 addressed the issues of information loss and low segmentation accuracy in potato leaf disease image segmentation by utilizing ResNet50 as the backbone network and introducing the SE attention mechanism. They designed an improved UNet-based RS-UNet network to segment lesions of early blight, anthracnose, and late blight on potato leaves, achieving a Dice score of 88.86%. However, the experimental background was relatively uniform, which limited its generalization ability in complex backgrounds. In response to the current low segmentation accuracy of existing models for tomato leaf diseases, Zhao et al.25 developed an improved multi-scale tomato disease segmentation algorithm based on U-Net, employing inception modules for multi-scale feature extraction and incorporating a channel attention mechanism in the decoding phase to emphasize important features, achieving an accuracy of 92.9%. To address the challenge of classifying the severity of cucumber leaf diseases, Yao et al.26 proposed a two-stage segmentation framework that integrates TRNet and U-Net. TRNet effectively combines local and global features for leaf segmentation by merging convolutional networks with Transformer networks. U-Net, utilizing ResNet50 as its backbone, is employed for lesion segmentation. Subsequently, the severity classification of the disease is achieved based on the calculation of lesion area, yielding average accuracies of 94.49% and 94.43% for cucumber downy mildew and anthracnose, respectively. However, a notable drawback is the model’s large size, which necessitates extended inference times. Wang et al.27 introduced a deep learning model, MFBP-UNet, based on a multi-feature fusion bottleneck pyramid, incorporating a stable diffusion model for data augmentation. This model partially overcomes the limitations of traditional methods in scenarios where pear leaf lesion areas are small and boundaries are indistinct by optimizing feature extraction and fusion strategies. 
Nevertheless, the model’s complexity demands significant computational resources, and its applicability in extreme environments remains to be validated. For tobacco, Ou et al.14 proposed a tobacco disease segmentation network, TDSSNet, which integrates attention modules, but achieved an average Intersection over Union (IoU) of only 64.99%.

A review of the related work indicates that tobacco lesions often display irregular coloration and indistinct boundaries, which presents a significant challenge for existing segmentation models in achieving optimal performance. Moreover, the existing models tend to demonstrate relatively low accuracy. In light of the aforementioned considerations, we have developed a multi-scale residual dilated segmentation model for tobacco leaf lesions, with the objective of addressing this gap in the field and enhancing accuracy.

Data collection and processing

Data collection

Dataset 1: From October 14 to 15, 2023, during the maturation period of tobacco, we collected 210 valid images in Yiyang County, Luoyang City, Henan Province, central China. Dataset 2: From June 12 to 13, 2024, during the period from the tobacco cluster stage to the maturation of the lower leaves, a total of 150 valid images were collected. The imaging device was a Huawei Mate 30 Pro, and images were captured throughout the day in a natural environment, so that the diverse lighting conditions gave the dataset comprehensive coverage and diversity. The dataset encompasses four disease types: angular leaf spot, brown spot, wildfire disease, and frog-eye disease. Angular leaf spot lesions present as small, angular, black-brown spots with distinct edges and no discernible yellow halo. In contrast, brown spot lesions are larger and distinctive in appearance: brown or reddish-brown, with prominent dark brown concentric rings and a wider yellow halo around the lesion. The ring patterns of wildfire disease are highly irregular and chaotic, often wavy and angular in shape. Frog-eye lesions are smaller, grayish-white or brown, with a white center. Some samples are shown in Fig. 1.

Fig. 1. Images of various lesion samples.

Image preprocessing and dataset creation

To minimize computational resource usage and eliminate interference from complex backgrounds, the original images were cropped to a standardized size of 512 × 512 pixels, focusing exclusively on the leaf and lesion regions. The Labelme software (version 5.4.0, https://pypi.org/project/labelme/5.4.0/#files) was employed for annotation: the leaf background was designated as Category 0 and the lesions as Category 1. Three annotators were involved, all with expertise in relevant fields, including tobacco disease identification, plant pathology, and agricultural science. Moreover, a series of image augmentation techniques were applied offline, including flipping, rotation, contrast and brightness adjustments, blurring, and noise injection4,28,29. The augmented samples are shown in Fig. 2. Following the first data collection, the dataset was expanded to 2000 images, designated as Dataset 1. Similarly, the second data collection yielded 1500 images, forming Dataset 2. Both datasets were randomly split into training and testing sets at an 80:20 ratio, with Dataset 1 consisting of 1600 training images and 400 testing images, and Dataset 2 comprising 1200 training images and 300 testing images.
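For reference, a minimal sketch of such an offline augmentation pipeline is given below, using PIL and NumPy. The probabilities, jitter ranges, and noise level are illustrative assumptions rather than the authors’ exact settings; geometric operations are applied identically to the image and its mask, while photometric operations touch the image only.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def augment_pair(image: Image.Image, mask: Image.Image):
    """One random geometric + photometric augmentation of an image/mask pair."""
    # Geometric: flips and 90-degree rotations, applied to both image and mask.
    if random.random() < 0.5:
        image, mask = image.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(Image.FLIP_LEFT_RIGHT)
    if random.random() < 0.5:
        image, mask = image.transpose(Image.FLIP_TOP_BOTTOM), mask.transpose(Image.FLIP_TOP_BOTTOM)
    angle = random.choice([0, 90, 180, 270])
    image, mask = image.rotate(angle), mask.rotate(angle)
    # Photometric: brightness/contrast jitter and occasional blur, image only.
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.8, 1.2))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))
    if random.random() < 0.3:
        image = image.filter(ImageFilter.GaussianBlur(radius=1))
    # Additive Gaussian noise on the image only (sigma = 5 is an assumption).
    arr = np.asarray(image).astype(np.float32)
    arr += np.random.normal(0, 5.0, arr.shape)
    image = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    return image, mask
```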

Fig. 2. Example of image enhancement.

Experimental methods

Overview of the MD-Unet overall architecture

In this study, an MD-Unet segmentation model based on Unet was designed to segment tobacco leaf lesions, with the overall network architecture shown in Fig. 3. The network consists of two stacked sub-networks. The first sub-network filters noise and generates an initial segmentation map; the second refines and adjusts this output to produce a more detailed segmentation result. An ROIE+ module connects the two sub-networks and enhances the region of interest. The second sub-network reuses the output features of the encoder portion of the first sub-network: after channel-wise concatenation, these features enter the Convolutional Block Attention Module (CBAM) as input for the next layer. This process deepens the network and improves the information flow and feature interaction between the two sub-networks, thus improving the representational capability of the network. To extract feature information from targets of different scales, a Multi-Scale Convolution (MC) module serves as the basic convolution block, followed by the CBAM attention module. This design helps the network maintain global context while focusing on significant local features, preventing deep networks from overemphasizing high-level semantic information at the expense of equally important low-level semantic features such as color, shape, and texture, which could lead to excessive globalization of the network. In addition, the second sub-network incorporates a Multi-Scale Dense Residual Convolution (MVCR) module, which flexibly processes different lesion features by combining receptive fields of different sizes, capturing richer contextual semantic information. Attention gating (AG) is introduced in each skip connection to focus on key information while suppressing irrelevant details. Finally, a residual-based upsampling feature fusion (RUFF) method is employed in the upsampling section to compensate for information loss during resolution recovery and to correct the misalignment of spatial information.

Fig. 3. MD-Unet network structure.

Improvement of the MD-Unet network module

MC module

As shown in Fig. 4, the MC module first applies a 3 × 3 convolution, followed by further feature extraction using parallel convolutional layers with 1 × 1, 3 × 3, and 5 × 5 kernels. Finally, all outputs are concatenated along the channel dimension. Batch normalization (BN) layers and ReLU activation functions after each convolutional layer enhance the stability of the data distribution and the nonlinear expressiveness of the network. The module uses an ordinary 3 × 3 convolution as the first layer to save parameters, then combines kernels of different sizes to obtain multi-scale target features, effectively processing lesions of different sizes and shapes. Meanwhile, to avoid problems such as vanishing or exploding gradients during training, a shortcut connection with a 1 × 1 convolution serves as a residual connection, improving the training stability of deep networks.
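A minimal PyTorch sketch of such a block is shown below. The channel split across the three parallel branches is an assumption, since the paper does not specify exact widths.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k):
    # Convolution + BatchNorm + ReLU; padding keeps the spatial size unchanged.
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class MCBlock(nn.Module):
    """Sketch of the MC block: a 3x3 conv, then parallel 1x1/3x3/5x5 branches
    concatenated along the channel axis, with a 1x1-conv shortcut as residual."""
    def __init__(self, cin, cout):
        super().__init__()
        branch = cout // 3
        self.stem = conv_bn_relu(cin, cout, 3)
        self.b1 = conv_bn_relu(cout, branch, 1)
        self.b3 = conv_bn_relu(cout, branch, 3)
        self.b5 = conv_bn_relu(cout, cout - 2 * branch, 5)
        self.shortcut = nn.Sequential(nn.Conv2d(cin, cout, 1, bias=False),
                                      nn.BatchNorm2d(cout))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.stem(x)
        y = torch.cat([self.b1(s), self.b3(s), self.b5(s)], dim=1)
        return self.relu(y + self.shortcut(x))  # residual connection
```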

Fig. 4. MC module.

MVCR module

Different receptive field sizes play a critical role in lesion segmentation. Smaller local receptive fields can capture detailed information within the pixel neighborhood, focusing on subtle changes in the lesions. Larger global receptive fields extend over a broader spatial area, capturing the overall information of the lesions and their surrounding context to form a global understanding. Dilated convolution30 introduces a dilation rate parameter to control the spacing within the convolution kernel. This approach allows each pixel prediction to access a broader receptive field without reducing image resolution, losing detail, or introducing additional parameters, thereby overcoming the information loss of conventional pooling methods that enlarge the receptive field by reducing resolution.
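Concretely, a 3 × 3 kernel with dilation rate d covers a (2d + 1) × (2d + 1) window, and setting the padding equal to the dilation rate preserves the feature-map resolution, as the PyTorch one-liner below illustrates.

```python
import torch.nn as nn

# A 3x3 convolution with dilation rate 2 sees a 5x5 receptive field per layer;
# padding = dilation keeps the output the same size as the input.
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
```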

The MVCR module is shown in Fig. 5. It consists of four branches, all of which use 3 × 3 convolutions with different dilation rates. To avoid grid effects, the first branch sequentially combines four convolutions with dilation rates of 1, 2, 5, and 8; each subsequent branch drops the last convolution in turn, until only the convolution with a dilation rate of 1 remains. Finally, the outputs of all branches are concatenated and a 1 × 1 convolution performs channel reduction, maintaining consistency between the input and output dimensions and the number of channels. Compared to single-branch structures, each branch in this module has a different receptive field, allowing for more flexible handling of diverse lesion targets and richer semantic feature extraction, while capturing a broader context. Integrating residual connections in each branch ensures the flow of information, prevents the loss of original data, simplifies the network learning process, and alleviates the problems of gradient vanishing and explosion due to excessive depth, thus avoiding network degradation. In addition, the inclusion of Dropout layers introduces randomness by turning off a portion of the neurons during training, reducing network complexity and helping to prevent overfitting.
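The sketch below mirrors this description in PyTorch: four branches of stacked 3 × 3 dilated convolutions with rates (1, 2, 5, 8), (1, 2, 5), (1, 2), and (1), each with a branch-level residual connection, followed by concatenation and a 1 × 1 reduction. The dropout rate is an assumption.

```python
import torch
import torch.nn as nn

class MVCRBlock(nn.Module):
    """Sketch of the MVCR block with branch-level residuals and Dropout."""
    def __init__(self, channels, p_drop=0.1):
        super().__init__()
        rate_sets = [(1, 2, 5, 8), (1, 2, 5), (1, 2), (1,)]
        self.branches = nn.ModuleList([
            nn.Sequential(*[
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                ) for d in rates
            ]) for rates in rate_sets
        ])
        self.reduce = nn.Sequential(
            nn.Conv2d(4 * channels, channels, 1, bias=False),  # channel reduction
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),
        )

    def forward(self, x):
        outs = [branch(x) + x for branch in self.branches]  # residual per branch
        return self.reduce(torch.cat(outs, dim=1))
```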

Fig. 5. MVCR module.

RUFF method

Because upsampling may lose spatial location information that is critical for pixel-level segmentation, the network may struggle with detail sensitivity. As shown in the overall architecture diagram (Fig. 3), we borrowed the residual concept to design an upsampling feature fusion method based on the residual paradigm (RUFF). First, we input the output of the last downsampling convolution block and the first three upsampling convolution blocks into the feature alignment module (BU). Within the BU block, we apply a 1 × 1 convolution to reduce dimensionality along the channel axis and perform bilinear interpolation for upsampling, ensuring that the size matches the output segmentation map. Next, we concatenate the resulting four feature maps from different hierarchical levels, using a 3 × 3 convolution and a CBAM module to minimize the semantic disparity with the output of the last upsampling block while emphasizing important features. Finally, we add the resulting feature map as a residual branch to the output of the last upsampling convolution block by element-wise addition. This method efficiently integrates semantic information from multiple hierarchical levels of abstraction, allowing the network to learn richer feature representations and retain more detail, resulting in high sensitivity to subtle differences. During this process, the identity mapping used for information fusion ensures stability and reliability, avoiding potential information loss and redundancy. Moreover, this structure mitigates gradient vanishing and explosion problems during backpropagation, making the network easier to optimize.
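A simplified PyTorch sketch of this fusion path follows. The channel widths and the attention placeholder are assumptions; in the paper a CBAM module (described below) refines the fused features before the residual addition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RUFF(nn.Module):
    """Sketch of RUFF: align multi-level features via 1x1 conv + bilinear
    upsampling (the BU step), concatenate, refine with a 3x3 conv and an
    attention module, then add to the last upsampling block's output."""
    def __init__(self, in_channels, out_channels, attn=None):
        super().__init__()
        self.align = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.fuse = nn.Conv2d(len(in_channels) * out_channels, out_channels, 3, padding=1)
        self.attn = attn if attn is not None else nn.Identity()  # CBAM in the paper

    def forward(self, feats, last):
        size = last.shape[-2:]
        aligned = [F.interpolate(conv(f), size=size, mode="bilinear", align_corners=False)
                   for conv, f in zip(self.align, feats)]
        fused = self.attn(self.fuse(torch.cat(aligned, dim=1)))
        return last + fused  # element-wise residual addition
```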

ROIE+ module

The ROIE+ (region-of-interest enhancement) module is located between the two sub-networks. It enhances the regions of interest within the input image, suppresses irrelevant background, and improves the input quality for the second sub-network. This study builds upon the ROIE module proposed by Liu31 and incorporates two learnable parameters, α and β, which allow the network to autonomously learn optimal weights. As shown in Fig. 6, the ROIE+ module first performs an element-wise multiplication between the input x1 and the output u of sub-network 1, then adds x1 to form the input to sub-network 2. This process can be represented by Eq. (1).

$x_2 = \alpha\,(x_1 \odot u) + \beta\,x_1$  (1)
Fig. 6. ROIE+ module.

where $\odot$ denotes the element-wise (dot) product, and α and β are two learnable parameters initialized to 1. Since the outcome of sub-network 1 is somewhat imprecise, some pixels may be misclassified, and the dot product may inadvertently suppress certain target pixels. Therefore, it is essential to add the original image to the dot-product result so that sub-network 2 can correct these errors.
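Because Eq. (1) reduces to two learnable scalars around a masked copy of the input, it can be written compactly in PyTorch. The sketch below assumes u is the (sigmoid) probability map produced by sub-network 1.

```python
import torch
import torch.nn as nn

class ROIEPlus(nn.Module):
    """Sketch of ROIE+, Eq. (1): x2 = alpha * (x1 ⊙ u) + beta * x1."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # learnable, initialized to 1
        self.beta = nn.Parameter(torch.ones(1))   # learnable, initialized to 1

    def forward(self, x1, u):
        # u: probability map from sub-network 1, broadcast across channels.
        return self.alpha * (x1 * u) + self.beta * x1
```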

CBAM attention module

In this study, the CBAM attention module is introduced into the network to optimize the segmentation performance. As shown in Fig. 7, CBAM is a universal lightweight attention module consisting of two parts: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM) in a serial configuration32. The CAM module captures the correlation among different channels and models the importance of each channel. In contrast, the SAM module captures the spatial structure information of feature maps and models the significance of each spatial location. By integrating channel and spatial attention, the network can better perceive image information, understand image content, and improve the ability to represent features.
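For completeness, a standard PyTorch implementation of CBAM32 is sketched below; the reduction ratio of 16 and the 7 × 7 spatial kernel are the common defaults from the original paper and are assumed here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Shared MLP over global average- and max-pooled channel descriptors.
        attn = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        return attn.view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Channel-wise mean and max maps describe "where" to attend.
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied serially."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels, reduction), SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```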

Fig. 7. CBAM module.

AG module

As shown in Fig. 8, the AG module is a versatile self-attention framework33. The core concept involves the introduction of a gating function that computes a weight vector based on the input feature map. This process regulates the flow of information and enhances feature interaction across different layers, thereby allowing the network to concentrate its attention on critical local regions.

Fig. 8. AG module.

This study investigates the integration of Attention-Gating (AG) with skip connections. By incorporating information from the coarse scale into the gating process, we effectively mitigate irrelevant and noisy responses that may result from skip connections. This approach facilitates precise feature selection, enabling the network to autonomously learn and focus on lesions of varying shapes and sizes while incorporating additional contextual information. We also replace the ReLU activation function in the original AG with a leaky ReLU activation function. The modification addresses the issue of neuron death, expands the range of the function, and better accommodates different data distributions, ultimately improving the model’s generalization capabilities.
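The sketch below follows the standard attention-gate formulation33, with the inner ReLU replaced by LeakyReLU as described. The intermediate width and the assumption that the gating signal has already been resampled to the skip feature’s spatial size are illustrative choices, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Sketch of the AG on a skip connection, with LeakyReLU instead of ReLU."""
    def __init__(self, f_g, f_l, f_int):
        super().__init__()
        self.w_g = nn.Sequential(nn.Conv2d(f_g, f_int, 1, bias=False), nn.BatchNorm2d(f_int))
        self.w_x = nn.Sequential(nn.Conv2d(f_l, f_int, 1, bias=False), nn.BatchNorm2d(f_int))
        self.psi = nn.Sequential(nn.Conv2d(f_int, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
        self.act = nn.LeakyReLU(0.01, inplace=True)  # replaces the original ReLU

    def forward(self, g, x):
        # g: coarse-scale gating signal (resampled to x's spatial size);
        # x: skip-connection features to be reweighted.
        attn = self.psi(self.act(self.w_g(g) + self.w_x(x)))
        return x * attn
```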

Loss function

Dice Loss34 measures the similarity between predicted values and actual values. However, in lesion segmentation tasks, Dice Loss inadequately addresses small lesion targets and is highly sensitive to small target predictions, which can lead to unstable training. Focal Loss35 effectively tackles the issue of sample imbalance while assigning greater weight to small lesion targets that are difficult to classify. This study employs a combined loss function of Dice Loss and Focal Loss as the overall loss function to address class imbalance issues and ensure stable network training. The equation is defined as follows:

$L = \alpha\,L_{Dice} + \beta\,L_{Focal}$  (2)

where $L_{Dice}$ denotes the Dice Loss function, $L_{Focal}$ denotes the Focal Loss function, and α and β are the corresponding weight coefficients, both set to 1.

$L_{Dice} = 1 - \dfrac{2\sum_{i=1}^{N} y_i\,\hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i}$  (3)

where $y_i$ and $\hat{y}_i$ denote the actual and predicted values of pixel i, respectively, and N is the total number of pixels.

$L_{Focal} = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$  (4)

where $p_t$ denotes the probability of the model predicting a positive sample, $\alpha_t$ is a balancing factor that adjusts the weight between positive and negative samples, and $(1 - p_t)^{\gamma}$ is a modulation factor that increases the focus on hard-to-classify samples as γ varies.
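Equations (2)–(4) translate directly into a few lines of PyTorch. The sketch below assumes sigmoid probabilities and binary masks, with α_t = 0.25 and γ = 2 as commonly used defaults; the paper does not state its exact values for these two constants.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # Eq. (3): pred holds probabilities in [0, 1], target is a binary mask.
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def focal_loss(pred, target, alpha_t=0.25, gamma=2.0, eps=1e-6):
    # Eq. (4): down-weights easy examples, emphasizing hard-to-classify pixels.
    pred = pred.clamp(eps, 1 - eps)
    p_t = pred * target + (1 - pred) * (1 - target)
    a_t = alpha_t * target + (1 - alpha_t) * (1 - target)
    return (-a_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()

def combined_loss(pred, target, alpha=1.0, beta=1.0):
    # Eq. (2): both weights are set to 1 in this study.
    return alpha * dice_loss(pred, target) + beta * focal_loss(pred, target)
```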

Evaluation indicators

Class Pixel Accuracy (CPA), Recall, Intersection over Union (IoU), Mean Intersection over Union (MIoU), F1 score, and Dice coefficient were employed to evaluate the segmentation performance on tobacco lesions. The calculation methods are shown in Eqs. (5)–(10).

$CPA = \dfrac{TP}{TP + FP}$  (5)

$Recall = \dfrac{TP}{TP + FN}$  (6)

$IoU = \dfrac{TP}{TP + FP + FN}$  (7)

$MIoU = \dfrac{1}{K}\sum_{i=1}^{K}\dfrac{TP_i}{TP_i + FP_i + FN_i}$  (8)

$F1 = \dfrac{2 \times Precision \times Recall}{Precision + Recall}$  (9)

$Dice = \dfrac{2\,|X \cap Y|}{|X| + |Y|}$  (10)

where TP is the number of samples correctly predicted as positive; FP is the number of negative samples incorrectly predicted as positive; FN is the number of positive samples incorrectly predicted as negative; TN is the number of correctly predicted negative samples; and K is the total number of classes. X and Y represent the predicted and actual value sets, respectively; |X∩Y| denotes the number of elements in their intersection, and |X| and |Y| denote the number of elements in each set. CPA measures the proportion of correctly classified pixels for each category; Recall measures the model’s ability to identify all existing positive cases; IoU, MIoU, and the Dice score measure the similarity between the pixel classification results and the actual values; and the F1 score is the harmonic mean of precision and recall. All six evaluation metrics range from 0 to 1, with values closer to 1 indicating better model performance and values closer to 0 indicating worse performance.
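These definitions can be computed directly from the confusion counts of binary masks, as in the NumPy sketch below (shown for the lesion class; note that for binary masks the Dice coefficient coincides with F1).

```python
import numpy as np

def lesion_metrics(pred, gt, eps=1e-9):
    """CPA, Recall, IoU, F1 and Dice for 0/1 masks, following Eqs. (5)-(10)."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    cpa = tp / (tp + fp + eps)          # per-class pixel accuracy (precision)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * cpa * recall / (cpa + recall + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)  # equals F1 for binary masks
    return {"CPA": cpa, "Recall": recall, "IoU": iou, "F1": f1, "Dice": dice}
```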

Experimental environment and parameter settings

The environment was configured with a 30-core AMD EPYC 7742 CPU and an NVIDIA GeForce RTX A6000 GPU, with 60.9 GB of memory and CUDA version 11.8. The operating system was Linux, the programming language Python 3.10.12, and the deep learning framework PyTorch 2.0.1. The Adam optimizer was used, with an initial learning rate of 0.0001 and a batch size of 16, for a total of 200 training epochs.
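A minimal training loop matching this configuration is sketched below; `model`, `train_loader`, and the `combined_loss` from the earlier sketch are assumed to be defined elsewhere.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 0.0001

for epoch in range(200):                     # 200 epochs
    model.train()
    for images, masks in train_loader:       # batch size 16
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        preds = torch.sigmoid(model(images)) # probabilities for the lesion class
        loss = combined_loss(preds, masks)
        loss.backward()
        optimizer.step()
```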

Analysis of results

Comparative experiments

To validate the segmentation performance of different models, a comparative analysis was performed between MD-Unet and several other models, including Unet, PSP36, DeepLab v3+37, FCN38, SegNet39, UNET++40, DoubleU-Net and TDSSNet14, on Dataset 1. The segmentation performance of each model on the test set, evaluated using multiple metrics, is summarized in Table 1. Examples of actual segmentation results are shown in Fig. 9.

Table 1. Comparison of evaluation indexes of each model on Dataset 1 (values in %).

| Model | Lesion CPA | Lesion Recall | Lesion IoU | Lesion F1 | Leaf CPA | Leaf Recall | Leaf IoU | Leaf F1 | Dice |
|---|---|---|---|---|---|---|---|---|---|
| Unet | 90.45 | 88.80 | 81.59 | 89.81 | 99.83 | 99.87 | 99.70 | 99.85 | 93.21 |
| DoubleU-Net | 90.21 | 88.82 | 81.06 | 89.49 | 99.83 | 99.86 | 99.69 | 99.85 | 92.92 |
| UNET++ | 87.32 | 84.75 | 75.42 | 85.89 | 99.77 | 99.82 | 99.58 | 99.79 | 90.55 |
| PSP | 89.45 | 87.49 | 79.38 | 88.43 | 99.81 | 99.85 | 99.66 | 99.83 | 91.91 |
| DeepLab v3+ | 77.85 | 81.58 | 66.22 | 79.37 | 99.73 | 99.69 | 99.43 | 99.71 | 85.78 |
| FCN | 88.34 | 88.68 | 79.45 | 88.43 | 99.84 | 99.83 | 99.67 | 99.83 | 92.33 |
| SegNet | 85.72 | 85.81 | 75.13 | 85.73 | 99.79 | 99.80 | 99.59 | 99.79 | 90.60 |
| TDSSNet | 86.55 | 81.87 | 72.61 | 84.06 | 99.72 | 99.80 | 99.53 | 99.76 | 89.74 |
| MD-Unet | 92.75 | 90.94 | 84.93 | 91.81 | 99.87 | 99.89 | 99.76 | 99.88 | 94.67 |

Fig. 9. Examples of actual segmentation of each model.

Unet, a classical semantic segmentation model, achieved the highest accuracy on the test set among all models except MD-Unet. Its pixel accuracy for lesions reached 90.45%, with a recall of 88.80%; the lesion IoU and F1 scores were 81.59% and 89.81%, respectively. Considering both lesions and leaves, it achieved a Dice score of 93.21%, demonstrating commendable generalization and adaptability. This performance is attributed to the skip-connection design, which merges low-level visual features with high-level semantic information, thereby preserving essential spatial details. However, as shown in Fig. 9, the actual segmentation results reveal persistent problems with boundary adhesion, lack of clarity, and incomplete segmentation.

DoubleU-Net achieved slightly lower CPA, IoU, F1, and Dice scores for lesion segmentation than Unet, at 90.21%, 81.06%, 89.49%, and 92.92%, respectively, ranking behind only MD-Unet and Unet, while its recall of 88.82% was slightly higher than Unet’s. This indicates that the dual-stacked design based on Unet is both reasonable and effective. The other models showed relatively lower accuracy in lesion segmentation, with DeepLab v3+ performing the worst on all metrics. It showed significant over-segmentation and under-segmentation in certain areas, with a CPA of only 77.85%, recall of 81.58%, IoU of 66.22%, F1 score of 79.37%, and Dice score of 85.78%. This is primarily because DeepLab v3+ inevitably loses some low-level semantic information from shallow layers during downsampling, especially when the network is overly focused on global features. It is worth noting that the performance of TDSSNet, a model specifically designed for tobacco leaf lesion segmentation, is also unsatisfactory: its lesion CPA, Recall, IoU, and F1 scores are 86.55%, 81.87%, 72.61%, and 84.06%, respectively, with an overall Dice score of 89.74%, only slightly better than DeepLab v3+. This can be attributed mainly to the model’s weak multi-scale feature extraction capability, the excessively large upsampling factor in its decoder, which leads to significant information loss, and the lack of sufficient information exchange between different levels of the network.

MD-Unet performed the best across all evaluation metrics for lesion segmentation, with CPA, Recall, IoU, F1 and Dice scores of 92.75%, 90.94%, 84.93%, 91.81% and 94.67%, respectively. Compared to the relatively low accuracy of DeepLab v3+, these metrics improved by 14.9%, 9.36%, 18.71%, 12.44%, and 8.89%, respectively; compared to Unet, they improved by 2.3%, 2.14%, 3.34%, 2%, and 1.46%. Figure 9 also shows that MD-Unet has superior actual segmentation performance, effectively distinguishing the boundaries of lesions of different sizes and shapes. This can be attributed to the introduction of the MC and MVCR modules, which enhance the multi-scale feature extraction capabilities of the network, allowing it to effectively handle dynamic variations in lesion size and shape. The dual-stacked architecture increases the depth of the network, allowing for the extraction of more abstract global features. In addition, ROIE+ filtering of the output from the first sub-network improves the model’s responsiveness to target regions. The residual-based upsampling feature fusion method, together with the attention mechanism, preserves more spatial location information and relevant low-level semantic features such as color and texture, thereby improving the network’s ability to identify fine details and edge pixels in small lesions.

Table 1 shows that all models have high accuracy in segmenting the leaf background. This phenomenon is due to the sample imbalance, where the leaf region significantly outweighs the lesion area. Consequently, the models focus on learning features from the leaf region, which allows them to extract rich details. Furthermore, even if some pixels in the leaf region are misclassified, the proportion of these errors is minimal compared to the total number of pixels and thus has an insignificant impact on the results. Notwithstanding this, MD-Unet achieved the highest scores on all evaluation metrics, further confirming its superiority.

Epoch loss convergence comparison

The training loss values for each model vary with epochs, as shown in Fig. 10. As epochs increase, the loss steadily decreases, reaching a relatively stable state around 100 epochs and gradually converging to the minimum point. Compared to models such as Unet, MD-Unet shows a more comprehensive convergence of loss values. Moreover, all models show relatively smooth loss curves during the convergence process, indicating that the combined use of Dice Loss and Focal Loss ensures stable training of the models.

Fig. 10. Comparison of losses of each model.

Figure 11 shows the trends of MD-Unet’s training and validation loss values over epochs. The data shows that as the epochs increase, both the training and validation losses gradually decrease, converging almost synchronously to a stable state and eventually reaching similar minimum loss values. This suggests that MD-Unet effectively learns the underlying patterns and features of the data, while maintaining strong generalization capability and performance on new datasets.

Fig. 11. MD-Unet training and validation loss.

Comparison of the performance of different models

GFLOPs and parameter counts are used to measure model complexity; higher values indicate a more complex architecture and greater computational demand during inference. The performance comparison of different models is shown in Table 2. MD-Unet ranks fourth in both GFLOPs and parameters, demonstrating a moderate model complexity that effectively balances performance, accuracy, and computational resources. When inferring a single image in the same GPU environment, MD-Unet requires a longer processing time; however, the difference compared to other models is small, making it suitable for real-time detection needs.

Table 2. Comparison of the performance of each model.

| Model | GFLOPs | Parameters (M) | GPU runtime (ms) |
|---|---|---|---|
| Unet | 218.97 | 31.04 | 34.072 |
| DoubleU-Net | 215.84 | 29.29 | 35.274 |
| UNET++ | 249.78 | 67.98 | 45.616 |
| PSP | 262.74 | 65.70 | 44.569 |
| DeepLab v3+ | 86.90 | 54.70 | 40.347 |
| FCN | 102.00 | 18.64 | 30.185 |
| SegNet | 170.41 | 29.48 | 32.630 |
| TDSSNet | 416.79 | 38.64 | 45.698 |
| MD-Unet | 233.92 | 46.50 | 65.096 |

Grad-CAM feature visualization

Four images representing different lesion sizes and shapes were randomly selected from the dataset, and Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize the features of the model’s final output layer. This approach reveals the regions the model attends to during its decision process. As shown in Fig. 12, the rows from top to bottom show the original image, the heatmap of response areas when focusing on the lesions, and the heatmap of response areas when focusing on the leaves; red and deeper colors indicate a high contribution, while blue and lighter colors indicate a low contribution.
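A minimal Grad-CAM sketch over a chosen layer is given below; the choice of target layer and the summed class-map score are assumptions about how such a visualization might be reproduced, not the authors’ exact procedure.

```python
import torch

class GradCAM:
    """Gradients of a class score, global-average-pooled, weight the layer's
    activations; ReLU keeps only positively contributing regions."""
    def __init__(self, model, layer):
        self.model, self.acts, self.grads = model, None, None
        layer.register_forward_hook(lambda m, i, o: setattr(self, "acts", o))
        layer.register_full_backward_hook(lambda m, gi, go: setattr(self, "grads", go[0]))

    def __call__(self, image, class_idx):
        logits = self.model(image)               # (1, C, H, W) segmentation logits
        score = logits[:, class_idx].sum()       # aggregate the class map
        self.model.zero_grad()
        score.backward()
        weights = self.grads.mean(dim=(2, 3), keepdim=True)  # GAP of gradients
        cam = torch.relu((weights * self.acts).sum(dim=1))   # weighted sum
        return cam / (cam.max() + 1e-9)          # normalize to [0, 1]
```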

Fig. 12. Visualization of Grad-CAM features.

The visualization results in Fig. 12 show that the MD-Unet model has strong responses to both lesion and leaf areas. It accurately captures the features and boundaries of the lesions and comprehensively understands the structural and textural information of the leaves. This confirms the effectiveness and reliability of the MD-Unet model in the tobacco leaf lesion segmentation task.

Ablation test

To validate the effectiveness of various enhancements to the MD-Unet model, we used the original dual-stacked Unet architecture as a baseline. We sequentially added modules such as MC, MVCR, RUFF, CBAM, and AG and performed a series of ablation experiments. All experiments used ROIE + to connect the two subnetworks and implemented feature reuse methods to leverage output features from the encoder of the first subnetwork. These two aspects were kept constant throughout the ablation studies for clarity in assessing each module’s contribution to algorithm performance.

The results of the ablation experiments, shown in Table 3, indicate that using MC as the basic convolution block, inserting the MVCR module in the second sub-network, applying the RUFF upsampling feature fusion method, integrating the CBAM attention mechanism after the convolution blocks, and incorporating the attention gate (AG) in the skip connections all improve network performance. Specifically, the mean intersection over union (MIoU) improved by 0.38%, 0.21%, 0.27%, 0.87%, and 0.25%, respectively, compared to the baseline model. When all modules were applied, MIoU increased by 1.53% overall, robustly demonstrating the effectiveness of each module’s design in improving segmentation accuracy.

Table 3. Comparison of the results of different ablation experiments.

| Test | MC | MVCR | RUFF | CBAM | AG | MIoU (%) |
|---|---|---|---|---|---|---|
| 1 | ∕ | ∕ | ∕ | ∕ | ∕ | 90.82 |
| 2 | √ | ∕ | ∕ | ∕ | ∕ | 91.20 |
| 3 | ∕ | √ | ∕ | ∕ | ∕ | 91.03 |
| 4 | ∕ | ∕ | √ | ∕ | ∕ | 91.09 |
| 5 | ∕ | ∕ | ∕ | √ | ∕ | 91.69 |
| 6 | ∕ | ∕ | ∕ | ∕ | √ | 91.07 |
| 7 | √ | √ | √ | √ | √ | 92.35 |

√ indicates an improvement or addition, and ∕ indicates no improvement or addition.

Comparison of the effects of each model in the multi-growth period

To validate the segmentation performance of the models on tobacco diseases at different growth stages, we merged Dataset 1 and Dataset 2 into a new dataset and performed comparative experiments on it using the different models. Table 4 shows the segmentation performance of each network on the test set under the different evaluation metrics. Although the new dataset contains data from different growth periods, MD-Unet consistently outperformed the others on all evaluation metrics, indicating strong generalization capability, which is critical for model applicability. Furthermore, the model remained relatively stable across tobacco diseases at different growth stages, demonstrating good robustness and resistance to performance fluctuations caused by small changes in the input data. In addition, all models performed better on the new dataset than on Dataset 1, highlighting that a larger and more diverse set of training samples improves performance and generalization ability.

Table 4. Comparison of evaluation indexes of each model on the new dataset (values in %).

| Model | Lesion CPA | Lesion Recall | Lesion IoU | Lesion F1 | Leaf CPA | Leaf Recall | Leaf IoU | Leaf F1 | Dice |
|---|---|---|---|---|---|---|---|---|---|
| Unet | 93.22 | 91.31 | 85.65 | 92.21 | 99.78 | 99.82 | 99.60 | 99.80 | 93.67 |
| DoubleU-Net | 93.37 | 91.86 | 86.25 | 92.58 | 99.79 | 99.83 | 99.62 | 99.81 | 93.85 |
| UNET++ | 92.92 | 88.60 | 82.99 | 90.51 | 99.68 | 99.81 | 99.50 | 99.75 | 92.78 |
| PSP | 92.00 | 89.39 | 82.98 | 90.64 | 99.73 | 99.80 | 99.53 | 99.76 | 91.54 |
| DeepLab v3+ | 88.00 | 87.11 | 77.81 | 87.43 | 99.67 | 99.70 | 99.37 | 99.68 | 89.08 |
| FCN | 92.78 | 92.71 | 86.51 | 92.72 | 99.82 | 99.81 | 99.63 | 99.81 | 93.55 |
| SegNet | 92.18 | 91.17 | 84.66 | 91.64 | 99.78 | 99.79 | 99.57 | 99.79 | 93.15 |
| TDSSNet | 92.09 | 88.22 | 82.12 | 90.09 | 99.70 | 99.81 | 99.51 | 99.76 | 91.62 |
| MD-Unet | 94.99 | 94.29 | 89.84 | 94.62 | 99.87 | 99.86 | 99.72 | 99.86 | 95.73 |

The impact of data augmentation on model performance

To verify the impact of data augmentation on model performance, we trained the model using the original images without augmentation under the same conditions, and the results are presented in Table 5. It can be observed from the table that the performance metrics of the model using the unaugmented original images are consistently lower than those obtained with data augmentation. This can be attributed to the fact that data augmentation effectively enhances the diversity of the training dataset by introducing various random transformations. Such diversity aids the model in learning a broader range of variations and features present in the input data, thereby improving its ability to generalize when confronted with unseen data. Additionally, data augmentation can reduce the model’s sensitivity to noise or minor variations in the input data, thereby enhancing its robustness.

Table 5. Model performance with and without data augmentation (values in %).

| Dataset | Lesion CPA | Lesion Recall | Lesion IoU | Lesion F1 | Leaf CPA | Leaf Recall | Leaf IoU | Leaf F1 | Dice |
|---|---|---|---|---|---|---|---|---|---|
| No augmentation | 90.37 | 92.15 | 83.70 | 91.12 | 99.77 | 99.62 | 99.39 | 99.70 | 93.67 |
| Augmentation | 94.99 | 94.29 | 89.84 | 94.62 | 99.86 | 99.87 | 99.72 | 99.86 | 95.73 |

Discussion

For the segmentation of tobacco leaf lesions, traditional methods such as the K-means algorithm used by Teng19, the saliency detection method based on seed point selection employed by Xu20, and the Otsu algorithm used by Liu et al.21 typically rely on simple mathematical and physical models. Although these methods are easy to understand and implement, with relatively low computational complexity, they tend to have poor robustness when images contain subtle variations in texture, shape, color, or lighting conditions. Furthermore, these methods heavily depend on manually set parameters, which can lead to significant fluctuations in recognition accuracy. Therefore, the limitations of using these methods for tobacco leaf lesion segmentation are quite evident. In contrast, the deep learning-based TDSSNet network proposed by Ou et al.14 extracts features of lesions of different sizes using a Pyramid Pooling Module (PPM), which partially overcomes the limitations of traditional methods. However, the pooling operation leads to the loss of some spatial detail information, weakening the expression of key texture or edge information in the lesions. At the same time, TDSSNet also faces issues such as insufficient information flow between different layers and resolution loss during upsampling, resulting in an average Intersection over Union (IoU) of only 64.99%, far below the 84.93% achieved in this study. The MD-Unet model proposed in this research, by contrast, not only extracts multi-scale disease features but also enhances the retention of detailed information: it uses the RUFF method to mitigate the loss of details during upsampling and strengthens information exchange and representation ability through repeated feature reuse. As shown in Figs. 9 and 12, MD-Unet not only performs excellently in the segmentation of large lesions but also effectively captures small lesions, which can be attributed to the focal loss function’s enhanced attention to small lesions. Therefore, although MD-Unet, an improvement based on U-Net, was specifically designed for tobacco leaf disease segmentation, its core principles are also applicable to the segmentation of diseases in other crops. For example, Fu et al.24, Zhao et al.25, Yao et al.26, and Wang et al.27 have all proposed U-Net-based segmentation models for different crop diseases, achieving accuracies of around 90%. Of course, fine-tuning would be required to account for the specific characteristics of the respective crop diseases.

In general, MD-Unet completes the task of segmenting tobacco leaf disease spots better than existing methods, but some limitations remain. First, the number of original images used in this experiment is relatively small, which may limit the further generalization of MD-Unet. Second, the training images have relatively simple backgrounds; although this largely excludes background interference, it may lead to poor adaptability to complex backgrounds. At the same time, this study treats all types of disease spots as one category, without detailed classification or severity grading, making it difficult to provide more detailed reference information for tobacco farmers. In addition, deep learning models designed for automatic disease assessment41 and lightweight deployment42 show strong feasibility in practical scenarios; although MD-Unet has a clear advantage in accuracy over other methods, its high computational complexity and larger parameter count limit its application on mobile devices with limited computing resources.

To further address the above limitations, we plan to conduct experiments on different crops and perform the necessary targeted optimizations to improve the cross-species applicability and performance of MD-Unet in practical applications. We will also improve the robustness and generalization ability of the model by enriching sample types and quantities and by expanding the experimental scenarios. At the same time, detailed classification or accurate severity grading of disease spots will be carried out to enable tobacco farmers to respond to diseases more conveniently. In addition, we will explore optimizing the model through methods such as channel pruning and model distillation to reduce its size and shorten its inference time, improving the efficiency of the model while maintaining high performance and making it more suitable for real-time applications and resource-constrained devices.

Conclusion

This paper develops the MD-Unet model for tobacco lesion segmentation and constructs relevant disease datasets. By extracting multi-scale lesion features and embedding an attention mechanism in the network, it achieves fine-grained segmentation of target diseases: angular leaf spot, brown spot, wildfire disease, and frog-eye disease. The model achieves high segmentation accuracy. By comparing experimental results, we find that MD-Unet’s lesion categories achieve CPA of 92.75%, Recall of 90.94%, IoU of 84.93%, F1 of 91.81%, and overall Dice score of 94.67%. These metrics represent improvements of 2.3%, 2.14%, 3.34%, 2%, and 1.46%, respectively, over the best performing Unet. In addition, the model maintains a moderate number of parameters and computational load, making it suitable for practical tobacco lesion segmentation scenarios. It provides a novel approach to segmentation in tobacco and other plants, and highlights the significant potential and application value of deep learning in plant disease segmentation. While it provides a theoretical foundation for further exploration and innovation, challenges remain, such as inadequate handling of particularly small lesions. Future efforts will focus on expanding the dataset and continuously optimizing MD-Unet to increase accuracy and reduce model size, thereby improving its applicability.

Acknowledgements

This study received support from the Science and Technology Project of the China National Tobacco Corporation Henan Company: “Henan Strong Aroma” High-Quality Raw Material Production Process Intelligent Monitoring Technology System Construction (2023410000240025); and the Henan Academy of Agricultural Sciences Remote Sensing Innovation Team: Research and Development of High-Precision Remote Sensing Monitoring Technology for Agricultural Information from Ground and Space (2024TD28).

Author contributions

Z.L.C contributed to the conceptualization of the paper, designed and conducted the experiments, and authored the initial draft; Y.L.P participated in the construction of the dataset and performed experiments; J.D.J conducted experiments; A.G.W provided strategic support and reviewed the manuscript; L.G.W carried out data analysis; W.L. was involved in the review and revision of the initial draft; Y.G. participated in dataset construction and the review and revision of the initial draft.

Data availability

The dataset is publicly available at https://doi.org/10.34740/kaggle/dsv/10393041.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Wei Lin, Email: 121025@htu.edu.cn.

Yan Guo, Email: 10914063@zju.edu.cn.

References

1. Liu, C. et al. Intelligent identification of tobacco leaf diseases based on YOLOv5. Chin. Tob. Sci. 45, 93–101 (2024).
2. Liu, Y. et al. Detection of multiple tobacco leaf diseases based on YOLOv3. Chin. Tob. Sci. 43, 94–100 (2022).
3. Lin, J. et al. CAMFFNet: A novel convolutional neural network model for tobacco disease image recognition. Comput. Electron. Agric. 202, 107390 (2022).
4. Li, K., Zhang, H., Ma, J. & Zhang, L. Segmentation method for crop leaf spot based on semantic segmentation and visible spectral images. Spectrosc. Spectr. Anal. 43, 1248 (2023).
5. Zhang, W. et al. Tobacco disease identification based on InceptionV3. J. Chin. Tob. Soc. 27, 61–70 (2021).
6. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).
7. Rosenfeld, A. The max Roberts operator is a Hueckel-type edge detector. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3, 101–103 (1981).
8. Tremeau, A. & Borel, N. A region growing and merging algorithm to color segmentation. Pattern Recogn. 30, 1191–1203 (1997).
9. Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17, 790–799 (1995).
10. Zeng, W., Li, H., Hu, G. & Liang, D. Lightweight dense-scale network (LDSNet) for corn leaf disease identification. Comput. Electron. Agric. 197, 106943 (2022).
11. Zhai, Z., Cao, Y., Xu, H., Yuan, P. & Wang, H. A review of key technologies for the identification of crop diseases and insect pests. Trans. Chin. Soc. Agric. Mach. 52, 1–18 (2021).
12. Zheng, Y., Chen, R., Yang, C. & Zhou, T. Identification method of citrus diseases and insect pests based on improved YOLOv5s model. J. Huazhong Agric. Univ. 43, 134–143 (2024).
13. Wang, Y., Long, Y., Yang, Z. & Huang, L. Semantic segmentation method of apple leaf disease based on improved U-Net network. J. Zhejiang Agric. Sci. 35, 2731–2741 (2023).
14. Ou, J. et al. Tobacco leaf disease segmentation based on TDSSNet. In Twelfth International Conference on Image Processing Theory, Tools and Applications (IPTA) 1–5 (2023).
15. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference 234–241 (2015).
16. Jha, D., Riegler, M. A., Johansen, D., Halvorsen, P. & Johansen, H. D. DoubleU-Net: A deep convolutional neural network for medical image segmentation. In IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS) 558–564 (2020).
17. Arivazhagan, S. et al. Detection of unhealthy region of plant leaves and classification of plant leaf diseases using texture features. Agric. Eng. Int. CIGR J. 15, 211–217 (2013).
18. Wang, C. et al. Application of dual-channel convolutional neural network in pumpkin disease identification. Comp. Eng. Appl. 57, 183–189 (2021).
19. Teng, J. Research on Image Segmentation of Tobacco Leaves with Red Star Disease (Jishou University, 2017).
20. Xu, S. Segmentation, Classification and Identification of Early Lesions in Tobacco Leaves Based on Computer Vision (North China University of Water Resources and Hydropower, 2019).
21. Liu, J. et al. Tobacco leaf lesion segmentation based on morphology and wavelet transform. J. Graph Sci. 39, 933–938 (2018).
22. He, Z., Huang, J., Liu, Q. & Zhang, Y. Segmentation of apple leaf disease based on asymmetric mixed convolutional neural network. Trans. Chin. Soc. Agric. Mach. 52, 221–230 (2021).
23. Yuan, H., Zhu, J., Wang, Q., Cheng, M. & Cai, Z. An improved DeepLab v3+ deep learning network applied to the segmentation of grape leaf black rot spots. Front. Plant Sci. 13, 795410 (2022).
24. Fu, J., Zhao, Y. & Wu, G. Potato leaf disease segmentation method based on improved UNet. Appl. Sci. 13, 11179 (2023).
25. Zhao, X. et al. Multi-scale tomato disease segmentation algorithm based on improved U-Net network. Comput. Eng. Appl. 58, 216–223 (2022).
26. Yao, H. et al. A cucumber leaf disease severity grading method in natural environment based on the fusion of TRNet and U-Net. Agronomy 14, 72 (2023).
27. Wang, H. et al. MFBP-UNet: A network for pear leaf disease segmentation in natural agricultural environments. Plants 12, 3209 (2023).
28. Ma, X., Chen, W. & Xu, Y. ERCP-Net: A channel extension residual structure and adaptive channel attention mechanism for plant leaf disease classification network. Sci. Rep. 14, 4221 (2024).
29. Pandian, J. A., Geetharamani, G. & Annette, B. Data augmentation on plant leaf disease image dataset using image manipulation and deep learning techniques. In IEEE 9th International Conference on Advanced Computing (IACC) 199–204 (IEEE, 2019).
30. Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. Preprint at https://doi.org/10.48550/arXiv.1511.07122 (2015).
31. Liu, G. et al. A region of interest focused triple UNet architecture for skin lesion segmentation. Int. J. Imaging Syst. Technol. 34, e23090 (2024).
32. Woo, S., Park, J., Lee, J. Y. & Kweon, I. S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) 3–19 (2018).
33. Oktay, O. et al. Attention U-Net: Learning where to look for the pancreas. Preprint at https://doi.org/10.48550/arXiv.1804.03999 (2018).
34. Milletari, F., Navab, N. & Ahmadi, S. A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Fourth International Conference on 3D Vision (3DV) 565–571 (2016).
35. Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision 2980–2988 (2017).
36. Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2881–2890 (2017).
37. Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) 801–818 (2018).
38. Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 3431–3440 (2015).
39. Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).
40. Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N. & Liang, J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39, 1856–1867 (2019).
41. Deng, Y. et al. An effective image-based tomato leaf disease segmentation method using MC-Unet. Plant Phenom. 5, 0049 (2023).
42. Lu, B., Lu, J., Xu, X. & Jin, Y. MixSeg: A lightweight and accurate mix structure network for semantic segmentation of apple leaf disease in complex environments. Front. Plant Sci. 14, 1233241 (2023).


