Abstract
Esca is one of the most common diseases that can severely damage grapevines. If not treated in time, it causes vegetative stress or death of the affected plant, leading to production losses and an increased risk of propagation to neighboring grapevines. Currently, Esca is detected manually through visual surveys, usually carried out by agronomists, which requires an enormous amount of time. Recently, image processing, computer vision and machine learning methods have been widely adopted for plant disease classification. These methods can reduce the time spent on anomaly detection and enable early detection of Esca in grapevine plants, which helps prevent its spread in the vineyard and minimizes the financial loss to wine producers. In this article, an image dataset of grapevine leaves is presented. The dataset contains grapevine leaf images belonging to two classes: unhealthy leaves acquired from plants affected by Esca disease and healthy leaves. The data were collected for a research project jointly developed by the Department of Information Engineering, Polytechnic University of Marche, Ancona, Italy and STMicroelectronics, Italy, in cooperation with the Umani Ronchi SPA winery, Osimo, Ancona, Marche, Italy. The dataset could be helpful to researchers who use machine learning and computer vision algorithms to develop applications that help agronomists in the early detection of grapevine plant diseases. The dataset is freely available at http://dx.doi.org/10.17632/89cnxc58kj.1
Keywords: Plant diseases recognition, Esca disease, Machine learning, Image dataset, Image classification
Specifications Table
Subject | Computer Science, Agricultural and Biological Sciences |
Specific subject area | Computer Vision and Pattern Recognition, Plant Diseases |
Type of data | Image |
How data were acquired | Images were acquired by using three different devices. The first device is an iPad Pro-tablet: 12 MP, f/1.8 aperture and optical image stabilization camera with autofocus. The second device is a Samsung J7 smartphone: 13 MP, f/1.7 camera with autofocus. The third device is an iPhone 8 smartphone: 12 MP, f/1.8 aperture and optical image stabilization camera with autofocus. |
Data format | Raw digital image (JPG format); annotation file (CSV format); Jupyter notebook (IPYNB format). |
Parameters for data collection | Esca disease and healthy images of grapevine leaves were collected separately. The images were taken at a working distance of 30 cm from grapevine plants during sunny and windy days and considering scenarios with background variety. |
Description of data collection | Images of diseased and healthy grapevine leaves were acquired manually using the cameras of three devices (a tablet and two different smartphones). The ground truth for the presence of grapevine disease was visually assessed by an expert. |
Data source location | Institution: Umani Ronchi SPA; City/Town/Region: Maiolati Spontini, Ancona, Marche; Country: Italy; Latitude and longitude for collected samples/data: 43.470565, 13.112828 |
Data accessibility | Repository name: ESCA-dataset; Data identification number: 10.17632/89cnxc58kj.1; Direct URL to data: http://dx.doi.org/10.17632/89cnxc58kj.1 |
Value of the Data
• The data provide a collection of grapevine leaf images belonging to two classes: unhealthy leaves affected by Esca disease and healthy leaves. They therefore enable researchers to apply machine learning methods for the early identification of grapevine diseases.
• The dataset can be used for training, testing and validation of classification algorithms using images of Esca-affected and healthy leaves, and to develop computer, smartphone and/or embedded applications for the early detection of plant diseases.
• The data are also suitable for other machine learning tasks such as detection, image segmentation and image synthesis.
1. Data Description
Grapevine trunk diseases are a group of pathologies that affect the vine and are caused by several fungal pathogens that live in and colonize the wood, causing wood necrosis, wood discoloration, vascular infections, and white decay [1], [2]. Externally, affected grapevines show a progressive decline, in most cases associated with foliar symptoms specific to each disease [3], which initially causes loss of productivity and eventually the death of the vine [4].
One of the oldest of these diseases, Esca, attacks the woody part of the vine and reduces the transport of organic compounds within the plant, which leads to partial or complete desiccation of the foliage and, eventually, the death of the plant. Esca thus causes vegetative stress in the affected plants, resulting in small bunches of grapes and, in the worst case, the death of the plant, in addition to the risk of propagation to neighboring grapevines. The presence of these diseases, together with the lack of effective control strategies, therefore causes severe production losses [5]. The average incidence of Esca disease has reached 60% to 80% in some old vineyards of some central and southern Italian regions where epidemiological studies have been carried out [6].
Because Esca can be assessed visually, computer vision and machine learning techniques can be applied to identify this plant disease [7], [8], [9], [10].
This article presents a dataset containing images of healthy and unhealthy grapevine leaves that could be useful for researchers who aim to identify grapevine plant diseases by applying different computer vision and image processing algorithms. The dataset contains images of different sides of grapevine leaves collected from vineyards, showing two different states, healthy and unhealthy, where the unhealthy state corresponds to Esca disease, against a variety of backgrounds. Esca disease manifests itself in grapevine plants only during the July-September period, when the climate is ideal for the disease. All images were acquired manually during this period using three different devices (two smartphones and a tablet) with the help of a grapevine disease expert. The acquired images have a resolution of 1920×1080 pixels or 1280×720 pixels, in either portrait or landscape orientation.
The proposed dataset consists of an archive of 1770 images. As shown in Fig. 1, the main folder named “esca_dataset” consists of two sub-folders, namely:
• “esca”: contains the image files related to the esca class, named “esca_n_camSource.jpg”;
• “healthy”: contains the image files related to the healthy class, named “healthy_n_camSource.jpg”.
Fig. 1.
Directory tree of the dataset.
In both cases, n = “0”, …, “N”, where N is the total number of images, and camSource = “cam1”, “cam2”, “cam3” indicates the camera used to acquire the corresponding image. An annotation file (.csv) containing the list of filenames with the corresponding class IDs is also included in the root directory. We additionally provide a Jupyter notebook (.ipynb) file for applying data augmentation to the dataset. To allow reproducibility of the results reported in Section 2.3, the Jupyter notebook used for CNN training is also provided.
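The naming convention above can be decoded programmatically. The following is a minimal sketch, not part of the released notebooks: the regular expression, the `LeafImage` tuple and the `parse_filename` helper are illustrative names, and the sketch assumes that the index is written as plain digits and that the extension is lower-case `.jpg`. The class IDs are those listed in Table 1.

```python
import re
from typing import NamedTuple

# Assumed pattern for names such as "esca_12_cam2.jpg" or "healthy_0_cam1.jpg".
FILENAME_RE = re.compile(r"^(esca|healthy)_(\d+)_(cam[123])\.jpg$")

# Class IDs as listed in Table 1 of the article.
CLASS_IDS = {"esca": 1, "healthy": 2}


class LeafImage(NamedTuple):
    """Fields decoded from one image file name (hypothetical helper type)."""
    class_name: str
    class_id: int
    index: int
    camera: str


def parse_filename(name: str) -> LeafImage:
    """Split a dataset file name into class, image index and camera label."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    class_name, index, camera = match.groups()
    return LeafImage(class_name, CLASS_IDS[class_name], int(index), camera)


if __name__ == "__main__":
    print(parse_filename("esca_12_cam2.jpg"))
    print(parse_filename("healthy_0_cam1.jpg"))
```

A helper of this kind could be used, for example, to group images by acquisition device or to cross-check the file names against the class IDs in the annotation file.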
Table 1 describes the dataset for each class and shows the size of the dataset before and after augmentation.
Table 1.
Data consistency.
Class name | Class ID | Number of original images | Number of images after data augmentation |
---|---|---|---|
esca | 1 | 888 | 12,432 |
healthy | 2 | 882 | 12,348 |
Total | | 1770 | 24,780 |
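As a quick consistency check on Table 1, the augmented counts are exact multiples of the original counts. The short sketch below (the `augmentation_factor` helper and the `TABLE_1` dictionary are illustrative names, not part of the released code) computes the multiplier; a factor of 14 is consistent with each original image being kept together with 13 transformed copies, although the exact bookkeeping is defined by the released notebook.

```python
def augmentation_factor(n_original: int, n_augmented: int) -> int:
    """Return how many images each original contributes after augmentation."""
    factor, remainder = divmod(n_augmented, n_original)
    if remainder:
        raise ValueError("augmented count is not a whole multiple of the original count")
    return factor


# (original, augmented) image counts per class, copied from Table 1.
TABLE_1 = {"esca": (888, 12_432), "healthy": (882, 12_348)}

if __name__ == "__main__":
    for class_name, (n_original, n_augmented) in TABLE_1.items():
        factor = augmentation_factor(n_original, n_augmented)
        print(f"{class_name}: {n_original} x {factor} = {n_augmented}")
    print("total original:", sum(o for o, _ in TABLE_1.values()))
    print("total augmented:", sum(a for _, a in TABLE_1.values()))
```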
2. Experimental Design, Materials and Methods
2.1. Processing
The images of unhealthy and healthy grapevine leaves were acquired using three different devices equipped with the following cameras: the first and the third (labelled in the dataset as “cam1” and “cam3”) are 12 MP, f/1.8 aperture cameras with optical image stabilization and autofocus, belonging to a tablet and a smartphone respectively; the second (labelled in the dataset as “cam2”) is a 13 MP, f/1.7 smartphone camera with autofocus. The images acquired with “cam2” and “cam3” have a resolution of 1920×1080 pixels, while “cam1” produces 1280×720 pixel images; for all cameras the orientation is either portrait or landscape. In all cases, the images were acquired at a working distance of 30 cm, with or without zoom. To obtain real and representative samples of leaves, the images were acquired under real conditions, that is with varying lighting, backgrounds and leaf sides.
In the machine learning context, classification aims at finding the class to which a new observation belongs. Plant disease classification can be performed with standard methods such as support vector machines, decision trees and k-nearest neighbor classification, as well as with more recent methods such as deep neural networks and, specifically for images, convolutional neural networks (CNNs) [7].
For classification problems, the dataset provides two labelled classes: unhealthy, that is affected by Esca disease (class 1), and healthy (class 2). They consist of 888 and 882 images, respectively, as reported in Table 1. Fig. 2 shows examples of leaf images for each class. Labels were assigned to the images after a visual inspection by an expert to establish the state of the leaf. The fully labelled image dataset can be used in research activities where machine learning methods are applied to solve not only classification problems but also detection, segmentation and image synthesis tasks.
Fig. 2.
Examples of grapevine leaves belonging to the two classes used for the classification task: a) Esca disease, b) healthy.
2.2. Augmentation procedure
Data Augmentation includes a suite of techniques to enhance the size and quality of training datasets such that more accurate deep learning models can be built using them [11]. The augmentation of images is a useful training technique to increase the diversity of the training set by applying random, but realistic, transformations such as geometric transformations, color space transformations, cropping, noise injection and random erasing. As a desired effect, data augmentation can improve the performance of trained models and expand limited datasets to take advantage of the capabilities of big data [11].
To implement data augmentation, we used the ImageDataGenerator class provided by the Keras (v 2.4.3) deep learning library [12]. The Python code released with the dataset performs the following steps:
1. Download dataset from Mendeley repository: in this step the data are downloaded directly from the Mendeley repository containing the “esca_dataset” and stored in the current path.
2. Data augmentation: in this step the code applies several transformations by using the ImageDataGenerator class to generate more images. It should be noted that the class returns only the augmented images and not the original images. However, the provided code also copies the original images into the augmented dataset, in order to obtain a new ready-to-use dataset. More specifically, this code can:
   - Import libraries such as Keras, TensorFlow and OpenCV, which are useful for the augmentation task.
   - Enable or disable each of the provided transformations, simply by uncommenting or commenting it.
   - Perform augmentation by creating an ImageDataGenerator object with the following transformations: horizontal and vertical flip, rotation, width and height shift, zoom, shear, blur, brightness, contrast, saturation, hue, gamma. The default configurations for these transformations are: horizontal flip allowed; vertical flip allowed; rotation range angle of 40; width shift range of 0.2; height shift range of 0.2 (with fill in nearest mode); shear range of 0.2; zoom range of 0.2; blur filter enabled; brightness range [1.1, 1.5]; contrast of 0.5; saturation of 3; hue of 0.1; gamma of 2. The user can modify these default values.
3. Visualize images generated from data augmentation: this is an optional step that allows the user to visualize, in a grid, some original images compared with the applied transformations.
4. Save augmented dataset: in this step an archive of the new augmented dataset is created, and the user can automatically save this new dataset, named “augmented_esca_dataset”, to their own Google Drive.
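The default configuration listed in step 2 can be written down as data. The sketch below is only an illustration under stated assumptions, not the released notebook: the dictionary names and the `merged_settings` and `build_generator` helpers are hypothetical. The first dictionary uses argument names that the Keras `ImageDataGenerator` class accepts; blur, contrast, saturation, hue and gamma are not `ImageDataGenerator` arguments, so they are kept in a separate dictionary and would have to be applied by other means (for instance through a custom `preprocessing_function`). The import path `tensorflow.keras` is also an assumption, since the article cites Keras v 2.4.3 without showing the import.

```python
# Defaults quoted in the text that correspond to ImageDataGenerator arguments.
GENERATOR_DEFAULTS = {
    "horizontal_flip": True,
    "vertical_flip": True,
    "rotation_range": 40,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "fill_mode": "nearest",
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "brightness_range": (1.1, 1.5),
}

# Defaults quoted in the text that ImageDataGenerator does not provide itself.
# The key names are hypothetical labels for this sketch.
EXTRA_DEFAULTS = {
    "blur": True,
    "contrast": 0.5,
    "saturation": 3,
    "hue": 0.1,
    "gamma": 2,
}


def merged_settings(**overrides):
    """Return the generator settings with user overrides applied."""
    unknown = set(overrides) - set(GENERATOR_DEFAULTS)
    if unknown:
        raise KeyError(f"not a known generator setting: {sorted(unknown)}")
    return {**GENERATOR_DEFAULTS, **overrides}


def build_generator(**overrides):
    """Create an ImageDataGenerator; TensorFlow is imported only when called."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**merged_settings(**overrides))


if __name__ == "__main__":
    # Example: disable vertical flips while keeping every other default.
    print(merged_settings(vertical_flip=False))
```

Keeping the settings in a dictionary mirrors the idea of enabling or disabling single transformations: a transformation is switched off by overriding its entry rather than by editing the generator call.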
Some examples of the augmented images are shown in Fig. 3 and Fig. 4, where panel a) shows the starting (original) image and panels b), c), d), e), f), g), h), i), l), m), n), o), p) show the augmented images obtained by running the code. In summary, there are six geometric transformations (horizontal and vertical flip, rotation, width and height shift, shear) and five color transformations (brightness, contrast, saturation, hue, gamma), plus two other image manipulations, zoom and blur. Geometric transformations could be very useful for the specific application: along the path of the tractor a leaf can be captured from different angles, and in a vineyard the leaves are distributed in random positions compared with other types of plants, even before taking into account the movements of the tractor on naturally uneven terrain. The color transformations are also very important in order to simulate different brightness and exposure conditions. However, we chose only color transformations that do not alter the Esca spots too much.
Fig. 3.
Examples of augmentation for the class “esca”. a) is the original image; b), c), d), e), f), g), h), i), l), m), n), o), p) are the augmented images.
Fig. 4.
Examples of augmentation for the class “healthy”. a) is the original image; b), c), d), e), f), g), h), i), l), m), n), o), p) are the augmented images.
2.3. Classification with CNN
As an example of the aforementioned machine learning methods for the classification task, we trained a simple CNN on the proposed dataset with augmentation, for three different pixel sizes (1280×720, 320×180, 80×45). We consider different resolutions, obtained by downsampling the original images, to show how the dataset can be used for different target applications, for example a web application using a high-resolution smartphone camera with no memory size constraints, or an embedded application using a low-cost device with a low-resolution camera and/or reduced memory size.
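To make the memory argument concrete, the three input sizes are related by integer factors of 4 per side, and the size of one uncompressed input image can be computed directly. The sketch below is illustrative only: the helper names are hypothetical, and it assumes 8-bit RGB images (3 bytes per pixel), whereas a network fed with 32-bit floating point tensors would need four times as much memory per image.

```python
BASE_SIZE = (1280, 720)   # largest input size used in the experiments
FACTORS = (1, 4, 16)      # per-side downsampling factors (assumed integer)


def downsampled_size(size, factor):
    """Divide width and height by an integer factor, refusing inexact results."""
    width, height = size
    if width % factor or height % factor:
        raise ValueError(f"{size} is not divisible by {factor}")
    return width // factor, height // factor


def rgb_bytes(size, bytes_per_channel=1):
    """Memory needed for one uncompressed RGB image of the given size."""
    width, height = size
    return width * height * 3 * bytes_per_channel


if __name__ == "__main__":
    for factor in FACTORS:
        size = downsampled_size(BASE_SIZE, factor)
        print(f"{size[0]}x{size[1]}: {rgb_bytes(size):,} bytes per 8-bit RGB image")
```

Under these assumptions, each reduction step lowers the per-image memory by a factor of 16, which is the kind of saving that matters on a memory-constrained embedded device.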
The baseline CNN architecture, summarized in Fig. 5, consists of 5 2D convolutional layers, each followed by a ReLU activation function, and 5 2D max pooling layers with a 2×2 pool size. In the final stage, a flatten layer and two dense layers, with ReLU and softmax activation functions respectively and a dropout layer between them, classify the input images into the 2 classes. The initial input size in Fig. 5 refers to the maximum input size for this dataset, that is 1280×720. The same architecture, with different dimensions, has been used for the two other input sizes. The network is trained for 50 epochs in all the experiments. CNN training, validation and testing were performed on the augmented dataset, split as follows: 60% training, 15% validation and 25% testing.
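The 60/15/25 partition can be sketched with the standard library alone. This is a minimal illustration, not the procedure of the released training notebook: the `split_indices` helper, the use of a shuffled index list and the seed value are assumptions, and the sketch makes no attempt to stratify by class or to keep augmented copies of the same original image in the same subset.

```python
import random


def split_indices(n_items, fractions=(0.60, 0.15, 0.25), seed=0):
    """Shuffle item indices and cut them into train/validation/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # 24,780 is the size of the augmented dataset reported in Table 1.
    train, val, test = split_indices(24_780)
    print(len(train), len(val), len(test))
```

Applied to the 24,780 augmented images, these proportions correspond to 14,868 training, 3,717 validation and 6,195 test images.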
Fig. 5.
Model summary of the simple CNN example.
The loss and accuracy achieved on the training, validation and testing data are reported in Table 2, while Fig. 6 shows the training/validation accuracy and loss as a function of the number of epochs, although a detailed description of network performance is outside the scope of this paper.
Table 2.
Performance of the example baseline model obtained with 3 different pixel sizes of the input images.
Input pixel size | Training loss | Validation loss | Testing loss | Training accuracy | Validation accuracy | Testing accuracy | Epochs |
---|---|---|---|---|---|---|---|
1280×720 | 0.0102 | 0.1948 | 0.1367 | 0.9989 | 0.9890 | 0.9916 | 50 |
320×180 | 0.0034 | 0.0263 | 0.0531 | 0.9996 | 0.9957 | 0.9948 | 50 |
80×45 | 0.0004 | 0.2425 | 0.3819 | 0.9999 | 0.9736 | 0.9688 | 50 |
Fig. 6.
Training/validation accuracy and training/validation loss as a function of the number of epochs for three different pixel sizes: a) 1280×720, b) 320×180, c) 80×45.
Ethics Statement
The work did not involve any human or animal subjects, nor data from social media platforms.
CRediT Author Statement
M. Alessandrini: Data curation; R. Calero Fuentes Rivera: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation; L. Falaschetti: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization; D. Pau: Conceptualization, Methodology, Investigation, Writing - review & editing, Visualization, Supervision, Project administration, Funding acquisition; V. Tomaselli: Conceptualization, Writing - review & editing, Visualization, Funding acquisition; C. Turchetti: Conceptualization, Methodology, Investigation, Writing - review & editing, Visualization, Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Acknowledgments
The authors would like to acknowledge the Umani Ronchi SPA winery, Osimo, Ancona, Italy, for its willingness to collect the images of the dataset.
This work is part of the H2020-ECSEL-2017-2-RIA-two-stage funded project AFarCloud, Aggregate FARming in the CLOUD.
References
1. Bertsch C., Ramírez-Suero M., Magnin-Robert M., Larignon P., Chong J., Abou-Mansour E., Spagnolo A., Clément C., Fontaine F. Grapevine trunk diseases: complex and still poorly understood. Plant Pathol. 2013;62(2):243–265.
2. Mugnai L., Graniti A., Surico G. Esca (black measles) and brown wood-streaking: two old and elusive diseases of grapevines. Plant Dis. 1999;83(5):404–418. doi: 10.1094/PDIS.1999.83.5.404.
3. Lecomte P., Darrieutort G., Liminana J.-M., Comont G., Muruamendiaraz A., Legorburu F.-J., Choueiri E., Jreijiri F., El Amil R., Fermaud M. New insights into esca of grapevine: the development of foliar symptoms and their association with xylem discoloration. Plant Dis. 2012;96(7):924–934. doi: 10.1094/PDIS-09-11-0776-RE.
4. Mondello V., Songy A., Battiston E., Pinto C., Coppin C., Trotel-Aziz P., Clément C., Mugnai L., Fontaine F. Grapevine trunk diseases: a review of fifteen years of trials for their control with chemicals and biocontrol agents. Plant Dis. 2018;102(7):1189–1217. doi: 10.1094/PDIS-08-17-1181-FE.
5. Gallo R., Ristorto G., Daglio G., Massa N., Berta G., Lazzari M., Mazzetto F. New solutions for the automatic early detection of diseases in vineyards through ground sensing approaches integrating lidar and optical sensors. Chemical Engineering Transactions. 2017;58:673–678.
6. Romanazzi G., Murolo S., Pizzichini L., Nardi S. Esca in young and mature vineyards, and molecular diagnosis of the associated fungi. European Journal of Plant Pathology. 2009;125(2):277–290.
7. Kamilaris A., Prenafeta-Boldú F.X. Deep learning in agriculture: a survey. Comput. Electron. Agric. 2018;147:70–90.
8. ur Rahman H., Ch N.J., Manzoor S., Najeeb F., Siddique M.Y., Khan R.A. A comparative analysis of machine learning approaches for plant disease identification. Advancements in Life Sciences. 2017;4(4):120–126.
9. Fuentes A., Yoon S., Kim S., Park D. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors. 2017;17(9):2022. doi: 10.3390/s17092022.
10. Singh V., Misra A. Detection of plant leaf diseases using image segmentation and soft computing techniques. Information Processing in Agriculture. 2017;4(1):41–49.
11. Shorten C., Khoshgoftaar T.M. A survey on image data augmentation for deep learning. J. Big Data. 2019;6(1):60. doi: 10.1186/s40537-021-00492-0.
12. Chollet F. Building powerful image classification models using very little data. Keras Blog. 2016.