Abstract
Intelligent agriculture relies heavily on the recognition of crop diseases from images. India is also a major producer of French beans, accounting for 37.25% of total production. In the southern region of Maharashtra state, India, this crop is cultivated three times a year: soyabean is sown between June and July, in September and October during the rabi season, and in February. Soyabean is a common crop in the Maharashtrian regions of Pune, Satara, Ahmednagar, Solapur and Nashik, among others, and is grown over an area of around 31,050 hectares in Maharashtra. This research presents a dataset of both insect-damaged and healthy soyabean leaves. Images were taken over two to three seasons on several farms. The dataset contains 3363 photos in seven folders: six class folders and one folder of raw leaf images. The six classes are: (I) healthy plants, (II) vein necrosis, (III) dry leaf, (IV) Septoria brown spot, (V) root images and (VI) bacterial leaf blight. The goal of this study is to make the dataset accessible to academics and students for their studies and for building machine learning models.
Keywords: Soyabean leaf (Glycine max), Datasets, Image classification, Machine learning, Deep learning
Specifications Table
| Subject | Agronomy, Agricultural science, Horticulture, Computer science |
| Specific subject area | Image Processing, Deep Learning, plant diseases |
| Data format | Raw and filtered images are in .jpg format. |
| Type of data | Raw and preprocessed images of soyabean leaves |
| Data collection | Two smartphones were used to take high-quality photos manually on a clear, sunny day. The upper surface of each leaf was examined most closely to determine whether it was healthy or unhealthy. |
| Data source location | Farm of Sub. Major Gulab Alam Kotwal, Goudgaon Village, Tal: Barshi, Dist: Solapur, Maharashtra, India, 413406. Latitude 18.2157727, longitude 75.6680118. |
| Data accessibility | Repository name: An India soyabean leaf dataset. Data identification number: 10.17632/bshkvgbzpt.1. Direct URL to data: https://data.mendeley.com/datasets/bshkvgbzpt/1. Instructions for accessing these data: the dataset consists of single-leaf and multi-leaf folders. |
| Related research article | Kotwal et al. [1] |
1. Value of the Data
- The dataset presented here is a collection of images of soyabean leaves gathered using mobile devices.
- Researchers and learners from many fields can use the dataset, which comprises 1176 processed images in six classes (Table 1) [2]. Researchers may use it to train and validate various predictive models and to evaluate the precision of the algorithms.
- The dataset is open and freely downloadable by the general public. Because the images are already sorted into class folders and pre-processed, researchers can train machine learning models on it without additional pre-processing or verification.
- The data may be used to develop high-quality tools that benefit society by identifying and classifying diseases in soyabean leaves [3].
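Evaluating the precision of a classifier trained on this dataset requires only the true and predicted class of each test image. The sketch below is a minimal, library-free illustration; the function names and the four example labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Precision per predicted class: true positives / all predictions of that class."""
    predicted = Counter(y_pred)  # TP + FP for each class
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP for each class
    return {label: correct[label] / n for label, n in predicted.items()}


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ground-truth and predicted labels for four images
y_true = ["Healthy", "Healthy", "Dry leaf", "Vein necrosis"]
y_pred = ["Healthy", "Dry leaf", "Dry leaf", "Vein necrosis"]
print(per_class_precision(y_true, y_pred))  # {'Healthy': 1.0, 'Dry leaf': 0.5, 'Vein necrosis': 1.0}
print(accuracy(y_true, y_pred))             # 0.75
```

Precision answers the question "of the images the model assigned to this class, how many truly belong to it?". With classes as small as Root images (10 images), per-class figures may be more informative than overall accuracy alone.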
2. Background
2.1. Objectives
(a) A dataset covering several diseases present on soyabean leaves can help AI/ML algorithms detect and classify diseases in real time.
(b) Pre-processing a dataset can help an AI/ML model perform more accurately.
3. Data Description
The dataset consists of six classes: healthy, vein necrosis, dry leaf, Septoria brown spot, root images and bacterial leaf blight. The first three folders, each of which contains both single-leaf and multi-leaf images, hold 288 healthy images, 138 images of vein necrosis and 230 images of dry leaves, respectively. The fourth folder contains 284 images of Septoria brown spot, the fifth 10 root images and the sixth 226 images of bacterial leaf blight. The last folder, "leaf images all (raw)", contains 2187 images (Table 1).
Table 1.
Types of disease and number of images.
| Folder | Number of images |
|---|---|
| Healthy | 288 |
| Vein necrosis | 138 |
| Dry leaf | 230 |
| Septoria brown spot | 284 |
| Root images | 10 |
| Bacterial leaf blight | 226 |
| Leaf images all (raw) | 2187 |
| Total | 3363 |
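The folder sizes can be cross-checked with a few lines of Python. This sketch simply transcribes the counts from Table 1 (the variable and function names are illustrative) and confirms that the six class folders hold 1176 images, which together with the 2187 raw images gives the 3363 photos stated in the abstract. It also prints each class's share of the labelled images.

```python
# Image counts transcribed from Table 1 (folder names as used in the text)
CLASS_COUNTS = {
    "Healthy": 288,
    "Vein necrosis": 138,
    "Dry leaf": 230,
    "Septoria brown spot": 284,
    "Root images": 10,
    "Bacterial leaf blight": 226,
}
RAW_COUNT = 2187  # "leaf images all (raw)" folder


def summarize(counts, raw):
    """Return (images in class folders, images overall, share of each class)."""
    labelled = sum(counts.values())
    shares = {name: n / labelled for name, n in counts.items()}
    return labelled, labelled + raw, shares


labelled, total, shares = summarize(CLASS_COUNTS, RAW_COUNT)
print(labelled, total)  # 1176 3363
for name, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {share:6.1%}")
```

The largest class (Healthy, 288 images) is about 29 times the size of the smallest (Root images, 10 images), so class-balanced sampling or weighting may be worth considering when training on this dataset.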
In this part, we describe the characteristic symptoms of several diseases identified in the leaf photos of our dataset. Fig. 1 shows single-leaf and multi-leaf examples of the healthy class.
Fig. 1.
Healthy leaves: single-leaf and multi-leaf images.
Soyabean vein necrosis is caused by soybean vein necrosis virus, which is spread by thrips. Symptoms begin as pale yellow areas along the leaf veins that later turn brown and necrotic.
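Bacterial leaf blight of soyabean is caused by the bacterium Pseudomonas savastanoi pv. glycinea. It should not be confused with angular leaf spot, one of the most destructive diseases of the common bean in tropical and subtropical production zones, which is caused by the fungus Pseudocercospora griseola.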
Septoria brown spot, caused by the fungus Septoria glycines, produces small, irregular brown spots that usually appear first on the lower leaves; infected leaves may turn yellow and drop early (Fig. 2).
Fig. 2.
Septoria brown spot: single-leaf and multi-leaf images.
4. Experimental Design, Materials and Methods
Images were captured with smartphones in July 2022 in Goudgaon, a small village in Tal: Barshi, Dist: Solapur, Maharashtra. This period was chosen because it is the ideal growing time for soyabean in the area. Plants were photographed in sunny conditions at various growth stages.
Three stages made up the pre-processing of the photos.
(1) Data acquisition: The photographs were taken with the high-quality rear camera of a smartphone. All 3363 pictures were sorted and stored in the appropriate folders, as shown in Fig. 3.

Fig. 3.
Collection of dataset.

(2) Image size: In this phase, the images of different sizes collected from Goudgaon village in Maharashtra, India, are brought to a common size.

(3) Dataset split: The dataset is separated into training and test sets so that a machine learning model can be assessed on images it did not see during training. Splitting does not by itself prevent overfitting, but it makes overfitting detectable.
The first phase of pre-processing is to arrange the images into six folders for classification: 1) Healthy 2) Vein necrosis 3) Dry leaf 4) Septoria brown spot 5) Root images 6) Bacterial leaf blight.
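Once the images are arranged into class folders, a short script can confirm that every folder exists and count its images. This is a sketch under assumptions: the folder names below follow the text, but the exact names inside the downloaded archive, and whether single-leaf and multi-leaf images sit in separate sub-folders, are not specified here, so the search is recursive. The names `CLASS_FOLDERS` and `count_images` are hypothetical.

```python
from pathlib import Path

# Class folder names as listed in the text; the exact names inside the
# downloaded archive are an assumption and may need adjusting.
CLASS_FOLDERS = [
    "Healthy",
    "Vein necrosis",
    "Dry leaf",
    "Septoria brown spot",
    "Root images",
    "Bacterial leaf blight",
]


def count_images(root, classes=CLASS_FOLDERS, exts=(".jpg", ".jpeg")):
    """Count image files per class folder, searching sub-folders recursively
    (e.g. separate single-leaf and multi-leaf directories)."""
    root = Path(root)
    counts = {}
    for name in classes:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing class folder: {folder}")
        counts[name] = sum(
            1 for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in exts
        )
    return counts
```

For example, `count_images("soyabean_dataset")` (a hypothetical local path) returns a dictionary mapping each class name to its number of .jpg files, which can then be compared with Table 1.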
The second step is one of the most crucial because the photographs were taken with smartphones and vary in size (for example, 1600 × 1200 pixels in width and height, at 96 dots per inch). We use a resize() function in Python to bring every photograph to the same standard size of 300 × 300 pixels. To rotate and zoom the images, alter their brightness range and perform other operations on them, we use the ImageDataGenerator class from keras.preprocessing.image.
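The resizing and augmentation described above can be illustrated without any image library. The sketch below is neither the authors' code nor Keras's ImageDataGenerator: it uses nearest-neighbour resizing and a reduced set of augmentations (horizontal flip, quarter-turn rotations instead of arbitrary angles, central zoom and brightness scaling) on images represented as nested lists of grayscale values. The 300 × 300 target size and the zoom and brightness ranges are assumptions, and all function names are hypothetical. In practice a library routine such as Pillow's `Image.resize`, which offers smoother interpolation filters, would normally be used.

```python
import random

# Images are modelled as lists of rows of grayscale values (0-255) to keep the
# sketch dependency-free; real code would operate on decoded JPEG arrays.
TARGET_SIZE = (300, 300)  # (width, height); assumed square target size


def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize to out_w x out_h pixels."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def zoom_center(img, zoom):
    """Zoom in (zoom >= 1): crop the central 1/zoom region and resize it back."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h / zoom)), max(1, round(w / zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in img[top:top + ch]]
    return resize_nearest(crop, w, h)


def adjust_brightness(img, factor):
    """Scale every pixel by factor and clip to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng, zoom_range=(1.0, 1.2), brightness_range=(0.8, 1.2)):
    """Apply a random flip, quarter-turn rotation, zoom and brightness change."""
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    img = zoom_center(img, rng.uniform(*zoom_range))
    return adjust_brightness(img, rng.uniform(*brightness_range))


def preprocess(img, rng):
    """Resize to the common target size, then produce one augmented copy."""
    resized = resize_nearest(img, *TARGET_SIZE)
    return resized, augment(resized, rng)
```

Resizing first means every augmented copy has the same dimensions, which is what a fixed-input classifier expects; passing a seeded `random.Random` makes the augmentations reproducible.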
The third phase is to split the dataset into training and test sets, as shown in Fig. 4.
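A common way to perform this split is a stratified split, in which each class folder is divided separately so that small classes such as Root images are represented in both sets. The ratio used by the authors is not stated in the text; the sketch below assumes an 80/20 train/test split and a fixed random seed, and the function name is illustrative.

```python
import random


def stratified_split(samples_by_class, test_fraction=0.2, seed=42):
    """Split every class separately so that the training and test sets keep
    roughly the same class proportions as the full dataset."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(samples_by_class):
        items = list(samples_by_class[label])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        if len(items) > 1:
            # keep at least one image of the class on each side of the split
            n_test = min(max(n_test, 1), len(items) - 1)
        test += [(path, label) for path in items[:n_test]]
        train += [(path, label) for path in items[n_test:]]
    return train, test
```

With the class counts from Table 1 and the assumed 80/20 ratio, this places 236 of the 1176 class-folder images in the test set, including 2 of the 10 root images.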
Fig. 4.
Splitting of dataset for machine learning model.
Limitations
‘None’.
Ethics Statement
For the article “An India soyabean dataset for identification and classification of diseases using computer-vision algorithms”, the following is confirmed:
1. This article is the authors’ own original work, which has not been previously published elsewhere.
Credit Author Statement
Jameer Gulab Kotwal: collection of dataset and preprocessing. Dr. Ramgopal Kashyap: data augmentation, methodology. Dr. Shafi Pathan: Review, writing and editing.
Acknowledgments
We would like to express our gratitude to Subedar Major Mr. Gulab Alam Kotwal (Indian Army) for his support in the collection of the soyabean dataset.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
An India soyabean leaf dataset (Original data) (https://data.mendeley.com/datasets/bshkvgbzpt/1).
References
1. Kotwal J.G., Kashyap R., Shafi P.M. Artificial Driving based EfficientNet for Automatic Plant Leaf Disease Classification. Multimed. Tools Appl. 2023. doi: 10.1007/s11042-023-16882-w.
2. Siddique M.M., Islam T., Tusher Y.A., Ema R.R., Adnan M.N., Galib S.M. Paddynet: an organized dataset of paddy leaves for a smart fertilizer recommendation system. Data Br. 2023;50. doi: 10.1016/j.dib.2023.109516.
3. Zhao J., Almodfer R., Wu X., Wang X. A dataset of pomegranate growth stages for machine learning-based monitoring and analysis. Data Br. 2023;50. doi: 10.1016/j.dib.2023.109468.