Abstract
The PlantCity dataset addresses significant agricultural yield losses in Pakistan from plant diseases. It provides 10,667 high-resolution images of leaves from 12 key crops: apple, apricot, bean, cherry, maize, fig, grape, loquat, pear, tomato, walnut, and persimmon. The images are organized into 52 classes (41 diseased and 11 healthy) and augmented to a total of 52,273 images. Data was collected in real-field conditions in Charsadda (34.15°N, 71.74°E, typical temperature 40–44 °C) and Chitral (35.85°N, 71.79°E, typical temperature 25–30 °C) from April to July 2023–2024. The dataset enables the development of deep learning models for automated disease classification and captures a range of environmental factors, including high temperatures that can exacerbate disease symptoms. It utilizes smartphone-based computer vision to facilitate early disease identification, thereby supporting precision farming and sustainable agriculture in Pakistan.
Keywords: Plant leaf diseases, Multi-crop dataset, Precision agriculture, Pakistan
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | Image classification, multi-class leaf disease classification, deep learning, and computer vision. |
| Type of data | Image. |
| Data collection | Field surveys were conducted from April to June in 2023 and 2024 for tomato leaf diseases, and from June to July in 2023 and 2024 for the remaining 11 species. The surveys were supervised by experts to ensure comprehensive image acquisition under various environmental conditions. The dataset comprises 10,667 images depicting various stages of different leaf diseases, categorized into 52 classes. Each class represents a healthy leaf or a distinct expression of disease, pest damage, or environmental stress in plant leaves. |
| Data source location | Charsadda (Village Sarki) for tomato and Chitral (Village Danin) for the other 11 crops; Khyber Pakhtunkhwa, Pakistan |
| Data accessibility | Repository name: Mendeley Data; Data identification number: 10.17632/w8kh2xkspx.2; Direct URL to data: https://data.mendeley.com/datasets/w8kh2xkspx/1 |
| Related research article | None |
1. Value of the Data
- The PlantCity dataset comprises 10,667 high-resolution images across 52 classes (41 diseased, 11 healthy) from 12 crop species. Tomato images were collected in Charsadda, and images of the other 11 crops in Danin, Chitral, which was selected for its high altitude, cooler conditions, and agro-climatic suitability for fruit crops such as walnut, apple, and grape. Most current datasets (such as PlantVillage and PlantDoc) are limited in scope and were usually gathered under controlled laboratory conditions, so they lack the diversity and difficulty of real-field environments. Although such datasets are helpful for research, models trained on them may not transfer well to disease classification in practice. PlantCity also includes various diseases that are not available in publicly accessible datasets.
- High-quality, labeled images of multiple plant leaves affected by different diseases make it possible to build and train complex algorithms for automated disease identification and classification. Early disease detection with high accuracy and low computation time is nearly impossible with conventional methods. It is therefore critical to develop disease diagnosis techniques that work automatically, accurately, quickly, and affordably [1].
- A comprehensive multi-class plant leaf disease dataset is a valuable resource for agricultural scientists, policymakers, and researchers in computer vision and machine learning. It aids precision agriculture, supports the development of intelligent diagnostic systems, and informs data-driven decisions for crop protection and environmentally friendly agricultural methods. The dataset comprises plants commonly found in local agricultural areas, especially in Pakistan, and serves as a foundation for advancing research and developing solutions to pressing agricultural challenges.
2. Background
Over one-third of Pakistan's population relies on the agricultural industry for their livelihoods, making it a crucial pillar of the country's economy. For both domestic food security and export markets, crops such as apples, apricots, beans, cherries, maize, figs, grapes, loquats, pears, tomatoes, walnuts, and persimmons are essential. However, several pests and leaf diseases pose a constant threat to these crops, resulting in significant yield losses and financial difficulties for producers. These difficulties are exacerbated by factors such as inefficient disease management approaches and insufficient available datasets for identifying leaf diseases in regions like Khyber Pakhtunkhwa.
Existing datasets such as PlantDoc [2], PlantVillage [3], Plant Pathology [4], and Plant_K [5] are among the most popular multi-crop leaf disease datasets. However, they consist primarily of images from laboratory environments, and some support only binary classification (healthy versus diseased). These datasets also often cover fewer crop species or disease classes (e.g., PlantVillage: 14 species, 38 classes; PlantDoc: 13 species, 28 classes), limiting their applicability to diverse agricultural contexts. Single-crop datasets, such as the Indian Bay Leaf dataset [6], address a narrower range of agricultural issues by focusing on one species (Cinnamomum tamala; 5966 images in 3 classes: healthy, dry, and diseased). The PlantCity dataset addresses these limitations by providing 10,667 high-resolution images across 12 economically vital crops and 52 classes (41 diseased and 11 healthy), augmented to 52,273 images. The images were taken at various times of the day and therefore capture fluctuating environmental and lighting conditions. By supporting automated disease identification, the dataset is intended to promote sustainable agricultural methods and more resilient farming systems.
3. Data Description
The PlantCity dataset, collected in Pakistan, comprises 10,667 high-resolution images of leaves from 12 economically significant crop species: persimmon, apple, apricot, bean, cherry, maize, fig, grape, loquat, pear, tomato, and walnut. The dataset has been augmented to 52,273 images to enhance model robustness and is organized into 52 classes, comprising 41 disease classes and 11 healthy classes. The images were collected under real-field conditions in Charsadda (34.15°N, 71.74°E; typical temperature: 40–44 °C) and Chitral (35.85°N, 71.79°E; typical temperature: 25–30 °C), Khyber Pakhtunkhwa, Pakistan, from April to July in 2023 and 2024. Charsadda is a primary region for tomato cultivation, owing to its fertile plains and well-developed irrigation system, but it also faces challenges from various tomato leaf diseases. The remaining 11 crops were collected in Chitral because of its established reputation as a center for fruit cultivation and its diverse agro-climatic conditions, which favor crop growth.
All 52 classes are listed in Table 1, along with the number of original and augmented images in each class. Fig. 1 shows a sample image from each class, illustrating the variability of healthy and diseased leaves across the 12 crops under different climatic conditions (40–44 °C in Charsadda and 25–30 °C in the mountainous Chitral region). Table 2 compares PlantCity with state-of-the-art datasets such as PlantVillage [3], PlantDoc [2], Plant Pathology [4], Plant_K [5], and the Indian Bay Leaf dataset [6], highlighting its larger class count (52 classes), its real-field image collection under diverse climatic conditions, and its broad coverage of crops and diseases. PlantCity is particularly distinct because it includes rare disease classes that are typically absent or under-represented in other datasets, such as walnut leaf gall mite (not found in PlantVillage walnut classes), cherry purple leaf spot (rare in existing datasets), fig blight (specific to real-field settings), and maize holcus leaf spot (uncommon in lab-based collections). Unlike datasets that are limited to binary classification (healthy or diseased only), PlantCity captures a broader range of disease variability and environmental contexts. Table 1 also indicates which disease classes are missing from earlier datasets.
Table 1.
PlantCity Dataset Classes and Image Counts.
| Class | Exists in literature | Original images | Augmented images | Total |
|---|---|---|---|---|
| Apple brown spot | No | 265 | 1060 | 1325 |
| Apple normal | Yes | 206 | 824 | 1030 |
| Apple black spot | No | 102 | 469 | 571 |
| Apricot normal | No | 163 | 652 | 815 |
| Apricot blight leaf disease | No | 90 | 312 | 402 |
| Apricot shot hole | No | 208 | 832 | 1040 |
| Bean fungal leaf disease | No | 194 | 776 | 970 |
| Bean normal leaf | No | 223 | 900 | 1123 |
| Bean rust | No | 124 | 500 | 624 |
| Bean shot hole | No | 98 | 392 | 490 |
| Cherry leaf scorch | No | 282 | 1101 | 1383 |
| Cherry normal leaf | Yes | 125 | 500 | 625 |
| Cherry brown spot | No | 242 | 968 | 1210 |
| Cherry purple leaf spot | Yes | 240 | 987 | 1227 |
| Cherry shot hole disease | No | 110 | 440 | 550 |
| Maize fungal leaf | No | 83 | 332 | 415 |
| Maize normal leaf | Yes | 142 | 568 | 710 |
| Maize gray leaf spot | Yes | 132 | 528 | 660 |
| Maize holcus leaf spot | No | 108 | 432 | 540 |
| Fig blight leaf disease | No | 191 | 763 | 954 |
| Fig brown spot | No | 107 | 428 | 535 |
| Fig normal leaf | No | 150 | 600 | 750 |
| Fig rust leaf | No | 125 | 500 | 625 |
| Grape anthracnose leaf | No | 249 | 996 | 1245 |
| Grape brown spot leaf | No | 160 | 640 | 800 |
| Grape downy mildew leaf | No | 155 | 624 | 779 |
| Grape mites leaf disease | No | 125 | 500 | 625 |
| Grape normal leaf | Yes | 372 | 1485 | 1857 |
| Grape powdery mildew leaf | No | 301 | 1208 | 1509 |
| Grape shot hole leaf disease | No | 213 | 852 | 1065 |
| Loquat normal leaf | No | 129 | 501 | 630 |
| Pear black spot leaf disease | No | 226 | 904 | 1130 |
| Pear normal leaf | No | 191 | 764 | 955 |
| Pear fire blight | No | 95 | 380 | 475 |
| Walnut anthracnose leaf disease | No | 162 | 648 | 810 |
| Walnut blotch leaf disease | No | 339 | 1354 | 1693 |
| Walnut normal leaf | No | 178 | 712 | 890 |
| Walnut shot hole | No | 231 | 916 | 1147 |
| Walnut leaf gall mite | No | 98 | 392 | 490 |
| Loquat leaf spot | No | 198 | 792 | 990 |
| Persimmon brown spot | No | 166 | 663 | 829 |
| Tomato spider mites | Yes | 127 | 508 | 635 |
| Tomato verticillium wilt | No | 105 | 420 | 525 |
| Tomato bacterial spot | Yes | 400 | 1305 | 1705 |
| Tomato early blight | Yes | 400 | 1525 | 1925 |
| Tomato healthy leaf | Yes | 400 | 1478 | 1878 |
| Tomato late blight | Yes | 294 | 1112 | 1406 |
| Tomato leaf curl | No | 413 | 1480 | 1893 |
| Tomato leaf miner | No | 400 | 1497 | 1897 |
| Tomato leaf mold | No | 400 | 1510 | 1910 |
| Tomato septoria leaf | Yes | 341 | 1258 | 1599 |
| Tomato fusarium wilt | No | 89 | 318 | 407 |
| Total samples | | 10,667 | 41,606 | 52,273 |
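The per-class counts in Table 1 can be checked with a few lines of Python. The sketch below is illustrative only: it copies five rows from Table 1 into a hypothetical `TABLE1_SAMPLE` dictionary and computes each class total and the number of augmented images per original image. The names `TABLE1_SAMPLE` and `summarize` are assumptions made for this example and are not part of the dataset's tooling.

```python
from typing import Dict, Tuple

# Five rows copied from Table 1: class name -> (original images, augmented images).
TABLE1_SAMPLE: Dict[str, Tuple[int, int]] = {
    "Apple brown spot": (265, 1060),
    "Apple black spot": (102, 469),
    "Bean rust": (124, 500),
    "Tomato bacterial spot": (400, 1305),
    "Walnut leaf gall mite": (98, 392),
}


def summarize(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Return the class total and the augmented-per-original ratio for each class."""
    summary = {}
    for name, (original, augmented) in counts.items():
        summary[name] = {
            "total": original + augmented,
            "aug_per_original": round(augmented / original, 2),
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(TABLE1_SAMPLE).items():
        print(f"{name}: total={stats['total']}, ratio={stats['aug_per_original']}")
```

In these sampled rows the ratio is about four augmented images per original for most classes, but it is lower for tomato bacterial spot and higher for apple black spot, so the augmentation factor is not uniform across the dataset.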
Fig. 1.
Sample images from the PlantCity dataset covering all 52 classes (41 diseased, 11 healthy) and 12 crop species, taken under a variety of field conditions, from hot (40–44 °C in the Charsadda plains) to cooler (25–30 °C in the Chitral mountains).
Table 2.
Comparison of PlantCity with state-of-the-art leaf disease datasets.
| Dataset | Species | Disease classes | Total classes | Environment | Images |
|---|---|---|---|---|---|
| PlantVillage [3] | 14 | 22 | 38 | Lab | 54,306 |
| PlantDoc [2] | 13 | 17 | 28 | Internet downloads | 2598 |
| Plant Pathology [4] | 12 | 10 | 20 | Real-Field Plains | 4503 |
| Plant_K [5] | 8 | 9 | 16 | Lab | 2157 |
| Indian Bay Leaf [6] | 1 | 2 | 3 | Lab | 5966 |
| PlantCity | 12 | 41 | 52 | Real-Field (Plains/Mountains) | 10,667 |
Note: The 54,306 images listed for PlantVillage are the original count; studies often apply augmentation (e.g., flips, rotations) during model training.
4. Experimental Design, Materials and Methods
This section discusses the general experimental design and dataset preparation approach. This procedure is depicted in Fig. 2.
Fig. 2.
Workflow of the methodology used to build the proposed dataset.
4.1. Objective
The dataset is intended to support the development of deep learning models that can effectively recognize and classify diseases in images of plant leaves, facilitating the early identification and treatment of plant issues.
4.2. Location
Data were collected in two agro-ecologically distinct regions of Khyber Pakhtunkhwa, Pakistan: Charsadda (Village Sarki, 34.15°N, 71.74°E, with a typical temperature of 40–44 °C during collection) and Chitral (Village Danin, 35.85°N, 71.79°E, with a typical temperature of 25–30 °C).
4.3. Data collection process
4.3.1. Camera device
The dataset was collected using a Canon EOS 1200D model with a high resolution of 5184×3456 at ISO settings of 1600/800, an Infinix Hot 10 model X682B with a resolution of 4608×3456, and an Infinix Hot 5 model X657B with a resolution of 1872×4160. Cameras were positioned at a distance of 30–50 cm from the leaves, at angles of 45–90°, under natural daylight (morning and afternoon, with care taken to avoid direct noon sunlight to minimize glare). These settings ensured consistent, high-quality images suitable for deep learning tasks.
4.3.2. Engagement with experts
Initial interactions with field workers and agricultural specialists verified the types and symptoms of plant leaf diseases as well as other crucial details.
4.3.3. On-Site collection
To capture images of tomato leaves exhibiting various diseases, we visited different tomato fields. To ensure a varied representation of the conditions and phases of disease progression, the collection was conducted over a period of 3 months in 2023 and 2024, encompassing 20–25 distinct cultivations in the district of Charsadda. The remaining dataset was collected over the 2 months of June and July 2023 and 2024 in District Chitral. Collections occurred in the morning (7–10 AM) and afternoon (2–5 PM) under typical environmental conditions: temperatures of 40–44 °C and humidity of 50–60 % in Charsadda, and 25–30 °C and 55–70 % in Chitral.
4.4. Data processing
Pre-processing enhances key features of image data, making it more suitable for analysis and increasing the overall accuracy and reliability of the dataset. The images were captured with various devices at different resolutions, all of which were too high for deep learning tasks. We therefore resized all images to either 1000×800 or 1200×800 pixels.
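The resizing step could be sketched as follows. The text does not state how one of the two target sizes was chosen for a given image, so this example assumes the target with the closest aspect ratio is used and that portrait images keep their orientation; both rules, along with the names `TARGET_SIZES`, `choose_target_size`, and `resize_image`, are assumptions made for illustration. The sketch uses the Pillow library, which the article does not mention.

```python
from pathlib import Path
from typing import Tuple

# Target sizes (width, height) stated in the text.
TARGET_SIZES = [(1000, 800), (1200, 800)]


def choose_target_size(width: int, height: int) -> Tuple[int, int]:
    """Pick the target whose aspect ratio is closest to the source image's.

    ASSUMPTION: the selection rule is not given in the article.
    """
    ratio = max(width, height) / min(width, height)
    return min(TARGET_SIZES, key=lambda size: abs(size[0] / size[1] - ratio))


def resize_image(src: Path, dst: Path) -> None:
    """Resize one image file and save it to dst (requires Pillow)."""
    from PIL import Image  # imported lazily so the helper above works without Pillow

    with Image.open(src) as img:
        width, height = img.size
        target_w, target_h = choose_target_size(width, height)
        if height > width:
            # ASSUMPTION: portrait images stay portrait after resizing.
            target_w, target_h = target_h, target_w
        img.resize((target_w, target_h), Image.LANCZOS).save(dst)


if __name__ == "__main__":
    # Native resolutions of the three devices listed in Section 4.3.1.
    for w, h in [(5184, 3456), (4608, 3456), (1872, 4160)]:
        print((w, h), "->", choose_target_size(w, h))
```

Under this assumed rule, the 3:2 Canon frames map to 1200×800 and the 4:3 Infinix Hot 10 frames map to 1000×800.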
4.5. Data augmentation
Multi-class plant leaf classification is important to the agriculture industry and its sustainability. To enrich the dataset, we used the imgaug library to apply several image augmentation methods. Images were flipped both horizontally and vertically to mimic changes in the positions of the target features. Additive Gaussian noise was added to simulate sensor noise and real imaging errors, making models more resilient to noisy inputs. Orientation variability was taken into account by applying random affine rotations between −45° and +45° (see Fig. 3).
Fig. 3.
Augmentation operations applied to the proposed dataset.
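A minimal sketch of the four augmentation operations with imgaug is given below. Only the rotation range (−45° to +45°) comes from the text. The noise scale, the choice to apply each flip with probability 1.0, and the idea of producing exactly one copy per operation are assumptions made for illustration, as are the names `build_augmenters`, `augment`, and `expected_augmented`.

```python
from typing import Any, Dict, Tuple

# Rotation range stated in the text.
ROTATION_RANGE_DEG: Tuple[int, int] = (-45, 45)
# ASSUMPTION: the noise scale is not given in the article.
ASSUMED_NOISE_SCALE: Tuple[float, float] = (0.0, 0.05 * 255)

# The four operations described in the text.
OPERATIONS = ("hflip", "vflip", "gaussian_noise", "rotate")


def build_augmenters() -> Dict[str, Any]:
    """Create one imgaug augmenter per operation (requires imgaug)."""
    import imgaug.augmenters as iaa  # imported lazily so this module loads without imgaug

    return {
        "hflip": iaa.Fliplr(1.0),  # always flip horizontally
        "vflip": iaa.Flipud(1.0),  # always flip vertically
        "gaussian_noise": iaa.AdditiveGaussianNoise(scale=ASSUMED_NOISE_SCALE),
        "rotate": iaa.Affine(rotate=ROTATION_RANGE_DEG),
    }


def augment(image: Any) -> Dict[str, Any]:
    """Return one augmented copy of an HxWxC uint8 array per operation."""
    return {name: aug(image=image) for name, aug in build_augmenters().items()}


def expected_augmented(n_original: int) -> int:
    """Augmented image count if every original yields one copy per operation."""
    return n_original * len(OPERATIONS)
```

Producing one copy per operation yields four augmented images per original. This matches many rows of Table 1 (for example, 265 originals and 1060 augmented images for apple brown spot) but not all of them, so it should be read as an inference rather than a description of the authors' exact procedure.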
Limitations
The PlantCity dataset is region-specific to Charsadda (34.15°N, 71.74°E, with a typical temperature of 40–44 °C) and Chitral (35.85°N, 71.79°E, with a typical temperature of 25–30 °C), so it may be less applicable to areas with different climates or disease prevalence. In Charsadda, high temperatures may have exacerbated disease symptoms that might not be observed in cooler climates, such as increased fungal development in maize gray leaf spot or more leaf withering in tomato early blight. Camera resolution differences (e.g., Canon EOS 1200D at 5184×3456 vs. Infinix Hot 5 at 1872×4160) could affect the consistency of feature detection and, consequently, model performance. To cover a wider range of temperatures and disease profiles, future research could standardize camera settings and extend the collection to other areas, such as Punjab.
Ethics Statement
Our study complies with Data in Brief's dataset ethics guidelines because it does not involve any human or animal participants. The relevant ethical requirements have therefore been followed.
Credit Author Statement
Muhammad Sheraz Khan: Conceptualization, Visualization, Methodology, Data Curation, Writing - original draft; Kainat Nisa: Methodology, Data Curation, Writing - original draft; Irshad Ahmad: Conceptualization, Supervision, Methodology, Writing - review & editing; Muhammad Zubair: Supervision, Writing - review & editing; Kaznah Alshammari: Writing - review & editing.
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, Kingdom of Saudi Arabia, for funding this research work through project number “NBU- FFR-2025–2467–04”.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Contributor Information
Muhammad Sheraz Khan, Email: msherazkhan8888527@gmail.com.
Irshad Ahmad, Email: irshad@icp.edu.pk.
Muhammad Zubair, Email: zubair@icp.edu.pk.
Kaznah Alshammari, Email: Khaznah.alshammari2@nbu.edu.sa.
Data Availability
References
- 1. Adeel A., et al. Entropy-controlled deep features selection framework for grape leaf diseases recognition. Expert Syst. May 2020. doi: 10.1111/exsy.12569.
- 2. Singh D., et al. PlantDoc: a dataset for visual plant disease detection. Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. 2020. pp. 249–253.
- 3. Mohanty S.P., Hughes D.P., Salathé M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016;7:1419. doi: 10.3389/fpls.2016.01419.
- 4. Chouhan S.S., et al. A data repository of leaf images: practice towards plant conservation with plant pathology. 2019 4th International Conference on Information Systems and Computer Networks (ISCON). IEEE; 2019. pp. 703–707.
- 5. Kour V.P., Arora S. Plantaek: a leaf database of native plants of Jammu and Kashmir. Recent Innovations in Computing: Proceedings of ICRIC 2021, Volume 1. Springer; 2022. pp. 359–368.
- 6. Paygude P., et al. A dataset revolutionizing Indian bay leaf analysis. Data Brief. 2024;57. doi: 10.1016/j.dib.2024.111024.