Data in Brief. 2025 Oct 6;63:112130. doi: 10.1016/j.dib.2025.112130

PlantCity: A comprehensive image based on multi crop leaves in Pakistan

Muhammad Sheraz Khan a, Kainat Nisa a, Irshad Ahmad a, Muhammad Zubair a, Kaznah Alshammari b
PMCID: PMC12550188  PMID: 41140863

Abstract

The PlantCity dataset addresses significant agricultural yield losses in Pakistan from plant diseases. It provides 10,667 high-resolution images of leaves from 12 key crops: apple, apricot, bean, cherry, maize, fig, grape, loquat, pear, tomato, walnut, and persimmon. The images are organized into 52 classes (41 diseased and 11 healthy) and augmented to a total of 52,273 images. Data was collected in real-field conditions in Charsadda (34.15°N, 71.74°E, typical temperature 40–44 °C) and Chitral (35.85°N, 71.79°E, typical temperature 25–30 °C) from April to July 2023–2024. The dataset enables the development of deep learning models for automated disease classification and captures a range of environmental factors, including high temperatures that can exacerbate disease symptoms. It utilizes smartphone-based computer vision to facilitate early disease identification, thereby supporting precision farming and sustainable agriculture in Pakistan.

Keywords: Plant leaf diseases, Multi-crop dataset, Precision agriculture, Pakistan


Specifications Table

Subject: Computer Sciences
Specific subject area: Image Classification, Multi Class Leaves Disease Classification, Deep Learning, and Computer Vision.
Type of data: Image.
Data collection: Field surveys conducted from April to June in 2023 and 2024 for the collection of tomato leaf disease, and from June to July in 2023 and 2024 for the collection of the remaining 11 species, supervised by experts, ensured comprehensive image acquisition under various environmental conditions. The dataset comprises a collection of 10,667 images depicting various stages of different leaf diseases. These images are categorized into 52 classes. Each class represents a distinct expression of diseases, pests, or environmental stress in plant leaves.
Data source location: Charsadda (Village Sarki) chosen for tomato and Chitral (Village Danin) for the other 11 crops, Khyber Pakhtunkhwa, Pakistan
Data accessibility: Repository name: Mendeley Data
Data identification number: 10.17632/w8kh2xkspx.2
Direct URL to data: https://data.mendeley.com/datasets/w8kh2xkspx/1
Related research article: None

1. Value of the Data

  • The PlantCity dataset is comprehensive, consisting of 10,667 high-resolution images across 52 classes (41 diseased, 11 healthy) from 12 crop species. Images were collected in Charsadda (tomato disease symptoms) and in Danin, Chitral, which was selected for its high altitude, cooler conditions, and agro-climatic suitability for fruit crops such as walnut, apple, and grape. Most current datasets (such as PlantVillage and PlantDoc) are constrained in scope and are usually gathered under controlled laboratory conditions, so they lack real-world diversity and do not represent the difficulties of real-field environments. Although such datasets are helpful for research, models trained on them may generalize poorly to disease classification in practice. PlantCity also includes various diseases that are not available in publicly accessible datasets.

  • The availability of high-quality, labeled images of multiple plant leaves afflicted by different diseases makes it possible to create and train complex algorithms for automated disease identification and classification. Early disease detection with high accuracy and low computation time is nearly impossible with conventional methods. It is therefore critical to develop disease diagnosis techniques that can complete the work automatically, precisely, quickly, and affordably [1].

  • For agricultural scientists, legislators, and researchers in computer vision and machine learning, a comprehensive multi-class plant leaf disease dataset is an indispensable resource. It aids precision agriculture, supports the development of intelligent diagnostic systems, and informs data-driven decisions for crop protection and environmentally friendly agricultural methods. The dataset comprises plants commonly found in local agricultural areas, especially in Pakistan, and serves as a foundation for advancing research and developing innovative solutions to pressing agricultural challenges.

2. Background

Over one-third of Pakistan's population relies on the agricultural industry for their livelihoods, making it a crucial pillar of the country's economy. For both domestic food security and export markets, crops such as apples, apricots, beans, cherries, maize, figs, grapes, loquats, pears, tomatoes, walnuts, and persimmons are essential. However, several pests and leaf diseases pose a constant threat to these crops, resulting in significant yield losses and financial difficulties for producers. These difficulties are exacerbated by factors such as inefficient disease management approaches and insufficient available datasets for identifying leaf diseases in regions like Khyber Pakhtunkhwa.

Existing datasets, such as PlantDoc [2], PlantVillage [3], Plant Pathology [4], and Plant_K [5], are among the most popular datasets for multi-crop leaf disease classification. However, they primarily consist of images from laboratory environments, and some are based on binary classifications of diseased plants. Additionally, these datasets often cover fewer crop species or disease classes (e.g., PlantVillage: 14 species, 38 classes; PlantDoc: 13 species, 28 classes), limiting their applicability to diverse agricultural contexts. On the other hand, single-crop datasets, such as the Indian Bay Leaf dataset [6], address a more limited range of agricultural issues by focusing on a single species (Cinnamomum tamala, comprising 5966 pictures and 3 classes: healthy, dry, and diseased). The PlantCity dataset addresses these limitations by providing 10,667 high-resolution images across 12 economically vital crops and 52 classes (41 diseased and 11 healthy), augmented to 52,273 images. The images were taken at various times of the day, representing fluctuating environmental and lighting conditions. By enabling disease detection under real-field conditions, the dataset is intended to support sustainable agricultural methods and more resilient farming systems.

3. Data Description

The PlantCity dataset, collected in Pakistan, comprises 10,667 high-resolution images of leaves from 12 economically significant crop species: persimmon, apple, apricot, bean, cherry, maize, fig, grape, loquat, pear, tomato, and walnut. The dataset has been augmented to 52,273 images to enhance model robustness and is organized into 52 classes, comprising 41 disease classes and 11 healthy classes. The image collection took place under real-field conditions in Charsadda (34.15°N, 71.74°E; typical temperature: 40–44 °C) and Chitral (35.85°N, 71.79°E; typical temperature: 25–30 °C), Khyber Pakhtunkhwa, Pakistan, from April to July in 2023 and 2024. Charsadda serves as the primary region for tomato cultivation, owing to its fertile plains and well-developed irrigation system. However, the area also faces challenges from various tomato leaf diseases. The remaining 11 crops were selected from Chitral due to its established reputation as a center for fruit cultivation and its diverse agro-climatic conditions that facilitate optimal crop growth.

All 52 classes are listed in Table 1, along with the number of original and augmented images in each class. Fig. 1 shows a sample image from each class, illustrating the variability of healthy and diseased leaves across 12 crops under different climatic conditions (40–44 °C in Charsadda and 25–30 °C in the mountainous Chitral region). Table 2 highlights PlantCity's advantages over state-of-the-art datasets such as PlantVillage [3], PlantDoc [2], Plant Pathology [4], Plant_K [5], and the Indian Bay Leaf dataset [6], including its larger class count (52 classes), real-field image collection under diverse climatic conditions, and comprehensive coverage of crops and diseases. PlantCity is particularly distinct because it includes an extensive number of crops and rare disease classes that are typically absent or under-represented in other datasets, such as walnut leaf gall mite (not found in PlantVillage walnut classes), cherry purple leaf spot (rare in existing datasets), fig blight (specific to real-field settings), and maize holcus leaf spot (uncommon in lab-based collections). Unlike some datasets that are limited to binary classification (healthy or diseased only), PlantCity captures a broader range of disease variability and environmental contexts. Disease classes that were missing from earlier datasets are identified in Table 1.

Table 1.

PlantCity Dataset Classes and Image Counts.

Class | Exists in literature | Original images | Augmented images | Total
Apple brown spot | No | 265 | 1060 | 1325
Apple normal | Yes | 206 | 824 | 1030
Apple black spot | No | 102 | 469 | 571
Apricot normal | No | 163 | 652 | 815
Apricot blight leaf disease | No | 90 | 312 | 402
Apricot shot hole | No | 208 | 832 | 1040
Bean fungal leaf disease | No | 194 | 776 | 970
Bean normal leaf | No | 223 | 900 | 1123
Bean rust | No | 124 | 500 | 624
Bean shot hole | No | 98 | 392 | 490
Cherry leaf scorch | No | 282 | 1101 | 1383
Cherry normal leaf | Yes | 125 | 500 | 625
Cherry brown spot | No | 242 | 968 | 1210
Cherry purple leaf spot | Yes | 240 | 987 | 1227
Cherry shot hole disease | No | 110 | 440 | 550
Maize fungal leaf | No | 83 | 332 | 415
Maize normal leaf | Yes | 142 | 568 | 710
Maize gray leaf spot | Yes | 132 | 528 | 660
Maize holcus leaf spot | No | 108 | 432 | 540
Fig blight leaf disease | No | 191 | 763 | 954
Fig brown spot | No | 107 | 428 | 535
Fig normal leaf | No | 150 | 600 | 750
Fig rust leaf | No | 125 | 500 | 625
Grape anthracnose leaf | No | 249 | 996 | 1245
Grape brown spot leaf | No | 160 | 640 | 800
Grape downy mildew leaf | No | 155 | 624 | 779
Grape mites leaf disease | No | 125 | 500 | 625
Grape normal leaf | Yes | 372 | 1485 | 1857
Grape powdery mildew leaf | No | 301 | 1208 | 1509
Grape shot hole leaf disease | No | 213 | 852 | 1065
Loquat normal leaf | No | 129 | 501 | 630
Pear black spot leaf disease | No | 226 | 904 | 1130
Pear normal leaf | No | 191 | 764 | 955
Pear fire blight | No | 95 | 380 | 475
Walnut anthracnose leaf disease | No | 162 | 648 | 810
Walnut blotch leaf disease | No | 339 | 1354 | 1693
Walnut normal leaf | No | 178 | 712 | 890
Walnut shot hole | No | 231 | 916 | 1147
Walnut leaf gall mite | No | 98 | 392 | 490
Loquat leaf spot | No | 198 | 792 | 990
Persimmon brown spot | No | 166 | 663 | 829
Tomato spider mites | Yes | 127 | 508 | 635
Tomato verticillium wilt | No | 105 | 420 | 525
Tomato bacterial spot | Yes | 400 | 1305 | 1705
Tomato early blight | Yes | 400 | 1525 | 1925
Tomato healthy leaf | Yes | 400 | 1478 | 1878
Tomato late blight | Yes | 294 | 1112 | 1406
Tomato leaf curl | No | 413 | 1480 | 1893
Tomato leaf miner | No | 400 | 1497 | 1897
Tomato leaf mold | No | 400 | 1510 | 1910
Tomato septoria leaf | Yes | 341 | 1258 | 1599
Tomato fusarium wilt | No | 89 | 318 | 407
Total samples | | 10,667 | 41,606 | 52,273
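
The arithmetic behind Table 1 can be checked with a short sketch. The Python snippet below transcribes a handful of rows (not the full table), recomputes each Total as original plus augmented, and reports the augmented-to-original ratio. The names `ROWS` and `summarize` are illustrative and do not come from the dataset or its repository.

```python
# A few rows transcribed from Table 1: (class name, original images, augmented images).
# This is only a subset chosen for illustration, not the full 52-class table.
ROWS = [
    ("Apple brown spot", 265, 1060),
    ("Apricot blight leaf disease", 90, 312),
    ("Maize fungal leaf", 83, 332),
    ("Grape normal leaf", 372, 1485),
    ("Tomato leaf curl", 413, 1480),
]


def summarize(rows):
    """Return {class name: (total images, augmented/original ratio)}."""
    summary = {}
    for name, original, augmented in rows:
        total = original + augmented          # the "Total" column
        ratio = round(augmented / original, 2)  # augmented images per original
        summary[name] = (total, ratio)
    return summary


if __name__ == "__main__":
    for name, (total, ratio) in summarize(ROWS).items():
        print(f"{name}: total={total}, augmented/original={ratio}")
```

For the rows shown, the ratio falls between roughly 3.5 and 4.0. Over the whole table, 41,606 / 10,667 ≈ 3.9 augmented images per original image.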

Fig. 1. Sample images from the PlantCity dataset covering all 52 classes (41 diseased, 11 healthy) and 12 crop species, taken under varied field conditions: hot (40–44 °C, Charsadda plains) and cooler (25–30 °C, Chitral mountains).

Table 2.

Comparison table with state of the art in multi crop leaves dataset.

Dataset | Species | Disease | Class | Environment | Images
PlantVillage [3] | 14 | 22 | 38 | Lab | 54,306
PlantDoc [2] | 13 | 17 | 28 | Internet downloads | 2598
Plant Pathology [4] | 12 | 10 | 20 | Real-field (plains) | 4503
Plant_K [5] | 8 | 9 | 16 | Lab | 2157
Indian Bay Leaf [6] | 1 | 2 | 3 | Lab | 5966
PlantCity | 12 | 41 | 52 | Real-field (plains/mountains) | 10,667

Note: The 54,306 images listed for PlantVillage are the original count; studies often apply augmentation (e.g., flips, rotations) during model training.

4. Experimental Design, Materials and Methods

This section discusses the general experimental design and dataset preparation approach. This procedure is depicted in Fig. 2.

Fig. 2. Workflow of the methodology used to build the proposed dataset.

4.1. Objective

The dataset is intended to support the development of deep learning models that can effectively recognize and classify diseases in images of plant leaves, facilitating the early identification and treatment of plant issues.

4.2. Location

Data were collected in two agro-ecologically distinct regions of Khyber Pakhtunkhwa, Pakistan: Charsadda (Village Sarki, 34.15°N, 71.74°E, with a typical temperature of 40–44 °C during collection) and Chitral (Village Danin, 35.85°N, 71.79°E, with temperatures ranging from 25 to 30 °C).

4.3. Data collection process

4.3.1. Camera device

The dataset was collected using a Canon EOS 1200D model with a high resolution of 5184×3456 at ISO settings of 1600/800, an Infinix Hot 10 model X682B with a resolution of 4608×3456, and an Infinix Hot 5 model X657B with a resolution of 1872×4160. Cameras were positioned at a distance of 30–50 cm from the leaves, at angles of 45–90°, under natural daylight (morning and afternoon, with care taken to avoid direct noon sunlight to minimize glare). These settings ensured consistent, high-quality images suitable for deep learning tasks.

4.3.2. Engagement with experts

Initial interactions with field workers and agricultural specialists verified the types and symptoms of plant leaf diseases as well as other crucial details.

4.3.3. On-Site collection

To capture images of tomato leaves exhibiting various diseases, we visited different tomato fields. To ensure a varied representation of the conditions and phases of disease progression, the collection was conducted over a period of 3 months in 2023 and 2024, encompassing 20–25 distinct cultivations in the district of Charsadda. The remaining dataset was collected over the 2 months of June and July 2023 and 2024 in District Chitral. Collections occurred in the morning (7–10 AM) and afternoon (2–5 PM) under typical environmental conditions: temperatures of 40–44 °C and humidity of 50–60 % in Charsadda, and 25–30 °C and 55–70 % in Chitral.

4.4. Data processing

Pre-processing enhances key features of image data, making it more suitable for analysis and increasing the overall accuracy and reliability of the dataset. Because the images were captured with different devices at different resolutions, all of them higher than needed for the deep learning task, we resized every image to either 1000×800 or 1200×800 pixels.
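
The resizing step could look roughly like the Python sketch below. The article does not state how each image was assigned to one of the two output sizes, so the rule used here (pick the target whose aspect ratio is closest to the source) is an assumption, as are the helper names `pick_target_size` and `resize_image`. The actual file resizing uses Pillow, which is not mentioned in the article.

```python
from pathlib import Path

# The two output sizes reported in Section 4.4, as (width, height) in pixels.
TARGET_SIZES = [(1000, 800), (1200, 800)]


def pick_target_size(width, height, targets=TARGET_SIZES):
    """Pick the target whose aspect ratio is closest to the source image's.

    ASSUMPTION: this selection rule is illustrative only; the article does not
    describe how images were mapped to 1000x800 or 1200x800.
    """
    source_ratio = width / height
    return min(targets, key=lambda wh: abs(wh[0] / wh[1] - source_ratio))


def resize_image(src_path, dst_path):
    """Resize one image file to the chosen target size (requires Pillow)."""
    from PIL import Image  # imported lazily so pick_target_size has no dependencies

    with Image.open(src_path) as img:
        target = pick_target_size(*img.size)
        # LANCZOS is a common high-quality filter for downscaling photographs.
        resized = img.resize(target, Image.LANCZOS)
        Path(dst_path).parent.mkdir(parents=True, exist_ok=True)
        resized.save(dst_path)
    return target


if __name__ == "__main__":
    # Native resolutions of the three devices listed in Section 4.3.1.
    for w, h in [(5184, 3456), (4608, 3456), (1872, 4160)]:
        print((w, h), "->", pick_target_size(w, h))
```

Note that a plain resize does not preserve aspect ratio, so portrait images such as 1872×4160 would be visibly stretched under this rule; cropping or padding first is a possible alternative.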

4.5. Data augmentation

Multiclass plant leaf disease classification is essential to the agriculture industry and its sustainability. To enrich the dataset, we applied several image augmentation methods using the imgaug library. Specifically, we flipped images both horizontally and vertically to mimic changes in the position of target features. Additive Gaussian noise was applied to simulate sensor noise and real-world image imperfections, making models more resilient to noisy inputs. Orientation variability was taken into account by applying random affine rotations between −45° and +45° (see Fig. 3).

Fig. 3. Augmentations applied to the proposed dataset.
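
The four augmentations described in Section 4.5 can be illustrated with a minimal sketch. The authors used the imgaug library; the version below is a plain NumPy re-implementation written only for illustration, not their pipeline. The noise standard deviation (`noise_std=10.0`), the nearest-neighbour rotation with zero fill, and the function names are all assumptions.

```python
import numpy as np


def rotate_nearest(image, degrees):
    """Rotate an image about its centre (nearest-neighbour sampling, zero fill)."""
    h, w = image.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # Inverse mapping: for every output pixel, locate the source pixel it samples.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    # Pixels that map outside the source image stay zero (black).
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def augment(image, rng, noise_std=10.0, max_rotation=45.0):
    """Return four augmented copies of a uint8 image.

    ASSUMPTION: noise_std and one-copy-per-operation are illustrative choices.
    """
    noise = rng.normal(0.0, noise_std, size=image.shape)
    noisy = np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    angle = rng.uniform(-max_rotation, max_rotation)  # random angle in [-45, +45)
    return {
        "flip_h": image[:, ::-1],   # horizontal flip (mirror left-right)
        "flip_v": image[::-1, :],   # vertical flip (mirror top-bottom)
        "noisy": noisy,             # additive Gaussian noise
        "rotated": rotate_nearest(image, angle),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    leaf = rng.integers(0, 256, size=(80, 100, 3), dtype=np.uint8)  # stand-in image
    for name, aug in augment(leaf, rng).items():
        print(name, aug.shape, aug.dtype)
```

Producing one output per operation yields four augmented images per original, which is in line with the roughly fourfold counts in Table 1, although the article does not state the exact number of copies generated per image.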

Limitations

The PlantCity dataset is region-specific to Charsadda (34.15°N, 71.74°E, with a typical temperature of 40–44 °C) and Chitral (35.85°N, 71.79°E, with a typical temperature of 25–30 °C), so it may be less applicable to areas with different climates or disease prevalence. In Charsadda, high temperatures may have exacerbated disease symptoms, such as increased fungal development in maize gray leaf spot or more leaf withering in tomato early blight, that might not be observed in colder climates. Camera resolution differences (e.g., Canon EOS 1200D at 5184×3456 vs. Infinix Hot 5 at 1872×4160) could influence the consistency of feature recognition and, consequently, model performance. To cover a wider range of temperatures and disease profiles, future research could standardize camera settings and extend the collection to other areas, such as Punjab.

Ethics Statement

Our study complies with Data in Brief's dataset ethics guidelines because it does not involve any human or animal participants.

Credit Author Statement

Muhammad Sheraz Khan: Conceptualization, Visualization, Methodology, Data Curation, Writing - original draft; Kainat Nisa: Methodology, Data Curation, Writing - original draft; Irshad Ahmad: Conceptualization, Supervision, Methodology, Writing - review & editing; Muhammad Zubair: Supervision, Writing - review & editing; Kaznah Alshammari: Writing - review & editing.

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, Kingdom of Saudi Arabia, for funding this research work through project number “NBU- FFR-2025–2467–04”.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Muhammad Sheraz Khan, Email: msherazkhan8888527@gmail.com.

Irshad Ahmad, Email: irshad@icp.edu.pk.

Muhammad Zubair, Email: zubair@icp.edu.pk.

Kaznah Alshammari, Email: Khaznah.alshammari2@nbu.edu.sa.

Data Availability

References

  • 1. Adeel A., et al. Entropy-controlled deep features selection framework for grape leaf diseases recognition. Expert. Syst. 2020 May. doi: 10.1111/exsy.12569.
  • 2. Singh D., et al. PlantDoc: a dataset for visual plant disease detection. Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. 2020; pp. 249–253.
  • 3. Mohanty S.P., Hughes D.P., Salathé M. Using deep learning for image-based plant disease detection. Front. Plant. Sci. 2016;7:1419. doi: 10.3389/fpls.2016.01419.
  • 4. Chouhan S.S., et al. A data repository of leaf images: practice towards plant conservation with plant pathology. 2019 4th International Conference on Information Systems and Computer Networks (ISCON). IEEE; 2019; pp. 703–707.
  • 5. Kour V.P., Arora S. Plantaek: a leaf database of native plants of Jammu and Kashmir. Recent Innovations in Computing: Proceedings of ICRIC 2021, Volume 1. Springer; 2022; pp. 359–368.
  • 6. Paygude P., et al. A dataset revolutionizing Indian bay leaf analysis. Data Brief. 2024;57. doi: 10.1016/j.dib.2024.111024.


Articles from Data in Brief are provided here courtesy of Elsevier
