Abstract
Computer vision has emerged as a critical enabler of sustainable production in protected agriculture by offering efficient and non-invasive crop disease diagnosis. The development of accurate disease recognition models relies heavily on the availability of high-quality image datasets. This study introduces a tomato disease image dataset collected in 2024 from greenhouse facilities within a modern agricultural park in Sichuan Province, China. The dataset comprises 1026 high-resolution images, including 417 images of viral disease, 82 images of gray mold, and 527 images of bacterial wilt, totaling approximately 2.78 GB. Captured under real-world greenhouse conditions and from multiple angles and distances, the images effectively capture multi-scale phenotypic disease features. Manual annotation was conducted using the LabelImg tool under the guidance of plant pathology experts, with labeled regions covering leaves, fruits, and stems. Annotation files are stored in XML format, each corresponding to a specific image. This dataset is well-suited for research in disease classification, object detection, and phenotyping, and supports deep learning model training and cross-crop transfer learning applications.
Keywords: Tomato disease, Greenhouse, Object detection, Image classification, Computer vision
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | Computer Vision, Deep learning, Plant Pathology, Agriculture |
| Type of data | Raw: jpg, Annotation: XML |
| Data collection | A total of 1026 tomato disease images were collected in 2024 from greenhouse facilities located in a modern agricultural park in Sichuan Province, China, using an iPhone 14 smartphone as the primary imaging device. To construct a multi-scale dataset suitable for complex environmental conditions, images of the same diseased plant were captured from varying angles and distances. Based on disease categories, the images were organized into separate folders. Manual annotation of distinct pathological features across the three disease types was conducted using the LabelImg tool, with all annotation files saved in XML format. The resulting dataset is compatible with mainstream deep learning frameworks and provides robust support for tomato disease classification and object detection tasks. |
| Data source location | Data collection was conducted at the Modern Agricultural Science and Technology Innovation Demonstration Park of the Sichuan Academy of Agricultural Sciences (30.7797° N, 104.2082° E), located in Sichuan Province, China. |
| Data accessibility | Repository name: Mendeley Data. Data identification number (DOI): 10.17632/c2x8rynybg.1. Direct URL to data: https://data.mendeley.com/datasets/c2x8rynybg/1 |
| Related research article | None. |
1. Value of the Data
The dataset contains 1026 annotated images of tomato plants exhibiting three major disease types, collected in 2024 from greenhouse environments in Sichuan’s Modern Agricultural Demonstration Park. Plant pathology specialists manually labeled all samples. Its technical strengths are reflected in four areas:
- This dataset provides high-resolution images categorized into three common tomato diseases, enabling researchers to train and evaluate deep learning models such as Transformers, ResNet, and VGG with robust and well-organized samples.
- All images are manually annotated by plant pathology experts, covering multiple organs including leaves, fruits, and stems. These expert annotations support tasks such as object detection, lesion segmentation, and detailed phenotypic analysis.
- The images were captured in real-world greenhouse environments under natural lighting conditions, with variations in angle, occlusion, and background complexity. This makes the dataset highly applicable for developing robust, real-time disease monitoring systems.
- In addition to image content, each sample includes acquisition time and geolocation metadata (30.7797° N, 104.2082° E), allowing researchers to link disease occurrence with crop growth stages and local weather data, providing valuable insights for epidemiological modeling and disease forecasting.
2. Background
Tomato is a major economic crop in China. According to data from the FAO Statistical Database (FAOSTAT), China’s tomato cultivation area reached 1.1115 million hectares in 2020, with a production volume of 64.87 million tons, accounting for 22.00 % and 34.72 % of the global total, respectively, making China the world’s largest tomato producer [1]. As global agricultural production transitions toward intelligent greenhouse systems, tomato cultivation in protected environments has become a crucial strategy for mitigating climate change and ensuring food security. However, disease threats have intensified. For example, tomato yellow leaf curl virus (TYLCV) can cause yield losses ranging from 40 % to 90 %, and soil-borne diseases such as bacterial wilt may result in complete crop failure in continuously cropped greenhouses, leading to global economic losses exceeding USD 5 billion [2]. Traditional disease diagnosis relies heavily on manual field surveys, which are labor-intensive, subjective, and limited in scope—insufficient for large-scale operations. Computer vision technologies, powered by high-resolution imaging and deep learning algorithms, have overcome the spatial and temporal limitations of conventional approaches. With improved models such as YOLO and Transformer-based architectures, leaf disease classification accuracy for tomatoes has exceeded 90 % [3]. These techniques provide efficient and non-destructive diagnostic tools, significantly reducing the risk of pesticide overuse, and have become essential for ensuring the sustainable development of protected agriculture [4].
In recent years, several tomato disease image datasets have been released. For instance, Oni and Prama introduced the Comprehensive Tomato Leaf Dataset, which was collected from field environments in Bangladesh; however, the dataset is limited to only two categories, namely healthy leaves and diseased leaves, without providing more detailed disease-specific labels [5]. Similarly, Imtiaz et al. published the Tomato Leaf Dataset, comprising six classes of diseased and healthy leaf images captured under natural conditions, yet the dataset remains restricted to the leaf organ [6]. Furthermore, the Tomato-Village dataset attempted to establish an end-to-end framework for disease detection; however, the dataset was not collected under natural conditions and did not systematically represent tomato disease phenotypes across multiple organs and scales [7]. By contrast, our dataset includes three major tomato diseases collected under authentic greenhouse production conditions. In addition to categorical labeling of diseases, it provides precise annotations of the specific affected organs—including leaves, fruits, and stems. This design offers a robust foundation for developing tomato disease classification and object detection models, while also facilitating fine-grained phenotypic identification and analysis in image-based plant research.
3. Data Description
The dataset is organized into two main components: (1) a collection of high-resolution images capturing three common tomato diseases in greenhouse environments, and (2) a corresponding set of annotation files that delineate the diseased regions in each image. The image collection comprises 1026 JPG-format samples (approximately 2.78 GB) affected by three diseases: 417 images of tomato viral disease, 82 images of gray mold, and 527 images of bacterial wilt. All images were captured using an iPhone 14 smartphone between May and November 2024 in a tomato trial greenhouse located at the Modern Agriculture Innovation Park in Sichuan Province, China. Each image has a resolution of 3024 × 4023 pixels. The number of plant specimens corresponding to each disease category is summarized in Table 1.
Table 1.
Statistics on the number of images and plants.
| Disease type | No. of images | No. of plant specimens |
|---|---|---|
| Tomato Viral | 417 | ∼80 plants |
| Gray Mold | 82 | ∼15 plants |
| Bacterial Wilt | 527 | ∼80 plants |
| Total | 1026 | ∼175 plants |
To streamline data access and downstream analysis, each image file is categorized by disease type and follows a standardized naming convention: DiseaseType_CollectionDate_SerialNumber (e.g., Viral_0617_001.jpg for the first viral disease image collected on June 17). A description of each disease category is provided in Table 2.
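As an illustration of how the file naming convention described above might be consumed in a data-loading script, the following minimal sketch splits a file name into its disease type, collection date, and serial number. The function name `parse_image_name`, the `ImageName` tuple, and the assumption that the date is always four digits (MMDD) are ours and are not part of the dataset.

```python
import re
from typing import NamedTuple

# Pattern for DiseaseType_CollectionDate_SerialNumber, e.g. Viral_0617_001.jpg.
# Assumptions: the date is four digits (MMDD), the serial number is digits only,
# and the extension is .jpg (image) or .xml (annotation).
FILENAME_RE = re.compile(
    r"^(?P<disease>[A-Za-z]+)_(?P<date>\d{4})_(?P<serial>\d+)\.(?:jpg|xml)$",
    re.IGNORECASE,
)


class ImageName(NamedTuple):
    disease: str
    month: int
    day: int
    serial: int


def parse_image_name(filename: str) -> ImageName:
    """Split a dataset file name into disease type, date, and serial number."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    date = match.group("date")
    return ImageName(
        disease=match.group("disease"),
        month=int(date[:2]),
        day=int(date[2:]),
        serial=int(match.group("serial")),
    )


info = parse_image_name("Viral_0617_001.jpg")
print(info.disease, info.month, info.day, info.serial)
```

Because image and annotation files share the same stem, the same helper can be applied to the XML file names to pair each annotation with its image.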
Table 2.
Overview of diseases by tomato class.
| Name of class | Description | Visualization |
|---|---|---|
| Tomato Viral | In this dataset, tomato viral diseases are primarily caused by Tomato yellow leaf curl virus (TYLCV) and Tomato chlorosis virus (ToCV). Infected plants often exhibit a range of characteristic symptoms, including leaf curling, chlorosis, brittleness, mosaic patterns, striping, fern-like leaf morphology, and stunted growth [8,9]. | ![]() |
| Gray Mold | Tomato gray mold, caused by Botrytis cinerea, mainly occurs during the flowering and fruiting stages. It can infect flowers, fruits, leaves, and stems. Typical symptoms include water-soaked lesions, V-shaped leaf lesions, and grayish-brown mold growth. The pathogen enters the plant through wounds or directly invades healthy tissues, with disease severity increasing under high humidity conditions [10,11]. | ![]() |
| Bacterial Wilt | Tomato bacterial wilt, caused by Ralstonia solanacearum, is a soil-borne disease that tends to break out under high-temperature and high-humidity conditions. Typical symptoms include sudden wilting of the plant, downward drooping of leaves with green discoloration, vascular browning in the stem, milky bacterial ooze from cross-sections, and necrosis of the whole plant at a late stage [12,13]. | ![]() |
This dataset includes region-level annotations for 1026 tomato disease images, manually labeled by plant pathology experts. The annotated areas span key plant organs such as leaves, fruits, and stems. To ensure maximum compatibility with standard computer vision frameworks, all annotations were carried out using the LabelImg tool in Pascal VOC format and stored as XML files. File names strictly correspond to their image counterparts (e.g., Viral_0617_001.xml for an image captured on June 17 showing viral symptoms). The dataset directory structure is detailed in Fig. 1.
Fig. 1.
Dataset directory structure.
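To show how the Pascal VOC XML annotations described above could be read, the sketch below extracts the label and bounding box of every annotated object using only the Python standard library. It assumes the usual LabelImg/Pascal VOC layout (`object/name` and `object/bndbox/xmin…ymax`); the function name `read_voc_boxes` is ours, and the sample XML, including its box coordinates, is invented for illustration and is not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (label, xmin, ymin, xmax, ymax) in pixel coordinates
Box = Tuple[str, int, int, int, int]


def read_voc_boxes(xml_text: str) -> List[Box]:
    """Return all labelled bounding boxes found in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        label = (obj.findtext("name") or "").strip()
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no box
        # A missing coordinate tag raises TypeError here; a production
        # loader would want to handle that case explicitly.
        xmin, ymin, xmax, ymax = (
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, xmin, ymin, xmax, ymax))
    return boxes


# Hypothetical annotation content; coordinates are made up for this example.
SAMPLE_XML = """<annotation>
  <filename>Viral_0617_001.jpg</filename>
  <object>
    <name>Viral_Top</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>900</xmax><ymax>1200</ymax></bndbox>
  </object>
</annotation>"""

for box in read_voc_boxes(SAMPLE_XML):
    print(box)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)` with the rest of the function unchanged.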
4. Experimental Design, Materials and Methods
4.1. Materials and methods
The images in this dataset were collected from May to November 2024 inside a tomato greenhouse at the Modern Agriculture Demonstration Park, Sichuan Province, China (30.7797° N, 104.2082° E). Target subjects included tomato plants at different developmental stages, with particular focus on organs prone to disease, namely leaves, stems, and fruits. All images were captured under natural lighting conditions between 9:00 a.m. and 4:00 p.m., using an iPhone 14 smartphone at a fixed resolution of 3024 × 4023 pixels in JPG format. To ensure coverage of multi-scale disease phenotypes, images were taken from various distances and angles (approximately 0.2–2.0 m), including organ-level close-ups, mid-range part-level views, and full-plant perspectives. No artificial lighting or scene arrangement was applied, thereby preserving the realistic environmental conditions typical of greenhouse production. Each image retains its original timestamp and geolocation metadata. A manual quality control step was performed post-acquisition to remove low-quality images (e.g., blurred or overexposed), ensuring the dataset’s consistency and practical value for downstream applications.
4.2. Environmental and climatic conditions
Tomato disease incidence is strongly influenced by climatic conditions. Weather data were continuously recorded from May to November 2024 using a mini meteorological station installed in the greenhouse. Disease development could be categorized into three phases:
- May–June: 16–30 °C, 76–86 % relative humidity. Gray mold was dominant, with some viral symptoms also observed.
- July–August: 24–35 °C, 65–96 % relative humidity. Viral diseases remained common, particularly affecting leaves and fruits.
- September–November: 10–33 °C, 46–94 % relative humidity. High soil moisture from autumn rainfall promoted bacterial wilt outbreaks, resulting in widespread plant wilting.
This integration of climatic data with disease prevalence provides an ecological backdrop that enhances the representativeness and utility of the dataset.
4.3. Data annotation
To establish a high-quality object detection dataset for tomato disease research, each image in this study underwent detailed manual annotation of symptomatic regions. The annotations were performed using the open-source tool LabelImg and subsequently reviewed by plant pathology specialists to ensure both precision and consistency. Given the multi-scale nature of the imagery, we applied distinct annotation protocols depending on image perspective: for close-range images, bounding boxes were delineated at the organ level (e.g., affected leaf, fruit, or stem), whereas for medium- and long-range images, symptomatic regions were categorized and annotated according to the plant’s vertical structure—specifically, as Top, Middle, or Base zones (see Fig. 2).
Fig. 2.
Annotation methods for different shooting distances. (a) Leaves with viral disease captured at close range. (b) Fruit with gray mold captured at close range. (c) Stem with bacterial wilt captured at close range. (d) Top zone of a plant with bacterial wilt captured from a distance.
This annotation strategy effectively integrates fine-grained symptom localization with whole-plant contextual information, making the dataset suitable for training multi-scale detection models. Annotations were stored in Pascal VOC-compliant XML files, with label names following a standardized “DiseaseType_Region” format (e.g., GrayMold_Leaf, Viral_Top), facilitating seamless data parsing, training pipeline integration, and cross-condition comparisons. A complete naming convention is provided in Table 3.
Table 3.
Complete naming convention.
| Region | Tomato Viral Disease | Gray Mold | Bacterial Wilt |
|---|---|---|---|
| Leaf | Viral_leaf | GrayMold_Leaf | Wilt_Leaf |
| Fruit | Viral_Fruit | GrayMold_Fruit | Wilt_Fruit |
| Stem | Viral_Stem | GrayMold_Stem | Wilt_Stem |
| Top | Viral_Top | GrayMold_Top | Wilt_Top |
| Middle | Viral_Middle | GrayMold_Middle | Wilt_Middle |
| Base | Viral_Base | GrayMold_Base | Wilt_Base |
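As a sketch of how the “DiseaseType_Region” labels in Table 3 might be handled in a training pipeline, the code below splits a label into its two parts and builds an integer index over all label combinations. The names `split_label` and `CLASS_INDEX` are ours. The matching is deliberately case-insensitive so that capitalization variants (such as `Viral_leaf`) are tolerated, and whether all 18 combinations actually occur in the annotation files is an assumption that should be checked against the data.

```python
from itertools import product
from typing import Tuple

# Disease prefixes and region suffixes as listed in Table 3.
DISEASES = ("Viral", "GrayMold", "Wilt")
REGIONS = ("Leaf", "Fruit", "Stem", "Top", "Middle", "Base")


def split_label(label: str) -> Tuple[str, str]:
    """Split 'DiseaseType_Region' into canonical (disease, region) names."""
    disease, sep, region = label.partition("_")
    if not sep:
        raise ValueError(f"Label has no underscore: {label!r}")
    # Case-insensitive lookup maps e.g. 'leaf' onto the canonical 'Leaf'.
    disease_name = next((d for d in DISEASES if d.lower() == disease.lower()), None)
    region_name = next((r for r in REGIONS if r.lower() == region.lower()), None)
    if disease_name is None or region_name is None:
        raise ValueError(f"Unknown label: {label!r}")
    return disease_name, region_name


# One integer id per disease/region combination (3 x 6 = 18 classes).
CLASS_INDEX = {
    f"{disease}_{region}": idx
    for idx, (disease, region) in enumerate(product(DISEASES, REGIONS))
}

disease, region = split_label("GrayMold_Leaf")
print(disease, region, CLASS_INDEX[f"{disease}_{region}"])
```

Keeping disease and region as separate fields also allows a coarser task, for example three-class disease detection, to be derived from the same annotations by ignoring the region part.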
Limitations
While the dataset developed in this study provides a valuable resource featuring multi-class tomato diseases with comprehensive multi-scale, multi-organ annotations, it also presents several limitations that warrant consideration:
- Geographic and environmental limitations: All images were acquired from a single greenhouse facility in Sichuan Province, China. The lack of regional and cultivation-system diversity may hinder the generalizability of models across different growing environments.
- Data imbalance: Due to uneven disease prevalence during data collection, bacterial wilt and viral disease samples are overrepresented compared to gray mold. This class imbalance could affect model performance, particularly for algorithms sensitive to sample distribution.
- Modality constraints: The dataset currently includes only static RGB images. The absence of additional imaging modalities (e.g., infrared, multispectral) restricts its utility in multimodal research settings.
To address these limitations, future work will expand data acquisition to include different ecological regions and tomato cultivars. The incorporation of multimodal imaging technologies is also planned to enhance the dataset’s value for broader research applications.
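For users who wish to compensate for the class imbalance noted above during training, one common option is to weight each class inversely to its frequency. The sketch below computes such weights from the image counts in Table 1 using the formula weight = N / (K × n_c), where N is the total number of images, K the number of classes, and n_c the count for class c. This weighting scheme is our illustration and not a procedure used in this study; it also operates on image counts, whereas the number of bounding boxes per class may differ.

```python
from typing import Dict

# Image counts per disease type, taken from Table 1.
IMAGE_COUNTS: Dict[str, int] = {
    "Tomato Viral": 417,
    "Gray Mold": 82,
    "Bacterial Wilt": 527,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by N / (K * n_c), so rarer classes weigh more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(IMAGE_COUNTS)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With these counts, gray mold receives the largest weight and bacterial wilt the smallest, mirroring their relative scarcity in the dataset.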
Ethics Statement
This work does not involve human participants, animal experiments, or any personal or sensitive data. Therefore, no ethical approval was required for this study.
CRediT author statement
Yongbo Liu: Methodology, Writing–Original Draft, Review & Editing. Yuhang Zhu: Data curation, Writing–Review & Editing. Liang Hu: Writing–Review & Editing. Yao Huo: Data curation. Wenbo Gao: Methodology. Rongping Hu: Supervision. Peng He: Conceptualization, Methodology.
Acknowledgements
This work was supported by the “5+1” Strategic Program for Cutting-edge Agricultural Technologies – Research and Development of General-purpose Agricultural AI Technologies and Equipment (Grant No. 5+1QYGG006) and the Sichuan Provincial Financial Independent Innovation Special Project (Grant No. 2022ZZCX034).
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
Mendeley Data: Tomato Disease Dataset (Original data)
References
- 1. Sun Y., He J., Wei F., Yang W. Evaluation on the development and international competitiveness of China's tomato industry during the 13th Five-Year Plan period. China Cucurbit. Veg. 2023;36(01):112–116. doi: 10.16861/j.cnki.zggc.2023.0011.
- 2. Agarwal M., Singh A., Arjaria S., Sinha A., Gupta S. ToLeD: tomato leaf disease detection using Convolution Neural Network. Procedia Comput. Sci. 2020;167:293–301. doi: 10.1016/j.procs.2020.03.225.
- 3. Dong Y., Liu L., Zhai X., Li W. Artificial intelligence in agricultural pest and disease management: current applications and future prospects. Advanc. Resourc. Res. 2025;5(2):971–986. doi: 10.50908/arr.5.2_971.
- 4. Wang J., Zhang W., Liu L., Huang S. Summary of crop diseases and pests image recognition technology. Comput. Eng. Sci. 2014;36(07):1363–1370.
- 5. Oni M.K., Prama T.T. A comprehensive dataset of tomato leaf images for disease analysis in Bangladesh. Data Brief. 2025. doi: 10.1016/j.dib.2025.111327. [pre-proof]
- 6. Imtiaz A., Swapnil F.B.I., Masud S.R., Karmaker D. Tomato leaf dataset: a dataset for multiclass disease detection and classification. Data Brief. 2025;60. doi: 10.1016/j.dib.2025.111520.
- 7. Gehlot M., Saxena R., Gandhi G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Syst. 2023;29:3305–3328. doi: 10.1007/s00530-023-01158-y.
- 8. Ahmed N., Zaidi S.S., Amin I., Scheffler B.E., Mansoor S. Tomato leaf curl Oman virus and associated betasatellite causing leaf curl disease in tomato in Pakistan. Eur. J. Plant Pathol. 2021;160(2):249–257. doi: 10.1007/s10658-021-02242-7.
- 9. Li J., Wang J., Ding T., Chu D. Synergistic effects of a Tomato chlorosis virus and Tomato yellow leaf curl virus mixed infection on host tomato plants and the whitefly vector. Front. Plant Sci. 2021;12. doi: 10.3389/fpls.2021.672400.
- 10. Rhouma A., Hajji-Hedfi L., Kouadri M., Atallaoui K., Matrood A., Khrieba M. Botrytis cinerea: the cause of tomatoes gray mold. Egypt. J. Phytopathol. 2023;51(2):68–75. doi: 10.21608/ejp.2023.224842.1101.
- 11. Sun L., Chen Y., Liu S., Ou X., Wang Y., Zhao Z., Tang R., Yan Y., Zeng X., Feng S., Zhang T., Li Z., Jian W. Biocontrol performance of a novel Bacillus velezensis L33a on tomato gray mold and its complete genome sequence analysis. Postharvest Biol. Technol. 2024;213. doi: 10.1016/j.postharvbio.2024.112925.
- 12. Wu S., Su H., Gao F., Yao H., Fan X., Zhao X., Li Y. An insight into the prevention and control methods for bacterial wilt disease in tomato plants. Agronomy. 2023;13(12):3025. doi: 10.3390/agronomy13123025.
- 13. Balamurugan A., Kumar A., Muthamilan M., Sakthivel K., Vibhuti M., Ashajyothi M., Sheoran N., Kamalakannan A., Shanthi A., Arumugam T. Outbreak of tomato wilt caused by Ralstonia solanacearum in Tamil Nadu, India and elucidation of its genetic relationship using multilocus sequence typing (MLST). Eur. J. Plant Pathol. 2018;151(3):831–839. doi: 10.1007/s10658-017-1414-3.