Abstract
Some classification methods exhibit variations in final accuracy caused by object positioning, framing and damage. These variations may lower the accuracy of computer vision systems that were trained on structured, static datasets but are intended for day-to-day applications, in which images are not always as organized as the training data, as happens in biometric classification systems such as iris and fingerprint recognition. In this regard, this paper presents six image datasets, processed with different methods, to help researchers analyze the impact of object positioning, framing and damage on their taxonomies.
Keywords: Cocoa beans, Cut Test, Object positioning, Taxonomy evaluation
Specifications Table
Subject area | Agronomy, Image Processing
More specific subject area | Crop Science
Type of data | Images (JPG) |
How data was acquired | Different image processing methods, implemented with R language and MATLAB, were applied to an existing dataset of cut-test-classified cocoa beans. |
Data format | Preprocessed .JPG image files
Experimental factors | The methods were selected to generate datasets that facilitate the use and comparison of different classification methods and the analysis of the impact of object positioning in images.
Experimental features | The source dataset (presented in Ref. [1]) was submitted to multiple research groups in Brazil, and some tests (such as those with fuzzy associative memories) returned different accuracies for the same objects as their positions were algorithmically manipulated [2,3].
Data source location | Ilhéus, Bahia, Brazil. |
Data accessibility | Data is available at Mendeley Data under the doi: <https://doi.org/10.17632/pcx7mj68yn.4> (also in Ref. [4]).
Related research article | Felipe A. Santos (2019). “Modelagem de Um Sistema de Visão Computacional para a Classificação de Amêndoas de Cacau na Prova de Corte” (Master's thesis). State University of Santa Cruz, Ilhéus, Brazil.
Value of the Data
1. Data
Six image datasets of cut-test-classified cocoa beans were created from the research presented in Ref. [3]. Each dataset contains 14 classes (namely: Compartmentalized Brown, Compartmentalized White, Compartmentalized Partially Purple, Compartmentalized Purple, Compartmentalized Slatty, Plated Brown, Plated White, Plated Partially Purple, Plated Purple, Plated Slatty, Moldered, Flattened, Brittle and Agglutinated) with 100 images per class, totaling 1400 images per dataset. Fig. 1 presents an image from the source dataset and Fig. 2 presents the six versions of the same bean, according to the preprocessing methods.
Fig. 1.
A Compartmentalized Brown bean from the source dataset.
Fig. 2.
The six versions created from the bean from Fig. 1.
2. Materials and methods
This section explains the processing applied to the source images to produce the resulting datasets. Each subsection presents the method used to reach one of the six preprocessed versions. These methods were selected through empirical tests with a fuzzy associative memory implementation that is sensitive to object positioning: the six dataset versions yielded six different accuracies for the same classification method, with a standard deviation of approximately 10.07% between them.
2.1. A-Method: Background Removed
Presented under the name “background_removed_-_version_1_-_method_a.rar” in the repository, this dataset was generated by applying four steps to each image (see Fig. 3):
i. The CIELAB color space was used to remove the background;
ii. The image was binarized: every pixel that was not pure RGB black was turned into pure RGB white;
iii. All connected white pixels were labeled as regions and only the largest region from step (ii) was preserved;
iv. The result of step (iii) was applied as a mask to the source image.
Fig. 3.
(a) Source; (b) Background removed; (c) Binarized; (d) Preservation of largest body; (e) Application of mask in source image.
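The steps above can be sketched in code. The following Python fragment is a minimal sketch using NumPy and SciPy, not the authors' implementation (which used R and MATLAB); it assumes that the CIELAB-based step (i) has already produced an image whose removed background is pure RGB black, and the function name `apply_a_method` is illustrative:

```python
import numpy as np
from scipy import ndimage

def apply_a_method(image, background_removed):
    """Steps (ii)-(iv) of the A-Method: binarize, keep the largest
    connected white region, and mask the source image.

    `image` and `background_removed` are HxWx3 uint8 arrays;
    `background_removed` has pure RGB black (0, 0, 0) wherever the
    CIELAB-based step (i) removed the background (an assumption of
    this sketch).
    """
    # Step ii: every pixel that is not pure RGB black becomes white.
    binary = np.any(background_removed != 0, axis=2)

    # Step iii: label connected white pixels as regions and keep
    # only the largest one (the bean body).
    labels, n = ndimage.label(binary)
    if n == 0:
        return np.zeros_like(image)
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)

    # Step iv: apply the largest region as a mask on the source.
    return np.where(largest[..., None], image, 0)
```

Keeping only the largest region discards small specks of background that survive the color thresholding, which is why step (iii) precedes the masking.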
2.2. A-Method: Only image
To create this dataset, all images resulting from “A-Method: Background Removed” were cropped to the minimum rectangle capable of fitting each bean, producing images of varying widths and heights, as shown in Fig. 4. This dataset is under the name “framed_and_centralized_-_version_1_-_method_a.rar” in the repository.
Fig. 4.
(a) and (b) are images from the “A-Method: Background Removed” and (c) and (d) are their respective cropped images to the minimum fitting rectangle.
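The cropping step amounts to finding the bounding box of the non-black pixels. A minimal sketch, assuming the background-removed image has a pure black background (`crop_to_bean` is a hypothetical name, not part of the original code):

```python
import numpy as np

def crop_to_bean(image):
    """Crop a background-removed bean image (HxWx3, pure black
    background) to the minimum rectangle that fits the bean."""
    # Rows and columns that contain at least one non-black pixel.
    mask = np.any(image != 0, axis=2)
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    # Slice out the tightest bounding rectangle.
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```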
2.3. A-Method: Framed and centralized
All beans from “A-Method: Background Removed” were measured and then centralized in the smallest rectangle capable of fitting every bean, so all images have the same dimensions: 3011 × 2851. Two samples of this process can be seen in Fig. 5. This dataset is under the name “framed_and_centralized_-_version_2_-_method_a.rar” in the repository.
Fig. 5.
(a) and (b) are images from “A-Method: Background Removed” and (c) and (d) are their respective centralized beans in the created frame.
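Centralizing a cropped bean in a common frame can be sketched as pasting it into the center of a fixed-size black canvas. The default dimensions below follow the 3011 × 2851 figure from the text, though the source does not state which value is height and which is width, so both the orientation and the function name are assumptions of this sketch:

```python
import numpy as np

def centralize(cropped, frame_h=3011, frame_w=2851):
    """Paste a cropped bean into the center of a fixed-size black
    frame so every output image has the same dimensions."""
    h, w = cropped.shape[:2]
    frame = np.zeros((frame_h, frame_w, 3), dtype=cropped.dtype)
    # Integer offsets that center the bean in the frame.
    top = (frame_h - h) // 2
    left = (frame_w - w) // 2
    frame[top:top + h, left:left + w] = cropped
    return frame
```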
2.4. B-Method: Background Removed
This dataset (named “background_removed_-_version_2_-_method_b.rar” in the repository) was created with additional steps that restore the damage caused to some beans during the background removal, as shown in Fig. 6:
i. The CIELAB color space was used to remove the background;
ii. The image was binarized: every pixel that was not pure RGB black was turned into pure RGB white;
iii. All connected white pixels were labeled as regions and only the largest region from step (ii) was preserved;
iv. The image was inverted;
v. All connected white pixels were labeled as regions and only the largest region from step (iv) was preserved;
vi. The image was inverted again;
vii. The result of step (vi) was applied as a mask to the source image.
Fig. 6.
(a) Source; (b) Background removed; (c) Binarized; (d) Preservation of largest body; (e) Inverted; (f) Preservation of largest body; (g) Inverted; (h) Application of mask in source image.
It is important to note that the process that restores damage to the beans also restores the parts of the background contained inside them, as shown in Fig. 7, caused by hollow areas in the beans (such as broken ones).
Fig. 7.
Sample of restored background.
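The double inversion in steps (iv)–(vi) is effectively a hole-filling operation on the bean mask, which also explains the restored background of Fig. 7: any cavity enclosed by the bean, whether damage or true background, ends up filled. A minimal sketch of these steps (function names are illustrative, not the authors'):

```python
import numpy as np
from scipy import ndimage

def _largest_region(binary):
    """Keep only the largest connected white region of a boolean mask."""
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def restore_bean(bean_mask):
    """Steps (iv)-(vi) of the B-Method: invert the bean mask, keep the
    largest region (the outer background), and invert again, which
    fills any holes enclosed by the bean."""
    inverted = ~bean_mask                       # step iv
    outer_background = _largest_region(inverted)  # step v
    return ~outer_background                    # step vi
```

Because the outer background is the only region kept after the first inversion, everything that is not outer background, including holes, becomes bean after the second inversion.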
2.5. B-Method: Only bean
The same process used to create “A-Method: Only Image” was applied to “B-Method: Background Removed” to create this dataset (see Fig. 8). This dataset is under the name “framed_and_centralized_-_version_3_-_method_b.rar” in the repository.
Fig. 8.
(a) and (b) are images from the “B-Method: Background Removed” and (c) and (d) are their respective cropped images to the minimum fitting rectangle.
2.6. B-Method: Framed and centralized
The same process used to create “A-Method: Framed and Centralized” was applied to the “B-Method: Background Removed” images to create this dataset. The rectangle obtained has the same dimensions as the one from “A-Method: Framed and Centralized” (3011 × 2851). Two samples of this process can be seen in Fig. 9. This dataset is under the name “framed_and_centralized_-_version_4_-_method_b.rar” in the repository.
Fig. 9.
(a) and (b) are images from “B-Method: Background Removed” and (c) and (d) are their respective centralized beans in the created frame.
Acknowledgments
This research was made in partnership with the Programa de Pós-Graduação em Modelagem Computacional em Ciência e Tecnologia (PPGMC) and with the Centro de Inovação do Cacau (CIC). The financial support was provided by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior and Fundação de Amparo à Pesquisa do Estado da Bahia.
Contributor Information
F.A. Santos, Email: fante.antunes@gmail.com.
E.S. Palmeira, Email: espalmeira@uesc.br.
G.Q. Jesus, Email: gildsonj@gmail.com.
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Santos, Felipe A., Palmeira, Eduardo S., Jesus, Gildson Q. An image dataset of cut-test-classified cocoa beans. Data in Brief. June 2019;24:103916. doi: 10.1016/j.dib.2019.103916.
- 2. Santos, Felipe A., Palmeira, Eduardo S., Jesus, Gildson Q. Color, structural and textural features for the classification of a cocoa beans image dataset using artificial neural network. In: XIV Workshop de Visão Computacional, 2018, Ilhéus. ANAIS WVC. 2018:80–84.
- 3. Santos, Felipe A. Modelagem de Um Sistema de Visão Computacional para a Classificação de Amêndoas de Cacau na Prova de Corte (Master's thesis). State University of Santa Cruz, Ilhéus, Brazil; 2019.
- 4. Santos, Felipe A. “Image Datasets of Cocoa Beans for Taxonomy Nuances Evaluation”. Mendeley Data. 2019;2. doi: 10.1016/j.dib.2019.104655.