A dataset of 1050-tampered color and grayscale images (CG-1050)

Maikol Castro; Dora M Ballesteros; Diego Renza

2019 Nov 21;28:104864. doi: 10.1016/j.dib.2019.104864

PMCID: PMC6909131 PMID: 31872002

Abstract

This paper presents the CG-1050 dataset, which consists of 100 original images, 1050 tampered images, and their corresponding masks. The dataset is organized into four directories: original images, tampered images, mask images, and a description file. The directory of original images includes 15 color and 85 grayscale images. The directory of tampered images has 1050 images, each obtained through one of the following types of tampering: copy-move, cut-paste, retouching, and colorizing. The true mask for every pair of original and tampered images is included in the mask directory (1380 masks). The description file lists the names of the images (i.e., original, tampered, and mask), the image description, the photo location, the type of tampering, and the manipulated object in the image. With this dataset, researchers can train and validate fake image classification methods, either for labelling an image as tampered or for pixel-level forgery detection.

Keywords: Fake image, Tampering detection, Forgery detection, Copy-move, Cut-paste, Retouching, Colorizing

Specifications Table

Subject	Computer Vision and Pattern Recognition
Specific subject area	Image processing for identifying and classifying tampered data
Type of data	Images; table
How data were acquired	The original images were captured with a Huawei ANE-LX3 camera in the following places: street (51 photos), park (20 photos), touristic place (11 photos), mall (8 photos), shop (4 photos), classroom (2 photos), parking lot (1 photo), room (1 photo), kitchen (1 photo), and playroom (1 photo). The tampered images were obtained by using Adobe Photoshop.
Data format	Raw: original images (JPEG); Modified (copy-move, cut-paste, retouching, colorizing): tampered images (JPEG); Mask images (PNG)
Parameters for data collection	Four types of tampering were performed: copy-move, cut-paste, retouching, and colorizing
Description of data collection	The dataset is composed of four directories, organized as follows: (1) Original: 100 images (i.e., 15 color and 85 grayscale images); (2) Tampered: 1050 images (i.e., 338 images with copy-move operation, 50 images of cut-paste forgery, 308 retouched images, and 354 colorized images); (3) Mask: 100 sub-directories with the corresponding masks between the pairs of original and tampered images; (4) Description: a spreadsheet file with the description of the dataset content
Data source location	City: Bogotá; Country: Colombia
Data accessibility	Repository name: Mendeley Data; Dataset name: CG-1050: Original and tampered images (Color and grayscale) [1]; Direct URL to data: https://doi.org/10.17632/dk84bmnyw9.2

Value of the Data

• All the original images are real photos captured in different indoor and outdoor places. The tampered images were created using Adobe Photoshop, giving a natural effect that is not obvious to the human eye. The modified pixels correspond to realistic regions instead of fixed blocks.
• Most of the tampered image datasets available for benchmarking focus on only one or two types of tampering [2], for example, IMD and MICC-F600 for copy-move operations, or CASIA v2.0 for copy-move and cut-paste manipulations. Unlike these, our dataset includes four manipulations: copy-move, cut-paste, retouching, and colorizing. This allows training and validating image tampering detection models over a wider range of scenarios.
• Some of the tampered image datasets available for benchmarking do not include the true mask [2], such as Columbia gray, CASIA, and MICC-F2000. In our dataset, the true mask is provided for every tampered image; in addition, for color images, there is a mask for every color band. This allows evaluating the accuracy of forgery-pixel detection methods.
• The ratio between the number of tampered images and the number of original images is roughly 10/1 (1050 tampered to 100 original), which is higher than the ratio in other datasets such as COVERAGE (1/1), MICC-F600 (4/11), and CASIA v2.0 (5/7). This characteristic helps to avoid overfitting, as the model is trained with several examples of tampered images for each original image.
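As an illustration of how the true masks could be used to score a forgery-pixel detector, the following minimal sketch computes pixel-level precision, recall, F1, and intersection over union. It assumes that masks are given as rows of 8-bit intensities and that manipulated pixels are dark, following the mask convention described in Section 1. The function names and the threshold of 128 are hypothetical choices for this example, not part of the dataset.

```python
def to_binary(mask_rows, threshold=128):
    """Return rows of booleans that are True where a pixel is manipulated.

    Assumption: manipulated pixels are black (dark) and unmodified pixels
    are white, so any value below `threshold` counts as manipulated.
    """
    return [[px < threshold for px in row] for row in mask_rows]


def pixel_scores(true_mask, pred_mask, threshold=128):
    """Compare a predicted mask against the true mask, pixel by pixel."""
    truth = to_binary(true_mask, threshold)
    pred = to_binary(pred_mask, threshold)
    tp = fp = fn = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1      # manipulated pixel correctly detected
            elif p:
                fp += 1      # unmodified pixel flagged as manipulated
            elif t:
                fn += 1      # manipulated pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Tiny synthetic example (2 x 3 pixels), not taken from the dataset.
true_mask = [[0, 0, 255], [255, 255, 255]]
pred_mask = [[0, 255, 255], [0, 255, 255]]
scores = pixel_scores(true_mask, pred_mask)
```

In practice the rows would come from the PNG masks loaded with an image library; plain lists are used here only to keep the sketch self-contained.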

1. Data

The dataset is organized in four directories: Original images, Tampered images, Mask images, and a Description file [1].

Fig. 1 shows the structure of the dataset, which is explained below:

• The directory of Original images includes 15 color and 85 grayscale images. All original images are in JPEG format.
• The directory of Tampered images has 100 sub-directories (i.e., T_1 to T_100) with 1050 images, each obtained through one of the following types of tampering: copy-move, cut-paste, retouching, and colorizing. The first 50 sub-directories have 11 tampered images each; the last 50 sub-directories have 10 tampered images each. All tampered images are in JPEG format.
• The directory named Mask has 100 sub-directories (i.e., Mask_1 to Mask_100) with the true masks obtained for each pair of original and tampered images. For color images, each of the 15 sub-directories has 11 folders with 3 masks per folder, that is, 495 masks. For grayscale images, each of the first 35 sub-directories has 11 masks and each of the last 50 sub-directories has 10 masks, that is, 885 masks. In total, the CG-1050 dataset has 1380 masks. In a mask image, the manipulated pixels are black and the unmodified pixels are white.
• Finally, the directory named Description has an Excel file with information about the dataset: original images (i.e., photo name, image description, photo place), tampered images (i.e., folder name, type of tampering, tampered photo name, object, location), and masks (i.e., folder name, mask photo name).
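The image and mask totals follow directly from the per-directory counts listed above. The short sketch below reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Per-directory counts as stated in the dataset description.
COLOR_DIRS = 15       # sub-directories for color originals (11 tampered images each)
GRAY_DIRS_11 = 35     # grayscale sub-directories with 11 tampered images each
GRAY_DIRS_10 = 50     # grayscale sub-directories with 10 tampered images each
BANDS = 3             # one mask per color band for color images


def expected_counts():
    """Return the expected number of tampered images and masks."""
    tampered = (COLOR_DIRS + GRAY_DIRS_11) * 11 + GRAY_DIRS_10 * 10
    color_masks = COLOR_DIRS * 11 * BANDS
    gray_masks = GRAY_DIRS_11 * 11 + GRAY_DIRS_10 * 10
    return {
        "tampered": tampered,
        "color_masks": color_masks,
        "gray_masks": gray_masks,
        "masks": color_masks + gray_masks,
    }


counts = expected_counts()
# counts == {"tampered": 1050, "color_masks": 495, "gray_masks": 885, "masks": 1380}
```

A check of this kind can be run against a downloaded copy of the dataset to confirm that no files are missing.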

Fig. 2 shows an example of cut-paste manipulation for a color image, with the original image, the tampered image, and the mask. Fig. 3 shows an example of a copy-move operation and the associated images. Figs. 4 and 5 show examples of colorizing and retouching, respectively, applied to grayscale images. Tables 1 to 4 describe the images in Figs. 2 to 5.

Fig. 4. An example of colorizing: a) original image, b) tampered image, c) mask.

Fig. 5. An example of retouching: a) original image, b) tampered image, c) mask.

Table 1.

Data description of the cut-paste example.

Original image	Photo name	Im_9
	Image description	Color 4608 × 3456
	Photo place	Touristic place
Tampered image	Folder name	T_9
	Type of Tampering	Cut-paste
	Photo name (tampered)	Im9_cmfr1.jpg
	Object	Light pole
	Location	Middle right
Mask image	Folder name	Mask_9
	Photo name (mask)	Mask9_cmfr1.png

Table 2.

Data description of the copy-move example.

Original image	Photo name	Im_9
	Image description	Color 4608 × 3456
	Photo place	Touristic place
Tampered image	Folder name	T_9
	Type of Tampering	Copy-move
	Photo name (tampered)	Im9_cm1.jpg
	Object	Bush
	Location	Upper left
Mask image	Folder name	Mask_9
	Photo name (mask)	Mask9_cm1.png

Table 3.

Data description of the colorizing example.

Original image	Photo name	Im_29
	Image description	Grayscale 4608 × 3456
	Photo place	Park
Tampered image	Folder name	T_29
	Type of Tampering	Colorizing
	Photo name (tampered)	Im29_col2.jpg
	Object	Sidewalk and a dress
	Location	Low right
Mask image	Folder name	Mask_29
	Photo name (mask)	Mask29_col2.png

Table 4.

Data description of the retouching example.

Original image	Photo name	Im_80
	Image description	Grayscale 4608 × 3456
	Photo place	Street
Tampered image	Folder name	T_80
	Type of Tampering	Retouching
	Photo name (tampered)	Im80_r3.jpg
	Object	Lines of the road
	Location	Middle right
Mask image	Folder name	Mask_80
	Photo name (mask)	Mask80_r3.png

2. Experimental design, materials, and methods

The natural images were captured in the following places: street (51 photos), park (20 photos), touristic place (11 photos), mall (8 photos), shop (4 photos), classroom (2 photos), parking lot (1 photo), room (1 photo), kitchen (1 photo), and playroom (1 photo). The images are either 3456 × 4608 or 4608 × 3456 pixels. For every original image, 10 or 11 tampered images (i.e., with copy-move [3,4], cut-paste [2], retouching [5,6], and colorizing [7,8]) were obtained.

Fig. 2 shows an example of cut-paste modification of a color image. The left panel is the original image, the middle panel is the tampered image, and the right panel is its corresponding mask (G band). The light pole on the right side of Fig. 2b is the object copied from another image. Table 1 shows the details of these images as listed in the Description directory.
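The paper does not state how the per-band masks were produced. As one plausible illustration of what a per-band mask represents, the sketch below marks a pixel as manipulated in a given band when the original and tampered values differ by more than a tolerance. The function name, the tolerance parameter, and the in-memory image representation (rows of (R, G, B) tuples) are assumptions made for this example, not the authors' procedure.

```python
def band_masks(original, tampered, tolerance=0):
    """Return three masks (R, G, B) for a pair of same-sized color images.

    Each mask uses the dataset's convention: 0 (black) where the band
    differs by more than `tolerance`, 255 (white) elsewhere.
    """
    masks = []
    for band in range(3):
        mask = [
            [0 if abs(o[band] - t[band]) > tolerance else 255
             for o, t in zip(o_row, t_row)]
            for o_row, t_row in zip(original, tampered)
        ]
        masks.append(mask)
    return masks


# Synthetic 1 x 2 image: only the G band of the first pixel is changed.
original = [[(10, 10, 10), (20, 20, 20)]]
tampered = [[(10, 50, 10), (20, 20, 20)]]
r_mask, g_mask, b_mask = band_masks(original, tampered)
```

With real JPEG files, recompression alone changes pixel values slightly, so a nonzero tolerance would likely be needed for a difference of this kind to isolate the edited region.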

Fig. 3 shows another example of a tampered color image. In this case, a copy-move modification is applied to the original image by pasting a bush twice. The true mask is presented in Fig. 3c. Table 2 shows the details of the original, tampered, and mask images for this example.

For the third example, a grayscale image of the CG-1050 dataset is selected. The intensities of the girl's dress and of the sidewalk are changed. Fig. 4 shows the original image, the tampered image, and its mask. Table 3 lists the details of these images.

The last example is shown in Fig. 5. The lines of the road are blurred through a retouching effect. The tampered object is located in the middle right of the image (see Fig. 5c). Table 4 shows the details of this manipulation.
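The file names in Tables 2 to 4 suggest a regular naming pattern (for instance, Im29_col2.jpg in T_29 pairs with Mask29_col2.png in Mask_29). The sketch below derives the related names from a tampered file name under that inferred pattern. The pattern is an assumption based only on the examples shown here, the helper name is hypothetical, and the sketch does not cover how the three per-band masks of color images are named inside their folders.

```python
import re

# Inferred pattern: "Im" + image number + "_" + tampering code + index + ".jpg"
_TAMPERED_NAME = re.compile(r"^Im(\d+)_([A-Za-z]+)(\d+)\.jpg$")


def describe_tampered(name):
    """Derive related folder and mask names from a tampered file name."""
    match = _TAMPERED_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected tampered file name: {name!r}")
    image_id, code, index = match.groups()
    return {
        "original": f"Im_{image_id}",
        "tampered_dir": f"T_{image_id}",
        "mask_dir": f"Mask_{image_id}",
        "mask_name": f"Mask{image_id}_{code}{index}.png",
        "code": code,          # e.g. "cm", "col", "r" in the examples above
        "index": int(index),
    }


info = describe_tampered("Im29_col2.jpg")
```

The Description spreadsheet remains the authoritative source for these pairings; a helper like this is only a convenience when iterating over the directories.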

Acknowledgments

This work is supported by the “Universidad Militar Nueva Granada – Vicerrectoria de Investigaciones” under the grant IMP-ING-2936.

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Castro M., Ballesteros D.M., Renza D. CG-1050: Original and tampered images (Color and grayscale), Mendeley Data, v2. 2019.
2. Zheng L., Zhang Y., Thing V.L. A survey on image tampering and its detection in real-world photos. J. Vis. Commun. Image Represent. 2019:380–399.
3. Huang H.Y., Ciou A.J. Copy-move forgery detection for image forensics using the superpixel segmentation and the Helmert transformation. EURASIP J. Image Video Process. 2019;1(2019):68.
4. Li Y., Zhou J. Fast and effective image copy-move forgery detection via hierarchical feature point matching. IEEE Trans. Inf. Forensics Secur. 2018;14:1307–1322.
5. Liu X., Xie L., Zhong B., Du J., Peng Q. Automatic facial flaw detection and retouching via discriminative structure tensor. IET Image Process. 2017;11:1068–1076.
6. Ortiz H., Renza D., Ballesteros D. Tampering detection on digital evidence for forensics purposes. Ingenieria y Ciencia. 2018;14(27):53–74.
7. Guo Y., Cao X., Zhang W., Wang R. Fake colorized image detection. IEEE Trans. Inf. Forensics Secur. 2018;13(8):1932–1944.
8. Zhuo L., Tan S., Zeng J., Lit B. Fake colorized image detection with channel-wise convolution based deep-learning framework. IEEE Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). 2018; pp. 733–736.
