Abstract
This paper presents the CG-1050 dataset, which consists of 100 original images, 1050 tampered images, and their corresponding masks. The dataset is organized into four directories: original images, tampered images, mask images, and a description file. The directory of original images includes 15 color and 85 grayscale images. The directory of tampered images has 1050 images, each obtained through one of the following types of tampering: copy-move, cut-paste, retouching, and colorizing. The true mask for every pair of original and tampered images is included in the mask directory (1380 masks). The description file lists the names of the images (i.e., original, tampered, and mask), the image description, the photo location, the type of tampering, and the manipulated object in the image. With this dataset, researchers can train and validate fake image classification methods, either for labelling an image as tampered or for pixel-level forgery detection.
Keywords: Fake image, Tampering detection, Forgery detection, Copy-move, Cut-paste, Retouching, Colorizing
Specifications Table
| Subject | Computer Vision and Pattern Recognition |
| Specific subject area | Image processing for identifying and classifying tampered data |
| Type of data | Images, Table |
| How data were acquired | The original images were captured with a Huawei camera (ref. ANE-LX3) in the following places: street (51 photos), park (20 photos), touristic place (11 photos), mall (8 photos), shop (4 photos), classroom (2 photos), parking lot (1 photo), room (1 photo), kitchen (1 photo), and playroom (1 photo). The tampered images were produced with Adobe Photoshop. |
| Data format | Raw: original images (JPEG); Modified (copy-move, cut-paste, retouching, colorizing): tampered images (JPEG); Mask images (PNG) |
| Parameters for data collection | Four types of tampering were performed: copy-move, cut-paste, retouching and colorizing |
| Description of data collection | The dataset is composed of four directories: original images, tampered images, mask images, and a description file. |
| Data source location | City: Bogotá; Country: Colombia |
| Data accessibility | Repository name: Mendeley; Data name: CG-1050: Original and tampered images (Color and grayscale) [1]; Direct URL to data: https://doi.org/10.17632/dk84bmnyw9.2 |
Value of the Data
1. Data
The dataset is organized in four directories: Original images, Tampered images, Mask images, and a Description file [1].
Fig. 1 shows the structure of the dataset, which is explained below:
- The directory of Original images includes 15 color and 85 grayscale images. All original images are in JPEG format.
- The directory of Tampered images has 100 sub-directories (i.e., T_1 to T_100) with 1050 images, each obtained through one of the following types of tampering: copy-move, cut-paste, retouching, and colorizing. The first 50 sub-directories have 11 tampered images each; the last 50 sub-directories have 10 tampered images each. All tampered images are in JPEG format.
- The Mask directory has 100 sub-directories (i.e., Mask_1 to Mask_100) containing the true masks obtained from each pair of original and tampered images. For color images, each of the 15 sub-directories has 11 folders with 3 masks per folder, that is, 495 masks. For grayscale images, each of the first 35 sub-directories has 11 masks and each of the last 50 sub-directories has 10 masks, that is, 885 masks. In total, the CG-1050 dataset has 1380 masks. In each mask image, the manipulated pixels are black and the unmodified pixels are white.
- Finally, the Description directory has an Excel file with the dataset details: original images (i.e., photo name, image description, photo place), tampered images (i.e., folder name, type of tampering, tampered photo name, object, location), and masks (i.e., folder name, mask photo name).
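The totals stated above can be cross-checked with simple arithmetic. The following Python sketch only reproduces the counts given in the text; the constant and function names are illustrative and are not part of the dataset.

```python
# Layout as described in the text; all names here are illustrative.
COLOR_DIRS = 15     # color originals, 11 tampered images each
GRAY_DIRS_11 = 35   # grayscale originals with 11 tampered images each
GRAY_DIRS_10 = 50   # grayscale originals with 10 tampered images each


def expected_counts():
    """Return the image and mask totals implied by the described layout."""
    originals = COLOR_DIRS + GRAY_DIRS_11 + GRAY_DIRS_10
    tampered = (COLOR_DIRS + GRAY_DIRS_11) * 11 + GRAY_DIRS_10 * 10
    # Color sub-directories: 11 folders with 3 masks per folder.
    color_masks = COLOR_DIRS * 11 * 3
    # Grayscale sub-directories: one mask per tampered image.
    gray_masks = GRAY_DIRS_11 * 11 + GRAY_DIRS_10 * 10
    return {
        "originals": originals,
        "tampered": tampered,
        "color_masks": color_masks,
        "gray_masks": gray_masks,
        "masks": color_masks + gray_masks,
    }


if __name__ == "__main__":
    for name, value in expected_counts().items():
        print(f"{name}: {value}")
```

A check of this kind can be compared against the number of files found after downloading the dataset, to detect an incomplete copy.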
Fig. 1.
CG-1050 dataset structure.
Fig. 2 shows an example of cut-paste manipulation of a color image, with the original image, the tampered image, and the mask. Fig. 3 shows an example of a copy-move operation with its associated images. Fig. 4 and Fig. 5 show examples of colorizing and retouching applied to grayscale images. Table 1, Table 2, Table 3, and Table 4 describe the images shown in Fig. 2, Fig. 3, Fig. 4, and Fig. 5, respectively.
Fig. 2.
An example of cut-paste manipulation: a) original image, b) tampered image, c) mask (green band).
Fig. 3.
An example of copy-move manipulation: a) original image, b) tampered image, c) mask.
Fig. 4.
An example of colorizing: a) original image, b) tampered image, c) mask.
Fig. 5.
An example of retouching: a) original image, b) tampered image, c) mask.
Table 1.
Data description of the cut-paste example.
| Original image | Photo name | Im_9 |
| | Image description | Color 4608 × 3456 |
| | Photo place | Touristic place |
| Tampered image | Folder name | T_9 |
| | Type of tampering | Cut-paste |
| | Photo name (tampered) | Im9_cmfr1.jpg |
| | Object | Light pole |
| | Location | Middle right |
| Mask image | Folder name | Mask_9 |
| | Photo name (mask) | Mask8_cmfr1.png |
Table 2.
Data description of the copy-move example.
| Original image | Photo name | Im_9 |
| | Image description | Color 4608 × 3456 |
| | Photo place | Touristic place |
| Tampered image | Folder name | T_9 |
| | Type of tampering | Copy-move |
| | Photo name (tampered) | Im9_cm1.jpg |
| | Object | Bush |
| | Location | Upper left |
| Mask image | Folder name | Mask_9 |
| | Photo name (mask) | Mask9_cm1.png |
Table 3.
Data description of the colorizing example.
| Original image | Photo name | Im_29 |
| | Image description | Grayscale 4608 × 3456 |
| | Photo place | Park |
| Tampered image | Folder name | T_29 |
| | Type of tampering | Colorizing |
| | Photo name (tampered) | Im29_col2.jpg |
| | Object | Sidewalk and a dress |
| | Location | Low right |
| Mask image | Folder name | Mask_29 |
| | Photo name (mask) | Mask29_col2.png |
Table 4.
Data description of the retouching example.
| Original image | Photo name | Im_80 |
| | Image description | Grayscale 4608 × 3456 |
| | Photo place | Street |
| Tampered image | Folder name | T_80 |
| | Type of tampering | Retouching |
| | Photo name (tampered) | Im80_r3.jpg |
| | Object | Lines of the road |
| | Location | Middle right |
| Mask image | Folder name | Mask_80 |
| | Photo name (mask) | Mask80_r3.png |
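The tampered photo names in Tables 1 to 4 appear to encode the original image index, a tampering code, and a variant number. The Python sketch below parses names of that form. The code-to-type mapping is inferred only from these four examples, and the function name and returned keys are hypothetical; the description file remains the authoritative source, and other files in the dataset may follow a different pattern.

```python
import re

# Mapping inferred from the four examples in Tables 1-4 only (an assumption).
TAMPER_CODES = {
    "cmfr": "cut-paste",
    "cm": "copy-move",
    "col": "colorizing",
    "r": "retouching",
}

# Pattern: Im<index>_<code><variant>.jpg, e.g. Im29_col2.jpg
_NAME = re.compile(r"^Im(\d+)_([a-z]+)(\d+)\.jpg$")


def parse_tampered_name(filename):
    """Split a tampered photo name into its apparent components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    index, code, variant = match.groups()
    if code not in TAMPER_CODES:
        raise ValueError(f"unknown tampering code: {code!r}")
    return {
        "original": f"Im_{index}",   # photo name of the original image
        "folder": f"T_{index}",      # sub-directory of the tampered image
        "tampering": TAMPER_CODES[code],
        "variant": int(variant),
    }


if __name__ == "__main__":
    for name in ("Im9_cmfr1.jpg", "Im9_cm1.jpg", "Im29_col2.jpg", "Im80_r3.jpg"):
        print(name, parse_tampered_name(name))
```

Raising an error on unknown codes, rather than guessing, keeps mislabelled files visible when the sketch is run over the whole dataset.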
2. Experimental design, materials, and methods
The natural images were captured in the following places: street (51 photos), park (20 photos), touristic place (11 photos), mall (8 photos), shop (4 photos), classroom (2 photos), parking lot (1 photo), room (1 photo), kitchen (1 photo), and playroom (1 photo). The images are either 3456 × 4608 or 4608 × 3456 pixels in size. For every original image, 10 or 11 tampered images (i.e., with copy-move [3,4], cut-paste [2], retouching [5,6] and colorizing [7,8]) were produced.
Fig. 2 shows an example of cut-paste modification of a color image. The left panel is the original image, the middle panel is the tampered image, and the right panel is the corresponding mask (G band). The light pole on the right side of Fig. 2b is the object copied from another image. Table 1 gives the details of these images as recorded in the Description directory.
Fig. 3 shows another example of a tampered color image. In this case, a copy-move modification is applied to the original image by pasting a bush twice. The true mask is presented in Fig. 3c. Table 2 gives the details of the original, tampered and mask images for this example.
For the third example, a grayscale image of the CG-1050 dataset is selected. The intensity of the girl's dress is changed as well as the intensity of the sidewalk. Fig. 4 shows the original image, the tampered image and its mask. Table 3 lists the details of these images.
The last example is shown in Fig. 5. The lines of the road are blurred through a retouching effect. The tampered region is located in the middle right of the image (see Fig. 5c). Table 4 gives the details of this manipulation.
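Since manipulated pixels are black and unmodified pixels are white in each mask, a mask can be turned into a pixel-level ground truth by thresholding. The Python sketch below assumes the mask has already been decoded into rows of 8-bit grayscale values by an image library of the reader's choice; the function names and the threshold of 128 are illustrative assumptions, not part of the dataset.

```python
def tampered_map(mask_rows, threshold=128):
    """Return a boolean map that is True where a pixel is manipulated.

    mask_rows: rows of 8-bit grayscale values (0 = black, 255 = white).
    threshold: assumed cut-off; values below it count as black.
    """
    return [[value < threshold for value in row] for row in mask_rows]


def tampered_fraction(mask_rows, threshold=128):
    """Return the fraction of pixels marked as manipulated."""
    flags = tampered_map(mask_rows, threshold)
    total = sum(len(row) for row in flags)
    if total == 0:
        raise ValueError("mask is empty")
    return sum(sum(row) for row in flags) / total


if __name__ == "__main__":
    # Toy 4 x 4 mask with a 2 x 2 manipulated (black) block in the centre.
    toy_mask = [
        [255, 255, 255, 255],
        [255, 0, 0, 255],
        [255, 0, 0, 255],
        [255, 255, 255, 255],
    ]
    print(tampered_fraction(toy_mask))  # 4 of 16 pixels are black: 0.25
```

The boolean map can serve as the target for pixel-level forgery detection, while the fraction gives a rough measure of how much of an image was manipulated.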
Acknowledgments
This work is supported by the “Universidad Militar Nueva Granada – Vicerrectoria de Investigaciones” under the grant IMP-ING-2936.
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Castro M., Ballesteros D.M., Renza D. CG-1050: Original and tampered images (Color and grayscale), Mendeley Data, v2. 2019.
- 2. Zheng L., Zhang Y., Thing V.L. A survey on image tampering and its detection in real-world photos. J. Vis. Commun. Image Represent. 2019:380–399.
- 3. Huang H.Y., Ciou A.J. Copy-move forgery detection for image forensics using the superpixel segmentation and the Helmert transformation. EURASIP J. Image Video Process. 2019;1(2019):68.
- 4. Li Y., Zhou J. Fast and effective image copy-move forgery detection via hierarchical feature point matching. IEEE Trans. Inf. Forensics Secur. 2018;14:1307–1322.
- 5. Liu X., Xie L., Zhong B., Du J., Peng Q. Automatic facial flaw detection and retouching via discriminative structure tensor. IET Image Process. 2017;11:1068–1076.
- 6. Ortiz H., Renza D., Ballesteros D. Tampering detection on digital evidence for forensics purposes. Ingenieria y Ciencia. 2018;14(27):53–74.
- 7. Guo Y., Cao X., Zhang W., Wang R. Fake colorized image detection. IEEE Trans. Inf. Forensics Secur. 2018;13(8):1932–1944.
- 8. Zhuo L., Tan S., Zeng J., Lit B. Fake colorized image detection with channel-wise convolution based deep-learning framework. IEEE Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018; pp. 733–736.





