Author manuscript; available in PMC: 2021 Jul 30.

Published in final edited form as: ACM Trans Intell Syst Technol. 2020 Jul 5;11(5):1–46. doi: 10.1145/3400066

Table 5.

List and description of computer vision datasets from Tables 2 and 3

	Computer Vision Datasets used for Domain Adaptation
MNIST[130]^a	This is a handwritten digit dataset (digits 0–9) whose name stands for “modified NIST.” The images are nominally binary (black and white) but are actually grayscale because of anti-aliasing. It is based on the National Institute of Standards and Technology’s (NIST) Special Databases 1 and 3, one of which was easier than the other, so MNIST combines the two. The digits are size normalized to fit in a 20×20 box, preserving the aspect ratio, and centered in a 28×28 pixel image.
MNIST-M[73]^b	This is a modification of MNIST in which the digits are blended with random patches from color photos in the BSDS500 dataset.
USPS[131]^c	This is another handwritten digit dataset (digits 0–9). It consists of handwritten ZIP codes scanned and segmented by the U.S. Postal Service (USPS). The digits were size normalized to 16×16 pixels, preserving the aspect ratio, and the pixel values are normalized to lie between −1 and 1.
SVHN[175]^d	The Street View House Numbers (SVHN) dataset consists of single digits extracted from images of urban house numbers in Google Street View. The digits have been size normalized to 32×32 pixels.
SYN_N[73]^b	Ganin et al.[73] used Microsoft Windows fonts to create a synthetic digit dataset (“Syn Numbers”) consisting of 1–3 digit numbers with various positions, orientations, background colors, stroke colors, and amounts of blur.
SYN_S[168]^e	This is a synthetic sign dataset created from modifications to Wikipedia pictograms of traffic signs. It consists of 100,000 images and 43 classes of signs.
GTSRB[224]^f	The German Traffic Sign Recognition Benchmark (GTSRB) is a dataset created from video recorded while driving around Germany. It consists of about 50,000 images and 43 classes of signs.
Office[202]^g	This dataset consists of 31 classes of objects in three different domains: Amazon (images taken from Amazon’s website; medium resolution and studio lighting), DSLR (images taken with a digital SLR camera; high resolution and in a real-world environment), and Webcam (images taken with a 640×480 computer webcam; these have noise, artifacts, and white balance issues). Note: due to Office’s small size, some networks [73, 199, 226] were pre-trained on ImageNet.

^a http://yann.lecun.com/exdb/mnist/

^b See Ganin’s website http://yaroslav.ganin.net/ for links to download.

^c This can be found on various sites and in some GitHub repositories. One such place: https://web.stanford.edu/~hastie/ElemStatLearn/data.html

^d http://ufldl.stanford.edu/housenumbers

^e The synthetic dataset is linked from: http://graphics.cs.msu.ru/en/research/projects/imagerecognition/trafficsign

^f http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset

^g http://ai.bu.edu/adaptation.html
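
The size normalization described for MNIST (fit in a 20×20 box preserving the aspect ratio, then center in a 28×28 image) and the value range described for USPS (between −1 and 1) can be illustrated with a minimal sketch. This is not the datasets' actual preprocessing code: the function names `fit_and_center` and `scale_to_unit_range`, the nearest-neighbor resampling, the geometric (bounding-box) centering, and the assumed 0–255 input range are all assumptions made for illustration.

```python
def fit_and_center(glyph, box=20, canvas=28):
    """Scale a 2D list of pixel values to fit inside a box x box square,
    preserving the aspect ratio, then paste it into the middle of a
    canvas x canvas image filled with zeros (hypothetical helper)."""
    h, w = len(glyph), len(glyph[0])
    scale = box / max(h, w)              # the longer side becomes `box` pixels
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbor resampling (an assumption; chosen to avoid dependencies).
    resized = [
        [glyph[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
         for c in range(new_w)]
        for r in range(new_h)
    ]

    # Geometric centering of the resized glyph on the blank canvas.
    top = (canvas - new_h) // 2
    left = (canvas - new_w) // 2
    out = [[0] * canvas for _ in range(canvas)]
    for r in range(new_h):
        out[top + r][left:left + new_w] = resized[r]
    return out


def scale_to_unit_range(image, max_value=255):
    """Map pixel values from [0, max_value] to [-1, 1] (hypothetical helper)."""
    return [[2.0 * p / max_value - 1.0 for p in row] for row in image]


if __name__ == "__main__":
    tall_glyph = [[255] * 10 for _ in range(40)]   # a 40x10 solid stroke
    image = fit_and_center(tall_glyph)
    print(len(image), len(image[0]))               # canvas dimensions
    scaled = scale_to_unit_range(image)
    print(min(map(min, scaled)), max(map(max, scaled)))
```

Because the scale factor is taken from the longer side, a tall, narrow digit keeps its proportions instead of being stretched to fill the box, which is what "preserving the aspect ratio" means in the table entries above.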