2017 Sep 27;5(1):2. doi: 10.1007/s13755-017-0023-z

Table 1.

The data sets used (in ascending order of # of training instances × # of attributes)

Name	# of training instances	# of attributes	# of test instances	# of classes
Mammographic Mass	673	6	288	2
Car	1209	6	519	4
Yeast	1038	8	446	10
German credit	700	20	300	2
Diabetic retinopathy debrecen	806	20	345	2
Parkinson speech	728	26	312	2
Abalone	2923	8	1254	28
Cardiotocography	1488	23	638	3
Wine quality	3425	11	1469	11
KR-vs-KP	2237	37	959	2
Arrhythmia	316	279	136	16
Waveform	3500	40	1500	3
Semeion	1115	256	478	10
Shuttle	43,500	9	14,500	7
Secom	1096	591	471	2
Madelon	1820	500	780	2
Arcene	100	10,000	100	2
Convex	8000	784	50,000	2
KDD09-appetency	35,000	230	15,000	2
Dexter	420	20,000	180	2
MNIST basic	12,000	784	50,000	10
ROT. MNIST + BI	12,000	784	50,000	10
Amazon	1050	10,000	450	49
Gisette	4900	5000	2100	2
CIFAR-10-small	10,000	3072	10,000	10
Dorothea	805	100,000	345	2
CIFAR-10	50,000	3072	10,000	10

Small data sets are shown in the first 16 rows; large data sets are shown in the last 11 rows.
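The ordering criterion in the caption can be recomputed directly. The following is a minimal Python sketch, assuming the (name, training instances, attributes) triples are transcribed by hand from Table 1; the helper names `size_key` and `is_sorted_by_size` are hypothetical and not part of the original work. It checks that the rows are non-decreasing in training instances × attributes and splits them into the 16 small and 11 large data sets by position, as the footnote describes.

```python
# Hypothetical transcription of Table 1: (name, # training instances, # attributes).
DATASETS = [
    ("Mammographic Mass", 673, 6),
    ("Car", 1209, 6),
    ("Yeast", 1038, 8),
    ("German credit", 700, 20),
    ("Diabetic retinopathy debrecen", 806, 20),
    ("Parkinson speech", 728, 26),
    ("Abalone", 2923, 8),
    ("Cardiotocography", 1488, 23),
    ("Wine quality", 3425, 11),
    ("KR-vs-KP", 2237, 37),
    ("Arrhythmia", 316, 279),
    ("Waveform", 3500, 40),
    ("Semeion", 1115, 256),
    ("Shuttle", 43500, 9),
    ("Secom", 1096, 591),
    ("Madelon", 1820, 500),
    ("Arcene", 100, 10000),
    ("Convex", 8000, 784),
    ("KDD09-appetency", 35000, 230),
    ("Dexter", 420, 20000),
    ("MNIST basic", 12000, 784),
    ("ROT. MNIST + BI", 12000, 784),
    ("Amazon", 1050, 10000),
    ("Gisette", 4900, 5000),
    ("CIFAR-10-small", 10000, 3072),
    ("Dorothea", 805, 100000),
    ("CIFAR-10", 50000, 3072),
]


def size_key(row):
    """Ordering key from the caption: # training instances x # attributes."""
    _, n_train, n_attr = row
    return n_train * n_attr


def is_sorted_by_size(rows):
    """True if the rows are in non-decreasing order of size_key (ties allowed)."""
    keys = [size_key(r) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Split by position, as in the table footnote: first 16 rows small, last 11 large.
small, large = DATASETS[:16], DATASETS[16:]

if __name__ == "__main__":
    print(is_sorted_by_size(DATASETS))  # True
    print(size_key(small[-1]), size_key(large[0]))  # 910000 1000000
```

Non-strict comparison is used because MNIST basic and ROT. MNIST + BI have the same key (12,000 × 784 = 9,408,000). The split is positional only; the footnote does not state a numeric size threshold, so none is assumed here.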