2017 Sep 27;5(1):2. doi: 10.1007/s13755-017-0023-z

Table 1.

The data sets used (ordered by # of training instances × # of attributes)

| Name | # of training instances | # of attributes | # of test instances | # of classes |
|---|---|---|---|---|
| Mammographic Mass | 673 | 6 | 288 | 2 |
| Car | 1209 | 6 | 519 | 4 |
| Yeast | 1038 | 8 | 446 | 10 |
| German credit | 700 | 20 | 300 | 2 |
| Diabetic retinopathy debrecen | 806 | 20 | 345 | 2 |
| Parkinson speech | 728 | 26 | 312 | 2 |
| Abalone | 2923 | 8 | 1254 | 28 |
| Cardiotocography | 1488 | 23 | 638 | 3 |
| Wine quality | 3425 | 11 | 1469 | 11 |
| KR-vs-KP | 2237 | 37 | 959 | 2 |
| Arrhythmia | 316 | 279 | 136 | 16 |
| Waveform | 3500 | 40 | 1500 | 3 |
| Semeion | 1115 | 256 | 478 | 10 |
| Shuttle | 43,500 | 9 | 14,500 | 7 |
| Secom | 1096 | 591 | 471 | 2 |
| Madelon | 1820 | 500 | 780 | 2 |
| Arcene | 100 | 10,000 | 100 | 2 |
| Convex | 8000 | 784 | 50,000 | 2 |
| KDD09-appentency | 35,000 | 230 | 15,000 | 2 |
| Dexter | 420 | 20,000 | 180 | 2 |
| MNIST basic | 12,000 | 784 | 50,000 | 10 |
| ROT. MNIST + BI | 12,000 | 784 | 50,000 | 10 |
| Amazon | 1050 | 10,000 | 450 | 49 |
| Gisette | 4900 | 5000 | 2100 | 2 |
| CIFAR-10-small | 10,000 | 3072 | 10,000 | 10 |
| Dorothea | 805 | 100,000 | 345 | 2 |
| CIFAR-10 | 50,000 | 3072 | 10,000 | 10 |

The first 16 rows are the small data sets; the last 11 rows are the large data sets.
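
The ordering criterion in the caption can be checked with a few lines of Python. The sketch below is illustrative only: the values are transcribed from Table 1, while the names `rows` and `size_key` are hypothetical and not taken from the article. It computes the product of training instances and attributes for each data set, confirms that the rows are in non-decreasing order of that product, and splits them at row 16 as described in the table note.

```python
# (name, training instances, attributes), transcribed from Table 1 in row order.
rows = [
    ("Mammographic Mass", 673, 6),
    ("Car", 1209, 6),
    ("Yeast", 1038, 8),
    ("German credit", 700, 20),
    ("Diabetic retinopathy debrecen", 806, 20),
    ("Parkinson speech", 728, 26),
    ("Abalone", 2923, 8),
    ("Cardiotocography", 1488, 23),
    ("Wine quality", 3425, 11),
    ("KR-vs-KP", 2237, 37),
    ("Arrhythmia", 316, 279),
    ("Waveform", 3500, 40),
    ("Semeion", 1115, 256),
    ("Shuttle", 43500, 9),
    ("Secom", 1096, 591),
    ("Madelon", 1820, 500),
    ("Arcene", 100, 10000),
    ("Convex", 8000, 784),
    ("KDD09-appentency", 35000, 230),
    ("Dexter", 420, 20000),
    ("MNIST basic", 12000, 784),
    ("ROT. MNIST + BI", 12000, 784),
    ("Amazon", 1050, 10000),
    ("Gisette", 4900, 5000),
    ("CIFAR-10-small", 10000, 3072),
    ("Dorothea", 805, 100000),
    ("CIFAR-10", 50000, 3072),
]


def size_key(row):
    """Return training instances times attributes (the caption's ordering key)."""
    _, n_train, n_attr = row
    return n_train * n_attr


sizes = [size_key(r) for r in rows]

# The table is ordered by this product (ties are allowed, e.g. the two MNIST variants).
is_sorted = sizes == sorted(sizes)

# Split as stated in the table note: first 16 rows small, last 11 rows large.
small, large = rows[:16], rows[16:]

if __name__ == "__main__":
    print("ordered by instances x attributes:", is_sorted)
    print("largest small data set:", small[-1][0], size_key(small[-1]))
    print("smallest large data set:", large[0][0], size_key(large[0]))
```

The article gives no numeric threshold for the small/large split; the sketch only reports where the boundary falls in the transcribed rows (between Madelon and Arcene).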