2021 Aug 28;23(9):1123. doi: 10.3390/e23091123

Table A1.

A list of 79 datasets from the UCI repository [41] that contained 500 or more instances.

Dataset                 Number of Attributes  Number of Instances
ada_agnostic            49                    4562
ada_prior               15                    4562
analcatdata_authorship  71                    841
analcatdata_dmft        5                     797
analcatdata_halloffame  18                    1340
anneal                  39                    898
anneal.ORIG             39                    898
australian              15                    690
balance-scale           5                     625
breast-w                10                    699
car                     7                     1728
cardiotocography        23                    2126
CH                      37                    3196
cmc                     10                    1473
cps_85_wages            11                    534
credit-a                16                    690
credit-g                21                    1000
csb_ch12                7                     1601
csb_ch9                 4                     3240
cylinder-bands          40                    540
diabetes                9                     768
eucalyptus              20                    736
eye_movements           28                    10,936
genresTrain             192                   12,495
gina_agnostic           971                   3468
gina_prior              785                   3468
gina_prior2             785                   3468
HY                      26                    3163
hypothyroid             30                    3772
ilpd                    11                    583
irish                   6                     500
jm1                     22                    10,885
kc1                     22                    2109
kc2                     22                    522
kdd_ipums_la_97-small   61                    7019
kdd_ipums_la_98-small   61                    7485
kdd_ipums_la_99-small   61                    8844
kdd_synthetic_control   62                    600
kropt                   7                     28,056
kr-vs-kp                37                    3196
landsat                 37                    6435
letter                  17                    20,000
mammographic_masses     6                     961
mc1                     39                    9466
mfeat-factors           217                   2000
mfeat-fourier           77                    2000
mfeat-karhunen          65                    2000
mfeat-morphological     7                     2000
mfeat-pixel             241                   2000
mfeat-zernike           48                    2000
mozilla4                6                     15,545
MU                      23                    8124
mushroom                23                    8124
nursery                 9                     12,960
optdigits               65                    5620
page-blocks             11                    5473
pc1                     22                    1109
pc3                     38                    1563
pc4                     38                    1458
pendigits               17                    10,992
scopes-bf               21                    621
SE                      26                    3163
segment                 20                    2310
sick                    30                    3772
soybean                 36                    683
spambase                58                    4601
splice                  62                    3190
sylva_agnostic          217                   14,395
sylva_prior             109                   14,395
ticdata_categ           86                    5822
tic-tac-toe             10                    958
titanic                 4                     2201
train                   15                    5000
vehicle                 19                    846
visualizing_fly         2                     823
vowel                   14                    990
waveform-5000           41                    5000
wisconsin-diagnostic    31                    569
yeast                   9                     1484