Table 1.
Dataset | Domain | Instances | Nodes | Edges | Average In-degree |
---|---|---|---|---|---|
Statlog (Australian Credit Approval) | Industry | 690 | 15 | 33 | 2.20 |
Breast Cancer | Biology | 699 | 10 | 20 | 2.00 |
Car Evaluation | Industry | 1,728 | 7 | 9 | 1.29 |
Cleveland Heart Disease | Biology | 303 | 14 | 22 | 1.57 |
Credit Approval | Industry | 690 | 16 | 35 | 2.19 |
Diabetes | Biology | 768 | 9 | 13 | 1.44 |
Glass Identification | Industry | 214 | 10 | 17 | 1.70 |
Statlog (Heart) | Biology | 270 | 14 | 21 | 1.50 |
Hepatitis | Biology | 155 | 20 | 36 | 1.80 |
Iris | Biology | 150 | 5 | 8 | 1.60 |
Nursery | Industry | 12,960 | 9 | 14 | 1.56 |
Statlog (Vehicle Silhouettes) | Industry | 846 | 19 | 40 | 2.11 |
Congressional Voting Records | Political | 436 | 17 | 46 | 2.71 |
This table describes all of the datasets used in this study. Dataset gives the name of the dataset in the UCI Machine Learning Repository. Domain gives a rough indication of the application domain of the dataset. Instances gives the number of instances in the original dataset. Nodes gives the number of variables in the dataset, which is also the number of nodes in the corresponding Bayesian network. Edges gives the number of edges in the optimal Bayesian network learned from the original dataset; this network serves as the gold standard throughout the rest of the evaluation. Average In-degree gives the average number of parents per variable in the learned Bayesian network, which equals Edges divided by Nodes.
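Every directed edge contributes exactly one parent to exactly one variable, so the average in-degree of a network is its edge count divided by its node count. The sketch below illustrates this on a network stored as parent sets and checks that the Average In-degree column agrees with Edges / Nodes for every row of the table. The parent-set representation, the function names, and the four-variable toy network are hypothetical illustrations, not structures taken from the study.

```python
from typing import Dict, FrozenSet, Tuple


def average_in_degree(parents: Dict[str, FrozenSet[str]]) -> float:
    """Mean number of parents per variable in a DAG stored as parent sets."""
    # Every directed edge X -> Y appears exactly once, as X in parents[Y].
    num_edges = sum(len(ps) for ps in parents.values())
    return num_edges / len(parents)


# Hypothetical 4-variable network, for illustration only (not from Table 1).
toy_network = {
    "A": frozenset(),
    "B": frozenset({"A"}),
    "C": frozenset({"A", "B"}),
    "D": frozenset({"C"}),
}
print(average_in_degree(toy_network))  # 4 edges / 4 nodes -> 1.0

# (Nodes, Edges, reported Average In-degree) copied from Table 1.
TABLE_1: Dict[str, Tuple[int, int, float]] = {
    "Statlog (Australian Credit Approval)": (15, 33, 2.20),
    "Breast Cancer": (10, 20, 2.00),
    "Car Evaluation": (7, 9, 1.29),
    "Cleveland Heart Disease": (14, 22, 1.57),
    "Credit Approval": (16, 35, 2.19),
    "Diabetes": (9, 13, 1.44),
    "Glass Identification": (10, 17, 1.70),
    "Statlog (Heart)": (14, 21, 1.50),
    "Hepatitis": (20, 36, 1.80),
    "Iris": (5, 8, 1.60),
    "Nursery": (9, 14, 1.56),
    "Statlog (Vehicle Silhouettes)": (19, 40, 2.11),
    "Congressional Voting Records": (17, 46, 2.71),
}


def consistent_with_table(nodes: int, edges: int, reported: float) -> bool:
    """True if Edges / Nodes lies within half a unit of the reported 2-decimal value."""
    # A tolerance check is used instead of round(), because Python's round()
    # breaks exact ties toward the even digit: 35/16 = 2.1875 would become 2.18.
    return abs(edges / nodes - reported) <= 0.005 + 1e-9


mismatches = [name for name, row in TABLE_1.items() if not consistent_with_table(*row)]
print(mismatches)  # []
```

All thirteen rows pass, confirming that the Average In-degree column is simply Edges / Nodes rounded to two decimal places.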