2012 Sep 11;13(Suppl 15):S14. doi: 10.1186/1471-2105-13-S15-S14

Table 1.

Summary of gold standard networks

Dataset                               Domain     Instances  Nodes  Edges  Average In-degree
Statlog (Australian Credit Approval)  Industry   690        15     33     2.20
Breast Cancer                         Biology    699        10     20     2.00
Car Evaluation                        Industry   1,728      7      9      1.29
Cleveland Heart Disease               Biology    303        14     22     1.57
Credit Approval                       Industry   690        16     35     2.19
Diabetes                              Biology    768        9      13     1.44
Glass Identification                  Industry   214        10     17     1.70
Statlog (Heart)                       Biology    270        14     21     1.50
Hepatitis                             Biology    155        20     36     1.80
Iris                                  Biology    150        5      8      1.60
Nursery                               Industry   12,960     9      14     1.56
Statlog (Vehicle Silhouettes)         Industry   846        19     40     2.11
Congressional Voting Records          Political  436        17     46     2.71

This table describes all of the datasets we used in this study. Dataset gives the name of the dataset in the UCI Machine Learning Repository. Domain gives a rough indication of the subject area of the dataset. Instances gives the number of instances in the original dataset. Nodes gives the number of variables in the dataset, and therefore the number of nodes in the corresponding Bayesian network. Edges gives the number of edges in the optimal Bayesian network learned from the original dataset; this network is the gold standard used throughout the rest of the evaluation. Average In-degree gives the average number of parents of each variable in the learned Bayesian network, that is, Edges divided by Nodes.
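The Average In-degree column can be cross-checked against the Nodes and Edges columns, because every directed edge gives exactly one node one parent. The following is a minimal sketch of that check. The Nodes, Edges, and in-degree values are copied from Table 1, while the names `TABLE_1`, `average_in_degree`, and `mismatches` are illustrative assumptions and do not come from the paper.

```python
# Rows copied from Table 1: (dataset, nodes, edges, reported average in-degree).
TABLE_1 = [
    ("Statlog (Australian Credit Approval)", 15, 33, 2.20),
    ("Breast Cancer", 10, 20, 2.00),
    ("Car Evaluation", 7, 9, 1.29),
    ("Cleveland Heart Disease", 14, 22, 1.57),
    ("Credit Approval", 16, 35, 2.19),
    ("Diabetes", 9, 13, 1.44),
    ("Glass Identification", 10, 17, 1.70),
    ("Statlog (Heart)", 14, 21, 1.50),
    ("Hepatitis", 20, 36, 1.80),
    ("Iris", 5, 8, 1.60),
    ("Nursery", 9, 14, 1.56),
    ("Statlog (Vehicle Silhouettes)", 19, 40, 2.11),
    ("Congressional Voting Records", 17, 46, 2.71),
]


def average_in_degree(nodes: int, edges: int) -> float:
    """Mean number of parents per node in a directed graph.

    Each directed edge contributes one parent to exactly one node,
    so the in-degrees sum to the edge count.
    """
    return edges / nodes


def mismatches(rows):
    """Return the datasets whose reported value differs from edges / nodes
    after rounding to two decimals (hypothetical helper for this check)."""
    bad = []
    for name, nodes, edges, reported in rows:
        computed = round(average_in_degree(nodes, edges), 2)
        if abs(computed - reported) > 1e-9:
            bad.append(name)
    return bad


if __name__ == "__main__":
    for name, nodes, edges, reported in TABLE_1:
        print(f"{name}: {average_in_degree(nodes, edges):.2f} (reported {reported:.2f})")
    print("Mismatching rows:", mismatches(TABLE_1))
```

Under these assumptions, every row of the table reproduces its reported value to two decimal places, so `mismatches(TABLE_1)` returns an empty list.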