Author manuscript; available in PMC: 2014 Jun 1.

Published in final edited form as: J Biomed Inform. 2013 Apr 4;46(3):480–496. doi: 10.1016/j.jbi.2013.03.008

Table 2.

Summary of the datasets used in our experiments. The class distribution (i.e., the proportion of positive and negative outcomes) is listed for reference.

Dataset	Dataset description	# of covariates	# of samples	Class distribution (positive/negative)
1	Simulated i.i.d. data	5	500	0.618 / 0.382
2	Simulated correlated data	6	500	0.764 / 0.236
3	Simulated i.i.d. data	15	1500	0.641 / 0.359
4	Simulated correlated data	15	1500	0.651 / 0.349
5	Simulated binary data	5	500	0.846 / 0.154
6	Simulated binary data	15	1500	0.726 / 0.274
7	Biomarker (CA-19 and CA-125)	2	141	0.638 / 0.362
8	Low birth weight study	8	488	0.309 / 0.691
9	UMASS AIDS research	8	575	0.256 / 0.744
10	Mammography experience study	8	412	0.432 / 0.568
11	Myocardial infarction	9	1253	0.219 / 0.781
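
For readers who want to reproduce the class distribution column on their own data, the following is a minimal sketch, assuming the outcome is coded as 0/1 with 1 as the positive class. The function name `class_distribution` and the example counts are hypothetical; the table reports only proportions, so the counts below were merely chosen to be consistent with a 141-sample dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return (positive_fraction, negative_fraction) for binary 0/1 labels.

    Assumes 1 marks a positive outcome and 0 a negative one.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n = len(labels)
    return counts[1] / n, counts[0] / n


# Hypothetical outcome vector (not taken from the source):
# 90 positives and 51 negatives, 141 samples in total.
labels = [1] * 90 + [0] * 51
pos, neg = class_distribution(labels)
print(f"{pos:.3f} / {neg:.3f}")  # prints "0.638 / 0.362"
```

The two fractions sum to 1 whenever every label is 0 or 1, which matches the positive/negative pairs reported in the table.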