Author manuscript; available in PMC: 2019 Sep 1.
Published in final edited form as: J Biomed Inform. 2018 Jul 17;85:168–188. doi: 10.1016/j.jbi.2018.07.015

Table 1:

Simulation study datasets. 30 replicates of each configuration were generated. Model architecture difficulty is designated by ‘E’ (easy) or ‘H’ (hard). Simulation method is designated as ‘G’ (GAMETES), ‘C’ (custom script), or ‘G+C’ (GAMETES modified by custom script).

| Simulated Data Group Description or Pattern of Association | Configurations | Config. Variations | Predictive Features | Total Features | Model Difficulty | Heritability | Instances | Simulation Method |
|---|---|---|---|---|---|---|---|---|
| 2-way Pure Epistasis (Core Datasets); others marked by ‘*’ | 32 | - | 2 | 20 | E, H | 0.05, 0.1, 0.2, 0.4 | 200, 400, 800, 1600 | G |
| 1-Feature Main Effect | 8 | - | 1 | 20 | E, H | 0.05, 0.1, 0.2, 0.4 | 1600 | G |
| 2-Feature Additive Effect | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 4-Feature Additive Effect | 1 | - | 1 | 20 | E | 0.4 | 1600 | G |
| 4-Feat. Additive 2-way Epistasis | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 4-Feat. Heterogeneous 2-way Epistasis | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 3-way Pure Epistasis | 1 | - | 3 | 20 | E | 0.2 | 1600 | G |
| Number of Features* | 4 | | 2 | 100, 1000, 10000, 100000 | E | 0.4 | 1600 | G |
| Continuous Features* | 1 | - | 2 | 20 | E | 0.4 | 1600 | G+C |
| Mix of Discrete and Continuous Features* | 1 | | 2 | 20 | E | 0.4 | 1600 | G+C |
| Continuous Endpoint* | 3 | 0.2, 0.5, 0.8 | 2 | 20 | E | 0.4 | 1600 | G |
| Continuous Endpoint* (1-Threshold Model) | 1 | | 2 | 20 | E | 0.4 | 1600 | G+C |
| Missing Data* | 4 | 0.001, 0.01, 0.1, 0.5 | 2 | 20 | E | 0.4 | 1600 | G+C |
| Imbalanced Data* | 2 | 0.6, 0.9 | 2 | 20 | E | 0.4 | 1600 | G |
| Multi-class Endpoint (Impure 2-way Epistasis) | 2 | 3-class, 9-class | 2 | 20 | N/A | 1 | 1600 | C |
| XOR Model (Pure Epistasis) | 4 | 2-way | 2 | 20 | N/A | 1 | 1600 | C |
| | | 3-way | 3 | | | | | |
| | | 4-way | 4 | | | | | |
| | | 5-way | 5 | | | | | |
| Multiplexer (MUX) (Pure Epistasis and Heterogeneous Associations) | 6 | 6-bit → | 2 | 6 | 3-way | 1 | 500 | C |
| | | 11-bit → | 3 | 11 | 4-way | | 1000 | |
| | | 20-bit → | 4 | 20 | 5-way | | 2000 | |
| | | 37-bit → | 5 | 37 | 6-way | | 5000 | |
| | | 70-bit → | 6 | 70 | 7-way | | 10000 | |
| | | 135-bit → | 7 | 135 | 8-way | | 20000 | |
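
The counts in Table 1 can be checked with a short sketch. It is illustrative only and is not the authors' simulation code. It assumes that the 32 core configurations form a full factorial grid over model difficulty, heritability, and number of instances (2 × 4 × 4 = 32, which matches the Configurations column), and that the Predictive Features value in each MUX row is the number of address bits k, so that the total feature count is k + 2^k under the standard multiplexer definition. All function names are hypothetical.

```python
from itertools import product

# Values copied from the first row of Table 1 (core 2-way pure epistasis datasets).
DIFFICULTIES = ["E", "H"]
HERITABILITIES = [0.05, 0.1, 0.2, 0.4]
INSTANCES = [200, 400, 800, 1600]
REPLICATES = 30  # stated in the table caption


def core_configurations():
    """Enumerate the core grid, ASSUMING a full factorial design."""
    return [
        {"difficulty": d, "heritability": h, "instances": n}
        for d, h, n in product(DIFFICULTIES, HERITABILITIES, INSTANCES)
    ]


def multiplexer_bits(address_bits):
    """Total features of a multiplexer: k address bits plus 2**k register bits."""
    return address_bits + 2 ** address_bits


def multiplexer_output(bits, address_bits):
    """Return the register bit selected by the leading address bits.

    This is the textbook multiplexer function, not necessarily the exact
    encoding used by the authors' custom script.
    """
    address = int("".join(str(b) for b in bits[:address_bits]), 2)
    return bits[address_bits + address]


configs = core_configurations()
print(len(configs))               # 32 configurations
print(len(configs) * REPLICATES)  # 960 core datasets
print([multiplexer_bits(k) for k in range(2, 8)])  # [6, 11, 20, 37, 70, 135]
```

Under this reading, each MUX instance depends on its k address bits plus the one register bit they select, which agrees with the (k + 1)-way interaction orders listed in the Model Difficulty column (3-way for 6-bit through 8-way for 135-bit).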