Table 1:
Simulation study datasets. Thirty replicates of each configuration were generated. Model architecture difficulties are designated by ‘E’ (easy) and ‘H’ (hard). The simulation method is designated as ‘G’ (GAMETES), ‘C’ (custom script), or ‘G+C’ (GAMETES modified by custom script).
| Simulated Data Group Description or Pattern of Association | Configurations | Config. Variations | Predictive Features | Total Features | Model Difficulty | Heritability | Instances | Simulation Method |
|---|---|---|---|---|---|---|---|---|
| 2-way Pure Epistasis (Core Datasets); others marked by ‘*’ | 32 | - | 2 | 20 | E, H | 0.05, 0.1, 0.2, 0.4 | 200, 400, 800, 1600 | G |
| 1-Feature Main Effect | 8 | - | 1 | 20 | E, H | 0.05, 0.1, 0.2, 0.4 | 1600 | G |
| 2-Feature Additive Effect | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 4-Feature Additive Effect | 1 | - | 1 | 20 | E | 0.4 | 1600 | G |
| 4-Feat. Additive 2-way Epistasis | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 4-Feat. Heterogeneous 2-way Epistasis | 2 | 50:50, 75:25 | 2 | 20 | E | 0.4 | 1600 | G |
| 3-way Pure Epistasis | 1 | - | 3 | 20 | E | 0.2 | 1600 | G |
| Number of Features* | 4 | - | 2 | 100, 1000, 10000, 100000 | E | 0.4 | 1600 | G |
| Continuous Features* | 1 | - | 2 | 20 | E | 0.4 | 1600 | G+C |
| Mix of Discrete and Continuous Features* | 1 | - | 2 | 20 | E | 0.4 | 1600 | G+C |
| Continuous Endpoint* | 3 | 0.2, 0.5, 0.8 | 2 | 20 | E | 0.4 | 1600 | G |
| Continuous Endpoint* (1-Threshold Model) | 1 | “ | 2 | 20 | E | 0.4 | 1600 | G+C |
| Missing Data* | 4 | 0.001, 0.01, 0.1, 0.5 | 2 | 20 | E | 0.4 | 1600 | G+C |
| Imbalanced Data* | 2 | 0.6, 0.9 | 2 | 20 | E | 0.4 | 1600 | G |
| Multi-class Endpoint (Impure 2-way Epistasis) | 2 | 3-class, 9-class | 2 | 20 | N/A | 1 | 1600 | C |
| XOR Model (Pure Epistasis) | 4 | 2-way | 2 | 20 | N/A | 1 | 1600 | C |
| | | 3-way | 3 | | | | | |
| | | 4-way | 4 | | | | | |
| | | 5-way | 5 | | | | | |
| Multiplexer (MUX) (Pure Epistasis and Heterogeneous Associations) | 6 | 6-bit → | 2 | 6 | 3-way | 1 | 500 | C |
| | | 11-bit → | 3 | 11 | 4-way | | 1000 | |
| | | 20-bit → | 4 | 20 | 5-way | | 2000 | |
| | | 37-bit → | 5 | 37 | 6-way | | 5000 | |
| | | 70-bit → | 6 | 70 | 7-way | | 10000 | |
| | | 135-bit → | 7 | 135 | 8-way | | 20000 | |
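
Two of the counts in Table 1 can be checked with a short sketch. The 32 core configurations are consistent with a full grid of 2 difficulties × 4 heritabilities × 4 instance counts, and the multiplexer widths are consistent with the standard layout in which k address bits select one of 2^k data bits (k + 2^k features in total). The Python below is illustrative only: the function names are hypothetical, the most-significant-bit-first address ordering is an assumption, and it is not the custom script used to generate the study datasets.

```python
from itertools import product
import random

# Core 2-way pure epistasis grid; values copied from Table 1.
DIFFICULTIES = ["E", "H"]
HERITABILITIES = [0.05, 0.1, 0.2, 0.4]
INSTANCES = [200, 400, 800, 1600]


def core_configurations():
    """Return every (difficulty, heritability, instances) combination."""
    return list(product(DIFFICULTIES, HERITABILITIES, INSTANCES))


def mux_total_features(address_bits):
    """k address bits plus 2**k data bits (standard multiplexer layout)."""
    return address_bits + 2 ** address_bits


def mux_label(bits, address_bits):
    """Return the data bit selected by the address bits.

    Assumption: the address bits come first and are read MSB first.
    """
    address = int("".join(str(b) for b in bits[:address_bits]), 2)
    return bits[address_bits + address]


def simulate_mux(address_bits, n_instances, seed=0):
    """Draw uniformly random bit strings and label them without noise."""
    rng = random.Random(seed)
    width = mux_total_features(address_bits)
    rows = []
    for _ in range(n_instances):
        bits = [rng.randint(0, 1) for _ in range(width)]
        rows.append((bits, mux_label(bits, address_bits)))
    return rows


if __name__ == "__main__":
    print(len(core_configurations()))
    print([mux_total_features(k) for k in range(2, 8)])
```

For example, `simulate_mux(2, 500)` would produce a noise-free 6-bit dataset with 500 instances, matching the size of the smallest multiplexer configuration in the table.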