Table II.
Comparison of all six methods by the average number of true and noise variables selected, across all case-prevalence levels, data-configuration scenarios, and missing-data patterns.
| Scenario | Missing pattern | GLM | t-test | CART | Random Forest | LASSO | Elastic Net | Imputed LASSO |
|---|---|---|---|---|---|---|---|---|
| Prevalence = 0.5 | | | | | | | | |
| C1.Independent | Complete | 9.5 (3.7) | 9 | 4.1 (2.8) | 7.3 | 9.3 (4) | 9.3 (4.8) | – |
| | M1 | 7.3 (6.6) | 8.9 | 3.9 (2.8) | 7.4 | 7 (7.8) | 7.2 (8.3) | 9.1 (4.2) |
| | M2 | 7.2 (7) | 8.7 | 3.7 (2.6) | 6 | 7.6 (6.8) | 7.8 (8.2) | 8.8 (4.4) |
| C2.Correlated (True, Noise) | Complete | 6.4 (3.8) | 6.4 | 3.6 (4) | 6 | 9 (6.7) | 9.2 (8.2) | – |
| | M1 | 4.3 (7.3) | 6.3 | 3.5 (4) | 5.9 | 7.3 (8) | 7.7 (9.4) | 8.7 (7.1) |
| | M2 | 4 (6.8) | 6.3 | 3.3 (3.6) | 3.8 | 7.1 (9.3) | 7.5 (10.8) | 8.4 (6.6) |
| C3.Correlated (True, True) | Complete | 2.6 (3.8) | 10 | 3.3 (2.8) | 10 | 6.2 (0.9) | 8.4 (0.7) | – |
| | M1 | 2.3 (7.5) | 10 | 3.3 (2.8) | 10 | 4.8 (1.4) | 7.1 (1.2) | 6.3 (0.8) |
| | M2 | 2.4 (8.1) | 10 | 3.3 (2.8) | 8.5 | 4.8 (1.3) | 7.2 (1.2) | 3.3 (0.4) |
| C4.Omitted interactions | Complete | 9.6 (3.6) | 8.8 | 3.7 (2.5) | 7.3 | 8.9 (4.5) | 8.9 (4.8) | – |
| | M1 | 7.3 (6.8) | 8.7 | 3.5 (2.5) | 7.2 | 6.6 (7.1) | 6.9 (7.6) | 8.7 (4.7) |
| | M2 | 7.3 (6.8) | 8.5 | 3.3 (2.4) | 3.8 | 7.2 (8.4) | 7.4 (9.3) | 8.2 (5.1) |
| C5.Unobserved true predictors | Complete | 7.1 (3.5) | 7.7 | 2.9 (3.1) | 5.2 | 7.2 (7) | 7.4 (7.7) | – |
| | M1 | 4.3 (6) | 7.5 | 2.7 (3.5) | 5 | 5.1 (9.6) | 5.4 (10.8) | 7 (7.5) |
| | M2 | 4.5 (5.5) | 7.3 | 2.5 (3) | 3.4 | 4.4 (8.9) | 4.7 (10.3) | 6.8 (8.2) |
| C6.Complex | Complete | 2.9 (4.1) | 5.9 | 3.4 (4.2) | 5.9 | 6.5 (5.2) | 7.7 (6.5) | – |
| | M1 | 2.2 (7) | 5.8 | 3.3 (3.8) | 5.9 | 4.9 (6.6) | 5.9 (8.1) | 6.6 (5.2) |
| | M2 | 2.2 (7) | 5.6 | 3.1 (3.8) | 3.9 | 4.5 (6) | 5.6 (7.2) | 6.3 (6.2) |
| Prevalence = 0.3 | | | | | | | | |
| C1.Independent | Complete | 9.2 (4) | 8.6 | 1.2 (0.6) | 6.9 | 8.9 (5.6) | 9 (8.1) | – |
| | M1 | 5.8 (8) | 8.6 | 0.9 (1.4) | 6.9 | 5 (4) | 5.5 (5.5) | 8.7 (5.5) |
| | M2 | 6 (7.5) | 8.3 | 0.8 (0.4) | 5.8 | 3.4 (3.5) | 3.8 (4.8) | 8.1 (6.2) |
| C2.Correlated (True, Noise) | Complete | 5.9 (3.9) | 6.2 | 0.9 (0.7) | 5.7 | 8.2 (7.5) | 8.5 (10.1) | – |
| | M1 | 3.8 (7.5) | 6.2 | 0.9 (0.8) | 5.8 | 3.4 (4.2) | 4 (6.2) | 7.8 (7.6) |
| | M2 | 4 (7.8) | 6 | 0.7 (0.6) | 4.2 | 3.4 (4.9) | 3.9 (6.6) | 7.9 (8) |
| C3.Correlated (True, True) | Complete | 4.3 (4.9) | 10 | 3.9 (3) | 10 | 8.3 (1.1) | 9.7 (1.1) | – |
| | M1 | 0.6 (2.6) | 10 | 3.9 (2.8) | 10 | 6.4 (1.6) | 8.9 (1.9) | 8.4 (1.1) |
| | M2 | 0.8 (3.4) | 10 | 3.7 (2.6) | 9 | 6.6 (1.8) | 9 (2.5) | 6.8 (1) |
| C4.Omitted interactions | Complete | 9.7 (3.9) | 9 | 1.6 (0.8) | 7.3 | 9.5 (5.8) | 9.7 (8.2) | – |
| | M1 | 6.2 (7.5) | 8.8 | 1.3 (0.7) | 7.2 | 6.1 (5.3) | 6.3 (7) | 9.4 (5.9) |
| | M2 | 6.7 (8) | 8.8 | 1.2 (0.6) | 4.9 | 7.4 (5.8) | 7.8 (8.3) | 9.2 (6) |
| C5.Unobserved true predictors | Complete | 5.1 (3.8) | 6.1 | 0.1 (0.1) | 3.8 | 0.5 (0.6) | 0.4 (0.6) | – |
| | M1 | 3.1 (6.5) | 6 | 0.1 (0.1) | 3.6 | 0.3 (0.6) | 0.3 (0.6) | 0.4 (0.6) |
| | M2 | 3.7 (6.2) | 5.7 | 0.1 (0.1) | 2.9 | 0.3 (0.6) | 0.3 (0.8) | 0.6 (0.8) |
| C6.Complex | Complete | 3.6 (4.3) | 5.9 | 3.6 (3.5) | 6.1 | 8.1 (6.5) | 9.2 (9.2) | – |
| | M1 | 2.4 (7.9) | 5.9 | 3.3 (3.4) | 6.2 | 5.8 (7) | 7.5 (9.9) | 8.1 (6.5) |
| | M2 | 2 (7.2) | 5.6 | 3.3 (3.8) | 4.3 | 5.6 (6.4) | 7.3 (9.7) | 7.8 (7.2) |
The average number of selected noise variables is shown in parentheses.
Bold cells correspond to the methods that selected the most true predictors under each condition.
The two-sample t-test and Random Forest rank all variables in each simulation, and we selected the top 10 of them. Thus, for these two methods the average number of selected noise variables equals 10 minus the average number of selected true variables, so it is not shown.
GLM, generalized linear model; CART, Classification and Regression Tree; LASSO, Least Absolute Shrinkage and Selection Operator.
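The top-10 rule used for the two ranking methods can be sketched in Python. This is a minimal illustration under assumed inputs, not the authors' code: each score vector stands in for whatever per-variable ranking statistic a method produces (for example, an absolute t statistic or a Random Forest importance), and the helper names, `TRUE_VARS`, and the two replicate score vectors are hypothetical. Because exactly `K = 10` variables are kept per replicate, the average noise count is forced to equal `K` minus the average true count.

```python
K = 10  # number of top-ranked variables kept per simulation (from the table notes)


def top_k(scores, k=K):
    """Return indices of the k variables with the largest ranking scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def count_selected(selected, true_vars):
    """Split a selection into (number of true variables, number of noise variables)."""
    n_true = sum(1 for j in selected if j in true_vars)
    return n_true, len(selected) - n_true


def average_counts(score_runs, true_vars, k=K):
    """Average the true and noise counts of the top-k selection over replicates."""
    counts = [count_selected(top_k(s, k), true_vars) for s in score_runs]
    avg_true = sum(c[0] for c in counts) / len(counts)
    avg_noise = sum(c[1] for c in counts) / len(counts)
    return avg_true, avg_noise


# Hypothetical example: 20 candidate variables, of which indices 0-9 are true predictors.
TRUE_VARS = set(range(10))
run1 = [float(20 - j) for j in range(20)]  # every true variable outranks every noise variable
run2 = list(run1)
run2[8] = run2[9] = 0.0  # two true variables drop out of the top 10

avg_true, avg_noise = average_counts([run1, run2], TRUE_VARS)
print(avg_true, avg_noise)  # 9.0 1.0, so avg_noise == K - avg_true
```

Reporting only the true count for these methods loses no information, since the noise count follows from it.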
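The bolding rule can likewise be sketched in Python, assuming that "each condition" means one row of the table (a single prevalence, scenario, and missing pattern) and that all tied methods are bolded; both readings are assumptions. The helper names are hypothetical, and the two example rows reuse values from the Prevalence = 0.5 block (C3 Complete and C1 M1).

```python
METHODS = ["GLM", "t-test", "CART", "Random Forest", "LASSO", "Elastic Net", "Imputed LASSO"]


def true_count(cell):
    """Parse a cell such as '9.1 (4.2)' or '10'; return None for '–' (not applicable)."""
    cell = cell.strip()
    if cell in ("–", "-", ""):
        return None
    # The first number is the average number of true variables; the parenthesized one is noise.
    return float(cell.split()[0])


def best_methods(row):
    """Return every method tied for the largest average number of true predictors."""
    values = {m: true_count(c) for m, c in row.items()}
    values = {m: v for m, v in values.items() if v is not None}
    best = max(values.values())
    return [m for m, v in values.items() if v == best]


# Two rows copied from the Prevalence = 0.5 block of Table II.
row_c3_complete = dict(zip(METHODS, ["2.6 (3.8)", "10", "3.3 (2.8)", "10", "6.2 (0.9)", "8.4 (0.7)", "–"]))
row_c1_m1 = dict(zip(METHODS, ["7.3 (6.6)", "8.9", "3.9 (2.8)", "7.4", "7 (7.8)", "7.2 (8.3)", "9.1 (4.2)"]))

print(best_methods(row_c3_complete))  # ['t-test', 'Random Forest']
print(best_methods(row_c1_m1))  # ['Imputed LASSO']
```

In the C3 Complete row, the t-test and Random Forest tie at 10 true predictors, and Imputed LASSO is skipped because its cell is "–".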